[
https://issues.apache.org/jira/browse/HDDS-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746894#comment-17746894
]
István Fajth commented on HDDS-9061:
------------------------------------
Reopening this issue based on the following findings in the test environment.
Further testing shows that the Ozone cluster is in the following state:
SCM2 appears to be stuck: its main thread constantly retries to update its
internal CA certificate list, while 2 other threads are waiting to get OM
certificates. We are not sure what prevents the main thread from getting the
results, but it seems that the main thread connects to SCM1, which points it
to SCM2, and then we do not see the request arriving at SCM2.
Additional info:
It seems that this started after SCM3 was decommissioned and a new SCM was
added; the newly added SCM is stuck in infinite retries to get its
certificate.
> [ozone-cert-rotation] Certificates for few components were renewed, but not
> for all
> -----------------------------------------------------------------------------------
>
> Key: HDDS-9061
> URL: https://issues.apache.org/jira/browse/HDDS-9061
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM
> Reporter: Soumitra Sulav
> Assignee: Sammi Chen
> Priority: Critical
> Fix For: 1.4.0
>
> Attachments: quasar-kkgvze-1-datanode.log,
> quasar-kkgvze-1-ozone-scm.log, quasar-kkgvze-2-datanode.log,
> quasar-kkgvze-2-ozone-scm.log, quasar-kkgvze-6-ozone-scm.log
>
>
> Current config set under the safety valve for CA cert rotation:
> {code:java}
> <property><name>hdds.x509.max.duration</name><value>P5D</value></property>
> <property><name>hdds.x509.default.duration</name><value>PT1H</value></property>
> <property><name>hdds.x509.renew.grace.duration</name><value>PT50M</value></property>
> <property><name>hdds.block.token.expiry.time</name><value>15m</value></property>
> <property><name>ozone.manager.delegation.token.renew-interval</name><value>15m</value></property>
> <property><name>ozone.manager.delegation.token.max-lifetime</name><value>30m</value></property>
> <property><name>hdds.x509.ca.rotation.check.interval</name><value>PT30M</value></property>
> {code}
> Based on the above config, certificates should be renewed 10 minutes after
> they are issued, once the remaining validity (PT1H) falls within the grace
> period (PT50M).
> Observation:
> Certificates were renewed only for a few roles or instances, seemingly at
> random.
> The primordial SCM was stopped and started at the following times:
> 2023-07-21 15:36:03,703 : STOPPED
> 2023-07-21 16:00:25,736 : STARTED
> {code:java}
> [root@quasar-kkgvze-1 ~]# ozone admin cert list | grep "dn@"
> 66603508550073199 Fri Jul 21 15:02:32 UTC 2023 Fri Jul 21 16:02:32 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603509087676398 Fri Jul 21 15:02:33 UTC 2023 Fri Jul 21 16:02:33 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603509333666132 Fri Jul 21 15:02:33 UTC 2023 Fri Jul 21 16:02:33 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603509791941467 Fri Jul 21 15:02:33 UTC 2023 Fri Jul 21 16:02:33 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603510052147817 Fri Jul 21 15:02:34 UTC 2023 Fri Jul 21 16:02:34 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603510180218825 Fri Jul 21 15:02:34 UTC 2023 Fri Jul 21 16:02:34 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603510614053924 Fri Jul 21 15:02:34 UTC 2023 Fri Jul 21 16:02:34 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66603512588421508 Fri Jul 21 15:02:36 UTC 2023 Fri Jul 21 16:02:36 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66604108035744075 Fri Jul 21 15:12:32 UTC 2023 Fri Jul 21 16:12:32 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66604109383514945 Fri Jul 21 15:12:33 UTC 2023 Fri Jul 21 16:12:33 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66604109450607224 Fri Jul 21 15:12:33 UTC 2023 Fri Jul 21 16:12:33 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66604109820303351 Fri Jul 21 15:12:33 UTC 2023 Fri Jul 21 16:12:33 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> 66604110216800118 Fri Jul 21 15:12:34 UTC 2023 Fri Jul 21 16:12:34 UTC 2023
>
> [email protected],OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
>
> CN=scm-sub-66603487342057...@quasar-kkgvze-1.quasar-kkgvze.root.hwx.site,OU=907e103a-3636-489a-b9fe-c23f697e1526,O=CID-e8bd6d19-a4ca-49af-8a6b-45c5cee57509
> {code}
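The 10-minute renewal point in the quoted report is simply the certificate validity minus the grace period. The following is a minimal Java sketch of that arithmetic, assuming (not confirmed in this ticket) that renewal starts once the remaining validity drops to hdds.x509.renew.grace.duration, and that both values are ISO-8601 durations parseable by java.time.Duration.parse. The class and method names (RenewalWindow, renewalOffset, renewalStart) are hypothetical; the sample issue time is the first dn@ certificate in the listing above.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical helper: computes when a certificate should enter its renewal
 * window, assuming renewal begins once (validity - grace) has elapsed.
 */
public final class RenewalWindow {
    private RenewalWindow() {}

    // Offset from the certificate's start time at which renewal should begin.
    static Duration renewalOffset(String certDuration, String graceDuration) {
        Duration validity = Duration.parse(certDuration); // e.g. hdds.x509.default.duration
        Duration grace = Duration.parse(graceDuration);   // e.g. hdds.x509.renew.grace.duration
        if (grace.compareTo(validity) >= 0) {
            throw new IllegalArgumentException("grace must be shorter than validity");
        }
        return validity.minus(grace);
    }

    // Absolute instant at which renewal should begin for a cert valid from notBefore.
    static Instant renewalStart(Instant notBefore, String certDuration, String graceDuration) {
        return notBefore.plus(renewalOffset(certDuration, graceDuration));
    }

    public static void main(String[] args) {
        System.out.println("renew after: " + renewalOffset("PT1H", "PT50M"));
        // prints "renew after: PT10M"
        Instant issued = Instant.parse("2023-07-21T15:02:32Z");
        System.out.println("renewal due: " + renewalStart(issued, "PT1H", "PT50M"));
        // prints "renewal due: 2023-07-21T15:12:32Z"
    }
}
```

Under these assumptions, the certificate valid from 15:02:32 UTC should have entered renewal at 15:12:32 UTC, well before its 16:02:32 UTC expiry.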
--
This message was sent by Atlassian Jira
(v8.20.10#820010)