ChenSammi opened a new pull request, #5561:
URL: https://github.com/apache/ozone/pull/5561

   ## What changes were proposed in this pull request?
   Resolve the backward compatibility issue introduced in HDDS-8588. 
   
   The root cause is that, during SCM startup, the listCA() call tries to invoke SCM's own SCMSecurityProtocolServer API, but SCMSecurityProtocolServer is not running yet at that point. The call is wrapped in a retry policy that retries up to a maximum count, so SCM gets stuck retrying and cannot start up.
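To make the failure mode concrete, here is a simplified Java model of a max-count, fixed-sleep retry loop. It is not Hadoop's `RetryInvocationHandler` and not the policy Ozone actually configures for the SCM security protocol client; the class name `FixedSleepRetry` and its parameters are hypothetical. The point it illustrates is that the calling thread sits in `Thread.sleep` between attempts (like the `main` thread in the stack trace), and if the server being called can only start after this call returns, every attempt fails and startup is blocked for roughly `maxRetries * sleepMillis`.

```java
import java.util.concurrent.Callable;

/**
 * Simplified model (hypothetical, not Hadoop's RetryInvocationHandler):
 * run an action, retrying it up to maxRetries more times with a fixed sleep.
 */
final class FixedSleepRetry {

  private FixedSleepRetry() {
  }

  /**
   * Runs action once, then retries it up to maxRetries more times, sleeping
   * sleepMillis between attempts. Rethrows the last failure if all attempts fail.
   */
  public static <T> T call(Callable<T> action, int maxRetries, long sleepMillis)
      throws Exception {
    if (maxRetries < 0) {
      throw new IllegalArgumentException("maxRetries must be >= 0");
    }
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        last = e;
        if (attempt < maxRetries) {
          // The caller's thread is parked here between attempts,
          // which shows up as TIMED_WAITING (sleeping) in a thread dump.
          Thread.sleep(sleepMillis);
        }
      }
    }
    throw last;
  }
}
```

With a large maximum retry count, this bounded wait is indistinguishable from a hang from the operator's point of view.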
   
   The fix avoids the remote API call and instead builds the trust chain from the certificate information stored locally on disk.
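A minimal sketch of building a trust chain purely from local files, assuming the certificates are stored as `*.pem` files in a single directory. The directory layout, the class name `LocalTrustChainLoader`, and both methods are hypothetical and are not the code in this patch. It uses only the JDK `CertificateFactory` API and links each certificate to its issuer by comparing X.500 principals, so no RPC to `SCMSecurityProtocolServer` is involved.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.cert.Certificate;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: assemble a trust chain only from certificates on local disk. */
final class LocalTrustChainLoader {

  private LocalTrustChainLoader() {
  }

  /** Reads every X.509 certificate from the *.pem files in certDir, in file-name order. */
  static List<X509Certificate> loadCertificates(Path certDir)
      throws IOException, CertificateException {
    CertificateFactory factory = CertificateFactory.getInstance("X.509");
    List<Path> files = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(certDir, "*.pem")) {
      for (Path file : stream) {
        files.add(file);
      }
    }
    files.sort(null); // natural Path ordering, for a deterministic result
    List<X509Certificate> certs = new ArrayList<>();
    for (Path file : files) {
      try (InputStream in = Files.newInputStream(file)) {
        // generateCertificates also handles files holding several PEM blocks.
        for (Certificate cert : factory.generateCertificates(in)) {
          certs.add((X509Certificate) cert);
        }
      }
    }
    return certs;
  }

  /** Orders the chain as the leaf certificate first, then each issuer found locally. */
  static List<X509Certificate> buildTrustChain(
      X509Certificate leaf, List<X509Certificate> localCerts) {
    List<X509Certificate> chain = new ArrayList<>();
    chain.add(leaf);
    X509Certificate current = leaf;
    // Follow issuer -> subject links until reaching a self-signed root.
    while (!current.getIssuerX500Principal().equals(current.getSubjectX500Principal())) {
      X509Certificate issuer = null;
      for (X509Certificate candidate : localCerts) {
        if (candidate.getSubjectX500Principal().equals(current.getIssuerX500Principal())
            && !chain.contains(candidate)) {
          issuer = candidate;
          break;
        }
      }
      if (issuer == null) {
        break; // issuer is not on disk, so the chain stays partial
      }
      chain.add(issuer);
      current = issuer;
    }
    return chain;
  }
}
```

Because nothing in this sketch depends on another service, it can run before SCM's RPC servers are started; the trade-off is that the chain only reflects certificates already present on disk.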
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-9420
   
   ## How was this patch tested?
   
   Tested manually with the following steps:
   1. Enable Ozone security (`ozone.security.enabled`).
   2. Enable gRPC TLS (`hdds.grpc.tls.enabled`).
   3. Install an Ozone 1.3.0 cluster with the above properties, run `scm --init`, start SCM, and then stop SCM.
   4. Upgrade the cluster to the master branch and start SCM. SCM hangs with the following stack trace. Stop SCM.
    
   ```
   "main" #1 prio=5 os_prio=31 tid=0x0000000142009000 nid=0x2203 waiting on condition [0x000000016bf51000]
      java.lang.Thread.State: TIMED_WAITING (sleeping)
           at java.lang.Thread.sleep(Native Method)
           at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:131)
           at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:108)
           - locked <0x00000005c48670c8> (a org.apache.hadoop.io.retry.RetryInvocationHandler$Call)
           at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
           at com.sun.proxy.$Proxy11.submitRequest(Unknown Source)
           at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.submitRequest(SCMSecurityProtocolClientSideTranslatorPB.java:102)
           at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.listCACertificate(SCMSecurityProtocolClientSideTranslatorPB.java:374)
           at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.updateCAList(DefaultCertificateClient.java:952)
           at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.listCA(DefaultCertificateClient.java:940)
           at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.getTrustChain(DefaultCertificateClient.java:420)
           - locked <0x00000005c107c2d8> (a org.apache.hadoop.hdds.security.x509.certificate.client.SCMCertificateClient)
           at org.apache.hadoop.hdds.security.ssl.ReloadingX509KeyManager.loadKeyManager(ReloadingX509KeyManager.java:204)
           at org.apache.hadoop.hdds.security.ssl.ReloadingX509KeyManager.<init>(ReloadingX509KeyManager.java:85)
           at org.apache.hadoop.hdds.security.ssl.PemFileBasedKeyStoresFactory.createKeyManagers(PemFileBasedKeyStoresFactory.java:83)
           at org.apache.hadoop.hdds.security.ssl.PemFileBasedKeyStoresFactory.init(PemFileBasedKeyStoresFactory.java:104)
           - locked <0x00000005c4698000> (a org.apache.hadoop.hdds.security.ssl.PemFileBasedKeyStoresFactory)
           at org.apache.hadoop.hdds.security.x509.keys.SecurityUtil.getServerKeyStoresFactory(SecurityUtil.java:103)
           at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.getServerKeyStoresFactory(DefaultCertificateClient.java:967)
           - locked <0x00000005c107c2d8> (a org.apache.hadoop.hdds.security.x509.certificate.client.SCMCertificateClient)
           at org.apache.hadoop.hdds.scm.ha.HASecurityUtils.createSCMRatisTLSConfig(HASecurityUtils.java:341)
           at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.<init>(SCMRatisServerImpl.java:109)
           at org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.<init>(SCMHAManagerImpl.java:97)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:650)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:403)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:601)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:613)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:171)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:145)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:74)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:48)
           at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
           at picocli.CommandLine.access$1300(CommandLine.java:145)
           at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
           at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
           at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
           at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
           at picocli.CommandLine.execute(CommandLine.java:2078)
           at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
           at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:63)
   ```
   5. Upgrade the cluster to the master branch with this patch applied and start SCM. SCM starts successfully, and the message "Key manager is loaded with certificate chain" appears in the SCM log.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

