sarvekshayr opened a new pull request, #8765:
URL: https://github.com/apache/ozone/pull/8765

   ## What changes were proposed in this pull request?
   `HAUtils.getCAListWithRetry()` currently uses 
`RetryPolicies.retryForeverWithFixedSleep()`, so it retries indefinitely on any 
failure. When authentication is not set up (i.e., `kinit` has not been run), 
the call to SCM fails with an `AccessControlException`. Because the retry 
policy does not handle this exception specifically, commands like `ozone admin 
container create` appear to hang indefinitely.
   This change makes the retry policy detect `AccessControlException` and 
fail fast instead of retrying.
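
   The fail-fast idea can be illustrated with a minimal, self-contained sketch. 
It is not the code in this patch and does not use the Hadoop retry classes: 
the class `FailFastRetry`, its methods, and the nested `AccessControlException` 
(a stand-in for `org.apache.hadoop.security.AccessControlException`) are all 
hypothetical names chosen for illustration. It assumes the authentication 
failure may be wrapped in another exception, so it walks the cause chain.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class FailFastRetry {

  /** Hypothetical stand-in for org.apache.hadoop.security.AccessControlException. */
  static class AccessControlException extends IOException {
    AccessControlException(String msg) {
      super(msg);
    }
  }

  /** Returns true if the throwable or any of its causes is an AccessControlException. */
  static boolean isAccessControlFailure(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur instanceof AccessControlException) {
        return true;
      }
    }
    return false;
  }

  /**
   * Retries the task forever with a fixed sleep between attempts,
   * except that authentication failures are rethrown immediately.
   */
  static <T> T callWithRetry(Callable<T> task, long sleepMillis) throws Exception {
    while (true) {
      try {
        return task.call();
      } catch (Exception e) {
        if (isAccessControlFailure(e)) {
          throw e; // retrying cannot help until the user authenticates
        }
        Thread.sleep(sleepMillis); // other failures are assumed transient
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Transient failures are retried until the task succeeds.
    AtomicInteger attempts = new AtomicInteger();
    String result = callWithRetry(() -> {
      if (attempts.incrementAndGet() < 3) {
        throw new IOException("SCM not ready");
      }
      return "ca-list";
    }, 10);
    System.out.println(result + " after " + attempts.get() + " attempts");

    // An authentication failure ends the loop on the first attempt.
    AtomicInteger authAttempts = new AtomicInteger();
    try {
      callWithRetry(() -> {
        authAttempts.incrementAndGet();
        throw new AccessControlException("Client cannot authenticate via:[KERBEROS]");
      }, 10);
    } catch (AccessControlException e) {
      System.out.println("failed fast after " + authAttempts.get() + " attempt(s)");
    }
  }
}
```

   In this sketch a transient `IOException` is retried until the task succeeds, 
while the authentication failure is surfaced to the caller after one attempt, 
which matches the before and after behaviour shown in the test output below.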
   
   ## What is the link to the Apache JIRA?
   [HDDS-13405](https://issues.apache.org/jira/browse/HDDS-13405)
   
   ## How was this patch tested?
   Before the fix
   ```
   bash-5.1$ OZONE_LOGLEVEL=INFO ozone admin container create
   2025-07-08 10:44:48,181 [main] INFO 
proxy.SCMContainerLocationFailoverProxyProvider: Created fail-over proxy for 
protocol StorageContainerLocationProtocolPB with 3 nodes: 
[nodeId=scm2,nodeAddress=scm2.org/172.25.0.117:9860, 
nodeId=scm1,nodeAddress=scm1.org/172.25.0.116:9860, 
nodeId=scm3,nodeAddress=scm3.org/172.25.0.118:9860]
   2025-07-08 10:44:48,229 [main] INFO 
proxy.SecretKeyProtocolFailoverProxyProvider: Created fail-over proxy for 
protocol SecretKeyProtocolScmPB with 3 nodes: 
[nodeId=scm2,nodeAddress=scm2.org/172.25.0.117:9961, 
nodeId=scm1,nodeAddress=scm1.org/172.25.0.116:9961, 
nodeId=scm3,nodeAddress=scm3.org/172.25.0.118:9961]
   2025-07-08 10:44:48,402 [main] INFO 
proxy.SCMSecurityProtocolFailoverProxyProvider: Created fail-over proxy for 
protocol SCMSecurityProtocolPB with 3 nodes: 
[nodeId=scm2,nodeAddress=scm2.org/172.25.0.117:9961, 
nodeId=scm1,nodeAddress=scm1.org/172.25.0.116:9961, 
nodeId=scm3,nodeAddress=scm3.org/172.25.0.118:9961]
   2025-07-08 10:44:48,470 [main] WARN ipc.Client: Exception encountered while 
connecting to the server scm1.org/172.25.0.116:9961
   org.apache.hadoop.security.AccessControlException: Client cannot 
authenticate via:[KERBEROS]
           at 
org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:179)
           at 
org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:399)
           at 
org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:578)
           at 
org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:364)
           at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:799)
           at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:795)
           at 
java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
           at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
           at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
           at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:795)
           at 
org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:364)
           at org.apache.hadoop.ipc.Client.getConnection(Client.java:1649)
           at org.apache.hadoop.ipc.Client.call(Client.java:1473)
           at org.apache.hadoop.ipc.Client.call(Client.java:1426)
           at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:250)
           at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:132)
           at jdk.proxy2/jdk.proxy2.$Proxy22.submitRequest(Unknown Source)
           at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
           at java.base/java.lang.reflect.Method.invoke(Method.java:580)
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:437)
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:170)
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:162)
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:100)
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366)
           at jdk.proxy2/jdk.proxy2.$Proxy22.submitRequest(Unknown Source)
           at 
org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.submitRequest(SCMSecurityProtocolClientSideTranslatorPB.java:93)
           at 
org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.listCACertificate(SCMSecurityProtocolClientSideTranslatorPB.java:363)
           at 
org.apache.hadoop.hdds.utils.HAUtils.waitForCACerts(HAUtils.java:374)
           at 
org.apache.hadoop.hdds.utils.HAUtils.lambda$buildCAX509List$3(HAUtils.java:401)
           at 
org.apache.hadoop.hdds.utils.RetriableTask.call(RetriableTask.java:55)
           at 
org.apache.hadoop.hdds.utils.HAUtils.getCAListWithRetry(HAUtils.java:360)
           at 
org.apache.hadoop.hdds.utils.HAUtils.buildCAX509List(HAUtils.java:401)
           at 
org.apache.hadoop.hdds.scm.cli.ContainerOperationClient.lambda$newXCeiverClientManager$0(ContainerOperationClient.java:123)
           at 
org.apache.hadoop.hdds.scm.client.ClientTrustManager.loadCerts(ClientTrustManager.java:148)
           at 
org.apache.hadoop.hdds.scm.client.ClientTrustManager.<init>(ClientTrustManager.java:110)
           at 
org.apache.hadoop.hdds.scm.cli.ContainerOperationClient.newXCeiverClientManager(ContainerOperationClient.java:125)
           at 
org.apache.hadoop.hdds.scm.cli.ContainerOperationClient.getXceiverClientManager(ContainerOperationClient.java:91)
           at 
org.apache.hadoop.hdds.scm.cli.ContainerOperationClient.createContainer(ContainerOperationClient.java:212)
           at 
org.apache.hadoop.hdds.scm.cli.container.CreateSubcommand.execute(CreateSubcommand.java:59)
           at 
org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:39)
           at 
org.apache.hadoop.hdds.scm.cli.ScmSubcommand.call(ScmSubcommand.java:29)
           at picocli.CommandLine.executeUserObject(CommandLine.java:2031)
           at picocli.CommandLine.access$1500(CommandLine.java:148)
           at 
picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2469)
           at picocli.CommandLine$RunLast.handle(CommandLine.java:2461)
           at picocli.CommandLine$RunLast.handle(CommandLine.java:2423)
           at 
picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
           at picocli.CommandLine$RunLast.execute(CommandLine.java:2425)
           at 
org.apache.hadoop.ozone.shell.Shell.lambda$execute$0(Shell.java:95)
           at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:167)
           at 
org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:157)
           at org.apache.hadoop.ozone.shell.Shell.execute(Shell.java:95)
           at picocli.CommandLine.execute(CommandLine.java:2174)
           at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:89)
           at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:80)
           at org.apache.hadoop.ozone.admin.OzoneAdmin.main(OzoneAdmin.java:36)
   2025-07-08 10:44:48,478 [main] INFO utils.RetriableTask: Execution of task 
getCAList failed, will be retried in 10000 ms
   (retries forever)
   ```
   
   After the fix
   ```
   bash-5.1$ ozone admin container create
   java.security.cert.CertificateException: 
org.apache.hadoop.security.AccessControlException: Permission denied.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

