[ 
https://issues.apache.org/jira/browse/HDDS-13551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012820#comment-18012820
 ] 

Sammi Chen commented on HDDS-13551:
-----------------------------------

[~braul], could you elaborate on the problem a bit further?  
A ServerNotLeaderException returned by a follower is expected behavior. 
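
For readers unfamiliar with why a "not leader" reply is normally harmless, here is a minimal, language-neutral sketch of the client-side failover pattern. It is not Ozone's actual client code: NotLeaderError, call_with_failover, and the node names scm-a, scm-b, scm-c are all hypothetical, and the sleep between attempts seen in the log below is omitted.

```python
class NotLeaderError(Exception):
    """Hypothetical stand-in for a 'not the leader' reply from a follower."""

    def __init__(self, node, suggested_leader=None):
        super().__init__(f"{node} is not the leader")
        self.suggested_leader = suggested_leader


def call_with_failover(nodes, request, max_attempts=10):
    """Try nodes until one accepts the request.

    `nodes` maps a node name to a callable. NotLeaderError and
    ConnectionError are treated as retriable; anything else propagates.
    """
    names = list(nodes)
    index = 0
    for _ in range(max_attempts):
        name = names[index]
        try:
            return name, nodes[name](request)
        except NotLeaderError as e:
            if e.suggested_leader in nodes:
                # Follow the follower's hint when it names a known node.
                index = names.index(e.suggested_leader)
            else:
                index = (index + 1) % len(names)
        except ConnectionError:
            # Node is down: move on to the next one.
            index = (index + 1) % len(names)
    raise RuntimeError(f"no leader found after {max_attempts} attempts")


def follower(name, leader_name):
    def handle(request):
        raise NotLeaderError(name, suggested_leader=leader_name)
    return handle


def stopped(request):
    raise ConnectionError("connection refused")


def leader(request):
    return f"ok: {request}"


# Assumed topology: scm-a is a follower, scm-b is stopped, scm-c leads.
nodes = {"scm-a": follower("scm-a", "scm-c"), "scm-b": stopped, "scm-c": leader}
served_by, reply = call_with_failover(nodes, "list containers")
```

In this sketch the first attempt hits a follower, which rejects the call and points at the leader, and the second attempt succeeds. The exception itself only signals that the request needs to be redirected.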

> ServerNotLeaderException  after SCM leader is stopped and new leader is 
> elected
> -------------------------------------------------------------------------------
>
>                 Key: HDDS-13551
>                 URL: https://issues.apache.org/jira/browse/HDDS-13551
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Bablu Raul
>            Priority: Major
>
> {code:java}
> When I stop the SCM leader (i.e., data-10:LEADER), the system correctly triggers 
> a leader election and successfully elects a new SCM leader. This can be 
> verified through the CLI, where the new leader is visible.{code}
> {code:java}
> data-17:FOLLOWER 
> data-1:FOLLOWER
> data-10:LEADER{code}
> {code:java}
> data-17:FOLLOWER 
> data-1:LEADER
> data-10:FOLLOWER {code}
> {code:java}
> 2025-08-07 05:15:11,853|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|25/08/07 05:15:11 INFO 
> retry.RetryInvocationHandler: com.google.protobuf.ServiceException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.scm.exceptions.SCMException):
>  Cannot reconcile container #5001 in state CLOSED with replica states: 
> CLOSED, CLOSED, CLOSING
> 2025-08-07 05:15:11,854|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.reconcileContainer(SCMClientProtocolServer.java:1542)
> 2025-08-07 05:15:11,854|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.reconcileContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:1360)
> 2025-08-07 05:15:11,854|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:739)
> 2025-08-07 05:15:11,854|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
> 2025-08-07 05:15:11,855|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:235)
> 2025-08-07 05:15:11,855|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java)
> 2025-08-07 05:15:11,855|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> 2025-08-07 05:15:11,855|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> 2025-08-07 05:15:11,855|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
> 2025-08-07 05:15:11,856|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
> 2025-08-07 05:15:11,856|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> java.security.AccessController.doPrivileged(Native Method)
> 2025-08-07 05:15:11,856|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> javax.security.auth.Subject.doAs(Subject.java:422)
> 2025-08-07 05:15:11,856|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
> 2025-08-07 05:15:11,856|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
> 2025-08-07 05:15:11,856|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|, while invoking 
> $Proxy18.submitRequest over 
> nodeId=node2,nodeAddress=ccycloud-10.newom.root.comops.site/10.140.137.141:9860
>  after 3 failover attempts. Trying to failover after sleeping for 2000ms.
> 2025-08-07 05:15:13,856|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|25/08/07 05:15:13 INFO 
> retry.RetryInvocationHandler: com.google.protobuf.ServiceException: 
> java.net.ConnectException: Call From st-ozone-qkgzv8-dbnlt/10.104.11.242 to 
> ccycloud-1.newom.root.comops.site:9860 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> $Proxy18.submitRequest over 
> nodeId=node1,nodeAddress=ccycloud-1.newom.root.comops.site/10.140.132.65:9860 
> after 4 failover attempts. Trying to failover after sleeping for 2000ms.
> 2025-08-07 05:15:15,863|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|25/08/07 05:15:15 INFO 
> retry.RetryInvocationHandler: com.google.protobuf.ServiceException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.scm.exceptions.SCMException):
>  Cannot reconcile container #5001 in state CLOSED with replica states: 
> CLOSED, CLOSED, CLOSING
> 2025-08-07 05:15:15,864|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.reconcileContainer(SCMClientProtocolServer.java:1542)
> 2025-08-07 05:15:15,864|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.reconcileContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:1360)
> 2025-08-07 05:15:15,864|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:739)
> 2025-08-07 05:15:15,865|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
> 2025-08-07 05:15:15,865|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:235)
> 2025-08-07 05:15:15,865|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java)
> 2025-08-07 05:15:15,865|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> 2025-08-07 05:15:15,865|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> 2025-08-07 05:15:15,865|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
> 2025-08-07 05:15:15,866|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
> 2025-08-07 05:15:15,866|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> java.security.AccessController.doPrivileged(Native Method)
> 2025-08-07 05:15:15,866|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> javax.security.auth.Subject.doAs(Subject.java:422)
> 2025-08-07 05:15:15,866|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
> 2025-08-07 05:15:15,866|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
> 2025-08-07 05:15:15,866|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|, while invoking 
> $Proxy18.submitRequest over 
> nodeId=node2,nodeAddress=ccycloud-10.newom.root.comops.site/10.140.137.141:9860
>  after 5 failover attempts. Trying to failover after sleeping for 2000ms.
> 2025-08-07 05:15:17,875|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|25/08/07 05:15:17 INFO 
> retry.RetryInvocationHandler: com.google.protobuf.ServiceException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException):
>  Server:333ad543-2b65-4a7c-b5c1-b8faadc1d7f4 is not the leader. Suggested 
> leader is Server:ccycloud-10.newom.root.comops.site:9860.
> 2025-08-07 05:15:17,876|INFO|MainThread|machine.py:205 - 
> run()||GUID=51debc2b-956b-4e05-b036-ced4aa0547f4|at 
> org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:102)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
