peterxcli opened a new pull request, #8060:
URL: https://github.com/apache/ozone/pull/8060

   ## What changes were proposed in this pull request?
   
   ```
   Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 82.54 s <<< FAILURE! -- in org.apache.hadoop.ozone.container.TestContainerReportHandling
   org.apache.hadoop.ozone.container.TestContainerReportHandling.testDeletingOrDeletedContainerTransitionsToClosedWhenNonEmptyReplicaIsReported(LifeCycleState)[2] -- Time elapsed: 33.84 s <<< ERROR!
   org.apache.hadoop.hdds.scm.exceptions.SCMException: org.apache.ratis.protocol.exceptions.NotLeaderException: Server a4f85781-650a-46e8-940e-a45bfdaa2a01@group-BBAD22E09632 is not the leader
        at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.translateException(SCMHAInvocationHandler.java:164)
        at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:114)
        at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:73)
        at jdk.proxy2/jdk.proxy2.$Proxy42.updateContainerState(Unknown Source)
        at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.updateContainerState(ContainerManagerImpl.java:283)
        at org.apache.hadoop.ozone.container.TestContainerReportHandling.testDeletingOrDeletedContainerTransitionsToClosedWhenNonEmptyReplicaIsReported(TestContainerReportHandling.java:100)
   ...
   Caused by: org.apache.ratis.protocol.exceptions.NotLeaderException: Server a4f85781-650a-46e8-940e-a45bfdaa2a01@group-BBAD22E09632 is not the leader
        at org.apache.ratis.server.impl.RaftServerImpl.generateNotLeaderException(RaftServerImpl.java:780)
        at org.apache.ratis.server.impl.LeaderStateImpl.stop(LeaderStateImpl.java:437)
        at org.apache.ratis.server.impl.RoleInfo.shutdownLeaderState(RoleInfo.java:104)
        at org.apache.ratis.server.impl.RaftServerImpl.lambda$close$1(RaftServerImpl.java:530)
        at org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$7(LifeCycle.java:306)
        at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:326)
        at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:304)
        at org.apache.ratis.server.impl.RaftServerImpl.close(RaftServerImpl.java:512)
        at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:207)
   ```
   The same problem affects `TestContainerReportHandlingWithHA`.
   
   ---
   
   See `Terminating with exit status 1: Invalid event: DELETE at CLOSING state.` in the test result log.
   
   ```
   2025-03-12 20:19:41,402 [scmNode-3-FixedThreadPoolWithAffinityExecutor-0-0] INFO  container.IncrementalContainerReportHandler (IncrementalContainerReportHandler.java:onMessage(109)) - Failed to process CLOSED container #1: org.apache.ratis.protocol.exceptions.NotLeaderException: Server cd03248d-9309-426c-ac05-3168de666b12@group-727E36EF7571 is not the leader, suggested leader is: 5bbab233-360a-4332-b5e3-d5fdfa6c8f19|localhost:15076
   2025-03-12 20:19:41,402 [scmNode-2-FixedThreadPoolWithAffinityExecutor-0-0] INFO  container.IncrementalContainerReportHandler (IncrementalContainerReportHandler.java:onMessage(109)) - Failed to process CLOSED container #1: org.apache.ratis.protocol.exceptions.NotLeaderException: Server 416483bd-419d-4dc1-bc92-de8c5f70f57c@group-727E36EF7571 is not the leader, suggested leader is: 5bbab233-360a-4332-b5e3-d5fdfa6c8f19|localhost:15076
   2025-03-12 20:19:41,409 [5bbab233-360a-4332-b5e3-d5fdfa6c8f19@group-727E36EF7571-StateMachineUpdater] ERROR statemachine.StateMachine (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Invalid event: DELETE at CLOSING state.
   org.apache.hadoop.ozone.common.statemachine.InvalidStateTransitionException: Invalid event: DELETE at CLOSING state.
       at org.apache.hadoop.ozone.common.statemachine.StateMachine.getNextState(StateMachine.java:58)
       at org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.updateContainerState(ContainerStateManagerImpl.java:354)
       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
       at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:192)
       at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:155)
       at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1832)
       at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:252)
       at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:193)
       at java.base/java.lang.Thread.run(Thread.java:829)
   2025-03-12 20:19:41,409 [5bbab233-360a-4332-b5e3-d5fdfa6c8f19@group-727E36EF7571-StateMachineUpdater] ERROR impl.StateMachineUpdater (StateMachineUpdater.java:run(206)) - 5bbab233-360a-4332-b5e3-d5fdfa6c8f19@group-727E36EF7571-StateMachineUpdater caught a Throwable.
   org.apache.ratis.server.raftlog.RaftLogIOException: org.apache.ratis.util.ExitUtils$ExitException: Invalid event: DELETE at CLOSING state.
       at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1835)
       at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:252)
       at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:193)
       at java.base/java.lang.Thread.run(Thread.java:829)
   Caused by: org.apache.ratis.util.ExitUtils$ExitException: Invalid event: DELETE at CLOSING state.
       at org.apache.ratis.util.ExitUtils.terminate(ExitUtils.java:141)
       at org.apache.ratis.util.ExitUtils.terminate(ExitUtils.java:151)
       at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:176)
       at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1832)
       ... 3 more
   Caused by: org.apache.hadoop.ozone.common.statemachine.InvalidStateTransitionException: Invalid event: DELETE at CLOSING state.
       at org.apache.hadoop.ozone.common.statemachine.StateMachine.getNextState(StateMachine.java:58)
       at org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.updateContainerState(ContainerStateManagerImpl.java:354)
       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
       at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:192)
       at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:155)
       ... 4 more
   ```
   
   ## What has been done?
   
   Wait until SCM considers the container closed, which happens after the datanodes report their containers as closed, before updating the container with the DELETE event.
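
The ordering described above can be illustrated with a small, self-contained sketch. It is not the code in this PR and does not use Ozone APIs: `ToyContainer`, `waitFor` and `closeThenDelete` are hypothetical names, and the toy state machine models only the two transitions relevant here (CLOSE from CLOSING, DELETE from CLOSED). The real test works against SCM's container manager rather than an in-memory object.

```java
import java.util.function.BooleanSupplier;

class WaitBeforeDelete {
    // Simplified, hypothetical subset of the container lifecycle.
    enum State { CLOSING, CLOSED, DELETING }
    enum Event { CLOSE, DELETE }

    // Toy stand-in for the container state tracked by SCM.
    static final class ToyContainer {
        private State state = State.CLOSING;

        synchronized void apply(Event event) {
            if (state == State.CLOSING && event == Event.CLOSE) {
                state = State.CLOSED;
            } else if (state == State.CLOSED && event == Event.DELETE) {
                state = State.DELETING;
            } else {
                // Mirrors the failure mode in the log: DELETE arriving while CLOSING.
                throw new IllegalStateException(
                    "Invalid event: " + event + " at " + state + " state.");
            }
        }

        synchronized State state() {
            return state;
        }
    }

    // Hypothetical polling helper: re-check the condition until it holds or time runs out.
    static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() - deadline > 0) {
                throw new IllegalStateException("Timed out waiting for condition");
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    static State closeThenDelete() {
        ToyContainer container = new ToyContainer();

        // Simulates the datanode report arriving asynchronously and closing the container.
        Thread reporter = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            container.apply(Event.CLOSE);
        });
        reporter.start();

        // The key step: do not send DELETE until the container is observed as CLOSED.
        waitFor(() -> container.state() == State.CLOSED, 10, 5_000);
        container.apply(Event.DELETE);

        try {
            reporter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return container.state();
    }

    public static void main(String[] args) {
        System.out.println(closeThenDelete()); // prints DELETING
    }
}
```

Without the `waitFor` call, `apply(Event.DELETE)` could run while the toy container is still CLOSING and throw, which is the same kind of invalid transition reported in the log above.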
   
   ## What is the link to the Apache JIRA?
   
   CI:
   - build-branch: https://github.com/peterxcli/ozone/actions/runs/13812189282
   - flakey-check
      - TestContainerReportHandling: 
https://github.com/peterxcli/ozone/actions/runs/13812243728
      - TestContainerReportHandlingWithHA: 
https://github.com/peterxcli/ozone/actions/runs/13812233483


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


