peterxcli opened a new pull request, #8060: URL: https://github.com/apache/ozone/pull/8060
## What changes were proposed in this pull request?

```
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 82.54 s <<< FAILURE! -- in org.apache.hadoop.ozone.container.TestContainerReportHandling
org.apache.hadoop.ozone.container.TestContainerReportHandling.testDeletingOrDeletedContainerTransitionsToClosedWhenNonEmptyReplicaIsReported(LifeCycleState)[2] -- Time elapsed: 33.84 s <<< ERROR!
org.apache.hadoop.hdds.scm.exceptions.SCMException: org.apache.ratis.protocol.exceptions.NotLeaderException: Server a4f85781-650a-46e8-940e-a45bfdaa2a01@group-BBAD22E09632 is not the leader
    at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.translateException(SCMHAInvocationHandler.java:164)
    at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:114)
    at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:73)
    at jdk.proxy2/jdk.proxy2.$Proxy42.updateContainerState(Unknown Source)
    at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.updateContainerState(ContainerManagerImpl.java:283)
    at org.apache.hadoop.ozone.container.TestContainerReportHandling.testDeletingOrDeletedContainerTransitionsToClosedWhenNonEmptyReplicaIsReported(TestContainerReportHandling.java:100)
    ...
Caused by: org.apache.ratis.protocol.exceptions.NotLeaderException: Server a4f85781-650a-46e8-940e-a45bfdaa2a01@group-BBAD22E09632 is not the leader
    at org.apache.ratis.server.impl.RaftServerImpl.generateNotLeaderException(RaftServerImpl.java:780)
    at org.apache.ratis.server.impl.LeaderStateImpl.stop(LeaderStateImpl.java:437)
    at org.apache.ratis.server.impl.RoleInfo.shutdownLeaderState(RoleInfo.java:104)
    at org.apache.ratis.server.impl.RaftServerImpl.lambda$close$1(RaftServerImpl.java:530)
    at org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$7(LifeCycle.java:306)
    at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:326)
    at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:304)
    at org.apache.ratis.server.impl.RaftServerImpl.close(RaftServerImpl.java:512)
    at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:207)
```

Same problem affects TestContainerReportHandlingWithHA

---

See `Terminating with exit status 1: Invalid event: DELETE at CLOSING state.` in test result log.
```
2025-03-12 20:19:41,402 [scmNode-3-FixedThreadPoolWithAffinityExecutor-0-0] INFO  container.IncrementalContainerReportHandler (IncrementalContainerReportHandler.java:onMessage(109)) - Failed to process CLOSED container #1: org.apache.ratis.protocol.exceptions.NotLeaderException: Server cd03248d-9309-426c-ac05-3168de666b12@group-727E36EF7571 is not the leader, suggested leader is: 5bbab233-360a-4332-b5e3-d5fdfa6c8f19|localhost:15076
2025-03-12 20:19:41,402 [scmNode-2-FixedThreadPoolWithAffinityExecutor-0-0] INFO  container.IncrementalContainerReportHandler (IncrementalContainerReportHandler.java:onMessage(109)) - Failed to process CLOSED container #1: org.apache.ratis.protocol.exceptions.NotLeaderException: Server 416483bd-419d-4dc1-bc92-de8c5f70f57c@group-727E36EF7571 is not the leader, suggested leader is: 5bbab233-360a-4332-b5e3-d5fdfa6c8f19|localhost:15076
2025-03-12 20:19:41,409 [5bbab233-360a-4332-b5e3-d5fdfa6c8f19@group-727E36EF7571-StateMachineUpdater] ERROR statemachine.StateMachine (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Invalid event: DELETE at CLOSING state.
org.apache.hadoop.ozone.common.statemachine.InvalidStateTransitionException: Invalid event: DELETE at CLOSING state.
    at org.apache.hadoop.ozone.common.statemachine.StateMachine.getNextState(StateMachine.java:58)
    at org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.updateContainerState(ContainerStateManagerImpl.java:354)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:192)
    at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:155)
    at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1832)
    at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:252)
    at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:193)
    at java.base/java.lang.Thread.run(Thread.java:829)
2025-03-12 20:19:41,409 [5bbab233-360a-4332-b5e3-d5fdfa6c8f19@group-727E36EF7571-StateMachineUpdater] ERROR impl.StateMachineUpdater (StateMachineUpdater.java:run(206)) - 5bbab233-360a-4332-b5e3-d5fdfa6c8f19@group-727E36EF7571-StateMachineUpdater caught a Throwable.
org.apache.ratis.server.raftlog.RaftLogIOException: org.apache.ratis.util.ExitUtils$ExitException: Invalid event: DELETE at CLOSING state.
    at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1835)
    at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:252)
    at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:193)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.ratis.util.ExitUtils$ExitException: Invalid event: DELETE at CLOSING state.
    at org.apache.ratis.util.ExitUtils.terminate(ExitUtils.java:141)
    at org.apache.ratis.util.ExitUtils.terminate(ExitUtils.java:151)
    at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:176)
    at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1832)
    ... 3 more
Caused by: org.apache.hadoop.ozone.common.statemachine.InvalidStateTransitionException: Invalid event: DELETE at CLOSING state.
    at org.apache.hadoop.ozone.common.statemachine.StateMachine.getNextState(StateMachine.java:58)
    at org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.updateContainerState(ContainerStateManagerImpl.java:354)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:192)
    at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:155)
    ... 4 more
```

## What has been done?

After the datanodes report their containers as closed, wait until SCM also considers the container closed before updating the container with the DELETE event.

## What is the link to the Apache JIRA

CI:
- build-branch: https://github.com/peterxcli/ozone/actions/runs/13812189282
- flakey-check
  - TestContainerReportHandling: https://github.com/peterxcli/ozone/actions/runs/13812243728
  - TestContainerReportHandlingWithHA: https://github.com/peterxcli/ozone/actions/runs/13812233483

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org