adoroszlai opened a new pull request #1697: URL: https://github.com/apache/ozone/pull/1697
## What changes were proposed in this pull request?

`testWatchForCommitForGroupMismatchException` may fail with `Group ... not found` in `waitForPipelineClose`. The method destroys pipelines via the regular SCM -> datanodes path, but then also calls `removeGroup` for each member manually.

https://github.com/apache/ozone/blob/f30aba704e4b2929764e04db845d8e0cf54e261b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/ClosePipelineCommandHandler.java#L74

Only one of these can succeed, because the group cannot be removed twice. If the test code wins, we see this error in the output:

```
2020-12-11 11:59:14,841 [Command processor thread] ERROR commandhandler.ClosePipelineCommandHandler (ClosePipelineCommandHandler.java:handle(78)) - Can't close pipeline PipelineID=82ba59fa-dd35-4d4e-a61e-59b281828b58
java.io.IOException: 675d35ba-bb57-4ae2-b121-2cd47ec358f4: Group group-59B281828B58 not found.
	at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.removeGroup(XceiverServerRatis.java:774)
	at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler.handle(ClosePipelineCommandHandler.java:74)
```

But if `ClosePipelineCommandHandler` is the first to remove the group, the test fails:

```
testWatchForCommitForGroupMismatchException(org.apache.hadoop.ozone.client.rpc.TestWatchForCommit)  Time elapsed: 39.399 s  <<< ERROR!
java.io.IOException: 14058016-3fcf-4920-991e-a5361899d5d7: Group group-59B281828B58 not found.
	at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.removeGroup(XceiverServerRatis.java:774)
	at org.apache.hadoop.ozone.container.TestHelper.waitForPipelineClose(TestHelper.java:242)
	at org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForGroupMismatchException(TestWatchForCommit.java:353)
```

https://issues.apache.org/jira/browse/HDDS-4013

## How was this patch tested?
Ran `TestWatchForCommit` 50x: it never failed and timed out only once:
https://github.com/adoroszlai/hadoop-ozone/runs/1544890660

Regular CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/418502626
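
For illustration only, the double-removal race described in this pull request can be reproduced in isolation with the sketch below. It is not Ozone or Ratis code and not necessarily how this patch fixes the test: `GroupRegistry` and `removeGroupIfPresent` are hypothetical names, and the sketch assumes that "not found" is the only way removal can fail.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class GroupRemovalSketch {

  /** Hypothetical stand-in for a datanode's set of Ratis groups. */
  static class GroupRegistry {
    private final Set<String> groups = ConcurrentHashMap.newKeySet();

    void addGroup(String groupId) {
      groups.add(groupId);
    }

    boolean hasGroup(String groupId) {
      return groups.contains(groupId);
    }

    /** Strict removal: throws if the group has already been removed. */
    void removeGroup(String groupId) throws IOException {
      if (!groups.remove(groupId)) {
        throw new IOException("Group " + groupId + " not found.");
      }
    }
  }

  /**
   * Tolerant removal for a caller that only needs the group to be gone.
   * Returns true if this call removed the group, false if it was already gone.
   */
  static boolean removeGroupIfPresent(GroupRegistry registry, String groupId) {
    try {
      registry.removeGroup(groupId);
      return true;
    } catch (IOException e) {
      // Assumption: in this sketch the only failure is "not found",
      // meaning the other remover won the race.
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    GroupRegistry registry = new GroupRegistry();
    String groupId = "group-59B281828B58";
    registry.addGroup(groupId);

    // First remover (for example, the close-pipeline command path) succeeds.
    registry.removeGroup(groupId);

    // Second remover (for example, the manual call in the test helper) fails.
    try {
      registry.removeGroup(groupId);
    } catch (IOException e) {
      System.out.println("Second removal failed: " + e.getMessage());
    }

    // A tolerant caller treats "already removed" as success.
    System.out.println("Tolerant removal removed it: "
        + removeGroupIfPresent(registry, groupId));
  }
}
```

Whichever remover runs second sees the exception, which matches the two failure modes quoted above: an error log when the test code wins, and a test failure when the command handler wins.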
