adoroszlai opened a new pull request #1697:
URL: https://github.com/apache/ozone/pull/1697


   ## What changes were proposed in this pull request?
   
   `testWatchForCommitForGroupMismatchException` may fail with `Group ... not found` in `waitForPipelineClose`.  The method destroys pipelines via the regular SCM -> datanodes path, but then also calls `removeGroup` manually for each member.
   
   
https://github.com/apache/ozone/blob/f30aba704e4b2929764e04db845d8e0cf54e261b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/ClosePipelineCommandHandler.java#L74
   
   Only one of these can succeed, because the group cannot be removed twice.  If the test code wins, we see this error in the output:
   
   ```
   2020-12-11 11:59:14,841 [Command processor thread] ERROR commandhandler.ClosePipelineCommandHandler (ClosePipelineCommandHandler.java:handle(78)) - Can't close pipeline PipelineID=82ba59fa-dd35-4d4e-a61e-59b281828b58
   java.io.IOException: 675d35ba-bb57-4ae2-b121-2cd47ec358f4: Group group-59B281828B58 not found.
        at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.removeGroup(XceiverServerRatis.java:774)
        at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler.handle(ClosePipelineCommandHandler.java:74)
   ```
   
   But if `ClosePipelineCommandHandler` removes the group first, the test fails:
   
   ```
   
   testWatchForCommitForGroupMismatchException(org.apache.hadoop.ozone.client.rpc.TestWatchForCommit)  Time elapsed: 39.399 s  <<< ERROR!
   java.io.IOException: 14058016-3fcf-4920-991e-a5361899d5d7: Group group-59B281828B58 not found.
        at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.removeGroup(XceiverServerRatis.java:774)
        at org.apache.hadoop.ozone.container.TestHelper.waitForPipelineClose(TestHelper.java:242)
        at org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForGroupMismatchException(TestWatchForCommit.java:353)
   ```
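
The race can be reproduced in miniature. The sketch below is a hypothetical stand-in, not Ozone code and not necessarily the change made in this pull request: `GroupRemovalRace` and `removeGroupIfPresent` are invented names, and the registry is a plain concurrent set rather than a Ratis server. It only illustrates why the second of two removals fails, and one possible way for a caller to tolerate losing the race.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a datanode's group registry; not Ozone code. */
public class GroupRemovalRace {
  private final Set<String> groups = ConcurrentHashMap.newKeySet();
  private final String serverId;

  public GroupRemovalRace(String serverId) {
    this.serverId = serverId;
  }

  public void addGroup(String groupId) {
    groups.add(groupId);
  }

  /** Mimics the behaviour seen in the logs: removing a missing group throws. */
  public void removeGroup(String groupId) throws IOException {
    // Set.remove is atomic here, so exactly one concurrent caller gets 'true'.
    if (!groups.remove(groupId)) {
      throw new IOException(serverId + ": Group " + groupId + " not found.");
    }
  }

  /**
   * Assumed tolerant variant: returns true if this caller removed the group,
   * false if someone else got there first. Swallowing every IOException is a
   * simplification made for this sketch only.
   */
  public boolean removeGroupIfPresent(String groupId) {
    try {
      removeGroup(groupId);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    GroupRemovalRace server = new GroupRemovalRace("dn1");
    server.addGroup("group-59B281828B58");

    server.removeGroup("group-59B281828B58"); // first remover wins
    boolean removedAgain = server.removeGroupIfPresent("group-59B281828B58");
    System.out.println("second removal succeeded: " + removedAgain);
  }
}
```

Whichever path calls `removeGroup` second sees the `not found` error, which matches the two stack traces above: the command handler loses in the first, the test helper in the second.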
   
   https://issues.apache.org/jira/browse/HDDS-4013
   
   ## How was this patch tested?
   
   Ran `TestWatchForCommit` 50 times.  It never failed and timed out only once:
   https://github.com/adoroszlai/hadoop-ozone/runs/1544890660
   
   Regular CI:
   https://github.com/adoroszlai/hadoop-ozone/actions/runs/418502626




