symious opened a new pull request #2550:
URL: https://github.com/apache/ozone/pull/2550


   ## What changes were proposed in this pull request?
   
   The root cause: when MiniOzoneHAClusterImpl#bootstrapOzoneManager creates a 
new OM, it may hit a port conflict. The function then retries with a new port, 
but by that point the metadataManager of the first (failed) OM has not released 
its lock on RocksDB, so the retry fails and the test fails with it.
   
   Options to solve:
   
   I tried adding a "metadataManager.stop()" call in the OM constructor for the 
case where the RPC server fails to start, but that produces another error about 
the lock on the Ratis directory.
   I also tried stopping the ratisServer, but in 
https://github.com/apache/ratis/blob/dc0b68b4c0b8c187a08f669422a2cd099d7be0b7/ratis-common/src/main/java/org/apache/ratis/util/LifeCycle.java#L308,
 the close function is not called, so the lock is not released. I then tried 
calling the closeMethod for State.NEW, but something else went wrong.
   So I think it is much easier to just check whether the port is available in 
MiniOzoneHAClusterImpl.
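
A minimal sketch of such a port check, for illustration only. The class and
method names (PortCheck, isPortAvailable, isPortRangeAvailable) are
hypothetical and are not taken from the patch; the idea of checking several
consecutive ports is an assumption based on the `* 4` stride used when
choosing basePort.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of MiniOzoneHAClusterImpl.
class PortCheck {

  /** Returns true if a server socket can be bound to the port right now. */
  static boolean isPortAvailable(int port) {
    // The probe socket is closed again by try-with-resources.
    try (ServerSocket probe = new ServerSocket(port)) {
      return true;
    } catch (IOException e) {
      // BindException (an IOException) means the port is already in use.
      return false;
    }
  }

  /** Returns true only if every port in [basePort, basePort + count) is free. */
  static boolean isPortRangeAvailable(int basePort, int count) {
    for (int i = 0; i < count; i++) {
      if (!isPortAvailable(basePort + i)) {
        return false;
      }
    }
    return true;
  }
}
```

A check like this would run before the new OM is constructed, so no RocksDB or
Ratis lock is taken for a port that is already known to be busy. It only
narrows the window: another process can still grab the port between the check
and the real bind, so the existing BindException retry remains useful.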
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5632
   
   ## How was this patch tested?
   
   The test passed with the following change:
   ```
   @@ -697,9 +698,11 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
    
        long leaderSnapshotIndex = getOMLeader().getRatisSnapshotIndex();
    
   +    int start = 0;
        while (true) {
          try {
   -        basePort = 10000 + RANDOM.nextInt(1000) * 4;
   +//        basePort = 10000 + RANDOM.nextInt(1000) * 4;
   +        basePort = 10000 + start * 4;
            OzoneConfiguration newConf = addNewOMToConfig(getOMServiceId(),
                omNodeId, basePort);
    
   @@ -721,6 +724,7 @@ public void bootstrapOzoneManager(String omNodeId) throws Exception {
            if (e instanceof BindException ||
                e.getCause() instanceof BindException) {
              ++retryCount;
   +          start++;
              LOG.info("MiniOzoneHACluster port conflicts, retried {} times",
                  retryCount);
            } else {
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



