bharathv commented on a change in pull request #2769:
URL: https://github.com/apache/hbase/pull/2769#discussion_r599172025



##########
File path: 
hbase-server/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsBase.java
##########
@@ -159,7 +159,7 @@ public void tearDownAfterMethod() throws Exception {
     int missing = NUM_SLAVES_BASE - getNumServers();
     LOG.info("Restoring servers: " + missing);
     for (int i = 0; i < missing; i++) {
-      ((MiniHBaseCluster) CLUSTER).startRegionServer();
+      ((MiniHBaseCluster) CLUSTER).startRegionServerAndWait(2000);

Review comment:
       Cool, ok.

##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
##########
@@ -176,7 +176,7 @@ private void testSanity(final String testName) throws 
Exception {
   @Test
   public void testRegionAssignmentAfterMasterRecoveryDueToZKExpiry() throws 
Exception {
     MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
-    cluster.startRegionServer();
+    cluster.startRegionServerAndWait(2000);

Review comment:
       There seems to be a race condition. Multiple tests rely on this 
guarantee that the regionserver is up by the time this call is returned. A 
fixed sleep doesn't seem like a right solution.  Instead we should fix the 
following method
   
   ``` 
     /**
        * Block until the region server has come online, indicating it is ready
        * to be used.
        */
       public void waitForServerOnline() {
         // The server is marked online after the init method completes inside 
of
         // the HRS#run method.  HRS#init can fail for whatever region.  In 
those
         // cases, we'll jump out of the run without setting online flag.  Check
         // stopRequested so we don't wait here a flag that will never be 
flipped.
         regionServer.waitForServerOnline();
       }
     }
   ```
   
   That translates to when `online` is marked in HRegionServer. Currently it is 
marked 
   
   
   ```
    public void waitForServerOnline(){
       while (!isStopped() && !isOnline()) {
         synchronized (online) {
           try {
             online.wait(msgInterval);
           } catch (InterruptedException ie) {
             Thread.currentThread().interrupt();
             break;
           }
         }
       }
     }
   ```
   
   Instead shouldn't it check when it is marked online by master (which happens 
after the first report is sent)? (block until the regionserver is marked online 
in the active master's server manager)
   
   @saintstack WDYT? This doesn't cause test flakes?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to