bharathv commented on a change in pull request #2769:
URL: https://github.com/apache/hbase/pull/2769#discussion_r599172025
##########
File path:
hbase-server/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsBase.java
##########
@@ -159,7 +159,7 @@ public void tearDownAfterMethod() throws Exception {
int missing = NUM_SLAVES_BASE - getNumServers();
LOG.info("Restoring servers: " + missing);
for (int i = 0; i < missing; i++) {
- ((MiniHBaseCluster) CLUSTER).startRegionServer();
+ ((MiniHBaseCluster) CLUSTER).startRegionServerAndWait(2000);
Review comment:
Cool, ok.
##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
##########
@@ -176,7 +176,7 @@ private void testSanity(final String testName) throws
Exception {
@Test
public void testRegionAssignmentAfterMasterRecoveryDueToZKExpiry() throws
Exception {
MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
- cluster.startRegionServer();
+ cluster.startRegionServerAndWait(2000);
Review comment:
There seems to be a race condition. Multiple tests rely on this
guarantee that the regionserver is up by the time this call is returned. A
fixed sleep doesn't seem like a right solution. Instead we should fix the
following method
```
/**
* Block until the region server has come online, indicating it is ready
* to be used.
*/
public void waitForServerOnline() {
// The server is marked online after the init method completes inside
of
// the HRS#run method. HRS#init can fail for whatever region. In
those
// cases, we'll jump out of the run without setting online flag. Check
// stopRequested so we don't wait here a flag that will never be
flipped.
regionServer.waitForServerOnline();
}
}
```
That translates to when `online` is marked in HRegionServer. Currently it is
marked
```
public void waitForServerOnline(){
while (!isStopped() && !isOnline()) {
synchronized (online) {
try {
online.wait(msgInterval);
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
break;
}
}
}
}
```
Instead shouldn't it check when it is marked online by master (which happens
after the first report is sent)? (block until the regionserver is marked online
in the active master's server manager)
@saintstack WDYT? This doesn't cause test flakes?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]