Github user dlogothetis commented on a diff in the pull request:
https://github.com/apache/giraph/pull/84#discussion_r218507979
--- Diff:
giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java ---
@@ -1379,9 +1379,15 @@ private boolean barrierOnWorkerList(String finishedWorkerPath,
// Wait for a signal or timeout
boolean eventTriggered = event.waitMsecs(eventLoopTimeout);
+
+ // If the event was triggered, we reset it. In the next loop run, we will
+ // read ZK to get the new hosts.
+ if (eventTriggered) {
+ event.reset();
+ }
+
long elapsedTimeSinceRegularRunMsec = System.currentTimeMillis() -
lastRegularRunTimeMsec;
- event.reset();
--- End diff --
It's possible that after `event.waitMsecs` exits (due to a timeout) and
before `event.reset()` gets called, the event gets signaled. In that case the
unconditional reset discards the signal, so in the next loop
`event.waitMsecs` will time out again and `logInfoOnlyRun` will continue to
be false.
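
To make the window concrete, here is a minimal, self-contained sketch of the interleaving. `SimpleEvent` and `LostSignalDemo` are hypothetical stand-ins written for this illustration, not Giraph's actual event class, and the signal is issued on the same thread only to force the ordering deterministically:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class LostSignalDemo {
  /** Hypothetical latch-style event; an assumption, not Giraph's class. */
  static final class SimpleEvent {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();
    private boolean signaled = false;

    void signal() {
      lock.lock();
      try {
        signaled = true;
        cond.signalAll();
      } finally {
        lock.unlock();
      }
    }

    void reset() {
      lock.lock();
      try {
        signaled = false;
      } finally {
        lock.unlock();
      }
    }

    /** Returns true if the event is signaled before the timeout elapses. */
    boolean waitMsecs(long msecs) {
      long nanos = TimeUnit.MILLISECONDS.toNanos(msecs);
      lock.lock();
      try {
        while (!signaled) {
          if (nanos <= 0) {
            return false; // timed out
          }
          nanos = cond.awaitNanos(nanos);
        }
        return true;
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      } finally {
        lock.unlock();
      }
    }
  }

  /** Replays the interleaving; returns what the second wait observes. */
  static boolean signalSurvives(boolean unconditionalReset) {
    SimpleEvent event = new SimpleEvent();
    // First wait times out because nothing has signaled yet.
    boolean eventTriggered = event.waitMsecs(10);
    // The signal lands after the timeout but before any reset.
    event.signal();
    // Old behavior: always reset. Patched behavior: reset only if triggered.
    if (unconditionalReset || eventTriggered) {
      event.reset();
    }
    // Next loop iteration: is the signal still visible?
    return event.waitMsecs(10);
  }

  public static void main(String[] args) {
    System.out.println("unconditional reset: " + signalSurvives(true));
    System.out.println("conditional reset:   " + signalSurvives(false));
  }
}
```

With the unconditional reset the second wait times out because the signal was erased. With the conditional reset the signal is still set, so the next wait returns immediately.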
---