Github user srdo commented on a diff in the pull request:
https://github.com/apache/storm/pull/2734#discussion_r197856612
--- Diff: storm-client/src/jvm/org/apache/storm/utils/Time.java ---
@@ -185,8 +173,17 @@ public static void advanceTimeNanos(long nanos) {
if (nanos < 0) {
throw new IllegalArgumentException("advanceTime only accepts
positive time as an argument");
}
- long newTime = SIMULATED_CURR_TIME_NANOS.addAndGet(nanos);
- LOG.debug("Advanced simulated time to {}", newTime);
+ synchronized (SLEEP_TIMES_LOCK) {
+ long newTime = SIMULATED_CURR_TIME_NANOS.addAndGet(nanos);
+ Iterator<AtomicLong> sleepTimesIter =
THREAD_SLEEP_TIMES_NANOS.values().iterator();
+ while (sleepTimesIter.hasNext()) {
+ AtomicLong curr = sleepTimesIter.next();
+ if (SIMULATED_CURR_TIME_NANOS.get() >= curr.get()) {
+ sleepTimesIter.remove();
+ }
+ }
+ LOG.debug("Advanced simulated time to {}", newTime);
+ }
--- End diff --
That's not why this change is added. I want to make sure that when a test
does `LocalCluster.advanceClusterTime`, all sleeping threads that are supposed
to wake up actually get a chance to run before `waitForIdle` is checked again.
The issue is that if we don't remove the thread from the map here, there isn't
(as far as I can tell) a guarantee that the sleeping threads actually woke up.
For example, if a test has a loop
```
while (some condition)
some code
LocalCluster.advanceClusterTime
```
the first call to advanceClusterTime will wait until all threads are
sleeping. If thread 1 is told to sleep for 1ms, it will enter the
THREAD_SLEEP_TIMES_NANOS map. On the next iteration,
`LocalCluster.advanceClusterTime` will increment the time, which should cause
thread 1 to wake up. If the testing thread is "too fast", it might recheck
`waitForIdle` and catch thread 1 before it manages to wake up and remove itself
from THREAD_SLEEP_TIMES_NANOS. This causes LocalCluster.advanceClusterTime to
return before the cluster is actually idle, because thread 1 will wake up and
start running again after advanceClusterTime returns.
The effect is you can't be sure that all threads are actually idle once
LocalCluster.advanceClusterTime returns.
---