Github user srdo commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2734#discussion_r197856612
  
    --- Diff: storm-client/src/jvm/org/apache/storm/utils/Time.java ---
    @@ -185,8 +173,17 @@ public static void advanceTimeNanos(long nanos) {
             if (nanos < 0) {
                 throw new IllegalArgumentException("advanceTime only accepts 
positive time as an argument");
             }
    -        long newTime = SIMULATED_CURR_TIME_NANOS.addAndGet(nanos);
    -        LOG.debug("Advanced simulated time to {}", newTime);
    +        synchronized (SLEEP_TIMES_LOCK) {
    +            long newTime = SIMULATED_CURR_TIME_NANOS.addAndGet(nanos);
    +            Iterator<AtomicLong> sleepTimesIter = 
THREAD_SLEEP_TIMES_NANOS.values().iterator();
    +            while (sleepTimesIter.hasNext()) {
    +                AtomicLong curr = sleepTimesIter.next();
    +                if (SIMULATED_CURR_TIME_NANOS.get() >= curr.get()) {
    +                    sleepTimesIter.remove();
    +                }
    +            }
    +            LOG.debug("Advanced simulated time to {}", newTime);
    +        }
    --- End diff --
    
    That's not why this change is added. I want to make sure that when a test 
does `LocalCluster.advanceClusterTime`, all sleeping threads that are supposed 
to wake up actually get a chance to run before `waitForIdle` is checked again. 
The issue is that if we don't remove the thread from the map here, there isn't 
(as far as I can tell) a guarantee that the sleeping threads actually woke up.
    
    For example, if a test has a loop
    ```
    while (some condition)
      some code 
      LocalCluster.advanceClusterTime
    ```
    the first call to advanceClusterTime will wait until all threads are 
sleeping. If thread 1 is told to sleep for 1ms, it will enter the 
THREAD_SLEEP_TIMES_NANOS map. On the next iteration, 
`LocalCluster.advanceClusterTime` will increment the time, which should cause 
thread 1 to wake up. If the testing thread is "too fast", it might recheck 
`waitForIdle` and catch thread 1 before it manages to wake up and remove itself 
from THREAD_SLEEP_TIMES_NANOS. This causes LocalCluster.advanceClusterTime to 
return before the cluster is actually idle, because thread 1 will wake up and 
start running again after advanceClusterTime returns.
    
    The effect is you can't be sure that all threads are actually idle once 
LocalCluster.advanceClusterTime returns.


---

Reply via email to