[jira] [Commented] (SOLR-15660) Remove universal 10 second test thread leak linger.

Michael Gibney (Jira) Tue, 10 May 2022 12:58:06 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534560#comment-17534560
 ]


Michael Gibney commented on SOLR-15660:
---------------------------------------

Following some offline conversation with [~krisden], I double-checked and am 
now fully convinced that worker threads can outlive their ThreadPoolExecutor, 
and that this is unavoidable: a worker may still be alive even after 
{{ThreadPoolExecutor.awaitTermination()}} returns.

The following drop-in test is a simplified version of the exact pattern used in 
ThreadPoolExecutor termination, and it reliably demonstrates the kind of stack 
traces that we're talking about in this issue. My conclusion is that it's truly 
impossible to avoid these errors, absent ThreadLeakLingering, or maybe some 
crazy fancy footwork (not workable in Executors managed by third-party 
components) using ThreadFactory to track threads and actually join() on thread 
death.
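
For reference, the ThreadFactory "footwork" mentioned above might look roughly 
like the following. This is only a minimal sketch: {{TrackingThreadFactory}}, 
{{joinAll()}} and {{JoinOnShutdownDemo}} are hypothetical names invented for 
illustration, not existing Solr or JDK classes, and the approach assumes we 
control how the executor is constructed (so it does not help with Executors 
managed by third-party components).

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Solr or the JDK): remembers every thread it
// creates so that a caller can join() on them after the pool has terminated.
final class TrackingThreadFactory implements ThreadFactory {
  private final ThreadFactory delegate = Executors.defaultThreadFactory();
  private final List<Thread> created = new CopyOnWriteArrayList<>();

  @Override
  public Thread newThread(Runnable r) {
    Thread t = delegate.newThread(r);
    created.add(t);
    return t;
  }

  // Joins each tracked thread; returns true only if all of them have died
  // before the overall timeout elapses.
  boolean joinAll(long timeoutMillis) throws InterruptedException {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    for (Thread t : created) {
      long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMs > 0) {
        t.join(remainingMs); // join(0) would wait forever, so only pass positive values
      }
      if (t.isAlive()) {
        return false;
      }
    }
    return true;
  }
}

final class JoinOnShutdownDemo {
  public static void main(String[] args) throws Exception {
    TrackingThreadFactory factory = new TrackingThreadFactory();
    ExecutorService exec = Executors.newFixedThreadPool(4, factory);
    for (int i = 0; i < 8; i++) {
      exec.submit(() -> { });
    }
    exec.shutdown();
    boolean terminated = exec.awaitTermination(10, TimeUnit.SECONDS);
    // awaitTermination() alone does not guarantee the workers are dead; join() does.
    boolean allDead = factory.joinAll(10_000);
    System.out.println("terminated=" + terminated + " allDead=" + allDead);
  }
}
```

The point of the sketch is only that {{join()}} waits for actual thread death, 
whereas {{awaitTermination()}} waits for the pool's state to change; the two 
are not the same event.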

I also have an answer regarding why {{linger = 10}} (milliseconds) and similarly 
low values seem to make a difference. I think the randomizedtesting thread leak 
linger time [effectively has a floor of 
100ms|https://github.com/randomizedtesting/randomizedtesting/blob/65735481fc398424fd17c08a338efcb8b7185a2f/randomized-runner/src/main/java/com/carrotsearch/randomizedtesting/ThreadLeakControl.java#L581-L589],
 so all of the specific linger times specified by SOLR-15660 would have been 
effectively rounded up to 100ms.

{code:java}
  @Test
  public void demoFalsePositiveThreadLeak() throws Exception {
    // This test replicates a simplified version of what happens in
    // ThreadPoolExecutor.awaitTermination(), to prove that it is possible
    // to "leak" threads after awaitTermination() returns.
    final int count = 32;
    ThreadPoolExecutor exec =
        (ThreadPoolExecutor) Executors.newFixedThreadPool(count);
    try {
      // generate competing work; without this there isn't enough
      // multithreaded CPU pressure to trigger false positive thread leak
      // detection
      for (int i = 0; i < count; i++) {
        exec.submit(() -> {
          int j = 0;
          final Thread current = Thread.currentThread();
          while ((++j & 0xFFFF) != 0 || !current.isInterrupted()) {
            // soak up processors
          }
        });
      }

      // now do the signal testing
      final ReentrantLock lock = new ReentrantLock();
      final Condition c = lock.newCondition();
      for (int i = 0; i < 100; i++) {
        System.err.println("try "+i);
        final Thread one = new Thread(() -> {
          lock.lock();
          try {
            for (int j = 0; j < 100000000; j++) {
              // busy-wait
              // Thread.sleep() is sufficient to generate non-TERMINATED
              // status, but will not reliably get us representative stack
              // traces
            }
            c.signalAll();
          } finally {
            lock.unlock();
          }
        });
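        lock.lock();
        try {
          // start the thread while holding the lock, so that its signalAll()
          // cannot fire before we are waiting in await()
          one.start();
          c.await();
        } finally {
          lock.unlock();
        }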
        State state = one.getState();
        StackTraceElement[] trace = one.getStackTrace();
        if (State.TERMINATED != state && trace.length > 0) {
          AssertionError er =
              new AssertionError("Expected state TERMINATED; found: " + state);
          er.setStackTrace(trace);
          throw er;
        }
      }
    } finally {
      System.err.println("shutting down");
      ExecutorUtil.shutdownNowAndAwaitTermination(exec);
    }
  }
{code}

> Remove universal 10 second test thread leak linger.
> ---------------------------------------------------
>
>                 Key: SOLR-15660
>                 URL: https://issues.apache.org/jira/browse/SOLR-15660
>             Project: Solr
>          Issue Type: Test
>          Components: Tests
>            Reporter: Mark Robert Miller
>            Assignee: Mark Robert Miller
>            Priority: Minor
>             Fix For: 9.0
>
>         Attachments: screenshot-1.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

