[ https://issues.apache.org/jira/browse/SOLR-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534560#comment-17534560 ]
Michael Gibney commented on SOLR-15660:
---------------------------------------

Following on some offline conversation with [~krisden], I double-checked and am now convinced that worker threads can outlive their ThreadPoolExecutor, even after {{ThreadPoolExecutor.awaitTermination()}} returns, and that this cannot be avoided. The following drop-in test is a simplified version of the exact pattern used in ThreadPoolExecutor termination, and it reliably reproduces the kind of stack traces that we're talking about in this issue.

My conclusion is that these errors truly cannot be avoided absent ThreadLeakLingering, or maybe some crazy fancy footwork (not workable in Executors managed by third-party components) that uses a ThreadFactory to track threads and actually join() on thread death.

I also have an answer regarding why {{linger = 10}} (milliseconds) and similarly low values seem to make a difference: I think the randomizedtesting thread leak linger time [effectively has a floor of 100ms|https://github.com/randomizedtesting/randomizedtesting/blob/65735481fc398424fd17c08a338efcb8b7185a2f/randomized-runner/src/main/java/com/carrotsearch/randomizedtesting/ThreadLeakControl.java#L581-L589], so all of the specific linger times specified by SOLR-15660 would have been effectively rounded up to 100ms.

{code:java}
// Imports needed when dropping this into a Solr test class:
import java.lang.Thread.State;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import org.apache.solr.common.util.ExecutorUtil;
import org.junit.Test;

@Test
public void demoFalsePositiveThreadLeak() throws Exception {
  // This test replicates a simplified version of what happens in
  // ThreadPoolExecutor.awaitTermination(), to prove that it is possible
  // to "leak" threads after awaitTermination() returns.
  final int count = 32;
  ThreadPoolExecutor exec = (ThreadPoolExecutor) Executors.newFixedThreadPool(count);
  try {
    // make other work
    // without this there isn't enough multithreaded cpu pressure to trigger
    // false positive thread leak detection
    for (int i = 0; i < count; i++) {
      exec.submit(() -> {
        int j = 0;
        final Thread current = Thread.currentThread();
        while ((++j & 0xFFFF) != 0 || !current.isInterrupted()) {
          // soak up processors
        }
      });
    }

    // now do the signal testing
    final ReentrantLock lock = new ReentrantLock();
    final Condition c = lock.newCondition();
    for (int i = 0; i < 100; i++) {
      System.err.println("try " + i);
      final Thread one = new Thread(() -> {
        lock.lock();
        try {
          for (int j = 0; j < 100000000; j++) {
            // busy-wait
            // Thread.sleep() is sufficient to generate non-TERMINATED
            // status, but will not reliably get us representative stack
            // traces
          }
          c.signalAll();
        } finally {
          lock.unlock();
        }
      });
      // Take the lock before starting the thread, so that its signalAll()
      // cannot fire before this thread is waiting on the condition.
      lock.lock();
      try {
        one.start();
        c.await();
      } finally {
        lock.unlock();
      }
      State state = one.getState();
      StackTraceElement[] trace = one.getStackTrace();
      if (State.TERMINATED != state && trace.length > 0) {
        AssertionError er = new AssertionError("Expected state TERMINATED; found: " + state);
        er.setStackTrace(trace);
        throw er;
      }
    }
  } finally {
    System.err.println("shutting down");
    ExecutorUtil.shutdownNowAndAwaitTermination(exec);
  }
}
{code}

> Remove universal 10 second test thread leak linger.
> ---------------------------------------------------
>
>                 Key: SOLR-15660
>                 URL: https://issues.apache.org/jira/browse/SOLR-15660
>             Project: Solr
>          Issue Type: Test
>          Components: Tests
>            Reporter: Mark Robert Miller
>            Assignee: Mark Robert Miller
>            Priority: Minor
>             Fix For: 9.0
>
>         Attachments: screenshot-1.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
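
For illustration only, the ThreadFactory-based footwork mentioned in the comment (tracking threads and join()-ing on them after the pool terminates) might look roughly like the sketch below. {{TrackingThreadFactory}}, {{threads()}}, and {{joinAll()}} are hypothetical names invented for this example; they are not Solr, randomizedtesting, or JDK APIs. As the comment notes, this only helps where you control how the executor is constructed, so it does not apply to executors managed by third-party components.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Solr or the JDK): remembers every thread it
// creates so that a caller can join() on them after the pool has terminated.
class TrackingThreadFactory implements ThreadFactory {
  private final ThreadFactory delegate = Executors.defaultThreadFactory();
  private final List<Thread> created = new CopyOnWriteArrayList<>();

  @Override
  public Thread newThread(Runnable r) {
    Thread t = delegate.newThread(r);
    created.add(t);
    return t;
  }

  // Snapshot of all threads handed out so far.
  List<Thread> threads() {
    return new ArrayList<>(created);
  }

  // Waits for the tracked threads to die, sharing one overall deadline.
  // Returns true if no tracked thread is alive when the method returns.
  boolean joinAll(long timeoutMillis) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    for (Thread t : created) {
      long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMillis <= 0) {
        break; // join(0) would wait forever, so stop once the deadline has passed
      }
      t.join(remainingMillis);
    }
    for (Thread t : created) {
      if (t.isAlive()) {
        return false;
      }
    }
    return true;
  }
}

class TrackingThreadFactoryDemo {
  public static void main(String[] args) throws Exception {
    TrackingThreadFactory factory = new TrackingThreadFactory();
    ExecutorService exec = Executors.newFixedThreadPool(4, factory);
    for (int i = 0; i < 8; i++) {
      exec.submit(() -> {});
    }
    exec.shutdown();
    // awaitTermination() alone only says the pool reached its terminated
    // state; the worker threads themselves may still be on their way out.
    exec.awaitTermination(10, TimeUnit.SECONDS);
    // Joining on the tracked threads waits for the threads to actually die.
    boolean allDead = factory.joinAll(10_000);
    System.err.println("all worker threads dead: " + allDead);
  }
}
```

The design choice here is to wrap {{Executors.defaultThreadFactory()}} rather than construct threads directly, so thread naming and daemon status stay the same as with a plain {{Executors.newFixedThreadPool(n)}}.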