[ 
https://issues.apache.org/jira/browse/SOLR-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533066#comment-17533066
 ] 

Michael Gibney commented on SOLR-15660:
---------------------------------------

There are several issues being conflated here, I think. The stack trace that Jan 
and Gus refer to above _can_ be addressed, I believe; I've taken an 
initial stab at doing so in 
[apache/solr#842|https://github.com/apache/solr/pull/842]. But I think it's 
impossible to entirely get around ThreadLeakLinger, and probably a mistake to 
try to do so, due to the [number of errors we're seeing 
now|https://lists.apache.org/list?bui...@solr.apache.org:dfr=2022-4-1] that are 
really _completely_ spurious.

There's some high-level discussion of ThreadLeakLinger in MAHOUT-1345 that 
makes it clear that even if we fixed all the actual thread leaks, we'd still be 
getting "leaked" threads detected spuriously [between 
awaitTermination.signalAll and actual thread 
death|https://github.com/openjdk/jdk17u/blob/20f3576cd1bbe516360b0d9f7deaacdad94df4d7/src/java.base/share/classes/java/util/concurrent/ThreadPoolExecutor.java#L728-L733].
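
To make that window concrete, here is a minimal sketch that uses only java.util.concurrent. The class and method names are hypothetical and this is not code from Solr or from the PR. It captures a pool's worker thread, waits for awaitTermination to return true, and then counts how often the worker is still alive at that moment. The count depends on timing and may well be zero on a given run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration; names are not taken from Solr.
public class ExecutorThreadDeathSketch {

  /**
   * Runs one task on a single-thread pool, shuts the pool down, and returns
   * the worker thread once awaitTermination has reported success.
   */
  static Thread workerAfterTermination() {
    AtomicReference<Thread> worker = new AtomicReference<>();
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.execute(() -> worker.set(Thread.currentThread()));
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("pool did not terminate");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    // The pool is TERMINATED here, but the worker may not have exited yet.
    return worker.get();
  }

  public static void main(String[] args) throws InterruptedException {
    int rounds = 1000;
    int stillAlive = 0;
    for (int i = 0; i < rounds; i++) {
      Thread t = workerAfterTermination();
      if (t.isAlive()) {
        stillAlive++; // a zero-linger leak check at this instant would flag t
      }
      t.join(1000); // let the worker finish before the next round
    }
    System.out.println(stillAlive + " of " + rounds
        + " workers were still alive after awaitTermination returned");
  }
}
```

Any nonzero count is a thread that a leak check with no linger would report, even though the executor was shut down correctly.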

Universal thread leak linger of 10s is probably overkill, granted; and may (?) 
have masked a bunch of actual issues. But I'd argue that universal thread leak 
linger of perhaps 1s could be viewed as a small price to pay for avoiding tons 
of spurious failures and avoiding the need for every developer to be intimately 
familiar with the issues surrounding (perfectly normal) delayed thread death in 
Executors.
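
As a rough sketch of what a short linger buys, the following stdlib-only helper re-checks for threads created since a snapshot until they are gone or the linger period runs out. It is an assumption-laden illustration of the idea, not the randomizedtesting implementation; LingerCheckSketch, newLiveThreads, and leakedAfterLinger are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of a leak check with a linger period.
public class LingerCheckSketch {

  /** Returns the live threads that were not present in the "before" snapshot. */
  static Set<Thread> newLiveThreads(Set<Thread> before) {
    Set<Thread> now = new HashSet<>(Thread.getAllStackTraces().keySet());
    now.removeAll(before);
    now.removeIf(t -> !t.isAlive());
    return now;
  }

  /**
   * Re-checks for new threads until none remain or lingerMillis has elapsed,
   * then returns whatever is still alive.
   */
  static Set<Thread> leakedAfterLinger(Set<Thread> before, long lingerMillis) {
    long deadline = System.nanoTime() + lingerMillis * 1_000_000L;
    Set<Thread> leaked = newLiveThreads(before);
    while (!leaked.isEmpty() && System.nanoTime() < deadline) {
      try {
        Thread.sleep(5); // brief pause between checks
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
      leaked = newLiveThreads(before);
    }
    return leaked;
  }
}
```

With lingerMillis set to 0 a thread that is about to exit is reported as leaked; with a small positive value the same thread gets time to die, and the check returns as soon as it does rather than always waiting the full period.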

Notably though, even the TestLeaderElectionZkExpiry test failure, though "real" 
in a sense, may not have been worth the trouble to address. Yes, it was 
surprising to me that ZooKeeper.close() doesn't block until the connection 
threads die. But although we can tighten that up, I'd argue it represents a 
transient/aesthetic resource leak, not a practically significant one. And the 
unavoidable Executor "resource leak" is a game changer from my perspective. I 
mean, 10ms might _indeed_ be enough time to avoid these spurious errors. 
Evidently _0s_ is usually enough time to avoid such errors :)

> Remove universal 10 second test thread leak linger.
> ---------------------------------------------------
>
>                 Key: SOLR-15660
>                 URL: https://issues.apache.org/jira/browse/SOLR-15660
>             Project: Solr
>          Issue Type: Test
>          Components: Tests
>            Reporter: Mark Robert Miller
>            Assignee: Mark Robert Miller
>            Priority: Minor
>             Fix For: 9.0
>
>         Attachments: screenshot-1.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

