[ 
https://issues.apache.org/jira/browse/SOLR-11911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404696#comment-16404696
 ] 

Andrzej Bialecki  commented on SOLR-11911:
------------------------------------------

This test doesn't use MiniSolrCloudCluster; it uses the simulator. However, 
you're right that the underlying issue was the Callables that kept running 
after their executors were shut down, specifically the loop in 
{{ComputePlanAction}}. In regular (non-simulated) tests that use small clusters 
and small collections this wasn't visible, but here, with 100 nodes and 
thousands of replicas, the time it takes to compute all operations becomes 
significant: longer than the thread linger time.

Regarding the shutdown of the cluster - whether simulated or not - it should 
interrupt the processing of autoscaling events because they won't be acted upon 
anyway.

bq. so even if one of these executor tasks was effectively blocked forever, 
shouldn't that be causing the test to timeout, not report a leaked thread?
The executor that processes trigger events (which manages the threads that were 
leaking here) is closed using {{shutdownNow}} for the reason above. This 
interrupts the threads, but the actual code didn't check for the interrupted 
status and continued looping.
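
To illustrate the point about {{shutdownNow}}, here is a minimal sketch of a loop that checks the interrupted status. It is not the actual {{ComputePlanAction}} code: the class {{InterruptAwareLoop}} and the methods {{computeOperations}} and {{shutsDownPromptly}} are hypothetical names made up for this example, and the loop body is only a placeholder for real work.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class InterruptAwareLoop {

  // Hypothetical stand-in for a long-running plan computation (not Solr code).
  static long computeOperations(long maxOperations) throws InterruptedException {
    long computed = 0;
    while (computed < maxOperations) {
      // shutdownNow() interrupts the worker thread, but a CPU-bound loop
      // only stops if it checks the interrupted status itself.
      if (Thread.currentThread().isInterrupted()) {
        throw new InterruptedException("interrupted after " + computed + " operations");
      }
      computed++; // placeholder for computing one operation
    }
    return computed;
  }

  // Submits an effectively endless task, then shuts the executor down and
  // reports whether the worker thread terminated within the timeout.
  static boolean shutsDownPromptly() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    Callable<Long> task = () -> {
      started.countDown();
      return computeOperations(Long.MAX_VALUE);
    };
    executor.submit(task);
    try {
      started.await();        // make sure the loop is actually running
      executor.shutdownNow(); // interrupts the worker thread
      return executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      executor.shutdownNow();
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("terminated after shutdownNow: " + shutsDownPromptly());
  }
}
```

Without the {{isInterrupted()}} check the loop would ignore the interrupt and the worker thread would outlive the executor shutdown, which is the kind of thread a leak checker reports.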


> TestLargeCluster.testSearchRate() failure
> -----------------------------------------
>
>                 Key: SOLR-11911
>                 URL: https://issues.apache.org/jira/browse/SOLR-11911
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Steve Rowe
>            Assignee: Andrzej Bialecki 
>            Priority: Major
>
> My Jenkins found a branch_7x seed that reproduced 4/5 times for me:
> {noformat}
> Checking out Revision af9706cb89335a5aa04f9bcae0c2558a61803b50 
> (refs/remotes/origin/branch_7x)
> [...]
>    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestLargeCluster 
> -Dtests.method=testSearchRate -Dtests.seed=2D7724685882A83D -Dtests.slow=true 
> -Dtests.locale=be-BY -Dtests.timezone=Africa/Ouagadougou -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
>    [junit4] FAILURE 1.24s J0  | TestLargeCluster.testSearchRate <<<
>    [junit4]    > Throwable #1: java.lang.AssertionError: The trigger did not 
> fire at all
>    [junit4]    >      at 
> __randomizedtesting.SeedInfo.seed([2D7724685882A83D:703F3AE197440E72]:0)
>    [junit4]    >      at 
> org.apache.solr.cloud.autoscaling.sim.TestLargeCluster.testSearchRate(TestLargeCluster.java:547)
>    [junit4]    >      at java.lang.Thread.run(Thread.java:748)
> [...]
>    [junit4]   2> NOTE: test params are: codec=CheapBastard, 
> sim=RandomSimilarity(queryNorm=true): {}, locale=be-BY, 
> timezone=Africa/Ouagadougou
>    [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 
> 1.8.0_151 (64-bit)/cpus=16,threads=1,free=388243840,total=502267904
> {noformat}



