[ https://issues.apache.org/jira/browse/SOLR-11911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404696#comment-16404696 ]
Andrzej Bialecki commented on SOLR-11911:
------------------------------------------

This test doesn't use MiniSolrCloudCluster; it uses the simulator. However, you're right that the underlying issue was the Callables that didn't stop when the executors were shut down, specifically the loop in {{ComputePlanAction}}. In regular (non-simulated) tests that use small clusters and small collections this wasn't visible, but here, with 100 nodes and thousands of replicas, the time it takes to compute all operations becomes significant: longer than the thread linger time.

Regarding the shutdown of the cluster (whether simulated or not): it should interrupt the processing of autoscaling events, because they won't be acted upon anyway.

bq. so even if one of these executor tasks was effectively blocked forever, shouldn't that be causing the test to timeout, not report a leaked thread?

The executor that processes trigger events (which manages the threads that were leaking here) is closed using {{shutdownNow}} for the reason above. This interrupts the threads, but the code didn't check the interrupted status and kept looping.

> TestLargeCluster.testSearchRate() failure
> -----------------------------------------
>
> Key: SOLR-11911
> URL: https://issues.apache.org/jira/browse/SOLR-11911
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Assignee: Andrzej Bialecki
> Priority: Major
>
> My Jenkins found a branch_7x seed that reproduced 4/5 times for me:
> {noformat}
> Checking out Revision af9706cb89335a5aa04f9bcae0c2558a61803b50 (refs/remotes/origin/branch_7x)
> [...]
> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestLargeCluster -Dtests.method=testSearchRate -Dtests.seed=2D7724685882A83D -Dtests.slow=true -Dtests.locale=be-BY -Dtests.timezone=Africa/Ouagadougou -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> [junit4] FAILURE 1.24s J0 | TestLargeCluster.testSearchRate <<<
> [junit4] > Throwable #1: java.lang.AssertionError: The trigger did not fire at all
> [junit4] > at __randomizedtesting.SeedInfo.seed([2D7724685882A83D:703F3AE197440E72]:0)
> [junit4] > at org.apache.solr.cloud.autoscaling.sim.TestLargeCluster.testSearchRate(TestLargeCluster.java:547)
> [junit4] > at java.lang.Thread.run(Thread.java:748)
> [...]
> [junit4] 2> NOTE: test params are: codec=CheapBastard, sim=RandomSimilarity(queryNorm=true): {}, locale=be-BY, timezone=Africa/Ouagadougou
> [junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_151 (64-bit)/cpus=16,threads=1,free=388243840,total=502267904
> {noformat}
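The fix described in the comment comes down to making a long-running task cooperate with interruption, so that {{shutdownNow}} actually stops it. Below is a minimal, self-contained Java sketch of that general pattern. It is not the actual {{ComputePlanAction}} code: the class {{InterruptibleLoopDemo}} and the methods {{computePlan}}, {{computeOneOperation}} and {{shutsDownPromptly}} are hypothetical stand-ins. Only {{ExecutorService.shutdownNow()}}, {{awaitTermination}} and {{Thread.isInterrupted()}} are real JDK APIs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class InterruptibleLoopDemo {

    // Hypothetical stand-in for one expensive step, e.g. computing a single
    // operation of a large autoscaling plan.
    static long computeOneOperation(long i) {
        return i * 31 + 7;
    }

    // Long-running loop that checks the interrupt flag on every iteration, so an
    // executor's shutdownNow() can stop it instead of letting it run on.
    static long computePlan(long maxOps, AtomicLong progress) throws InterruptedException {
        long acc = 0;
        for (long i = 0; i < maxOps; i++) {
            if (Thread.currentThread().isInterrupted()) {
                // Surface the interrupt instead of returning a partial result.
                throw new InterruptedException("interrupted after " + i + " operations");
            }
            acc += computeOneOperation(i);
            progress.set(i + 1); // lets the caller see that the loop is running
        }
        return acc;
    }

    // Starts the loop on a single-thread executor, waits until it is running,
    // then calls shutdownNow(). Returns true if the worker stopped within 5 seconds.
    static boolean shutsDownPromptly() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicLong progress = new AtomicLong();
        Callable<Long> task = () -> computePlan(Long.MAX_VALUE, progress);
        executor.submit(task);
        try {
            while (progress.get() == 0) {
                Thread.sleep(1); // wait for the first iteration to complete
            }
            executor.shutdownNow(); // interrupts the running worker thread
            return executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // restore the caller's interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("worker stopped after shutdownNow(): " + shutsDownPromptly());
    }
}
```

Removing the {{isInterrupted()}} check reproduces the symptom discussed above: {{shutdownNow()}} still interrupts the worker, but the loop ignores the flag, {{awaitTermination}} times out, and the thread outlives the executor, which is what a test framework would report as a leaked thread. Throwing {{InterruptedException}} rather than quietly returning is one possible design choice; it makes it explicit to the caller that the plan is incomplete.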