[
https://issues.apache.org/jira/browse/SOLR-16122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851833#comment-17851833
]
David Smiley commented on SOLR-16122:
-
This test seems to fail due to thread leaks.
It happened yesterday in CI:
{noformat}
2> INFO: All leaked threads terminated.
> com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from SUITE scope at org.apache.solr.cloud.TestLeaderElectionZkExpiry:
>1) Thread[id=9557, name=zkConnectionManagerCallback-5960-thread-1-EventThread, state=WAITING, group=TGRP-TestLeaderElectionZkExpiry]
> at java.base@11.0.16.1/jdk.internal.misc.Unsafe.park(Native Method)
> at java.base@11.0.16.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
> at java.base@11.0.16.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2081)
> at java.base@11.0.16.1/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433)
> at app//org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:535)
>2) Thread[id=9549, name=zkConnectionManagerCallback-5960-thread-1-EventThread, state=WAITING, group=TGRP-TestLeaderElectionZkExpiry]
> at java.base@11.0.16.1/java.lang.Object.wait(Native Method)
> at java.base@11.0.16.1/java.lang.Object.wait(Object.java:328)
> at app//org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1583)
> at app//org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1555)
> at app//org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1522)
> at app//org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:1227)
> at app//org.apache.solr.common.cloud.SolrZkClient.updateKeeper(SolrZkClient.java:863)
> at app//org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:190)
> at app//org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:59)
> at app//org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:179)
> at app//org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:564)
> at app//org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:539)
> at __randomizedtesting.SeedInfo.seed([B35AE6C0068D8659]:0)
{noformat}
I also hit it on Crave recently (this time the leaked thread is the OverseerExitThread):
{noformat}
2> SEVERE: 1 thread leaked from SUITE scope at org.apache.solr.cloud.TestLeaderElectionZkExpiry:
2>1) Thread[id=349, name=OverseerExitThread, state=TIMED_WAITING, group=Overseer state updater.]
2> at java.base@11.0.23/java.lang.Thread.sleep(Native Method)
2> at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:101)
2> at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:80)
2> at app//org.apache.solr.common.cloud.SolrZkClient.delete(SolrZkClient.java:345)
2> at app//org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:118)
2> at app//org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:310)
2> at app//org.apache.solr.cloud.LeaderElector.retryElection(LeaderElector.java:395)
2> at app//org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:133)
2> at app//org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:310)
2> at app//org.apache.solr.cloud.LeaderElector.retryElection(LeaderElector.java:395)
2> at app//org.apache.solr.cloud.ZkController.rejoinOverseerElection(ZkController.java:2364)
2> at app//org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:511)
2> at app//org.apache.solr.cloud.Overseer$ClusterStateUpdater$$Lambda$1667/0x00010099b840.run(Unknown Source)
2> at java.base@11.0.23/java.lang.Thread.run(Thread.java:829)
{noformat}
How the leak above can happen seems clear to me: a new Thread is spawned and
nothing waits for it to finish
[here|https://github.com/apache/solr/blob/70b6e4f6952cb7f9b3647865404487c68264668d/solr/core/src/java/org/apache/solr/cloud/Overseer.java#L417].
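To illustrate the general pattern only (this is not Solr's code and not a proposed patch), here is a minimal Java sketch of a spawned thread that would outlive its owner unless the owner interrupts and joins it on close. The names ExitThreadSketch, Component, startExit, and retryLoop are hypothetical, and whether interrupting is acceptable for the real OverseerExitThread is an assumption that would need checking.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names exist in Solr.
class ExitThreadSketch {

  // Stands in for work that retries with a delay until it is interrupted.
  static Runnable retryLoop() {
    return () -> {
      try {
        while (true) {
          Thread.sleep(50); // simulated retry delay
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the flag and exit
      }
    };
  }

  static class Component implements AutoCloseable {
    private volatile Thread exitThread;

    // Spawns the thread but keeps a reference so close() can wait for it.
    void startExit(Runnable work) {
      Thread t = new Thread(work, "ExitThreadSketch");
      exitThread = t;
      t.start();
    }

    boolean exitThreadAlive() {
      Thread t = exitThread;
      return t != null && t.isAlive();
    }

    // Without the interrupt and join below, the thread could still be
    // sleeping in its retry loop when a leak checker runs.
    @Override
    public void close() throws InterruptedException {
      Thread t = exitThread;
      if (t != null) {
        t.interrupt();
        t.join(TimeUnit.SECONDS.toMillis(5)); // bounded wait
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Component c = new Component();
    c.startExit(retryLoop());
    System.out.println("alive before close: " + c.exitThreadAlive());
    c.close();
    System.out.println("alive after close: " + c.exitThreadAlive());
  }
}
```

The join uses a timeout so that a stuck thread cannot hang shutdown indefinitely; a caller that needs a hard guarantee would have to check exitThreadAlive() afterwards.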
> TestLeaderElectionZkExpiry failing frequently
> -
>
> Key: SOLR-16122
> URL: https://issues.apache.org/jira/browse/SOLR-16122
> Project: Solr
> Issue Type: Bug
>Affects Versions: 9.0
>Reporter: Jan Høydahl
>Priority: Major
>
> Failing in 10% of runs - marking as {{@BadApple}} before the 9.0 release