[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Rowe updated SOLR-10420:
------------------------------
    Attachment: OverseerTest.80.stdout

I ran all Solr tests with the patch on master, and one test failed:

{noformat}
[junit4] 2> 264992 ERROR (OverseerExitThread) [ ] o.a.s.c.Overseer could not read the data
[junit4] 2> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer_elect/leader
[junit4] 2>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
[junit4] 2>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
[junit4] 2>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
[junit4] 2>     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
[junit4] 2>     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
[junit4] 2>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
[junit4] 2>     at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
[junit4] 2>     at org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:290)
[junit4] 2>     at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=OverseerTest -Dtests.method=testExternalClusterStateChangeBehavior -Dtests.seed=2110CE0AEF674CFA -Dtests.slow=true -Dtests.locale=es-GT -Dtests.timezone=Asia/Kolkata -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 5.46s J12 | OverseerTest.testExternalClusterStateChangeBehavior <<<
[junit4] > Throwable #1: java.lang.AssertionError: Illegal state, was: down expected:active clusterState:live nodes:[]collections:{c1=DocCollection(c1//clusterstate.json/2)={
[junit4] > "shards":{"shard1":{
[junit4] > "parent":null,
[junit4] > "range":null,
[junit4] > "state":"active",
[junit4] > "replicas":{"core_node1":{
[junit4] > "base_url":"http://127.0.0.1/solr",
[junit4] > "node_name":"node1",
[junit4] > "core":"core1",
[junit4] > "roles":"",
[junit4] > "state":"down"}}}},
[junit4] > "router":{"name":"implicit"}}, test=LazyCollectionRef(test)}
[junit4] >     at __randomizedtesting.SeedInfo.seed([2110CE0AEF674CFA:490ECDE60DF716B4]:0)
[junit4] >     at org.apache.solr.cloud.AbstractDistribZkTestBase.verifyReplicaStatus(AbstractDistribZkTestBase.java:273)
[junit4] >     at org.apache.solr.cloud.OverseerTest.testExternalClusterStateChangeBehavior(OverseerTest.java:1259)
{noformat}

I ran the repro line a couple of times and the failure didn't reproduce. I then beasted 100 iterations of the test suite using Miller's beasting script, and it failed once. I'm attaching the test log from the failure.

Looking at emailed Jenkins reports of {{testExternalClusterStateChangeBehavior()}} failing, I see that it was failing almost daily until the day SOLR-9191 was committed (June 9, 2016), with zero failures since. Because this issue is related to SOLR-9191, this failure seems suspicious to me.

I beasted 200 iterations of OverseerTest without the patch, and got zero failures.

> Solr 6.x leaking one SolrZkClient instance per second
> -----------------------------------------------------
>
>                 Key: SOLR-10420
>                 URL: https://issues.apache.org/jira/browse/SOLR-10420
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>    Affects Versions: 6.5, 6.4.2
>            Reporter: Markus Jelsma
>             Fix For: master (7.0), branch_6x
>
>         Attachments: OverseerTest.80.stdout, SOLR-10420.patch
>
>
> One of our nodes became berserk after a restart, Solr went completely nuts!
> So I opened VisualVM to keep an eye on it and spotted a different problem
> that occurs in all our Solr 6.4.2 and 6.5.0 nodes.
> It appears Solr is leaking one SolrZkClient instance per second via
> DistributedQueue$ChildWatcher. That one per second is quite accurate for all
> nodes: there are about the same number of instances as there are seconds
> since Solr started.
> I know VisualVM's instance count includes objects-to-be-collected, but the
> instance count does not drop after a forced garbage collection round.
> It doesn't matter how many cores or collections the nodes carry or how heavy
> traffic is.