[ https://issues.apache.org/jira/browse/SOLR-12047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cao Manh Dat updated SOLR-12047:
--------------------------------
    Summary: Increase checkStateInZk timeout  (was: Increasing checkStateInZk timeout)

> Increase checkStateInZk timeout
> -------------------------------
>
>                 Key: SOLR-12047
>                 URL: https://issues.apache.org/jira/browse/SOLR-12047
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>    Affects Versions: 7.0
>            Reporter: Varun Thacker
>            Priority: Major
>         Attachments: SOLR-12047.patch
>
>
> In the last 2 days I've seen 2 users running Solr 7.2.1 hit a case where a
> restart fails to load some cores on a node.
>
> Here's the stack trace:
>
> {noformat}
> date time ERROR (coreLoadExecutor-6-thread-2-processing-n:solr-number:8983_solr) [c:name s:shard r:core_node130 x:collection_shard_replica] o.a.s.c.ZkController
> org.apache.solr.common.SolrException: coreNodeName core_node130 does not exist in shard shard4: DocCollection(collection_name//collections/collection_name/state.json/2385)={
> ..collection state.json ...
> }
>     at org.apache.solr.cloud.ZkController.checkStateInZk(ZkController.java:1687)
>     at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1590)
>     at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1030)
>     ...
>     at java.lang.Thread.run(Thread.java:748)
> date time ERROR (coreContainerWorkExecutor-2-thread-1-processing-n:solr-number:8983_solr) [ ] o.a.s.c.CoreContainer Error waiting for SolrCore to be created
> java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [collection_shardX_replica_n129]
>     ...
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.solr.common.SolrException: Unable to create core [collection_shardX_replica_n129]
>     ...
>     ... 5 more
> Caused by: org.apache.solr.common.SolrException:
>     at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1619)
>     at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1030)
>     ... 7 more
> {noformat}
>
> I filed this Jira against Solr 7.x because it is tied to legacyCloud being
> set to false by default starting with Solr 7.0.
>
> In ZkController#checkStateInZk, where the block is only run with
> legacyCloud=false ( L1645 ), we do a waitForState ( L1667 ) and only wait 3
> seconds. If we don't get the desired state within that time, the core will
> fail to load.
>
> In large enough clusters this 3-second timeout is too low. We should
> increase it to a value large enough that it does not cause core
> initialization failures.
>
> Line references are from Solr 7.2.1.
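
To illustrate the failure mode described in the issue, here is a minimal Java sketch of a bounded wait on a state predicate. It is not Solr's code and does not use Solr's API: the class `StateWaitSketch`, the helper methods `waitForState` and `becomesVisibleWithin`, the 10 ms poll interval, and the simulated delays are all hypothetical names and values chosen for illustration. It only shows why a wait that is shorter than the time the state takes to appear ends in a failure, while a longer wait succeeds.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

class StateWaitSketch {

    // Hypothetical helper (not Solr's waitForState): polls the predicate
    // until it holds, or throws once the timeout has elapsed.
    public static void waitForState(BooleanSupplier predicate, long timeout,
                                    TimeUnit unit, long pollMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!predicate.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                        "state not reached within " + timeout + " " + unit);
            }
            Thread.sleep(pollMillis);
        }
    }

    // Simulates a replica entry that becomes visible in the cluster state
    // after delayMillis, and waits for it for at most timeoutMillis.
    // Returns true if the state was seen in time, false on timeout.
    public static boolean becomesVisibleWithin(long delayMillis, long timeoutMillis) {
        long visibleAt = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        BooleanSupplier replicaRegistered = () -> System.nanoTime() - visibleAt >= 0;
        try {
            waitForState(replicaRegistered, timeoutMillis, TimeUnit.MILLISECONDS, 10);
            return true;
        } catch (TimeoutException e) {
            // In the scenario from the issue, this is where the core would fail to load.
            return false;
        }  catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // State appears after 300 ms, but we only wait 50 ms.
        System.out.println("short timeout reached state: " + becomesVisibleWithin(300, 50));
        // Same kind of delay, with a far more generous timeout.
        System.out.println("long timeout reached state: " + becomesVisibleWithin(50, 3000));
    }
}
```

In this sketch a generous timeout costs nothing when the state arrives quickly, because the wait returns as soon as the predicate holds. The timeout only bounds the worst case, which is the reasoning behind raising it rather than tuning it tightly.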