[ https://issues.apache.org/jira/browse/SOLR-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Høydahl updated SOLR-13061:
-------------------------------
Priority: Major (was: Blocker)
> Solr replica remaining down status when hitting the maxQueueSize as 20000 after Solr servers restarted
> ------------------------------------------------------------------------------------------------------
>
> Key: SOLR-13061
> URL: https://issues.apache.org/jira/browse/SOLR-13061
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 7.2, 7.3, 7.3.1, 7.4, 7.5
> Environment: Cluster info: 6 nodes, 30 Solr servers (5 Solr servers
> per node)
> 1000 collections, 10 shards per collection, 3 replicas per shard
> The exception happened when restarting the Solr cluster.
> Reporter: Zhaohui Ma
> Priority: Major
> Labels: performance
>
> 1. Cluster info: 6 nodes, 30 Solr servers;
> 1000 collections, 10 shards per collection, 3 replicas per shard.
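Back-of-envelope arithmetic for this cluster: 1000 collections × 10 shards × 3 replicas gives 30,000 cores, or 1,000 cores per Solr server. The stack trace shows each core going through ZkController.publish to ZkDistributedQueue.offer during preRegister, so, assuming at least one state message per core and all servers restarting at roughly the same time, a full restart can try to enqueue more messages than the 20000 limit unless the Overseer drains the queue fast enough. The Java sketch below only does that arithmetic; the class and method names are illustrative, not Solr code.

```java
// Illustrative arithmetic only: compares the number of cores in the reported
// cluster with the hard-coded Overseer queue limit.
public class QueueLoadEstimate {
    // Value hard-coded in Overseer.java, as quoted in the report.
    public static final int STATE_UPDATE_MAX_QUEUE = 20000;

    // Every replica of every shard of every collection is one SolrCore.
    public static int totalCores(int collections, int shardsPerCollection, int replicasPerShard) {
        return collections * shardsPerCollection * replicasPerShard;
    }

    public static void main(String[] args) {
        int cores = totalCores(1000, 10, 3);
        int perServer = cores / 30; // 30 Solr servers in the cluster
        System.out.println("Total cores: " + cores);               // prints "Total cores: 30000"
        System.out.println("Cores per Solr server: " + perServer); // prints "Cores per Solr server: 1000"
        System.out.println("Exceeds queue limit: " + (cores > STATE_UPDATE_MAX_QUEUE)); // prints "Exceeds queue limit: true"
    }
}
```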
>
> 2. The exception happened when restarting the Solr cluster. The problem is
> that NO exception handler is defined for
> "java.lang.IllegalStateException: queue is full", which is thrown when the
> queue reaches the threshold STATE_UPDATE_MAX_QUEUE (20000) defined in
> Overseer. As a result, the core fails to preRegister and never comes up
> again.
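To illustrate the failure mode, here is a minimal, hypothetical in-memory model of a size-bounded queue. It is not the real ZkDistributedQueue (which is backed by ZooKeeper); it only reproduces the behaviour visible in the stack trace, where offer() throws IllegalStateException("queue is full") once the limit is reached and the caller does not handle it. The class name BoundedStateQueue is invented for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the Overseer state-update queue. It mimics only
// the observed behaviour: offer() throws once maxQueueSize items are queued.
public class BoundedStateQueue {
    private final int maxQueueSize;
    private final Deque<byte[]> items = new ArrayDeque<>();

    public BoundedStateQueue(int maxQueueSize) {
        this.maxQueueSize = maxQueueSize;
    }

    // Producer side (in the report: ZkController.publish during preRegister).
    public synchronized void offer(byte[] message) {
        if (items.size() >= maxQueueSize) {
            throw new IllegalStateException("queue is full");
        }
        items.addLast(message);
    }

    // Consumer side (in the report: the Overseer). Returns null if empty.
    public synchronized byte[] poll() {
        return items.pollFirst();
    }

    public synchronized int size() {
        return items.size();
    }

    public static void main(String[] args) {
        BoundedStateQueue queue = new BoundedStateQueue(3);
        int published = 0;
        try {
            // Five cores try to publish, but nothing drains the queue.
            for (int core = 0; core < 5; core++) {
                queue.offer(("down:" + core).getBytes());
                published++;
            }
        } catch (IllegalStateException e) {
            // In the reported code path nothing catches this, so core creation fails.
            System.out.println("Published " + published + " before: " + e.getMessage());
        }
    }
}
```

Running main prints "Published 3 before: queue is full".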
>
> 3. Suggestions:
> a. Is the STATE_UPDATE_MAX_QUEUE setting reasonable? Is there any plan to
> enlarge this queue size, or any risk in doing so? 20000 is much too small.
> b. Should STATE_UPDATE_MAX_QUEUE be configurable by the user?
> Currently it is hard-coded in Overseer.java:
> public static final int STATE_UPDATE_MAX_QUEUE = 20000;
> c. IllegalStateException should be handled and retry logic should be added.
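Suggestions (b) and (c) could look roughly like the sketch below. It assumes a JVM system property for the limit; the property name "solr.hypothetical.stateUpdateMaxQueue" and the helper names are invented for illustration and are not existing Solr settings or APIs. The retry helper wraps any offer call and backs off exponentially on IllegalStateException instead of failing the core immediately.

```java
// Sketch of suggestions (b) and (c). All names here are hypothetical.
public class StateQueueSuggestions {
    static final int DEFAULT_STATE_UPDATE_MAX_QUEUE = 20000;

    // (b) Read the limit from a JVM system property, falling back to 20000.
    public static int stateUpdateMaxQueue() {
        return Integer.getInteger("solr.hypothetical.stateUpdateMaxQueue",
                DEFAULT_STATE_UPDATE_MAX_QUEUE);
    }

    // (c) Retry an offer that fails with IllegalStateException, doubling the
    // delay each time (capped at 10 s), and rethrow after maxAttempts.
    public static void offerWithRetry(Runnable offer, int maxAttempts, long initialDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                offer.run();
                return;
            } catch (IllegalStateException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the original error
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw e;
                }
                delay = Math.min(delay * 2, 10_000L);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Limit: " + stateUpdateMaxQueue());
        int[] calls = {0};
        // Simulated queue that is still full for the first two attempts.
        offerWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("queue is full");
            }
        }, 5, 10);
        System.out.println("Succeeded after " + calls[0] + " attempts");
    }
}
```

Backoff only helps if the Overseer keeps draining the queue in the meantime, so it complements a sensible limit rather than replacing it.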
>
> 4. The detailed error is given below.
> 2018-12-12 11:20:24,737 | ERROR | coreContainerWorkExecutor-2-thread-1-processing-n:8.5.165.7:21101_solr | Error waiting for SolrCore to be created | org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:578)
> java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:574)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1087)
> at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:546)
> ... 5 more
> Caused by: java.lang.IllegalStateException: queue is full
> at org.apache.solr.cloud.ZkDistributedQueue.offer(ZkDistributedQueue.java:311)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:1346)
> at org.apache.solr.cloud.ZkController.publish(ZkController.java:1245)
> at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1634)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1061)
> ... 6 more
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)