[ https://issues.apache.org/jira/browse/SOLR-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hendrik Haddorp updated SOLR-10092:
-----------------------------------
    Attachment: SOLR-10092.patch

This new patch works correctly with legacyCloud=false for me, but I must admit that I do not fully understand what the code is trying to do. The flow in Solr is like this:

1) OverseerAutoReplicaFailoverThread decides to create a new core to replace a failed one
2) CoreContainer.create(String coreName, Path instancePath, Map<String, String> parameters, boolean newCollection) gets invoked
3) CoreContainer.create(CoreDescriptor dcore, boolean publishState, boolean newCollection)
4) ZkController.preRegister
5) ZkController.checkStateInZk

If legacyCloud mode is on, nothing at all happens in step 5, and one check in step 2 is also skipped. So with legacyCloud on things work, but with it off the code fails in step 5 because no shardId is set in the create core call made from the Overseer. I fixed this in my first patch so that the shard id/name gets passed into the core creation.

The code in step 5 checks whether the core creation data matches what is stored in ZK. That cannot work in this case, however, because the "baseUrl" will of course not match: we are trying to replace the core with a new one. So I have now removed the baseUrl comparison, and everything seems to work fine with legacyCloud both on and off.

Given that I don't really understand what check is done here and why it is only done when legacyCloud=false, my fix might not be correct and may need to be done differently. But my patched version works at least ;-)

> HDFS: AutoAddReplica fails
> --------------------------
>
>                 Key: SOLR-10092
>                 URL: https://issues.apache.org/jira/browse/SOLR-10092
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: hdfs
>    Affects Versions: 6.3
>            Reporter: Hendrik Haddorp
>         Attachments: SOLR-10092.patch, SOLR-10092.patch
>
>
> OverseerAutoReplicaFailoverThread fails to create replacement core with this exception:
> o.a.s.c.OverseerAutoReplicaFailoverThread Exception trying to create new replica on http://...:9000/solr:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://...:9000/solr: Error CREATEing SolrCore 'test2.collection-09_shard1_replica1': Unable to create core [test2.collection-09_shard1_replica1] Caused by: No shard id for CoreDescriptor[name=test2.collection-09_shard1_replica1;instanceDir=/var/opt/solr/test2.collection-09_shard1_replica1]
>       at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593)
>       at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
>       at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
>       at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
>       at org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.createSolrCore(OverseerAutoReplicaFailoverThread.java:456)
>       at org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.lambda$addReplica$0(OverseerAutoReplicaFailoverThread.java:251)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> also see this mail thread about the issue:
> https://lists.apache.org/thread.html/%3CCAA70BoWyzbvQuJTyzaG4Kx1tj0Djgcm+MV=x_hoac1e6cse...@mail.gmail.com%3E

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
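
To illustrate the step 5 check described in the comment above, here is a minimal standalone sketch. It is not Solr's actual code: the class ReplicaStateCheck, the method matchesStoredState, the property keys ("shard", "core", "baseUrl"), and the host names are all hypothetical, chosen only to show why comparing the base URL rejects a replacement core on a different node, and what skipping that comparison changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for the step 5 check; NOT Solr's implementation. */
public class ReplicaStateCheck {

    /** Builds a property map; the key names are illustrative assumptions. */
    public static Map<String, String> props(String shard, String core, String baseUrl) {
        Map<String, String> m = new HashMap<>();
        if (shard != null) {
            m.put("shard", shard);
        }
        m.put("core", core);
        m.put("baseUrl", baseUrl);
        return m;
    }

    /**
     * Returns true when the requested core is consistent with the stored
     * replica entry. A missing shard id is treated as an error, mirroring the
     * "No shard id" failure quoted in the issue.
     */
    public static boolean matchesStoredState(Map<String, String> requested,
                                             Map<String, String> stored,
                                             boolean compareBaseUrl) {
        String shardId = requested.get("shard");
        if (shardId == null) {
            throw new IllegalStateException("No shard id for core " + requested.get("core"));
        }
        if (!Objects.equals(shardId, stored.get("shard"))) {
            return false;
        }
        if (!Objects.equals(requested.get("core"), stored.get("core"))) {
            return false;
        }
        // A replacement core lives on another node, so this comparison fails.
        if (compareBaseUrl && !Objects.equals(requested.get("baseUrl"), stored.get("baseUrl"))) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical hosts: the stored entry still points at the failed node.
        Map<String, String> stored = props("shard1", "coll_shard1_replica1", "http://failed-node:9000/solr");
        Map<String, String> requested = props("shard1", "coll_shard1_replica1", "http://new-node:9000/solr");

        System.out.println("with baseUrl comparison:    " + matchesStoredState(requested, stored, true));
        System.out.println("without baseUrl comparison: " + matchesStoredState(requested, stored, false));
    }
}
```

In this sketch the replacement core is rejected while the base URL is compared and accepted once that comparison is skipped. Whether dropping the comparison is the right fix for Solr itself remains the open question raised in the comment.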