[jira] [Updated] (SOLR-10092) HDFS: AutoAddReplica fails

Hendrik Haddorp (JIRA) Fri, 24 Feb 2017 01:50:35 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hendrik Haddorp updated SOLR-10092:
-----------------------------------
    Attachment: SOLR-10092.patch

This new patch works correctly for me with legacyCloud=false, but I must admit 
that I do not fully understand what the code tries to do.

The flow in Solr is like this:
1) OverseerAutoReplicaFailoverThread decides to create a new core to replace a 
failed one
2) CoreContainer.create(String coreName, Path instancePath, Map<String, String> 
parameters, boolean newCollection) gets invoked
3) CoreContainer.create(CoreDescriptor dcore, boolean publishState, boolean 
newCollection)
4) ZkController.preRegister
5) ZkController.checkStateInZk

If legacyCloud mode is on, nothing at all happens in step 5, and one check in 
step 2 is also skipped.

With legacyCloud on, things work. With it off, the code fails in step 5 
because no shardId is set in the create-core call made from the Overseer. I 
fixed this in my first patch so that the shard id/name gets passed into the 
core creation. The code in step 5 checks whether the core creation data 
matches what is stored in ZK. That cannot work in this case, because the 
"baseUrl" will of course not match: we are trying to replace the core with a 
new one. So I have now removed the baseUrl comparison, and everything seems to 
work fine with legacyCloud both on and off. Given that I don't really 
understand what this check is for and why it is only done when 
legacyCloud=false, my fix might not be correct and may need to be done 
differently. But my patched version works at least ;-)
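
To make the two failure modes above concrete, here is a minimal Java sketch 
of the kind of check described for step 5. It is not the Solr code and not 
the contents of the patch: the class name, the method names, and the property 
keys ("core", "shard", "base_url") are all assumptions chosen for 
illustration. It only shows why a replacement core fails a baseUrl comparison 
and passes once that comparison is dropped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; none of these names come from Solr.
final class ReplicaCheckSketch {

    // Builds a property map for a replica (keys are assumed, not Solr's).
    static Map<String, String> props(String core, String shard, String baseUrl) {
        Map<String, String> p = new HashMap<>();
        p.put("core", core);
        p.put("shard", shard);
        p.put("base_url", baseUrl);
        return p;
    }

    // Returns true if the requested core passes the check against the state
    // stored in ZK. With legacyCloud on, the check is skipped entirely.
    static boolean passesStateCheck(Map<String, String> stored,
                                    Map<String, String> requested,
                                    boolean legacyCloud,
                                    boolean compareBaseUrl) {
        if (legacyCloud) {
            return true; // nothing is verified in legacy mode
        }
        if (requested.get("shard") == null) {
            // Mirrors the "No shard id for ..." failure in the issue.
            throw new IllegalStateException(
                "No shard id for " + requested.get("core"));
        }
        boolean sameCore = Objects.equals(stored.get("core"), requested.get("core"));
        boolean sameShard = Objects.equals(stored.get("shard"), requested.get("shard"));
        boolean sameUrl = !compareBaseUrl
            || Objects.equals(stored.get("base_url"), requested.get("base_url"));
        return sameCore && sameShard && sameUrl;
    }

    public static void main(String[] args) {
        // The failed replica as recorded in ZK, and its replacement on a new node.
        Map<String, String> stored =
            props("coll_shard1_replica1", "shard1", "http://old-node:9000/solr");
        Map<String, String> replacement =
            props("coll_shard1_replica1", "shard1", "http://new-node:9000/solr");

        // false: base_url differs because the core moved to another node
        System.out.println(passesStateCheck(stored, replacement, false, true));
        // true: same core and shard, base_url ignored
        System.out.println(passesStateCheck(stored, replacement, false, false));
    }
}
```

Whether ignoring the base URL is safe in the real code depends on what the 
check is meant to guard against, which is the open question raised above.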

> HDFS: AutoAddReplica fails
> --------------------------
>
>                 Key: SOLR-10092
>                 URL: https://issues.apache.org/jira/browse/SOLR-10092
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: hdfs
>    Affects Versions: 6.3
>            Reporter: Hendrik Haddorp
>         Attachments: SOLR-10092.patch, SOLR-10092.patch
>
>
> OverseerAutoReplicaFailoverThread fails to create replacement core with this 
> exception:
> o.a.s.c.OverseerAutoReplicaFailoverThread Exception trying to create new 
> replica on 
> http://...:9000/solr:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>  Error from server at http://...:9000/solr: Error CREATEing SolrCore 
> 'test2.collection-09_shard1_replica1': Unable to create core 
> [test2.collection-09_shard1_replica1] Caused by: No shard id for 
> CoreDescriptor[name=test2.collection-09_shard1_replica1;instanceDir=/var/opt/solr/test2.collection-09_shard1_replica1]
>     at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593)
>     at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
>     at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
>     at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
>     at 
> org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.createSolrCore(OverseerAutoReplicaFailoverThread.java:456)
>     at 
> org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.lambda$addReplica$0(OverseerAutoReplicaFailoverThread.java:251)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745) 
> also see this mail thread about the issue: 
> https://lists.apache.org/thread.html/%3CCAA70BoWyzbvQuJTyzaG4Kx1tj0Djgcm+MV=x_hoac1e6cse...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
