[jira] [Commented] (SOLR-3721) Multiple concurrent recoveries of same shard?

Mark Miller (JIRA) Thu, 16 Aug 2012 11:07:39 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436175#comment-13436175
 ]


Mark Miller commented on SOLR-3721:
-----------------------------------

bq.  Believe there is nothing solid to do!

Well, we can do some practical things, right? I don't think we need to support a 
node coming back from the dead a year later with updates the cluster 
doesn't have. A node coming up 2 minutes later is something we do want to worry 
about, though.

So basically we need something either timing-based or admin-command-based that 
lets you start a cold shard (slice :)): each node waits around for X amount 
of time or until command X is received, and then leader election begins. 
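
As a rough sketch of that idea (illustrative Python, not Solr code; the names
`ElectionGate`, `admin_start`, and `wait_for_election` are made up for this
example), each node could block on something like the following before it
enters leader election:

```python
import threading


class ElectionGate:
    """Hypothetical gate: hold back leader election until a timeout expires
    or an admin "go" command arrives, whichever happens first."""

    def __init__(self, wait_seconds):
        self.wait_seconds = wait_seconds
        self._go = threading.Event()

    def admin_start(self):
        # Stand-in for "command X": release any waiting node immediately.
        self._go.set()

    def wait_for_election(self):
        # Event.wait returns True if the event was set, False on timeout.
        released = self._go.wait(timeout=self.wait_seconds)
        return "command" if released else "timeout"
```

Either outcome lets the election proceed; the return value only records which
condition ended the wait.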
                
> Multiple concurrent recoveries of same shard?
> ---------------------------------------------
>
>                 Key: SOLR-3721
>                 URL: https://issues.apache.org/jira/browse/SOLR-3721
>             Project: Solr
>          Issue Type: Bug
>          Components: multicore, SolrCloud
>    Affects Versions: 4.0
>         Environment: Using our own Solr release based on Apache revision 
> 1355667 from the 4.x branch. Our changes to this Solr version are our solutions to 
> TLT-3178 etc., and should have no effect on this issue.
>            Reporter: Per Steffensen
>              Labels: concurrency, multicore, recovery, solrcloud
>             Fix For: 4.0
>
>         Attachments: recovery_in_progress.png, recovery_start_finish.log
>
>
> We run a performance/endurance test on a 7-instance SolrCloud setup, and 
> eventually Solrs lose ZK connections and go into recovery. BTW, the recovery 
> often never succeeds, but we are looking into that. While doing that I 
> noticed that, according to logs, multiple recoveries are in progress at the 
> same time for the same shard. That cannot be intended and I can certainly 
> imagine that it will cause some problems.
> Is it just the logs that are wrong, did I make some mistake, or is this a 
> real bug?
> See attached grep from log, grepping only on "Finished recovery" and 
> "Starting recovery" logs.
> {code}
> grep -B 1 "Finished recovery\|Starting recovery" solr9.log solr8.log 
> solr7.log solr6.log solr5.log solr4.log solr3.log solr2.log solr1.log 
> solr0.log > recovery_start_finish.log
> {code}
> It can be hard to get an overview of the log, but I have generated a graph 
> showing (based solely on "Starting recovery" and "Finished recovery" logs) how 
> many recoveries are in progress at any time for the different shards. See 
> attached recovery_in_progress.png. The graph is also a little hard to get an 
> overview of (due to the many shards) but it is clear that for several shards 
> there are multiple recoveries going on at the same time, and that several 
> recoveries never succeed.
> Regards, Per Steffensen
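
The counting behind that graph can be sketched roughly as follows. This is
illustrative Python, not the script used for the attachment. It assumes the
"Starting recovery" and "Finished recovery" log lines have already been parsed
into hypothetical `(timestamp, shard, kind)` tuples, since the log line format
is not shown here.

```python
from collections import defaultdict


def max_concurrent_recoveries(events):
    """events: iterable of (timestamp, shard, kind), kind is "start" or "finish".
    Returns {shard: peak number of recoveries in progress at the same time}."""
    in_progress = defaultdict(int)
    peak = defaultdict(int)
    # Sort by timestamp; at equal timestamps handle "finish" before "start"
    # so back-to-back recoveries are not counted as overlapping.
    ordered = sorted(events, key=lambda e: (e[0], e[2] != "finish"))
    for _ts, shard, kind in ordered:
        if kind == "start":
            in_progress[shard] += 1
            peak[shard] = max(peak[shard], in_progress[shard])
        elif kind == "finish":
            # Clamp at zero in case a finish is logged without a matching start.
            in_progress[shard] = max(0, in_progress[shard] - 1)
    return dict(peak)
```

A peak above 1 for any shard would correspond to the overlapping recoveries
described in the report.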

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
