[jira] [Commented] (SOLR-12088) Shards with dead replicas cause increased write latency

Erick Erickson (JIRA) Wed, 14 Mar 2018 13:18:21 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399238#comment-16399238
 ]


Erick Erickson commented on SOLR-12088:
---------------------------------------

Jerry:

It Depends (tm). Most are reasonably short, 15 seconds to a couple of minutes. 
So if you're seeing this last much longer than that, it's a red herring.

Solr itself should be able to clean up dead replicas. What version are you 
using? By that I mean you can re-issue DELETEREPLICA and it "should" work.
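
For illustration, DELETEREPLICA is issued through the Collections API. The 
Python sketch below only builds the request URL and does not send it. The 
host, collection, shard, and replica names are placeholders, not values taken 
from this issue.

```python
from urllib.parse import urlencode


def delete_replica_url(base_url, collection, shard, replica):
    """Build a Collections API DELETEREPLICA request URL (does not send it)."""
    params = {
        "action": "DELETEREPLICA",
        "collection": collection,
        "shard": shard,
        "replica": replica,  # the replica's core node name
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Placeholder values; substitute the ones from your own cluster.
print(delete_replica_url("http://localhost:8983/solr", "mycoll", "shard1", "core_node5"))
```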

There's a bit of a rough patch if you have legacyCloud set. Prior to 7.x this 
was the default, and nodes could reconstruct themselves in ZK. The key is 
whether your ZooKeeper tree has partial collection representations in 
/clusterstate.json, likely corresponding to these deleted replicas. If that's 
the case, you can:


> stop the Solr instance

> manually remove the dead replicas

> start Solr back up.

Once all that's done for the dead replicas, 

> replace /clusterstate.json with a single pair of empty braces {}, but ONLY 
> if your /collections/whatever/state.json has the complete, accurate picture 
> of the collection in question. This caveat is _very_ important: if a 
> collection does _not_ have a valid state.json (i.e. it isn't in "state 
> format 2"), you'll lose your collections, so be _very_ cautious.
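
A rough pre-check for that caveat could look like the Python sketch below. It 
assumes you have already exported the raw contents of /clusterstate.json and 
of each /collections/<name>/state.json from ZooKeeper by some other means, 
and that state.json is a JSON object keyed by collection name with a "shards" 
entry. The function name and that layout are assumptions of this sketch. It 
only flags collections with no usable state.json; it cannot tell you whether 
a state.json is complete and accurate, so still inspect those by hand.

```python
import json


def collections_unsafe_to_drop(clusterstate_raw, state_json_by_collection):
    """Return collections listed in /clusterstate.json that lack a usable state.json.

    clusterstate_raw: raw JSON text of /clusterstate.json.
    state_json_by_collection: dict mapping collection name to the raw JSON
        text of /collections/<name>/state.json (missing key = no such znode).
    """
    legacy = json.loads(clusterstate_raw or "{}")
    unsafe = []
    for name in legacy:
        raw = state_json_by_collection.get(name)
        try:
            state = json.loads(raw) if raw else {}
        except ValueError:
            state = {}  # unparseable state.json counts as missing
        entry = state.get(name) if isinstance(state, dict) else None
        shards = entry.get("shards") if isinstance(entry, dict) else None
        if not shards:
            unsafe.append(name)
    return sorted(unsafe)


# Only consider replacing /clusterstate.json with {} if this list is empty.
legacy_text = '{"coll_a": {}, "coll_b": {}}'
per_collection = {"coll_a": '{"coll_a": {"shards": {"shard1": {}}}}'}
print(collections_unsafe_to_drop(legacy_text, per_collection))
```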

Now, all that said, if performance is still slow after many minutes, it's a bug 
we should fix. The cluster maintenance stuff is steadily improving BTW.

Erick

> Shards with dead replicas cause increased write latency
> -------------------------------------------------------
>
>                 Key: SOLR-12088
>                 URL: https://issues.apache.org/jira/browse/SOLR-12088
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 7.2
>            Reporter: Jerry Bao
>            Priority: Major
>
> If a collection's shard contains dead replicas, write latency to the 
> collection is increased. For example, if a collection has 10 shards with a 
> replication factor of 3, and one of those shards contains 3 replicas and 3 
> downed replicas, write latency is increased in comparison to a shard that 
> contains only 3 replicas.
> My feeling here is that downed replicas should be completely ignored and not 
> cause issues to other alive replicas in terms of write latency.



