[ https://issues.apache.org/jira/browse/SOLR-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410578#comment-16410578 ]
Lucene/Solr QA commented on SOLR-12087:
---------------------------------------

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| -1 | patch | 0m 5s | SOLR-12087 does not apply to master. Rebase required? Wrong Branch? See https://wiki.apache.org/solr/HowToContribute#Creating_the_patch_file for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-12087 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12915636/SOLR-12087.patch |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/12/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.

> Deleting replicas sometimes fails and causes the replicas to exist in the
> down state
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-12087
>                 URL: https://issues.apache.org/jira/browse/SOLR-12087
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: 7.2
>            Reporter: Jerry Bao
>            Assignee: Cao Manh Dat
>            Priority: Critical
>         Attachments: SOLR-12087.patch, SOLR-12087.patch, SOLR-12087.patch,
> SOLR-12087.test.patch, Screen Shot 2018-03-16 at 11.50.32 AM.png
>
>
> Sometimes when deleting replicas, the replica fails to be removed from the
> cluster state. This occurs especially when deleting replicas en masse; the
> result is that the data is deleted but the replicas aren't removed from the
> cluster state. Attempting to delete the downed replicas causes failures
> because the core does not exist anymore.
> This also occurs when trying to move replicas, since a move is an add
> followed by a delete.
> Some more information regarding this issue: when the MOVEREPLICA command is
> issued, the new replica is created successfully, but the replica to be
> deleted fails to be removed from state.json (the core is deleted, though),
> and we see two kinds of log spam:
> # The node containing the leader replica continually (read: every second)
> attempts to initiate recovery on the replica and fails to do so because the
> core does not exist. As a result, it continually publishes a down state for
> the replica to ZooKeeper.
> # The node that hosted the deleted replica repeatedly logs that it cannot
> locate the core, because the core has been deleted.
> During this period, we see an increase in overall ZK network activity, until
> the replica is finally deleted (by repeatedly issuing DELETEREPLICA on the
> shard until it's removed from the state).
> My guess is that there are two issues at hand here:
> # The leader continually attempts to recover a downed replica that is
> unrecoverable because the core does not exist.
> # The replica to be deleted has trouble being removed from state.json
> in ZK.
> This happens fairly consistently in my use case. I'm running 7.2.1 with 66
> nodes.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
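The failure mode described in the report (MOVEREPLICA as an add followed by a delete, where the delete removes the core but leaves its entry in state.json) can be modelled with a small in-memory sketch. Everything here is hypothetical: ReplicaDeleteModel, its two collections, and the method names are illustrative stand-ins, not Solr or SolrJ APIs, and the lost state update is collapsed into a single method for brevity. The tolerantDelete variant shows one possible mitigation (treat a missing core as already deleted and still drop the state entry); it is an assumption, not necessarily what the attached SOLR-12087.patch does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the reported behaviour; none of these
// names are Solr or SolrJ APIs.
class ReplicaDeleteModel {
    enum State { ACTIVE, DOWN }

    // Stand-in for the replica entries recorded in state.json.
    final Map<String, State> clusterState = new HashMap<>();
    // Stand-in for the cores that physically exist on the nodes.
    final Set<String> cores = new HashSet<>();

    void addReplica(String name) {
        cores.add(name);
        clusterState.put(name, State.ACTIVE);
    }

    // Reported behaviour, collapsed into one step: the core and its data are
    // removed, but the state entry survives and ends up published as DOWN.
    void deleteReplicaLosingStateUpdate(String name) {
        cores.remove(name);
        clusterState.put(name, State.DOWN);
    }

    // A delete that insists the core still exists fails while it is missing.
    boolean strictDelete(String name) {
        if (!cores.contains(name)) {
            return false; // "core does not exist"
        }
        cores.remove(name);
        clusterState.remove(name);
        return true;
    }

    // Hypothetical mitigation: treat a missing core as already deleted and
    // still remove the state entry. Returns true if an entry was removed.
    boolean tolerantDelete(String name) {
        cores.remove(name);
        return clusterState.remove(name) != null;
    }

    // MOVEREPLICA modelled as an add followed by a delete, as in the report.
    void moveReplica(String from, String to) {
        addReplica(to);
        deleteReplicaLosingStateUpdate(from);
    }

    public static void main(String[] args) {
        ReplicaDeleteModel m = new ReplicaDeleteModel();
        m.addReplica("core_node1");
        m.moveReplica("core_node1", "core_node2");
        System.out.println(m.clusterState.get("core_node1")); // DOWN
        System.out.println(m.strictDelete("core_node1"));     // false
        System.out.println(m.tolerantDelete("core_node1"));   // true
        System.out.println(m.clusterState.keySet());          // [core_node2]
    }
}
```

In this model a strict delete can never clean up the lingering entry once the core is gone. In the report, repeated DELETEREPLICA calls did eventually remove it, so the real behaviour is less deterministic than the sketch.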
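The first suspected issue, a leader that keeps retrying recovery on a replica whose core no longer exists, can be sketched as a bounded retry loop. RecoveryLoopSketch, recover, and the stopWhenCoreMissing flag are hypothetical names, not Solr's recovery code; the one-second pause between attempts is omitted, and each attempt simply stands for one tick. The guarded variant is an illustrative assumption about how such a loop could stop early, not a description of the actual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the leader-side retry loop described in the report;
// this is not Solr's actual recovery code or API.
class RecoveryLoopSketch {
    // One entry per "publish DOWN" write that would go to ZooKeeper.
    final List<String> publishedStates = new ArrayList<>();

    // Attempts to start recovery on the replica up to maxAttempts times (one
    // attempt stands in for one tick of the once-per-second retry) and returns
    // the number of attempts made. coreExists reports whether the target core
    // is still present; with stopWhenCoreMissing set, the loop gives up as soon
    // as the core is known to be gone instead of publishing DOWN again.
    int recover(String replica, Predicate<String> coreExists,
                int maxAttempts, boolean stopWhenCoreMissing) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            if (coreExists.test(replica)) {
                return attempts; // recovery can proceed
            }
            if (stopWhenCoreMissing) {
                return attempts; // unrecoverable: stop retrying
            }
            // Reported behaviour: mark the replica DOWN and try again.
            publishedStates.add(replica + "=down");
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Core already deleted; 60 attempts stand in for a minute of retries.
        RecoveryLoopSketch naive = new RecoveryLoopSketch();
        naive.recover("core_node1", r -> false, 60, false);
        System.out.println(naive.publishedStates.size());   // 60

        RecoveryLoopSketch guarded = new RecoveryLoopSketch();
        guarded.recover("core_node1", r -> false, 60, true);
        System.out.println(guarded.publishedStates.size()); // 0
    }
}
```

With the core already gone, the unguarded loop records one DOWN publish per attempt, the kind of steady ZooKeeper write traffic the report describes; the guarded loop records none.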