[ https://issues.apache.org/jira/browse/SOLR-12999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730543#comment-16730543 ]
Jason Baik commented on SOLR-12999:
-----------------------------------

I'm from the team that [~mbraun688] mentioned in an earlier comment, which is currently using a fork of the IndexFetcher that implements the change being proposed here (i.e. deleting the unnecessary index files BEFORE copying the master's remote files, to reduce pressure on disk space). I want to chime in to report an edge case that must be handled with care. We learned it the hard way after losing a few shards this week.

Basically, we encountered a situation that [~erickerickson] anticipated:
{quote}Say replication deletes segments, then _for any reason_, the sync fails to complete...
{quote}
In a nutshell:
* We had a replica that had to sync with the leader via full index replication.
* Our fork of the IndexFetcher deleted all segments on the replica prior to initiating index copying.
* Before the sync was complete, the entire Solr cluster was shut down (due to a human mistake).
* When the cluster was brought back up, the replica with all segments deleted connected to ZooKeeper first and initiated org.apache.solr.cloud.ShardLeaderElectionContext#runLeaderProcess() for the shard.
* To our surprise, this replica was elected as the leader because:
** org.apache.solr.update.PeerSync#sync() inspected the tail of the transaction log and found that this replica had all the updates that the other replicas had in their transaction logs.
** This led to PeerSyncResult.success = true, and ShardLeaderElectionContext saw that as a good enough reason to elect this replica as the leader...
* After this point, any replica that synced with this new leader also copied over the wiped index, and we started losing data in all replicas...

It's a rather extreme case caused by an unlucky sequence of events (all replicas of the shard went down at once, and the replica with its segments deleted then initiated the leader process), but it does demonstrate a disastrous scenario: yet another failure in SolrCloud while a replica is in a wiped state can lead to the complete loss of a shard. The implementer of this change should consider adding safety measures that prevent a replica whose segments have been deleted from ever being elected as the leader in the event of a failure that causes another round of leader election.
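To make that suggestion a bit more concrete, here is a minimal, illustrative sketch of the kind of check that could veto leadership for a wiped index. This is not Solr's actual election code; the class and method names are hypothetical, and it only leans on the Lucene API ({{DirectoryReader.indexExists}}), which returns false exactly when the segments file a pre-delete fork would have removed is missing:

{code:java}
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * Illustrative guard (not Solr code): refuse leader candidacy when the local
 * index directory has no commit point, e.g. because a pre-delete replication
 * wiped the segments and the copy never finished.
 */
public class WipedIndexGuard {

  /** Returns true only if the index directory contains a segments_N commit point. */
  public static boolean hasUsableCommit(Path indexPath) throws IOException {
    try (Directory dir = FSDirectory.open(indexPath)) {
      // indexExists() looks for a segments_N file, which is exactly what a
      // delete-before-copy replication removes first.
      return DirectoryReader.indexExists(dir);
    }
  }

  public static void main(String[] args) throws IOException {
    Path indexPath = Paths.get(args[0]);
    if (!hasUsableCommit(indexPath)) {
      // A real fix would veto runLeaderProcess() here rather than trusting
      // the transaction-log comparison done by PeerSync alone.
      System.out.println("No commit point found; this replica should not become leader");
    }
  }
}
{code}

The design point is simply that the transaction log and the index can disagree after a wipe, so PeerSync's log-based verdict alone is not a safe basis for election.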
> Index replication could delete segments first
> ---------------------------------------------
>
> Key: SOLR-12999
> URL: https://issues.apache.org/jira/browse/SOLR-12999
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: replication (java)
> Reporter: David Smiley
> Priority: Major
>
> Index replication could optionally delete files that it knows will not be needed _first_. This would reduce disk capacity requirements of Solr, and it would reduce some disk fragmentation when space gets tight.
> Solr (IndexFetcher) already grabs the remote file list, and it could see which files it has locally, then delete the others. Today it asks Lucene to {{deleteUnusedFiles}} at the end. This new mode would probably only be useful if there is no SolrIndexSearcher open, since it would prevent the removal of files.
> The motivating scenario is a SolrCloud replica that is going into full recovery. It ought not to be fielding searches. The code changes would not depend on SolrCloud though.
> This option would have some danger the user should be aware of. If the replication fails, leaving the local files incomplete/corrupt, the only recourse is to try full replication again. You can't just give up and field queries.
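As a rough illustration of the pre-delete step the description above outlines (compare the local file list against the leader's, delete what the leader does not have, then fetch), the core of the operation is a set difference. The sketch below is hedged: it is not IndexFetcher's actual API, the helper and its inputs are hypothetical, and it ignores details like compound-file handling and open searchers.

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Illustrative pre-delete step (not IndexFetcher's real code): remove local
 * index files that are not part of the remote commit before downloading, so
 * peak disk usage stays close to one copy of the index.
 */
public class PreDeleteSketch {

  public static void deleteFilesNotOnLeader(Path indexDir, Set<String> remoteFileNames)
      throws IOException {
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir)) {
      for (Path local : stream) {
        String name = local.getFileName().toString();
        // Keep anything the leader also has; those files can be reused as-is.
        if (!remoteFileNames.contains(name)) {
          Files.deleteIfExists(local);
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path indexDir = Paths.get(args[0]);
    // Placeholder file names standing in for the leader's reported file list.
    Set<String> remote = new HashSet<>(Arrays.asList("segments_5", "_0.cfs", "_0.cfe", "_0.si"));
    deleteFilesNotOnLeader(indexDir, remote);
  }
}
{code}

As the comment above shows, anything like this needs to be paired with a guard against the wiped replica becoming leader if the subsequent fetch never completes.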