[ https://issues.apache.org/jira/browse/SOLR-12999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730543#comment-16730543 ]
Jason Baik commented on SOLR-12999:
-----------------------------------

I'm from the team that [~mbraun688] mentioned in an earlier comment, which is currently using a fork of the IndexFetcher that implements the change being proposed here (i.e. deleting the unnecessary index files BEFORE copying the master's remote files, to reduce pressure on disk space). I want to chime in to report an edge case that must be handled with care. We learned it the hard way after losing a few shards this week.

Basically, we encountered a situation that [~erickerickson] anticipated:
{quote}Say replication deletes segments, then _for any reason_, the sync fails to complete...
{quote}
In a nutshell:
* We had a replica that had to sync with the leader via full index replication.
* Our fork of the IndexFetcher deleted all segments on the replica prior to initiating index copying.
* Before the sync was complete, the entire Solr cluster was shut down (due to a human mistake).
* When the cluster was brought back up, the replica with all segments deleted connected to ZooKeeper first and initiated org.apache.solr.cloud.ShardLeaderElectionContext#runLeaderProcess() for the shard.
* To our surprise, this replica was elected as the leader because:
** org.apache.solr.update.PeerSync#sync() inspected the tail of the transaction log and found that this replica had all the updates that the other replicas had in their transaction logs.
** This led to PeerSyncResult.success = true, and ShardLeaderElectionContext saw that as a good enough reason to elect this replica as the leader...
* After this point, any replica that synced with this new leader also copied over the wiped index, and we started losing data in all replicas...

It's a rather extreme case caused by an unlucky sequence of events (all replicas of the shard went down at once, and the replica with its segments deleted then initiated the leader process), but it does demonstrate a disastrous scenario: yet another failure in SolrCloud while a replica is in a wiped state can lead to the complete loss of a shard. The implementer of this change should consider adding safety measures that prevent a replica whose segments have been deleted from ever being elected as the leader in the event of a failure that causes another round of leader election.
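To make that suggestion a bit more concrete, here is a minimal, illustrative sketch of the kind of check that could veto leadership for a wiped index. This is not Solr's actual election code; the class and method names are hypothetical, and it only leans on the Lucene API ({{DirectoryReader.indexExists}}), which returns false exactly when the segments file a pre-delete fork would have removed is missing:

{code:java}
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * Illustrative guard (not Solr code): refuse leader candidacy when the local
 * index directory has no commit point, e.g. because a pre-delete replication
 * wiped the segments and the copy never finished.
 */
public class WipedIndexGuard {

  /** Returns true only if the index directory contains a segments_N commit point. */
  public static boolean hasUsableCommit(Path indexPath) throws IOException {
    try (Directory dir = FSDirectory.open(indexPath)) {
      // indexExists() looks for a segments_N file, which is exactly what a
      // delete-before-copy replication removes first.
      return DirectoryReader.indexExists(dir);
    }
  }

  public static void main(String[] args) throws IOException {
    Path indexPath = Paths.get(args[0]);
    if (!hasUsableCommit(indexPath)) {
      // A real fix would veto runLeaderProcess() here rather than trusting
      // the transaction-log comparison done by PeerSync alone.
      System.out.println("No commit point found; this replica should not become leader");
    }
  }
}
{code}

The design point is simply that the transaction log and the index can disagree after a wipe, so PeerSync's log-based verdict alone is not a safe basis for election.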
> Index replication could delete segments first
> ---------------------------------------------
>
> Key: SOLR-12999
> URL: https://issues.apache.org/jira/browse/SOLR-12999
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: replication (java)
> Reporter: David Smiley
> Priority: Major
>
> Index replication could optionally delete files that it knows will not be needed _first_. This would reduce disk capacity requirements of Solr, and it would reduce some disk fragmentation when space gets tight.
> Solr (IndexFetcher) already grabs the remote file list, and it could see which files it has locally, then delete the others. Today it asks Lucene to {{deleteUnusedFiles}} at the end. This new mode would probably only be useful if there is no SolrIndexSearcher open, since it would prevent the removal of files.
> The motivating scenario is a SolrCloud replica that is going into full recovery. It ought not to be fielding searches. The code changes would not depend on SolrCloud though.
> This option would have some danger the user should be aware of. If the replication fails, leaving the local files incomplete/corrupt, the only recourse is to try full replication again. You can't just give up and field queries.
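As a rough illustration of the pre-delete step the description above outlines (compare the local file list against the leader's, delete what the leader does not have, then fetch), the core of the operation is a set difference. The sketch below is hedged: it is not IndexFetcher's actual API, the helper and its inputs are hypothetical, and it ignores details like compound-file handling and open searchers.

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Illustrative pre-delete step (not IndexFetcher's real code): remove local
 * index files that are not part of the remote commit before downloading, so
 * peak disk usage stays close to one copy of the index.
 */
public class PreDeleteSketch {

  public static void deleteFilesNotOnLeader(Path indexDir, Set<String> remoteFileNames)
      throws IOException {
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir)) {
      for (Path local : stream) {
        String name = local.getFileName().toString();
        // Keep anything the leader also has; those files can be reused as-is.
        if (!remoteFileNames.contains(name)) {
          Files.deleteIfExists(local);
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path indexDir = Paths.get(args[0]);
    // Placeholder file names standing in for the leader's reported file list.
    Set<String> remote = new HashSet<>(Arrays.asList("segments_5", "_0.cfs", "_0.cfe", "_0.si"));
    deleteFilesNotOnLeader(indexDir, remote);
  }
}
{code}

As the comment above shows, anything like this needs to be paired with a guard against the wiped replica becoming leader if the subsequent fetch never completes.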