[ https://issues.apache.org/jira/browse/SOLR-12999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730725#comment-16730725 ]
Jason Baik commented on SOLR-12999:
-----------------------------------

[~noble.paul]
{quote}If a full replication is required{quote}
Can you please clarify what you mean by "full replication"? Is it literally the case that 100% of the leader's index needs to be copied ([https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L520]), or is it going to be some configurable, heuristic percentage that's deemed "close to full"?

If it's the former, the usefulness of this change is reduced, because even a tiny overlap between the segments of the leader and the replica will cause Solr not to consider it a "full replication". For example, we often found ourselves in situations where:
* The leader's index size is 100GB.
* The recovering replica has 90GB of disk left.
* The leader and the replica segments have a 1% (1GB) overlap.
* Because of this small overlap, Solr does not consider this a full replication, and the "delete local index and free up disk space" logic doesn't trigger.
* The replica ends up copying 99% (99GB) of the leader's index and runs out of disk space.

We got around this by conditioning the deletion of the local index on a heuristic: if at least N% of the leader's index needs to be copied, trigger the logic. I hope that's the direction you guys are thinking.

> Index replication could delete segments first
> ---------------------------------------------
>
> Key: SOLR-12999
> URL: https://issues.apache.org/jira/browse/SOLR-12999
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: replication (java)
> Reporter: David Smiley
> Priority: Major
>
> Index replication could optionally delete files that it knows will not be
> needed _first_. This would reduce disk capacity requirements of Solr, and it
> would reduce some disk fragmentation when space gets tight.
>
> Solr (IndexFetcher) already grabs the remote file list, and it could see
> which files it has locally, then delete the others. Today it asks Lucene to
> {{deleteUnusedFiles}} at the end. This new mode would probably only be
> useful if there is no SolrIndexSearcher open, since an open searcher would
> prevent the removal of files.
>
> The motivating scenario is a SolrCloud replica that is going into full
> recovery. It ought to not be fielding searches. The code changes would not
> depend on SolrCloud, though.
>
> This option would have some danger the user should be aware of. If the
> replication fails, leaving the local files incomplete/corrupt, the only
> recourse is to try full replication again. You can't just give up and field
> queries.
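The N% heuristic described in the comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the commenter's actual patch and not Solr's code: the class `DeleteFirstHeuristic`, its methods, and the idea of passing in the leader's file sizes plus the set of reusable local file names are all hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "N% of the leader's index must be copied" heuristic.
// Class and method names are illustrative only; they are not part of Solr's IndexFetcher.
final class DeleteFirstHeuristic {

    private DeleteFirstHeuristic() {}

    // Sums the sizes of the leader's files that the replica cannot reuse and must download.
    static long bytesToCopy(Map<String, Long> leaderFileSizes, Set<String> reusableLocalFiles) {
        long toCopy = 0;
        for (Map.Entry<String, Long> e : leaderFileSizes.entrySet()) {
            if (!reusableLocalFiles.contains(e.getKey())) {
                toCopy += e.getValue();
            }
        }
        return toCopy;
    }

    // True when at least thresholdPercent (N) of the leader's index bytes must be copied,
    // i.e. when deleting the local index before downloading would be triggered.
    static boolean shouldDeleteLocalIndexFirst(Map<String, Long> leaderFileSizes,
                                               Set<String> reusableLocalFiles,
                                               int thresholdPercent) {
        long totalBytes = 0;
        for (long size : leaderFileSizes.values()) {
            totalBytes += size;
        }
        if (totalBytes == 0) {
            return false; // empty leader index: nothing to copy
        }
        // Cross-multiplied integer comparison of toCopy / total >= N / 100, avoiding floating point.
        return bytesToCopy(leaderFileSizes, reusableLocalFiles) * 100 >= (long) thresholdPercent * totalBytes;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // The comment's scenario: a 100GB leader index, of which the replica already holds 1GB.
        Map<String, Long> leader = Map.of("_a.cfs", gb, "_b.cfs", 99 * gb);
        Set<String> reusable = Set.of("_a.cfs");
        System.out.println(shouldDeleteLocalIndexFirst(leader, reusable, 100)); // false: only 99% must be copied
        System.out.println(shouldDeleteLocalIndexFirst(leader, reusable, 90));  // true: 99% >= 90%
    }
}
```

With the 100GB / 1GB scenario from the comment, a strict "100% must be copied" rule never fires, while a 90% threshold does. The comparison is done in integer arithmetic so that a value sitting exactly on the threshold is not affected by floating-point rounding.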
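The file-list comparison described in the issue (grab the leader's file list, see which files exist locally, delete the others) might look something like the sketch below. It assumes, for illustration only, that a local file can be reused when the leader lists a file with the same name, length, and checksum; `DeleteFirstPlanner`, `FileMeta`, and `filesToDeleteFirst` are hypothetical names, not part of IndexFetcher.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of the "delete first" step from the issue description: compare the
// leader's file list with the local one and select the local files that cannot be reused.
// FileMeta, DeleteFirstPlanner and filesToDeleteFirst are illustrative names, not Solr APIs.
final class DeleteFirstPlanner {

    // Assumption: a local file is reusable only if the leader lists a file with the same
    // name, length and checksum. Records provide value-based equals() automatically.
    record FileMeta(long length, long checksum) {}

    private DeleteFirstPlanner() {}

    // Returns, sorted by name, the local files the leader does not have or has with
    // different contents; these are candidates for deletion before the download starts.
    static SortedSet<String> filesToDeleteFirst(Map<String, FileMeta> leaderFiles,
                                                Map<String, FileMeta> localFiles) {
        SortedSet<String> toDelete = new TreeSet<>();
        for (Map.Entry<String, FileMeta> local : localFiles.entrySet()) {
            FileMeta onLeader = leaderFiles.get(local.getKey()); // null if the leader lacks it
            if (!local.getValue().equals(onLeader)) {
                toDelete.add(local.getKey());
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        Map<String, FileMeta> leader = Map.of(
                "_0.cfs", new FileMeta(100, 1),
                "_2.cfs", new FileMeta(300, 3),
                "segments_5", new FileMeta(10, 5));
        Map<String, FileMeta> local = Map.of(
                "_0.cfs", new FileMeta(100, 1),     // identical on both sides: keep
                "_1.cfs", new FileMeta(200, 2),     // not on the leader: delete
                "segments_4", new FileMeta(10, 4)); // older commit point: delete
        System.out.println(filesToDeleteFirst(leader, local)); // [_1.cfs, segments_4]
    }
}
```

When the leader's commit generation differs, the result would typically include the local `segments_N` commit point, which is why, as the issue warns, a replication that fails after these deletions leaves no usable index to serve queries from. The sketch only computes the candidates; actually deleting them is left out, and per the issue it would only make sense when no SolrIndexSearcher is open.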