[ https://issues.apache.org/jira/browse/SOLR-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311244#comment-15311244 ]
Lanny Ripple commented on SOLR-7820:
------------------------------------

Experiencing this right now since, as a startup, pinching pennies isn't optional. We're about 70% allocated on disk with 60 or so shards over a dozen or two collections. If any couple of replicas throw a hissy it's not a big deal for Solr to recover. If a node goes down, or in one case the AWS instance starts being flaky, then we fill the disk and get to spend a lot of time babysitting the recovery.

If having Solr sequence recovery to avoid blowing disk isn't a good idea, then please at least expose tooling to make it easier for a human to do the same thing. Even a way to start Solr without immediately trying to sync would be a win. When Solr goes all-in to recover, the Collections API times out on DELETEREPLICA.

> IndexFetcher should calculate ahead of time how much space is needed for full
> snapshot based recovery and cleanly abort instead of trying and running out
> of space on a node
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-7820
>                 URL: https://issues.apache.org/jira/browse/SOLR-7820
>             Project: Solr
>          Issue Type: Improvement
>          Components: replication (java)
>            Reporter: Timothy Potter
>
> When a replica is trying to recover and its IndexFetcher decides it needs to
> pull the full index from a peer (isFullCopyNeeded == true), then the existing
> index directory should be deleted before the full copy is started to free up
> disk to pull a fresh index, otherwise the server will potentially need 2x the
> disk space (old + incoming new). Currently, the IndexFetcher removes the
> index directory after the new is downloaded; however, once the fetcher
> decides a full copy is needed, what is the value of the existing index? It's
> clearly out-of-date and should not serve queries.
>
> Since we're deleting data
> preemptively, maybe this should be an advanced configuration property, only
> to be used by those that are disk-space constrained (which I'm seeing more
> and more with people deploying high-end SSDs - they typically don't have 2x
> the disk capacity required by an index).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
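
To make the idea in the issue description concrete, here is a minimal sketch of the kind of pre-flight check being asked for: compare the size of the incoming index against the free disk space before a full copy starts, and either proceed, delete the stale index first, or abort cleanly. This is not Solr's IndexFetcher code. The class name FullCopyPrecheck, the Decision enum, and the allowPreemptiveDelete flag are all hypothetical names chosen for illustration; only the java.nio.file calls are real JDK APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Solr: decides what to do before a full index copy.
final class FullCopyPrecheck {

    // Hypothetical outcomes of the pre-flight check.
    enum Decision { PROCEED, DELETE_OLD_INDEX_FIRST, ABORT }

    /**
     * Pure decision logic, kept separate from disk access so it is easy to test.
     *
     * @param incomingBytes         total size of the index files to be downloaded
     * @param usableBytes           free space currently available on the target volume
     * @param existingIndexBytes    size of the local (stale) index that could be reclaimed
     * @param allowPreemptiveDelete assumed opt-in setting for disk-constrained deployments
     */
    static Decision decide(long incomingBytes, long usableBytes,
                           long existingIndexBytes, boolean allowPreemptiveDelete) {
        if (incomingBytes <= usableBytes) {
            // Old and new index fit side by side (the "2x" case).
            return Decision.PROCEED;
        }
        // Here incomingBytes > usableBytes, so the subtraction cannot overflow
        // for non-negative inputs.
        long shortfall = incomingBytes - usableBytes;
        if (allowPreemptiveDelete && shortfall <= existingIndexBytes) {
            // Deleting the out-of-date index frees enough room for the new one.
            return Decision.DELETE_OLD_INDEX_FIRST;
        }
        // Not enough room either way: fail before downloading anything.
        return Decision.ABORT;
    }

    // Reads the usable space of the volume that holds the given index directory.
    static long usableBytes(Path indexDir) throws IOException {
        return Files.getFileStore(indexDir).getUsableSpace();
    }
}
```

A caller would obtain incomingBytes from the file list advertised by the peer and existingIndexBytes by summing the local index files; both steps are left out here because they depend on Solr internals that this message does not describe.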