[ https://issues.apache.org/jira/browse/SOLR-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947902#comment-16947902 ]
Yonik Seeley commented on SOLR-13815:
-------------------------------------

Actually, doc_38 and doc_40 don't look the same. When indexing on the new subShard for doc_38, we see update.distrib=FROMLEADER. For doc_40, we see update.distrib=TOLEADER, so for doc_40, it was forwarded to the new leader.

If we look at DistributedZkUpdateProcessor, it looks like a slice is only considered a sub-slice if it is in the CONSTRUCTION or Slice.State.RECOVERY state:

{code}
  protected List<SolrCmdDistributor.Node> getSubShardLeaders(DocCollection coll, String shardId, String docId, SolrInputDocument doc) {
    Collection<Slice> allSlices = coll.getSlices();
    List<SolrCmdDistributor.Node> nodes = null;
    for (Slice aslice : allSlices) {
      final Slice.State state = aslice.getState();
      if (state == Slice.State.CONSTRUCTION || state == Slice.State.RECOVERY) {
{code}

This must introduce the race condition, where the state of the sub-slice was just changed to active (and hence it won't be returned by getSubShardLeaders), but the code to check/forward to the leader has already completed.

I'm not sure what the implications are of removing the state checks. We either need to do that, or somehow close the hole that causes the race condition.

> Live split can lose data
> ------------------------
>
>                 Key: SOLR-13815
>                 URL: https://issues.apache.org/jira/browse/SOLR-13815
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Yonik Seeley
>            Priority: Major
>         Attachments: fail.191004_053129, fail.191004_093307
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue is to investigate potential data loss during a "live" split (i.e.
> split happens while updates are flowing)
> This was discovered during the shared storage work which was based on a
> non-release branch_8x sometime before 8.3, hence the first steps are to try
> and reproduce on the master branch without any shared storage changes.
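
The race described in the comment above can be illustrated with a small, deterministic sketch. This is not Solr code: the class SplitRaceSketch, its State enum, and the helpers route, subShardTargets, deliver, midSplit and completeSplit are all hypothetical names, and the model assumes (without confirmation from the issue) that the parent goes inactive and the sub-slice goes active in one step between the routing decision and the sub-shard lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the race; none of these names are Solr APIs.
class SplitRaceSketch {
    enum State { ACTIVE, CONSTRUCTION, RECOVERY, INACTIVE }

    // Assumed cluster state in the middle of a split: parent active, sub-slice recovering.
    static Map<String, State> midSplit() {
        Map<String, State> slices = new LinkedHashMap<>();
        slices.put("shard1", State.ACTIVE);
        slices.put("shard1_0", State.RECOVERY);
        return slices;
    }

    // Assumed final step of the split: parent retired, sub-slice made active.
    static void completeSplit(Map<String, State> slices) {
        slices.put("shard1", State.INACTIVE);
        slices.put("shard1_0", State.ACTIVE);
    }

    // Step 1 (check/forward to leader): pick the first ACTIVE slice; hash ranges are omitted.
    static String route(Map<String, State> slices) {
        for (Map.Entry<String, State> e : slices.entrySet()) {
            if (e.getValue() == State.ACTIVE) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("no active slice");
    }

    // Step 2: same shape as the quoted state check, so only slices still in
    // CONSTRUCTION or RECOVERY are treated as sub-slices to forward to.
    static List<String> subShardTargets(Map<String, State> slices) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, State> e : slices.entrySet()) {
            State s = e.getValue();
            if (s == State.CONSTRUCTION || s == State.RECOVERY) {
                targets.add(e.getKey());
            }
        }
        return targets;
    }

    // Runs step 1, then whatever happens "in between", then step 2,
    // and returns every slice that ends up receiving the document.
    static List<String> deliver(Map<String, State> slices, Runnable betweenSteps) {
        List<String> receivers = new ArrayList<>();
        receivers.add(route(slices));
        betweenSteps.run();
        receivers.addAll(subShardTargets(slices));
        return receivers;
    }

    public static void main(String[] args) {
        Map<String, State> quiet = midSplit();
        System.out.println("no state change: " + deliver(quiet, () -> { }));

        Map<String, State> racy = midSplit();
        System.out.println("state flips mid-update: " + deliver(racy, () -> completeSplit(racy)));
    }
}
```

In this model, when the state flips between the two steps the document reaches only the parent slice, which is no longer active, and the now-active sub-slice never sees it. Either dropping the state check in step 2 or making both steps read the same state snapshot would change that outcome in the sketch; whether either is safe in Solr itself is the open question raised in the comment.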