[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1601: LUCENE-8962: Ensure we don't include fully deleted segments in a commit

GitBox Wed, 24 Jun 2020 08:45:03 -0700


mikemccand commented on a change in pull request #1601:
URL: https://github.com/apache/lucene-solr/pull/1601#discussion_r444984980




##########
File path: 
lucene/core/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java
##########
@@ -396,10 +396,12 @@ protected void merge(MergePolicy.OneMerge merge) throws 
IOException {
             waitForMerge.await();
             writer.updateDocument(new Term("id", "2"), d2);

Review comment:
       This test fully deleted the 2nd segment (since it only had `d2`, which 
we are deleting here).  Maybe also add a similar case that leaves one not 
deleted document in the segment too?  I think we optimize 100% deleted merge 
source segments too, so this should tickle slightly different path.

##########
File path: 
lucene/core/src/test/org/apache/lucene/index/TestIndexWriterMergePolicy.java
##########
@@ -396,10 +396,12 @@ protected void merge(MergePolicy.OneMerge merge) throws 
IOException {
             waitForMerge.await();
             writer.updateDocument(new Term("id", "2"), d2);
             writer.flush();
-            waitForUpdate.countDown();
           } catch (Exception e) {
             throw new AssertionError(e);
+          } finally {
+            waitForUpdate.countDown();

Review comment:
       Ahh is this so the test does not hang on an (unexpected) exception in 
the `try` block?

##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -3265,24 +3269,40 @@ public void mergeFinished(boolean committed, boolean 
segmentDropped) throws IOEx
                           }
                           // Construct a OneMerge that applies to toCommit
                           MergePolicy.OneMerge applicableMerge = new 
MergePolicy.OneMerge(toCommitMergedAwaySegments);
-                          applicableMerge.info = info.clone();
-                          long segmentCounter = 
Long.parseLong(info.info.name.substring(1), Character.MAX_RADIX);
+                          applicableMerge.info = origInfo;
+                          long segmentCounter = 
Long.parseLong(origInfo.info.name.substring(1), Character.MAX_RADIX);
                           committingSegmentInfos.counter = 
Math.max(committingSegmentInfos.counter, segmentCounter + 1);
                           
committingSegmentInfos.applyMergeChanges(applicableMerge, false);
                         }
                         toWrap.mergeFinished(committed, false);
                         super.mergeFinished(committed, segmentDropped);
                       }
 
+                      @Override
+                      void onMergeCommit() {
+                        origInfo = this.info.clone();

Review comment:
       Whoa!  This is the functional part of the fix?  Because we snapshot the 
`SegmentCommitInfo` *before* carrying over deletes that happened while the 
merge was running?  Very cool.
   
   But, how do we ensure that the prior live docs (without concurrent deletions 
carried over) are written for this merged segment, and matching `origInfo`'s 
`delGen`?  Normally `commitMergedDeletesAndUpdates` would do that, except it is 
carrying over the concurrent deletions, which we need to avoid.  Oh, wait, 
sorry: such deletions are of course merged away!  So, we want 100% live docs in 
the newly merged segment.  Tricky.
   
   We probably also need a test case confirm that (concurrent) DV updates are 
also transactional?  I think this same approach should work for DV updates too? 
 We will keep only the DV updates that got merged away, and any new ones that 
we carry over will not be visible in the merged segment because you are cloning 
the `SegmentCommitInfo` before `commitMerge` runs?

##########
File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
##########
@@ -3907,6 +3927,7 @@ private static void 
carryOverHardDeletes(ReadersAndUpdates mergedReadersAndUpdat
   @SuppressWarnings("try")
   private synchronized boolean commitMerge(MergePolicy.OneMerge merge, 
MergeState mergeState) throws IOException {

Review comment:
       We should really rename this method to not use the word `commit` since 
that already means something (different!) in `IndexWriter`.  Maybe 
`finishMerge`, or `applyMergedSegments` or something.  Naming is the hardest 
part!  We don't have to do that in this PR...

##########
File path: lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
##########
@@ -403,6 +409,22 @@ boolean isDone() {
     Optional<Boolean> hasCompletedSuccessfully() {
       return Optional.ofNullable(mergeCompleted.getNow(null));
     }
+
+    void onMergeCommit() {
+    }
+
+    void setMergeReaders(IOContext mergeContext, ReaderPool readerPool) throws 
IOException {
+      for (final SegmentCommitInfo info : segments) {
+        // Hold onto the "live" reader; we will use this to
+        // commit merged deletes
+        final ReadersAndUpdates rld = readerPool.get(info, true);

Review comment:
       This used to be `getPooledInstance`, but I think this is functionally 
identical.

##########
File path: lucene/core/src/java/org/apache/lucene/index/MergePolicy.java
##########
@@ -403,6 +409,22 @@ boolean isDone() {
     Optional<Boolean> hasCompletedSuccessfully() {
       return Optional.ofNullable(mergeCompleted.getNow(null));
     }
+
+    void onMergeCommit() {
+    }
+
+    void setMergeReaders(IOContext mergeContext, ReaderPool readerPool) throws 
IOException {
+      for (final SegmentCommitInfo info : segments) {

Review comment:
       Maybe `assert` that `readers` is empty at the start here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1601: LUCENE-8962: Ensure we don't include fully deleted segments in a commit

Reply via email to