Hi lucene dev,

In the Lucene merge logic, I see the following code
(https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L5075):


merge.initMergeReaders(
    sci -> {
      final ReadersAndUpdates rld = getPooledInstance(sci, true);
      rld.setIsMerging();
      return rld.getReaderForMerge(context);
    });

It looks like, if ReadersAndUpdates#addDVUpdate is invoked by another thread
between rld.setIsMerging() and rld.getReaderForMerge(context), mergingDVUpdates
in ReadersAndUpdates could end up with a duplicated del gen for the same field.
The sequence is as follows:

Merge thread:                          Another thread:
1. rld.setIsMerging()
                                       2. rld.addDVUpdate(update)
                                          places the update in both
                                          pendingDVUpdates and mergingDVUpdates
3. rld.getReaderForMerge(context)
   carries over all pendingDVUpdates
   to mergingDVUpdates
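
To make the interleaving concrete, here is a minimal single-threaded replay of 
the three steps. ToyRld is a hypothetical stand-in that only models the 
behavior described in this mail (two maps and a flag); it is not the real 
ReadersAndUpdates, and the field name "price" and del gen 7 are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DvUpdateRaceSketch {

  /** Hypothetical toy model of ReadersAndUpdates; NOT the real Lucene class. */
  static final class ToyRld {
    private final Map<String, List<Long>> pendingDVUpdates = new HashMap<>();
    private final Map<String, List<Long>> mergingDVUpdates = new HashMap<>();
    private boolean isMerging;

    synchronized void setIsMerging() {
      isMerging = true;
    }

    /** Records an update; while merging, it is also recorded for the merge. */
    synchronized void addDVUpdate(String field, long delGen) {
      pendingDVUpdates.computeIfAbsent(field, f -> new ArrayList<>()).add(delGen);
      if (isMerging) {
        mergingDVUpdates.computeIfAbsent(field, f -> new ArrayList<>()).add(delGen);
      }
    }

    /** Carries every still-pending update over to the merging map. */
    synchronized void getReaderForMerge() {
      for (Map.Entry<String, List<Long>> e : pendingDVUpdates.entrySet()) {
        mergingDVUpdates
            .computeIfAbsent(e.getKey(), f -> new ArrayList<>())
            .addAll(e.getValue());
      }
    }

    synchronized List<Long> mergingDelGens(String field) {
      return new ArrayList<>(mergingDVUpdates.getOrDefault(field, List.of()));
    }
  }

  /** Replays steps 1, 2, 3 in the order given in the timeline. */
  static List<Long> replayInterleaving() {
    ToyRld rld = new ToyRld();
    rld.setIsMerging();            // 1. merge thread
    rld.addDVUpdate("price", 7L);  // 2. another thread sneaks in here
    rld.getReaderForMerge();       // 3. merge thread carries pending over
    return rld.mergingDelGens("price");
  }

  public static void main(String[] args) {
    System.out.println(replayInterleaving()); // prints [7, 7]
  }
}
```

In this toy model the same del gen ends up twice in mergingDVUpdates for the 
field, which is the duplication I am worried about.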

Would it make more sense to invoke rld.setIsMerging() and
rld.getReaderForMerge(context) atomically? Doing so would avoid the issue
above.
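
One way this could look, sketched on a toy model: the caller holds the 
monitor of rld across both calls, so a concurrent addDVUpdate has to wait 
until the carry-over is done. ToyRld is again a hypothetical stand-in, not the 
real ReadersAndUpdates, and I am assuming here that the relevant methods 
synchronize on the instance itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AtomicMergeInitSketch {

  /** Hypothetical toy model of ReadersAndUpdates; NOT the real Lucene class. */
  static final class ToyRld {
    private final Map<String, List<Long>> pendingDVUpdates = new HashMap<>();
    private final Map<String, List<Long>> mergingDVUpdates = new HashMap<>();
    private boolean isMerging;

    synchronized void setIsMerging() {
      isMerging = true;
    }

    synchronized void addDVUpdate(String field, long delGen) {
      pendingDVUpdates.computeIfAbsent(field, f -> new ArrayList<>()).add(delGen);
      if (isMerging) {
        mergingDVUpdates.computeIfAbsent(field, f -> new ArrayList<>()).add(delGen);
      }
    }

    synchronized void getReaderForMerge() {
      for (Map.Entry<String, List<Long>> e : pendingDVUpdates.entrySet()) {
        mergingDVUpdates
            .computeIfAbsent(e.getKey(), f -> new ArrayList<>())
            .addAll(e.getValue());
      }
    }

    synchronized List<Long> mergingDelGens(String field) {
      return new ArrayList<>(mergingDVUpdates.getOrDefault(field, List.of()));
    }
  }

  /** Runs both merge-init steps under one lock while an updater thread competes. */
  static List<Long> mergeWithAtomicInit() {
    ToyRld rld = new ToyRld();
    Thread updater = new Thread(() -> rld.addDVUpdate("price", 7L));
    synchronized (rld) {       // Java monitors are reentrant, so the calls below still work
      rld.setIsMerging();
      updater.start();         // updater cannot enter addDVUpdate until we leave this block
      rld.getReaderForMerge(); // carry-over runs before the update can be recorded
    }
    try {
      updater.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return rld.mergingDelGens("price");
  }

  public static void main(String[] args) {
    System.out.println(mergeWithAtomicInit()); // prints [7]
  }
}
```

With both calls under the same lock, the update lands either entirely before 
step 1 (only in pendingDVUpdates, carried over once) or entirely after step 3 
(added to mergingDVUpdates once), so the del gen is not duplicated in this 
model.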

Please correct me if I am missing something.

Thanks,
Will
