Hi Lucene devs,
In the Lucene merge logic, I see the following code
(https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L5075):
    merge.initMergeReaders(
        sci -> {
          final ReadersAndUpdates rld = getPooledInstance(sci, true);
          rld.setIsMerging();
          return rld.getReaderForMerge(context);
        });
It looks like, if ReadersAndUpdates#addDVUpdate is invoked by another thread
between rld.setIsMerging() and rld.getReaderForMerge(context), mergingDVUpdates
in ReadersAndUpdates could end up with a duplicated del gen for the same field.
It happens as follows:
Merge thread                          Another thread
------------                          --------------
1. rld.setIsMerging()
                                      2. rld.addDVUpdate(update)
                                         places the update in both
                                         pendingDVUpdates and mergingDVUpdates
3. rld.getReaderForMerge(context)
   carries over all pendingDVUpdates
   to mergingDVUpdates
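
To make the interleaving concrete, here is a small toy model that replays the three steps in order on a single thread. It is only a sketch based on the behavior described in this mail, not the real ReadersAndUpdates: the class name DvUpdateRaceSketch, the use of plain Long del gens, and getReaderForMerge() returning the merging list are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the state described in this mail; NOT the real ReadersAndUpdates.
class DvUpdateRaceSketch {
  private boolean isMerging;
  private final List<Long> pendingDVUpdates = new ArrayList<>();
  private final List<Long> mergingDVUpdates = new ArrayList<>();

  synchronized void setIsMerging() {
    isMerging = true;
  }

  // While a merge is flagged, the update is recorded in both lists.
  synchronized void addDVUpdate(long delGen) {
    pendingDVUpdates.add(delGen);
    if (isMerging) {
      mergingDVUpdates.add(delGen);
    }
  }

  // Carries every pending update over to the merging list and returns a copy
  // (the real method returns a reader; a list is used here for illustration).
  synchronized List<Long> getReaderForMerge() {
    mergingDVUpdates.addAll(pendingDVUpdates);
    return new ArrayList<>(mergingDVUpdates);
  }

  // Replays steps 1-3 in order on one thread, so the outcome is deterministic.
  static List<Long> replayInterleaving() {
    DvUpdateRaceSketch rld = new DvUpdateRaceSketch();
    rld.setIsMerging();              // step 1, merge thread
    rld.addDVUpdate(7L);             // step 2, another thread
    return rld.getReaderForMerge();  // step 3, merge thread
  }

  public static void main(String[] args) {
    System.out.println(replayInterleaving()); // prints [7, 7]
  }
}
```

In this model each method is individually synchronized, yet the same del gen still lands in mergingDVUpdates twice, because nothing prevents step 2 from running between steps 1 and 3.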
Would it make more sense to invoke rld.setIsMerging() and
rld.getReaderForMerge(context) atomically? Doing so would avoid the issue
above.
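
A rough sketch of what "atomically" could mean, again on a toy model and not on the real ReadersAndUpdates: the flag is set and the pending updates are carried over under a single lock acquisition, so an update is either carried over from pendingDVUpdates or added directly to mergingDVUpdates, never both. The class AtomicMergeStartSketch and the combined method setIsMergingAndGetReaderForMerge() are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; names and structure are assumptions, not Lucene's code.
class AtomicMergeStartSketch {
  private boolean isMerging;
  private final List<Long> pendingDVUpdates = new ArrayList<>();
  private final List<Long> mergingDVUpdates = new ArrayList<>();

  // While a merge is flagged, the update is recorded in both lists.
  synchronized void addDVUpdate(long delGen) {
    pendingDVUpdates.add(delGen);
    if (isMerging) {
      mergingDVUpdates.add(delGen);
    }
  }

  // Hypothetical combined step: flag the merge and carry over pending
  // updates while holding the same monitor that addDVUpdate uses.
  synchronized List<Long> setIsMergingAndGetReaderForMerge() {
    isMerging = true;
    mergingDVUpdates.addAll(pendingDVUpdates);
    return new ArrayList<>(mergingDVUpdates);
  }

  synchronized List<Long> mergingSnapshot() {
    return new ArrayList<>(mergingDVUpdates);
  }

  // The update arrives before the merge starts: carried over once.
  static List<Long> updateBeforeMergeStart() {
    AtomicMergeStartSketch rld = new AtomicMergeStartSketch();
    rld.addDVUpdate(7L);
    return rld.setIsMergingAndGetReaderForMerge();
  }

  // The update arrives after the merge starts: added directly once.
  static List<Long> updateAfterMergeStart() {
    AtomicMergeStartSketch rld = new AtomicMergeStartSketch();
    rld.setIsMergingAndGetReaderForMerge();
    rld.addDVUpdate(7L);
    return rld.mergingSnapshot();
  }

  public static void main(String[] args) {
    System.out.println(updateBeforeMergeStart()); // prints [7]
    System.out.println(updateAfterMergeStart());  // prints [7]
  }
}
```

If the existing methods synchronize on the ReadersAndUpdates instance (an assumption I have not verified here), a similar effect might be achievable at the call site by wrapping both calls in a synchronized (rld) block instead of adding a new method.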
Please correct me if I am missing something.
Thanks,
Will