salvatorecampagna commented on issue #15761:
URL: https://github.com/apache/lucene/issues/15761#issuecomment-3951519625

   After running the test multiple times, I see it fail intermittently with the following:
   
   ```
   java.lang.AssertionError
       at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4935)
       at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4921)
       at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4735)
       at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6545)
       at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:661)
       at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:726)
   ```
   
   The failing assertion is at [`IndexWriter._mergeInit`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L4935):
   
   ```java
   private synchronized void _mergeInit(MergePolicy.OneMerge merge) throws IOException {
       testPoint("startMergeInit");
       assert merge.registerDone;  // <-- fails here
       ...
   }
   ```
   
   This looks like a race condition: a thread interrupt causes the writer to call `mergeFinish()` (which sets `registerDone = false`) before the merge thread gets to call `_mergeInit()`. So this seems to be a flaky test whose outcome depends on how the concurrent threads happen to be scheduled.
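To make the suspected interleaving concrete, here is a toy model in plain Java. `ToyWriter`, `ToyMerge` and `RaceDemo` are hypothetical names, not Lucene classes. The model assumes only the sequence described above: the interrupt path clears the flag via `mergeFinish()` before the merge thread reaches `_mergeInit()`. A `CountDownLatch` forces that ordering so the outcome is deterministic rather than intermittent.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical toy model of the suspected race; this is NOT Lucene's IndexWriter code.
class ToyMerge {
  // Stands in for OneMerge.registerDone; only accessed inside ToyWriter's synchronized methods.
  boolean registerDone;
}

class ToyWriter {
  // Stands in for merge registration: marks the merge as registered.
  synchronized void registerMerge(ToyMerge merge) {
    merge.registerDone = true;
  }

  // Stands in for mergeFinish() on the interrupt path: clears the flag.
  synchronized void mergeFinish(ToyMerge merge) {
    merge.registerDone = false;
  }

  // Stands in for _mergeInit(): returns the flag that `assert merge.registerDone` checks.
  synchronized boolean mergeInit(ToyMerge merge) {
    return merge.registerDone;
  }
}

class RaceDemo {
  // Forces the suspected interleaving: mergeFinish() runs before mergeInit().
  static boolean registerDoneSeenByMergeInit() {
    ToyWriter writer = new ToyWriter();
    ToyMerge merge = new ToyMerge();
    writer.registerMerge(merge);

    CountDownLatch abortDone = new CountDownLatch(1);
    boolean[] observed = new boolean[1];

    Thread mergeThread = new Thread(() -> {
      try {
        abortDone.await(); // merge thread is delayed until the interrupt path has run
        observed[0] = writer.mergeInit(merge);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    mergeThread.start();

    writer.mergeFinish(merge); // the interrupt path wins the race
    abortDone.countDown();
    try {
      mergeThread.join(); // join() makes observed[0] visible to this thread
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return observed[0];
  }

  public static void main(String[] args) {
    // Prints false: the state in which `assert merge.registerDone` would fail.
    System.out.println("registerDone seen by mergeInit: " + registerDoneSeenByMergeInit());
  }
}
```

In the real test, nothing forces this ordering, which would explain why the failure only shows up on some runs.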
   
   **Note:** I had to modify `SuppressingConcurrentMergeScheduler` locally to see the `AssertionError`. Otherwise, the original exception is lost because `null` is passed to `super.handleMergeException()`:
   
   ```java
   protected void handleMergeException(Throwable exc) {
       while (true) {
         if (isOK(exc)) {
           return;
         }
         exc = exc.getCause();
         if (exc == null) {
           super.handleMergeException(exc); // passes null, so the original exception is lost
         }
       }
   }
   ```
   
   The CI output shows a `MergeException` with no cause. This likely explains the confusion around the "Inconsistency of field data structures" message. That message is actually expected output (see `testDocValuesMixedSkippingIndex`) and is not the cause of the merge failure.
   
   At a minimum, `SuppressingConcurrentMergeScheduler` should propagate the original exception instead of `null`. I will also investigate whether there is a way to fix the underlying race condition.
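Here is a minimal sketch of what propagating the original exception could look like. The cause chain is walked with a separate loop variable, so `exc` still holds the original failure when it reaches the superclass. To keep the sketch self-contained, `ToyScheduler` and `ToyMergeException` are hypothetical stand-ins for `ConcurrentMergeScheduler` and its merge exception. The `isOK` rule in `HandlerDemo` (treating `IOException` as expected) is also an assumption made for illustration. The actual fix in Lucene may look different.

```java
import java.io.IOException;

// Hypothetical stand-in for the merge exception: unchecked and carries a cause.
class ToyMergeException extends RuntimeException {
  ToyMergeException(Throwable cause) {
    super(cause);
  }
}

// Hypothetical stand-in for ConcurrentMergeScheduler: wraps whatever it is given,
// so passing null yields an exception with no cause.
class ToyScheduler {
  protected void handleMergeException(Throwable exc) {
    throw new ToyMergeException(exc);
  }
}

// Sketch of the proposed fix: walk the cause chain with a separate variable
// so the original exception is what reaches super.handleMergeException().
abstract class ToySuppressingScheduler extends ToyScheduler {
  @Override
  protected void handleMergeException(Throwable exc) {
    for (Throwable t = exc; t != null; t = t.getCause()) {
      if (isOK(t)) {
        return; // an expected failure somewhere in the chain: suppress it
      }
    }
    super.handleMergeException(exc); // propagate the original, not null
  }

  protected abstract boolean isOK(Throwable t);
}

class HandlerDemo {
  // Returns the cause attached to the rethrown exception, or null if the failure was suppressed.
  static Throwable causeAfterHandling(Throwable original) {
    ToyScheduler scheduler = new ToySuppressingScheduler() {
      @Override
      protected boolean isOK(Throwable t) {
        return t instanceof IOException; // hypothetical "expected" failure type
      }
    };
    try {
      scheduler.handleMergeException(original);
      return null;
    } catch (ToyMergeException e) {
      return e.getCause();
    }
  }
}
```

With this shape, an `AssertionError` like the one thrown from `_mergeInit` would surface as the cause of the rethrown exception instead of being replaced by `null`.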
   
   

