salvatorecampagna commented on issue #15761:
URL: https://github.com/apache/lucene/issues/15761#issuecomment-3951519625
After running the test multiple times, I see it fails intermittently with
this:
```
java.lang.AssertionError
at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4935)
at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4921)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4735)
at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6545)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:661)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:726)
```
The failing assertion is at
[`IndexWriter._mergeInit`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L4935):
```java
private synchronized void _mergeInit(MergePolicy.OneMerge merge) throws IOException {
  testPoint("startMergeInit");
  assert merge.registerDone; // <-- fails here
  ...
}
```
This looks like a race condition: a thread interrupt causes the writer to
call `mergeFinish()` (which sets `registerDone = false`) before the merge
thread reaches `_mergeInit()`, so the assertion there fails. The test
therefore appears to be flaky, with the failure depending on thread timing.
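
To make the suspected ordering concrete, here is a minimal toy model, not Lucene code. `ToyMerge`, `ToyWriter`, and the method names are hypothetical stand-ins, and it assumes that registering a merge is what sets the flag to `true`. It replays the interleaving on a single thread to show that per-method `synchronized` does not help if `mergeFinish()` simply runs first:

```java
public class RegisterDoneRaceSketch {

  /** Toy stand-in for MergePolicy.OneMerge; only the flag matters here. */
  static class ToyMerge {
    boolean registerDone;
  }

  /** Toy stand-in for the writer; every method holds the same monitor. */
  static class ToyWriter {
    synchronized void registerMerge(ToyMerge merge) {
      merge.registerDone = true; // assumption: registration sets the flag
    }

    synchronized void mergeFinish(ToyMerge merge) {
      merge.registerDone = false; // as described in the comment above
    }

    /** Returns the value that the failing assertion would check. */
    synchronized boolean mergeInitPrecondition(ToyMerge merge) {
      return merge.registerDone;
    }
  }

  /** Expected ordering: register, then init. */
  static boolean replayNormalOrdering() {
    ToyWriter writer = new ToyWriter();
    ToyMerge merge = new ToyMerge();
    writer.registerMerge(merge);
    return writer.mergeInitPrecondition(merge);
  }

  /** Suspected ordering: register, finish (interrupt path), then init. */
  static boolean replaySuspectedInterleaving() {
    ToyWriter writer = new ToyWriter();
    ToyMerge merge = new ToyMerge();
    writer.registerMerge(merge);
    writer.mergeFinish(merge);
    return writer.mergeInitPrecondition(merge);
  }

  public static void main(String[] args) {
    System.out.println("normal ordering holds: " + replayNormalOrdering());
    System.out.println("suspected interleaving holds: " + replaySuspectedInterleaving());
  }
}
```

Each method is atomic on its own, but nothing in this model orders `mergeFinish()` after the init check, which is the gap I suspect in the real code path.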
**Note:** I had to change `SuppressingConcurrentMergeScheduler` locally to
see the AssertionError. The original exception is lost as `null` is passed to
`super.handleMergeException()`:
```java
protected void handleMergeException(Throwable exc) {
  while (true) {
    if (isOK(exc)) {
      return;
    }
    exc = exc.getCause();
    if (exc == null) {
      super.handleMergeException(exc); // passes null, so the original exception is lost
    }
  }
}
```
The CI output shows a `MergeException` with no cause, which likely explains
the confusion with the "Inconsistency of field data structures" message. That
message is actually expected output (see `testDocValuesMixedSkippingIndex`) and
is not the cause of the merge failure.
At a minimum, `SuppressingConcurrentMergeScheduler` should propagate the
original exception instead of `null`. I will investigate if there is a way to
fix the underlying race condition as well.
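
A minimal sketch of what propagating the original exception could look like, written as a standalone toy rather than a patch. `CauseUnwrapSketch`, `unwrap`, and this `isOK` are hypothetical; the real `isOK` check lives in the test scheduler and is not reproduced here. The idea is to walk the cause chain with a separate variable so that the outermost throwable is still available to report:

```java
public class CauseUnwrapSketch {

  /** Hypothetical stand-in for the scheduler's isOK(Throwable) check. */
  static boolean isOK(Throwable t) {
    return t instanceof IllegalStateException;
  }

  /**
   * Returns null when some throwable in the cause chain is acceptable,
   * otherwise returns the original (outermost) throwable so the caller
   * can report it with its full cause chain intact.
   */
  static Throwable unwrap(Throwable original) {
    // Walk the chain with a separate cursor; 'original' is never overwritten.
    for (Throwable t = original; t != null; t = t.getCause()) {
      if (isOK(t)) {
        return null; // an expected failure, suppress it
      }
    }
    return original; // nothing acceptable found, report the original
  }

  public static void main(String[] args) {
    Throwable unexpected =
        new RuntimeException("merge failed", new AssertionError("registerDone"));
    Throwable expected =
        new RuntimeException("merge failed", new IllegalStateException("expected"));
    System.out.println("unexpected is reported: " + (unwrap(unexpected) == unexpected));
    System.out.println("expected is suppressed: " + (unwrap(expected) == null));
  }
}
```

In the scheduler, the non-null result would be what gets passed to `super.handleMergeException()`. The loop also terminates once the chain is exhausted, instead of continuing with a `null` reference.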