mikemccand commented on code in PR #12530:
URL: https://github.com/apache/lucene/pull/12530#discussion_r1388366133
##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -610,6 +610,31 @@ public Status checkIndex(List<String> onlySegments, ExecutorService executorServ
return result;
}
+ // https://github.com/apache/lucene/issues/7820: also attempt to open any older commit points (segments_N), which will catch certain
+ // corruption like missing _N.si files for segments not also referenced by the newest commit point (which was already loaded,
Review Comment:
> @mikemccand Just to clarify this comment - I was using @buzztaiki's
> [original test case](https://github.com/apache/lucene/issues/7009#issuecomment-1223544484)
> with slight modifications to test this:

Hmm, I'm confused -- doesn't `parseSegmentInfos` read a single `segments_N`
file? It goes through that segments file and reads each separate
`SegmentInfo`, but not the other `segments_(N-1)` files in the index?

I thought the issue here was that `segments_N` (and all the separate segments /
`.si` files it references) is intact, but `segments_(N-1)` is broken because
it references a segment whose `.si` file is missing?
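
To make sure we are talking about the same scenario, here is a rough toy model of it. This is not Lucene code: `CommitPointModel`, `brokenCommits`, and the in-memory maps are made up for illustration, and real code would have to parse each `segments_N` file to learn which segments it references.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only (hypothetical names): each commit point maps to the segment
// names it references, and filesOnDisk stands in for the index directory listing.
class CommitPointModel {

  /** Returns the commit points that reference a segment whose .si file is absent. */
  static List<String> brokenCommits(Map<String, List<String>> commits, Set<String> filesOnDisk) {
    List<String> broken = new ArrayList<>();
    // TreeMap gives a deterministic (lexicographic) iteration order for the report.
    for (Map.Entry<String, List<String>> commit : new TreeMap<>(commits).entrySet()) {
      for (String segment : commit.getValue()) {
        if (!filesOnDisk.contains(segment + ".si")) {
          broken.add(commit.getKey());
          break; // one missing .si is enough to mark this commit point broken
        }
      }
    }
    return broken;
  }

  public static void main(String[] args) {
    Map<String, List<String>> commits = Map.of(
        "segments_1", List.of("_0", "_1"),   // older commit point
        "segments_2", List.of("_1", "_2"));  // newest commit point
    Set<String> filesOnDisk = Set.of("_1.si", "_2.si"); // _0.si is missing
    System.out.println(brokenCommits(commits, filesOnDisk)); // prints [segments_1]
  }
}
```

In this model, loading only `segments_2` succeeds, and the missing `_0.si` shows up only once the older `segments_1` is also opened.
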
> I would like to try to make missing .si files behave the same way as
> having missing .cfs do currently and make it possible to use -exorcise for this
> case

Maybe we could change the exception thrown when a `.si` file cannot be found to
be a subclass of `CorruptIndexException`, and add a member, e.g.
`set/getAffectedSegment`, that would tell us which segment the `.si` file
belonged to? `CheckIndex` could then catch that and do its `exorcise` thing?
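
Roughly what I have in mind, as a standalone sketch rather than a patch. All the names here are hypothetical: `CorruptIndexExceptionStandIn` stands in for Lucene's `CorruptIndexException` only so that the snippet compiles without Lucene on the classpath, and `loadSegmentInfo` / `segmentsToDrop` are placeholders for wherever the `.si` file is actually read and wherever `CheckIndex` would decide what to drop. I set the segment name in the constructor instead of adding a setter.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Stand-in for org.apache.lucene.index.CorruptIndexException (assumption: used
// here only so the sketch is self-contained).
class CorruptIndexExceptionStandIn extends IOException {
  CorruptIndexExceptionStandIn(String message) {
    super(message);
  }
}

// Hypothetical subclass that remembers which segment lost its .si file.
class MissingSegmentInfoException extends CorruptIndexExceptionStandIn {
  private final String affectedSegment;

  MissingSegmentInfoException(String affectedSegment) {
    super("missing .si file for segment " + affectedSegment);
    this.affectedSegment = affectedSegment;
  }

  String getAffectedSegment() {
    return affectedSegment;
  }
}

class ExorciseSketch {

  // Hypothetical loader: fails with the typed exception when the .si file is absent.
  static void loadSegmentInfo(String segment, Set<String> filesOnDisk)
      throws MissingSegmentInfoException {
    if (!filesOnDisk.contains(segment + ".si")) {
      throw new MissingSegmentInfoException(segment);
    }
  }

  // The caller catches the typed exception and collects the segments to drop,
  // instead of aborting on the first failure.
  static List<String> segmentsToDrop(List<String> segments, Set<String> filesOnDisk) {
    List<String> toDrop = new ArrayList<>();
    for (String segment : segments) {
      try {
        loadSegmentInfo(segment, filesOnDisk);
      } catch (MissingSegmentInfoException e) {
        toDrop.add(e.getAffectedSegment());
      }
    }
    return toDrop;
  }
}
```

The point of the subclass is that the caller no longer has to parse the exception message to find out which segment is affected.
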