gokaai commented on code in PR #12872:
URL: https://github.com/apache/lucene/pull/12872#discussion_r1419454517
##########
lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java:
##########
@@ -389,13 +389,39 @@ private static void parseSegmentInfos(
}
long totalDocs = 0;
+ SegmentInfo info;
for (int seg = 0; seg < numSegments; seg++) {
String segName = input.readString();
byte[] segmentID = new byte[StringHelper.ID_LENGTH];
input.readBytes(segmentID, 0, segmentID.length);
Codec codec = readCodec(input);
- SegmentInfo info =
- codec.segmentInfoFormat().read(directory, segName, segmentID,
IOContext.READ);
+ try {
+ info = codec.segmentInfoFormat().read(directory, segName, segmentID,
IOContext.READ);
+ } catch (FileNotFoundException | NoSuchFileException e) {
+ // https://github.com/apache/lucene/issues/7820 - If we throw a
CorruptIndex exception here,
+ // a missing segment info will cause parseSegmentInfos() to
short-circuit and stop reading
+ // the rest of the segments. So, when encountering a segment with a
missing
+ // segment info (.si) file, we initialize an empty segment info
(containing 0 docs)
+ // to keep track of the segment name, so that we can continue reading
other segments, and
+ // the segment with the missing .si file can be later validated or
exorcised by CheckIndex()
+ info =
Review Comment:
Thanks for reviewing @mikemccand ! What I find tricky here are two things -
- Throwing an exception in `CheckIndex` short-circuits the flow that makes
it possible for `-exorcise` to do it's thing
- Since we don't do a deeper level check on segments which aren't the
latest segment, the recently added unit test where we expect prior commits to
throw an exception if corrupt, does not pass
I'll make a new revision trying to address both of these things + your
comments
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]