jpountz commented on code in PR #12685:
URL: https://github.com/apache/lucene/pull/12685#discussion_r1361188738
##########
lucene/core/src/java/org/apache/lucene/index/SegmentInfo.java:
##########
@@ -153,6 +157,16 @@ public boolean getUseCompoundFile() {
return isCompoundFile;
}
+ /** Returns true if this segment contains documents written as blocks. */
Review Comment:
Add a link to `addDocuments` and `updateDocuments`? I wonder if this should
be a bit more specific, e.g. "as blocks of 2 docs or more" to clarify that
calling `addDocuments` with a single document doesn't count.
##########
lucene/core/src/test/org/apache/lucene/index/TestAddIndexes.java:
##########
@@ -1815,4 +1815,71 @@ public void testAddIndicesWithSoftDeletes() throws IOException {
assertEquals(wrappedReader.numDocs(), writer.getDocStats().maxDoc);
IOUtils.close(reader, writer, dir3, dir2, dir1);
}
+
+ public void testAddIndicesWithBlocks() throws IOException {
+ boolean addHasBlocks = random().nextBoolean();
+ boolean baseHasBlocks = rarely();
Review Comment:
All these cases look worth testing every time instead of randomly picking a
single combination?
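For instance, the test body could run once per combination. A rough sketch of the enumeration only, where `BlockCombinations` and `Scenario` are hypothetical names that exist neither in Lucene nor in this PR:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: enumerates every
// (addHasBlocks, baseHasBlocks) pair so that each one is exercised on every run.
final class BlockCombinations {

  /** One scenario: whether the added index and the base index contain blocks. */
  static final class Scenario {
    final boolean addHasBlocks;
    final boolean baseHasBlocks;

    Scenario(boolean addHasBlocks, boolean baseHasBlocks) {
      this.addHasBlocks = addHasBlocks;
      this.baseHasBlocks = baseHasBlocks;
    }
  }

  /** Returns all four combinations in a fixed, deterministic order. */
  static List<Scenario> all() {
    List<Scenario> scenarios = new ArrayList<>();
    for (boolean add : new boolean[] {false, true}) {
      for (boolean base : new boolean[] {false, true}) {
        scenarios.add(new Scenario(add, base));
      }
    }
    return scenarios;
  }

  public static void main(String[] args) {
    for (Scenario s : all()) {
      // In the real test, the body of testAddIndicesWithBlocks would run here.
      System.out.println(
          "addHasBlocks=" + s.addHasBlocks + " baseHasBlocks=" + s.baseHasBlocks);
    }
  }
}
```

Two nested loops directly in the test method would work just as well; the point is only that no combination is left to chance.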
##########
lucene/core/src/java/org/apache/lucene/index/SegmentInfo.java:
##########
@@ -153,6 +157,16 @@ public boolean getUseCompoundFile() {
return isCompoundFile;
}
+ /** Returns true if this segment contains documents written as blocks. */
Review Comment:
Maybe also mention that this started being recorded in 9.9 and that indexes
created earlier than that will return `false` regardless?
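Something along these lines, as a wording sketch only. The class below is a hypothetical stand-in for `SegmentInfo` so that the snippet compiles on its own; the `{@link}` targets would only resolve inside Lucene itself:

```java
// Hypothetical stand-in for SegmentInfo; only the Javadoc wording matters here.
final class SegmentInfoDocSketch {
  private final boolean hasBlocks;

  SegmentInfoDocSketch(boolean hasBlocks) {
    this.hasBlocks = hasBlocks;
  }

  /**
   * Returns true if this segment contains documents written as blocks of 2 docs or more,
   * i.e. through {@link IndexWriter#addDocuments} or {@link IndexWriter#updateDocuments}.
   * Calling either method with a single document does not count. This information started
   * being recorded in 9.9; indexes created earlier than that return {@code false} regardless.
   */
  boolean getHasBlocks() {
    return hasBlocks;
  }
}
```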
##########
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##########
@@ -3368,9 +3368,15 @@ public void addIndexesReaderMerge(MergePolicy.OneMerge merge) throws IOException
String mergedName = newSegmentName();
Directory mergeDirectory = mergeScheduler.wrapForMerge(merge, directory);
int numSoftDeleted = 0;
+ boolean hasBlocks = false;
for (MergePolicy.MergeReader reader : merge.getMergeReader()) {
CodecReader leaf = reader.codecReader;
numDocs += leaf.numDocs();
+ if (reader.reader == null) {
+ hasBlocks = true; // NOCOMMIT: can we just assume that it has blocks and go with worst case here?
Review Comment:
Maybe we could expose `getHasBlocks` in `LeafMetaData` to be able to get this
information from a `CodecReader`?
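A minimal sketch of the idea, using a hypothetical `LeafMeta` class as a stand-in for `LeafMetaData` (the accessor and the OR-ing of per-leaf flags are assumptions, not the current API): the merged segment would report blocks only if at least one incoming leaf does, instead of always assuming the worst case.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Lucene code: shows how a per-leaf flag could
// replace the unconditional "hasBlocks = true" in the merge loop.
final class HasBlocksSketch {

  /** Stand-in for LeafMetaData carrying the proposed flag (assumed accessor). */
  static final class LeafMeta {
    private final boolean hasBlocks;

    LeafMeta(boolean hasBlocks) {
      this.hasBlocks = hasBlocks;
    }

    boolean hasBlocks() {
      return hasBlocks;
    }
  }

  /** The merged segment has blocks if any of the incoming leaves has blocks. */
  static boolean mergedHasBlocks(List<LeafMeta> leaves) {
    boolean hasBlocks = false;
    for (LeafMeta leaf : leaves) {
      hasBlocks |= leaf.hasBlocks();
    }
    return hasBlocks;
  }

  public static void main(String[] args) {
    List<LeafMeta> leaves = Arrays.asList(new LeafMeta(false), new LeafMeta(true));
    System.out.println("merged hasBlocks=" + mergedHasBlocks(leaves));
  }
}
```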
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]