jpountz commented on code in PR #14133:
URL: https://github.com/apache/lucene/pull/14133#discussion_r1915083479
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java:
##########
@@ -572,7 +597,36 @@ public int freq() throws IOException {
}
private void refillFullBlock() throws IOException {
- forDeltaUtil.decodeAndPrefixSum(docInUtil, prevDocID, docBuffer);
+ int bitsPerValue = docIn.readByte();
+ if (bitsPerValue > 0) {
+ forDeltaUtil.decodeAndPrefixSum(bitsPerValue, docInUtil, prevDocID, docBuffer);
+ encoding = DeltaEncoding.PACKED;
+ } else if (bitsPerValue == 0) {
+ // dense block: 128 one bits
+ docBitSet.set(0, BLOCK_SIZE);
+ docBitSetBase = prevDocID + 1;
+ docCumulativeWordPopCounts[0] = Long.SIZE;
+ docCumulativeWordPopCounts[1] = 2 * Long.SIZE;
+ encoding = DeltaEncoding.UNARY;
+ } else {
+ assert level0LastDocID != NO_MORE_DOCS;
+ // block is encoded as a bit set
+ docBitSetBase = prevDocID + 1;
+ int numLongs = -bitsPerValue;
+ docIn.readLongs(docBitSet.getBits(), 0, numLongs);
+ // Note: this for loop auto-vectorizes
+ for (int i = 0; i < numLongs - 1; ++i) {
+ docCumulativeWordPopCounts[i] = Long.bitCount(docBitSet.getBits()[i]);
+ }
+ for (int i = 1; i < numLongs - 1; ++i) {
+ docCumulativeWordPopCounts[i] += docCumulativeWordPopCounts[i - 1];
+ }
+ docCumulativeWordPopCounts[numLongs - 1] = BLOCK_SIZE;
+ assert docCumulativeWordPopCounts[numLongs - 2]
Review Comment:
We only use the bit set encoding for "full" blocks. Tail blocks, which may
have fewer than 128 doc IDs to record, keep using the current encoding,
which stores deltas using group-varint; they never use a bit set.
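
To illustrate the three cases that the header byte of a full block selects in the hunk above, here is a minimal sketch. The class, enum, and method names are hypothetical and not part of the PR; only the sign convention (positive, zero, negative) is taken from the diff.

```java
// Hypothetical illustration only: these names do not exist in Lucene.
final class FullBlockHeaderSketch {
  enum Kind { PACKED, DENSE, BIT_SET }

  // Mirrors the three branches on the header byte of a full block:
  // > 0 means bit-packed deltas, 0 means a dense block, < 0 means a bit set.
  static Kind classify(byte header) {
    if (header > 0) {
      return Kind.PACKED;
    } else if (header == 0) {
      return Kind.DENSE;
    } else {
      return Kind.BIT_SET;
    }
  }

  // For a bit set block, the negated header is the number of longs to read.
  static int bitSetLongs(byte header) {
    if (header >= 0) {
      throw new IllegalArgumentException("not a bit set block: " + header);
    }
    return -header;
  }
}
```

Tail blocks never reach this dispatch, since they keep the group-varint encoding.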
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java:
##########
@@ -572,7 +597,36 @@ public int freq() throws IOException {
}
private void refillFullBlock() throws IOException {
- forDeltaUtil.decodeAndPrefixSum(docInUtil, prevDocID, docBuffer);
+ int bitsPerValue = docIn.readByte();
+ if (bitsPerValue > 0) {
+ forDeltaUtil.decodeAndPrefixSum(bitsPerValue, docInUtil, prevDocID, docBuffer);
+ encoding = DeltaEncoding.PACKED;
+ } else if (bitsPerValue == 0) {
+ // dense block: 128 one bits
+ docBitSet.set(0, BLOCK_SIZE);
+ docBitSetBase = prevDocID + 1;
+ docCumulativeWordPopCounts[0] = Long.SIZE;
+ docCumulativeWordPopCounts[1] = 2 * Long.SIZE;
+ encoding = DeltaEncoding.UNARY;
+ } else {
+ assert level0LastDocID != NO_MORE_DOCS;
+ // block is encoded as a bit set
+ docBitSetBase = prevDocID + 1;
+ int numLongs = -bitsPerValue;
+ docIn.readLongs(docBitSet.getBits(), 0, numLongs);
+ // Note: this for loop auto-vectorizes
+ for (int i = 0; i < numLongs - 1; ++i) {
+ docCumulativeWordPopCounts[i] = Long.bitCount(docBitSet.getBits()[i]);
+ }
+ for (int i = 1; i < numLongs - 1; ++i) {
Review Comment:
Indeed. :) I added a comment to make it clearer.
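
For readers following the two loops in the hunk above, here is a simplified, standalone sketch of the same two-pass idea: count the bits of each word, then turn the counts into a running total. It is an illustration under assumptions, not the PR's code: the class and method names are hypothetical, and unlike the PR it computes the last entry instead of assigning `BLOCK_SIZE` to it directly.

```java
// Hypothetical illustration only: not the PR's implementation.
final class CumulativePopCountsSketch {
  // Returns counts[i] = number of set bits in words[0..i], inclusive.
  static int[] cumulativePopCounts(long[] words) {
    int[] counts = new int[words.length];
    // Pass 1: independent per-word bit counts.
    for (int i = 0; i < words.length; ++i) {
      counts[i] = Long.bitCount(words[i]);
    }
    // Pass 2: prefix sum; each entry depends on the previous one.
    for (int i = 1; i < words.length; ++i) {
      counts[i] += counts[i - 1];
    }
    return counts;
  }
}
```

Splitting the work this way keeps the first loop free of cross-iteration dependencies, while the second loop carries the running total.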
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java:
##########
@@ -572,7 +597,36 @@ public int freq() throws IOException {
}
private void refillFullBlock() throws IOException {
- forDeltaUtil.decodeAndPrefixSum(docInUtil, prevDocID, docBuffer);
+ int bitsPerValue = docIn.readByte();
+ if (bitsPerValue > 0) {
+ forDeltaUtil.decodeAndPrefixSum(bitsPerValue, docInUtil, prevDocID, docBuffer);
+ encoding = DeltaEncoding.PACKED;
+ } else if (bitsPerValue == 0) {
+ // dense block: 128 one bits
Review Comment:
I'm not sure what is confusing: `docBitSet.set(0, BLOCK_SIZE)` sets
BLOCK_SIZE bits (indices 0 to BLOCK_SIZE - 1) to `true`. I refactored a
bit; hopefully it is clearer.
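
As a small standalone illustration of the dense case, the sketch below uses `java.util.BitSet` as a stand-in for the PR's `docBitSet` (an assumption; the PR uses its own bit set type). The class and method names are hypothetical, and the mapping from bit index to doc ID is inferred from `docBitSetBase = prevDocID + 1` in the diff.

```java
import java.util.BitSet;

// Hypothetical illustration only: java.util.BitSet stands in for docBitSet.
final class DenseBlockSketch {
  static final int BLOCK_SIZE = 128;

  // A dense block: bits 0 (inclusive) to BLOCK_SIZE (exclusive) are all set.
  static BitSet denseBlock() {
    BitSet bits = new BitSet(BLOCK_SIZE);
    bits.set(0, BLOCK_SIZE);
    return bits;
  }

  // Assumed mapping: bit i of the block stands for doc ID base + i,
  // where base plays the role of docBitSetBase (prevDocID + 1).
  static int docIdAt(int base, int bitIndex) {
    return base + bitIndex;
  }
}
```

With 128 consecutive bits set, the block covers the 128 doc IDs that immediately follow `prevDocID`, which is why the two cumulative word counts in the diff are simply `Long.SIZE` and `2 * Long.SIZE`.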
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]