GitHub user jpountz commented on a diff in the pull request:
https://github.com/apache/lucene-solr/pull/525#discussion_r243646386
--- Diff: lucene/core/src/java/org/apache/lucene/codecs/lucene80/IndexedDISI.java ---
@@ -0,0 +1,542 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.lucene80;
+
+import java.io.DataInput;
+import java.io.IOException;
+
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.store.RandomAccessInput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BitSetIterator;
+import org.apache.lucene.util.FixedBitSet;
+import org.apache.lucene.util.RoaringDocIdSet;
+
+/**
+ * Disk-based implementation of a {@link DocIdSetIterator} which can return
+ * the index of the current document, i.e. the ordinal of the current document
+ * among the list of documents that this iterator can return. This is useful
+ * to implement sparse doc values by only having to encode values for
+ * documents that actually have a value.
+ * <p>Implementation-wise, this {@link DocIdSetIterator} is inspired by
+ * {@link RoaringDocIdSet roaring bitmaps} and encodes ranges of {@code 65536}
+ * documents independently and picks between 3 encodings depending on the
+ * density of the range:<ul>
+ *   <li>{@code ALL} if the range contains 65536 documents exactly,
+ *   <li>{@code DENSE} if the range contains 4096 documents or more; in that
+ *       case documents are stored in a bit set,
+ *   <li>{@code SPARSE} otherwise, and the lower 16 bits of the doc IDs are
+ *       stored in a {@link DataInput#readShort() short}.
+ * </ul>
+ * <p>Only ranges that contain at least one value are encoded.
+ * <p>This implementation uses 6 bytes per document in the worst case, which
+ * happens when all ranges contain exactly one document: 2 bytes for the block
+ * identifier, 2 bytes for the cardinality and 2 bytes for the single doc ID.
+ *
+ * To avoid O(n) lookup time complexity, with n being the number of documents,
+ * two lookup tables are used: A lookup table for block offset and index, and
+ * a rank structure for DENSE block lookups.
+ *
+ * The lookup table is an array of {@code long}s with an entry for each block.
+ * It allows for direct jumping to the block, as opposed to iterating forward
+ * from the current position one block at a time.
+ *
+ * Each long entry consists of 2 logical parts:
+ *
+ * The first 31 bits hold the index (number of set bits in the blocks) up to
+ * just before the wanted block. The next 33 bits hold the offset in bytes
+ * into the underlying slice.
+ * As there is a maximum of 2^16 blocks, it follows that the maximum size of
+ * any block must not exceed 2^17 bits to avoid overflow. This is currently
+ * the case, with the largest block being DENSE and using 2^16 + 288 bits,
+ * and is likely to continue to hold, as using more than double the amount of
+ * bits is unlikely to be an efficient representation.
+ * The cache overhead is numDocs/1024 bytes.
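+ *
+ * Illustrative packing example (it mirrors the constants and addJumps below,
+ * it is not a format excerpt): {@code entry = (index << 33) | offset};
+ * unpacking is {@code index = entry >>> 33} and
+ * {@code offset = entry & ~(~0L << 33)}.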
+ *
+ * Note: There are 4 types of blocks: ALL, DENSE, SPARSE and non-existing
+ * (0 set bits).
+ * In the case of non-existing blocks, the entry in the lookup table has index
+ * equal to the previous entry and offset equal to the next non-empty block.
+ *
+ * The block lookup table is stored at the end of the total block structure.
+ *
+ * The rank structure for DENSE blocks is an array of unsigned {@code short}s
+ * with an entry for each sub-block of 512 bits out of the 65536 bits in the
+ * outer DENSE block.
+ *
+ * Each rank-entry states the number of set bits within the block up to the
+ * bit before the bit positioned at the start of the sub-block.
+ * Note that the rank entry of the first sub-block is always 0 and that the
+ * last entry can at most be 65536-512 = 65024 and thus will always fit into
+ * an unsigned short.
+ *
+ * The rank structure for a given DENSE block is stored at the beginning of
+ * the DENSE block. This ensures locality and keeps logistics simple.
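+ *
+ * Illustrative lookup sketch (assumed usage of the structure, not a verbatim
+ * excerpt): for a target bit {@code t} within a DENSE block, the covering
+ * rank entry is entry number {@code t >> 9} (one entry per 512 bits), after
+ * which the remaining set bits are counted in the at most 8 {@code long}s
+ * between the sub-block start and {@code t}.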
+ *
+ * @lucene.internal
+ */
+final class IndexedDISI extends DocIdSetIterator {
+
+  // jump-table time/space trade-offs to consider:
+  // The block offsets and the block indexes could be stored in more compressed form with
+  // two PackedInts or two MonotonicDirectReaders.
+  // The DENSE ranks (128 shorts = 256 bytes) could likewise be compressed. As there are at
+  // least 4096 set bits in DENSE blocks, there will be at least one rank with 2^12 bits, so
+  // it is doubtful if there is much to gain here.
+
+  private static final int BLOCK_SIZE = 65536; // The number of docIDs that a single block represents
+  static final int BLOCK_BITS = 16;
+  private static final long BLOCK_INDEX_SHIFT = 33; // Number of bits to shift a lookup entry to get the index
+  private static final long BLOCK_INDEX_MASK = ~0L << BLOCK_INDEX_SHIFT; // The index bits in a lookup entry
+  private static final long BLOCK_LOOKUP_MASK = ~BLOCK_INDEX_MASK; // The offset bits in a lookup entry
+
+  private static final int DENSE_BLOCK_LONGS = BLOCK_SIZE/Long.SIZE; // 1024
+  private static final int RANK_BLOCK_SIZE = 512; // The number of docIDs/bits in each rank-sub-block within a DENSE block
+  private static final int RANK_BLOCK_LONGS = RANK_BLOCK_SIZE/Long.SIZE; // The number of longs making up a rank-block (8)
+  private static final int RANK_BLOCK_BITS = 9;
+  private static final int RANKS_PER_BLOCK = BLOCK_SIZE / RANK_BLOCK_SIZE; // 128
+
+  static final int MAX_ARRAY_LENGTH = (1 << 12) - 1;
+
+  private static void flush(int block, FixedBitSet buffer, int cardinality, IndexOutput out) throws IOException {
+    assert block >= 0 && block < 65536;
+    out.writeShort((short) block);
+    assert cardinality > 0 && cardinality <= 65536;
+    out.writeShort((short) (cardinality - 1));
+    if (cardinality > MAX_ARRAY_LENGTH) {
+      if (cardinality != 65536) { // all docs are set
+        final byte[] rank = createRank(buffer);
+        out.writeBytes(rank, rank.length);
+        for (long word : buffer.getBits()) {
+          out.writeLong(word);
+        }
+      }
+    } else {
+      BitSetIterator it = new BitSetIterator(buffer, cardinality);
+      for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
+        out.writeShort((short) doc);
+      }
+    }
+  }
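+
+  // Per-block byte layout written by flush (a sketch derived from the code above):
+  //   SPARSE: blockId (2 bytes) + cardinality-1 (2 bytes) + one short per docID
+  //   DENSE:  blockId (2 bytes) + cardinality-1 (2 bytes) + rank (256 bytes) + bits (8192 bytes)
+  //   ALL:    blockId (2 bytes) + cardinality-1 (2 bytes), nothing else needed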
+
+  // Creates a DENSE rank-entry (the number of set bits up to a given point) for the buffer.
+  // One rank-entry for every 512 bits/8 longs for a total of 128 * 16 bits.
+  // Represented as a byte[] for fast flushing and mirroring of the retrieval representation.
+  private static byte[] createRank(FixedBitSet buffer) {
+    final byte[] rank = new byte[RANKS_PER_BLOCK*2];
+    final long[] bits = buffer.getBits();
+    int bitCount = 0;
+    for (int word = 0 ; word < DENSE_BLOCK_LONGS ; word++) {
+      if ((word & 0x07) == 0) { // Every 8 longs
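+        // Entry number word/8 is 2 bytes wide, so its high byte lands at byte (word/8)*2 = word >> 2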
+        rank[word >> 2] = (byte)(bitCount>>8);
+        rank[(word >> 2)+1] = (byte)(bitCount & 0xFF);
+      }
+      bitCount += Long.bitCount(bits[word]);
+    }
+    return rank;
+  }
+
+  /**
+   * Writes the docIDs from it to out, in logical blocks, one for each 65536 docIDs in
+   * monotonically increasing gap-less order.
+   * @param it  the document IDs.
+   * @param out destination for the blocks.
+   * @throws IOException if there was an error writing to out.
+   * @return the number of jump-table entries following the blocks, -1 for no entries.
+   *         This should be stored in meta.
+   */
+  static short writeBitSet(DocIdSetIterator it, IndexOutput out) throws IOException {
+    final long origo = out.getFilePointer(); // All jumps are relative to the origo
+    int totalCardinality = 0;
+    int blockCardinality = 0;
+    final FixedBitSet buffer = new FixedBitSet(1<<16);
+    long[] jumps = new long[ArrayUtil.oversize(1, Long.BYTES)];
+    jumps[0] = out.getFilePointer()-origo; // First block starts at index 0
+    int prevBlock = -1;
+    int jumpBlockIndex = 0;
+
+    for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
+      final int block = doc >>> 16;
+      if (prevBlock != -1 && block != prevBlock) {
+        // Track offset+index from previous block up to current
+        jumps = addJumps(jumps, out.getFilePointer()-origo, totalCardinality, jumpBlockIndex, prevBlock+1);
+        jumpBlockIndex = prevBlock+1;
+        // Flush block
+        flush(prevBlock, buffer, blockCardinality, out);
+        // Reset for next block
+        buffer.clear(0, buffer.length());
+        totalCardinality += blockCardinality;
+        blockCardinality = 0;
+      }
+      buffer.set(doc & 0xFFFF);
+      blockCardinality++;
+      prevBlock = block;
+    }
+    if (blockCardinality > 0) {
+      jumps = addJumps(jumps, out.getFilePointer()-origo, totalCardinality, jumpBlockIndex, prevBlock+1);
+      flush(prevBlock, buffer, blockCardinality, out);
+      buffer.clear(0, buffer.length());
+      prevBlock++;
+    }
+    final int lastBlock = prevBlock == -1 ? 0 : prevBlock;
+    // NO_MORE_DOCS is stored explicitly
+    buffer.set(DocIdSetIterator.NO_MORE_DOCS & 0xFFFF);
+    flush(DocIdSetIterator.NO_MORE_DOCS >>> 16, buffer, 1, out);
+    // offset+index jump-table stored at the end
+    return flushBlockJumps(jumps, lastBlock, out, origo);
+  }
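+
+  // On-disk layout produced above (sketch): zero or more blocks, an explicit block holding
+  // NO_MORE_DOCS, then the jump-table (one long per block); the constructor below locates
+  // the jump-table at slice.length() - jumpTableEntryCount * Long.BYTES.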
+
+  // Adds entries to the offset & index jump-table for blocks
+  private static long[] addJumps(long[] jumps, long offset, long index, int startBlock, int endBlock) {
+    jumps = ArrayUtil.grow(jumps, endBlock + 1);
+    final long jump = (index << BLOCK_INDEX_SHIFT) | offset;
+    for (int b = startBlock; b < endBlock; b++) {
+      jumps[b] = jump;
+    }
+    return jumps;
+  }
+
+  // Flushes the offset & index jump-table for blocks. This should be the last data written to out.
+  // This method returns the blockCount for the blocks reachable through the jump-table, or -1 for no jump-table
+  private static short flushBlockJumps(long[] jumps, int blockCount, IndexOutput out, long origo) throws IOException {
+    if (blockCount == 1) { // A single jump is just wasted space so we ignore that
+      blockCount = 0;
+    }
+    for (int i = 0 ; i < blockCount ; i++) {
+      out.writeLong(jumps[i]);
+    }
+    // As there are at most 32k blocks, the count is a short
+    // The jumpTableOffset will be at lastPos - (blockCount * Long.BYTES)
+    return (short)blockCount;
+  }
+
+  /** The slice that stores the {@link DocIdSetIterator}. */
+  private final IndexInput slice;
+  private final int jumpTableEntryCount;
+  private final RandomAccessInput jumpTable; // Skip blocks of 64K bits
+  private final byte[] rankTable = new byte[256]; // rank-table (logically 128 * 16 bit) for DENSE blocks
+  private final long cost;
+
+  /**
+   * @param in backing data.
+   * @param offset starting offset for blocks in backing data.
+   * @param length the number of bytes in the backing data.
+   * @param jumpTableEntryCount the number of blocks covered by the jump-table.
+   * @param cost normally the number of logical docIDs.
+   */
+  IndexedDISI(IndexInput in, long offset, long length, int jumpTableEntryCount, long cost) throws IOException {
+    this(in.slice("docs", offset, length), jumpTableEntryCount, cost);
+  }
+
+  /**
+   * This constructor allows passing the slice directly in case it helps reuse.
+   * See e.g. the Lucene80 norms producer's merge instance.
+   * @param slice backing data, starting from position 0 to slice.length.
+   * @param jumpTableEntryCount the number of blocks covered by the jump-table.
+   * @param cost normally the number of logical docIDs.
+   */
+  IndexedDISI(IndexInput slice, int jumpTableEntryCount, long cost) throws IOException {
+    this.slice = slice;
+    this.cost = cost;
+
+    this.jumpTableEntryCount = jumpTableEntryCount;
+    if (jumpTableEntryCount <= 0) {
+      jumpTable = null;
+    } else {
+      long jumpTableOffset = slice.length() - jumpTableEntryCount*Long.BYTES;
+      jumpTable = slice.randomAccessSlice(jumpTableOffset, slice.length()-jumpTableOffset);
+    }
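+    // Sketch of a later lookup (assumed usage): the jump entry for block b would be read as
+    // jumpTable.readLong(b * Long.BYTES) and split into index/offset with the masks above.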
--- End diff ---
you will need to support reuse for the jumpTable as well. Doc values are
special because they get pulled when writing postings in order to index
impacts, and we don't want to create a new slice for every indexed term. I
believe the test that checks slice creation only doesn't fail because we don't
create jump tables if there aren't many docs.