Hello,

I ran into a problem while trying to write a utility which loads all
documents in the index one by one. Loading was very slow, orders of
magnitude slower than it is supposed to be. After some debugging and
looking at the code, I figured out that the culprit was the index
compression, which I set to BEST_COMPRESSION since index size is critical
for me. Every time Lucene loads a document, it decompresses the whole 64K
block that document lives in. My documents are very small and I have
~10^9 documents per index, which makes scanning the index practically
impossible.
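
For context, this is roughly how the index is configured. This is just a
sketch: the exact codec class depends on the Lucene version (Lucene54Codec
below is only an example), and "analyzer" stands for whatever analyzer the
index actually uses.

import org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat;
import org.apache.lucene.codecs.lucene54.Lucene54Codec;
import org.apache.lucene.index.IndexWriterConfig;

IndexWriterConfig config = new IndexWriterConfig(analyzer);
// BEST_COMPRESSION shrinks stored fields but makes per-document loading
// more expensive, since a whole block must be inflated for each lookup.
config.setCodec(new Lucene54Codec(Lucene50StoredFieldsFormat.Mode.BEST_COMPRESSION));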
There was an addition made recently to CompressingStoredFieldsReader to
eagerly decompress all docs in a block and cache the results, but this is
only available during merging.
I was able to get access to the merging reader by doing this:

import org.apache.lucene.codecs.StoredFieldsReader;
import org.apache.lucene.document.DocumentStoredFieldVisitor;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SlowCodecReaderWrapper;

try (DirectoryReader reader = DirectoryReader.open(directory)) {
    for (LeafReaderContext leafReaderContext : reader.leaves()) {
        LeafReader leafReader = leafReaderContext.reader();
        // The merge instance eagerly decompresses each 64K block once and
        // caches it, so a sequential scan no longer re-decompresses per doc.
        StoredFieldsReader fieldsReader = SlowCodecReaderWrapper
            .wrap(leafReader).getFieldsReader().getMergeInstance();
        for (int i = 0; i < leafReader.maxDoc(); i++) {
            DocumentStoredFieldVisitor visitor = new DocumentStoredFieldVisitor();
            fieldsReader.visitDocument(i, visitor);
            visitor.getDocument(); // the assembled Document for this docID
        }
    }
}

I was wondering if there is a better way of doing this, and whether there
are plans to expose the faster document loading through a public API.
Should I try to come up with a patch for this?

Thanks!
Anton
