Terms dict should block-encode terms
------------------------------------

Key: LUCENE-2872
URL: https://issues.apache.org/jira/browse/LUCENE-2872
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.0
Attachments: LUCENE-2872.patch

With PrefixCodedTermsReader/Writer we currently encode each term standalone, i.e. its bytes, metadata, details for postings (frq/prox file pointers), etc. But this is costly when something wants to visit many terms but pull metadata for only a few (e.g. respelling, certain MTQs). It is particularly costly for the sep codec, because it has more metadata to store per term.

So instead I think we should block-encode all terms between indexed terms, so that the metadata is stored "column stride" instead. This makes it faster to enumerate just the terms.
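
To make the "column stride" idea concrete, here is a minimal Java sketch. Everything in it is an assumption for illustration: the class `BlockTermsSketch`, its method names, and the byte layout are hypothetical and are not taken from the attached patch or from Lucene's actual terms dict format. One block holds the terms that fall between two indexed terms; the term bytes are written first as one column, followed by one column of frq pointers and one column of prox pointers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a column-stride terms block; not Lucene's real format. */
final class BlockTermsSketch {

    /** Layout: [termCount][termColumnLength][terms...][frq pointers...][prox pointers...] */
    static byte[] encodeBlock(List<String> terms, long[] frqPointers, long[] proxPointers) {
        int n = terms.size();
        if (frqPointers.length != n || proxPointers.length != n) {
            throw new IllegalArgumentException("one pointer of each kind is required per term");
        }
        // Pre-encode the terms so the size of the terms column is known up front.
        List<byte[]> termBytes = new ArrayList<>(n);
        int termColumnLength = 0;
        for (String term : terms) {
            byte[] b = term.getBytes(StandardCharsets.UTF_8);
            termBytes.add(b);
            termColumnLength += Integer.BYTES + b.length;
        }
        ByteBuffer out = ByteBuffer.allocate(
            2 * Integer.BYTES + termColumnLength + 2 * n * Long.BYTES);
        out.putInt(n);
        out.putInt(termColumnLength);
        for (byte[] b : termBytes) {          // column 1: term bytes only
            out.putInt(b.length);
            out.put(b);
        }
        for (long p : frqPointers) {          // column 2: frq file pointers
            out.putLong(p);
        }
        for (long p : proxPointers) {         // column 3: prox file pointers
            out.putLong(p);
        }
        return out.array();
    }

    /** Enumerates the terms of a block without reading any metadata column. */
    static List<String> readTerms(byte[] block) {
        ByteBuffer in = ByteBuffer.wrap(block);
        int n = in.getInt();
        in.getInt();                          // terms column length, not needed here
        List<String> terms = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            byte[] b = new byte[in.getInt()];
            in.get(b);
            terms.add(new String(b, StandardCharsets.UTF_8));
        }
        return terms;
    }

    /** Reads the frq pointer of the term at {@code index} directly from its column. */
    static long readFrqPointer(byte[] block, int index) {
        ByteBuffer in = ByteBuffer.wrap(block);
        in.getInt();                          // term count, not needed for this column
        int termColumnLength = in.getInt();
        int columnStart = 2 * Integer.BYTES + termColumnLength;
        return in.getLong(columnStart + index * Long.BYTES);
    }

    /** Reads the prox pointer of the term at {@code index}; its column follows the frq column. */
    static long readProxPointer(byte[] block, int index) {
        ByteBuffer in = ByteBuffer.wrap(block);
        int n = in.getInt();
        int termColumnLength = in.getInt();
        int columnStart = 2 * Integer.BYTES + termColumnLength;
        return in.getLong(columnStart + (n + index) * Long.BYTES);
    }
}
```

In this layout a caller that only enumerates terms (`readTerms`) never decodes the pointer columns, while a caller that needs metadata for one term reads a single fixed-width slot. A real implementation would presumably also prefix-code the terms and use variable-length integers; the sketch leaves both out to keep the layout easy to follow.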