Terms dict should block-encode terms
------------------------------------

                 Key: LUCENE-2872
                 URL: https://issues.apache.org/jira/browse/LUCENE-2872
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 4.0
         Attachments: LUCENE-2872.patch

With PrefixCodedTermsReader/Writer we now encode each term standalone,
ie its bytes, metadata, details for postings (frq/prox file pointers),
etc.

But, this is costly when something wants to visit many terms but pull
metadata for only few (eg respelling, certain MTQs).  This is
particularly costly for sep codec because it has more metadata to
store, per term.

So instead I think we should block-encode all terms between indexed
term, so that the metadata is stored "column stride" instead.  This
makes it faster to enum just terms.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to