Michael McCandless created LUCENE-4227:
------------------------------------------

             Summary: DirectPostingsFormat, storing postings as simple int[] in 
memory, if you have tons of RAM
                 Key: LUCENE-4227
                 URL: https://issues.apache.org/jira/browse/LUCENE-4227
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Michael McCandless
            Assignee: Michael McCandless


This postings format just wraps Lucene40 on disk, but at search time
it loads all terms and postings into RAM up front.

You'd use this if you have insane amounts of RAM and want the fastest
possible search performance.  The postings are not compressed: docIDs
and positions are stored as plain int[]s.

The terms are stored as a skip list (array of byte[]), but I packed
all terms together into a single long byte[]: I had started with an
actual separate byte[] per term, but the added pointer dereference and
loss of locality made it a lot (~2X) slower for terms-dict intensive
queries like FuzzyQuery.
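
To make the packing idea concrete, here is a minimal sketch of sorted
terms concatenated into one byte[] with a parallel offsets array.  It
is not the code in the patch: the class name PackedTerms, its fields,
and the plain binary-search lookup are all assumptions for
illustration, and the skip list is left out.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: sorted terms packed into one byte[] plus offsets. */
final class PackedTerms {
    private final byte[] termBytes;   // all terms concatenated, in sorted order
    private final int[] termOffsets;  // term i occupies [termOffsets[i], termOffsets[i + 1])

    /** Assumes sortedTerms is already sorted by unsigned UTF-8 byte order. */
    PackedTerms(String[] sortedTerms) {
        byte[][] encoded = new byte[sortedTerms.length][];
        termOffsets = new int[sortedTerms.length + 1];
        int total = 0;
        for (int i = 0; i < sortedTerms.length; i++) {
            encoded[i] = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            termOffsets[i] = total;
            total += encoded[i].length;
        }
        termOffsets[sortedTerms.length] = total;
        termBytes = new byte[total];
        for (int i = 0; i < encoded.length; i++) {
            System.arraycopy(encoded[i], 0, termBytes, termOffsets[i], encoded[i].length);
        }
    }

    /** Returns the ordinal of target, or -1 if it is absent. */
    int indexOf(String target) {
        byte[] key = target.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = termOffsets.length - 2;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(mid, key);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    /** Compares term number ord against key, byte by byte, as unsigned values. */
    private int compare(int ord, byte[] key) {
        int start = termOffsets[ord];
        int length = termOffsets[ord + 1] - start;
        int common = Math.min(length, key.length);
        for (int i = 0; i < common; i++) {
            int diff = (termBytes[start + i] & 0xFF) - (key[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return length - key.length;
    }
}
```

Every comparison reads from the same contiguous array, so a lookup
avoids following a separate reference per term, which is the locality
effect described above.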

Low frequency postings (docFreq <= 32 by default) store all docs,
positions and offsets in a single int[].  High frequency postings
store docs as int[], freqs as int[], and positions as int[][], all as
parallel arrays.  For skipping I just do a growing binary search.
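
As an illustration of skipping over an uncompressed docID array, the
sketch below reads "growing binary search" as a galloping (exponential)
search: the step doubles until it overshoots the target, then an
ordinary binary search finishes inside the last window.  The class and
method names are hypothetical, and the patch may differ in details
such as the initial step size.

```java
/** Hypothetical sketch of advance() over a sorted int[] of docIDs. */
final class GallopingAdvance {
    /**
     * Returns the index of the first element >= target in docIDs[from..upto),
     * or upto if there is no such element.
     */
    static int advance(int[] docIDs, int from, int upto, int target) {
        if (from >= upto) {
            return upto;
        }
        if (docIDs[from] >= target) {
            return from;
        }
        // Invariant from here on: docIDs[lo] < target.
        int lo = from;
        int step = 1;
        // Grow the window while its far end is still below the target.
        while (lo + step < upto && docIDs[lo + step] < target) {
            lo += step;
            step <<= 1;
        }
        // Either hi == upto, or docIDs[hi] >= target.
        int hi = Math.min(lo + step, upto);
        lo++;
        // Standard binary search for the first element >= target in [lo, hi).
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docIDs[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Most advances land close to the current position, so growing the
window from a small step usually touches only a few nearby entries
before the binary search runs.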

I also made specialized DirectTermScorer and DirectExactPhraseScorer
for the high freq case that just pull the int[] and iterate
themselves.
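
To show what iterating the raw arrays can look like, here is a small
sketch that counts exact two-term phrase matches inside one document,
given each term's sorted positions as an int[].  It is not the
DirectExactPhraseScorer from the patch; the class name, the method,
and the two-term restriction are assumptions made for illustration.

```java
/** Hypothetical sketch: exact phrase frequency from raw position arrays. */
final class PhraseFreqSketch {
    /**
     * Counts positions p in first[] for which p + offset occurs in second[].
     * Both arrays must be sorted in strictly ascending order.
     */
    static int phraseFreq(int[] first, int[] second, int offset) {
        int i = 0;
        int j = 0;
        int count = 0;
        while (i < first.length && j < second.length) {
            int want = first[i] + offset;
            if (second[j] < want) {
                j++;  // second term is behind, catch up
            } else {
                if (second[j] == want) {
                    count++;  // the two terms are exactly offset positions apart
                }
                i++;  // move to the next candidate start position
            }
        }
        return count;
    }
}
```

For the phrase "quick fox" with "quick" at positions {1, 7, 12} and
"fox" at {2, 5, 8, 20}, calling phraseFreq with offset 1 finds the
matches starting at positions 1 and 7.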

All tests pass.

