Michael McCandless created LUCENE-4227:
------------------------------------------
             Summary: DirectPostingsFormat, storing postings as simple int[] in memory, if you have tons of RAM
                 Key: LUCENE-4227
                 URL: https://issues.apache.org/jira/browse/LUCENE-4227
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Michael McCandless
            Assignee: Michael McCandless

This postings format just wraps Lucene40 (on disk), but at search time it loads all terms and postings into RAM up front. You'd use this if you have insane amounts of RAM and want the fastest possible search performance.

The postings are not compressed: docIDs and positions are stored as plain int[]s.

The terms are stored as a skip list (array of byte[]), but I packed all terms together into a single long byte[]. I had started with an actual separate byte[] per term, but the added pointer dereference and loss of locality made terms-dict intensive queries like FuzzyQuery a lot (~2X) slower.

Low frequency postings (docFreq <= 32 by default) store all docs, positions and offsets in a single int[]. High frequency postings store docs as int[], freqs as int[], and positions as int[][], as parallel arrays.

For skipping I just do a growing binary search.

I also made specialized DirectTermScorer and DirectExactPhraseScorer for the high freq case; they just pull the int[] and iterate it themselves.

All tests pass.
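
To illustrate the skipping described above: the message does not spell out what "growing binary search" means, so the sketch below assumes it is an exponential (galloping) search over a sorted int[] of docIDs. The probe distance doubles until it passes the target, and a plain binary search then finishes inside that window. The class name, method name and sample data are hypothetical and are not taken from the patch.

```java
/** Hypothetical sketch of skipping over an uncompressed int[] of doc IDs. */
final class GrowingSearchSketch {

  /**
   * Returns the index of the first entry at or after position 'from' whose
   * value is >= target, or docIDs.length if there is none.
   * Assumes docIDs is sorted in ascending order.
   */
  static int advanceIndex(int[] docIDs, int from, int target) {
    int n = docIDs.length;
    if (from >= n) {
      return n;
    }
    if (docIDs[from] >= target) {
      return from;
    }
    // Invariant from here on: docIDs[lo] < target.
    int lo = from;
    int step = 1;
    // Growing phase: double the probe distance until we reach a value
    // >= target or run off the end of the array.
    while (lo + step < n && docIDs[lo + step] < target) {
      lo += step;
      step <<= 1;
    }
    // Either docIDs[hi] >= target, or hi == n (no such entry was probed).
    int hi = Math.min(lo + step, n);
    // Binary search phase: narrow the window (lo, hi] to a single index.
    while (hi - lo > 1) {
      int mid = (lo + hi) >>> 1;
      if (docIDs[mid] < target) {
        lo = mid;
      } else {
        hi = mid;
      }
    }
    return hi;
  }

  public static void main(String[] args) {
    int[] docs = {2, 5, 9, 14, 20, 31, 47}; // made-up postings for one term
    int idx = advanceIndex(docs, 0, 15);
    System.out.println(idx < docs.length ? "next doc: " + docs[idx] : "no more docs");
  }
}
```

The point of the growing phase is that a short skip costs only a few probes near the current position, while a long skip still takes logarithmic time. That fits postings that are already fully resident in memory as int[].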