[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431485#comment-13431485 ]
Michael McCandless commented on LUCENE-3892:
--------------------------------------------

I also see (smaller) gains with BlockPacked vs Block (this is a 10M doc index):

{noformat}
Task              QPS base  StdDev base  QPS packed  StdDev packed  Pct diff
AndHighMed           69.19         0.53       66.43           0.63  -5% - -2%
Fuzzy2               63.71         1.24       62.25           1.58  -6% - 2%
Respell              62.69         1.41       61.53           1.47  -6% - 2%
IntNRQ               11.86         0.43       11.73           0.03  -4% - 2%
Fuzzy1               75.48         1.21       75.05           1.52  -4% - 3%
Wildcard             53.23         0.63       52.96           0.25  -2% - 1%
MedSpanNear           4.88         0.16        4.88           0.11  -5% - 5%
PKLookup            191.48         2.84      191.62           3.98  -3% - 3%
HighTerm             35.71         0.63       35.91           0.06  -1% - 2%
Prefix3              83.14         1.34       83.83           0.49  -1% - 3%
LowTerm             513.35         0.77      517.92           1.50  0% - 1%
HighSpanNear          1.70         0.06        1.71           0.03  -4% - 6%
AndHighHigh          23.45         0.09       23.69           0.10  0% - 1%
OrHighLow            27.27         1.06       27.59           0.15  -3% - 5%
OrHighMed            23.61         0.92       23.89           0.17  -3% - 6%
OrHighHigh           11.42         0.44       11.59           0.12  -3% - 6%
MedSloppyPhrase       6.84         0.17        6.95           0.23  -4% - 7%
LowPhrase            22.02         0.39       22.43           0.15  0% - 4%
MedTerm             196.76         3.01      200.62           0.33  0% - 3%
LowSpanNear           9.60         0.24        9.82           0.31  -3% - 8%
MedPhrase            13.08         0.30       13.41           0.12  0% - 5%
LowSloppyPhrase       7.55         0.21        7.77           0.27  -3% - 9%
AndHighLow          649.84        18.26      669.08           6.63  0% - 6%
HighSloppyPhrase      1.98         0.08        2.04           0.09  -4% - 12%
HighPhrase            1.76         0.11        1.96           0.10  0% - 24%
{noformat}

The index is 4669 MB with Block and 4790 MB with BlockPacked, i.e. ~2.6% larger ... seems worth it! Apps can always tune the 20% too.

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta,
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-blockFor&hardcode(base).patch,
> LUCENE-3892-blockFor&packedecoder(comp).patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-bulkVInt.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-handle_open_files.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch,
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.