[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431882#comment-13431882 ]
Han Jiang commented on LUCENE-3892:
-----------------------------------

I revived the PFor code and tested it against BlockFor and BlockPacked:

BlockFor as base:
{noformat}
Task              QPS base  StdDev base  QPS pfor  StdDev pfor     Pct diff
AndHighHigh         121.54         1.37    116.69         2.03   -6% -  -1%
AndHighLow         2286.36        14.19   2212.92        11.48   -4% -  -2%
AndHighMed          322.97         7.37    294.19         4.76  -12% -  -5%
Fuzzy1               85.56         1.46     87.97         3.27   -2% -   8%
Fuzzy2               30.94         0.56     32.16         1.34   -2% -  10%
HighPhrase            9.39         0.38      9.02         0.45  -12% -   5%
HighSloppyPhrase      5.38         0.08      5.24         0.12   -6% -   1%
HighSpanNear         10.38         0.39      9.92         0.08   -8% -   0%
HighTerm            180.30         6.87    172.83         6.26  -11% -   3%
IntNRQ               62.01         3.73     60.89         3.54  -12% -  10%
LowPhrase            42.44         0.67     38.73         0.89  -12% -  -5%
LowSloppyPhrase      62.82         0.79     56.79         0.43  -11% -  -7%
LowSpanNear          81.79         2.00     74.10         1.13  -12% -  -5%
LowTerm            1763.95        39.62   1721.30        34.22   -6% -   1%
MedPhrase            27.87         0.59     25.82         0.74  -11% -  -2%
MedSloppyPhrase      32.15         0.41     29.91         0.31   -9% -  -4%
MedSpanNear          23.48         0.71     22.00         0.05   -9% -  -3%
MedTerm             662.11        24.22    638.81        19.31   -9% -   3%
OrHighHigh           26.82         0.47     27.14         1.93   -7% -  10%
OrHighLow           152.40         3.54    156.58        11.11   -6% -  12%
OrHighMed           103.20         2.26    105.84         7.55   -6% -  12%
PKLookup            216.38         4.32    219.32         2.59   -1% -   4%
Prefix3             169.89         4.97    163.82         3.34   -8% -   1%
Respell              83.23         1.44     86.20         3.00   -1% -   9%
Wildcard            155.81         2.79    152.30         2.54   -5% -   1%
{noformat}

BlockPacked as base:
{noformat}
Task              QPS base  StdDev base  QPS pfor  StdDev pfor     Pct diff
AndHighHigh         122.94         3.43    116.24         1.90   -9% -  -1%
AndHighLow         2294.32        58.32   2199.14        31.97   -7% -   0%
AndHighMed          325.55        12.44    290.20         3.80  -15% -  -6%
Fuzzy1               88.33         1.84     87.86         2.54   -5% -   4%
Fuzzy2               31.92         0.80     32.00         0.92   -5% -   5%
HighPhrase            9.73         0.47      9.04         0.29  -14% -   0%
HighSloppyPhrase      5.49         0.19      5.16         0.03   -9% -  -1%
HighSpanNear         10.93         0.23      9.90         0.09  -12% -  -6%
HighTerm            178.31         6.37    171.06         6.14  -10% -   3%
IntNRQ               60.87         4.71     62.38         5.49  -13% -  20%
LowPhrase            44.97         1.18     38.36         1.01  -19% - -10%
LowSloppyPhrase      69.61         1.19     55.90         1.39  -23% - -16%
LowSpanNear          88.50         0.66     72.80         2.23  -20% - -14%
LowTerm            1769.84        32.66   1717.02        39.75   -6% -   1%
MedPhrase            28.88         0.84     25.57         0.68  -16% -  -6%
MedSloppyPhrase      34.47         0.50     29.29         0.54  -17% - -12%
MedSpanNear          24.88         0.32     21.69         0.38  -15% - -10%
MedTerm             667.95        21.61    633.73        22.17  -11% -   1%
OrHighHigh           27.96         1.29     26.82         0.81  -11% -   3%
OrHighLow           158.62         5.82    155.08         5.05   -8% -   4%
OrHighMed           107.16         4.19    104.81         3.17   -8% -   4%
PKLookup            217.22         1.86    216.83         1.87   -1% -   1%
Prefix3             167.32         6.72    166.12         6.53   -8% -   7%
Respell              85.25         2.27     85.85         2.16   -4% -   6%
Wildcard            156.24         5.69    154.63         3.02   -6% -   4%
{noformat}

The current PFor impl saves only 1.8% compared with For, but takes a noticeable perf hit. Let's use the Packed version!

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta,
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-blockFor&hardcode(base).patch,
> LUCENE-3892-blockFor&packedecoder(comp).patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-bulkVInt.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-handle_open_files.patch, LUCENE-3892-non-specialized.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch,
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.
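
A note on reading the "Pct diff" column in the benchmark tables of the comment: the message does not say how the range is computed. The sketch below is an assumption inferred from the posted numbers, not taken from the benchmark tool's source. It pairs the candidate's QPS minus one standard deviation against the base's QPS plus one standard deviation for the worst case, does the mirror image for the best case, and truncates toward zero. The function name `pct_diff_range` is made up for illustration.

```python
def pct_diff_range(qps_base, sd_base, qps_cmp, sd_cmp):
    """Return (worst, best) percentage change of the candidate vs the base.

    Assumption (inferred from the posted tables): the worst case compares
    the candidate at mean - stddev with the base at mean + stddev, the best
    case is the opposite pairing, and both are truncated toward zero.
    """
    worst_base = qps_base + sd_base
    best_base = qps_base - sd_base
    worst = int(100.0 * ((qps_cmp - sd_cmp) - worst_base) / worst_base)
    best = int(100.0 * ((qps_cmp + sd_cmp) - best_base) / best_base)
    return worst, best


if __name__ == "__main__":
    # AndHighHigh, BlockFor as base: table shows -6% - -1%
    print(pct_diff_range(121.54, 1.37, 116.69, 2.03))   # (-6, -1)
    # LowSloppyPhrase, BlockPacked as base: table shows -23% - -16%
    print(pct_diff_range(69.61, 1.19, 55.90, 1.39))     # (-23, -16)
```

Under this reading, a range that straddles zero (e.g. Fuzzy1 or PKLookup) is within run-to-run noise, while a range that is entirely negative (e.g. the Low*/Med* phrase and span tasks against BlockPacked) indicates a slowdown larger than the measured variation.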