[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431324#comment-13431324 ]
Adrien Grand edited comment on LUCENE-3892 at 8/8/12 7:38 PM:
--------------------------------------------------------------

I did some changes to the {{BlockPacked}} codec:
- encoding and decoding using int[] instead of long[]
- selection of the format based on a configurable overhead ratio.

The results are encouraging (using acceptableOverheadRatio = PackedInts.DEFAULT = 20%):

{noformat}
Task              QPS 3892  StdDev 3892  QPS 3892-packed  StdDev 3892-packed  Pct diff
PKLookup            256.93         8.89           256.85                7.47  -6% - 6%
OrHighLow           145.14         9.86           145.14                9.35  -12% - 14%
Respell             110.26         1.84           110.27                2.01  -3% - 3%
AndHighHigh         112.97         0.81           113.19                2.17  -2% - 2%
Fuzzy1              102.15         1.47           102.86                3.13  -3% - 5%
OrHighHigh           94.56         6.56            95.43                6.35  -11% - 15%
Fuzzy2               42.49         0.77            42.89                1.43  -4% - 6%
OrHighMed           175.30        11.34           177.42               10.83  -10% - 14%
AndHighLow         1925.02        23.92          1952.57               48.68  -2% - 5%
HighPhrase            8.96         0.41             9.11                0.46  -7% - 11%
Wildcard            189.79         2.13           193.12                1.57  0% - 3%
HighSpanNear          6.47         0.15             6.59                0.25  -4% - 8%
Prefix3             256.67         2.58           262.40                2.84  0% - 4%
LowTerm            1746.52        52.80          1789.54               54.30  -3% - 8%
HighTerm            238.70        13.46           245.63               16.60  -9% - 16%
MedTerm             923.64        38.19           951.18               46.85  -5% - 12%
AndHighMed          364.46         3.65           377.09               10.03  0% - 7%
IntNRQ               56.58         1.02            58.84                0.80  0% - 7%
HighSloppyPhrase     11.73         0.30            12.40                0.62  -2% - 13%
LowSpanNear          29.64         0.96            32.44                0.98  2% - 16%
MedSpanNear          22.96         0.72            25.16                0.85  2% - 16%
MedPhrase            40.99         1.25            45.09                1.24  3% - 16%
LowSloppyPhrase      37.88         0.99            41.98                1.49  4% - 17%
LowPhrase            64.40         2.04            71.84                1.41  5% - 17%
MedSloppyPhrase      42.29         1.16            47.32                1.54  5% - 18%
{noformat}

I hope this will be confirmed on your computers this time. :-)

was (Author: jpountz):
I did some changes to the {{BlockPacked}} codec:
- encoding and decoding using int[] instead of long[]
- selection of the format based on a configurable overhead ratio.

The results are encouraging:

{noformat}
Task              QPS 3892  StdDev 3892  QPS 3892-packed  StdDev 3892-packed  Pct diff
PKLookup            256.93         8.89           256.85                7.47  -6% - 6%
OrHighLow           145.14         9.86           145.14                9.35  -12% - 14%
Respell             110.26         1.84           110.27                2.01  -3% - 3%
AndHighHigh         112.97         0.81           113.19                2.17  -2% - 2%
Fuzzy1              102.15         1.47           102.86                3.13  -3% - 5%
OrHighHigh           94.56         6.56            95.43                6.35  -11% - 15%
Fuzzy2               42.49         0.77            42.89                1.43  -4% - 6%
OrHighMed           175.30        11.34           177.42               10.83  -10% - 14%
AndHighLow         1925.02        23.92          1952.57               48.68  -2% - 5%
HighPhrase            8.96         0.41             9.11                0.46  -7% - 11%
Wildcard            189.79         2.13           193.12                1.57  0% - 3%
HighSpanNear          6.47         0.15             6.59                0.25  -4% - 8%
Prefix3             256.67         2.58           262.40                2.84  0% - 4%
LowTerm            1746.52        52.80          1789.54               54.30  -3% - 8%
HighTerm            238.70        13.46           245.63               16.60  -9% - 16%
MedTerm             923.64        38.19           951.18               46.85  -5% - 12%
AndHighMed          364.46         3.65           377.09               10.03  0% - 7%
IntNRQ               56.58         1.02            58.84                0.80  0% - 7%
HighSloppyPhrase     11.73         0.30            12.40                0.62  -2% - 13%
LowSpanNear          29.64         0.96            32.44                0.98  2% - 16%
MedSpanNear          22.96         0.72            25.16                0.85  2% - 16%
MedPhrase            40.99         1.25            45.09                1.24  3% - 16%
LowSloppyPhrase      37.88         0.99            41.98                1.49  4% - 17%
LowPhrase            64.40         2.04            71.84                1.41  5% - 17%
MedSloppyPhrase      42.29         1.16            47.32                1.54  5% - 18%
{noformat}

I hope this will be confirmed on your computers this time. :-)

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta,
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-blockFor&hardcode(base).patch,
> LUCENE-3892-blockFor&packedecoder(comp).patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-bulkVInt.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-handle_open_files.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch,
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.
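
For readers unfamiliar with the "configurable overhead ratio" mentioned in the comment, here is a minimal, self-contained sketch of the general idea: given the number of bits each value strictly requires, allow some wasted space per value if that lets the data use a byte-aligned width that is cheaper to decode. This is an illustration under stated assumptions, not the code from the attached patches and not Lucene's actual {{PackedInts}} implementation. The class name, method name and the set of "fast" widths (8, 16, 32, 64) are all hypothetical.

```java
public final class OverheadRatioSketch {

    // Hypothetical constant mirroring the 20% ratio quoted in the comment.
    public static final float DEFAULT_RATIO = 0.20f;

    private OverheadRatioSketch() {}

    /**
     * Returns the number of bits to store per value.
     *
     * @param bitsRequired            bits strictly needed for the largest value (1..64)
     * @param acceptableOverheadRatio fraction of extra space we accept, e.g. 0.20f
     */
    public static int chooseBitsPerValue(int bitsRequired, float acceptableOverheadRatio) {
        if (bitsRequired < 1 || bitsRequired > 64) {
            throw new IllegalArgumentException("bitsRequired must be in [1, 64]");
        }
        if (acceptableOverheadRatio < 0f) {
            throw new IllegalArgumentException("acceptableOverheadRatio must be >= 0");
        }
        // Upper bound on bits per value once the allowed waste is added.
        int maxBits = bitsRequired + (int) (bitsRequired * acceptableOverheadRatio);
        // Assumption: byte-aligned widths are the "fast" ones, since they can be
        // read without shifting and masking across word boundaries.
        for (int aligned : new int[] {8, 16, 32, 64}) {
            if (bitsRequired <= aligned && aligned <= maxBits) {
                return aligned;
            }
        }
        // No aligned width fits in the budget: fall back to tight bit packing.
        return bitsRequired;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {5, 7, 14, 27}) {
            System.out.println(bits + " -> " + chooseBitsPerValue(bits, DEFAULT_RATIO));
        }
    }
}
```

With a 20% budget in this sketch, 7 required bits are rounded up to 8 and 14 to 16, while 5 stays tightly packed because 8 bits would waste more than the budget allows. A ratio of 0 always yields the most compact choice.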