[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425010#comment-13425010 ]
Han Jiang edited comment on LUCENE-3892 at 7/30/12 5:36 PM:
------------------------------------------------------------
Previous experiments showed a net loss with the packed ints API; however, the two setups differed slightly, e.g. the all-values-the-same case was not handled the same way in both. These two patches should make the comparison fair.
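The all-values-the-same case refers to a block in which every value is identical. As a minimal sketch, assuming a hypothetical block layout (not the layout used by either patch), an encoder can write a "0 bits per value" header followed by the single repeated value, and the decoder can then fill the whole block without unpacking any bits. `AllEqualBlockSketch`, `ALL_VALUES_EQUAL`, `encode` and `decode` are illustrative names only.

```java
import java.util.Arrays;

/**
 * Hypothetical frame-of-reference (FOR) block codec that special-cases blocks
 * whose values are all identical. Illustrative only; not the patches' format.
 */
public class AllEqualBlockSketch {
    // Header meaning "0 bits per value: one stored value repeats for the whole block".
    static final long ALL_VALUES_EQUAL = 0;

    static boolean allEqual(int[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i] != values[0]) {
                return false;
            }
        }
        return true;
    }

    /** Encodes a non-empty block: element 0 is the header, the rest is payload. */
    static long[] encode(int[] values) {
        if (allEqual(values)) {
            return new long[] {ALL_VALUES_EQUAL, values[0]};
        }
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        // Bits needed for the largest value; a negative int forces 32 bits.
        int bits = 32 - Integer.numberOfLeadingZeros(or);
        long[] out = new long[1 + (int) (((long) values.length * bits + 63) / 64)];
        out[0] = bits;
        for (int i = 0; i < values.length; i++) {
            long v = values[i] & 0xFFFFFFFFL;
            long bitPos = (long) i * bits;
            int word = 1 + (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            out[word] |= v << shift;
            if (shift + bits > 64) {
                // The value straddles two longs: its high bits go into the next word.
                out[word + 1] |= v >>> (64 - shift);
            }
        }
        return out;
    }

    /** Decodes {@code count} values produced by {@link #encode}. */
    static int[] decode(long[] encoded, int count) {
        int[] out = new int[count];
        int bits = (int) encoded[0];
        if (bits == ALL_VALUES_EQUAL) {
            // Fast path: no bit unpacking at all.
            Arrays.fill(out, (int) encoded[1]);
            return out;
        }
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bits;
            int word = 1 + (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = encoded[word] >>> shift;
            if (shift + bits > 64) {
                v |= encoded[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] same = new int[128];
        Arrays.fill(same, 3);
        System.out.println(encode(same).length); // prints 2
        int[] mixed = {1, 5, 1000, 42};
        System.out.println(Arrays.toString(decode(encode(mixed), mixed.length))); // prints [1, 5, 1000, 42]
    }
}
```

If only one of the two decoders being compared takes such a shortcut, blocks of identical values (common for, e.g., runs of consecutive doc IDs) favor that decoder, which is why handling this case identically matters for a fair comparison.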
Base: BlockForPF + hardcoded decoder
Comp: BlockForPF + PackedInts.Decoder
{noformat}
Task              QPS base  StdDev base  QPS comp  StdDev comp     Pct diff
AndHighHigh          25.66         0.31     22.61         1.21  -17% -  -6%
AndHighMed           74.17         1.45     59.48         3.62  -26% - -13%
Fuzzy1               95.60         1.51     96.06         2.22   -3% -   4%
Fuzzy2               28.67         0.50     28.51         0.75   -4% -   3%
IntNRQ               33.31         0.60     30.73         1.51  -13% -  -1%
OrHighHigh           17.58         0.59     16.22         1.18  -17% -   2%
OrHighMed            34.42         0.93     32.14         2.33  -15% -   2%
PKLookup            217.08         4.25    213.76         1.37   -4% -   1%
Phrase                6.10         0.12      5.34         0.07  -15% -  -9%
Prefix3              77.27         1.26     70.42         2.87  -13% -  -3%
Respell              92.91         1.34     92.61         1.83   -3% -   3%
SloppyPhrase          5.35         0.16      5.00         0.29  -14% -   1%
SpanNear              6.05         0.15      5.47         0.07  -12% -  -6%
Term                 37.62         0.32     33.08         1.70  -17% -  -6%
TermBGroup1M         17.45         0.64     16.40         0.73  -13% -   1%
TermBGroup1M1P       25.20         0.69     23.47         1.24  -14% -   0%
TermGroup1M          18.53         0.65     17.40         0.76  -13% -   1%
Wildcard             44.39         0.49     40.51         1.69  -13% -  -3%
{noformat}
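This table compares a hardcoded decoder (Base) against PackedInts.Decoder (Comp). One plausible, unverified explanation for such a gap is that a hardcoded decoder uses constant shifts and masks that the JIT can compile into straight-line code, whereas a generic decoder derives them from a runtime bits-per-value parameter. The sketch below contrasts the two styles for 8 bits per value; `DecoderSketch`, `pack`, `decodeGeneric` and `decode8` are hypothetical names, and neither decoder is the code from the patches or Lucene's actual PackedInts API.

```java
import java.util.Arrays;

/** Hypothetical illustration: generic vs. hardcoded bit unpacking. */
public class DecoderSketch {
    /**
     * Packs values into longs, lowest bits first; assumes 64 % bitsPerValue == 0
     * and bitsPerValue <= 32, so no value straddles two longs.
     */
    static long[] pack(int[] values, int bitsPerValue) {
        int valuesPerLong = 64 / bitsPerValue;
        long mask = (1L << bitsPerValue) - 1;
        long[] blocks = new long[(values.length + valuesPerLong - 1) / valuesPerLong];
        for (int i = 0; i < values.length; i++) {
            int shift = (i % valuesPerLong) * bitsPerValue;
            blocks[i / valuesPerLong] |= (values[i] & mask) << shift;
        }
        return blocks;
    }

    /** Generic decoder: shift and mask are computed from a runtime parameter. */
    static void decodeGeneric(long[] blocks, int bitsPerValue, int[] values) {
        int valuesPerLong = 64 / bitsPerValue;
        long mask = (1L << bitsPerValue) - 1;
        for (int i = 0; i < values.length; i++) {
            int shift = (i % valuesPerLong) * bitsPerValue;
            values[i] = (int) ((blocks[i / valuesPerLong] >>> shift) & mask);
        }
    }

    /** Hardcoded decoder for 8 bits per value; assumes values.length % 8 == 0. */
    static void decode8(long[] blocks, int[] values) {
        for (int b = 0, v = 0; v < values.length; b++, v += 8) {
            long block = blocks[b];
            values[v] = (int) (block & 0xFFL);
            values[v + 1] = (int) ((block >>> 8) & 0xFFL);
            values[v + 2] = (int) ((block >>> 16) & 0xFFL);
            values[v + 3] = (int) ((block >>> 24) & 0xFFL);
            values[v + 4] = (int) ((block >>> 32) & 0xFFL);
            values[v + 5] = (int) ((block >>> 40) & 0xFFL);
            values[v + 6] = (int) ((block >>> 48) & 0xFFL);
            values[v + 7] = (int) (block >>> 56); // top byte needs no mask
        }
    }

    public static void main(String[] args) {
        int[] input = new int[128];
        for (int i = 0; i < input.length; i++) {
            input[i] = (i * 7) & 0xFF;
        }
        long[] blocks = pack(input, 8);
        int[] a = new int[128];
        int[] b = new int[128];
        decodeGeneric(blocks, 8, a);
        decode8(blocks, b);
        System.out.println(Arrays.equals(a, input) && Arrays.equals(b, input)); // prints true
    }
}
```

Both decoders produce identical output; whether the specialized form is actually faster depends on the JVM and the bit width, which is what benchmarks like the one above measure.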
Hmm, it is quite strange that we already see a performance loss with the baseline patch itself:
Base: BlockForPF in the current branch
Comp: BlockForPF + hardcoded decoder (patch file)
{noformat}
Task              QPS base  StdDev base  QPS comp  StdDev comp     Pct diff
AndHighHigh          26.71         0.98     24.15         0.82  -15% -  -2%
AndHighMed           73.37         5.01     61.30         1.97  -24% -  -7%
Fuzzy1               85.73         4.95     84.30         1.79   -9% -   6%
Fuzzy2               30.15         2.05     29.52         0.66  -10% -   7%
IntNRQ               38.56         1.69     36.91         1.27  -11% -   3%
OrHighHigh           16.98         1.48     16.82         0.94  -13% -  14%
OrHighMed            34.60         2.79     34.70         2.22  -13% -  16%
PKLookup            214.93         3.99    213.86         1.23   -2% -   1%
Phrase               11.53         0.23     10.75         0.42  -12% -  -1%
Prefix3             107.15         3.83    102.12         2.69  -10% -   1%
Respell              87.41         5.41     86.08         1.76   -9% -   7%
SloppyPhrase          5.90         0.15      5.66         0.21   -9% -   2%
SpanNear              4.99         0.12      4.79         0.01   -6% -  -1%
Term                 49.37         2.38     45.53         0.49  -12% -  -2%
TermBGroup1M         17.23         0.40     16.44         0.53   -9% -   0%
TermBGroup1M1P       22.02         0.50     22.42         0.60   -3% -   7%
TermGroup1M          13.65         0.29     13.05         0.28   -8% -   0%
Wildcard             48.73         2.01     46.35         1.31  -11% -   2%
{noformat}
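The Pct diff column appears to report a range that folds in the run-to-run StdDev rather than a single number. As a rough cross-check (a simple point estimate, not the benchmark tool's own formula), the relative QPS change is 100 * (comp - base) / base. For AndHighHigh in the first table and Term in the second, this point estimate falls inside the reported range. `PctChangeSketch` and `pctChange` are hypothetical names.

```java
import java.util.Locale;

/** Hypothetical helper: point estimate of the QPS change between two runs. */
public class PctChangeSketch {
    /** Returns 100 * (comp - base) / base, i.e. the relative change in percent. */
    static double pctChange(double qpsBase, double qpsComp) {
        return 100.0 * (qpsComp - qpsBase) / qpsBase;
    }

    public static void main(String[] args) {
        // AndHighHigh, first table: reported range -17% - -6%.
        System.out.printf(Locale.ROOT, "AndHighHigh: %.1f%%%n", pctChange(25.66, 22.61)); // prints AndHighHigh: -11.9%
        // Term, second table: reported range -12% - -2%.
        System.out.printf(Locale.ROOT, "Term: %.1f%%%n", pctChange(49.37, 45.53)); // prints Term: -7.8%
    }
}
```

`Locale.ROOT` keeps the decimal separator a dot regardless of the JVM's default locale.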
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-blockFor&hardcode(base).patch,
> LUCENE-3892-blockFor&packedecoder(comp).patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-handle_open_files.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch,
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of open issues with patches in various
> states.
> Initial results (based on a prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.