[
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430507#comment-13430507
]
Michael McCandless commented on LUCENE-3892:
--------------------------------------------
I tried smaller block sizes than 128. Here's 128 (base) vs 64:
{noformat}
Task              QPS base  StdDev base  QPS block64  StdDev block64     Pct diff
AndHighHigh          23.91         0.57        22.28            0.27  -10% -  -3%
AndHighMed           60.63         1.02        56.96            1.13   -9% -  -2%
MedSloppyPhrase       7.69         0.01         7.30            0.13   -6% -  -3%
HighSloppyPhrase      1.93         0.02         1.83            0.04   -8% -  -1%
LowSloppyPhrase       6.84         0.03         6.57            0.11   -6% -  -1%
Fuzzy1               65.49         0.85        63.50            1.68   -6% -   0%
HighPhrase            1.57         0.04         1.53            0.04   -7% -   3%
OrHighLow            22.89         0.98        22.38            0.61   -8% -   4%
OrHighMed            17.65         0.70        17.27            0.43   -8% -   4%
IntNRQ                9.50         0.48         9.33            0.36  -10% -   7%
OrHighHigh            8.98         0.36         8.84            0.19   -7% -   4%
HighTerm             29.60         2.64        29.16            1.44  -13% -  13%
Fuzzy2               65.54         0.86        64.63            2.13   -5% -   3%
Wildcard             45.27         1.27        44.78            0.48   -4% -   2%
MedTerm             150.40        12.65       148.99            6.63  -12% -  12%
Prefix3              72.55         2.55        72.31            1.02   -5% -   4%
LowTerm             421.62        38.27       422.40            9.47  -10% -  12%
LowSpanNear           7.55         0.34         7.62            0.22   -6% -   8%
HighSpanNear          1.34         0.09         1.35            0.06   -9% -  12%
MedPhrase            12.45         0.24        12.66            0.13   -1% -   4%
Respell              59.54         1.80        60.95            1.86   -3% -   8%
MedSpanNear           3.70         0.24         3.80            0.15   -7% -  14%
PKLookup            154.56         2.45       158.96            1.89    0% -   5%
LowPhrase            20.21         0.33        20.95            0.15    1% -   6%
AndHighLow          577.81        12.46       637.96           29.80    3% -  18%
{noformat}
And 128 (base) vs 32:
{noformat}
Task              QPS base  StdDev base  QPS block64  StdDev block64     Pct diff
AndHighHigh          23.86         0.52        20.68            0.59  -17% -  -8%
IntNRQ                9.48         0.38         8.84            0.46  -15% -   2%
HighSloppyPhrase      1.87         0.04         1.76            0.06  -11% -   0%
Prefix3              72.65         2.18        68.24            2.96  -12% -   1%
HighTerm             29.91         1.40        28.28            2.94  -19% -   9%
Wildcard             44.74         0.83        42.43            1.49  -10% -   0%
HighSpanNear          1.37         0.08         1.30            0.07  -15% -   6%
MedTerm             152.73         5.28       145.45           14.69  -17% -   8%
MedSloppyPhrase       7.46         0.12         7.12            0.25   -9% -   0%
HighPhrase            1.57         0.03         1.50            0.01   -7% -  -1%
OrHighLow            22.94         0.70        22.00            1.10  -11% -   3%
AndHighMed           58.72         1.79        56.60            1.95   -9% -   2%
LowSloppyPhrase       6.67         0.10         6.44            0.20   -7% -   1%
OrHighMed            17.52         0.56        17.00            0.82  -10% -   5%
LowSpanNear           7.53         0.35         7.34            0.39  -11% -   7%
OrHighHigh            8.84         0.31         8.62            0.43  -10% -   6%
MedSpanNear           3.79         0.20         3.71            0.21  -12% -   9%
PKLookup            153.34         3.22       150.19            4.91   -7% -   3%
Fuzzy1               62.93         1.77        62.28            2.23   -7% -   5%
LowTerm             410.23        21.57       410.83           35.19  -13% -  14%
MedPhrase            12.55         0.14        12.65            0.08    0% -   2%
LowPhrase            20.42         0.17        20.77            0.21    0% -   3%
Fuzzy2               61.44         3.12        64.13            1.97   -3% -  13%
Respell              56.65         3.29        60.21            1.39   -1% -  15%
AndHighLow          588.05        12.37       720.63           19.33   16% -  28%
{noformat}
It looks like smaller blocks give some speedup for AndHighLow and
LowPhrase, but slowdowns in other (harder) queries, so I think net/net
we should leave the block size at 128.
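
The Pct diff ranges in the tables above look consistent with comparing the two QPS means after shifting each by one standard deviation in opposite directions, then truncating to a whole percent. Below is a minimal Python sketch of that reading. The function name `pct_diff_range` and the formula itself are assumptions inferred from the reported numbers, not taken from the benchmark tool, and because the tables show values rounded to two decimals, a few rows may come out one percent off.

```python
def pct_diff_range(qps_base, std_base, qps_cmp, std_cmp):
    """Return (worst, best) percent change of the competitor vs. the base.

    Assumed formula (inferred from the tables, not from the benchmark tool):
    worst case pits the competitor's low end against the base's high end,
    best case pits the competitor's high end against the base's low end.
    int() truncates toward zero, which matches most reported rows.
    """
    worst = int(100.0 * ((qps_cmp - std_cmp) / (qps_base + std_base) - 1.0))
    best = int(100.0 * ((qps_cmp + std_cmp) / (qps_base - std_base) - 1.0))
    return worst, best


# AndHighHigh, block size 128 (base) vs 64
print(pct_diff_range(23.91, 0.57, 22.28, 0.27))     # (-10, -3)
# AndHighLow, block size 128 (base) vs 64
print(pct_diff_range(577.81, 12.46, 637.96, 29.80))  # (3, 18)
```

Read this way, a range that straddles 0% (for example LowTerm at -10% to 12%) is within noise, while a range that stays on one side (AndHighHigh at -10% to -3%) indicates a change larger than the measured run-to-run variation.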
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta,
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-blockFor&hardcode(base).patch,
> LUCENE-3892-blockFor&packedecoder(comp).patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-handle_open_files.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch,
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.