[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Han Jiang updated LUCENE-3892:
------------------------------
Attachment: LUCENE-3892_for.patch
LUCENE-3892_pfor.patch
The new "3892_pfor" patch includes the "SuppressingCodec" fixes made over the
last two weeks. The "3892_for" patch is a quick implementation of a "For"
postings format on top of the current code. The two patches are kept separate
for now, to avoid the performance penalty that method overriding would cause.
So far, block sizes from 32 to 128 have been tested with both patches.
However, for skipping-intensive queries, a smaller block size brings no
significant performance gain.
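To illustrate why a smaller block size might be expected to help skipping, here is a toy model in Java. It is only a sketch: the class and method names are hypothetical, and it assumes a whole block is decoded whenever a target position falls inside it. It ignores skip-list traversal and per-block overhead, so it says nothing about the actual cost in the patches.

```java
// Toy model, not code from the patches: all names here are hypothetical.
final class BlockSkipSketch {

    /**
     * Counts how many values are decoded while advancing through the given
     * positions (ordinals in one postings list, in increasing order),
     * assuming every block that contains a target is decoded in full.
     */
    static long decodedValues(int[] targetPositions, int blockSize) {
        long decoded = 0;
        int lastBlock = -1;
        for (int pos : targetPositions) {
            int block = pos / blockSize;
            if (block != lastBlock) {   // first target landing in this block
                decoded += blockSize;
                lastBlock = block;
            }
        }
        return decoded;
    }

    public static void main(String[] args) {
        int[] sparseTargets = {5, 200, 1000, 5000};
        System.out.println(decodedValues(sparseTargets, 128)); // prints 512
        System.out.println(decodedValues(sparseTargets, 32));  // prints 128
    }
}
```

In this model sparse targets decode fewer values with smaller blocks, while dense targets decode about the same amount either way.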
Here is an earlier result for PFor with blockSize=64, compared with
blockSize=128 (in parentheses):
{noformat}
Task            QPS Base  StdDev Base  QPS PFor  StdDev PFor     Pct diff  (blockSize=128)
Phrase              4.93         0.36      3.10         0.33  -47% - -25%  (-47% - -25%)
AndHighMed         27.92         2.26     19.16         1.72  -42% - -18%  (-37% - -15%)
SpanNear            2.73         0.16      1.96         0.24  -40% - -14%  (-36% - -13%)
SloppyPhrase        4.19         0.21      3.20         0.30  -34% - -12%  (-30% -  -6%)
Wildcard           19.44         0.87     17.11         0.94  -20% -  -2%  (-17% -   3%)
AndHighHigh         7.50         0.38      6.61         0.59  -23% -   1%  (-19% -   6%)
IntNRQ              4.06         0.52      3.88         0.35  -22% -  19%  (-16% -  24%)
Prefix3            31.00         1.69     30.45         2.29  -13% -  11%  ( -6% -  20%)
OrHighHigh          4.16         0.47      4.11         0.34  -18% -  20%  (-14% -  27%)
OrHighMed           4.98         0.59      4.94         0.41  -18% -  22%  (-14% -  27%)
Respell            40.29         2.11     40.11         2.13  -10% -  10%  (-15% -   2%)
TermBGroup1M       20.50         0.32     20.52         0.80   -5% -   5%  (  1% -  10%)
TermGroup1M        13.51         0.43     13.61         0.40   -5% -   7%  (  1% -   9%)
Fuzzy1             43.20         1.83     44.02         1.95   -6% -  11%  (-11% -   1%)
PKLookup           87.16         1.78     89.52         0.94    0% -   5%  ( -2% -   7%)
Fuzzy2             16.09         0.80     16.54         0.77   -6% -  13%  (-11% -   6%)
Term               43.56         1.53     45.26         3.84   -8% -  16%  (  2% -  26%)
TermBGroup1M1P     21.33         0.64     22.24         1.23   -4% -  13%  (  0% -  14%)
{noformat}
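For reading the "Pct diff" column: the ranges are consistent with a worst case and a best case built from each mean QPS plus or minus one standard deviation, truncated to an integer. The Java sketch below rests on that assumption (the formula is inferred from the numbers, and the class and method names are hypothetical). It reproduces the AndHighMed row above.

```java
// Sketch under an assumed formula; names are hypothetical.
final class PctDiffRange {

    /** Candidate at mean - stddev compared with baseline at mean + stddev. */
    static int worst(double qpsBase, double sdBase, double qpsCand, double sdCand) {
        double baseBest = qpsBase + sdBase;
        double candWorst = qpsCand - sdCand;
        return (int) (100.0 * (candWorst - baseBest) / baseBest); // truncates toward zero
    }

    /** Candidate at mean + stddev compared with baseline at mean - stddev. */
    static int best(double qpsBase, double sdBase, double qpsCand, double sdCand) {
        double baseWorst = qpsBase - sdBase;
        double candBest = qpsCand + sdCand;
        return (int) (100.0 * (candBest - baseWorst) / baseWorst); // truncates toward zero
    }

    public static void main(String[] args) {
        // AndHighMed, blockSize=64: base 27.92 +/- 2.26, PFor 19.16 +/- 1.72
        System.out.println(worst(27.92, 2.26, 19.16, 1.72) + "% - "
                + best(27.92, 2.26, 19.16, 1.72) + "%"); // prints -42% - -18%
    }
}
```

A range that straddles 0% (for example Term) therefore means the difference is within one standard deviation of noise.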
Also, the For postings format shows little change in performance compared
with PFor, so I suppose the bottleneck isn't in the method
PForUtil.patchException.
Here is an example for For with blockSize=64:
{noformat}
Task            QPS Base  StdDev Base   QPS For   StdDev For     Pct diff
Phrase              5.03         0.45      3.30         0.43  -47% - -18%
AndHighMed         28.05         2.33     18.83         1.77  -43% - -19%
SpanNear            2.69         0.18      1.94         0.25  -40% - -12%
SloppyPhrase        4.19         0.20      3.22         0.35  -34% - -10%
AndHighHigh         7.61         0.46      6.41         0.54  -27% -  -2%
Respell            41.36         1.65     37.94         2.42  -17% -   1%
Wildcard           19.20         0.77     17.89         0.99  -15% -   2%
OrHighHigh          4.22         0.37      3.94         0.32  -21% -  10%
OrHighMed           5.06         0.46      4.73         0.39  -21% -  11%
Fuzzy1             44.15         1.31     42.38         1.74  -10% -   2%
Fuzzy2             16.48         0.59     15.84         0.76  -11% -   4%
TermGroup1M        13.32         0.35     13.44         0.53   -5% -   7%
PKLookup           87.70         1.81     88.62         1.22   -2% -   4%
TermBGroup1M       20.14         0.47     20.40         0.59   -3% -   6%
Prefix3            30.31         1.49     31.08         2.26   -9% -  15%
TermBGroup1M1P     21.13         0.46     21.79         1.42   -5% -  12%
IntNRQ              3.96         0.45      4.14         0.46  -16% -  31%
Term               43.07         1.51     46.06         4.50   -6% -  21%
{noformat}
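For context on the For/PFor comparison above, here is a simplified Java sketch of the general difference between the two encodings. It is not the code in the patches: every name is hypothetical, values are assumed to be non-negative, blocks are assumed to be non-empty, and real bit packing is left out. For picks one bit width that fits every value in the block. PFor picks a smaller width that fits most values and stores the rest as exceptions, which a patch step restores after decoding.

```java
import java.util.Arrays;

// Simplified sketch; all names are hypothetical and not taken from the patches.
final class PForSketch {
    final int[] low;     // every value truncated to the chosen bit width
    final int[] excPos;  // positions of values that did not fit
    final int[] excVal;  // full original values at those positions

    private PForSketch(int[] low, int[] excPos, int[] excVal) {
        this.low = low;
        this.excPos = excPos;
        this.excVal = excVal;
    }

    /** Bits needed for a non-negative value (at least 1). */
    static int bitsRequired(int v) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(v));
    }

    /** For: one width large enough for the biggest value, so no exceptions. */
    static int forBits(int[] block) {
        int bits = 1;
        for (int v : block) bits = Math.max(bits, bitsRequired(v));
        return bits;
    }

    /** PFor: smallest width covering at least coveragePercent of the values. */
    static int pforBits(int[] block, int coveragePercent) {
        int[] widths = new int[block.length];
        for (int i = 0; i < block.length; i++) widths[i] = bitsRequired(block[i]);
        Arrays.sort(widths);
        int covered = (block.length * coveragePercent + 99) / 100; // ceiling
        return widths[Math.max(0, covered - 1)];
    }

    static PForSketch encode(int[] block, int bits) {
        int mask = bits >= 32 ? -1 : (1 << bits) - 1;
        int[] low = new int[block.length];
        int[] pos = new int[block.length];
        int[] val = new int[block.length];
        int n = 0;
        for (int i = 0; i < block.length; i++) {
            low[i] = block[i] & mask;
            if (bitsRequired(block[i]) > bits) { // does not fit: record exception
                pos[n] = i;
                val[n] = block[i];
                n++;
            }
        }
        return new PForSketch(low, Arrays.copyOf(pos, n), Arrays.copyOf(val, n));
    }

    /** Decode, then patch exceptions; with For the loop body never runs. */
    int[] decode() {
        int[] out = low.clone();
        for (int i = 0; i < excPos.length; i++) out[excPos[i]] = excVal[i];
        return out;
    }
}
```

With the block {3, 1, 2, 1000, 1, 2, 3, 1, 2, 1}, forBits gives 10 while pforBits at 90% gives 2 with a single exception. If an encoding with no patch step is about as slow as one with it, the patch step is unlikely to be the main cost.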
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta,
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892_for.patch, LUCENE-3892_pfor.patch,
> LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_settings.patch,
> LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.