[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430636#comment-13430636 ]
Michael McCandless commented on LUCENE-4283:
--------------------------------------------
Thanks Billy, that's a nice optimization! I think other postings formats
should do the same thing...
It seems to give a small gain to the skip-heavy queries:
{noformat}
Task              QPS base  StdDev base  QPS nextskip  StdDev nextskip  Pct diff
AndHighHigh          23.87         0.09         23.56             0.19  -2% - 0%
Fuzzy2               63.37         1.07         62.59             0.86  -4% - 1%
OrHighHigh           11.67         0.08         11.53             0.35  -4% - 2%
Fuzzy1               75.44         1.02         74.59             0.74  -3% - 1%
OrHighMed            24.14         0.18         23.89             0.72  -4% - 2%
Respell              62.66         0.65         62.04             1.37  -4% - 2%
OrHighLow            27.86         0.23         27.60             0.85  -4% - 2%
HighSloppyPhrase      2.00         0.04          1.99             0.05  -5% - 3%
HighSpanNear          1.70         0.02          1.69             0.01  -2% - 1%
LowTerm             517.40         1.67        514.32             2.68  -1% - 0%
LowSloppyPhrase       7.61         0.07          7.58             0.16  -3% - 2%
MedSloppyPhrase       6.90         0.09          6.88             0.13  -3% - 2%
PKLookup            192.23         1.99        191.81             3.80  -3% - 2%
Prefix3              82.35         0.63         82.36             1.06  -2% - 2%
Wildcard             52.49         0.44         52.54             0.41  -1% - 1%
HighTerm             36.03         0.11         36.09             0.03   0% - 0%
IntNRQ               11.56         0.07         11.58             0.03   0% - 1%
MedTerm             197.94         0.88        198.87             0.36   0% - 1%
MedSpanNear           4.84         0.07          4.86             0.03  -1% - 2%
LowSpanNear           9.49         0.26          9.64             0.01  -1% - 4%
LowPhrase            21.95         0.38         22.39             0.08   0% - 4%
AndHighLow          641.56        10.38        657.49             5.64   0% - 5%
MedPhrase            13.04         0.30         13.37             0.05   0% - 5%
AndHighMed           67.13         0.57         69.30             0.80   1% - 5%
HighPhrase            1.81         0.10          1.87             0.03  -3% - 11%
{noformat}
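For anyone reading the Pct diff column: the two numbers look like a worst-case and best-case change once one standard deviation is applied to each side. The sketch below is inferred from the figures in the table, not taken from the benchmark tool's source, and the function name pct_diff_range is made up for illustration.

```python
def pct_diff_range(qps_base, sd_base, qps_new, sd_new):
    """Return (low, high) percent change of new vs. base.

    Assumption (inferred from the table rows, not from the benchmark
    tool itself): the low bound compares the pessimistic new QPS with
    the optimistic base QPS, the high bound does the opposite, and
    both are truncated toward zero.
    """
    worst = (qps_new - sd_new) / (qps_base + sd_base) - 1.0
    best = (qps_new + sd_new) / (qps_base - sd_base) - 1.0
    return int(worst * 100), int(best * 100)


# Rows copied from the table above.
print(pct_diff_range(23.87, 0.09, 23.56, 0.19))  # AndHighHigh -> (-2, 0)
print(pct_diff_range(1.81, 0.10, 1.87, 0.03))    # HighPhrase  -> (-3, 11)
```

Under this reading, a row is a clear win only when the low bound is above zero (AndHighMed here); ranges that straddle zero are within the noise.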
> Support more frequent skip with Block Postings Format
> -----------------------------------------------------
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Han Jiang
> Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch,
> LUCENE-4283-codes-cleanup.patch, LUCENE-4283-record-next-skip.patch,
> LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch,
> LUCENE-4283-small-interval-partially.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize.
> Every time the skipper reaches the last level 0 skip point, we have to
> decode a whole block to read doc/freq data. Also, a higher-level skip list
> is created only for terms with df>blockSize^k, which means that for most
> terms, skipping is just a linear scan. If we increase the current blockSize
> for better bulk I/O performance, the current skip setting will become a
> bottleneck. For ForPF, the encoded block can easily be split if we set
> skipInterval=32*k.
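
To make the df>blockSize^k condition in the description concrete, here is a rough sketch that takes the condition literally. The function name highest_skip_level and the block size of 128 are assumptions for illustration; this is not the actual skip list writer code.

```python
def highest_skip_level(df, block_size):
    """Largest k such that df > block_size**k.

    Literal reading of the condition in the issue description.
    A result of 0 means the term gets no higher-level skip list
    under that reading.
    """
    k = 0
    power = block_size
    while df > power:
        k += 1
        power *= block_size
    return k


BLOCK_SIZE = 128  # assumed value, for illustration only

for df in (100, 1000, 20000, 3000000):
    # df // BLOCK_SIZE is the number of full blocks, i.e. how many
    # level 0 skip points exist when skipInterval == blockSize.
    print(df, df // BLOCK_SIZE, highest_skip_level(df, BLOCK_SIZE))
```

With these numbers a term needs more than 16384 documents before k reaches 2, which is the description's point: most terms only have a short run of level 0 entries to walk through, and raising blockSize pushes that threshold higher still.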