[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431486#comment-13431486 ]
Michael McCandless commented on LUCENE-4283:
--------------------------------------------
Thanks Billy, patch looks good... I also see some improvements in the
skip-heavy queries:
{noformat}
Task              QPS base  StdDev base  QPS for  StdDev for  Pct diff
HighSpanNear          1.70         0.05     1.66        0.02  -6% -  2%
PKLookup            192.84         3.29   190.09        2.97  -4% -  1%
MedSloppyPhrase       6.86         0.09     6.79        0.13  -4% -  2%
HighSloppyPhrase      1.97         0.04     1.96        0.08  -6% -  5%
MedSpanNear           4.88         0.12     4.85        0.06  -4% -  3%
OrHighMed            23.40         0.74    23.31        0.73  -6% -  6%
LowSloppyPhrase       7.58         0.12     7.56        0.18  -4% -  3%
OrHighLow            27.00         0.92    26.93        0.86  -6% -  6%
Wildcard             52.66         0.43    52.54        0.32  -1% -  1%
Prefix3              82.44         0.90    82.36        0.87  -2% -  2%
IntNRQ               11.61         0.02    11.60        0.02   0% -  0%
LowTerm             513.72         0.95   513.40        2.77   0% -  0%
OrHighHigh           11.27         0.35    11.27        0.35  -6% -  6%
HighTerm             36.10         0.07    36.10        0.03   0% -  0%
MedTerm             198.76         0.26   198.85        0.23   0% -  0%
Respell              61.52         1.12    61.88        0.36  -1% -  3%
Fuzzy1               74.60         1.37    75.07        0.58  -1% -  3%
Fuzzy2               62.36         1.33    63.09        0.33  -1% -  3%
AndHighHigh          23.62         0.08    24.07        0.21   0% -  3%
LowSpanNear           9.65         0.22     9.88        0.06   0% -  5%
LowPhrase            22.08         0.37    22.63        0.31   0% -  5%
HighPhrase            1.77         0.10     1.83        0.09  -6% - 14%
MedPhrase            13.09         0.29    13.54        0.25   0% -  7%
AndHighLow          662.00         1.45   700.98       24.76   1% -  9%
AndHighMed           69.58         0.18    75.15        1.28   5% - 10%
{noformat}
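
For readers checking the numbers: the "Pct diff" range in each row appears consistent with comparing the two runs at mean minus/plus one standard deviation and truncating to a whole percent. That reading is inferred from the figures above, not taken from the benchmark tool's source, and the class and method names below are hypothetical. A minimal sketch under that assumption:

```java
public class PctDiffRange {

    /**
     * Returns {worst, best} percentage change of the patched run ("for")
     * relative to the baseline ("base").
     *
     * Assumption: worst case pairs (new mean - stddev) with (base mean + stddev),
     * best case pairs (new mean + stddev) with (base mean - stddev), and the
     * result is truncated toward zero.
     */
    static int[] pctDiffRange(double qpsBase, double sdBase,
                              double qpsNew, double sdNew) {
        double worst = 100.0 * ((qpsNew - sdNew) / (qpsBase + sdBase) - 1.0);
        double best = 100.0 * ((qpsNew + sdNew) / (qpsBase - sdBase) - 1.0);
        return new int[] {(int) worst, (int) best};
    }

    public static void main(String[] args) {
        // AndHighMed row: 69.58 +/- 0.18 (base) vs 75.15 +/- 1.28 (patched)
        int[] r = pctDiffRange(69.58, 0.18, 75.15, 1.28);
        System.out.println(r[0] + "% - " + r[1] + "%");
    }
}
```

Because the printed QPS and StdDev values are themselves rounded to two decimals, a bound recomputed from them can differ from the table by one point (the upper bound of the HighSpanNear row is one such case).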
> Support more frequent skip with Block Postings Format
> -----------------------------------------------------
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Han Jiang
> Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch,
> LUCENE-4283-codes-cleanup.patch, LUCENE-4283-record-next-skip.patch,
> LUCENE-4283-record-skip&inlining-scanning.patch, LUCENE-4283-slow.patch,
> LUCENE-4283-small-interval-fully.patch,
> LUCENE-4283-small-interval-partially.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize.
> Every time the skipper reaches the last level 0 skip point, we have to
> decode a whole block to read the doc/freq data. Also, a higher-level skip
> list is created only for terms with df > blockSize^k, which means that for
> most terms skipping is just a linear scan. If we increase the current
> blockSize for better bulk I/O performance, the current skip setting will
> become a bottleneck.
> For ForPF, the encoded block can easily be split if we set
> skipInterval=32*k.
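
To make the df > blockSize^k condition in the description concrete, here is a minimal sketch that counts how many such thresholds a term's document frequency exceeds. It follows only the rule as stated in the description; the class and method names are hypothetical, the block size of 128 is an illustrative assumption, and this is not Lucene's actual skip list code.

```java
public class SkipLevels {

    /**
     * Counts the values k >= 1 for which df > blockSize^k, i.e. how many
     * skip levels the rule "df > blockSize^k" would grant a term.
     * Hypothetical helper for illustration only.
     */
    static int skipLevels(int df, int blockSize) {
        if (blockSize < 2) {
            throw new IllegalArgumentException("blockSize must be >= 2");
        }
        int levels = 0;
        // long arithmetic: threshold stays <= df before each multiply,
        // so threshold * blockSize cannot overflow a long.
        long threshold = blockSize;
        while (df > threshold) {
            levels++;
            threshold *= blockSize;
        }
        return levels;
    }

    public static void main(String[] args) {
        int blockSize = 128; // assumed value, for illustration
        int[] docFreqs = {100, 1_000, 20_000, 3_000_000};
        for (int df : docFreqs) {
            System.out.println("df=" + df + " levels=" + skipLevels(df, blockSize));
        }
    }
}
```

With a block size of 128, a term needs more than 16,384 documents to exceed the second threshold, which illustrates why the description says most terms end up with little or no skip structure once skipInterval is tied to blockSize.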