[ https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428698#comment-13428698 ]
Michael McCandless commented on LUCENE-4283:
--------------------------------------------
I tested the -fully patch:
{noformat}
Task                  QPS base  StdDev base    QPS comp  StdDev comp       Pct diff
AndHighLow              628.46         8.28      155.04         1.42   -75% -  -74%
LowSpanNear               5.07         0.02        4.85         0.10    -6% -   -2%
MedSpanNear               9.12         0.07        8.86         0.22    -5% -    0%
OrHighMed                26.16         1.15       25.53         2.65   -16% -   12%
AndHighMed               44.92         0.88       43.94         0.30    -4% -    0%
OrHighLow                38.76         1.70       37.97         4.03   -16% -   13%
OrHighHigh                9.57         0.45        9.40         1.02   -16% -   14%
HighTerm                 22.88         0.13       22.83         0.95    -4% -    4%
HighSloppyPhrase          2.14         0.10        2.14         0.11    -9% -   10%
LowSloppyPhrase           5.31         0.22        5.32         0.22    -7% -    8%
LowPhrase                 7.85         0.09        7.87         0.21    -3% -    3%
HighSpanNear              1.65         0.01        1.66         0.04    -2% -    3%
Respell                  77.70         1.24       78.14         2.12    -3% -    4%
MedTerm                 138.26         0.52      139.07         5.52    -3% -    4%
PKLookup                193.63         2.06      195.98         2.84    -1% -    3%
MedSloppyPhrase          12.15         0.34       12.33         0.48    -5% -    8%
LowTerm                 525.12         4.89      534.89        14.12    -1% -    5%
Fuzzy2                   87.20         2.05       89.05         3.27    -3% -    8%
Fuzzy1                   97.81         2.33       99.94         3.99    -4% -    8%
AndHighHigh              18.39         0.27       19.62         0.06     4% -    8%
MedPhrase                 5.09         0.11        5.52         0.33     0% -   17%
Wildcard                 67.59         0.58       73.76         3.37     3% -   15%
Prefix3                  25.51         0.39       29.54         1.60     7% -   23%
HighPhrase                3.55         0.12        4.13         0.33     3% -   30%
IntNRQ                    8.79         0.08       10.67         1.52     3% -   40%
{noformat}
It seems like we are getting some gains for Med/HighPhrase, but AndHighLow is
still way off (roughly 75% slower than base).
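
For readers of the table, here is a minimal sketch of how the "Pct diff" range can be read. It assumes (this is not stated in the comment) that the low end compares the comp QPS minus one standard deviation against the base QPS plus one standard deviation, and the high end does the reverse. The function name pct_diff_range is hypothetical. The published QPS and StdDev values are rounded, so a recomputed bound can differ by about a percentage point from the printed one.

```python
def pct_diff_range(qps_base, std_base, qps_comp, std_comp):
    """Return a (low, high) percentage-change range for comp vs. base.

    Assumption: each bound shifts both measurements by one standard
    deviation, in the direction least (low) or most (high) favorable
    to the comp run.
    """
    low = 100.0 * ((qps_comp - std_comp) / (qps_base + std_base) - 1.0)
    high = 100.0 * ((qps_comp + std_comp) / (qps_base - std_base) - 1.0)
    return low, high


if __name__ == "__main__":
    # AndHighLow row from the table above.
    low, high = pct_diff_range(628.46, 8.28, 155.04, 1.42)
    print("AndHighLow: %.1f%% - %.1f%%" % (low, high))
```

Under this reading, a row is a clear regression or a clear gain only when both ends of the range have the same sign, which is the case for AndHighLow and for HighPhrase.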
> Support more frequent skip with Block Postings Format
> -----------------------------------------------------
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Han Jiang
> Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch,
> LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch,
> LUCENE-4283-small-interval-partially.patch
>
>
> This change targets the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize.
> Every time the skipper reaches the last level 0 skip point, we have to
> decode a whole block to read doc/freq data. Also, a higher-level skip list
> is created only for terms with df>blockSize^k, which means that for most
> terms, skipping is just a linear scan. If we increase the current blockSize
> for better bulk I/O performance, the current skip setting will become a
> bottleneck.
> For ForPF, the encoded block can easily be split if we set
> skipInterval=32*k.
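
The effect of the skip interval described in the issue can be illustrated with a small model. This is a simplified sketch, not Lucene's skip list code: it assumes that level k holds one entry per skip_interval^(k+1) documents and exists only when df exceeds that step, as the description states. The names skip_levels and worst_case_skip_entries are hypothetical.

```python
def skip_levels(df, skip_interval):
    """Number of skip entries on each level, lowest level first.

    Simplified model: level k has one entry every skip_interval**(k+1)
    documents and is only created when df is larger than that step.
    """
    levels = []
    step = skip_interval
    while df > step:
        levels.append(df // step)
        step *= skip_interval
    return levels


def worst_case_skip_entries(df, skip_interval):
    """Rough upper bound on skip entries examined during one advance.

    Scan the whole top level, then at most skip_interval entries on
    each lower level while descending.
    """
    levels = skip_levels(df, skip_interval)
    if not levels:
        return 0
    return levels[-1] + (len(levels) - 1) * skip_interval


if __name__ == "__main__":
    df = 10000
    for interval in (128, 32):
        print(interval, skip_levels(df, interval),
              worst_case_skip_entries(df, interval))
```

In this model, a term with df=10000 and skipInterval=128 gets a single level of 78 entries, so skipping degenerates into a linear scan over that level. With skipInterval=32 the same term gets two levels (312 and 9 entries), and the bound on entries examined drops to 41.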