[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124385#comment-13124385 ]
Michael McCandless commented on LUCENE-1536:
--------------------------------------------
I also benchmarked Robert's patch (with verifyScores turned off in
lucenebench because of LUCENE-3503); the results look very similar:
{noformat}
Task                  QPS base  StdDev base  QPS filterlow  StdDev filterlow        Pct diff
PhraseF0.5               20.18         0.65           8.05              0.56    -64% -  -55%
PhraseF1.0               12.26         0.33           7.96              0.54    -41% -  -28%
AndHighHighF95.0         16.56         0.13          15.98              1.09    -10% -    3%
Fuzzy2F99.0              80.52         4.67          77.72              2.34    -11% -    5%
AndHighHighF99.0         16.55         0.12          15.97              1.05    -10% -    3%
AndHighHighF100.0        16.54         0.13          15.98              1.06    -10% -    3%
Fuzzy2F100.0             80.32         4.60          77.64              2.34    -11% -    5%
Fuzzy2F90.0              80.80         5.17          78.19              2.77    -12% -    7%
AndHighHighF90.0         16.57         0.15          16.05              1.13    -10% -    4%
OrHighHighF0.1           72.17         3.60          70.11              3.69    -12% -    7%
OrHighHighF0.5           29.26         1.23          28.44              1.50    -11% -    6%
Fuzzy2F95.0              79.95         4.49          77.86              2.10    -10% -    5%
WildcardF0.1             59.21         4.21          58.01              3.42    -13% -   11%
WildcardF0.5             54.94         3.78          53.88              3.08    -13% -   11%
WildcardF1.0             51.31         3.31          50.35              2.44    -12% -    9%
WildcardF2.0             46.99         2.93          46.13              2.15    -11% -    9%
Wildcard                 38.73         1.94          38.14              1.78    -10% -    8%
Fuzzy2F75.0              80.57         5.03          79.38              2.04     -9% -    7%
AndHighHighF75.0         16.63         0.14          16.41              1.21     -9% -    6%
SloppyPhraseF100.0        7.73         0.15           7.64              0.25     -6% -    4%
SloppyPhraseF99.0         7.74         0.15           7.66              0.26     -6% -    4%
TermF0.1                328.10        15.20         325.54             16.82    -10% -    9%
OrHighHigh               10.68         1.11          10.61              0.75    -16% -   18%
TermF0.5                127.55         3.70         126.88              6.02     -7% -    7%
PhraseF0.1               63.93         2.25          63.62              2.87     -8% -    7%
PhraseF2.0                7.88         0.19           7.86              0.31     -6% -    6%
AndHighHighF0.1         129.64         5.02         129.28              6.98     -9% -    9%
SloppyPhraseF0.1         53.80         0.79          53.86              1.84     -4% -    5%
SloppyPhraseF95.0         7.74         0.15           7.75              0.27     -5% -    5%
SloppyPhraseF0.5         18.44         0.31          18.47              0.64     -4% -    5%
SloppyPhraseF1.0         13.10         0.23          13.13              0.47     -5% -    5%
SloppyPhrase              7.81         0.10           7.83              0.30     -4% -    5%
AndHighHighF0.5          47.61         1.00          47.76              2.33     -6% -    7%
Fuzzy2F1.0               81.49         4.85          81.96              0.96     -6% -    8%
Fuzzy1                   47.97         3.71          48.35              1.94    -10% -   13%
Fuzzy1F0.1               64.31         3.56          64.82              0.83     -5% -    8%
Fuzzy2                   80.93         6.15          81.61              1.74     -8% -   11%
Phrase                    3.58         0.10           3.63              0.18     -6% -    9%
SpanNearF100.0            2.98         0.10           3.03              0.12     -5% -    9%
SloppyPhraseF90.0         7.74         0.15           7.87              0.28     -3% -    7%
AndHighHigh              17.31         0.24          17.62              0.64     -3% -    6%
Fuzzy2F0.1               89.54         5.78          91.38              1.44     -5% -   10%
SpanNearF99.0             2.98         0.09           3.04              0.13     -5% -    9%
Term                     58.94         6.06          60.38              4.40    -13% -   22%
SpanNearF0.1             29.91         1.07          30.70              1.43     -5% -   11%
SpanNearF0.5              8.73         0.30           8.98              0.41     -5% -   11%
SpanNearF5.0              3.33         0.11           3.42              0.16     -5% -   11%
Fuzzy2F50.0              80.90         5.19          83.29              2.28     -5% -   13%
SpanNear                  3.01         0.10           3.10              0.14     -4% -   11%
TermF1.0                 87.07         2.01          89.92              6.38     -6% -   13%
SpanNearF95.0             2.98         0.10           3.10              0.13     -3% -   12%
PhraseF100.0              3.37         0.06           3.51              0.17     -2% -   11%
PhraseF99.0               3.37         0.05           3.52              0.17     -2% -   11%
PhraseF95.0               3.37         0.06           3.56              0.18     -1% -   12%
PKLookup                126.08         5.73         133.37              1.97      0% -   12%
SpanNearF90.0             2.98         0.10           3.18              0.14     -1% -   15%
PhraseF90.0               3.38         0.06           3.61              0.18      0% -   14%
WildcardF100.0           32.22         1.76          34.59              1.43     -2% -   18%
WildcardF99.0            32.23         1.79          34.61              1.39     -2% -   18%
SloppyPhraseF75.0         7.74         0.16           8.32              0.33      1% -   14%
WildcardF95.0            32.15         1.83          34.72              1.37     -1% -   19%
WildcardF90.0            32.10         1.82          34.90              1.29      0% -   19%
Fuzzy1F100.0             42.36         1.85          46.19              2.07      0% -   19%
AndHighHighF50.0         16.76         0.10          18.30              1.44      0% -   18%
Fuzzy1F99.0              42.21         1.84          46.21              1.96      0% -   19%
Fuzzy2F20.0              81.24         5.06          88.97              2.11      0% -   19%
Fuzzy1F95.0              42.25         1.85          46.54              2.10      0% -   20%
WildcardF75.0            31.98         1.81          35.49              1.32      1% -   22%
Fuzzy1F90.0              42.15         1.77          46.84              1.91      2% -   20%
PhraseF75.0               3.39         0.06           3.82              0.20      4% -   20%
Fuzzy1F75.0              42.01         1.55          47.63              1.98      4% -   22%
OrHighHighF1.0           22.54         0.94          25.87              1.81      2% -   28%
Fuzzy2F10.0              81.10         5.01          93.81              2.75      5% -   26%
WildcardF50.0            32.66         1.88          37.81              1.33      5% -   27%
Fuzzy1F50.0              42.25         1.68          49.68              1.91      8% -   27%
SpanNearF75.0             2.98         0.10           3.51              0.16      9% -   27%
Fuzzy2F5.0               80.39         4.38          96.21              2.19     10% -   29%
Fuzzy2F0.5               83.14         4.67          99.73              1.75     11% -   29%
Fuzzy2F2.0               80.95         4.92          98.00              1.76     12% -   31%
SloppyPhraseF50.0         7.78         0.16           9.62              0.43     15% -   31%
PhraseF50.0               3.45         0.06           4.36              0.24     17% -   35%
WildcardF20.0            35.76         2.01          45.85              1.89     16% -   41%
WildcardF5.0             41.47         2.41          53.54              2.38     16% -   43%
Fuzzy1F20.0              43.60         1.76          57.50              2.00     22% -   42%
WildcardF10.0            38.26         2.17          50.63              2.11     20% -   46%
TermF99.0                40.49         1.22          54.84              4.93     19% -   52%
TermF100.0               40.51         1.29          54.99              4.92     19% -   52%
TermF95.0                40.44         1.19          54.95              4.82     20% -   52%
TermF90.0                40.34         1.08          55.00              4.58     21% -   51%
OrHighHighF2.0           18.15         0.69          24.94              1.69     23% -   52%
TermF2.0                 63.47         1.48          87.39              5.94     25% -   50%
TermF75.0                40.05         0.92          55.28              4.38     24% -   52%
Fuzzy1F0.5               51.14         2.45          71.30              1.82     29% -   50%
OrHighHighF100.0          7.05         0.15           9.96              0.73     28% -   54%
OrHighHighF99.0           7.04         0.15           9.97              0.72     28% -   55%
TermF50.0                40.94         0.70          58.33              4.05     30% -   55%
Fuzzy1F10.0              43.92         1.78          62.74              1.47     34% -   52%
OrHighHighF95.0           7.08         0.14          10.12              0.70     30% -   56%
OrHighHighF90.0           7.10         0.15          10.31              0.71     32% -   58%
PhraseF5.0                5.02         0.10           7.33              0.48     33% -   58%
Fuzzy1F1.0               47.45         2.15          70.55              1.95     38% -   60%
Fuzzy1F5.0               44.47         1.99          66.38              1.89     38% -   60%
Fuzzy1F2.0               46.09         1.98          69.35              1.65     40% -   60%
SpanNearF50.0             2.98         0.10           4.51              0.23     39% -   64%
OrHighHighF75.0           7.20         0.15          10.97              0.73     39% -   65%
AndHighHighF20.0         16.92         0.14          26.80              2.77     40% -   76%
PhraseF20.0               3.69         0.06           5.86              0.36     46% -   71%
TermF20.0                42.65         0.76          69.54              4.48     49% -   76%
PhraseF10.0               4.10         0.07           6.74              0.43     51% -   78%
OrHighHighF50.0           7.61         0.17          12.77              0.76     54% -   81%
OrHighHighF5.0           13.68         0.48          23.13              1.55     52% -   86%
TermF5.0                 47.37         1.30          81.16              5.35     55% -   87%
TermF10.0                43.07         0.95          74.83              4.64     59% -   88%
AndHighHighF1.0          32.98         0.48          59.25              8.46     51% -  108%
SloppyPhraseF20.0         8.00         0.16          14.72              0.84     70% -   98%
OrHighHighF10.0          11.20         0.34          21.25              1.38     72% -  108%
OrHighHighF20.0           9.32         0.22          18.10              1.12     77% -  111%
AndHighHighF10.0         17.54         0.16          35.08              4.05     75% -  125%
AndHighHighF2.0          24.58         0.27          52.49              7.16     82% -  145%
AndHighHighF5.0          19.26         0.17          43.11              5.37     94% -  154%
SloppyPhraseF10.0         8.24         0.16          19.96              1.29    122% -  162%
SpanNearF20.0             3.01         0.10           8.24              0.48    149% -  199%
SloppyPhraseF5.0          8.75         0.17          26.13              1.80    172% -  225%
SloppyPhraseF2.0         10.35         0.20          33.95              2.51    198% -  259%
SpanNearF10.0             3.09         0.10          12.23              0.76    259% -  334%
SpanNearF1.0              5.75         0.19          30.48              2.42    372% -  492%
SpanNearF2.0              4.21         0.13          24.77              1.80    428% -  551%
{noformat}
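
The Pct diff column is a range rather than a single number. One reading that reproduces the printed bounds on rows such as PhraseF0.5 and AndHighHighF95.0 is: take the patched QPS one standard deviation down against the baseline QPS one standard deviation up for the lower bound, the reverse for the upper bound, and truncate toward zero. This is inferred from the numbers in the table, not quoted from lucenebench, and the class and method names below are hypothetical. Because the table shows rounded values, recomputing from them can land a point away from the printed bound on some rows.

```java
public class PctDiffRange {
    /** Percent change of cmp relative to base, truncated toward zero. */
    public static int pctChange(double base, double cmp) {
        return (int) (100.0 * (cmp - base) / base);
    }

    /** Lower bound: pessimistic patched QPS vs. optimistic baseline QPS. */
    public static int worstCase(double qpsBase, double sdBase,
                                double qpsCmp, double sdCmp) {
        return pctChange(qpsBase + sdBase, qpsCmp - sdCmp);
    }

    /** Upper bound: optimistic patched QPS vs. pessimistic baseline QPS. */
    public static int bestCase(double qpsBase, double sdBase,
                               double qpsCmp, double sdCmp) {
        return pctChange(qpsBase - sdBase, qpsCmp + sdCmp);
    }

    public static void main(String[] args) {
        // PhraseF0.5 row: base 20.18 +/- 0.65, filterlow 8.05 +/- 0.56
        System.out.printf("%d%% - %d%%%n",
            worstCase(20.18, 0.65, 8.05, 0.56),
            bestCase(20.18, 0.65, 8.05, 0.56));   // prints "-64% - -55%"
    }
}
```

Read this way, a row is a clear win or loss only when both bounds have the same sign; a range that straddles zero (for example -10% - 3%) is within the noise of the run.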
> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 2.4
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536_hack.patch, changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
> * Index is the first 2M docs of Wikipedia. Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, Java 1.6.0_07-b06-153.
> * I test across multiple queries. 1-X means an OR query, e.g. 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, i.e. 1 AND 2
> AND 3 AND 4. "u s" means "united states" (phrase search).
> * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.99999 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
> * Method high means I use the random-access filter API in
> IndexSearcher's main loop. Method low means I use the random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
> * Baseline (QPS) is current trunk, where the filter is applied as an
> iterator up "high" (i.e. in IndexSearcher's search loop).
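
To make the two styles compared in the notes above concrete, here is a minimal sketch of applying a filter to a sorted stream of matching doc IDs, once by random access (test the filter bit for each hit) and once as an iterator (advance both streams until they agree). It uses only java.util.BitSet as a stand-in for the filter; the class and method names are hypothetical and none of this is Lucene's actual API or the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class FilterApplySketch {
    /** Random-access style: check the filter bit for each doc the query matches. */
    public static List<Integer> randomAccess(int[] queryDocs, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : queryDocs) {
            if (filter.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Iterator style: leapfrog the query stream and the filter's set bits. */
    public static List<Integer> iterator(int[] queryDocs, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int filterDoc = filter.nextSetBit(0);          // -1 when exhausted
        while (i < queryDocs.length && filterDoc >= 0) {
            int queryDoc = queryDocs[i];
            if (queryDoc == filterDoc) {
                hits.add(queryDoc);                    // both streams agree
                i++;
                filterDoc = filter.nextSetBit(filterDoc + 1);
            } else if (queryDoc < filterDoc) {
                i++;                                   // query is behind, advance it
            } else {
                filterDoc = filter.nextSetBit(queryDoc); // filter is behind, skip ahead
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet filter = new BitSet();
        filter.set(4);
        filter.set(5);
        filter.set(9);
        int[] queryDocs = {1, 4, 7, 9};                // sorted doc IDs matched by the query
        System.out.println(randomAccess(queryDocs, filter)); // prints [4, 9]
        System.out.println(iterator(queryDocs, filter));     // prints [4, 9]
    }
}
```

Both methods return the same hits. The difference is the work per document: the random-access version does one bit test per query hit, while the iterator version pays for advancing and comparing two streams.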