[
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123072#comment-13123072
]
Robert Muir commented on LUCENE-1536:
-------------------------------------
Here are the results. F0.1, for example, means a filter accepting a random 0.1% of
documents.
{noformat}
Task                  QPS trunk  StdDev trunk  QPS patch  StdDev patch       Pct diff
PhraseF0.1                67.61          1.89      29.85          2.52   -60% -  -50%
PhraseF0.5                20.08          0.72      13.09          1.11   -42% -  -26%
PhraseF1.0                12.37          0.46       8.84          0.88   -37% -  -18%
OrHighHighF0.1            78.84          1.19      59.96          2.87   -28% -  -19%
TermF0.5                 133.27          4.80     125.91          7.29   -14% -    3%
OrHighHigh                12.73          0.45      12.13          0.92   -14% -    6%
Fuzzy1                    57.63          1.70      56.62          2.33    -8% -    5%
Fuzzy2                    96.92          2.25      96.19          2.63    -5% -    4%
AndHighHighF100.0         16.99          0.50      16.92          1.38   -11% -   10%
AndHighHighF99.0          17.00          0.48      16.94          1.37   -10% -   10%
AndHighHighF95.0          17.00          0.48      16.98          1.35   -10% -   10%
Fuzzy2F0.1               107.24          2.74     107.29          2.68    -4% -    5%
AndHighHighF90.0          17.04          0.47      17.13          1.36    -9% -   11%
Fuzzy1F0.1                74.60          1.58      75.03          1.55    -3% -    4%
SloppyPhraseF100.0         7.82          0.16       7.89          0.24    -4% -    6%
SloppyPhraseF99.0          7.82          0.16       7.92          0.23    -3% -    6%
Fuzzy2F100.0              97.16          2.31      98.43          2.19    -3% -    6%
PKLookup                 171.71          6.83     174.15          7.28    -6% -   10%
WildcardF0.1              67.96          1.06      69.08          1.95    -2% -    6%
Wildcard                  43.40          0.89      44.13          0.92    -2% -    5%
Fuzzy2F99.0               96.83          2.46      98.49          2.21    -3% -    6%
Fuzzy2F95.0               97.01          2.47      98.79          2.18    -2% -    6%
SpanNearF100.0             3.11          0.04       3.18          0.09    -1% -    6%
AndHighHighF75.0          17.13          0.48      17.57          1.36    -7% -   13%
Fuzzy2F90.0               97.01          2.53      99.49          2.10    -2% -    7%
OrHighHighF0.5            31.57          0.45      32.41          1.07    -2% -    7%
SloppyPhraseF95.0          7.82          0.18       8.03          0.25    -2% -    8%
SpanNearF99.0              3.11          0.04       3.20          0.09    -1% -    7%
AndHighHighF0.1          136.96          3.21     140.94          5.15    -3% -    9%
SloppyPhraseF0.1          56.27          0.88      57.97          1.47    -1% -    7%
Fuzzy2F0.5               100.39          2.48     103.57          2.47    -1% -    8%
PhraseF2.0                 7.95          0.31       8.20          0.65    -8% -   15%
AndHighHigh               17.97          0.46      18.55          0.84    -3% -   10%
TermF0.1                 351.76          9.38     363.42         16.25    -3% -   10%
SloppyPhrase               7.90          0.16       8.19          0.19     0% -    8%
Phrase                     3.69          0.12       3.83          0.13    -3% -   10%
WildcardF0.5              62.57          0.88      65.31          2.07     0% -    9%
SloppyPhraseF90.0          7.83          0.16       8.18          0.24     0% -    9%
Fuzzy2F75.0               96.77          2.46     101.14          2.41     0% -    9%
SpanNear                   3.15          0.04       3.30          0.07     1% -    8%
Term                      71.54          4.98      74.98          5.61    -9% -   21%
SpanNearF95.0              3.11          0.05       3.26          0.09     0% -    9%
PhraseF100.0               3.49          0.13       3.68          0.15    -2% -   14%
PhraseF99.0                3.49          0.12       3.69          0.15    -2% -   14%
SpanNearF0.1              31.54          0.48      33.49          0.73     2% -   10%
PhraseF95.0                3.49          0.12       3.72          0.16    -1% -   15%
SpanNearF90.0              3.12          0.04       3.35          0.09     3% -   11%
Fuzzy2F50.0               97.08          2.32     104.79          2.66     2% -   13%
PhraseF90.0                3.49          0.13       3.78          0.16     0% -   17%
Fuzzy1F100.0              47.68          1.41      52.27          1.08     4% -   15%
Fuzzy1F99.0               47.57          1.49      52.28          1.19     4% -   16%
AndHighHighF50.0          17.30          0.48      19.12          1.47     0% -   22%
WildcardF1.0              58.03          0.81      64.32          2.40     5% -   16%
Fuzzy1F95.0               47.59          1.50      52.84          1.17     5% -   17%
SloppyPhraseF75.0          7.85          0.15       8.73          0.24     6% -   16%
Fuzzy2F1.0                98.59          2.36     110.12          2.89     6% -   17%
Fuzzy1F90.0               47.51          1.40      53.54          1.09     7% -   18%
PhraseF75.0                3.51          0.13       3.98          0.18     4% -   22%
TermF1.0                  92.28          3.05     104.56          7.44     1% -   25%
WildcardF99.0             36.01          0.76      40.88          1.16     8% -   19%
Fuzzy1F0.5                59.00          1.10      67.10          1.36     9% -   18%
WildcardF100.0            35.92          0.79      40.86          1.19     8% -   19%
WildcardF95.0             36.01          0.75      41.02          1.19     8% -   19%
WildcardF90.0             36.06          0.70      41.14          1.20     8% -   19%
Fuzzy2F20.0               98.32          2.34     112.69          2.91     9% -   20%
WildcardF75.0             36.19          0.62      41.69          1.15    10% -   20%
AndHighHighF0.5           49.93          1.37      57.85          4.13     4% -   27%
Fuzzy1F75.0               47.25          1.50      55.55          1.11    11% -   23%
Fuzzy2F10.0               98.47          2.46     116.18          3.00    12% -   24%
WildcardF50.0             36.77          0.55      43.44          1.29    12% -   23%
OrHighHighF1.0            24.37          0.38      28.99          1.90     9% -   28%
Fuzzy1F2.0                52.64          1.05      63.12          1.32    15% -   24%
SpanNearF75.0              3.11          0.04       3.74          0.10    15% -   24%
Fuzzy2F5.0                97.96          2.31     118.02          3.48    14% -   27%
Fuzzy2F2.0                98.02          2.22     119.13          3.42    15% -   27%
OrHighHighF100.0           7.70          0.34       9.51          0.34    13% -   33%
OrHighHighF99.0            7.70          0.36       9.56          0.34    14% -   34%
Fuzzy1F50.0               47.46          1.24      59.15          1.18    19% -   30%
PhraseF50.0                3.57          0.12       4.45          0.23    14% -   35%
OrHighHighF95.0            7.73          0.35       9.73          0.35    16% -   36%
SloppyPhraseF50.0          7.92          0.16      10.09          0.28    21% -   33%
WildcardF2.0              53.32          0.69      68.29          3.44    20% -   36%
OrHighHighF90.0            7.77          0.35       9.97          0.35    18% -   39%
WildcardF20.0             41.13          0.60      54.63          2.12    25% -   39%
OrHighHighF75.0            7.91          0.32      10.73          0.36    26% -   45%
WildcardF5.0              47.44          0.57      65.42          3.11    29% -   46%
WildcardF10.0             44.01          0.53      61.16          2.61    31% -   46%
Fuzzy1F20.0               49.57          1.20      69.49          1.70    33% -   47%
Fuzzy1F1.0                54.39          1.07      76.95          2.03    35% -   48%
AndHighHighF1.0           34.63          1.07      50.01          4.02    28% -   60%
PhraseF5.0                 5.16          0.20       7.61          0.75    27% -   68%
Fuzzy1F10.0               50.23          1.07      75.36          2.11    42% -   57%
OrHighHighF50.0            8.36          0.29      12.58          0.48    39% -   61%
OrHighHighF2.0            19.65          0.34      29.58          2.27    36% -   65%
SpanNearF50.0              3.11          0.04       4.76          0.12    47% -   58%
TermF2.0                  68.99          2.38     106.22          8.65    36% -   72%
Fuzzy1F5.0                50.74          1.06      79.90          2.38    49% -   65%
PhraseF20.0                3.81          0.13       6.10          0.45    43% -   78%
TermF50.0                 42.19          1.41      67.96          4.63    45% -   77%
TermF75.0                 41.36          1.46      67.47          5.30    45% -   82%
TermF90.0                 41.05          1.47      68.08          5.85    46% -   86%
TermF95.0                 41.03          1.49      68.08          6.14    45% -   87%
PhraseF10.0                4.22          0.16       7.02          0.62    46% -   87%
TermF99.0                 40.99          1.56      68.31          6.21    45% -   89%
TermF100.0                40.88          1.61      68.28          6.32    45% -   89%
SloppyPhraseF0.5          18.81          0.30      31.53          0.96    59% -   75%
AndHighHighF20.0          17.62          0.52      30.63          2.79    53% -   95%
OrHighHighF5.0            14.99          0.29      27.44          1.98    66% -  100%
SpanNearF0.5               9.17          0.12      17.12          0.42    79% -   93%
TermF20.0                 45.25          1.50      84.63          6.04    68% -  107%
OrHighHighF20.0           10.35          0.25      19.60          1.08    74% -  104%
TermF5.0                  52.49          1.71      99.90          8.02    69% -  112%
AndHighHighF2.0           25.97          0.81      50.45          4.72    70% -  119%
OrHighHighF10.0           12.36          0.22      24.25          1.56    80% -  112%
TermF10.0                 46.97          1.47      92.60          7.08    76% -  119%
SloppyPhraseF20.0          8.18          0.16      16.35          0.58    89% -  111%
SpanNearF1.0               6.05          0.09      12.21          0.28    94% -  109%
AndHighHighF10.0          18.44          0.55      40.77          4.15    92% -  151%
AndHighHighF5.0           20.34          0.63      50.83          5.67   115% -  186%
SloppyPhraseF10.0          8.52          0.17      22.79          0.96   151% -  184%
SpanNearF20.0              3.15          0.05       9.03          0.24   174% -  198%
SloppyPhraseF1.0          13.62          0.23      42.77          2.29   192% -  236%
SpanNearF2.0               4.45          0.06      14.31          0.37   209% -  234%
SloppyPhraseF5.0           9.12          0.17      29.98          1.41   207% -  250%
SloppyPhraseF2.0          10.85          0.19      38.31          2.00   229% -  278%
SpanNearF10.0              3.25          0.05      13.71          0.39   303% -  339%
SpanNearF5.0               3.52          0.05      19.51          0.67   428% -  481%
{noformat}
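A note on reading the last column: the comment does not say how the Pct diff range is derived. The rows I checked are consistent with comparing the patch at minus one standard deviation against trunk at plus one standard deviation for the low end, and the reverse for the high end. The sketch below assumes that formula; the function name `pct_diff_range` is made up for illustration and is not part of the benchmark tooling.

```python
def pct_diff_range(qps_trunk, std_trunk, qps_patch, std_patch):
    """Return (low, high) percentage change of patch relative to trunk.

    Assumption (inferred from the table, not stated in the comment):
    low  = pessimistic patch vs. optimistic trunk,
    high = optimistic patch vs. pessimistic trunk,
    each taken at one standard deviation.
    """
    low = (qps_patch - std_patch) / (qps_trunk + std_trunk) - 1.0
    high = (qps_patch + std_patch) / (qps_trunk - std_trunk) - 1.0
    return 100.0 * low, 100.0 * high


# PhraseF0.1 row: trunk 67.61 +/- 1.89, patch 29.85 +/- 2.52
low, high = pct_diff_range(67.61, 1.89, 29.85, 2.52)
print(f"{low:.1f}% to {high:.1f}%")  # table reports -60% - -50%
```

Under this reading, a range that straddles zero (for example Term, -9% - 21%) sits within the measurement noise, while rows where both ends share a sign show a slowdown or speedup that is larger than the noise.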
> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 2.4
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
> * Index is first 2M docs of Wikipedia. Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
> * I test across multiple queries. 1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4. "u s" means "united states" (phrase search).
> * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.99999 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
> * Method high means I use random-access filter API in
> IndexSearcher's main loop. Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
> * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).
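To make the two ways of applying a filter in the quoted description concrete, here is a minimal sketch in plain Python. It is not Lucene code: the function names and the list-based doc-id representation are assumptions chosen for illustration. The iterator style advances two sorted doc-id streams in step, while the random-access style asks a bit set whether each candidate doc is accepted.

```python
def filter_by_iterator(query_docs, filter_docs):
    """Intersect two sorted doc-id lists by advancing whichever is behind."""
    result = []
    i = j = 0
    while i < len(query_docs) and j < len(filter_docs):
        if query_docs[i] == filter_docs[j]:
            result.append(query_docs[i])
            i += 1
            j += 1
        elif query_docs[i] < filter_docs[j]:
            i += 1  # query stream is behind, advance it
        else:
            j += 1  # filter stream is behind, advance it
    return result


def filter_by_random_access(query_docs, accepted_bits):
    """Keep each query hit whose bit is set; one lookup per candidate doc."""
    return [doc for doc in query_docs if accepted_bits[doc]]


max_doc = 10
filter_docs = [1, 3, 4, 7, 9]   # docs the filter accepts, sorted
query_docs = [0, 3, 4, 5, 9]    # docs the query matches, sorted

# Random-access form of the same filter: one boolean per document.
accepted_bits = [False] * max_doc
for doc in filter_docs:
    accepted_bits[doc] = True

print(filter_by_iterator(query_docs, filter_docs))         # [3, 4, 9]
print(filter_by_random_access(query_docs, accepted_bits))  # [3, 4, 9]
```

Both functions return the same hits. They differ in how much work is spent walking the filter: the iterator version steps through filter entries even where the query has no candidates, which is one plausible reason the benchmark favours random access at moderate filter densities.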
--
This message is automatically generated by JIRA.