[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124385#comment-13124385 ]
Michael McCandless commented on LUCENE-1536:
--------------------------------------------

I also bench'd Robert's patch (turned off verifyScores in lucenebench because of LUCENE-3503); results look very similar:

{noformat}
Task                QPS base  StdDev base  QPS filterlow  StdDev filterlow        Pct diff
PhraseF0.5             20.18         0.65           8.05              0.56    -64% -  -55%
PhraseF1.0             12.26         0.33           7.96              0.54    -41% -  -28%
AndHighHighF95.0       16.56         0.13          15.98              1.09    -10% -    3%
Fuzzy2F99.0            80.52         4.67          77.72              2.34    -11% -    5%
AndHighHighF99.0       16.55         0.12          15.97              1.05    -10% -    3%
AndHighHighF100.0      16.54         0.13          15.98              1.06    -10% -    3%
Fuzzy2F100.0           80.32         4.60          77.64              2.34    -11% -    5%
Fuzzy2F90.0            80.80         5.17          78.19              2.77    -12% -    7%
AndHighHighF90.0       16.57         0.15          16.05              1.13    -10% -    4%
OrHighHighF0.1         72.17         3.60          70.11              3.69    -12% -    7%
OrHighHighF0.5         29.26         1.23          28.44              1.50    -11% -    6%
Fuzzy2F95.0            79.95         4.49          77.86              2.10    -10% -    5%
WildcardF0.1           59.21         4.21          58.01              3.42    -13% -   11%
WildcardF0.5           54.94         3.78          53.88              3.08    -13% -   11%
WildcardF1.0           51.31         3.31          50.35              2.44    -12% -    9%
WildcardF2.0           46.99         2.93          46.13              2.15    -11% -    9%
Wildcard               38.73         1.94          38.14              1.78    -10% -    8%
Fuzzy2F75.0            80.57         5.03          79.38              2.04     -9% -    7%
AndHighHighF75.0       16.63         0.14          16.41              1.21     -9% -    6%
SloppyPhraseF100.0      7.73         0.15           7.64              0.25     -6% -    4%
SloppyPhraseF99.0       7.74         0.15           7.66              0.26     -6% -    4%
TermF0.1              328.10        15.20         325.54             16.82    -10% -    9%
OrHighHigh             10.68         1.11          10.61              0.75    -16% -   18%
TermF0.5              127.55         3.70         126.88              6.02     -7% -    7%
PhraseF0.1             63.93         2.25          63.62              2.87     -8% -    7%
PhraseF2.0              7.88         0.19           7.86              0.31     -6% -    6%
AndHighHighF0.1       129.64         5.02         129.28              6.98     -9% -    9%
SloppyPhraseF0.1       53.80         0.79          53.86              1.84     -4% -    5%
SloppyPhraseF95.0       7.74         0.15           7.75              0.27     -5% -    5%
SloppyPhraseF0.5       18.44         0.31          18.47              0.64     -4% -    5%
SloppyPhraseF1.0       13.10         0.23          13.13              0.47     -5% -    5%
SloppyPhrase            7.81         0.10           7.83              0.30     -4% -    5%
AndHighHighF0.5        47.61         1.00          47.76              2.33     -6% -    7%
Fuzzy2F1.0             81.49         4.85          81.96              0.96     -6% -    8%
Fuzzy1                 47.97         3.71          48.35              1.94    -10% -   13%
Fuzzy1F0.1             64.31         3.56          64.82              0.83     -5% -    8%
Fuzzy2                 80.93         6.15          81.61              1.74     -8% -   11%
Phrase                  3.58         0.10           3.63              0.18     -6% -    9%
SpanNearF100.0          2.98         0.10           3.03              0.12     -5% -    9%
SloppyPhraseF90.0       7.74         0.15           7.87              0.28     -3% -    7%
AndHighHigh            17.31         0.24          17.62              0.64     -3% -    6%
Fuzzy2F0.1             89.54         5.78          91.38              1.44     -5% -   10%
SpanNearF99.0           2.98         0.09           3.04              0.13     -5% -    9%
Term                   58.94         6.06          60.38              4.40    -13% -   22%
SpanNearF0.1           29.91         1.07          30.70              1.43     -5% -   11%
SpanNearF0.5            8.73         0.30           8.98              0.41     -5% -   11%
SpanNearF5.0            3.33         0.11           3.42              0.16     -5% -   11%
Fuzzy2F50.0            80.90         5.19          83.29              2.28     -5% -   13%
SpanNear                3.01         0.10           3.10              0.14     -4% -   11%
TermF1.0               87.07         2.01          89.92              6.38     -6% -   13%
SpanNearF95.0           2.98         0.10           3.10              0.13     -3% -   12%
PhraseF100.0            3.37         0.06           3.51              0.17     -2% -   11%
PhraseF99.0             3.37         0.05           3.52              0.17     -2% -   11%
PhraseF95.0             3.37         0.06           3.56              0.18     -1% -   12%
PKLookup              126.08         5.73         133.37              1.97      0% -   12%
SpanNearF90.0           2.98         0.10           3.18              0.14     -1% -   15%
PhraseF90.0             3.38         0.06           3.61              0.18      0% -   14%
WildcardF100.0         32.22         1.76          34.59              1.43     -2% -   18%
WildcardF99.0          32.23         1.79          34.61              1.39     -2% -   18%
SloppyPhraseF75.0       7.74         0.16           8.32              0.33      1% -   14%
WildcardF95.0          32.15         1.83          34.72              1.37     -1% -   19%
WildcardF90.0          32.10         1.82          34.90              1.29      0% -   19%
Fuzzy1F100.0           42.36         1.85          46.19              2.07      0% -   19%
AndHighHighF50.0       16.76         0.10          18.30              1.44      0% -   18%
Fuzzy1F99.0            42.21         1.84          46.21              1.96      0% -   19%
Fuzzy2F20.0            81.24         5.06          88.97              2.11      0% -   19%
Fuzzy1F95.0            42.25         1.85          46.54              2.10      0% -   20%
WildcardF75.0          31.98         1.81          35.49              1.32      1% -   22%
Fuzzy1F90.0            42.15         1.77          46.84              1.91      2% -   20%
PhraseF75.0             3.39         0.06           3.82              0.20      4% -   20%
Fuzzy1F75.0            42.01         1.55          47.63              1.98      4% -   22%
OrHighHighF1.0         22.54         0.94          25.87              1.81      2% -   28%
Fuzzy2F10.0            81.10         5.01          93.81              2.75      5% -   26%
WildcardF50.0          32.66         1.88          37.81              1.33      5% -   27%
Fuzzy1F50.0            42.25         1.68          49.68              1.91      8% -   27%
SpanNearF75.0           2.98         0.10           3.51              0.16      9% -   27%
Fuzzy2F5.0             80.39         4.38          96.21              2.19     10% -   29%
Fuzzy2F0.5             83.14         4.67          99.73              1.75     11% -   29%
Fuzzy2F2.0             80.95         4.92          98.00              1.76     12% -   31%
SloppyPhraseF50.0       7.78         0.16           9.62              0.43     15% -   31%
PhraseF50.0             3.45         0.06           4.36              0.24     17% -   35%
WildcardF20.0          35.76         2.01          45.85              1.89     16% -   41%
WildcardF5.0           41.47         2.41          53.54              2.38     16% -   43%
Fuzzy1F20.0            43.60         1.76          57.50              2.00     22% -   42%
WildcardF10.0          38.26         2.17          50.63              2.11     20% -   46%
TermF99.0              40.49         1.22          54.84              4.93     19% -   52%
TermF100.0             40.51         1.29          54.99              4.92     19% -   52%
TermF95.0              40.44         1.19          54.95              4.82     20% -   52%
TermF90.0              40.34         1.08          55.00              4.58     21% -   51%
OrHighHighF2.0         18.15         0.69          24.94              1.69     23% -   52%
TermF2.0               63.47         1.48          87.39              5.94     25% -   50%
TermF75.0              40.05         0.92          55.28              4.38     24% -   52%
Fuzzy1F0.5             51.14         2.45          71.30              1.82     29% -   50%
OrHighHighF100.0        7.05         0.15           9.96              0.73     28% -   54%
OrHighHighF99.0         7.04         0.15           9.97              0.72     28% -   55%
TermF50.0              40.94         0.70          58.33              4.05     30% -   55%
Fuzzy1F10.0            43.92         1.78          62.74              1.47     34% -   52%
OrHighHighF95.0         7.08         0.14          10.12              0.70     30% -   56%
OrHighHighF90.0         7.10         0.15          10.31              0.71     32% -   58%
PhraseF5.0              5.02         0.10           7.33              0.48     33% -   58%
Fuzzy1F1.0             47.45         2.15          70.55              1.95     38% -   60%
Fuzzy1F5.0             44.47         1.99          66.38              1.89     38% -   60%
Fuzzy1F2.0             46.09         1.98          69.35              1.65     40% -   60%
SpanNearF50.0           2.98         0.10           4.51              0.23     39% -   64%
OrHighHighF75.0         7.20         0.15          10.97              0.73     39% -   65%
AndHighHighF20.0       16.92         0.14          26.80              2.77     40% -   76%
PhraseF20.0             3.69         0.06           5.86              0.36     46% -   71%
TermF20.0              42.65         0.76          69.54              4.48     49% -   76%
PhraseF10.0             4.10         0.07           6.74              0.43     51% -   78%
OrHighHighF50.0         7.61         0.17          12.77              0.76     54% -   81%
OrHighHighF5.0         13.68         0.48          23.13              1.55     52% -   86%
TermF5.0               47.37         1.30          81.16              5.35     55% -   87%
TermF10.0              43.07         0.95          74.83              4.64     59% -   88%
AndHighHighF1.0        32.98         0.48          59.25              8.46     51% -  108%
SloppyPhraseF20.0       8.00         0.16          14.72              0.84     70% -   98%
OrHighHighF10.0        11.20         0.34          21.25              1.38     72% -  108%
OrHighHighF20.0         9.32         0.22          18.10              1.12     77% -  111%
AndHighHighF10.0       17.54         0.16          35.08              4.05     75% -  125%
AndHighHighF2.0        24.58         0.27          52.49              7.16     82% -  145%
AndHighHighF5.0        19.26         0.17          43.11              5.37     94% -  154%
SloppyPhraseF10.0       8.24         0.16          19.96              1.29    122% -  162%
SpanNearF20.0           3.01         0.10           8.24              0.48    149% -  199%
SloppyPhraseF5.0        8.75         0.17          26.13              1.80    172% -  225%
SloppyPhraseF2.0       10.35         0.20          33.95              2.51    198% -  259%
SpanNearF10.0           3.09         0.10          12.23              0.76    259% -  334%
SpanNearF1.0            5.75         0.19          30.48              2.42    372% -  492%
SpanNearF2.0            4.21         0.13          24.77              1.80    428% -  551%
{noformat}

> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 2.4
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch,
> LUCENE-1536.patch, LUCENE-1536_hack.patch, changes-yonik-uwe.patch,
> luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
>
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
>
> Some notes on the test:
>
> * Index is first 2M docs of Wikipedia. Test machine is Mac OS X
>   10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>
> * I test across multiple queries. 1-X means an OR query, eg 1-4
>   means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
>   AND 3 AND 4. "u s" means "united states" (phrase search).
>
> * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>   95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>   100 (filter=null, control)).
>
> * Method high means I use random-access filter API in
>   IndexSearcher's main loop. Method low means I use random-access
>   filter API down in SegmentTermDocs (just like deleted docs
>   today).
>
> * Baseline (QPS) is current trunk, where filter is applied as iterator up
>   "high" (ie in IndexSearcher's search loop).
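
The two ways of applying a filter that the quoted description compares (iterator vs. random access) can be illustrated with a small stand-alone sketch. It does not use any Lucene classes: java.util.BitSet stands in for the filter, a sorted int[] stands in for the docs a query matches, and the class and method names (FilterAccessSketch, countWithIterator, countWithRandomAccess) are made up for illustration. It only shows the access pattern, not where in the search stack ("high" in IndexSearcher or "low" in SegmentTermDocs) the check is done.

```java
import java.util.BitSet;

/** Hypothetical sketch; not Lucene code. */
public class FilterAccessSketch {

    /**
     * Iterator style: walk the query's matches and the filter's set bits
     * in step, advancing whichever side is behind.
     */
    public static int countWithIterator(int[] sortedQueryDocs, BitSet filter) {
        int count = 0;
        int i = 0;
        int filterDoc = filter.nextSetBit(0); // -1 when the filter is exhausted
        while (i < sortedQueryDocs.length && filterDoc >= 0) {
            int queryDoc = sortedQueryDocs[i];
            if (queryDoc == filterDoc) {
                count++;
                i++;
                filterDoc = filter.nextSetBit(filterDoc + 1);
            } else if (queryDoc < filterDoc) {
                i++; // query is behind the filter
            } else {
                filterDoc = filter.nextSetBit(queryDoc); // filter is behind the query
            }
        }
        return count;
    }

    /**
     * Random-access style: for each doc the query matches, test the
     * filter bit directly (the way deleted docs are checked).
     */
    public static int countWithRandomAccess(int[] sortedQueryDocs, BitSet filter) {
        int count = 0;
        for (int doc : sortedQueryDocs) {
            if (filter.get(doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet filter = new BitSet(16);
        filter.set(2);
        filter.set(5);
        filter.set(9);
        filter.set(12);
        int[] queryDocs = {1, 2, 3, 9, 10, 12, 15};
        // Both approaches accept docs 2, 9 and 12.
        System.out.println(countWithIterator(queryDocs, filter));     // prints 3
        System.out.println(countWithRandomAccess(queryDocs, filter)); // prints 3
    }
}
```

Both methods accept the same docs; they differ only in how the filter is consulted, which is the cost difference the benchmark is measuring.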