[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

Michael McCandless (Commented) (JIRA) Mon, 10 Oct 2011 11:46:53 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124385#comment-13124385 ]


Michael McCandless commented on LUCENE-1536:
--------------------------------------------

I also bench'd Robert's patch (turned off verifyScores in lucenebench because 
of LUCENE-3503); results look very similar:
{noformat}

                Task    QPS base  StdDev base  QPS filterlow  StdDev filterlow       Pct diff
          PhraseF0.5       20.18         0.65           8.05              0.56   -64% -  -55%
          PhraseF1.0       12.26         0.33           7.96              0.54   -41% -  -28%
    AndHighHighF95.0       16.56         0.13          15.98              1.09   -10% -    3%
         Fuzzy2F99.0       80.52         4.67          77.72              2.34   -11% -    5%
    AndHighHighF99.0       16.55         0.12          15.97              1.05   -10% -    3%
   AndHighHighF100.0       16.54         0.13          15.98              1.06   -10% -    3%
        Fuzzy2F100.0       80.32         4.60          77.64              2.34   -11% -    5%
         Fuzzy2F90.0       80.80         5.17          78.19              2.77   -12% -    7%
    AndHighHighF90.0       16.57         0.15          16.05              1.13   -10% -    4%
      OrHighHighF0.1       72.17         3.60          70.11              3.69   -12% -    7%
      OrHighHighF0.5       29.26         1.23          28.44              1.50   -11% -    6%
         Fuzzy2F95.0       79.95         4.49          77.86              2.10   -10% -    5%
        WildcardF0.1       59.21         4.21          58.01              3.42   -13% -   11%
        WildcardF0.5       54.94         3.78          53.88              3.08   -13% -   11%
        WildcardF1.0       51.31         3.31          50.35              2.44   -12% -    9%
        WildcardF2.0       46.99         2.93          46.13              2.15   -11% -    9%
            Wildcard       38.73         1.94          38.14              1.78   -10% -    8%
         Fuzzy2F75.0       80.57         5.03          79.38              2.04    -9% -    7%
    AndHighHighF75.0       16.63         0.14          16.41              1.21    -9% -    6%
  SloppyPhraseF100.0        7.73         0.15           7.64              0.25    -6% -    4%
   SloppyPhraseF99.0        7.74         0.15           7.66              0.26    -6% -    4%
            TermF0.1      328.10        15.20         325.54             16.82   -10% -    9%
          OrHighHigh       10.68         1.11          10.61              0.75   -16% -   18%
            TermF0.5      127.55         3.70         126.88              6.02    -7% -    7%
          PhraseF0.1       63.93         2.25          63.62              2.87    -8% -    7%
          PhraseF2.0        7.88         0.19           7.86              0.31    -6% -    6%
     AndHighHighF0.1      129.64         5.02         129.28              6.98    -9% -    9%
    SloppyPhraseF0.1       53.80         0.79          53.86              1.84    -4% -    5%
   SloppyPhraseF95.0        7.74         0.15           7.75              0.27    -5% -    5%
    SloppyPhraseF0.5       18.44         0.31          18.47              0.64    -4% -    5%
    SloppyPhraseF1.0       13.10         0.23          13.13              0.47    -5% -    5%
        SloppyPhrase        7.81         0.10           7.83              0.30    -4% -    5%
     AndHighHighF0.5       47.61         1.00          47.76              2.33    -6% -    7%
          Fuzzy2F1.0       81.49         4.85          81.96              0.96    -6% -    8%
              Fuzzy1       47.97         3.71          48.35              1.94   -10% -   13%
          Fuzzy1F0.1       64.31         3.56          64.82              0.83    -5% -    8%
              Fuzzy2       80.93         6.15          81.61              1.74    -8% -   11%
              Phrase        3.58         0.10           3.63              0.18    -6% -    9%
      SpanNearF100.0        2.98         0.10           3.03              0.12    -5% -    9%
   SloppyPhraseF90.0        7.74         0.15           7.87              0.28    -3% -    7%
         AndHighHigh       17.31         0.24          17.62              0.64    -3% -    6%
          Fuzzy2F0.1       89.54         5.78          91.38              1.44    -5% -   10%
       SpanNearF99.0        2.98         0.09           3.04              0.13    -5% -    9%
                Term       58.94         6.06          60.38              4.40   -13% -   22%
        SpanNearF0.1       29.91         1.07          30.70              1.43    -5% -   11%
        SpanNearF0.5        8.73         0.30           8.98              0.41    -5% -   11%
        SpanNearF5.0        3.33         0.11           3.42              0.16    -5% -   11%
         Fuzzy2F50.0       80.90         5.19          83.29              2.28    -5% -   13%
            SpanNear        3.01         0.10           3.10              0.14    -4% -   11%
            TermF1.0       87.07         2.01          89.92              6.38    -6% -   13%
       SpanNearF95.0        2.98         0.10           3.10              0.13    -3% -   12%
        PhraseF100.0        3.37         0.06           3.51              0.17    -2% -   11%
         PhraseF99.0        3.37         0.05           3.52              0.17    -2% -   11%
         PhraseF95.0        3.37         0.06           3.56              0.18    -1% -   12%
            PKLookup      126.08         5.73         133.37              1.97     0% -   12%
       SpanNearF90.0        2.98         0.10           3.18              0.14    -1% -   15%
         PhraseF90.0        3.38         0.06           3.61              0.18     0% -   14%
      WildcardF100.0       32.22         1.76          34.59              1.43    -2% -   18%
       WildcardF99.0       32.23         1.79          34.61              1.39    -2% -   18%
   SloppyPhraseF75.0        7.74         0.16           8.32              0.33     1% -   14%
       WildcardF95.0       32.15         1.83          34.72              1.37    -1% -   19%
       WildcardF90.0       32.10         1.82          34.90              1.29     0% -   19%
        Fuzzy1F100.0       42.36         1.85          46.19              2.07     0% -   19%
    AndHighHighF50.0       16.76         0.10          18.30              1.44     0% -   18%
         Fuzzy1F99.0       42.21         1.84          46.21              1.96     0% -   19%
         Fuzzy2F20.0       81.24         5.06          88.97              2.11     0% -   19%
         Fuzzy1F95.0       42.25         1.85          46.54              2.10     0% -   20%
       WildcardF75.0       31.98         1.81          35.49              1.32     1% -   22%
         Fuzzy1F90.0       42.15         1.77          46.84              1.91     2% -   20%
         PhraseF75.0        3.39         0.06           3.82              0.20     4% -   20%
         Fuzzy1F75.0       42.01         1.55          47.63              1.98     4% -   22%
      OrHighHighF1.0       22.54         0.94          25.87              1.81     2% -   28%
         Fuzzy2F10.0       81.10         5.01          93.81              2.75     5% -   26%
       WildcardF50.0       32.66         1.88          37.81              1.33     5% -   27%
         Fuzzy1F50.0       42.25         1.68          49.68              1.91     8% -   27%
       SpanNearF75.0        2.98         0.10           3.51              0.16     9% -   27%
          Fuzzy2F5.0       80.39         4.38          96.21              2.19    10% -   29%
          Fuzzy2F0.5       83.14         4.67          99.73              1.75    11% -   29%
          Fuzzy2F2.0       80.95         4.92          98.00              1.76    12% -   31%
   SloppyPhraseF50.0        7.78         0.16           9.62              0.43    15% -   31%
         PhraseF50.0        3.45         0.06           4.36              0.24    17% -   35%
       WildcardF20.0       35.76         2.01          45.85              1.89    16% -   41%
        WildcardF5.0       41.47         2.41          53.54              2.38    16% -   43%
         Fuzzy1F20.0       43.60         1.76          57.50              2.00    22% -   42%
       WildcardF10.0       38.26         2.17          50.63              2.11    20% -   46%
           TermF99.0       40.49         1.22          54.84              4.93    19% -   52%
          TermF100.0       40.51         1.29          54.99              4.92    19% -   52%
           TermF95.0       40.44         1.19          54.95              4.82    20% -   52%
           TermF90.0       40.34         1.08          55.00              4.58    21% -   51%
      OrHighHighF2.0       18.15         0.69          24.94              1.69    23% -   52%
            TermF2.0       63.47         1.48          87.39              5.94    25% -   50%
           TermF75.0       40.05         0.92          55.28              4.38    24% -   52%
          Fuzzy1F0.5       51.14         2.45          71.30              1.82    29% -   50%
    OrHighHighF100.0        7.05         0.15           9.96              0.73    28% -   54%
     OrHighHighF99.0        7.04         0.15           9.97              0.72    28% -   55%
           TermF50.0       40.94         0.70          58.33              4.05    30% -   55%
         Fuzzy1F10.0       43.92         1.78          62.74              1.47    34% -   52%
     OrHighHighF95.0        7.08         0.14          10.12              0.70    30% -   56%
     OrHighHighF90.0        7.10         0.15          10.31              0.71    32% -   58%
          PhraseF5.0        5.02         0.10           7.33              0.48    33% -   58%
          Fuzzy1F1.0       47.45         2.15          70.55              1.95    38% -   60%
          Fuzzy1F5.0       44.47         1.99          66.38              1.89    38% -   60%
          Fuzzy1F2.0       46.09         1.98          69.35              1.65    40% -   60%
       SpanNearF50.0        2.98         0.10           4.51              0.23    39% -   64%
     OrHighHighF75.0        7.20         0.15          10.97              0.73    39% -   65%
    AndHighHighF20.0       16.92         0.14          26.80              2.77    40% -   76%
         PhraseF20.0        3.69         0.06           5.86              0.36    46% -   71%
           TermF20.0       42.65         0.76          69.54              4.48    49% -   76%
         PhraseF10.0        4.10         0.07           6.74              0.43    51% -   78%
     OrHighHighF50.0        7.61         0.17          12.77              0.76    54% -   81%
      OrHighHighF5.0       13.68         0.48          23.13              1.55    52% -   86%
            TermF5.0       47.37         1.30          81.16              5.35    55% -   87%
           TermF10.0       43.07         0.95          74.83              4.64    59% -   88%
     AndHighHighF1.0       32.98         0.48          59.25              8.46    51% -  108%
   SloppyPhraseF20.0        8.00         0.16          14.72              0.84    70% -   98%
     OrHighHighF10.0       11.20         0.34          21.25              1.38    72% -  108%
     OrHighHighF20.0        9.32         0.22          18.10              1.12    77% -  111%
    AndHighHighF10.0       17.54         0.16          35.08              4.05    75% -  125%
     AndHighHighF2.0       24.58         0.27          52.49              7.16    82% -  145%
     AndHighHighF5.0       19.26         0.17          43.11              5.37    94% -  154%
   SloppyPhraseF10.0        8.24         0.16          19.96              1.29   122% -  162%
       SpanNearF20.0        3.01         0.10           8.24              0.48   149% -  199%
    SloppyPhraseF5.0        8.75         0.17          26.13              1.80   172% -  225%
    SloppyPhraseF2.0       10.35         0.20          33.95              2.51   198% -  259%
       SpanNearF10.0        3.09         0.10          12.23              0.76   259% -  334%
        SpanNearF1.0        5.75         0.19          30.48              2.42   372% -  492%
        SpanNearF2.0        4.21         0.13          24.77              1.80   428% -  551%
{noformat}
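
For readers checking the numbers: the "Pct diff" range in each row looks consistent with comparing the pessimistic case (filterlow QPS minus its StdDev, against base QPS plus its StdDev) with the optimistic case (the reverse). That reading is inferred from the printed values, not stated in the comment, so treat the sketch below as an assumption about how the column was derived. The class and method names are made up for illustration.

```java
// Sketch only: reproduces the apparent "Pct diff" range from one table row.
// The formula is an assumption inferred from the printed numbers.
class PctDiffRange {

    /** Returns {lowPct, highPct} for a base run and a comparison run. */
    static double[] pctDiffRange(double qpsBase, double stdBase,
                                 double qpsCmp, double stdCmp) {
        // Pessimistic: slowest plausible comparison vs fastest plausible base.
        double low = (qpsCmp - stdCmp) / (qpsBase + stdBase) - 1.0;
        // Optimistic: fastest plausible comparison vs slowest plausible base.
        double high = (qpsCmp + stdCmp) / (qpsBase - stdBase) - 1.0;
        return new double[] {100.0 * low, 100.0 * high};
    }

    public static void main(String[] args) {
        // PhraseF0.5 row: base 20.18 +/- 0.65, filterlow 8.05 +/- 0.56.
        double[] r = pctDiffRange(20.18, 0.65, 8.05, 0.56);
        System.out.printf("%.1f%% - %.1f%%%n", r[0], r[1]);
    }
}
```

For the PhraseF0.5 row this gives roughly -64.0 and -55.9, which lines up with the printed "-64% - -55%" if the values are truncated to whole percents. A row whose range straddles 0% is therefore within noise under this reading.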

                
> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536_hack.patch, changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
>     10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
>     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
>     AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>     100 (filter=null, control)).
>   * Method high means I use random-access filter API in
>     IndexSearcher's main loop.  Method low means I use random-access
>     filter API down in SegmentTermDocs (just like deleted docs
>     today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
>     "high" (ie in IndexSearcher's search loop).
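
The two ways of applying a filter that the description compares can be sketched roughly as follows. This is not Lucene code: `java.util.BitSet` stands in for the filter, a sorted `int[]` stands in for the query's matching docs, and the class and method names are hypothetical. It only illustrates the difference between advancing a filter iterator alongside the query and testing each candidate doc with a random-access lookup.

```java
import java.util.BitSet;

// Illustration only; real Lucene uses its own iterator and bit-set types.
class FilterApplySketch {

    /** Iterator style: keep a cursor into the filter and advance it to each candidate. */
    static int countWithIterator(int[] sortedQueryDocs, BitSet filter) {
        int count = 0;
        int filterDoc = filter.nextSetBit(0);           // first doc the filter accepts
        for (int doc : sortedQueryDocs) {
            if (filterDoc != -1 && filterDoc < doc) {
                filterDoc = filter.nextSetBit(doc);     // advance filter to >= doc
            }
            if (filterDoc == -1) {
                break;                                  // filter exhausted
            }
            if (filterDoc == doc) {
                count++;                                // both sides agree on this doc
            }
        }
        return count;
    }

    /** Random-access style: ask the filter about each candidate directly. */
    static int countWithRandomAccess(int[] sortedQueryDocs, BitSet filter) {
        int count = 0;
        for (int doc : sortedQueryDocs) {
            if (filter.get(doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet filter = new BitSet();
        filter.set(3);
        filter.set(4);
        filter.set(8);
        filter.set(20);
        int[] queryDocs = {1, 3, 5, 8, 13};
        System.out.println(countWithIterator(queryDocs, filter));
        System.out.println(countWithRandomAccess(queryDocs, filter));
    }
}
```

Both methods accept the same docs. The random-access version does one bit test per candidate, which is the property the "high" and "low" methods in the description rely on, whereas the iterator version pays for cursor bookkeeping on every step.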

