[jira] [Commented] (LUCENE-5049) Native (C++) implementation of "pure OR" BooleanQuery

Michael McCandless (JIRA) Sun, 09 Jun 2013 16:19:21 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679218#comment-13679218
 ]


Michael McCandless commented on LUCENE-5049:
--------------------------------------------

bq. This is an apples vs oranges comparison.

I agree: all we can conclude from the benchmark is that this hardwired
C++ impl is ~3X faster (on OR-of-TermQuery) than what we have today in
Java.

We can't tell exactly where the gains come from, i.e. how much is due
to 1) Java vs C and how much from 2) specializing everything to single
dedicated code.

If we could build the same specialized code in Java then we could
separately test these contributions.  Maybe we could also test which
components (decoding postings, matching, scoring, collecting) gave
which part of the gains.

                
> Native (C++) implementation of "pure OR" BooleanQuery
> -----------------------------------------------------
>
>                 Key: LUCENE-5049
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5049
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5049.patch
>
>
> I've been playing with a C++ implementation of BooleanQuery containing
> only OR'd (SHOULD) TermQuery clauses, collecting top N hits by score.
> The results are impressive: ~3X speedup for BQ OR over two terms, and
> also good speedups (~38-78%) for Fuzzy1/2 as well since they rewrite
> to BQ OR over N terms:
> {noformat}
>                     Task    QPS base      StdDev    QPS comp      StdDev      
>           Pct diff
>                  MedTerm       69.47     (15.8%)       68.61     (13.4%)   
> -1.2% ( -26% -   33%)
>                 HighTerm       55.25     (16.2%)       54.63     (13.9%)   
> -1.1% ( -26% -   34%)
>                  LowTerm      333.10      (9.6%)      329.43      (8.0%)   
> -1.1% ( -17% -   18%)
>                   IntNRQ        3.37      (2.6%)        3.36      (4.6%)   
> -0.2% (  -7% -    7%)
>                  Prefix3       18.91      (2.0%)       19.04      (3.5%)    
> 0.7% (  -4% -    6%)
>                 Wildcard       29.40      (1.7%)       29.70      (2.8%)    
> 1.0% (  -3% -    5%)
>                MedPhrase      132.69      (6.2%)      134.66      (7.0%)    
> 1.5% ( -11% -   15%)
>         HighSloppyPhrase        0.82      (3.6%)        0.83      (3.5%)    
> 1.9% (  -5% -    9%)
>              AndHighHigh       19.65      (0.6%)       20.02      (0.8%)    
> 1.9% (   0% -    3%)
>               HighPhrase       11.74      (6.6%)       11.96      (7.1%)    
> 1.9% ( -11% -   16%)
>          MedSloppyPhrase       29.09      (1.2%)       29.76      (1.9%)    
> 2.3% (   0% -    5%)
>          LowSloppyPhrase       25.71      (1.4%)       26.98      (1.7%)    
> 4.9% (   1% -    8%)
>                  Respell      173.78      (3.0%)      182.41      (3.7%)    
> 5.0% (  -1% -   12%)
>              MedSpanNear       27.67      (2.5%)       29.07      (2.4%)    
> 5.1% (   0% -   10%)
>             HighSpanNear        2.95      (2.4%)        3.10      (2.8%)    
> 5.4% (   0% -   10%)
>              LowSpanNear        8.29      (3.4%)        8.82      (3.3%)    
> 6.4% (   0% -   13%)
>               AndHighMed       79.32      (1.6%)       84.44      (1.0%)    
> 6.5% (   3% -    9%)
>                LowPhrase       23.20      (2.0%)       25.14      (1.6%)    
> 8.4% (   4% -   12%)
>               AndHighLow      594.17      (3.4%)      660.32      (1.9%)   
> 11.1% (   5% -   16%)
>                   Fuzzy2       88.32      (6.4%)      121.44      (1.7%)   
> 37.5% (  27% -   48%)
>                   Fuzzy1       86.34      (6.0%)      153.49      (1.7%)   
> 77.8% (  66% -   90%)
>               OrHighHigh       16.29      (2.5%)       48.29      (1.3%)  
> 196.5% ( 188% -  205%)
>                OrHighMed       28.98      (2.7%)       87.81      (0.9%)  
> 203.0% ( 194% -  212%)
>                OrHighLow       27.38      (2.6%)       84.94      (1.1%)  
> 210.3% ( 201% -  219%)
> {noformat}
> This is essentially a scaled back attempt at LUCENE-1594 in that it's
> "hardwired" to "just" the "OR of TermQuery" case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-5049) Native (C++) implementation of "pure OR" BooleanQuery

Reply via email to