[jira] [Commented] (LUCENE-5049) Native (C++) implementation of "pure OR" BooleanQuery

Michael McCandless (JIRA) Sun, 09 Jun 2013 16:30:23 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679219#comment-13679219 ]


Michael McCandless commented on LUCENE-5049:
--------------------------------------------

bq. Okay, I'll be the first to ask it: C++? Really? Is this the beginning of 
the end for Java for the world of high-performance search?

I don't think so.  This is just an option, and it only applies to users
running a pure-OR BooleanQuery over TermQuery clauses against one field,
with the default codec/similarity, etc.  I certainly don't think we
should switch Lucene to C++.

bq. Seriously, a second question: What about alternative JVM-based languages? I 
mean, maybe Java does have excess baggage related to its quirky semantics, but 
could the raw JVM support a lower-level implementation of BQ, without leaving 
the JVM... "bubble"? OTOH, maybe different JVMs could have different 
performance characteristics.

We should explore that!  I have no idea.

bq. Oh, and what compiler/machine architecture was this for?

Linux / x86 is what I tested on, but I think the code would work fine on other 
OSes / CPUs.

bq. Another question: might there be alternative representations of BQ based on 
what exactly the clauses are?

?

bq. OTOH, for us Solr guys, there is somewhat the impression that raw Lucene 
search is blazing fast already and not the bottleneck for Solr where other 
things, like caches and facets and highlighting are the concern.

Faceting/highlighting are definitely costly...

bq. Finally, some of these gains seem... marginal if not outright disappointing 
considering the raw expectation that bare C++ should be a LOT faster. So, is 
this maybe more of a "See, C++ doesn't have THAT big an advantage over Java 
even for core search operations"?

Actually I think the ~3X speedups (~200% gains) on the OR queries are
surprisingly high: they were more than I expected.

It would be nice if most of those gains were from code specialization
and not from Java vs. C++ ... then we could say "C++ doesn't have that
big an advantage over Java", but it's not yet clear where the gains
come from.

                
> Native (C++) implementation of "pure OR" BooleanQuery
> -----------------------------------------------------
>
>                 Key: LUCENE-5049
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5049
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5049.patch
>
>
> I've been playing with a C++ implementation of BooleanQuery containing
> only OR'd (SHOULD) TermQuery clauses, collecting top N hits by score.
> The results are impressive: ~3X speedup for BQ OR over two terms, and
> also good speedups (~38-78%) for Fuzzy1/2 as well since they rewrite
> to BQ OR over N terms:
> {noformat}
>                     Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
>                  MedTerm       69.47     (15.8%)       68.61     (13.4%)    -1.2% ( -26% -   33%)
>                 HighTerm       55.25     (16.2%)       54.63     (13.9%)    -1.1% ( -26% -   34%)
>                  LowTerm      333.10      (9.6%)      329.43      (8.0%)    -1.1% ( -17% -   18%)
>                   IntNRQ        3.37      (2.6%)        3.36      (4.6%)    -0.2% (  -7% -    7%)
>                  Prefix3       18.91      (2.0%)       19.04      (3.5%)     0.7% (  -4% -    6%)
>                 Wildcard       29.40      (1.7%)       29.70      (2.8%)     1.0% (  -3% -    5%)
>                MedPhrase      132.69      (6.2%)      134.66      (7.0%)     1.5% ( -11% -   15%)
>         HighSloppyPhrase        0.82      (3.6%)        0.83      (3.5%)     1.9% (  -5% -    9%)
>              AndHighHigh       19.65      (0.6%)       20.02      (0.8%)     1.9% (   0% -    3%)
>               HighPhrase       11.74      (6.6%)       11.96      (7.1%)     1.9% ( -11% -   16%)
>          MedSloppyPhrase       29.09      (1.2%)       29.76      (1.9%)     2.3% (   0% -    5%)
>          LowSloppyPhrase       25.71      (1.4%)       26.98      (1.7%)     4.9% (   1% -    8%)
>                  Respell      173.78      (3.0%)      182.41      (3.7%)     5.0% (  -1% -   12%)
>              MedSpanNear       27.67      (2.5%)       29.07      (2.4%)     5.1% (   0% -   10%)
>             HighSpanNear        2.95      (2.4%)        3.10      (2.8%)     5.4% (   0% -   10%)
>              LowSpanNear        8.29      (3.4%)        8.82      (3.3%)     6.4% (   0% -   13%)
>               AndHighMed       79.32      (1.6%)       84.44      (1.0%)     6.5% (   3% -    9%)
>                LowPhrase       23.20      (2.0%)       25.14      (1.6%)     8.4% (   4% -   12%)
>               AndHighLow      594.17      (3.4%)      660.32      (1.9%)    11.1% (   5% -   16%)
>                   Fuzzy2       88.32      (6.4%)      121.44      (1.7%)    37.5% (  27% -   48%)
>                   Fuzzy1       86.34      (6.0%)      153.49      (1.7%)    77.8% (  66% -   90%)
>               OrHighHigh       16.29      (2.5%)       48.29      (1.3%)   196.5% ( 188% -  205%)
>                OrHighMed       28.98      (2.7%)       87.81      (0.9%)   203.0% ( 194% -  212%)
>                OrHighLow       27.38      (2.6%)       84.94      (1.1%)   210.3% ( 201% -  219%)
> {noformat}
> This is essentially a scaled-back attempt at LUCENE-1594 in that it's
> "hardwired" to "just" the "OR of TermQuery" case.

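For readers unfamiliar with the query shape described in the issue (an OR of
term clauses, collecting the top N hits by score), here is a minimal C++
sketch of the general technique.  It is not the code from LUCENE-5049.patch:
the Posting, TermClause, Hit, and searchOr names are hypothetical, the
postings are plain in-memory vectors rather than a Lucene codec, and the
score is a toy sum of weight * freq rather than Lucene's default similarity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <queue>
#include <vector>

// Hypothetical in-memory posting: one (docID, term frequency) pair.
struct Posting {
    int doc;
    int freq;
};

// Hypothetical term clause: postings sorted by ascending docID, plus a
// per-term weight standing in for a real similarity's term weight.
struct TermClause {
    std::vector<Posting> postings;
    float weight;
};

struct Hit {
    int doc;
    float score;
};

// Scores the disjunction (OR) of all clauses one document at a time and
// returns the top N hits, best first. Assumption: toy scoring only.
std::vector<Hit> searchOr(const std::vector<TermClause>& clauses, std::size_t topN) {
    assert(topN > 0);
    std::vector<std::size_t> pos(clauses.size(), 0);  // one cursor per clause

    // "a is better than b": higher score wins, ties go to the smaller docID.
    // Used as the heap comparator, this keeps the weakest kept hit at top().
    auto better = [](const Hit& a, const Hit& b) {
        return a.score > b.score || (a.score == b.score && a.doc < b.doc);
    };
    std::priority_queue<Hit, std::vector<Hit>, decltype(better)> heap(better);

    const int kNoMoreDocs = std::numeric_limits<int>::max();
    while (true) {
        // Find the smallest docID that any clause is currently positioned on.
        int doc = kNoMoreDocs;
        for (std::size_t i = 0; i < clauses.size(); ++i) {
            if (pos[i] < clauses[i].postings.size()) {
                doc = std::min(doc, clauses[i].postings[pos[i]].doc);
            }
        }
        if (doc == kNoMoreDocs) break;  // every clause is exhausted

        // Sum the contribution of each clause matching this doc, then advance it.
        float score = 0.0f;
        for (std::size_t i = 0; i < clauses.size(); ++i) {
            const std::vector<Posting>& p = clauses[i].postings;
            if (pos[i] < p.size() && p[pos[i]].doc == doc) {
                score += clauses[i].weight * static_cast<float>(p[pos[i]].freq);
                ++pos[i];
            }
        }

        // Keep the hit only if the heap has room or it beats the weakest kept hit.
        Hit hit{doc, score};
        if (heap.size() < topN) {
            heap.push(hit);
        } else if (better(hit, heap.top())) {
            heap.pop();
            heap.push(hit);
        }
    }

    // Drain the heap weakest-first, filling the result from the back.
    std::vector<Hit> hits(heap.size());
    for (std::size_t i = hits.size(); i-- > 0;) {
        hits[i] = heap.top();
        heap.pop();
    }
    return hits;
}
```

Calling searchOr with two clauses and topN = 2 returns at most two hits
ordered by descending score.  The linear scan for the smallest docID is fine
for a handful of clauses; with many clauses (as in the Fuzzy1/2 rewrites) a
heap over the clause cursors would be the usual choice.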
