[jira] [Updated] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Mon, 28 Jul 2014 22:32:17 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Da Huang updated LUCENE-4396:
-----------------------------

    Attachment: LUCENE-4396.patch

This patch is based on git mirror commit 
ce7d0578b30981d15687bf76aec595274efccbad.
I've tried to merge all the methods explored so far to get better 
performance for boolean retrieval.

In this patch, I only mix the methods in BooleanQuery.BooleanWeight.scorer().
I also tried to mix them in .bulkScorer(), but that version fails the ant test.

It took me a lot of time to figure out the cause.
It turned out that I'm not supposed to call w.bulkScorer() to get the 
optional scorers or the prohibited scorers in 
BooleanQuery.BooleanWeight.bulkScorer(); otherwise 
TestBooleanScorer.testEmbeddedBooleanScorer throws an 
UnsupportedOperationException because it calls an unimplemented .scorer() 
method.

It is awkward that I can't get the cost of a scorer without an instance 
of Scorer.

Therefore, my next step is to check whether I can get the optional scorers 
in .bulkScorer().
If so, I'll do the same thing as in .scorer(); if not, I'll just fall back 
to BooleanScorer.
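
As a rough illustration only (this is not the code in the patch), the 
decision described above could be sketched as follows. It assumes each 
clause's cost is already known as a plain long, for example from 
Scorer.cost(). The class name, the Strategy enum, the UNKNOWN_COST marker 
and the ratio threshold are all hypothetical, and the rule "prefer the 
doc-at-a-time scorer only when the MUST clauses are much sparser than the 
other clauses" follows the heuristic suggested in the issue description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Lucene code: every name and the threshold are assumptions.
final class ScorerChoiceSketch {

    enum Strategy { BULK_BOOLEAN_SCORER, DOC_AT_A_TIME }

    /** Marker for a clause whose cost is unavailable (no Scorer instance). */
    static final long UNKNOWN_COST = -1L;

    /**
     * Picks a strategy from per-clause costs.
     * mustCosts: costs of the MUST clauses; otherCosts: costs of SHOULD and
     * MUST_NOT clauses; ratioThreshold: how many times sparser the cheapest
     * MUST clause has to be before doc-at-a-time scoring is preferred (>= 1).
     */
    static Strategy choose(List<Long> mustCosts, List<Long> otherCosts, long ratioThreshold) {
        if (mustCosts.isEmpty()) {
            // Only SHOULD / MUST_NOT clauses: keep using the bulk scorer.
            return Strategy.BULK_BOOLEAN_SCORER;
        }
        long minMust = Long.MAX_VALUE;
        for (long cost : mustCosts) {
            if (cost == UNKNOWN_COST) {
                return Strategy.BULK_BOOLEAN_SCORER; // cannot compare, fall back
            }
            minMust = Math.min(minMust, cost);
        }
        long maxOther = 0L;
        for (long cost : otherCosts) {
            if (cost == UNKNOWN_COST) {
                return Strategy.BULK_BOOLEAN_SCORER; // cannot compare, fall back
            }
            maxOther = Math.max(maxOther, cost);
        }
        // Division instead of multiplication avoids overflow for large costs.
        boolean mustIsMuchSparser = minMust < maxOther / ratioThreshold;
        return mustIsMuchSparser ? Strategy.DOC_AT_A_TIME : Strategy.BULK_BOOLEAN_SCORER;
    }

    public static void main(String[] args) {
        // A very sparse MUST clause next to a dense SHOULD clause.
        System.out.println(choose(Arrays.asList(10L), Arrays.asList(100000L), 8));
        // MUST and SHOULD clauses of comparable density.
        System.out.println(choose(Arrays.asList(50000L), Arrays.asList(100000L), 8));
    }
}
```

The threshold of 8 in main() is arbitrary; a real value would have to come 
from benchmarks such as the ones below.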

Besides, I'm sorry that the code in this patch may look ugly, 
as I haven't yet had time to tidy it up.

{code}
                    Task    QPS baseline      StdDev    QPS my_version      StdDev  Pct diff
       HighAndTonsLowNot        4.06      (4.0%)        3.44      (5.1%)  -15.5% ( -23% -   -6%)
       HighAndSomeLowNot       17.02      (5.3%)       15.61      (9.2%)   -8.3% ( -21% -    6%)
        HighAndTonsLowOr        5.82      (5.0%)        5.67      (1.5%)   -2.6% (  -8% -    4%)
        LowAndSomeHighOr       55.03      (3.0%)       54.39      (2.2%)   -1.2% (  -6% -    4%)
      HighAndSomeHighNot        1.24      (2.3%)        1.23      (2.3%)   -1.0% (  -5% -    3%)
         LowAndSomeLowOr      231.48      (1.8%)      229.47      (2.1%)   -0.9% (  -4% -    3%)
                PKLookup       97.60      (2.1%)       97.63      (2.2%)    0.0% (  -4% -    4%)
        LowAndSomeLowNot      312.07      (2.0%)      312.28      (2.1%)    0.1% (  -3% -    4%)
       HighAndSomeHighOr        1.69      (2.6%)        1.69      (1.2%)    0.4% (  -3% -    4%)
        HighAndSomeLowOr       14.28     (11.7%)       14.81      (4.7%)    3.7% ( -11% -   22%)
       LowAndSomeHighNot       34.74      (2.9%)       36.83      (2.6%)    6.0% (   0% -   11%)
        LowAndTonsHighOr        2.34      (2.7%)        2.90      (3.2%)   24.3% (  17% -   30%)
         LowAndTonsLowOr       18.88      (1.0%)       25.14      (3.0%)   33.2% (  28% -   37%)
        LowAndTonsLowNot       15.78      (1.4%)       22.29      (2.0%)   41.2% (  37% -   45%)
       HighAndTonsHighOr        0.06      (0.6%)        0.17      (5.8%)  179.9% ( 172% -  187%)
       LowAndTonsHighNot        1.33      (2.4%)        4.29      (8.1%)  223.5% ( 207% -  239%)
      HighAndTonsHighNot        0.06      (1.8%)        0.34     (17.3%)  495.0% ( 467% -  523%)
{code}   

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf, all.perf, 
> luceneutil-score-equal.patch, luceneutil-score-equal.patch, stat.cpp, stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
