[ 
https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919528#action_12919528
 ] 

Michael McCandless commented on LUCENE-2690:
--------------------------------------------

We have to sort the terms coming out of the BytesRefHash; otherwise we get bad 
seek performance, because the within-block seek optimization will often fail 
to apply...

So I used a TreeMap instead of a HashMap.
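
To illustrate the idea (this is only a rough sketch, not the code in the patch): 
collecting per-segment terms into a TreeMap both removes duplicates and hands 
the terms back in sorted order, so later seeks move forward through the terms 
dictionary. The class SortedTermCollector, the method collect, and the use of 
plain String terms with a docFreq count are all hypothetical stand-ins; the 
patch works with BytesRef terms, whose byte order can differ from String order 
for some characters.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedTermCollector {

    // Hypothetical helper, not a Lucene API.
    // Each input map stands for one segment: term -> docFreq in that segment.
    static TreeMap<String, Integer> collect(List<Map<String, Integer>> segments) {
        // TreeMap keeps keys sorted, so iteration yields terms in order.
        TreeMap<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> segment : segments) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                // A term seen in several segments is stored once;
                // its per-segment docFreqs are summed.
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> segments = List.of(
            Map.of("united", 3, "und", 1),
            Map.of("unified", 2, "united", 4));

        // Terms come back sorted regardless of the order they were collected in.
        for (Map.Entry<String, Integer> e : collect(segments).entrySet()) {
            System.out.println(e.getKey() + " docFreq=" + e.getValue());
        }
    }
}
```

With a HashMap the same loop would still deduplicate, but the iteration order 
would be arbitrary, which is what defeats the within-block seek.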

Then I ran a quick perf test on a 10M Wikipedia index:

||Query||QPS clean||QPS mtqseg||Pct diff||
|unit*|11.83|11.80|{color:red}-0.3%{color}|
|un*d|13.64|16.95|{color:green}24.3%{color}|
|u*d|2.67|3.77|{color:green}41.1%{color}|
|un*ed|34.85|74.94|{color:green}115.0%{color}|
|uni*ed|183.37|437.13|{color:green}138.4%{color}|

So these are good gains!  I can't run FuzzyQuery until we fix the tie-break 
problem...

I'm really not sure why the prefix query sees no gain yet the others do (I 
would have actually expected the reverse, because PrefixTermsEnum's accept 
method is so simple).

> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690.patch, LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using 
> TopTermsBooleanQueryRewrite), the auto constant rewrite method and the 
> ScoringBQ rewrite methods using a MultiFields wrapper on the top-level 
> reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses 
> some additional data structures (hashed sets/maps) to exclude duplicate terms. 
> All tests currently pass, but FuzzyQuery's tests should not, because its 
> minimum-score handling depends on the terms being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
