[jira] Updated: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment

Uwe Schindler (JIRA) Thu, 14 Oct 2010 13:01:55 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Uwe Schindler updated LUCENE-2690:
----------------------------------

    Attachment: LUCENE-2690.patch

Attached is a new patch with two changes:

- Moved the BQ clause reordering into MTQ for now. A general reordering of 
BooleanQueries should be done in a separate issue (with a more performant 
rewrite). Currently this uses the same comparator that BQ used before. You may 
wonder: why not simply use a sorted map? The idea is that sorting once at the 
end is faster than using a TreeMap, where every term is compared on insertion 
(even those that later fall out of the queue). I sort the BQ clauses directly, 
as BQ did, to avoid creating an additional array to hold all terms again. Maybe 
it would still be faster to copy all BytesRefs to an array first and then build 
the BQ? For now this should be enough. To improve this we need SorterTemplate 
again (for the BytesRefHash case) :-)
- Fixed an issue with the PQ in TopTermsRewrite: in the previous patch, the 
bottom information was only set when the PQ was overflowing. Now, as in the 
original code, it is set as soon as the queue is full. This was a bug in an 
optimization; the behavior is now as it always was. Maybe this explains Mike's 
score changes on the Wikipedia index?
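
To make the two points above concrete, here is a minimal sketch, not the 
code from the patch. It keeps the best N terms in a bounded priority queue, 
updates the "bottom" value as soon as the queue is full (not only on 
overflow), and sorts the surviving terms once at the end instead of keeping 
them in a TreeMap. The names TopTermsSketch, ScoredTerm and topTermsSorted 
are hypothetical, and java.util.PriorityQueue and String stand in for 
Lucene's own PriorityQueue and BytesRef.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names are Lucene APIs.
final class TopTermsSketch {

    /** A candidate term with its boost (stand-in for a BytesRef plus score). */
    static final class ScoredTerm {
        final String term;
        final float boost;

        ScoredTerm(String term, float boost) {
            this.term = term;
            this.boost = boost;
        }
    }

    /** Keeps the maxSize best terms, then sorts the survivors by term once. */
    static List<String> topTermsSorted(List<ScoredTerm> candidates, int maxSize) {
        List<String> terms = new ArrayList<>();
        if (maxSize <= 0) {
            return terms;
        }

        // Min-heap on boost: the head is the weakest term currently kept.
        PriorityQueue<ScoredTerm> pq =
            new PriorityQueue<>(Comparator.comparingDouble((ScoredTerm t) -> t.boost));
        float bottom = Float.NEGATIVE_INFINITY; // no cutoff until the queue is full

        for (ScoredTerm c : candidates) {
            if (pq.size() == maxSize && c.boost <= bottom) {
                continue; // cheap rejection: cannot beat the weakest kept term
            }
            pq.add(c);
            if (pq.size() > maxSize) {
                pq.poll(); // overflow: drop the weakest term
            }
            // Set the bottom as soon as the queue is full, not only on overflow.
            if (pq.size() == maxSize) {
                bottom = pq.peek().boost;
            }
        }

        // Sort only the survivors, once, instead of maintaining a sorted map
        // that would also compare terms which later fall out of the queue.
        for (ScoredTerm t : pq) {
            terms.add(t.term);
        }
        terms.sort(Comparator.naturalOrder());
        return terms;
    }
}
```

With a sorted map, every candidate pays for comparisons on insertion; here 
only the maxSize survivors are compared during the final sort.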

Mike: can you test?

> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-attributes.patch, 
> LUCENE-2690-attributes.patch, LUCENE-2690-attributes.patch, 
> LUCENE-2690-hack.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using 
> TopTermsBooleanQueryRewrite), the auto constant rewrite method and the 
> ScoringBQ rewrite methods using a MultiFields wrapper on the top-level 
> reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses 
> some additional data structures (hashed sets/maps) to exclude duplicate 
> terms. All tests currently pass, but FuzzyQuery's tests should not, because 
> its minimum score handling depends on the terms being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
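
The per-segment collection described in the quoted issue text can be sketched 
roughly as follows. This is an illustration under stated assumptions, not the 
patch itself: each segment is modeled as a plain map from term to document 
frequency, a HashMap plays the role of the hashed structure that excludes 
duplicate terms, and summing the per-segment frequencies for a repeated term 
is an assumption made for this example. The class and method names are 
hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; segments are modeled as term -> docFreq maps.
final class PerSegmentCollectSketch {

    /** Visits each segment in turn and merges its terms into one hashed map. */
    static Map<String, Integer> collect(List<Map<String, Integer>> segments) {
        Map<String, Integer> seen = new HashMap<>();
        for (Map<String, Integer> segment : segments) {
            for (Map.Entry<String, Integer> e : segment.entrySet()) {
                // A term already seen in an earlier segment is not added twice;
                // its frequency is accumulated instead (assumption of this sketch).
                seen.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return seen;
    }
}
```

Because a hashed map does not preserve term order, anything that relies on 
terms arriving in sorted order has to sort them afterwards.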

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


