[jira] Commented: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment

Simon Willnauer (JIRA) Thu, 14 Oct 2010 03:02:31 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920909#action_12920909
 ]


Simon Willnauer commented on LUCENE-2690:
-----------------------------------------

bq. The code in BooleanQueryRewrite uses += for the boost and docFreq in the 
case of (>=0, no entry in BytesRefHash), but this should only be an assignment. 
The update and comparison in the assert should be done only when an entry is 
already in the hash. Boosts should never be summed up.
Ah yeah, true for sure! It did not break, since that case only happens once, when 
the term is initially added. But you are right that this should only be an 
assignment.
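A minimal sketch of the corrected logic, assuming a simplified model rather than the actual patch. `TermBoostCollector` is a hypothetical stand-in that mimics the BytesRefHash `add()` return convention (a new term gets an ord >= 0, an existing term gets -(ord + 1)) with a plain `HashMap`, and keeps boost and docFreq in parallel arrays indexed by that ord. On a new entry the values are assigned. On an existing entry docFreq is updated and the boost is only compared in an assert, never summed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the term collector discussed above, not Lucene's class.
final class TermBoostCollector {
  private final Map<String, Integer> ords = new HashMap<>();
  float[] boost = new float[4];
  int[] docFreq = new int[4];

  // Mimics the BytesRefHash.add() convention: ord >= 0 for a new term,
  // -(ord + 1) if the term was already present.
  private int add(String term) {
    Integer existing = ords.get(term);
    if (existing != null) {
      return -(existing + 1);
    }
    int ord = ords.size();
    ords.put(term, ord);
    if (ord == boost.length) {
      // Keep both parallel arrays at the same length.
      boost = Arrays.copyOf(boost, ord * 2);
      docFreq = Arrays.copyOf(docFreq, ord * 2);
    }
    return ord;
  }

  /** Records one segment's occurrence of a term. */
  void collect(String term, float termBoost, int segmentDocFreq) {
    int pos = add(term);
    if (pos >= 0) {
      // New entry: plain assignment. "+=" only worked before because the slot was zero.
      boost[pos] = termBoost;
      docFreq[pos] = segmentDocFreq;
    } else {
      pos = -pos - 1;
      // Existing entry: update docFreq, and only compare the boost (never sum it).
      docFreq[pos] += segmentDocFreq;
      assert boost[pos] == termBoost : "the same term must always carry the same boost";
    }
  }

  float boostOf(String term) { return boost[ords.get(term)]; }

  int docFreqOf(String term) { return docFreq[ords.get(term)]; }
}
```

Collecting "foo" with boost 2.0 from two segments (docFreq 3 and 5) leaves its boost at 2.0 and its docFreq at 8.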

{quote}
But there is also a problem with the current code in TermFreqBoostByteStart: 
the arrays may not end up with exactly the expected size (depending on how 
oversize/grow works). As they are parallel arrays, all of them should have the 
same size, so we should use grow/oversize only for the base array and resize the 
others to the same size. Do we have an ArrayUtil method for that? Currently it 
may be broken. Any comments?
{quote}

Good catch, man! This won't happen here, but it's cleaner to use the exact same 
size. The bigger problem, though, is that I forgot to add the right constant to 
the grow method. I can fix it in a minute.
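A small sketch of the resizing approach suggested above, under stated assumptions: `ParallelArrays` and its `oversize()` policy are hypothetical and do not reproduce Lucene's ArrayUtil. The idea is that the oversizing policy picks a length only for the base array, and every parallel array is then copied to exactly that length, so the arrays cannot drift apart even if a grow helper would size different element types differently.

```java
import java.util.Arrays;

// Hypothetical holder for a base array plus two parallel arrays.
final class ParallelArrays {
  int[] ords = new int[0];       // base array: its length drives all others
  float[] boost = new float[0];  // parallel to ords
  int[] docFreq = new int[0];    // parallel to ords

  // Hypothetical oversizing policy (~12.5% headroom plus a small constant);
  // a stand-in for whatever grow/oversize policy is actually used.
  static int oversize(int minSize) {
    return minSize + (minSize >> 3) + 3;
  }

  /** Ensures capacity for at least minSize entries, keeping all lengths equal. */
  void grow(int minSize) {
    if (ords.length >= minSize) {
      return;
    }
    // Only the base array's length is chosen by the oversizing policy...
    ords = Arrays.copyOf(ords, oversize(minSize));
    // ...and the parallel arrays are resized to that exact length.
    boost = Arrays.copyOf(boost, ords.length);
    docFreq = Arrays.copyOf(docFreq, ords.length);
  }
}
```

Because `Arrays.copyOf` preserves existing entries and zero-fills the rest, growing never loses collected data.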

> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-hack.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using 
> TopTermsBooleanQueryRewrite), the auto constant rewrite method and the 
> ScoringBQ rewrite methods using a MultiFields wrapper on the top-level 
> reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses 
> some additional data structures (hashed sets/maps) to exclude duplicate terms. 
> All tests currently pass, but FuzzyQuery's tests should not, because its 
> minimum score handling depends on the terms being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
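To illustrate the per-segment idea described in the issue, here is a rough sketch assuming a toy model: each segment is a sorted `term -> docFreq` map, and a prefix check stands in for the MultiTermQuery's term filter. `PerSegmentRewrite` is hypothetical and not the patch's code. Each segment's terms are enumerated independently and merged into one hash map, so a term found in several segments yields a single entry. The merged order is the order of first occurrence across segments, not global term order, which is why code relying on in-order collection (such as FuzzyQuery's minimum score handling) is affected.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical model of a per-segment rewrite with duplicate-term elimination.
final class PerSegmentRewrite {
  /**
   * Enumerates each segment's sorted terms separately and merges matching terms
   * into one map, summing docFreq for terms that occur in several segments.
   * Iteration order of the result is first-occurrence order, NOT sorted order.
   */
  static Map<String, Integer> collect(List<SortedMap<String, Integer>> segments, String prefix) {
    Map<String, Integer> merged = new LinkedHashMap<>();
    for (SortedMap<String, Integer> segment : segments) {
      for (Map.Entry<String, Integer> e : segment.entrySet()) {
        // Stand-in for the query's term filter (here: a simple prefix match).
        if (e.getKey().startsWith(prefix)) {
          merged.merge(e.getKey(), e.getValue(), Integer::sum);
        }
      }
    }
    return merged;
  }
}
```

With segment 1 = {apple=2, apricot=1} and segment 2 = {ant=4, apple=3, banana=1}, the prefix "a" yields apple=5, apricot=1, ant=4 in that order, even though "ant" sorts before "apple".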

