[jira] Commented: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment

Simon Willnauer (JIRA) Sat, 09 Oct 2010 22:54:01 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919573#action_12919573
 ]


Simon Willnauer commented on LUCENE-2690:
-----------------------------------------

Guys, awesome improvements!! Here are some comments...

* In CutOffTermCollector:
{code}
final BytesRefHash pendingTerms = new BytesRefHash(new ByteBlockPool(new RecyclingByteBlockAllocator()));
{code}
Since we do not reuse the allocator, we don't need the synchronized one here, 
and there is no reset call anywhere that would free the allocated blocks 
anyway. We should just use new BytesRefHash() here.


* BooleanQueryRewrite#rewrite uses a HashMap to keep track of BytesRef and 
TermFreqBoost. I wonder if we should use the ParallelArray technique we 
use in the indexing chain together with a BytesRefHash, which could save us lots 
of object creation and lower the GC cost once MTQ gets under load. 
These MTQs can create a very large number of objects, so this looks like 
a hot spot. I currently have use cases for direct support of something like a 
ParallelArray base class in LUCENE-2186, and it seems we can use it here too.

* In FloatsUtil#nextAfter I wonder if we need the following lines:
{code}
return new Float(direction);
...
return Double.valueOf(direction).floatValue();
{code}
since both do nothing more than a (float) direction cast, really.
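The ParallelArray suggestion for BooleanQueryRewrite#rewrite can be sketched in plain Java. This is not Lucene's BytesRefHash or the indexing chain's ParallelArray; it is a minimal stand-in, assuming the per-term state is just a doc freq and a boost. Each distinct term gets a dense int id from a small open-addressing hash, and its attributes live in primitive arrays indexed by that id, so adding a term creates no per-entry Map.Entry or value object. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the "parallel array" layout (not Lucene's classes):
// each distinct term gets a dense int id, and per-term state lives in primitive
// arrays indexed by that id instead of in one value object per HashMap entry.
class TermStatsSketch {
    private byte[][] terms = new byte[16][]; // id -> term bytes (caller's array, not copied)
    private int[] docFreq = new int[16];     // id -> doc freq summed over segments
    private float[] boost = new float[16];   // id -> boost (first one seen is kept)
    private int[] slots = new int[32];       // hash slot -> id + 1, 0 means empty
    private int count = 0;

    // Returns the dense id of the term, adding it if it has not been seen yet.
    int add(byte[] term, int df, float b) {
        if (count == terms.length) {
            grow(); // keeps slots.length == 2 * terms.length, so load stays <= 0.5
        }
        int mask = slots.length - 1;
        int slot = Arrays.hashCode(term) & mask;
        while (slots[slot] != 0) {
            int id = slots[slot] - 1;
            if (Arrays.equals(terms[id], term)) {
                docFreq[id] += df; // same term seen again, e.g. in another segment
                return id;
            }
            slot = (slot + 1) & mask; // linear probing
        }
        int id = count++;
        terms[id] = term;
        docFreq[id] = df;
        boost[id] = b;
        slots[slot] = id + 1;
        return id;
    }

    private void grow() {
        int newCapacity = terms.length * 2;
        terms = Arrays.copyOf(terms, newCapacity);
        docFreq = Arrays.copyOf(docFreq, newCapacity);
        boost = Arrays.copyOf(boost, newCapacity);
        slots = new int[newCapacity * 2];
        int mask = slots.length - 1;
        for (int id = 0; id < count; id++) { // rehash existing ids into the larger table
            int slot = Arrays.hashCode(terms[id]) & mask;
            while (slots[slot] != 0) {
                slot = (slot + 1) & mask;
            }
            slots[slot] = id + 1;
        }
    }

    int size() { return count; }
    int docFreq(int id) { return docFreq[id]; }
    float boost(int id) { return boost[id]; }
    String term(int id) { return new String(terms[id], StandardCharsets.UTF_8); }
}
```

Growing keeps the hash table at twice the array capacity, so the load factor never exceeds one half and probing always terminates. When the same term arrives from another segment its doc freq is summed and the first boost is kept; that merge rule is an arbitrary choice for the sketch.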
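The point about FloatsUtil#nextAfter can be checked directly: Double#floatValue is specified as a narrowing primitive conversion, and the Float(double) constructor is documented as converting its argument to float, so both amount to (float) direction plus a needless allocation. The hypothetical helper below does not reproduce the rest of nextAfter; it only compares the boxed path against the plain cast over a few edge-case values, and it leaves out new Float(double) because that constructor is deprecated in recent JDKs.

```java
// Hypothetical check (not part of the patch): the boxed conversion and the
// plain cast should agree for every double, including NaN, infinities,
// signed zeros, and values that overflow or underflow the float range.
class FloatCastCheck {
    static float viaBoxing(double direction) {
        return Double.valueOf(direction).floatValue(); // boxes into a Double, then narrows
    }

    static float viaCast(double direction) {
        return (float) direction; // the same narrowing conversion, no boxing
    }

    static boolean sameForAll(double... samples) {
        for (double d : samples) {
            // Float.compare treats NaN as equal to NaN and distinguishes -0.0f from 0.0f
            if (Float.compare(viaBoxing(d), viaCast(d)) != 0) {
                return false;
            }
        }
        return true;
    }
}
```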

> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using 
> TopTermsBooleanQueryRewrite), the auto constant rewrite method and the 
> ScoringBQ rewrite methods using a MultiFields wrapper on the top-level 
> reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses 
> some additional data structures (hashed sets/maps) to exclude duplicate terms. 
> All tests currently pass, but FuzzyQuery's tests should not, because its 
> minimum score handling depends on the terms being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
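The per-segment rewrite described in the issue, and the ordering problem it creates for FuzzyQuery, can be illustrated with a small sketch. It assumes each segment's term dictionary is a sorted map of term to doc freq and uses a simple prefix match as a hypothetical stand-in for a MultiTermQuery's term enumeration; the class and method names are made up. A LinkedHashMap merges duplicate terms across segments while recording the order in which terms were first visited.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-segment rewrite sketch (not Lucene code): each segment is
// modelled as a sorted TreeMap of term -> docFreq, and a prefix match stands in
// for the MultiTermQuery's term enumeration.
class PerSegmentRewriteSketch {
    static Map<String, Integer> rewrite(List<TreeMap<String, Integer>> segments, String prefix) {
        Map<String, Integer> merged = new LinkedHashMap<>(); // keeps first-visit order
        for (TreeMap<String, Integer> segment : segments) {
            // each segment is sorted internally, but not relative to the other segments
            for (Map.Entry<String, Integer> e : segment.tailMap(prefix, true).entrySet()) {
                if (!e.getKey().startsWith(prefix)) {
                    break; // past the end of the prefix range in this segment
                }
                // a term found in several segments becomes one entry with a summed doc freq
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }
}
```

With segments {apple, apricot, banana} and {ant, apple} and the prefix "a", the merged result visits apple (doc freq 2 + 3 = 5), apricot, then ant. Each segment is enumerated in sorted order, but ant arrives after apricot overall, which is the kind of out-of-order collection that FuzzyQuery's minimum score handling cannot cope with.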

