[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use

Uwe Schindler (JIRA) Tue, 11 Sep 2012 11:26:09 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453259#comment-13453259
 ]


Uwe Schindler commented on LUCENE-2684:
---------------------------------------

An idea (separate issue!) would be:
BS1 completely violates the Scorer interface: the only method you can call is 
the one taking a Collector. In my opinion, BS1 should *not* implement the 
Scorer interface; that's the whole bug! It should maybe be some separate class 
like OutOfOrderDocIdReporter (name is just an example) that only implements 
collect(Collector). And the navigation API (advance, next) should be separated 
from score() and freq(), which would become a simple Java interface Scorer. So 
the current in-order scorer would be a simple DocIdSetIterator that 
additionally implements the Scorer interface (to provide score() and freq()), 
and the current out-of-order scorers would implement only the 
OutOfOrderDocIdReporter API and pass an inlined Scorer interface (without 
advance and next) to the setScorer() method (like BucketScorer currently does).
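
To make the proposal concrete, here is a minimal Java sketch of the split. It 
is self-contained and deliberately does not use Lucene's classes: DocNavigator, 
DocScore, HitCollector, InOrderScorer and BucketReporter are hypothetical names 
invented for illustration (only OutOfOrderDocIdReporter comes from the comment 
above, and even that name is just an example), and score() simply returns the 
frequency as a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these types are Lucene's real API.
class ScorerSplitSketch {

    /** Navigation only, the role a DocIdSetIterator would play. */
    interface DocNavigator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int nextDoc();
        int advance(int target);
    }

    /** Per-document values only: the slimmed-down "Scorer" interface. */
    interface DocScore {
        float score();
        int freq();
    }

    /** Receives hits; mirrors the setScorer()/collect() pair from the comment. */
    interface HitCollector {
        void setScorer(DocScore scorer);
        void collect(int doc);
    }

    /** The only thing an out-of-order scorer would expose. */
    interface OutOfOrderDocIdReporter {
        void collect(HitCollector collector);
    }

    /** In-order scorer: navigation plus score()/freq(). */
    static final class InOrderScorer implements DocNavigator, DocScore {
        private final int[] docs;  // sorted ascending
        private final int[] freqs; // freqs[i] belongs to docs[i]
        private int pos = -1;

        InOrderScorer(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        public int nextDoc() {
            if (pos < docs.length) pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        public int advance(int target) {
            int doc;
            do { doc = nextDoc(); } while (doc < target);
            return doc;
        }

        // Both calls are only valid while positioned on a document.
        public float score() { return freq(); } // placeholder, not a real score
        public int freq() { return freqs[pos]; }
    }

    /** Out-of-order scorer: no navigation, it only pushes hits to a collector. */
    static final class BucketReporter implements OutOfOrderDocIdReporter {
        private final int[] docs;  // any order
        private final int[] freqs; // freqs[i] belongs to docs[i]

        BucketReporter(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        public void collect(HitCollector collector) {
            final int[] current = new int[1]; // index of the hit being reported
            // The "inlined" scorer: score()/freq() only, no advance()/nextDoc().
            collector.setScorer(new DocScore() {
                public float score() { return freqs[current[0]]; }
                public int freq() { return freqs[current[0]]; }
            });
            for (int i = 0; i < docs.length; i++) {
                current[0] = i;
                collector.collect(docs[i]);
            }
        }
    }

    /** Gathers "doc:freq" strings in the order the reporter delivers them. */
    static List<String> collectHits(OutOfOrderDocIdReporter reporter) {
        final List<String> hits = new ArrayList<>();
        reporter.collect(new HitCollector() {
            private DocScore scorer;
            public void setScorer(DocScore scorer) { this.scorer = scorer; }
            public void collect(int doc) { hits.add(doc + ":" + scorer.freq()); }
        });
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(collectHits(
            new BucketReporter(new int[] {7, 3, 5}, new int[] {2, 1, 4})));
    }
}
```

The point of the sketch is that a collector handed a BucketReporter has no 
advance() or nextDoc() to call by mistake, while freq() stays valid for each 
collected hit because the reporter controls which bucket the inlined scorer 
reads.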
                
> it's not possible to access sub-query's freq information if BooleanScorer is 
> use
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-2684
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2684
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>            Reporter: Michael McCandless
>             Fix For: 4.1
>
>
> LUCENE-2590 added an advanced feature, allowing an app to gather all 
> sub-scorers for any Query.
> This is powerful because then, during collection, the app can get some 
> details about how each sub-query "participated" in the overall match for the 
> given document.
> However, I think this is completely broken if the BooleanQuery uses 
> BooleanScorer, because that scorer is not doc-at-once.  Instead, it batch 
> processes chunks of 2048 sequential docIDs per scorer.  This is a big 
> performance gain, but it means that the sub scorers will all be positioned to 
> the end of the 2048 doc chunk while the docs that matched within that chunk 
> are collected.
> I don't think we can easily fix this... likely the "fix" is to make it 
> easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)?  It is 
> actually possible to force this, today, by having your collector return false 
> from acceptsDocsOutOfOrder...
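
The positioning problem described in the issue can be reproduced with a toy 
model. The sketch below is not Lucene code: SubScorer, collectWindowed and 
collectDocAtOnce are hypothetical names, and the sub-scorer is just a sorted 
int array. It only assumes what the description states, namely that the 
windowed scorer drains each sub-scorer through a whole chunk (2048 docIDs in 
the issue) before any matches from that chunk are collected.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the problem; these are not Lucene classes.
class WindowedScoringSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical sub-scorer over a sorted list of matching docIDs. */
    static final class SubScorer {
        private final int[] docs;
        private int pos = -1;

        SubScorer(int[] docs) { this.docs = docs; }

        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() {
            if (pos < docs.length) pos++;
            return docID();
        }
    }

    /**
     * Windowed collection. Each hit is {collected doc, sub.docID() at the
     * moment the doc was collected}.
     */
    static List<int[]> collectWindowed(SubScorer sub, int windowSize) {
        List<int[]> hits = new ArrayList<>();
        int doc = sub.nextDoc();
        while (doc != NO_MORE_DOCS) {
            int windowEnd = (doc / windowSize + 1) * windowSize;
            List<Integer> bucket = new ArrayList<>();
            // Phase 1: drain the sub-scorer through the whole window.
            while (doc < windowEnd) {
                bucket.add(doc);
                doc = sub.nextDoc();
            }
            // Phase 2: report the matches; sub has already moved past them.
            for (int d : bucket) {
                hits.add(new int[] {d, sub.docID()});
            }
        }
        return hits;
    }

    /** Doc-at-once collection: the sub-scorer sits on the collected doc. */
    static List<int[]> collectDocAtOnce(SubScorer sub) {
        List<int[]> hits = new ArrayList<>();
        for (int doc = sub.nextDoc(); doc != NO_MORE_DOCS; doc = sub.nextDoc()) {
            hits.add(new int[] {doc, sub.docID()});
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] docs = {3, 10, 2050};
        for (int[] h : collectWindowed(new SubScorer(docs), 2048)) {
            System.out.println("windowed:    doc=" + h[0] + " sub.docID()=" + h[1]);
        }
        for (int[] h : collectDocAtOnce(new SubScorer(docs))) {
            System.out.println("doc-at-once: doc=" + h[0] + " sub.docID()=" + h[1]);
        }
    }
}
```

In the windowed run, docs 3 and 10 are collected while the sub-scorer already 
sits on 2050, so any freq() read from it at that moment would belong to the 
wrong document. In the doc-at-once run the two values always agree.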

