[ 
https://issues.apache.org/jira/browse/LUCENE-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049941#comment-14049941
 ] 

Terry Smith commented on LUCENE-5796:
-------------------------------------

Thanks for taking the time to review my patch and comment on the approach.

I advocated changing FilterScorer and BoostedScorer so that some of my custom 
Query implementations can use a regular BooleanQuery for recall, and 
optionally for scoring, while still taking advantage of the actual Scorers 
used on a per-document, per-clause basis.

This worked well across quite a few Lucene releases but broke when I upgraded 
to 4.9, because of the two regressions in Scorer.getChildren() described in 
this ticket.

In this scenario, a BooleanQuery containing two TermQueries (one a miss and the 
other a hit) returns the following from BooleanWeight.scorer():

* BoostedScorer
** TermScorer (hit)

Calling getChildren() on this returns an empty list: the BoostedScorer just 
returns in.getChildren(), and the wrapped TermScorer is a leaf with no 
children of its own. As a result you cannot navigate to the TermScorer that is 
actually in play. The same applies to any class that extends FilterScorer 
without overriding getChildren(). In other words, the current wiring makes the 
BoostedScorer transparent, at the cost of hiding the scorer that actually 
performs the work.
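
To make the effect concrete, here is a minimal sketch in plain Java. It does 
not use the Lucene classes: Node, Child, DelegatingWrapper and ExposingWrapper 
are hypothetical stand-ins I made up for Scorer, Scorer.ChildScorer and the two 
possible wirings of a wrapping scorer. It only models getChildren(), and the 
BOOSTING label is the relationship name proposed in the ticket.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class GetChildrenSketch {

    /** Hypothetical stand-in for a scorer; only getChildren() is modeled. */
    static class Node {
        final String name;

        Node(String name) {
            this.name = name;
        }

        /** A leaf scorer (such as a term scorer) reports no children. */
        Collection<Child> getChildren() {
            return Collections.emptyList();
        }
    }

    /** Hypothetical stand-in for a child entry: the child plus a relationship label. */
    static class Child {
        final Node child;
        final String relationship;

        Child(Node child, String relationship) {
            this.child = child;
            this.relationship = relationship;
        }
    }

    /** Wrapper that forwards getChildren() to the wrapped scorer, hiding the scorer itself. */
    static class DelegatingWrapper extends Node {
        final Node in;

        DelegatingWrapper(String name, Node in) {
            super(name);
            this.in = in;
        }

        @Override
        Collection<Child> getChildren() {
            return in.getChildren();
        }
    }

    /** Wrapper that reports the wrapped scorer as its single child. */
    static class ExposingWrapper extends Node {
        final Node in;

        ExposingWrapper(String name, Node in) {
            super(name);
            this.in = in;
        }

        @Override
        Collection<Child> getChildren() {
            return Collections.singletonList(new Child(in, "BOOSTING"));
        }
    }

    /** Returns the names of all scorers reachable from root, depth-first. */
    static List<String> reachable(Node root) {
        List<String> names = new ArrayList<>();
        collect(root, names);
        return names;
    }

    private static void collect(Node node, List<String> out) {
        out.add(node.name);
        for (Child c : node.getChildren()) {
            collect(c.child, out);
        }
    }

    public static void main(String[] args) {
        Node term = new Node("TermScorer");
        // Delegating wiring: the TermScorer is unreachable from the top.
        System.out.println(reachable(new DelegatingWrapper("BoostedScorer", term)));
        // Exposing wiring: the TermScorer shows up as a child.
        System.out.println(reachable(new ExposingWrapper("BoostedScorer", term)));
    }
}
```

With the delegating wiring the walk only ever sees the wrapper; with the 
exposing wiring it also reaches the wrapped scorer, which is what my custom 
queries depend on.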

If this is an unsupported workflow, I'm happy to move the discussion over to 
the user mailing list.    

> Scorer.getChildren() can throw or hide a subscorer for some boolean queries
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-5796
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5796
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.9
>            Reporter: Terry Smith
>            Priority: Minor
>         Attachments: LUCENE-5796.patch
>
>
> I've isolated two example boolean queries that misbehave in release 4.9 of 
> Lucene.
> # A BooleanQuery with three SHOULD clauses and a minimumNumberShouldMatch of 
> 2 will throw an ArrayIndexOutOfBoundsException.
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 2
>       at __randomizedtesting.SeedInfo.seed([2F79B3DF917D071B:2539E6DBC4DF793C]:0)
>       at org.apache.lucene.search.MinShouldMatchSumScorer.getChildren(MinShouldMatchSumScorer.java:119)
>       at org.apache.lucene.search.TestBooleanQueryVisitSubscorers$ScorerSummarizingCollector.summarizeScorer(TestBooleanQueryVisitSubscorers.java:261)
>       at org.apache.lucene.search.TestBooleanQueryVisitSubscorers$ScorerSummarizingCollector.setScorer(TestBooleanQueryVisitSubscorers.java:238)
>       at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:161)
>       at org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:64)
>       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
>       at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:94)
>       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
>       at org.apache.lucene.search.TestBooleanQueryVisitSubscorers.testGetChildrenMinShouldMatchSumScorer(TestBooleanQueryVisitSubscorers.java:196)
> {noformat}
> # A BooleanQuery with two SHOULD clauses, one of which is a miss for all 
> documents in the current segment, will accidentally mask the scorer that 
> was a hit.
> Unit tests and patch based on {{branch_4x}} are available and will be 
> attached as soon as this ticket has a number.
> They are immediately available on GitHub on branch 
> [shebiki/bqgetchildren|https://github.com/shebiki/lucene-solr/commits/bqgetchildren] 
> as commit 
> [c64bb6f|https://github.com/shebiki/lucene-solr/commit/c64bb6f2df8f33dd8daafc953d9c27b5cbf29fa3].
> I took the liberty of naming the relationship in BoostedScorer.getChildren() 
> {{BOOSTING}}. I suspect someone will offer a better name for this. Here is a 
> summary of the relationships used by all Scorer.getChildren() 
> implementations on {{branch_4x}} to help choose.
> || class || relationships ||
> | org.apache.lucene.search.AssertingScorer | SHOULD |
> | org.apache.lucene.search.join.ToParentBlockJoinQuery.BlockJoinScorer | BLOCK_JOIN |
> | org.apache.lucene.search.ConjunctionScorer | MUST |
> | org.apache.lucene.search.ConstantScoreQuery.ConstantScorer | constant |
> | org.apache.lucene.queries.function.BoostedQuery.CustomScorer | CUSTOM |
> | org.apache.lucene.queries.CustomScoreQuery.CustomScorer | CUSTOM |
> | org.apache.lucene.search.DisjunctionScorer | SHOULD |
> | org.apache.lucene.facet.DrillSidewaysScorer.FakeScorer | MUST |
> | org.apache.lucene.search.FilterScorer | calls in.getChildren() |
> | org.apache.lucene.search.ScoreCachingWrappingScorer | CACHED |
> | org.apache.lucene.search.FilteredQuery.LeapFrogScorer | FILTERED |
> | org.apache.lucene.search.MinShouldMatchSumScorer | SHOULD |
> | org.apache.lucene.search.FilteredQuery | FILTERED |
> | org.apache.lucene.search.ReqExclScorer | MUST |
> | org.apache.lucene.search.ReqOptSumScorer | MUST, SHOULD |
> | org.apache.lucene.search.join.ToChildBlockJoinQuery | BLOCK_JOIN |
> I also removed FilterScorer.getChildren() to prevent mistakes and force 
> subclasses to provide a correct implementation.



