[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2019-03-27 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802884#comment-16802884
 ] 

Alan Woodward commented on LUCENE-6229:
---

I think it's worth revisiting this now?  The Matches API can provide 
information about which clauses of a Boolean query match for a given hit, and 
custom scoring is doable via FunctionScoreQuery and 
DoubleValuesSource.fromQuery()

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-11-21 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020732#comment-15020732
 ] 

Mikhail Khludnev commented on LUCENE-6229:
--

[~jpountz] our custom logic also extensively uses {{getChildren()}} we analyze 
hits across all hits, not only top ones. 
It somehow similar to LUCENE-1999 
http://lucene.472066.n3.nabble.com/Highlighting-catering-for-all-query-types-td572693.html
  
Surely. We have to deal with suppressing term-at-time scorer, and sub-scorers 
positioning. It lacks of design beauty but it works while we have 
{{getChildren()}}.

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-11-20 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018538#comment-15018538
 ] 

Adrien Grand commented on LUCENE-6229:
--

Couldn't you do it by evaluating each clause on the top hits?

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-11-20 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018531#comment-15018531
 ] 

Karl Wright commented on LUCENE-6229:
-

We're using getChildren() solely for inspection purposes, so I'd be very sad if 
this capability went away.  If it *does* go away, FWIW we use this to figure 
out which clause(s) of a BooleanQuery a document comes from, so we'd need 
equivalent functionality at that point.

Thanks!

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-24 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335022#comment-14335022
 ] 

Terry Smith commented on LUCENE-6229:
-

Understood.

If you end up keeping getChildren(), how do you feel about making it well 
defined by capturing these constraints in the Javadoc?



> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-24 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334956#comment-14334956
 ] 

Robert Muir commented on LUCENE-6229:
-

{quote}
Can I position the returned scorers on the current document by calling freq(), 
score() or something else?
{quote}

No: this positioning is the whole problem here!

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-24 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334937#comment-14334937
 ] 

Terry Smith commented on LUCENE-6229:
-

[~rcmuir] Sorry for excluding that scenario, it wasn't intentional.

If you all decide to keep getChildren(), then I'd love to get the contract 
described so people know what to expect.

I think these statements are correct:

# Scorer.getChildren() returns the immediate child scorers
# A returned scorer may be 
## unpositioned (never had next() or advance() called on it)
## positioned on a valid document that is before, on, or after the current 
document
## exhausted and thus positioned at NO_MORE_DOCS
# You MUST NOT call next() or advance() on the returned scorers yourself

And have these questions:

# Can I walk the returned scorers to get to all non-null leaf scorers?
# Can I position the returned scorers on the current document by calling 
freq(), score() or something else?




> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-20 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329108#comment-14329108
 ] 

Adrien Grand commented on LUCENE-6229:
--

bq. just let getChildren() be completely transparent about where scorers are 
positioned

I agree this is the only reasonable definition of getChildren that we could 
have if we want to keep it, otherwise it puts too many constraints on the 
scorers.

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329082#comment-14329082
 ] 

Robert Muir commented on LUCENE-6229:
-

I don't understand why we quickly threw out the third option i presented: just 
let getChildren() be completely transparent about where scorers are positioned, 
it reflects reality and means it will not infringe on the actual query 
performance. This still allows it to be used at least for debugging.

If we insist that it returns correctly positioned scorers, then honestly i 
don't think it will ever happen. It will stay half-broken and undefined like 
today instead. The performance cost is far too high, and specializing scors via 
an option is too expensive in terms of maintenance. We already have a hard 
enough time with the scorers we have. 

For extremely expert use cases like using getChildren() to "subclass" a query, 
there are other alternatives, considering the code is open source, like just 
forking that query or whatever. At some point, something is just expert enough 
where that is the correct solution to the problem.

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-20 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329069#comment-14329069
 ] 

Terry Smith commented on LUCENE-6229:
-

I’ll summarize this as two options:

# Remove getChildren() as it complicates the code hurting the ability to 
maintain it and make performance enhancements.
# Make getChildren() a more well defined API that gives you the ability to 
retrieve child scorers that are correctly positioned.

You are looking for data to backup option 2 to determine if this is an API that 
is worth fixing/keeping.

Here are the use cases that I have:

# Custom scoring of a BooleanQuery. A query that wraps any BooleanQuery which 
it uses for recall but supplies it’s own scoring algorithm to aggregate the 
scores from the original clauses.
# Custom DrillSidewaysQuery. A query that can use the sideways scorers for 
precision instead of just recall.
# Recursive DrillSidewaysQuery (not implemented, tricky). A query that could 
perform DrillSideways for union or in a nested fashion.
# Auxillary metadata. An enhancement that can augment the current recall 
(boolean match) and precision (float score) for a document in the search 
pipeline to add extra information that can be used from Query and FunctionValue 
instances (collected via a custom Collector) and supported by a custom 
SortField.

These can be categorized into two camps:

# Using an existing Query (typically BooleanQuery) to find matches but 
providing some combination of
## custom scoring that isn’t compatible with the Similarity API.
## custom recall (think DrillSideways)
# Adding extra information to the search pipeline that can be 
## generated by leaf queries and value sources
## aggregated by composing queries (BooleanQuery, DisjunctionMaxQuery, etc)
## survive wrapping queries and value sources that don’t know about it
## collected and sorted on

Hope this helps.

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324759#comment-14324759
 ] 

Robert Muir commented on LUCENE-6229:
-

I think a lot of the problem i have with the current API is that it is so 
undefined, for example when getChildren() or freq() are allowed to be called.

In some cases like ReqOptSumScorer, we can work around the issue partially, but 
it makes code more confusing (it calls score() in its freq(), which might 
invoke a deferred advance() on the optional part). And it doesn't really fix 
the issue, if you navigate to the "leaf" nodes you will find the optional ones 
positioned on the wrong docID often. I think the optimization in this case is 
simple and effective, so it would be a shame to limit our performance because 
of this API.

On the other hand, I like the idea of being able to navigate the scorers tree, 
at least for debugging. If the API was completely "transparent" it would still 
meet that use case. So the main question is, what are the other important use 
cases? Do they justify limiting our performance or completely specializing 
potentially many scorers to support, or is there a more restrictive API that 
would still work.

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-17 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324686#comment-14324686
 ] 

Terry Smith commented on LUCENE-6229:
-

[~rcmuir] Thanks for the backstory. I've been trying to wrap my head around 
where Lucene is going and this kind of information really helps.

It sounds like both [~rcmuir] and [~thetaphi] agree that Scorer.getChildren() 
is not an API that Lucene should support. Reading between the lines, this 
implies to me that scoring is moving to a bulk-only approach, which will bring 
great performance gains.

A best effort implementation of Scorer.getChildren() would be something that 
I'd be uncomfortable adding features on top of, although it could be useful for 
debugging. Unfortunately this is a showstopper for me as I rely on 
Scorer.getChildren() for some critical features, and need to do some serious 
thinking to figure out if I can formulate an alternative approach.


> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318298#comment-14318298
 ] 

Uwe Schindler commented on LUCENE-6229:
---

I agree,
actually only one of my customers really used this. And at the time they did 
this, the API was already not perfect, and as Robert said - incomplete and not 
guaranteed to work (Lucene 3 times) - e.g. it only worked if bulk scoring was 
not enabled (the customer just passed correct booleans to weight, so it thought 
that it was not top-level query). The API was really cool in Lucene 4+, but I 
was never using it for real use cases.

The other (current) customer split ther query up already before the scorer was 
invoved (they are interested in counts on sub querys) - so removal is not an 
issue. The code used was actually written by [~simonw], I just inherited it :-)

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318287#comment-14318287
 ] 

Robert Muir commented on LUCENE-6229:
-

I dont think this is guaranteed. Lets be clear about how this went down.

The functionality was added on LUCENE-2590, but was hardcoded to only work with 
booleanscorer.  Just look at the patch, you dont have to believe me. There was 
also a lot of inconsistencies regarding various scorers. For example issues 
like LUCENE-2686.

But we didnt slow down our core functionality, instead we waited until there 
was a way to clean this stuff up non-intrusively. I added the getChildren API 
to more scorers and fixed lots of bugs like this in issues like LUCENE-3505.

However, this api is *always* best effort. The one and only purpose of scorers 
is to score hits. If someone can make disjunctionscorer 5% faster, and 
completely break this getChildren API, I will be +1.


> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318279#comment-14318279
 ] 

Uwe Schindler commented on LUCENE-6229:
---

I had/have 2 customers using that... But I don't think it is really in wide use.

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-12 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318272#comment-14318272
 ] 

Terry Smith commented on LUCENE-6229:
-

[~jpountz] - I'm going to split the freq() vs score() thing into a separate 
ticket so it doesn't hijack this one. I intend to take the unit test I 
previously pasted and extend it to create some randomized BooleanQuerys to try 
and locate possibly broken edge cases and give a safety blanket for future 
refactoring.

I'll make these assumptions, shout out if they are incorrect.

For a BooleanQuery I should be able to perform doc-at-a-time scoring, meaning 
that in a Collector or Scorer I can

1. Find all Scorers from the child clauses of the BooleanQuery
2. Have those Scorers be positioned for me by calling score() or freq()


> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-10 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314361#comment-14314361
 ] 

Terry Smith commented on LUCENE-6229:
-

h2. freq() vs score()

I think the lazy positioning in MinShouldMatchSumScorer is misbehaving.

Drop these three methods into TestBooleanMinShouldMatch.java to see.
{code:java}
public void testMinNrShouldMatchFreq() throws Exception {
  BooleanQuery q = new BooleanQuery();
  q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
  q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
  q.setMinimumNumberShouldMatch(2);
  verifyNrHits(q, 1);
  s.search(q, new SimpleCollector() {
private Scorer scorer;
private Collection leafScorers;
@Override
public void setScorer(Scorer scorer) throws IOException {
  this.scorer = scorer;
  this.leafScorers = leafScorers(new ArrayList(), scorer);
  assertEquals(4, leafScorers.size());
}

@Override
public void collect(int doc) throws IOException {
  assertEquals(0, doc);
  scorer.freq(); // position leaf scorers
  for (Scorer leafScorer : leafScorers) {
assertEquals(0, leafScorer.docID());
  }
}
  });
}

public void testMinNrShouldMatchScore() throws Exception {
  BooleanQuery q = new BooleanQuery();
  q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
  q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
  q.setMinimumNumberShouldMatch(2);
  verifyNrHits(q, 1);
  s.search(q, new SimpleCollector() {
private Scorer scorer;
private Collection leafScorers;
@Override
public void setScorer(Scorer scorer) throws IOException {
  this.scorer = scorer;
  this.leafScorers = leafScorers(new ArrayList(), scorer);
  assertEquals(4, leafScorers.size());
}

@Override
public void collect(int doc) throws IOException {
  assertEquals(0, doc);
  scorer.score(); // position leaf scorers
  for (Scorer leafScorer : leafScorers) {
assertEquals(0, leafScorer.docID());
  }
}
  });
}

private static Collection leafScorers(Collection target, 
Scorer scorer) {
  Collection childScorers = scorer.getChildren();
  if (childScorers.isEmpty()) {
target.add(scorer);
  } else {
for (ChildScorer childScorer : childScorers) {
  leafScorers(target, childScorer.child);
}
  }
  return target;
}
{code}

Here the one that uses freq() to position the sub scorers fails but the one 
that uses score() succeeds.

h2. middle ground

I have Scorer constructors, Weight.scorer(), Weight.explain() and Collectors 
all calling Scorer.getChildren(). But when using my custom Collectors I'm 
careful to wrap the Query in a custom NonBulkScoringQuery that prevents bulk 
scoring to work around the trap. The NonBulkScoringQuery I mention is a simple 
delegating Query that allows Weight.bulkScorer() to use it's default 
implementation instead of allowing the wrapped Query to override it.

I like removing the trap for bulk scoring queries, it's really subtle and it 
took me a while to diagnose the first time I hit it.

Having a separate entry point into IndexSearcher to achieve doc-at-a-time 
scoring that supports getChildren() would be awesome. I'm not so hot on having 
to cast the collector, do you think there could be a way to preserve type 
safety here?


> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314299#comment-14314299
 ] 

Adrien Grand commented on LUCENE-6229:
--

OK, I think I understand. The new MinShouldMatchSumScorer advances sub scorers 
lazily until freq() or score() is called. I believe you are wrapping this 
min-should-match query into another query that does not propagate calls to 
freq() (like another boolean query?).

Note that you can work-around the issue by calling freq() on sub-scorers as 
well, but I wouldn't consider it a bug. This is exactly the kind of things that 
worries me about this API: there are assumptions that are made about this API 
that nothing else relies upon. And I think the min-should-match query is a good 
example: we know that there is a match as soon as we found minShouldMatch 
matching clauses. Forcefully trying to advance ALL clauses to the matching doc 
would be wasteful, which is why this scorer only does it when it is required, 
ie. when freq() or score() are called.

I think one acceptable middle ground would be to:
 - hide getChildren from the scorer that is passed to LeafCollector to remove 
the trap of calling getChildren on a scorer that does not support it 
(LUCENE-6228)
 - add a new IndexSearcher specializating that only uses scorers (not bulk 
scorers). Collectors called through this indexSearcher could cast the scorer 
that is passed in order to have access to children.

Stefan, Terry: what do you think?

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314234#comment-14314234
 ] 

Adrien Grand commented on LUCENE-6229:
--

bq. 

bq. It'd be great to simplify this workflow as I've been calling Scorer.freq() 
to position all the child scorers (from a BooleanQuery) and as of the 5.1 
nightly builds am needing to call Scorer.score() instead for positioning due to 
changes in MinShouldMatchSumScorer.

This is weird, both freq() and score() should position all sub scorers the same 
way with MinShouldMatchSumScorer. What do your query and collector look like?

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-10 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314219#comment-14314219
 ] 

Terry Smith commented on LUCENE-6229:
-

Like Stefan, I'm also using this functionality to access child scorers on a per 
document basis. Currently for some custom query enhancements and a custom drill 
sideways implementation.

Like Adrien, I've also had to wrap queries in a custom NonBulkScoringQuery to 
force doc-at-a-time scoring.

It'd be great to simplify this workflow as I've been calling Scorer.freq() to 
position all the child scorers (from a BooleanQuery) and as of the 5.1 nightly 
builds am needing to call Scorer.score() instead for positioning due to changes 
in MinShouldMatchSumScorer.

I'd love to have a way to not only get the child scorers but be confident that 
they were all correctly positioned.


> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313800#comment-14313800
 ] 

Adrien Grand commented on LUCENE-6229:
--

bq. Could this functionality be provided in a different way that doesn't have 
the problems you want to address here? E.g. could users hint the search to 
require this functionality (needClauses?), which in turn leads to not using 
optimized implementations that cannot (easily) provide this information?

I'm not sure I would like an additional parameter. In the join module, we 
worked around the fact that boolean queries do not always expose getChildren() 
by creating a dedicated IndexSearcher impl that always uses scorers instead of 
bulk scorers (ToParentBlockJoinIndexSearcher in lucene/join). Maybe this thing 
could be moved to a more general-purpose like lucene/misc and we could document 
that this is the way to go if you want to have access to information about 
child scorers from a collector, at the cost of some potential slow down?

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-09 Thread Stefan Pohl (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313722#comment-14313722
 ] 

Stefan Pohl commented on LUCENE-6229:
-

Hi Adrian, Mike,
thanks for your recent efforts in cleaning up with many outstanding 
refactorings.

I'm using this functionality in analytics/debugging contexts where it's not 
necessary to have best possible performance (e.g. using BooleanScorer).
LUCENE-2590 doesn't seems to be a feature that you would assume Lucene to use 
internally, and I doubt many actual users of this functionality track JIRA and 
would speak up here.

Could this functionality be provided in a different way that doesn't have the 
problems you want to address here? E.g. could users hint the search to require 
this functionality (needClauses?), which in turn leads to not using optimized 
implementations that cannot (easily) provide this information?

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312899#comment-14312899
 ] 

Michael McCandless commented on LUCENE-6229:


LUCENE-2590 was the original motivation for this API, so that per-hit an app 
could see which sub-clauses matched.

But it has not really been used much as far as I can tell, and it has just 
caused hassle over time, so +1 to remove.

> Remove Scorer.getChildren?
> --
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org