[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802884#comment-16802884 ] Alan Woodward commented on LUCENE-6229: --- I think it's worth revisiting this now? The Matches API can provide information about which clauses of a Boolean query match for a given hit, and custom scoring is doable via FunctionScoreQuery and DoubleValuesSource.fromQuery() > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020732#comment-15020732 ] Mikhail Khludnev commented on LUCENE-6229: -- [~jpountz] our custom logic also extensively uses {{getChildren()}} we analyze hits across all hits, not only top ones. It somehow similar to LUCENE-1999 http://lucene.472066.n3.nabble.com/Highlighting-catering-for-all-query-types-td572693.html Surely. We have to deal with suppressing term-at-time scorer, and sub-scorers positioning. It lacks of design beauty but it works while we have {{getChildren()}}. > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018538#comment-15018538 ] Adrien Grand commented on LUCENE-6229: -- Couldn't you do it by evaluating each clause on the top hits? > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018531#comment-15018531 ] Karl Wright commented on LUCENE-6229: - We're using getChildren() solely for inspection purposes, so I'd be very sad if this capability went away. If it *does* go away, FWIW we use this to figure out which clause(s) of a BooleanQuery a document comes from, so we'd need equivalent functionality at that point. Thanks! > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335022#comment-14335022 ] Terry Smith commented on LUCENE-6229: - Understood. If you end up keeping getChildren(), how do you feel about making it well defined by capturing these constraints in the Javadoc? > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334956#comment-14334956 ] Robert Muir commented on LUCENE-6229: - {quote} Can I position the returned scorers on the current document by calling freq(), score() or something else? {quote} No: this positioning is the whole problem here! > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334937#comment-14334937 ] Terry Smith commented on LUCENE-6229: - [~rcmuir] Sorry for excluding that scenario, it wasn't intentional. If you all decide to keep getChildren(), then I'd love to get the contract described so people know what to expect. I think these statements are correct: # Scorer.getChildren() returns the immediate child scorers # A returned scorer may be ## unpositioned (never had next() or advance() called on it) ## positioned on a valid document that is before, on, or after the current document ## exhausted and thus positioned at NO_MORE_DOCS # You MUST NOT call next() or advance() on the returned scorers yourself And have these questions: # Can I walk the returned scorers to get to all non-null leaf scorers? # Can I position the returned scorers on the current document by calling freq(), score() or something else? > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329108#comment-14329108 ] Adrien Grand commented on LUCENE-6229: -- bq. just let getChildren() be completely transparent about where scorers are positioned I agree this is the only reasonable definition of getChildren that we could have if we want to keep it, otherwise it puts too many constraints on the scorers. > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329082#comment-14329082 ] Robert Muir commented on LUCENE-6229: - I don't understand why we quickly threw out the third option i presented: just let getChildren() be completely transparent about where scorers are positioned, it reflects reality and means it will not infringe on the actual query performance. This still allows it to be used at least for debugging. If we insist that it returns correctly positioned scorers, then honestly i don't think it will ever happen. It will stay half-broken and undefined like today instead. The performance cost is far too high, and specializing scors via an option is too expensive in terms of maintenance. We already have a hard enough time with the scorers we have. For extremely expert use cases like using getChildren() to "subclass" a query, there are other alternatives, considering the code is open source, like just forking that query or whatever. At some point, something is just expert enough where that is the correct solution to the problem. > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329069#comment-14329069 ] Terry Smith commented on LUCENE-6229: - I’ll summarize this as two options: # Remove getChildren() as it complicates the code hurting the ability to maintain it and make performance enhancements. # Make getChildren() a more well defined API that gives you the ability to retrieve child scorers that are correctly positioned. You are looking for data to backup option 2 to determine if this is an API that is worth fixing/keeping. Here are the use cases that I have: # Custom scoring of a BooleanQuery. A query that wraps any BooleanQuery which it uses for recall but supplies it’s own scoring algorithm to aggregate the scores from the original clauses. # Custom DrillSidewaysQuery. A query that can use the sideways scorers for precision instead of just recall. # Recursive DrillSidewaysQuery (not implemented, tricky). A query that could perform DrillSideways for union or in a nested fashion. # Auxillary metadata. An enhancement that can augment the current recall (boolean match) and precision (float score) for a document in the search pipeline to add extra information that can be used from Query and FunctionValue instances (collected via a custom Collector) and supported by a custom SortField. These can be categorized into two camps: # Using an existing Query (typically BooleanQuery) to find matches but providing some combination of ## custom scoring that isn’t compatible with the Similarity API. ## custom recall (think DrillSideways) # Adding extra information to the search pipeline that can be ## generated by leaf queries and value sources ## aggregated by composing queries (BooleanQuery, DisjunctionMaxQuery, etc) ## survive wrapping queries and value sources that don’t know about it ## collected and sorted on Hope this helps. > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324759#comment-14324759 ] Robert Muir commented on LUCENE-6229: - I think a lot of the problem i have with the current API is that it is so undefined, for example when getChildren() or freq() are allowed to be called. In some cases like ReqOptSumScorer, we can work around the issue partially, but it makes code more confusing (it calls score() in its freq(), which might invoke a deferred advance() on the optional part). And it doesn't really fix the issue, if you navigate to the "leaf" nodes you will find the optional ones positioned on the wrong docID often. I think the optimization in this case is simple and effective, so it would be a shame to limit our performance because of this API. On the other hand, I like the idea of being able to navigate the scorers tree, at least for debugging. If the API was completely "transparent" it would still meet that use case. So the main question is, what are the other important use cases? Do they justify limiting our performance or completely specializing potentially many scorers to support, or is there a more restrictive API that would still work. > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324686#comment-14324686 ] Terry Smith commented on LUCENE-6229: - [~rcmuir] Thanks for the backstory. I've been trying to wrap my head around where Lucene is going and this kind of information really helps. It sounds like both [~rcmuir] and [~thetaphi] agree that Scorer.getChildren() is not an API that Lucene should support. Reading between the lines, this implies to me that scoring is moving to a bulk-only approach, which will bring great performance gains. A best effort implementation of Scorer.getChildren() would be something that I'd be uncomfortable adding features on top of, although it could be useful for debugging. Unfortunately this is a showstopper for me as I rely on Scorer.getChildren() for some critical features, and need to do some serious thinking to figure out if I can formulate an alternative approach. > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318298#comment-14318298 ] Uwe Schindler commented on LUCENE-6229: --- I agree, actually only one of my customers really used this. And at the time they did this, the API was already not perfect, and as Robert said - incomplete and not guaranteed to work (Lucene 3 times) - e.g. it only worked if bulk scoring was not enabled (the customer just passed correct booleans to weight, so it thought that it was not top-level query). The API was really cool in Lucene 4+, but I was never using it for real use cases. The other (current) customer split ther query up already before the scorer was invoved (they are interested in counts on sub querys) - so removal is not an issue. The code used was actually written by [~simonw], I just inherited it :-) > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318287#comment-14318287 ] Robert Muir commented on LUCENE-6229: - I dont think this is guaranteed. Lets be clear about how this went down. The functionality was added on LUCENE-2590, but was hardcoded to only work with booleanscorer. Just look at the patch, you dont have to believe me. There was also a lot of inconsistencies regarding various scorers. For example issues like LUCENE-2686. But we didnt slow down our core functionality, instead we waited until there was a way to clean this stuff up non-intrusively. I added the getChildren API to more scorers and fixed lots of bugs like this in issues like LUCENE-3505. However, this api is *always* best effort. The one and only purpose of scorers is to score hits. If someone can make disjunctionscorer 5% faster, and completely break this getChildren API, I will be +1. > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318279#comment-14318279 ] Uwe Schindler commented on LUCENE-6229: --- I had/have 2 customers using that... But I don't think it is really in wide use. > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318272#comment-14318272 ] Terry Smith commented on LUCENE-6229: - [~jpountz] - I'm going to split the freq() vs score() thing into a separate ticket so it doesn't hijack this one. I intend to take the unit test I previously pasted and extend it to create some randomized BooleanQuerys to try and locate possibly broken edge cases and give a safety blanket for future refactoring. I'll make these assumptions, shout out if they are incorrect. For a BooleanQuery I should be able to perform doc-at-a-time scoring, meaning that in a Collector or Scorer I can 1. Find all Scorers from the child clauses of the BooleanQuery 2. Have those Scorers be positioned for me by calling score() or freq() > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314361#comment-14314361 ] Terry Smith commented on LUCENE-6229: - h2. freq() vs score() I think the lazy positioning in MinShouldMatchSumScorer is misbehaving. Drop these three methods into TestBooleanMinShouldMatch.java to see. {code:java} public void testMinNrShouldMatchFreq() throws Exception { BooleanQuery q = new BooleanQuery(); q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD); q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD); q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD); q.add(new TermQuery(new Term("id", "0")), Occur.MUST); q.setMinimumNumberShouldMatch(2); verifyNrHits(q, 1); s.search(q, new SimpleCollector() { private Scorer scorer; private Collection leafScorers; @Override public void setScorer(Scorer scorer) throws IOException { this.scorer = scorer; this.leafScorers = leafScorers(new ArrayList(), scorer); assertEquals(4, leafScorers.size()); } @Override public void collect(int doc) throws IOException { assertEquals(0, doc); scorer.freq(); // position leaf scorers for (Scorer leafScorer : leafScorers) { assertEquals(0, leafScorer.docID()); } } }); } public void testMinNrShouldMatchScore() throws Exception { BooleanQuery q = new BooleanQuery(); q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD); q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD); q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD); q.add(new TermQuery(new Term("id", "0")), Occur.MUST); q.setMinimumNumberShouldMatch(2); verifyNrHits(q, 1); s.search(q, new SimpleCollector() { private Scorer scorer; private Collection leafScorers; @Override public void setScorer(Scorer scorer) throws IOException { this.scorer = scorer; this.leafScorers = leafScorers(new ArrayList(), scorer); assertEquals(4, leafScorers.size()); } @Override public void collect(int doc) throws IOException { assertEquals(0, doc); scorer.score(); // position leaf scorers for (Scorer leafScorer : leafScorers) { assertEquals(0, leafScorer.docID()); } } }); } private static Collection leafScorers(Collection target, Scorer scorer) { Collection childScorers = scorer.getChildren(); if (childScorers.isEmpty()) { target.add(scorer); } else { for (ChildScorer childScorer : childScorers) { leafScorers(target, childScorer.child); } } return target; } {code} Here the one that uses freq() to position the sub scorers fails but the one that uses score() succeeds. h2. middle ground I have Scorer constructors, Weight.scorer(), Weight.explain() and Collectors all calling Scorer.getChildren(). But when using my custom Collectors I'm careful to wrap the Query in a custom NonBulkScoringQuery that prevents bulk scoring to work around the trap. The NonBulkScoringQuery I mention is a simple delegating Query that allows Weight.bulkScorer() to use it's default implementation instead of allowing the wrapped Query to override it. I like removing the trap for bulk scoring queries, it's really subtle and it took me a while to diagnose the first time I hit it. Having a separate entry point into IndexSearcher to achieve doc-at-a-time scoring that supports getChildren() would be awesome. I'm not so hot on having to cast the collector, do you think there could be a way to preserve type safety here? > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314299#comment-14314299 ] Adrien Grand commented on LUCENE-6229: -- OK, I think I understand. The new MinShouldMatchSumScorer advances sub scorers lazily until freq() or score() is called. I believe you are wrapping this min-should-match query into another query that does not propagate calls to freq() (like another boolean query?). Note that you can work-around the issue by calling freq() on sub-scorers as well, but I wouldn't consider it a bug. This is exactly the kind of things that worries me about this API: there are assumptions that are made about this API that nothing else relies upon. And I think the min-should-match query is a good example: we know that there is a match as soon as we found minShouldMatch matching clauses. Forcefully trying to advance ALL clauses to the matching doc would be wasteful, which is why this scorer only does it when it is required, ie. when freq() or score() are called. I think one acceptable middle ground would be to: - hide getChildren from the scorer that is passed to LeafCollector to remove the trap of calling getChildren on a scorer that does not support it (LUCENE-6228) - add a new IndexSearcher specializating that only uses scorers (not bulk scorers). Collectors called through this indexSearcher could cast the scorer that is passed in order to have access to children. Stefan, Terry: what do you think? > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314234#comment-14314234 ] Adrien Grand commented on LUCENE-6229: -- bq. bq. It'd be great to simplify this workflow as I've been calling Scorer.freq() to position all the child scorers (from a BooleanQuery) and as of the 5.1 nightly builds am needing to call Scorer.score() instead for positioning due to changes in MinShouldMatchSumScorer. This is weird, both freq() and score() should position all sub scorers the same way with MinShouldMatchSumScorer. What do your query and collector look like? > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314219#comment-14314219 ] Terry Smith commented on LUCENE-6229: - Like Stefan, I'm also using this functionality to access child scorers on a per document basis. Currently for some custom query enhancements and a custom drill sideways implementation. Like Adrien, I've also had to wrap queries in a custom NonBulkScoringQuery to force doc-at-a-time scoring. It'd be great to simplify this workflow as I've been calling Scorer.freq() to position all the child scorers (from a BooleanQuery) and as of the 5.1 nightly builds am needing to call Scorer.score() instead for positioning due to changes in MinShouldMatchSumScorer. I'd love to have a way to not only get the child scorers but be confident that they were all correctly positioned. > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313800#comment-14313800 ] Adrien Grand commented on LUCENE-6229: -- bq. Could this functionality be provided in a different way that doesn't have the problems you want to address here? E.g. could users hint the search to require this functionality (needClauses?), which in turn leads to not using optimized implementations that cannot (easily) provide this information? I'm not sure I would like an additional parameter. In the join module, we worked around the fact that boolean queries do not always expose getChildren() by creating a dedicated IndexSearcher impl that always uses scorers instead of bulk scorers (ToParentBlockJoinIndexSearcher in lucene/join). Maybe this thing could be moved to a more general-purpose like lucene/misc and we could document that this is the way to go if you want to have access to information about child scorers from a collector, at the cost of some potential slow down? > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313722#comment-14313722 ] Stefan Pohl commented on LUCENE-6229: - Hi Adrian, Mike, thanks for your recent efforts in cleaning up with many outstanding refactorings. I'm using this functionality in analytics/debugging contexts where it's not necessary to have best possible performance (e.g. using BooleanScorer). LUCENE-2590 doesn't seems to be a feature that you would assume Lucene to use internally, and I doubt many actual users of this functionality track JIRA and would speak up here. Could this functionality be provided in a different way that doesn't have the problems you want to address here? E.g. could users hint the search to require this functionality (needClauses?), which in turn leads to not using optimized implementations that cannot (easily) provide this information? > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312899#comment-14312899 ] Michael McCandless commented on LUCENE-6229: LUCENE-2590 was the original motivation for this API, so that per-hit an app could see which sub-clauses matched. But it has not really been used much as far as I can tell, and it has just caused hassle over time, so +1 to remove. > Remove Scorer.getChildren? > -- > > Key: LUCENE-6229 > URL: https://issues.apache.org/jira/browse/LUCENE-6229 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > This API is used in a single place in our code base: > ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that > using this API from a collector only works if setScorer is called with an > actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in > disjunctions) so it needs a custom IndexSearcher that does not use the > BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org