Re: ScorerSupplier and cost() avoidance
Got it. I thought you were talking about the class method, not the static function. Le jeu. 7 juin 2018 à 22:34, David Smiley a écrit : > I'll have to look further with the team on what these queries look like in > terms of relative cheapness and throughput. > > RE MinShouldMatchSumScorer.cost() (static method called by > Boolean2ScorerSupplier.computeCost) -- it's pretty easy to see that this > is called with minShouldMatch==0 (or 1). Set a conditional breakpoint in > your IDE and run TestBooleanMinShouldMatch.testRandomQueries > > On Thu, Jun 7, 2018 at 3:43 PM Adrien Grand wrote: > >> I suspect this could only show up as a bottleneck if they run very cheap >> queries (low cost) at a very high throughput? Is it the case? I've seen a >> couple workloads like that in the past and profilers suggested that things >> that usually do not matter were bottleneck like creating scorers or >> deciding whether a query should be cached. But trying to fix it didn't >> really help as there are lots of things that we need to do to decide how to >> run a query that run in O(num_segments * num_clauses) >> >> I'm confused why MinShouldMatchSumScorer would be used when >> minShouldMatch is 0 or 1. DisjunctionSumScorer should be used instead for >> such values of minShouldMatch? >> >> Le jeu. 7 juin 2018 à 19:38, Michael McCandless < >> luc...@mikemccandless.com> a écrit : >> >>> Doesn't BQ rewrite itself if it has only one clause? >>> >>> Or maybe if there were more than one clause, and then all but one of >>> them had null scorers (on SHOULD clauses) you could wind up in that state? >>> >>> Mike McCandless >>> >>> http://blog.mikemccandless.com >>> >>> On Thu, Jun 7, 2018 at 1:21 PM, David Smiley >>> wrote: >>> I'm working with some folks who did some profiling and noticed that ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says only to call it if necessary. Unfortunately, a BooleanQuery is going to call cost() (via BooleanWeight.scorer() even if ultimately no Query in the tree cares what the cost is. I'm not sure if that's a perf bug or not; it's hard to tell. The expensive part of cost() for Boolean2ScorerSupplier is over in MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even if trivially numScorers == 1. That's a weird case... why do we even need a Boolean2ScorerSupplier around one clause; couldn't that clause be returned from the outer weight, BooleanWeight.scorerSupplier() close to the end as an optimization? I could file an issue. ~ David -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com >>> >>> -- > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker > LinkedIn: http://linkedin.com/in/davidwsmiley | Book: > http://www.solrenterprisesearchserver.com >
Re: ScorerSupplier and cost() avoidance
I'll have to look further with the team on what these queries look like in terms of relative cheapness and throughput. RE MinShouldMatchSumScorer.cost() (static method called by Boolean2ScorerSupplier.computeCost) -- it's pretty easy to see that this is called with minShouldMatch==0 (or 1). Set a conditional breakpoint in your IDE and run TestBooleanMinShouldMatch.testRandomQueries On Thu, Jun 7, 2018 at 3:43 PM Adrien Grand wrote: > I suspect this could only show up as a bottleneck if they run very cheap > queries (low cost) at a very high throughput? Is it the case? I've seen a > couple workloads like that in the past and profilers suggested that things > that usually do not matter were bottleneck like creating scorers or > deciding whether a query should be cached. But trying to fix it didn't > really help as there are lots of things that we need to do to decide how to > run a query that run in O(num_segments * num_clauses) > > I'm confused why MinShouldMatchSumScorer would be used when minShouldMatch > is 0 or 1. DisjunctionSumScorer should be used instead for such values of > minShouldMatch? > > Le jeu. 7 juin 2018 à 19:38, Michael McCandless > a écrit : > >> Doesn't BQ rewrite itself if it has only one clause? >> >> Or maybe if there were more than one clause, and then all but one of them >> had null scorers (on SHOULD clauses) you could wind up in that state? >> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> On Thu, Jun 7, 2018 at 1:21 PM, David Smiley >> wrote: >> >>> I'm working with some folks who did some profiling and noticed that >>> ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says >>> only to call it if necessary. Unfortunately, a BooleanQuery is going to >>> call cost() (via BooleanWeight.scorer() even if ultimately no Query in the >>> tree cares what the cost is. I'm not sure if that's a perf bug or not; >>> it's hard to tell. >>> >>> The expensive part of cost() for Boolean2ScorerSupplier is over in >>> MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even >>> if trivially numScorers == 1. That's a weird case... why do we even need a >>> Boolean2ScorerSupplier around one clause; couldn't that clause be returned >>> from the outer weight, BooleanWeight.scorerSupplier() close to the end as >>> an optimization? I could file an issue. >>> >>> ~ David >>> -- >>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker >>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: >>> http://www.solrenterprisesearchserver.com >>> >> >> -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
Re: ScorerSupplier and cost() avoidance
I suspect this could only show up as a bottleneck if they run very cheap queries (low cost) at a very high throughput? Is it the case? I've seen a couple workloads like that in the past and profilers suggested that things that usually do not matter were bottleneck like creating scorers or deciding whether a query should be cached. But trying to fix it didn't really help as there are lots of things that we need to do to decide how to run a query that run in O(num_segments * num_clauses) I'm confused why MinShouldMatchSumScorer would be used when minShouldMatch is 0 or 1. DisjunctionSumScorer should be used instead for such values of minShouldMatch? Le jeu. 7 juin 2018 à 19:38, Michael McCandless a écrit : > Doesn't BQ rewrite itself if it has only one clause? > > Or maybe if there were more than one clause, and then all but one of them > had null scorers (on SHOULD clauses) you could wind up in that state? > > Mike McCandless > > http://blog.mikemccandless.com > > On Thu, Jun 7, 2018 at 1:21 PM, David Smiley > wrote: > >> I'm working with some folks who did some profiling and noticed that >> ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says >> only to call it if necessary. Unfortunately, a BooleanQuery is going to >> call cost() (via BooleanWeight.scorer() even if ultimately no Query in the >> tree cares what the cost is. I'm not sure if that's a perf bug or not; >> it's hard to tell. >> >> The expensive part of cost() for Boolean2ScorerSupplier is over in >> MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even >> if trivially numScorers == 1. That's a weird case... why do we even need a >> Boolean2ScorerSupplier around one clause; couldn't that clause be returned >> from the outer weight, BooleanWeight.scorerSupplier() close to the end as >> an optimization? I could file an issue. >> >> ~ David >> -- >> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker >> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: >> http://www.solrenterprisesearchserver.com >> > >
Re: ScorerSupplier and cost() avoidance
I think that's right Michael -- some null scorers. Started with 3 in one case and wound up with 1. Also, in MinShouldMatchSumScorer.cost, sometimes the size of the PriorityQueue isn't needed at all, which is when minShouldMatch is 0 or 1. In that case, simply sum the costs. On Thu, Jun 7, 2018 at 1:38 PM Michael McCandless wrote: > Doesn't BQ rewrite itself if it has only one clause? > > Or maybe if there were more than one clause, and then all but one of them > had null scorers (on SHOULD clauses) you could wind up in that state? > > Mike McCandless > > http://blog.mikemccandless.com > > On Thu, Jun 7, 2018 at 1:21 PM, David Smiley > wrote: > >> I'm working with some folks who did some profiling and noticed that >> ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says >> only to call it if necessary. Unfortunately, a BooleanQuery is going to >> call cost() (via BooleanWeight.scorer() even if ultimately no Query in the >> tree cares what the cost is. I'm not sure if that's a perf bug or not; >> it's hard to tell. >> >> The expensive part of cost() for Boolean2ScorerSupplier is over in >> MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even >> if trivially numScorers == 1. That's a weird case... why do we even need a >> Boolean2ScorerSupplier around one clause; couldn't that clause be returned >> from the outer weight, BooleanWeight.scorerSupplier() close to the end as >> an optimization? I could file an issue. >> >> ~ David >> -- >> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker >> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: >> http://www.solrenterprisesearchserver.com >> > > -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
Re: ScorerSupplier and cost() avoidance
Doesn't BQ rewrite itself if it has only one clause? Or maybe if there were more than one clause, and then all but one of them had null scorers (on SHOULD clauses) you could wind up in that state? Mike McCandless http://blog.mikemccandless.com On Thu, Jun 7, 2018 at 1:21 PM, David Smiley wrote: > I'm working with some folks who did some profiling and noticed that > ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says > only to call it if necessary. Unfortunately, a BooleanQuery is going to > call cost() (via BooleanWeight.scorer() even if ultimately no Query in the > tree cares what the cost is. I'm not sure if that's a perf bug or not; > it's hard to tell. > > The expensive part of cost() for Boolean2ScorerSupplier is over in > MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even > if trivially numScorers == 1. That's a weird case... why do we even need a > Boolean2ScorerSupplier around one clause; couldn't that clause be returned > from the outer weight, BooleanWeight.scorerSupplier() close to the end as > an optimization? I could file an issue. > > ~ David > -- > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker > LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www. > solrenterprisesearchserver.com >
ScorerSupplier and cost() avoidance
I'm working with some folks who did some profiling and noticed that ScorerSupplier.cost() can be expensive (as the javadocs say). cost() says only to call it if necessary. Unfortunately, a BooleanQuery is going to call cost() (via BooleanWeight.scorer() even if ultimately no Query in the tree cares what the cost is. I'm not sure if that's a perf bug or not; it's hard to tell. The expensive part of cost() for Boolean2ScorerSupplier is over in MinShouldMatchSumScorer.cost which creates a PriorityQueue every time, even if trivially numScorers == 1. That's a weird case... why do we even need a Boolean2ScorerSupplier around one clause; couldn't that clause be returned from the outer weight, BooleanWeight.scorerSupplier() close to the end as an optimization? I could file an issue. ~ David -- Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com