Hello Developers, I appreciate your feedback. I raised the ticket for the last one which is more common and reachable gain. https://issues.apache.org/jira/browse/LUCENE-4571 Stefan kindly agreed to add more details into it.
Happy Black Friday! On Tue, Nov 13, 2012 at 11:59 PM, Mikhail Khludnev < [email protected]> wrote: > Developers, > > I want to discuss few points regarding disjunction form of BooleanQuery > with minShouldMatch constraint. I'm talking about doc-at-time evaluation > only (BooleanScorer2). > Look into conjunction query which has disjunction in one of its' clauses > e.g. +foo +(bar baz …). if disjunction "(bar baz … )" has high > minShouldMatch constraint even if conjunction clause +foo is highly > selective this query performs quite bad. It also happens if instead of +foo > you have a filter. Once again: it's reasonable that disjunction even with > high minShouldMatch is expensive, if "core" disjunction with > minShouldMatch=1 matches few millions of docs. The problem is that I can't > speed it up by supplying highly selective filter. > From my POV there two points in Lucene API which make leapfrog impossible: > - advance() is obliged to return next matching doc that causes scan in > nextDoc() loop. It's great to have something like advanceExact(), or return > some magic value from advance says - "failed to advance, propose next for > leapfrog"; > - Scorer is obliged to jump on the first matching doc after it's created > that leads to scan many docs in nextDoc() loop; > - ConjunctiveScorer can't know which of its legs are not able to leapfrog > and prefer to decline advance() > - Stefan spotted one more gain for minShouldMatch in DisjunctionSumScorer: > We don't need to walk through top of heap after top scorer is advanced. > Instead of this, let's pop minShouldMatch scores from the heap, look at the > top after this - top doc can be evaluated as potential match which exceeds > minShouldMatch constraint. After that, let's push those guys back to the > heap advancing them to that candidate docnum. It's not cohesioned with > leapfrog problem, though. > > This last one looks impressing, but I'm not smart enough to realize that > it gives performance gain. Do you think it's valuable optimization for > Lucene users? How minShouldMatch is popular,btw? To be honest I'm not > really suffer form minShouldMatch itself, I have query with my own match > verification logic and therefore lack of leapfrog bothers me much. > > Looking forward for you feedback! > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> > <[email protected]> > > > -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics <http://www.griddynamics.com> <[email protected]>
