Hello Developers,

I appreciate your feedback. I raised a ticket for the last point, which is
the more common and more attainable gain:
https://issues.apache.org/jira/browse/LUCENE-4571
Stefan kindly agreed to add more details to it.

Happy Black Friday!


On Tue, Nov 13, 2012 at 11:59 PM, Mikhail Khludnev <
[email protected]> wrote:

> Developers,
>
> I want to discuss a few points regarding the disjunction form of BooleanQuery
> with a minShouldMatch constraint. I'm talking about doc-at-a-time evaluation
> only (BooleanScorer2).
> Consider a conjunction query that has a disjunction in one of its clauses,
> e.g. +foo +(bar baz …). If the disjunction "(bar baz … )" has a high
> minShouldMatch constraint, this query performs quite badly even if the
> conjunction clause +foo is highly selective. The same happens if you have a
> filter instead of +foo. Once again: it's reasonable that a disjunction, even
> with a high minShouldMatch, is expensive if the "core" disjunction with
> minShouldMatch=1 matches a few million docs. The problem is that I can't
> speed it up by supplying a highly selective filter.
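
To make the cost concrete, here is a minimal Java sketch of the situation
described above. It uses plain sorted arrays instead of real Lucene scorers;
DocIterator, ArrayIterator, MinMatchIterator and the candidatesExamined
counter are hypothetical names for illustration, not Lucene classes. The
only thing borrowed from Lucene is the contract that advance(target) must
return the next matching doc at or after target.

```java
import java.util.ArrayList;
import java.util.List;

class LeapfrogSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a doc-id iterator; not the Lucene class. */
    interface DocIterator {
        /** Returns the first matching doc >= target, or NO_MORE_DOCS. */
        int advance(int target);
    }

    /** One clause: an iterator over a sorted array of doc ids. */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int pos = 0;
        ArrayIterator(int... docs) { this.docs = docs; }
        public int advance(int target) {
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Disjunction that matches a doc only if >= minShouldMatch clauses contain it. */
    static final class MinMatchIterator implements DocIterator {
        private final ArrayIterator[] subs;
        private final int minShouldMatch;
        int candidatesExamined = 0; // instrumentation for this sketch only

        MinMatchIterator(int minShouldMatch, int[]... clauses) {
            this.minShouldMatch = minShouldMatch;
            this.subs = new ArrayIterator[clauses.length];
            for (int i = 0; i < clauses.length; i++) subs[i] = new ArrayIterator(clauses[i]);
        }

        public int advance(int target) {
            int candidate = target;
            while (true) {
                int min = NO_MORE_DOCS;
                int matches = 0;
                for (ArrayIterator sub : subs) {
                    int d = sub.advance(candidate);
                    if (d < min) { min = d; matches = 1; }
                    else if (d == min) { matches++; }
                }
                if (min == NO_MORE_DOCS) return NO_MORE_DOCS;
                candidatesExamined++;
                if (matches >= minShouldMatch) return min;
                // The contract forces us to keep scanning until a real match is found.
                candidate = min + 1;
            }
        }
    }

    /** Classic leapfrog intersection of two iterators. */
    static List<Integer> intersect(DocIterator a, DocIterator b) {
        List<Integer> hits = new ArrayList<>();
        int doc = a.advance(0);
        while (doc != NO_MORE_DOCS) {
            int other = b.advance(doc);
            if (other == doc) {
                hits.add(doc);
                doc = a.advance(doc + 1);
            } else {
                doc = a.advance(other);
            }
        }
        return hits;
    }

    /** Helper: all ints from start to endInclusive. */
    static int[] range(int start, int endInclusive) {
        int[] r = new int[endInclusive - start + 1];
        for (int i = 0; i < r.length; i++) r[i] = start + i;
        return r;
    }

    public static void main(String[] args) {
        // +foo is highly selective (two docs); the disjunction is dense but rarely
        // satisfies minShouldMatch=2.
        MinMatchIterator disjunction = new MinMatchIterator(2, range(0, 100), new int[] {100});
        List<Integer> hits = intersect(new ArrayIterator(10, 100), disjunction);
        System.out.println("hits=" + hits + ", candidates examined=" + disjunction.candidatesExamined);
    }
}
```

Although the selective clause only proposes two docs, the disjunction walks
through every candidate between them, because it is not allowed to answer
"doc 10 does not match, try something else". That is the scan an
advanceExact()-style call would avoid.
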
> From my POV there are two points in the Lucene API which make leapfrog impossible:
> - advance() is obliged to return the next matching doc, which causes a scan in
> a nextDoc() loop. It would be great to have something like advanceExact(), or
> to return some magic value from advance() that says "failed to advance,
> propose the next doc for leapfrog";
> - a Scorer is obliged to jump to the first matching doc after it's created,
> which leads to scanning many docs in a nextDoc() loop;
> - ConjunctionScorer can't know which of its legs are not able to leapfrog
> and would prefer to decline advance().
> - Stefan spotted one more gain for minShouldMatch in DisjunctionSumScorer:
> we don't need to walk through the top of the heap after the top scorer is
> advanced. Instead, let's pop minShouldMatch scorers from the heap and look at
> the top after that: the top doc can be evaluated as a potential match which
> satisfies the minShouldMatch constraint. After that, let's push those scorers
> back onto the heap, advancing them to that candidate docnum. It isn't related
> to the leapfrog problem, though.
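
The heap idea in the last bullet can be sketched roughly as follows, again
over plain sorted arrays rather than Lucene scorers. Cursor and matches()
are hypothetical names. One assumption of this sketch: it pops
minShouldMatch - 1 cursors rather than minShouldMatch, on the reasoning that
the minShouldMatch-th smallest current doc is the earliest doc that can
still collect enough matches. The patch attached to LUCENE-4571 may well do
it differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class MinShouldMatchHeapSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical cursor over one clause's sorted doc ids. */
    static final class Cursor {
        private final int[] docs;
        private int pos = 0;
        Cursor(int[] docs) { this.docs = docs; }
        int doc() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
        /** Moves to the first doc >= target; a no-op if already there. */
        void advance(int target) {
            while (pos < docs.length && docs[pos] < target) pos++;
        }
    }

    /** Returns the docs contained in at least minShouldMatch of the clauses. */
    static List<Integer> matches(int minShouldMatch, int[]... clauses) {
        List<Integer> hits = new ArrayList<>();
        if (minShouldMatch < 1 || clauses.length < minShouldMatch) return hits;

        // Min-heap ordered by each cursor's current doc.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(Comparator.comparingInt(Cursor::doc));
        for (int[] clause : clauses) heap.add(new Cursor(clause));

        while (true) {
            // Pop the (minShouldMatch - 1) cursors that lag furthest behind.
            List<Cursor> popped = new ArrayList<>();
            for (int i = 0; i < minShouldMatch - 1; i++) popped.add(heap.poll());

            // The new top is the earliest doc that can still reach minShouldMatch.
            int candidate = heap.peek().doc();
            if (candidate == NO_MORE_DOCS) return hits;

            // Skip the lagging cursors straight to the candidate.
            for (Cursor c : popped) c.advance(candidate);

            // Collect the cursors that were already sitting on the candidate.
            while (!heap.isEmpty() && heap.peek().doc() == candidate) popped.add(heap.poll());

            int count = 0;
            for (Cursor c : popped) if (c.doc() == candidate) count++;
            if (count >= minShouldMatch) hits.add(candidate);

            // Move past the candidate and push everything back onto the heap.
            for (Cursor c : popped) {
                if (c.doc() == candidate) c.advance(candidate + 1);
                heap.add(c);
            }
        }
    }

    public static void main(String[] args) {
        int[] bar = {1, 3, 5, 7, 9};
        int[] baz = {2, 3, 9, 10};
        int[] qux = {3, 4, 10};
        System.out.println(matches(2, bar, baz, qux));
    }
}
```

Cursors are always removed from the heap before they are advanced, since
changing a cursor's position while it is still inside a PriorityQueue would
break the heap ordering.
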
>
> This last one looks impressive, but I'm not smart enough to be sure that
> it gives a performance gain. Do you think it's a valuable optimization for
> Lucene users? How popular is minShouldMatch, btw? To be honest, I don't
> really suffer from minShouldMatch itself; I have a query with my own match
> verification logic, and therefore the lack of leapfrog bothers me a lot.
>
> Looking forward to your feedback!
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <[email protected]>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <[email protected]>
