[
https://issues.apache.org/jira/browse/LUCENE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503281#comment-13503281
]
Mikhail Khludnev commented on LUCENE-4571:
------------------------------------------
It was a bad idea to reply to JIRA's mail, so I'm moving the dialogue here:
[~mkhludnev]
{quote}
Robert, am I right that establishing a perf test is the necessary first step,
rather than the implementation itself?
Also (not really important, but let me mention it), what I'm really looking for
is a disjunction query with a user-supplied verification strategy, where
minShouldMatch is just one way to verify a match.
{quote}
[~rcmuir]
{quote}
Right, the best way to do this is to extend luceneutil
(http://code.google.com/a/apache-extras.org/p/luceneutil) to test this case.
Keep in mind that I'd also be interested to see how BooleanScorer compares to
BooleanScorer2 for this situation. I already mentioned on the Solr list (nobody
replied) that Solr *never* gets BooleanScorer, but from time to time I hear
Solr users complaining about BooleanScorer2's performance for min-should-match.
So when trying to improve the performance of min-should-match, I think a very
early step should be to see if we already have a better-performing alternative
that is just not being used: if that's the case, then the best solution is to fix
Solr's collectors to be able to cope with BooleanScorer.
Intuitively I think it's going to be like everything else: BS1 is better in some
situations, BS2 in others.
>>> Also (not really important, but let me mention it), what I'm really looking
>>> for is a disjunction query with a user-supplied verification strategy,
>>> where minShouldMatch is just one way to verify a match.
I don't think our concrete scorers should have such a hook: they should be as
dead simple as possible.
If you want to do this, I recommend just extending the abstract
DisjunctionScorer (currently DisjunctionSum and DisjunctionMax extend it). As
I suggested, we should think about splitting out a MinShouldMatchScorer as well:
it's confusing that pure disjunctions are all mixed up with min-should-match, and
the algorithms should actually work differently.
{quote}
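To make the idea of a user-supplied verification strategy concrete, here is a toy model in plain Java. It is a hypothetical sketch, not Lucene API: `MatchVerifier`, `MinShouldMatchVerifier` and `collect` are illustrative names, and each clause is assumed to be a sorted `int[]` of doc IDs rather than a real `Scorer`. minShouldMatch becomes just one implementation of the strategy, and the hook lives outside any scorer, in line with the preference that concrete scorers stay simple. Like the current `DisjunctionSumScorer` loop, it still visits every doc in the union.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model: none of these names exist in Lucene.
public class VerifierSketch {

    /** Decides whether a doc matches, given which clauses contain it. */
    interface MatchVerifier {
        boolean matches(boolean[] clauseMatched, int matchCount);
    }

    /** minShouldMatch as just one possible verification strategy. */
    static final class MinShouldMatchVerifier implements MatchVerifier {
        private final int minShouldMatch;

        MinShouldMatchVerifier(int minShouldMatch) {
            this.minShouldMatch = minShouldMatch;
        }

        @Override
        public boolean matches(boolean[] clauseMatched, int matchCount) {
            return matchCount >= minShouldMatch;
        }
    }

    /**
     * Naive disjunction over sorted doc ID lists: visits every doc in the union
     * and asks the verifier about each one. Doc IDs are assumed to be below
     * Integer.MAX_VALUE, which Lucene reserves as NO_MORE_DOCS.
     */
    static List<Integer> collect(int[][] postings, MatchVerifier verifier) {
        int[] pos = new int[postings.length]; // cursor into each clause
        List<Integer> hits = new ArrayList<>();
        while (true) {
            // Smallest doc any clause is currently positioned on.
            int doc = Integer.MAX_VALUE;
            for (int i = 0; i < postings.length; i++) {
                if (pos[i] < postings[i].length) {
                    doc = Math.min(doc, postings[i][pos[i]]);
                }
            }
            if (doc == Integer.MAX_VALUE) {
                return hits; // every clause is exhausted
            }
            // Record which clauses contain doc and step them past it.
            boolean[] matched = new boolean[postings.length];
            int count = 0;
            for (int i = 0; i < postings.length; i++) {
                if (pos[i] < postings[i].length && postings[i][pos[i]] == doc) {
                    matched[i] = true;
                    count++;
                    pos[i]++;
                }
            }
            if (verifier.matches(matched, count)) {
                hits.add(doc);
            }
        }
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 3, 5, 7}, {3, 4, 7}, {2, 4, 7, 9} };
        System.out.println(collect(postings, new MinShouldMatchVerifier(2))); // prints [3, 4, 7]
        // Custom strategy: clause 0 is required, plus at least one other clause.
        MatchVerifier clauseZeroPlusOne = (matched, count) -> matched[0] && count >= 2;
        System.out.println(collect(postings, clauseZeroPlusOne)); // prints [3, 7]
    }
}
```

With the strategy pluggable, the second call requires clause 0 plus at least one other clause, a condition that a plain minShouldMatch count cannot express.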
> speedup disjunction with minShouldMatch
> ----------------------------------------
>
> Key: LUCENE-4571
> URL: https://issues.apache.org/jira/browse/LUCENE-4571
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 4.1
> Reporter: Mikhail Khludnev
>
> Even when minShouldMatch is supplied to DisjunctionSumScorer, it enumerates the
> whole disjunction and verifies the minShouldMatch condition [on every
> doc|https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/DisjunctionSumScorer.java#L70]:
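> {code}
> public int nextDoc() throws IOException {
>   assert doc != NO_MORE_DOCS;
>   while (true) {
>     // Step every sub-scorer positioned on the current doc, keeping the heap ordered.
>     while (subScorers[0].docID() == doc) {
>       if (subScorers[0].nextDoc() != NO_MORE_DOCS) {
>         heapAdjust(0);
>       } else {
>         heapRemoveRoot();
>         // Too few sub-scorers left to ever satisfy minShouldMatch.
>         if (numScorers < minimumNrMatchers) {
>           return doc = NO_MORE_DOCS;
>         }
>       }
>     }
>     // Move to the new top doc and count how many sub-scorers match it (nrMatchers).
>     afterNext();
>     // The minShouldMatch check happens only here, once per candidate doc.
>     if (nrMatchers >= minimumNrMatchers) {
>       break;
>     }
>   }
>
>   return doc;
> }
> {code}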
> [~spo] proposes (as far as I understand it) to pop nrMatchers-1 scorers from the
> heap first, and then push them back, advancing them to the new top doc. For me,
> question no. 1 is: is there a performance test for minShouldMatch-constrained
> disjunctions?
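As far as the description goes, the popping proposal can be modeled outside Lucene as follows. This is a minimal sketch under assumptions: clauses are sorted `int[]` doc ID lists, `Cursor.advance` stands in for `DocIdSetIterator.advance` (implemented with `Arrays.binarySearch`), a `java.util.PriorityQueue` stands in for the scorer heap, and `MinShouldMatchSketch`, `Cursor` and `candidatesChecked` are hypothetical names. It also assumes the number of scorers to pop is minShouldMatch-1 (the description says nrMatchers-1). The reasoning: once minShouldMatch-1 cursors are popped, the new heap top sits on the minShouldMatch-th smallest current doc, so at most minShouldMatch-1 clauses can contain any smaller doc, and the popped cursors can safely be advanced straight to it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical toy model of the "pop minShouldMatch-1, then advance" idea; not Lucene code.
public class MinShouldMatchSketch {

    /** Cursor over one clause's sorted doc IDs; stands in for a sub-scorer. */
    static final class Cursor {
        final int[] docs;
        int idx = 0;

        Cursor(int[] docs) {
            this.docs = docs;
        }

        int doc() {
            return docs[idx];
        }

        boolean exhausted() {
            return idx >= docs.length;
        }

        /** Like DocIdSetIterator.advance(target): first doc >= target, found by binary search. */
        void advance(int target) {
            int r = Arrays.binarySearch(docs, idx, docs.length, target);
            idx = r >= 0 ? r : -r - 1;
        }
    }

    /** How many candidate docs had their match count checked in the last search. */
    static int candidatesChecked;

    static List<Integer> search(int[][] postings, int minShouldMatch) {
        if (minShouldMatch < 1) {
            throw new IllegalArgumentException("minShouldMatch must be >= 1");
        }
        PriorityQueue<Cursor> heap = new PriorityQueue<>(Comparator.comparingInt(Cursor::doc));
        for (int[] p : postings) {
            if (p.length > 0) {
                heap.add(new Cursor(p));
            }
        }
        List<Integer> hits = new ArrayList<>();
        candidatesChecked = 0;
        while (heap.size() >= minShouldMatch) {
            // 1. Pop the minShouldMatch-1 cursors sitting on the smallest docs.
            List<Cursor> popped = new ArrayList<>();
            for (int i = 0; i < minShouldMatch - 1; i++) {
                popped.add(heap.poll());
            }
            // 2. The new top is the minShouldMatch-th smallest doc. At most
            //    minShouldMatch-1 clauses can contain a smaller doc, so skip ahead.
            int target = heap.peek().doc();
            for (Cursor c : popped) {
                c.advance(target);
                if (!c.exhausted()) {
                    heap.add(c);
                }
            }
            // 3. Count the clauses on target, then move each of them past it.
            candidatesChecked++;
            List<Cursor> onTarget = new ArrayList<>();
            while (!heap.isEmpty() && heap.peek().doc() == target) {
                onTarget.add(heap.poll());
            }
            if (onTarget.size() >= minShouldMatch) {
                hits.add(target);
            }
            for (Cursor c : onTarget) {
                c.idx++;
                if (!c.exhausted()) {
                    heap.add(c);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 3, 5, 7}, {3, 4, 7}, {2, 4, 7, 9} };
        List<Integer> hits = search(postings, 2);
        // prints [3, 4, 7] after checking 4 candidates
        System.out.println(hits + " after checking " + candidatesChecked + " candidates");
    }
}
```

On this input the loop checks 4 candidates, while the nextDoc() loop quoted above visits all 7 docs in the union (1, 2, 3, 4, 5, 7, 9). Whether the skipped docs outweigh the extra heap operations on real postings is exactly what a luceneutil benchmark would have to show.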