[ 
https://issues.apache.org/jira/browse/LUCENE-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985318#action_12985318
 ] 

Doron Cohen commented on LUCENE-2879:
-------------------------------------

+1 for fixing this inconsistent behavior.
BTW, SpanWeight also calls idfExplain() for the same reason.
The patch looks good; the new test case passes with the fix and fails without it.

One small thing bothered me: an explanation is created even though the user
did not call explain(), and explain() is generally considered slower. But it
is called only once per query, so it should not be a performance issue, and
that is already the case for PhraseQuery and SpanWeight. So in any case MFQ
should first be made consistent with them, which is what this patch does.

It is interesting that the implementation of similar logic in SpanWeight is
more compact:
{code:title=SpanWeight: calls extractTerms()}
terms=new HashSet<Term>();
query.extractTerms(terms);
idfExp = similarity.idfExplain(terms, searcher);
{code}

But doing the same in MFQ would change its logic: extractTerms() collects the
terms into a Set, so each term would be counted only once.
I'm not saying the patch should change; I'm just pointing out the difference
in the sum-of-squared-weights computation between SpanWeight and MFQ.
BooleanQuery, for example, iterates over its sub-queries and sums their
weights, so if the same term happens to appear in two descendant queries, that
term contributes twice to the sum. In this sense MFQ and BQ behave similarly,
and both differ from SpanQuery... well, I guess this falls into the "black
magic" area :)
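
To make the difference concrete, a small self-contained sketch (plain Java,
not Lucene code). Terms are plain Strings, document frequencies come from a
hypothetical Map instead of an IndexSearcher, and idf() uses the same formula
as the 3.x DefaultSimilarity default. All helper names are made up for
illustration, and BooleanQuery's own boost and coord are ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not Lucene source: terms are plain Strings and
// docFreq values come from a Map instead of an IndexSearcher.
public class IdfSumDemo {

    // Same formula as DefaultSimilarity.idf(docFreq, numDocs) in Lucene 3.x.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // MFQ style: sum idf over every term occurrence, duplicates included.
    static float sumPerOccurrence(List<String> terms, Map<String, Integer> df, int numDocs) {
        float sum = 0f;
        for (String t : terms) {
            sum += idf(df.get(t), numDocs);
        }
        return sum;
    }

    // SpanWeight style: extractTerms() fills a Set, so each distinct term counts once.
    static float sumDistinct(List<String> terms, Map<String, Integer> df, int numDocs) {
        Set<String> distinct = new LinkedHashSet<>(terms);
        float sum = 0f;
        for (String t : distinct) {
            sum += idf(df.get(t), numDocs);
        }
        return sum;
    }

    // One weight's contribution to sumOfSquaredWeights: (idf * boost)^2.
    static float squaredWeight(float idf, float boost) {
        float w = idf * boost;
        return w * w;
    }

    // BooleanQuery style (simplified): each single-term clause adds its own
    // squared weight, so a term repeated in two clauses contributes twice.
    static float booleanSumOfSquares(List<String> clauses, Map<String, Integer> df, int numDocs) {
        float sum = 0f;
        for (String t : clauses) {
            sum += squaredWeight(idf(df.get(t), numDocs), 1f);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = new HashMap<>();
        df.put("apache", 9);
        df.put("lucene", 4);
        int numDocs = 100;
        // "lucene" occurs twice in the query.
        List<String> terms = Arrays.asList("apache", "lucene", "lucene");
        System.out.println("MFQ-style idf:           " + sumPerOccurrence(terms, df, numDocs));
        System.out.println("SpanWeight-style idf:    " + sumDistinct(terms, df, numDocs));
        System.out.println("BQ-style sum of squares: " + booleanSumOfSquares(terms, df, numDocs));
    }
}
```

The only difference between the first two sums is the duplicated "lucene",
which the Set-based path drops and the per-occurrence path (like BQ's
per-clause sum) counts again.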

> MultiPhraseQuery sums its own idf instead of Similarity.
> --------------------------------------------------------
>
>                 Key: LUCENE-2879
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2879
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 2.9.5, 3.0.4, 3.1, 4.0
>
>         Attachments: LUCENE-2879.patch
>
>
> MultiPhraseQuery is a generalized version of PhraseQuery, and computes IDF 
> the same way by default (by summing across the terms).
> The problem is that it doesn't let the Similarity do this: PhraseQuery calls 
> Similarity.idfExplain(Collection<Term> terms, IndexSearcher searcher),
> but MultiPhraseQuery does the summing itself, calling Similarity.idf(int, int)
> for each term.
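
A minimal sketch of why the delegation matters, assuming a custom Similarity
that combines term IDFs differently (max instead of sum). Sim, idfOfTerms(),
MaxIdfSim and the Map of document frequencies are hypothetical stand-ins for
Similarity, idfExplain(Collection<Term>, IndexSearcher) and the searcher; this
is not the Lucene API or the patch itself.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Lucene API: Sim and idfOfTerms() stand in for
// Similarity and idfExplain(Collection<Term>, IndexSearcher).
public class IdfDelegationDemo {

    static class Sim {
        // Per-term idf, same formula as the 3.x DefaultSimilarity default.
        float idf(int docFreq, int numDocs) {
            return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
        }

        // Overridable hook for a whole collection of terms; sums by default.
        float idfOfTerms(Collection<String> terms, Map<String, Integer> df, int numDocs) {
            float sum = 0f;
            for (String t : terms) {
                sum += idf(df.get(t), numDocs);
            }
            return sum;
        }
    }

    // A custom similarity that combines terms differently (max instead of sum).
    static class MaxIdfSim extends Sim {
        @Override
        float idfOfTerms(Collection<String> terms, Map<String, Integer> df, int numDocs) {
            float max = 0f;
            for (String t : terms) {
                max = Math.max(max, idf(df.get(t), numDocs));
            }
            return max;
        }
    }

    // Pre-patch MFQ behavior: sums idf(int, int) itself, bypassing the hook.
    static float summedByQuery(Sim sim, List<String> terms, Map<String, Integer> df, int numDocs) {
        float sum = 0f;
        for (String t : terms) {
            sum += sim.idf(df.get(t), numDocs);
        }
        return sum;
    }

    // PhraseQuery behavior: the Similarity decides how to combine the terms.
    static float delegatedToSim(Sim sim, List<String> terms, Map<String, Integer> df, int numDocs) {
        return sim.idfOfTerms(terms, df, numDocs);
    }

    public static void main(String[] args) {
        Map<String, Integer> df = new HashMap<>();
        df.put("apache", 9);
        df.put("lucene", 4);
        List<String> terms = Arrays.asList("apache", "lucene");
        Sim custom = new MaxIdfSim();
        // Only the delegating path honors MaxIdfSim's override.
        System.out.println("summed by query: " + summedByQuery(custom, terms, df, 100));
        System.out.println("delegated:       " + delegatedToSim(custom, terms, df, 100));
    }
}
```

With the default Sim both paths return the same value; only the delegating
path picks up MaxIdfSim's override, which is the inconsistency described above.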

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

