[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

Doron Cohen (JIRA) Wed, 04 May 2011 12:01:04 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028895#comment-13028895
 ]


Doron Cohen commented on LUCENE-3068:
-------------------------------------

bq. specifically when the doc itself has tokens at the same position.

I am not convinced yet that there is a bug here - I think the code does allow 
this? 

There is another assumption in the code, that any two different PPs are in 
different TPs - which underlines the assumption that originally each PP differs 
in position, This seems a valid assumption, because QP will create MFQ if there 
are two terms in the (phrase) query with same position. 

bq. maybe any time a *PhraseQuery has overlapping positions, we should rewrite 
to a MultiPhraseQuery and let it handle the same positions...? Is there any 
downside to that?

I think this is the correct behavior - in particular this will be the query 
that a QP will create. The only way to create a PQ (not MPQ) for PPs in same 
positions is to create it manually. But why would anyone do that? And they did, 
wouldn't such a rewrite be a surprise to them?

A patch to follow with a revised version of this test - one that uses the QP. 
In this patch the QP indeed creates an MFQ, and I am yet unable to make it 
fail. Still trying.

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> ------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3068
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3068
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 3.0.3, 3.1, 4.0
>            Reporter: Michael McCandless
>            Assignee: Doron Cohen
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

Reply via email to