[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

Michael McCandless (JIRA) Wed, 04 May 2011 03:05:50 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael McCandless updated LUCENE-3068:
---------------------------------------

    Attachment: LUCENE-3068.patch

Patch with a test case showing the problem.

If you set slop to 0 for the PhraseQuery, the test passes.  The 
MultiPhraseQuery passes with or without slop because it handles the 
same-position case itself (Union*Enum).

That got me thinking... maybe any time a *PhraseQuery has overlapping 
positions, we should rewrite it to a MultiPhraseQuery and let that handle 
the same positions...?  Is there any downside to that?
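
To make the rewrite idea concrete, here is a rough sketch in plain Java. It is 
not Lucene code and not what the patch does: PositionGrouping and both of its 
methods are hypothetical names, and terms are plain strings rather than Term 
objects. It only shows the grouping step, assuming each group would then be 
passed to something like MultiPhraseQuery.add(Term[], int).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: illustrates grouping phrase
// terms by position before a possible rewrite to a multi-term-per-position
// query.
final class PositionGrouping {

    // Returns true if at least two terms share the same position.
    static boolean hasOverlappingPositions(int[] positions) {
        Set<Integer> seen = new HashSet<>();
        for (int p : positions) {
            if (!seen.add(p)) {
                return true;
            }
        }
        return false;
    }

    // Groups terms by position, keeping positions in ascending order and
    // terms within a position in their original order.
    static TreeMap<Integer, List<String>> groupByPosition(String[] terms, int[] positions) {
        if (terms.length != positions.length) {
            throw new IllegalArgumentException("terms and positions must have the same length");
        }
        TreeMap<Integer, List<String>> groups = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            groups.computeIfAbsent(positions[i], k -> new ArrayList<>()).add(terms[i]);
        }
        return groups;
    }
}
```

A rewrite along these lines would only kick in when hasOverlappingPositions 
returns true; otherwise the query would stay as it is.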

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> ------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3068
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3068
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 3.0.3, 3.1, 4.0
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.

