[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

Robert Muir (JIRA) Mon, 15 Feb 2010 07:01:53 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833828#action_12833828
 ]


Robert Muir commented on LUCENE-329:
------------------------------------

bq. The problem with ignoring IDF completely is that it doesn't help balance 
partial matches where there is more than one fuzzy element in the query. For 
example, in a query for John~ Patitucci~ I'm probably more interested in a 
partial match on the rarer surname than a partial match on the common 
forename. Obliterating IDF completely as a factor would lose this feature 
(available in FuzzyLikeThisQuery).

Mark, it wouldn't lose any features. We would simply provide another option, 
just as we do with the MultiTermQuery rewrite methods for other queries, so 
users can choose what they want to use. It's just an additional choice.
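
To make the trade-off concrete, here is a minimal sketch of the two choices for the John~ Patitucci~ example. It does not use Lucene; the IdfMode enum, the termWeight helper, and all document frequencies are hypothetical names and made-up numbers, and the idf formula is assumed to be the classic 1 + ln(numDocs / (docFreq + 1)) form.

```java
public class FuzzyIdfChoiceSketch {

    // Hypothetical switch for illustration only; this is not a Lucene API.
    enum IdfMode { USE_IDF, IGNORE_IDF }

    // Assumed classic IDF form: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Weight contributed by one matched query term under the chosen mode.
    static double termWeight(int docFreq, int numDocs, IdfMode mode) {
        return mode == IdfMode.USE_IDF ? idf(docFreq, numDocs) : 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 100000;        // made-up index size
        int forenameDocFreq = 5000;  // made-up: common forename
        int surnameDocFreq = 3;      // made-up: rare surname

        for (IdfMode mode : IdfMode.values()) {
            // Each document matches only one of the two fuzzy elements.
            double forenameOnly = termWeight(forenameDocFreq, numDocs, mode);
            double surnameOnly = termWeight(surnameDocFreq, numDocs, mode);
            System.out.printf("%s: forename-only=%.2f surname-only=%.2f%n",
                    mode, forenameOnly, surnameOnly);
        }
    }
}
```

With IDF in play the surname-only match outweighs the forename-only match; with IDF ignored the two partial matches tie, which is the balance Mark describes. Offering both as options lets the user pick.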

> Fuzzy query scoring issues
> --------------------------
>
>                 Key: LUCENE-329
>                 URL: https://issues.apache.org/jira/browse/LUCENE-329
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.2rc5
>         Environment: Operating System: All
> Platform: All
>            Reporter: Mark Harwood
>            Assignee: Lucene Developers
>            Priority: Minor
>         Attachments: patch.txt
>
>
> Queries which automatically produce multiple terms (wildcard, range, prefix, 
> fuzzy, etc.) currently suffer from two problems:
> 1) Scores for matching documents are significantly smaller than term queries 
> because of the volume of terms introduced (A match on query Foo~ is 0.1 
> whereas a match on query Foo is 1).
> 2) The rarer forms of expanded terms are favoured over the more common 
> forms because of the IDF. When using Fuzzy queries, for example, rare 
> misspellings typically appear in results before the more common correct 
> spellings.
> I will attach a patch that corrects the issues identified above by 
> 1) Overriding Similarity.coord to counteract the downplaying of scores 
> introduced by expanding terms.
> 2) Taking the IDF factor of the most common form of expanded terms as the 
> basis of scoring all other expanded terms.
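
The two problems and the two proposed corrections can be illustrated with a small standalone sketch. It is not the attached patch and does not call Lucene; it assumes the classic formulas coord = overlap / maxOverlap and idf = 1 + ln(numDocs / (docFreq + 1)), and the terms, document frequencies, and index size are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExpandedTermScoringSketch {

    // Assumed classic coord factor: fraction of query clauses that matched.
    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    // Assumed classic IDF form: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Document frequency of the most common expanded form.
    static int maxDocFreq(Map<String, Integer> docFreqs) {
        int max = 0;
        for (int df : docFreqs.values()) {
            max = Math.max(max, df);
        }
        return max;
    }

    public static void main(String[] args) {
        // Problem 1: a document matching 1 of 10 expanded terms is scaled down.
        System.out.printf("coord for 1 of 10 expanded terms: %.2f%n", coord(1, 10));

        // Problem 2: the rare misspelling gets a larger IDF than the correct form.
        int numDocs = 10000; // made-up index size
        Map<String, Integer> expanded = new LinkedHashMap<>();
        expanded.put("lucene", 500); // made-up: common correct spelling
        expanded.put("lucine", 2);   // made-up: rare misspelling

        // Correction 2: score every expanded term with the most common form's IDF.
        double sharedIdf = idf(maxDocFreq(expanded), numDocs);
        for (Map.Entry<String, Integer> e : expanded.entrySet()) {
            System.out.printf("%s: per-term idf=%.3f shared idf=%.3f%n",
                    e.getKey(), idf(e.getValue(), numDocs), sharedIdf);
        }
    }
}
```

Under these assumptions the per-term IDF ranks "lucine" above "lucene", while the shared IDF removes that bias; correction 1 would amount to having coord return a constant instead of the 1-in-10 fraction.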

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

