[ 
https://issues.apache.org/jira/browse/LUCENE-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086936#comment-13086936
 ] 

Mark Harwood commented on LUCENE-3381:
--------------------------------------

It's more nuanced than averaging the IDF of the variants (as discussed at 
length in LUCENE-329).
To summarise: the original search term is the closest thing we have to the 
user's intent. If we average its IDF with that of all the fuzzy variants, we 
will most likely dilute this value with a set of rare terms that happen to 
share some characters (most terms in the termEnum are, e.g., typos).
When an expanded fuzzy query like this sits alongside other search terms in 
a BooleanQuery (e.g. robert~ OR muir), the "robert~" branch of the query ends 
up looking rare compared to the plain "muir" term, thanks to the IDF dilution 
from a hundred rare "robert" variations. In my view the correct fix is to use 
the root term's IDF only (assuming "robert" actually exists in the index; 
otherwise we must fall back to the average of the variants).

That's the trick employed by FuzzyLikeThis that stops my customers complaining 
about "bad fuzzy matches".
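
To make the dilution concrete, here is a rough standalone sketch. It is not 
the FuzzyLikeThis code; the class name, method names and document 
frequencies are all made up for illustration, and it assumes the classic 
Lucene IDF formula, 1 + ln(numDocs / (docFreq + 1)).

```java
import java.util.List;

public class FuzzyIdfSketch {

    // Classic Lucene-style IDF (assumed here): 1 + ln(numDocs / (docFreq + 1)).
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Averages the IDF over a list of terms: the approach that dilutes
    // the root term's IDF when the list is dominated by rare variants.
    public static double averagedIdf(List<Long> docFreqs, long numDocs) {
        if (docFreqs.isEmpty()) {
            throw new IllegalArgumentException("need at least one term");
        }
        double sum = 0.0;
        for (long df : docFreqs) {
            sum += idf(df, numDocs);
        }
        return sum / docFreqs.size();
    }

    // Uses the root term's IDF when the root exists in the index,
    // otherwise falls back to the average over the variants.
    public static double rootOrFallbackIdf(long rootDocFreq,
                                           List<Long> variantDocFreqs,
                                           long numDocs) {
        if (rootDocFreq > 0) {
            return idf(rootDocFreq, numDocs);
        }
        return averagedIdf(variantDocFreqs, numDocs);
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L;                      // hypothetical index size
        long robertDf = 50_000L;                        // hypothetical: "robert" is common
        long muirDf = 500L;                             // hypothetical: "muir" is rarer
        List<Long> variants = List.of(3L, 1L, 2L, 5L);  // hypothetical typo variants
        List<Long> rootAndVariants = List.of(robertDf, 3L, 1L, 2L, 5L);

        System.out.println("muir IDF:             " + idf(muirDf, numDocs));
        System.out.println("robert~ averaged IDF: " + averagedIdf(rootAndVariants, numDocs));
        System.out.println("robert~ root-only:    " + rootOrFallbackIdf(robertDf, variants, numDocs));
    }
}
```

With these invented numbers the averaged IDF for "robert~" comes out higher 
than the IDF of "muir", even though "robert" itself is far more common; 
using the root term's IDF alone restores the expected ordering.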


> Sandbox remaining contrib queries
> ---------------------------------
>
>                 Key: LUCENE-3381
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3381
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Chris Male
>         Attachments: LUCENE-3381.patch
>
>
> In LUCENE-3271, I moved the 'good' queries from the queries contrib to new 
> destinations (primarily the queries module).  The remnants now need to find 
> their home.  As suggested in LUCENE-3271, these classes are not bad per se, 
> just odd.  So let's create a sandbox contrib that they and other 'odd' contrib 
> classes can go to.  We can then decide their fate at another time.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
