[
https://issues.apache.org/jira/browse/LUCENE-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086936#comment-13086936
]
Mark Harwood commented on LUCENE-3381:
--------------------------------------
It's more nuanced than averaging IDF of variants (as discussed at length in
LUCENE-329).
To summarise: the original search term is the closest thing we have to the
user's intent. If we average its IDF with those of all the fuzzy variants, we
will most likely dilute that value with a set of rare terms (most terms in the
termEnum are, e.g., typos) that merely happen to share some characters with it.
When we place this sort of expanded fuzzy query alongside other search terms in
a BooleanQuery (e.g. robert~ OR muir), we end up making the "robert~" branch of
the query look rare compared to the plain "muir" term, thanks to the IDF
dilution from a hundred rare "robert" variations. In my view the correct fix is
to use the root term's IDF only (assuming "robert" actually exists in the
index; otherwise we must fall back to the average of the variants).
That's the trick employed by FuzzyLikeThis that stops my customers complaining
about "bad fuzzy matches".
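As a rough illustration of the difference between the two strategies, here is
a minimal, self-contained Java sketch. It does not use Lucene's API: the class
FuzzyIdfSketch, its method names, and the document frequencies are all
hypothetical, and the IDF formula is assumed to be the classic
1 + ln(numDocs / (docFreq + 1)) used by Lucene's default TF-IDF similarity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: compares "average IDF of all variants" with
// "root term's IDF, falling back to the average". Not Lucene code.
class FuzzyIdfSketch {

    // Assumed classic IDF formula: 1 + ln(numDocs / (docFreq + 1)).
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Mean IDF over every term in the map (root term and variants alike).
    public static double averageIdf(Map<String, Integer> docFreqs, int numDocs) {
        if (docFreqs.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (int docFreq : docFreqs.values()) {
            sum += idf(docFreq, numDocs);
        }
        return sum / docFreqs.size();
    }

    // Use the root term's IDF when the root exists in the index;
    // otherwise fall back to the average over the variants.
    public static double branchIdf(String root, Map<String, Integer> docFreqs,
                                   int numDocs) {
        Integer rootDocFreq = docFreqs.get(root);
        if (rootDocFreq != null && rootDocFreq > 0) {
            return idf(rootDocFreq, numDocs);
        }
        return averageIdf(docFreqs, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;  // made-up index size

        // Made-up document frequencies: one common root, several rare typos.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("robert", 5000);
        docFreqs.put("robret", 2);
        docFreqs.put("rboert", 1);
        docFreqs.put("roberd", 3);

        System.out.printf("root-only IDF: %.3f%n",
                branchIdf("robert", docFreqs, numDocs));
        System.out.printf("averaged IDF:  %.3f%n",
                averageIdf(docFreqs, numDocs));
    }
}
```

With frequencies like these, the averaged value comes out well above the root
term's own IDF, which is what makes the fuzzy branch look rarer than it should
next to a plain term in the same BooleanQuery.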
> Sandbox remaining contrib queries
> ---------------------------------
>
> Key: LUCENE-3381
> URL: https://issues.apache.org/jira/browse/LUCENE-3381
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Chris Male
> Attachments: LUCENE-3381.patch
>
>
> In LUCENE-3271, I moved the 'good' queries from the queries contrib to new
> destinations (primarily the queries module). The remnants now need to find
> their home. As suggested in LUCENE-3271, these classes are not bad per se,
> just odd. So let's create a sandbox contrib that they and other 'odd' contrib
> classes can go to. We can then decide their fate at another time.