[ https://issues.apache.org/jira/browse/LUCENE-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086936#comment-13086936 ]
Mark Harwood commented on LUCENE-3381:
--------------------------------------

It's more nuanced than averaging the IDF of the variants (as discussed at length in LUCENE-329). To summarise: the original search term is the closest thing we have to the user's intent. If we average its IDF against all of the fuzzy variants, we are likely to dilute that value with a set of rare terms (most terms in the termEnum are, e.g., typos) that merely happen to share some characters with it.

When this sort of expanded fuzzy query sits alongside other search terms in a BooleanQuery (e.g. robert~ OR muir), the "robert~" branch of the query ends up looking comparatively rare next to the plain "muir" term, thanks to the IDF dilution from a hundred rare "robert" variations.

In my view the correct fix is to use the root term's IDF only (assuming "robert" actually exists in the index; otherwise we must fall back to the average of the variants). That's the trick employed by FuzzyLikeThis that stops my customers complaining about "bad fuzzy matches".

> Sandbox remaining contrib queries
> ---------------------------------
>
>                 Key: LUCENE-3381
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3381
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Chris Male
>         Attachments: LUCENE-3381.patch
>
>
> In LUCENE-3271, I moved the 'good' queries from the queries contrib to new
> destinations (primarily the queries module). The remnants now need to find
> their home. As suggested in LUCENE-3271, these classes are not bad per se,
> just odd. So let's create a sandbox contrib that they and other 'odd' contrib
> classes can go to. We can then decide their fate at another time.
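The two IDF strategies contrasted in the comment (averaging over all fuzzy variants versus using the root term's IDF, with a fallback to the average) can be sketched in plain Java without any Lucene dependency. This is a minimal illustration, not FuzzyLikeThis's actual code: the idf formula `1 + ln(numDocs / (docFreq + 1))` follows the shape of Lucene's classic TF-IDF similarity but should be treated as an assumption, the unweighted average is an assumption, and the class and method names (`FuzzyIdfSketch`, `averagedIdf`, `rootTermIdf`) and all document frequencies are hypothetical.

```java
import java.util.List;
import java.util.Map;

public class FuzzyIdfSketch {

    // Classic TF-IDF style idf (assumed formula): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Strategy criticised in the comment: unweighted average of idf over every
    // expanded variant, so many rare typos pull the value upwards.
    static double averagedIdf(List<String> variants, Map<String, Integer> docFreqs, int numDocs) {
        double sum = 0.0;
        for (String variant : variants) {
            sum += idf(docFreqs.getOrDefault(variant, 0), numDocs);
        }
        return sum / variants.size();
    }

    // Strategy proposed in the comment: use the root term's idf when the root
    // term exists in the index, otherwise fall back to the averaged idf.
    static double rootTermIdf(String rootTerm, List<String> variants,
                              Map<String, Integer> docFreqs, int numDocs) {
        Integer rootDf = docFreqs.get(rootTerm);
        if (rootDf != null && rootDf > 0) {
            return idf(rootDf, numDocs);
        }
        return averagedIdf(variants, docFreqs, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;
        // Hypothetical document frequencies: one common root term, a few rare variants.
        Map<String, Integer> docFreqs = Map.of(
                "robert", 5000, "roberts", 300, "robbert", 2, "robet", 1, "rober", 3,
                "muir", 2000);
        List<String> variants = List.of("robert", "roberts", "robbert", "robet", "rober");

        System.out.printf("robert~ averaged idf: %.3f%n", averagedIdf(variants, docFreqs, numDocs));
        System.out.printf("robert~ root idf:     %.3f%n", rootTermIdf("robert", variants, docFreqs, numDocs));
        System.out.printf("muir idf:             %.3f%n", idf(docFreqs.get("muir"), numDocs));
    }
}
```

With these made-up counts, the averaged IDF for "robert~" comes out higher than the IDF of "muir" even though "robert" itself is the more common term; the root-term IDF restores the expected ordering. The fallback branch matters only when the user's original term is absent from the index, in which case there is no better estimate of intent than the variants themselves.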