[ https://issues.apache.org/jira/browse/LUCENE-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reopened LUCENE-1124:
----------------------------------------


This fix breaks the case when the exact term is present in the index.

> short circuit FuzzyQuery.rewrite when input token length is small compared to 
> minSimilarity
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1124
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1124
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Query/Scoring
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>            Priority: Trivial
>             Fix For: 2.9
>
>         Attachments: LUCENE-1124.patch, LUCENE-1124.patch, LUCENE-1124.patch, 
> LUCENE-1124.patch
>
>
> I found this (unreplied-to) email floating around in my Lucene folder from 
> over the holidays...
> {noformat}
> From: Timo Nentwig
> To: java-dev
> Subject: Fuzzy makes no sense for short tokens
> Date: Mon, 31 Dec 2007 16:01:11 +0100
> Message-Id: <200712311601.12255.luc...@nitwit.de>
> Hi!
> it generally makes no sense to search fuzzy for short tokens because changing
> even only a single character of course already results in a high edit
> distance. So it actually only makes sense in this case:
>            if( token.length() > 1f / (1f - minSimilarity) )
> E.g. changing one character in a 3-letter token (foo) results in an edit
> distance of 0.6. And if minSimilarity (which is by default: 0.5 :-) is higher
> we can save all the expensive rewrite() logic.
> {noformat}
> I don't know much about FuzzyQueries, but this reasoning seems sound ... 
> FuzzyQuery.rewrite should be able to completely skip all TermEnumeration in 
> the event that the input token is shorter than some simple math on the 
> minSimilarity.  (I'm not smart enough to be certain that the math above is 
> right, however ... it's been a while since I looked at Levenshtein distances 
> ... tests needed)
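
For reference, the length check quoted in the description can be sketched in Java as below. This is only an illustration of the arithmetic, not the committed patch: the class FuzzyShortCircuit and the method fuzzyCanMatch are hypothetical names and are not part of Lucene's API. It assumes similarity is computed as 1 - (edits / token length), so a single edit can only pass when 1 - 1/length exceeds minSimilarity, which rearranges to the quoted condition.

```java
public class FuzzyShortCircuit {

    // Hypothetical helper (not a Lucene method): returns true when a token of
    // the given length could still match a term that differs by one edit,
    // using the condition quoted in the issue description:
    //     token.length() > 1f / (1f - minSimilarity)
    // Assumption: similarity = 1 - (edits / tokenLength).
    static boolean fuzzyCanMatch(int tokenLength, float minSimilarity) {
        // For minSimilarity == 1f the right-hand side is +Infinity, so the
        // comparison is false and no fuzzy (non-exact) match is possible.
        return tokenLength > 1f / (1f - minSimilarity);
    }

    public static void main(String[] args) {
        // Default minSimilarity of 0.5: the threshold is 1 / 0.5 = 2.
        System.out.println(fuzzyCanMatch(3, 0.5f)); // 3 > 2
        System.out.println(fuzzyCanMatch(2, 0.5f)); // 2 > 2 does not hold
        // Higher minSimilarity: the threshold is roughly 3.33.
        System.out.println(fuzzyCanMatch(3, 0.7f));
    }
}
```

Note that a false result only rules out inexact matches. As the reopening comment above points out, the exact term may still be present in the index, so a short circuit built on this check would presumably still need to look up the exact term instead of returning nothing.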

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

