Michael McCandless created LUCENE-7439:
------------------------------------------

             Summary: Should FuzzyQuery match short terms too?
                 Key: LUCENE-7439
                 URL: https://issues.apache.org/jira/browse/LUCENE-7439
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: master (7.0), 6.3


Today, if you ask {{FuzzyQuery}} to match {{abcd}} with edit distance 2, it 
will fail to match the term {{ab}} even though it's 2 edits away.

Its javadocs explain this:

{noformat}
 * <p>NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled
 * distance between two terms is computed.  For a term to match, the edit distance between
 * the terms must be less than the minimum length term (either the input term, or
 * the candidate term).  For example, FuzzyQuery on term "abcd" with maxEdits=2 will
 * not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not
 * match an indexed term "abc".
{noformat}

On the one hand, I can see that this behavior is somewhat justified: 50% of
the characters differ, so this is a very "weak" match. On the other hand, it's
quite unexpected: edit distance is an exact measure, so the terms should have
matched.

It seems like the behavior is caused by internal implementation details of
how the relative (floating point) score is computed.  I think we should fix it
so that a maximum edit distance of 2 does in fact match all terms with edit
distance <= 2.
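
The difference between the documented rule and the proposed one can be illustrated with a minimal sketch. This is not Lucene's implementation: the class and method names below are hypothetical, the distance is plain Levenshtein over UTF-16 chars (no transpositions), and the "documented" rule is simply the javadoc sentence above taken literally (edit distance must be less than the shorter term's length).

```java
public class FuzzyShortTermSketch {

    // Plain Levenshtein distance (insert, delete, substitute) using two DP rows.
    // Assumption: compares UTF-16 chars and ignores transpositions.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // The rule as the javadoc note describes it: within maxEdits AND strictly
    // less than the length of the shorter term. Hypothetical helper.
    public static boolean matchesDocumentedRule(String query, String candidate, int maxEdits) {
        int d = editDistance(query, candidate);
        return d <= maxEdits && d < Math.min(query.length(), candidate.length());
    }

    // The rule proposed in this issue: every term within maxEdits matches.
    // Hypothetical helper.
    public static boolean matchesProposedRule(String query, String candidate, int maxEdits) {
        return editDistance(query, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"abcd", "ab"}, {"a", "abc"}, {"abcd", "abcf"}};
        for (String[] p : pairs) {
            System.out.println(p[0] + " vs " + p[1]
                + ": distance=" + editDistance(p[0], p[1])
                + ", documented=" + matchesDocumentedRule(p[0], p[1], 2)
                + ", proposed=" + matchesProposedRule(p[0], p[1], 2));
        }
    }
}
```

Under these assumptions, the two javadoc examples ({{abcd}}/{{ab}} and {{a}}/{{abc}}) are both at distance 2 but fail the length check, while a longer pair such as {{abcd}}/{{abcf}} matches under either rule.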



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
