Michael McCandless created LUCENE-7439:
------------------------------------------
Summary: Should FuzzyQuery match short terms too?
Key: LUCENE-7439
URL: https://issues.apache.org/jira/browse/LUCENE-7439
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: master (7.0), 6.3
Today, if you ask {{FuzzyQuery}} to match {{abcd}} with edit distance 2, it
will fail to match the term {{ab}} even though it's 2 edits away.
Its javadocs explain this:
{noformat}
* <p>NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled
* distance between two terms is computed. For a term to match, the edit distance between
* the terms must be less than the minimum length term (either the input term, or
* the candidate term). For example, FuzzyQuery on term "abcd" with maxEdits=2 will
* not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not
* match an indexed term "abc".
{noformat}
On the one hand, I can see that this behavior is sort of justified: 50% of the
characters are different, so this is a very "weak" match. On the other hand,
it's quite unexpected: edit distance is an exact measure, so a term within the
requested edit distance should have matched.
It seems like the behavior is caused by internal implementation details about
how the relative (floating point) score is computed. I think we should fix it,
so that edit distance 2 does in fact match all terms with edit distance <= 2.
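To make the difference concrete, here is a minimal sketch, not Lucene's actual code. It assumes plain Levenshtein distance (no transpositions), and the class and method names ({{FuzzyShortTermSketch}}, {{matchesWithLengthRule}}, {{matchesByEditDistanceOnly}}) are hypothetical, made up for illustration. The first predicate models the rule as the javadocs state it (distance must be less than the shorter term's length); the second models what I'd expect instead.

```java
final class FuzzyShortTermSketch {

    /** Plain Levenshtein distance (insert, delete, substitute), two-row DP. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            // Swap rows so prev always holds the most recently completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Models the rule described in the javadocs (an assumption about behavior,
     * not the real implementation): the distance must be within maxEdits AND
     * strictly less than the length of the shorter term.
     */
    static boolean matchesWithLengthRule(String query, String candidate, int maxEdits) {
        int d = editDistance(query, candidate);
        return d <= maxEdits && d < Math.min(query.length(), candidate.length());
    }

    /** Models the proposed behavior: match every term within maxEdits. */
    static boolean matchesByEditDistanceOnly(String query, String candidate, int maxEdits) {
        return editDistance(query, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(matchesWithLengthRule("abcd", "ab", 2));
        System.out.println(matchesByEditDistanceOnly("abcd", "ab", 2));
    }
}
```

With the two javadoc examples ({{abcd}} vs. {{ab}}, and {{a}} vs. {{abc}}, both with maxEdits=2), the length rule rejects the candidate while the distance-only check accepts it, which is exactly the gap described above.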