[ https://issues.apache.org/jira/browse/LUCENE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676894#comment-13676894 ]

Michael McCandless commented on LUCENE-5033:
--------------------------------------------

bq. I, too, was hoping to avoid calcSimilarity if raw is true, but I think we 
need it to calculate the boost. Let me know if I'm missing something.

Ahh, you're right ... I missed that.  OK.
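To show why the similarity is still needed when raw is true, here is a minimal Java sketch of a boost computation, modeled loosely on the Lucene 4.x linear fuzzy enum. The class and method names (BoostSketch, similarity, boost) are hypothetical. The formulas (similarity = 1 - ed / (prefixLength + min(len1, len2)), boost = (similarity - minSimilarity) / (1 - minSimilarity)) are an assumption about the general shape, not a copy of Lucene's code. The point is that even with minSimilarity forced to 0, the boost still depends on the similarity.

```java
/**
 * Hypothetical sketch (not Lucene's API): turning an edit distance into a
 * similarity, and a similarity into a boost, in the style of the old fuzzy enums.
 */
public class BoostSketch {

    /**
     * Similarity assumed as 1 - ed / (prefixLength + min(len1, len2)),
     * where len1 and len2 are the term lengths after the shared prefix.
     */
    public static float similarity(int editDistance, int prefixLength, int len1, int len2) {
        return 1.0f - (float) editDistance / (float) (prefixLength + Math.min(len1, len2));
    }

    /** Boost rescales how far the similarity sits above the threshold. */
    public static float boost(float similarity, float minSimilarity) {
        float scaleFactor = 1.0f / (1.0f - minSimilarity);
        return (similarity - minSimilarity) * scaleFactor;
    }

    public static void main(String[] args) {
        // Raw mode: the user passed an integer edit count, so the threshold is zeroed.
        float minSimilarity = 0.0f;
        // "monday" vs "montugu": 4 edits, no prefix, lengths 6 and 7.
        float sim = similarity(4, 0, 6, 7); // 1 - 4/6
        System.out.println("similarity = " + sim);
        // With a zero threshold the boost equals the similarity, but it still needs it.
        System.out.println("boost = " + boost(sim, minSimilarity));
    }
}
```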

bq. The bug in the original code was that FilteredTermsEnum sets minSimilarity 
to 0 when the user-specified minSimilarity is >= 1.0f. So, in 
SlowFuzzyTermsEnum, similarity (unless it was Float.NEGATIVE_INFINITY) was 
typically > minSimilarity no matter its value. In other words, when the client 
code made the call with minSimilarity >= 1.0f, that value was correctly recorded 
in maxEdits, but maxEdits wasn't the determining factor in whether 
SlowFuzzyTermsEnum accepted a term.

Oh, I see: FuzzyTermsEnum does this in its ctor, and SlowFuzzyTermsEnum extends 
that.  Now I understand the bug ... thanks.
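A minimal sketch of the acceptance bug, assuming the constructor logic described above (a user value >= 1.0f zeroes minSimilarity and is recorded as maxEdits). AcceptSketch, acceptBuggy and acceptFixed are hypothetical names, and the non-raw maxEdits formula is a stand-in. The fix shown (deciding on editDistance <= maxEdits in raw mode) is one plausible approach and may differ from the attached patch.

```java
/**
 * Hypothetical sketch of the raw-mode acceptance bug. Field names mimic the
 * Lucene 4.x fuzzy enums but this is illustrative code, not Lucene's.
 */
public class AcceptSketch {

    public final float minSimilarity; // threshold actually used for acceptance
    public final int maxEdits;        // maximum edit distance implied by the user's value
    public final boolean raw;         // true when the user passed an integer edit count

    public AcceptSketch(float userMinSimilarity, int termLength) {
        if (userMinSimilarity >= 1.0f) {
            // As described for FuzzyTermsEnum's ctor: threshold zeroed, edits taken literally.
            this.minSimilarity = 0.0f;
            this.maxEdits = (int) userMinSimilarity;
            this.raw = true;
        } else {
            this.minSimilarity = userMinSimilarity;
            // Hypothetical stand-in for the real max-edits calculation.
            this.maxEdits = (int) ((1.0 - userMinSimilarity) * termLength);
            this.raw = false;
        }
    }

    /** Original behavior: only the (possibly zeroed) similarity threshold is checked. */
    public boolean acceptBuggy(float similarity) {
        return similarity > minSimilarity;
    }

    /** Sketched fix: in raw mode the edit distance is the deciding factor. */
    public boolean acceptFixed(float similarity, int editDistance) {
        if (raw) {
            return editDistance <= maxEdits;
        }
        return similarity > minSimilarity;
    }

    public static void main(String[] args) {
        AcceptSketch e = new AcceptSketch(3.0f, "montugu".length());
        int ed = 4;                           // Levenshtein("monday", "montugu")
        float sim = 1.0f - (float) ed / 6.0f; // 1 - ed / min(6, 7), positive
        System.out.println("buggy accepts: " + e.acceptBuggy(sim));     // true
        System.out.println("fixed accepts: " + e.acceptFixed(sim, ed)); // false
    }
}
```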

bq. Doing an explicit Levenshtein calculation here sort of defeats the entire 
purpose of having Levenshtein automata at all!

But this fix only applies in cases (edit distance > 2) where automata don't 
apply, I think? (The fixes are to LinearFuzzyTermsEnum.)
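For distances beyond what the automata cover (more than 2 edits, per the comment above), a linear enum has to compute the edit distance directly. This is a hypothetical, self-contained Java sketch (BoundedLevenshtein is not a Lucene class) of a bounded dynamic-programming Levenshtein check that gives up as soon as every cell in a row exceeds maxEdits. The early exit is safe because the minimum of each DP row never decreases from one row to the next.

```java
/**
 * Hypothetical sketch: bounded Levenshtein distance with early termination,
 * the kind of explicit check a linear (non-automaton) fuzzy enum needs.
 */
public class BoundedLevenshtein {

    /** Returns the edit distance, or maxEdits + 1 if it exceeds maxEdits. */
    public static int distance(String a, String b, int maxEdits) {
        if (Math.abs(a.length() - b.length()) > maxEdits) {
            return maxEdits + 1; // the length difference alone is too large
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // insertion, deletion, substitution (or match)
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            if (rowMin > maxEdits) {
                return maxEdits + 1; // later rows cannot drop back under the bound
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return Math.min(prev[b.length()], maxEdits + 1);
    }

    public static void main(String[] args) {
        System.out.println(distance("monday", "montugu", 10)); // 4
        System.out.println(distance("monday", "montugu", 3));  // 4, i.e. maxEdits + 1
    }
}
```

With maxEdits = 3, "monday" vs "montugu" comes back as maxEdits + 1, so a linear enum using a check like this would reject the term.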
                
> SlowFuzzyQuery appears to fail with edit distance >=3 in some cases
> -------------------------------------------------------------------
>
>                 Key: LUCENE-5033
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5033
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/other
>    Affects Versions: 4.3
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: LUCENE-5033.patch
>
>
> The Levenshtein edit distance between "monday" and "montugu" should be 4. The
> following test runs a query with "sim" set to 3, yet there is a hit, so the
> assertion fails.
>   public void testFuzzinessLong2() throws Exception {
>     Directory directory = newDirectory();
>     RandomIndexWriter writer = new RandomIndexWriter(random(), directory);
>     addDoc("monday", writer);
>
>     IndexReader reader = writer.getReader();
>     IndexSearcher searcher = newSearcher(reader);
>     writer.close();
>     // "monday" is 4 edits away from "montugu", so allowing 3 edits should find nothing
>     SlowFuzzyQuery query = new SlowFuzzyQuery(new Term("field", "montugu"), 3, 0);
>     ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
>     assertEquals(0, hits.length);
>     reader.close();
>     directory.close();
>   }
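The report's arithmetic can be checked by hand with the standard lower bound d(a, b) >= max(|a|, |b|) - LCS(a, b), where LCS is the length of the longest common subsequence. The only characters the two words share are the prefix "mon", so LCS = 3.

```latex
\begin{aligned}
d(\text{monday}, \text{montugu}) &\ge \max(6, 7) - \mathrm{LCS}(\text{monday}, \text{montugu}) = 7 - 3 = 4,\\
d(\text{monday}, \text{montugu}) &\le 4 \quad (\texttt{d}\to\texttt{t},\ \texttt{a}\to\texttt{u},\ \texttt{y}\to\texttt{g},\ \text{insert } \texttt{u}),\\
\Rightarrow\quad d(\text{monday}, \text{montugu}) &= 4 > 3.
\end{aligned}
```

So a query allowing at most 3 edits should indeed return no hits.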
