[ 
https://issues.apache.org/jira/browse/LUCENE-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454759#comment-13454759
 ] 

Uwe Schindler commented on LUCENE-2667:
---------------------------------------

Hi Francisco: The core FuzzyQuery does not support edit distances > 2, because 
the automata needed for larger distances would be too big and slow. If you 
really want distances > 2, use SlowFuzzyQuery
http://lucene.apache.org/core/4_0_0-BETA/sandbox/org/apache/lucene/sandbox/queries/SlowFuzzyQuery.html
 from the sandbox module (lucene-sandbox.jar). It uses the same algorithm 
as the old 3.x FuzzyQuery (and is just as slow).
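
To illustrate why an unbounded edit distance gets expensive, here is a minimal 
sketch of the brute-force approach: compute the Levenshtein distance between 
the query term and every candidate term, and keep those within the limit. This 
is not Lucene code; the class SlowFuzzySketch and its methods are hypothetical 
names, and the assumption is only that a "slow" fuzzy match has to compare 
term by term in roughly this way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of Lucene.
class SlowFuzzySketch {

    // Classic dynamic-programming Levenshtein distance
    // (insertions, deletions, substitutions), using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev always holds the latest row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Scans every candidate term; the cost grows with the number of terms.
    static List<String> termsWithin(String query, List<String> terms, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (levenshtein(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("foobar", "foobaz", "fobbaz", "football");
        System.out.println(termsWithin("foobar", terms, 2));
    }
}
```

Every term is visited once per query here, which is the cost the 
automaton-based FuzzyQuery avoids for distances up to 2.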
                
> Fix FuzzyQuery's defaults, so its fast.
> ---------------------------------------
>
>                 Key: LUCENE-2667
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2667
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 4.0-ALPHA
>
>         Attachments: LUCENE-2667_contrib.patch, LUCENE-2667.patch, 
> LUCENE-2667.patch
>
>
> We worked a lot on FuzzyQuery, but you need to be a rocket scientist to 
> ensure good results.
> The main problem is that the default minimum similarity is 0.5f, which 
> doesn't take into account the length of the string.
> To add insult to injury, the default number of expansions is 1024 
> (traditionally from BooleanQuery maxClauseCount).
> I propose:
> * The syntax of FuzzyQuery is enhanced, so that you can specify raw edits 
> too, such as foobar~2 (all terms within 2 Levenshtein edits of foobar). 
> Previously if you specified any amount >=1, you got IllegalArgumentException, 
> so this won't break anyone. You can still use foobar~0.5, and it works just 
> as before.
> * The default for minimumSimilarity then becomes 
> LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE, which is 2. This way if you 
> just do foobar~, it's always fast.
> * The size of the priority queue is reduced by default from 1024 to a much 
> more reasonable value: 50. This is what FuzzyLikeThis uses.
> I think it's best to just change the defaults for this query, since it was so 
> awful before. We can add notes in migrate.txt that if you care about using 
> the old values, then you should provide them explicitly, and you will get the 
> same results!
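
The proposal above distinguishes two readings of the value after "~": a value 
>= 1 is a raw edit count, while a value < 1 is a fractional similarity whose 
effective number of edits depends on the term length. The sketch below shows 
that distinction under a stated assumption: the similarity-to-edits conversion 
used here, (1 - similarity) * termLength, is a simplification chosen for 
illustration and is not confirmed by the issue text. FuzzySyntaxSketch and its 
methods are hypothetical names, not Lucene API.

```java
// Hypothetical illustration, not part of Lucene.
class FuzzySyntaxSketch {

    // Mirrors the value the issue states for
    // LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.
    static final int MAX_SUPPORTED_DISTANCE = 2;

    // Interprets the number after '~' in a term such as foobar~2 or foobar~0.5.
    // ASSUMPTION: fractional values map to edits as (1 - similarity) * termLength.
    static int requestedEdits(float value, int termLength) {
        if (value < 0f) {
            throw new IllegalArgumentException("fuzzy value must be >= 0");
        }
        if (value >= 1f) {
            return (int) value; // raw edit count, e.g. foobar~2
        }
        return (int) ((1.0 - value) * termLength); // similarity, e.g. foobar~0.5
    }

    // True when the requested edits fit the distance the issue calls fast.
    static boolean withinSupportedDistance(int edits) {
        return edits <= MAX_SUPPORTED_DISTANCE;
    }

    public static void main(String[] args) {
        // The same similarity implies a different edit budget per term length.
        for (int len : new int[] {4, 10}) {
            int edits = requestedEdits(0.5f, len);
            System.out.println("length " + len + ": " + edits + " edits, fast="
                    + withinSupportedDistance(edits));
        }
    }
}
```

Under this assumed conversion, a fixed similarity of 0.5 allows more edits as 
terms get longer, which is the length problem the issue describes, whereas a 
raw edit count of 2 stays within the supported distance for any term.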
