[
https://issues.apache.org/jira/browse/LUCENE-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454759#comment-13454759
]
Uwe Schindler commented on LUCENE-2667:
---------------------------------------
Hi Francisco: The core FuzzyQuery does not support edit distances > 2, because
the automatons used for this would be too big and slow. If you really want
distances > 2, use
http://lucene.apache.org/core/4_0_0-BETA/sandbox/org/apache/lucene/sandbox/queries/SlowFuzzyQuery.html
from the sandbox module (lucene-sandbox.jar). This one is the same algorithm
as the old 3.x FuzzyQuery (and is as slow).
> Fix FuzzyQuery's defaults, so its fast.
> ---------------------------------------
>
> Key: LUCENE-2667
> URL: https://issues.apache.org/jira/browse/LUCENE-2667
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 4.0-ALPHA
> Reporter: Robert Muir
> Assignee: Robert Muir
> Fix For: 4.0-ALPHA
>
> Attachments: LUCENE-2667_contrib.patch, LUCENE-2667.patch,
> LUCENE-2667.patch
>
>
> We worked a lot on FuzzyQuery, but you need to be a rocket scientist to
> ensure good results.
> The main problem is that the default distance is 0.5f, which doesn't take
> into account the length of the string.
> To add insult to injury, the default number of expansions is 1024
> (traditionally from BooleanQuery maxClauseCount)
> I propose:
> * The syntax of FuzzyQuery is enhanced, so that you can specify raw edits
> too: such as foobar~2 (all terms within 2 levenshtein edits of foobar).
> Previously if you specified any amount >=1, you got IllegalArgumentException,
> so this won't break anyone. You can still use foobar~0.5, and it works just
> as before
> * The default for minimumSimilarity then becomes
> LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE, which is 2. This way if you
> just do foobar~, its always fast.
> * The size of the priority queue is reduced by default from 1024 to a much
> more reasonable value: 50. This is what FuzzyLikeThis uses.
> I think its best to just change the defaults for this query, since it was so
> aweful before. We can add notes in migrate.txt that if you care about using
> the old values, then you should provide them explicitly, and you will get the
> same results!
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]