The limit of 2 is hard-coded precisely because good performance for edit
distances above 2 cannot be guaranteed.
-- Jack Krupansky
-----Original Message-----
From: Michael Tobias
Sent: Wednesday, August 14, 2013 1:00 AM
To: java-user@lucene.apache.org
Subject: Fuzzy Searching on Lucene / Solr
My first post so please be gentle with me.
I am about to start 'playing' with Solr to see if it will be the correct
tool for a new searchable database development. One of my requirements is
the ability to do 'fuzzy' searches and I understand that the latest versions
of Lucene / Solr use an improved version of indexing and the Levenshtein
distance formula (or rather a modified version of Levenshtein if wished for,
treating letter transpositions as a single difference rather than 2).
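
To make the difference between the two distance definitions concrete, here is
a minimal, self-contained Java sketch using a plain dynamic-programming table.
It is not Lucene's implementation (Lucene builds Levenshtein automata instead);
the class and method names are hypothetical, and the transposition variant
shown is the restricted "optimal string alignment" form, which is an assumption
about what "modified Levenshtein" means here.

```java
// Illustrative sketch only: hypothetical names, not part of Lucene or Solr.
final class EditDistanceSketch {

    /** Classic Levenshtein: insertions, deletions and substitutions cost 1 each. */
    static int levenshtein(String a, String b) {
        return distance(a, b, false);
    }

    /** Same, but swapping two adjacent characters also costs 1 instead of 2. */
    static int withTranspositions(String a, String b) {
        return distance(a, b, true);
    }

    private static int distance(String a, String b, boolean transpositions) {
        int n = a.length();
        int m = b.length();
        // d[i][j] = distance between the first i chars of a and the first j chars of b
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete all i chars
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert all j chars

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,      // deletion
                                 d[i][j - 1] + 1),     // insertion
                        d[i - 1][j - 1] + cost);       // match or substitution
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    // adjacent characters swapped: count as a single edit
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("lucene", "lucnee"));        // prints 2
        System.out.println(withTranspositions("lucene", "lucnee")); // prints 1
    }
}
```

With transpositions counted as one edit, a typo such as "lucnee" stays within
a distance of 1, leaving more of a small distance budget for other errors.
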
Levenshtein is precisely what I need, but I also understand that the maximum
distance currently implemented is a distance of just TWO. That is not
really adequate for my purposes. I need to be able to handle at least a
distance of 3 and probably 4.
Is the current maximum distance of 2 hard-coded in the system? Can it be
overridden? How?
I understand that performance (both indexing and querying) may be impaired
significantly by doing this but that might be a price worth paying. If it
IS possible to change the max distance to 3 or 4 does anybody have any idea
what the performance implications might be?
Many thanks for any/all assistance you can provide.
Regards
Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org