On Mon, Dec 27, 2010 at 10:31 AM, Biedermann,S.,Fa. Post Direkt <[email protected]> wrote:
>
> As for our problem: we are trying to build reference data against which
> requests shall be matched. In this case we need quite a huge amount of string
> distance measurements for preparing this reference.
>
If this is your problem, I wouldn't recommend using StringDistance directly. As I mentioned, it's not designed for your use case: the way the spellchecker uses it, it only needs something like 20-50 comparisons. If you use it the way you describe, it will be very slow. It must do O(k) comparisons, where k is the number of strings, and each comparison is O(mn), where m and n are the lengths of the input string and the string being compared, respectively.

It would be easier to index your terms and simply run a FuzzyQuery (with trunk), specifying the exact max edit distance you want. Or, if you care about getting all exact results within Levenshtein distance N, use an AutomatonQuery built from LevenshteinAutomata. This will give you a sublinear number of comparisons, something complicated but more like O(sqrt(k)), where k is the number of strings, and each comparison is O(n), where n is the length of the target string.
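To make the brute-force cost concrete, here is a minimal Python sketch of the "compare against every string" approach. It is not Lucene code: the function names `levenshtein` and `brute_force_matches` and the sample term list are hypothetical, chosen only for illustration. Each call to `levenshtein` fills an m-by-n table, and the scan makes one such call per term, which is the O(k) times O(mn) behaviour described above.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b)) time."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def brute_force_matches(query, terms, max_edits):
    """Compare the query against every term: k distance computations."""
    return [t for t in terms if levenshtein(query, t) <= max_edits]


# Hypothetical reference terms, for illustration only.
terms = ["lucene", "lucent", "lucerne", "solr", "tika"]
matches = brute_force_matches("lucene", terms, 1)
print(matches)
```

With a handful of terms this is fine; with a large reference set the linear scan dominates, which is why an indexed, automaton-based query is the better fit.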
