Fuzzy matching...now I really won't be able to sleep tonight. I was pretty
obsessive about it for some time. There are a bunch of good papers about
using n-gram tables (2-, 3-, or 4-grams) in SQL databases to perform highly
optimized comparisons. It takes a bit of extra setup, but then you can get
solid coverage of huge data sets with excellent performance. I love
Levenshtein, but n-grams are better at detecting names entered out of
order, like "Adam David" instead of "David Adams." LCS and Levenshtein
both depend on character order, so they won't score this as a match, but
even 3-grams will flag it, since most of the 3-grams survive the swap.
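
To make the out-of-order point concrete, here is a rough Python sketch that
compares the two names both ways. It is only an in-memory illustration, not
the SQL n-gram table setup from the papers. The function names, the
space padding, and the choice of the Dice coefficient as the score are all
my own assumptions for the example.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of text (lowercased, space-padded)."""
    # Padding is an assumption of this sketch; it lets word starts/ends count.
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=3):
    """Dice coefficient over the two n-gram sets: 0.0 (disjoint) to 1.0 (same)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, kept one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                    # deletion
                            curr[j - 1] + 1,                # insertion
                            prev[j - 1] + (ch_a != ch_b)))  # substitution
        prev = curr
    return prev[-1]


name_a, name_b = "Adam David", "David Adams"
print("3-gram Dice score:", round(dice(name_a, name_b), 2))  # 0.64
print("Levenshtein distance:", levenshtein(name_a.lower(), name_b.lower()))
```

The two names share 8 of their 12 and 13 padded 3-grams, so the score stays
high even though the words are swapped, while the edit distance is large
relative to the length of the strings. In a database you would store the
n-grams in their own table and count shared rows instead of building sets
in memory.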

Let me know if you're interested and I'll dig up papers or references.
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**********************************************************************