I found a Levenshtein function somewhere last year and have been using it with MariaDB as a function on the MariaDB server, called from my VFP 9 application. It's exceptionally fast and works pretty well.

My application needs to get "as close as" matches to a random string (for manufacturer product SKUs, which can be any length, any alphanumeric mishmash...example: RGB745WEHWW).

Sometimes the user enters everything right except one character, and this function returns a weighted list of "as close as I can find" known SKUs from a table of 45,000+.

I can send the function to anyone who is interested.

Here's an example of how I call it (the Levenshtien function is named klose, pcSearch is the text string to search for)
    select sku, klose(sku,?pcSearch) as score
from (select sku from skus where soundex(sku) like soundex(?pcSearch)) as hits
            order by score desc limit 10

Mike Copeland


Garrett Fitzgerald wrote:
I wrote a FLL to do Levenshtein distances for fuzzy name matching, but
everything was posted to my blog, which is no longer online. It wasn't
amazingly hard to figure out, though, so it might be worth finding the
algorithm in C and recreating my steps. It ran much faster than equivalent
Fox code did.

On Wed, Apr 12, 2017 at 12:49 PM, Stephen Russell <srussell...@gmail.com>
wrote:

I remember this joy of searching names in a system that had 2+ million
customers and names were all varchar() instead of a key to a secondary
table.  My indexes sure took a beating when I got another "Williams", the
number one last name in the system, and it had to tear a page to make a new
page in this area.

I found that making a table called NAMES fixed the search time I was
experiencing.  Two text boxes had input for whatever they keyed.  I added
the % for wildcard after any text in each box and one of the keypress
events was the trigger to run it.

Select <field_list>
from customer
where lNameID in (
select nameID from names
where Name like @Lname)
and
fNameID in (
select nameID from names na
where na.Name like @Fname)

That has been 10-13 years ago.




On Wed, Apr 12, 2017 at 9:55 AM, Ken Dibble <krdib...@stny.rr.com> wrote:

Hi folks,

I've been thinking of how I can improve the ability of my users to find
people's names in a system that has over 30,000 people in it.

I've looked at soundex, and I've considered munging names to remove
spaces, apostrophes, hyphens, etc. The thing about those approaches is
that
in order to be efficient, they require pre-processing all of the names in
the system and storing the results, which can then be queried to find
matches.

Unfortunately, that would require modifications to the database, which I
try to avoid due to the downtime they require.

I'm looking for suggestions on how to produce results that include close
matches on last names that doesn't require pre-processing.

I've played with various schemes to assign "weights" to matches based on
the number of matching letters, but they all end up being very slooooow
and
also producing too many false positives.

I suppose there are no easy answers, but if anyone has an algorithm for
this kind of thing that they would be willing to share, I'd be grateful.

Thanks.

Ken Dibble
www.stic-cil.org



_______________________________________________
Post Messages to: ProFox@leafe.com
Subscription Maintenance: http://mail.leafe.com/mailman/listinfo/profox
OT-free version of this list: http://mail.leafe.com/mailman/listinfo/profoxtech
Searchable Archive: http://leafe.com/archives/search/profox
This message: 
http://leafe.com/archives/byMID/profox/fb983af6-5c07-9184-ca20-96d81a0dd...@ggisoft.com
** All postings, unless explicitly stated otherwise, are the opinions of the 
author, and do not constitute legal or medical advice. This statement is added 
to the messages for those lawyers who are too stupid to see the obvious.

Reply via email to