Re: Fuzzy matching of postal addresses

Aaron Bingham Tue, 18 Jan 2005 00:08:23 -0800

Andrew McLean wrote:

> I have a problem that I suspect isn't unusual, and I'm looking to see if there is any code available to help. I've Googled without success.
>
> Basically, I have two databases containing lists of postal addresses and need to look for matching addresses in the two databases. More precisely, for each address in database A I want to find a single matching address in database B.

I had a similar problem to solve a while ago. I can't give you my code, but I used this paper as the basis for my solution (BibTeX entry from http://citeseer.ist.psu.edu/monge00adaptive.html):

@misc{ monge-adaptive,
  author = "Alvaro E. Monge",
  title = "An Adaptive and Efficient Algorithm for Detecting Approximately Duplicate Database Records",
  url = "citeseer.ist.psu.edu/monge00adaptive.html"
}

There is a lot of literature--try a Google search for "approximate string match"--but very little publicly available code in this area, from what I could gather. Removing punctuation, etc., as others have suggested in this thread, is _not_sufficient_. Presumably you want to be able to match typos or phonetic errors as well. This paper's algorithm deals with those problems quite nicely.
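To illustrate why normalization alone falls short, here is a minimal sketch using only the standard library's difflib. It is _not_ the algorithm from the paper, just a simple approximate string comparison; the function names, the 0.75 threshold, and the sample addresses are all made up for illustration and would need tuning on real data.

```python
import difflib
import re


def normalize(address):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", address.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized addresses."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(address, candidates, threshold=0.75):
    """Return the most similar candidate, or None if none reaches the threshold.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: similarity(address, c))
    return best if similarity(address, best) >= threshold else None


if __name__ == "__main__":
    # Hypothetical sample data standing in for database B.
    database_b = ["12 High Street, Oxford", "7 Mill Lane, Cambridge"]
    query = "12 Hgih St., Oxford"  # transposed letters plus an abbreviation

    # Exact comparison after normalization still fails on the typo.
    print(normalize(query) in [normalize(b) for b in database_b])
    # The approximate comparison can still pick a candidate.
    print(best_match(query, database_b))
```

Note that this compares every address in A against every address in B, which will be slow for large databases, and it knows nothing about phonetic errors or address structure.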

--
--------------------------------------------------------------------
Aaron Bingham
Application Developer
Cenix BioScience GmbH
--------------------------------------------------------------------

--
http://mail.python.org/mailman/listinfo/python-list