Andrew McLean wrote:
I have a problem that I suspect isn't unusual, and I'm looking to see if there is any code available to help. I've Googled without success.

Basically, I have two databases containing lists of postal addresses and need to look for matching addresses in the two databases. More precisely, for each address in database A I want to find a single matching address in database B.

I had a similar problem to solve a while ago. I can't give you my code, but I used this paper as the basis for my solution (BibTeX entry from http://citeseer.ist.psu.edu/monge00adaptive.html):


@misc{ monge-adaptive,
author = "Alvaro E. Monge",
title = "An Adaptive and Efficient Algorithm for Detecting Approximately Duplicate
Database Records",
url = "citeseer.ist.psu.edu/monge00adaptive.html" }
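For the basic shape of the task Andrew describes (for each address in A, find the single best candidate in B), here is a minimal Python sketch. It is not the paper's algorithm and not my old code. It uses the standard library's difflib.SequenceMatcher as a stand-in similarity score, and the helpers normalize() and best_match() and the 0.8 threshold are hypothetical choices for illustration only.

```python
import difflib
import re


def normalize(address: str) -> str:
    """Hypothetical helper: lower-case, turn punctuation into spaces,
    and collapse runs of whitespace."""
    address = re.sub(r"[^\w\s]", " ", address.lower())
    return " ".join(address.split())


def best_match(address, candidates, threshold=0.8):
    """Hypothetical helper: return the candidate most similar to `address`,
    or None if no candidate reaches `threshold` (an assumed cut-off)."""
    target = normalize(address)
    best, best_score = None, 0.0
    for candidate in candidates:
        # ratio() is 2*M/T: M matching characters, T total characters.
        score = difflib.SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


db_a = ["12 High Street, Oxford", "3 Mill Lane, Cambridge"]
db_b = ["12 High St., Oxford", "3 Mill Lane Cambridge", "7 Bridge Road, Bath"]

for addr in db_a:
    print(addr, "->", best_match(addr, db_b))
```

This compares every address in A with every address in B, which is quadratic. For databases of any real size you would probably want to narrow the candidates first (for example by grouping on postcode), which is the efficiency concern named in the title of the paper cited above.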


There is a lot of literature (try a Google search for "approximate string match"), but very little publicly available code in this area, from what I could gather. Removing punctuation, etc., as others have suggested in this thread, is _not sufficient_. Presumably you want to be able to match typos or phonetic errors as well. This paper's algorithm deals with those problems quite nicely.
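To show why approximate matching catches typos where punctuation stripping does not, here is a sketch of plain Levenshtein edit distance in Python. This is a swapped-in, textbook technique, not the measure from the paper; levenshtein() and similarity() are illustrative names. Phonetic errors need a different tool (a phonetic encoding such as Soundex, for example), which is not shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (classic dynamic programming,
    keeping only two rows of the table)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string inner
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance scaled to the range 0..1, where 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("kitten", "sitting"))      # 3
print(levenshtein("Cambridge", "Cambrdige"))  # 2: a swap costs two edits here
print(similarity("Cambridge", "Cambrdige"))
```

Plain Levenshtein counts a transposed pair of letters as two edits; the Damerau variant counts it as one, which may suit typing errors better.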

--
--------------------------------------------------------------------
Aaron Bingham
Application Developer
Cenix BioScience GmbH
--------------------------------------------------------------------

--
http://mail.python.org/mailman/listinfo/python-list
