Iain King wrote:
Not sure on the volume of addresses you're working with, but as an
alternative you could try grabbing the zip code, looking up all
addresses in that zip code, and then finding whatever one of those
address strings most closely resembles your address string (smallest
Levenshtein distance?).
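
A minimal sketch of that idea, assuming the candidate addresses for the
zip code have already been fetched into a list. The names levenshtein and
closest_address are made up for illustration, and the normalization
(uppercase, collapsed whitespace) is an assumption, not something stated
in the thread.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute = 1 each)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_address(query: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate with the smallest distance to query, or None."""
    def norm(s: str) -> str:
        return " ".join(s.upper().split())

    candidates = list(candidates)
    if not candidates:
        return None
    q = norm(query)
    return min(candidates, key=lambda c: levenshtein(q, norm(c)))
```

Each comparison costs time proportional to the product of the two string
lengths, so this is only practical when the per-zip candidate list is small.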

   The parser doesn't have to be perfect, but it should
reliably report when it fails.  Then I can run the hard cases through
one of the commercial online address standardizers.  I'd like to
be able to knock off the easy cases cheaply.

   What I want to do is to first extract the street number and
undecorated street name only, match that to a large database of US businesses
stored in MySQL, and then find the best match from the database
hits.  So I need reliable extraction of undecorated street name and number.  The
other fields are less important.
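
   A rough sketch of the kind of extraction meant here, not a finished
parser.  The function name parse_street and the short DIRECTIONS and
SUFFIXES tables are assumptions for illustration; a real table would be
much longer.  It returns None whenever the line doesn't fit the easy
pattern, so those cases can be sent elsewhere.

```python
import re
from typing import Optional, Tuple

# Illustrative, deliberately incomplete tables (assumptions, not a standard).
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW",
              "NORTH", "SOUTH", "EAST", "WEST"}
SUFFIXES = {"ST", "STREET", "AVE", "AVENUE", "RD", "ROAD", "BLVD",
            "BOULEVARD", "DR", "DRIVE", "LN", "LANE", "CT", "COURT",
            "WAY", "PL", "PLACE"}


def parse_street(line: str) -> Optional[Tuple[str, str]]:
    """Return (street number, undecorated street name), or None on failure."""
    tokens = re.sub(r"[.,]", " ", line.upper()).split()
    # Easy cases start with a house number such as 1234 or 12B.
    if len(tokens) < 2 or not re.fullmatch(r"\d+[A-Z]?", tokens[0]):
        return None
    number, rest = tokens[0], tokens[1:]
    # Strip one leading direction, one trailing direction, one trailing
    # street type, but never strip the last remaining token.
    if len(rest) > 1 and rest[0] in DIRECTIONS:
        rest = rest[1:]
    if len(rest) > 1 and rest[-1] in DIRECTIONS:
        rest = rest[:-1]
    if len(rest) > 1 and rest[-1] in SUFFIXES:
        rest = rest[:-1]
    # Anything that still looks like decoration, or contains odd characters,
    # is treated as a hard case and reported as a failure.
    for tok in rest:
        if tok in DIRECTIONS or tok in SUFFIXES:
            return None
        if not re.fullmatch(r"[A-Z0-9'\-]+", tok):
            return None
    return number, " ".join(rest)
```

   For example, parse_street("1234 N. Main St.") gives ("1234", "MAIN"),
while "PO Box 99" and "100 Main St Suite 200" both give None.  The check
is conservative on purpose: a street actually named "North" gets rejected
too, which is acceptable if the hard cases go to another service anyway.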

                                John Nagle
--
http://mail.python.org/mailman/listinfo/python-list