Iain King wrote:
Not sure on the volume of addresses you're working with, but as an alternative you could try grabbing the zip code, looking up all addresses in that zip code, and then finding whatever one of those address strings most closely resembles your address string (smallest Levenshtein distance?).
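A minimal sketch of that "closest string within a zip code" idea, in plain Python. The function names `levenshtein` and `closest_address` are made up for illustration, and the candidate list is assumed to have already been fetched for the zip code in question; how it is fetched is not shown.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def closest_address(query, candidates):
    """Return the candidate nearest to query by edit distance, or None if empty."""
    if not candidates:
        return None
    normalized = query.strip().upper()
    return min(candidates,
               key=lambda c: levenshtein(normalized, c.strip().upper()))


# Hypothetical candidates, standing in for "all addresses in that zip code".
in_zip = ["125 Oak Ave", "123 Main Street", "123 Maine St"]
print(closest_address("123 Main Stret", in_zip))
```

This compares whole address strings, so it costs one distance computation per candidate; whether that is affordable depends on how many addresses share a zip code.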
The parser doesn't have to be perfect, but it should reliably report when it fails. Then I can run the hard cases through one of the commercial online address standardizers. I'd like to be able to knock off the easy cases cheaply.

What I want to do is first extract the street number and undecorated street name only, match those against a large database of US businesses stored in MySQL, and then find the best match among the database hits. So I need reliable extraction of the undecorated street name and number. The other fields are less important.

John Nagle
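As a rough sketch of the extraction step described above (not an existing parser), something like the following pulls out the street number and undecorated name and returns None when it is not confident, so that those cases can be sent on to a commercial standardizer. The function name `extract_number_and_name` and the three word tables are assumptions made for illustration; the tables are deliberately tiny and a real one would be much longer.

```python
import re

# Illustrative subsets only; real directional/suffix tables are far larger.
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW",
                "NORTH", "SOUTH", "EAST", "WEST"}
SUFFIXES = {"ST", "STREET", "AVE", "AVENUE", "RD", "ROAD", "BLVD",
            "BOULEVARD", "DR", "DRIVE", "LN", "LANE", "CT", "COURT"}
UNIT_MARKERS = {"SUITE", "STE", "APT", "UNIT", "FLOOR", "ROOM"}

# A leading house number followed by the rest of the street line.
STREET_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def extract_number_and_name(street_line):
    """Return (number, undecorated_name), or None to report a failed parse."""
    match = STREET_LINE.match(street_line)
    if match is None:
        return None  # no leading house number (e.g. a PO box)
    number, rest = match.groups()
    words = [w.strip(".,").upper() for w in rest.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    # Strip decorations, always leaving at least one word as the name.
    if len(words) > 1 and words[-1] in DIRECTIONALS:
        words = words[:-1]  # trailing directional: "... Avenue NW"
    if len(words) > 1 and words[-1] in SUFFIXES:
        words = words[:-1]  # street type: "St", "Avenue", ...
    if len(words) > 1 and words[0] in DIRECTIONALS:
        words = words[1:]   # leading directional: "N. Main"
    # Leftover unit markers or bare numbers mean we did not understand the line.
    if not words or any(w in UNIT_MARKERS or w.startswith("#") or w.isdigit()
                        for w in words):
        return None
    return number, " ".join(words)


for line in ["123 N. Main St.", "1600 Pennsylvania Avenue NW",
             "500 Oak St Suite 200", "PO Box 42"]:
    print(repr(line), "->", extract_number_and_name(line))
```

The sketch prefers reporting failure over guessing: any line with a suite number or other leftover token is rejected rather than partially parsed, which trades coverage for reliability on the easy cases.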