John Nagle wrote:
  Is there a usable street address parser available?  There are some
bad ones out there, but nothing good that I've found other than commercial
products with large databases.  I don't need 100% accuracy, but I'd like
to be able to extract street name and street number for at least 98% of
US mailing addresses.

  There's pyparsing, of course. There's a street address parser as an
example at "http://pyparsing.wikispaces.com/file/view/streetAddressParser.py".

  The author of that module has changed the code, and it has some
new features.  This is much better.

  Unfortunately, now it won't run with the released
version of "pyparsing" (1.5.2, from April 2009), because it uses
"originalTextFor", a feature introduced since then.  I worked around that,
but discovered that the new version is case-sensitive.  Changed
"Keyword" to "CaselessKeyword" where appropriate.

  I put in the full list of USPS street types, and discovered
that "1500 DEER CREEK LANE" still parses with a street name
of "DEER", and a street type of "CREEK", because "CREEK" is a
USPS street type.  Need to do something to pick up the last street
type, not the first.  I'm not sure how to do that with pyparsing.
Maybe if I buy the book...
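
  For illustration only, here is a minimal sketch of the "last street
type wins" idea in plain Python rather than pyparsing. It is not the
wikispaces example and not a pyparsing grammar. The STREET_TYPES set is
an abbreviated, hypothetical sample (the real USPS list is much longer),
and split_street is a made-up helper name.

```python
# Abbreviated, hypothetical sample; the full USPS street type list is much longer.
STREET_TYPES = {"LANE", "LN", "CREEK", "CRK", "ROAD", "RD",
                "STREET", "ST", "AVENUE", "AVE"}


def split_street(address):
    """Return (number, name, street_type), using the LAST street-type token.

    Returns None if the address does not start with a street number.
    Anything after the street type (such as a trailing directional)
    is ignored in this sketch.
    """
    # Normalize case and drop periods so "St." matches "ST".
    tokens = address.upper().replace(".", "").split()
    if not tokens or not tokens[0].isdigit():
        return None
    number, rest = tokens[0], tokens[1:]
    # Scan from the right, so "CREEK" in "DEER CREEK LANE" stays in the name.
    # The scan stops before index 0 so the name keeps at least one word.
    for i in range(len(rest) - 1, 0, -1):
        if rest[i] in STREET_TYPES:
            return number, " ".join(rest[:i]), rest[i]
    # No street type found: treat everything after the number as the name.
    return number, " ".join(rest), None


print(split_street("1500 DEER CREEK LANE"))
```

  Scanning from the right is the whole trick: a word that happens to be
a street type is kept in the name as long as a later street type exists.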

  There's still a problem with: "2081 N Webb Rd", where the street name
comes out as "N WEBB".
Addresses like "1234 5th St. S." yield a street name of "5 TH",
but when the directional comes before the name, it ends up as part of the name.
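
  One possible way to keep directionals out of the street name, again
sketched in plain Python rather than pyparsing: peel tokens off the ends
before deciding what the name is. The DIRECTIONALS and STREET_TYPES sets
are abbreviated, hypothetical samples, and parse_street is a made-up
helper name. Splitting on whitespace only also keeps "5th" in one piece.

```python
# Abbreviated, hypothetical samples; real lists would be longer.
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"RD", "ST", "AVE", "LN"}


def parse_street(address):
    """Split an address into number, directionals, name, and street type.

    Returns a dict, or None if the address does not start with a number.
    """
    # Normalize case and drop periods so "St." and "S." match the sets above.
    tokens = address.upper().replace(".", "").split()
    if len(tokens) < 2 or not tokens[0].isdigit():
        return None
    result = {"number": tokens[0], "pre_dir": None, "name": None,
              "type": None, "post_dir": None}
    rest = tokens[1:]
    # Each step only removes a token if at least one word would remain
    # for the name, so "100 N St" still gets a name of "N".
    if len(rest) > 1 and rest[-1] in DIRECTIONALS:
        result["post_dir"] = rest.pop()      # trailing directional
    if len(rest) > 1 and rest[-1] in STREET_TYPES:
        result["type"] = rest.pop()          # last remaining token is the type
    if len(rest) > 1 and rest[0] in DIRECTIONALS:
        result["pre_dir"] = rest.pop(0)      # leading directional
    result["name"] = " ".join(rest)
    return result


print(parse_street("2081 N Webb Rd"))
print(parse_street("1234 5th St. S."))
```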

  Getting closer, though.  If I can get to 95% of common cases, I'll
be happy.


                                John Nagle
--
http://mail.python.org/mailman/listinfo/python-list
