John Nagle <na...@animats.com> wrote:
>
>  Unfortunately, now it won't run with the released
>version of "pyparsing" (1.5.2, from April 2009), because it uses
>"originalTextFor", a feature introduced since then.  I worked around that,
>but discovered that the new version is case-sensitive.  Changed
>"Keyword" to "CaselessKeyword" where appropriate.
>
>  I put in the full list of USPS street types, and discovered
>that "1500 DEER CREEK LANE" still parses with a street name
>of "DEER", and a street type of "CREEK", because "CREEK" is a
>USPS street type.  Need to do something to pick up the last street
>type, not the first.  I'm not sure how to do that with pyparsing.
>Maybe if I buy the book...
>
>  There's still a problem with: "2081 N Webb Rd", where the street name
>comes out as "N WEBB".
>Addresses like "1234 5th St. S." yield a street name of "5 TH",
>but if the directional is before the name, it ends up with the name.
>
>  Getting closer, though.  If I can get to 95% of common cases, I'll
>be happy.
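
One way to "pick up the last street type, not the first" is to work inward from both ends of the token list instead of matching left to right. What follows is only a rough sketch in plain Python, not pyparsing, and not a fix to the grammar quoted above. The helper name `split_street` and the two small word sets are made up for illustration; a real version would load the full USPS lists.

```python
# Tiny illustrative subsets (assumption); a real parser would load the full USPS lists.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"ST", "STREET", "RD", "ROAD", "LN", "LANE",
                "AVE", "AVENUE", "WAY", "CREEK"}


def split_street(line):
    """Split a street line into number, directionals, name and type.

    Hypothetical helper: peels tokens off the ends of the line so that
    only the LAST street-type word is treated as the street type.
    """
    # Normalize: upper-case, drop periods and commas, split on whitespace.
    tokens = line.upper().replace(".", "").replace(",", "").split()
    if len(tokens) < 2:
        raise ValueError("expected a house number and a street: %r" % line)

    number, rest = tokens[0], tokens[1:]
    post_dir = street_type = pre_dir = None

    # Each step refuses to consume the final remaining token, so the
    # street name is never left empty.
    if len(rest) > 1 and rest[-1] in DIRECTIONS:
        post_dir = rest.pop()          # trailing directional, e.g. "S"
    if len(rest) > 1 and rest[-1] in STREET_TYPES:
        street_type = rest.pop()       # last street-type word only
    if len(rest) > 1 and rest[0] in DIRECTIONS:
        pre_dir = rest.pop(0)          # leading directional, e.g. "N"

    return {
        "number": number,
        "pre_dir": pre_dir,
        "name": " ".join(rest),
        "type": street_type,
        "post_dir": post_dir,
    }


if __name__ == "__main__":
    for addr in ("1500 DEER CREEK LANE", "2081 N Webb Rd", "1234 5th St. S."):
        print(addr, "->", split_street(addr))
```

With these word sets, "1500 DEER CREEK LANE" comes out with the name "DEER CREEK" and type "LANE", and "2081 N Webb Rd" comes out with the name "WEBB" and a leading directional of "N". It will still guess wrong on streets whose names are themselves directions or street types, so treat it as a starting point for the 95% goal and nothing more.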

This is a very tricky problem.  Consider Salem, Oregon, which puts the
direction after the street:

    3340 Astoria Way NE
    Salem, OR 97303

Consider northern Los Angeles County, which uses directions both before
and after the street name.  I used to live at:

    44720 N 2nd St E
    Lancaster, CA 93534

Consider much of Utah, which is both easy (because of its very neat grid)
and a pain, because of addresses like:

    389 W 1700 S
    Salt Lake City, UT 84115

-- 
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.
-- 
http://mail.python.org/mailman/listinfo/python-list