Re: Getting pyparsing to backtrack
> I'm working on street address parsing again, and I'm trying to deal with some of the harder cases.

For yet another test case, my actual address includes ...

    East South Mountain Avenue

Sometimes written as ...

    E. South Mtn Ave

--
Stanley C. Kitching
Human Being
Phoenix, Arizona
--
http://mail.python.org/mailman/listinfo/python-list
Re: Getting pyparsing to backtrack
On 07/06/2010 04:21 AM, Dennis Lee Bieber wrote:
> On Mon, 05 Jul 2010 15:19:53 -0700, John Nagle na...@animats.com declaimed the following in gmane.comp.python.general:
>
>> I'm working on street address parsing again, and I'm trying to deal with some of the harder cases.
>
> Hasn't it been suggested before, that the sanest method to parse addresses is from the end backwards...
>
> So that:
>
>     123 N South St.
>
> is parsed as
>
>     St. South N 123

You will of course need some trickery for that to work with

    Hauptstr. 12
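The end-backwards idea quoted above can be sketched in plain Python, without pyparsing. This is only an illustration under assumptions of mine, not anything posted in the thread: the function name parse_backwards, the tiny SUFFIXES set, and the rule "a directional counts as a predirectional only when a name word has already been taken and nothing but the house number remains" are all hypothetical. It does not attempt the number-last case (Hauptstr. 12) mentioned above.

```python
# Hypothetical lookup tables; a real parser would need far larger ones.
DIRECTIONALS = {
    "n", "s", "e", "w", "ne", "nw", "se", "sw",
    "north", "south", "east", "west",
    "northeast", "northwest", "southeast", "southwest",
}
SUFFIXES = {"st", "street", "ave", "avenue", "rd", "road", "blvd"}


def _norm(token):
    """Lower-case a token and drop a trailing period ("St." -> "st")."""
    return token.rstrip(".").lower()


def parse_backwards(address):
    """Split a street line by consuming tokens from the end toward the start."""
    stack = address.split()          # stack.pop() takes the rightmost token
    suffix = None
    predirectional = None
    name = []                        # name words, collected right to left

    # 1. An optional street suffix sits at the very end.
    if stack and _norm(stack[-1]) in SUFFIXES:
        suffix = stack.pop()

    # 2. Take name words until the house number (or the start) is reached.
    while stack and not stack[-1].isdigit():
        word = stack.pop()
        only_number_left = not stack or stack[-1].isdigit()
        # The street name is mandatory, so the first word is always a name
        # word; a later directional is a predirectional only at the far left.
        if name and _norm(word) in DIRECTIONALS and only_number_left:
            predirectional = word
            break
        name.append(word)

    # 3. Whatever digit token is now on top is taken as the house number.
    number = stack.pop() if stack else None

    return {
        "number": number,
        "predirectional": predirectional,
        "streetname": " ".join(reversed(name)),
        "suffix": suffix,
    }


print(parse_backwards("123 N South St."))
print(parse_backwards("E. South Mtn Ave"))
print(parse_backwards("SOUTH"))
```

Because the name is filled before any directional is considered, "SOUTH" alone ends up as the street name, and in "E. South Mtn Ave" only the leftmost "E." is split off.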
Getting pyparsing to backtrack
I'm working on street address parsing again, and I'm trying to deal with some of the harder cases.

Here's a subparser, intended to take in things like "N MAIN" and "SOUTH", and break out the directional from the street name.

    from pyparsing import (CaselessKeyword, Combine, MatchFirst,
                           OneOrMore, Optional, Word, alphanums)

    directionals = ['southeast', 'northeast', 'north', 'northwest',
                    'west', 'east', 'south', 'southwest',
                    'SE', 'NE', 'N', 'NW', 'W', 'E', 'S', 'SW']

    # A directional word or abbreviation, with an optional trailing period.
    direction = Combine(MatchFirst(map(CaselessKeyword, directionals)) +
                        Optional(".").suppress())

    # Optional predirectional, then one or more words joined into the name.
    streetNameParser = (Optional(direction.setResultsName("predirectional")) +
                        Combine(OneOrMore(Word(alphanums)),
                                adjacent=False,
                                joinString=" ").setResultsName("streetname"))

This parses something like "N WEBB" fine; "N" is the predirectional, and "WEBB" is the street name.

"SOUTH" (which, when not followed by another word, is a street name, not a predirectional) raises a parsing exception:

    Street address line parse failed for SOUTH : Expected W:(abcd...) (at char 5), (line:1, col:6)

The problem is that "direction" matched SOUTH. Even though "direction" is within an Optional and is followed by a clause that requires another word, the parser didn't back up when it hit the end of the input without satisfying the OneOrMore clause.

Pyparsing does some backup, but I'm not clear on how much, or how to force it to happen. There's some discussion at http://www.mail-archive.com/python-list@python.org/msg169559.html. Apparently the "Or" operator will force some backup, but it's not clear how much lookahead and backtracking is supported.

John Nagle
Re: Getting pyparsing to backtrack
On 7/5/2010 3:19 PM, John Nagle wrote:
> I'm working on street address parsing again, and I'm trying to deal with some of the harder cases.

The approach below works for the cases given. The "Or" operator ("^") supports backtracking, but "Optional()" apparently does not.

    # "directionals" is the list from the original post.
    direction = Combine(MatchFirst(map(CaselessKeyword, directionals)) +
                        Optional(".").suppress())

    streetNameOnly = Combine(OneOrMore(Word(alphanums)),
                             adjacent=False,
                             joinString=" ").setResultsName("streetname")

    # Try "directional + name" and "name only"; Or picks an alternative that matches.
    streetNameParser = ((direction.setResultsName("predirectional") +
                         streetNameOnly) ^ streetNameOnly)

John Nagle
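For comparison, here is a self-contained sketch that reproduces the failure from the original post and then shows a different way around it: a lookahead inside the Optional, so that a directional is only accepted when another word follows it. This is not the approach posted above; FollowedBy is a standard pyparsing class, but using it here is my own suggestion, and the names greedy_parser, lookahead_parser, and parse_street are made up for the example.

```python
from pyparsing import (CaselessKeyword, Combine, FollowedBy, MatchFirst,
                       OneOrMore, Optional, ParseException, Word, alphanums)

directionals = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest',
                'SE', 'NE', 'N', 'NW', 'W', 'E', 'S', 'SW']

# A directional word or abbreviation, with an optional trailing period.
direction = Combine(MatchFirst([CaselessKeyword(d) for d in directionals])
                    + Optional(".").suppress())

word = Word(alphanums)
street_name = Combine(OneOrMore(word), adjacent=False, joinString=" ")

# Same shape as the parser in the original post: once Optional(direction)
# has matched, pyparsing does not go back and retry without it.
greedy_parser = (Optional(direction.setResultsName("predirectional"))
                 + street_name.setResultsName("streetname"))

# Lookahead variant (an assumption of this sketch): the directional only
# counts if at least one more word follows it, so a lone "SOUTH" falls
# through to the street name.
lookahead_parser = (Optional(direction.setResultsName("predirectional")
                             + FollowedBy(word))
                    + street_name.setResultsName("streetname"))


def parse_street(parser, text):
    """Return (predirectional or None, streetname), or None on a parse error."""
    try:
        result = parser.parseString(text, parseAll=True)
    except ParseException:
        return None
    return result.get("predirectional"), result.get("streetname")


print(parse_street(greedy_parser, "N WEBB"))     # ('N', 'WEBB')
print(parse_street(greedy_parser, "SOUTH"))      # None: the parse fails
print(parse_street(lookahead_parser, "N WEBB"))  # ('N', 'WEBB')
print(parse_street(lookahead_parser, "SOUTH"))   # (None, 'SOUTH')
```

The lookahead keeps the grammar as a single sequence, at the cost of matching the following word twice; the "^" version avoids the extra class but parses the street name in both alternatives.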