Re: Getting pyparsing to backtrack

2010-07-07 Thread Cousin Stanley

> I'm working on street address parsing again,
> and I'm trying to deal with some of the harder cases.
  

  For yet another test case
  my actual address includes 

  ... East South Mountain Avenue


  Sometimes written as 

  ... E. South Mtn Ave


-- 
Stanley C. Kitching
Human Being
Phoenix, Arizona

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Getting pyparsing to backtrack

2010-07-06 Thread Thomas Jollans
On 07/06/2010 04:21 AM, Dennis Lee Bieber wrote:
> On Mon, 05 Jul 2010 15:19:53 -0700, John Nagle na...@animats.com
> declaimed the following in gmane.comp.python.general:
>
>> I'm working on street address parsing again, and I'm trying to deal
>> with some of the harder cases.
>
> Hasn't it been suggested before, that the sanest method to parse
> addresses is from the end backwards...
>
> So that:
>
> 123 N South St.
>
> is parsed as
>
> St. South N 123

You will of course need some trickery for that to work with

Hauptstr. 12
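
A rough sketch of the "anchor on the end first" idea, in plain Python rather than pyparsing. The function name, the SUFFIXES set, and the result keys are all made up for illustration, and the suffix list is nowhere near complete. The suffix is peeled off the end, the house number off the front, and a leading directional is only accepted if at least one word is left over for the street name.

```python
# Hypothetical, incomplete vocabularies used only for this sketch.
DIRECTIONALS = {"southeast", "northeast", "north", "northwest", "west",
                "east", "south", "southwest",
                "se", "ne", "n", "nw", "w", "e", "s", "sw"}
SUFFIXES = {"st", "street", "ave", "avenue", "rd", "road", "blvd", "dr"}


def _norm(token):
    """Lower-case a token and drop trailing punctuation for lookups."""
    return token.rstrip(".,").lower()


def parse_from_end(address):
    """Split an address line, resolving the end of the line first."""
    tokens = address.split()
    result = dict.fromkeys(
        ("number", "predirectional", "streetname", "suffix"))

    # 1. The last token decides the suffix, if it is a known one.
    if tokens and _norm(tokens[-1]) in SUFFIXES:
        result["suffix"] = tokens.pop()

    # 2. A leading all-digit token is taken as the house number.
    if tokens and tokens[0].isdigit():
        result["number"] = tokens.pop(0)

    # 3. A leading directional counts only if a street name remains after it.
    if len(tokens) > 1 and _norm(tokens[0]) in DIRECTIONALS:
        result["predirectional"] = tokens.pop(0)

    result["streetname"] = " ".join(tokens) or None
    return result


print(parse_from_end("123 N South St."))
print(parse_from_end("E. South Mtn Ave"))
print(parse_from_end("SOUTH"))
```

As written this does nothing useful with number-last formats such as "Hauptstr. 12"; the whole line ends up as the street name, so that case would need its own rule.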







Getting pyparsing to backtrack

2010-07-05 Thread John Nagle

  I'm working on street address parsing again, and I'm trying to deal
with some of the harder cases.

  Here's a subparser, intended to take in things like "N MAIN" and
"SOUTH", and break out the directional from the street name.


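from pyparsing import (CaselessKeyword, Combine, MatchFirst, OneOrMore,
                       Optional, Word, alphanums)

directionals = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest', 'SE', 'NE', 'N', 'NW',
                'W', 'E', 'S', 'SW']

# A directional keyword with an optional trailing period, e.g. "N" or "N."
direction = Combine(MatchFirst([CaselessKeyword(d) for d in directionals]) +
                    Optional(".").suppress())

# Optional predirectional, then one or more words joined into the street name
streetNameParser = (Optional(direction.setResultsName("predirectional"))
                    + Combine(OneOrMore(Word(alphanums)),
                              adjacent=False,
                              joinString=" ").setResultsName("streetname"))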



This parses something like "N WEBB" fine; "N" is the predirectional,
and "WEBB" is the street name.

"SOUTH" (which, when not followed by another word, is a street name,
not a predirectional) raises a parsing exception:

 Street address line parse failed for SOUTH : Expected W:(abcd...)
  (at  char 5), (line:1, col:6)

The problem is that "direction" matched "SOUTH", and even though
"direction" is within an Optional and is followed by a clause that
requires another word, the parser didn't back up when it hit the end
of the expression without satisfying the OneOrMore clause.
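
The behaviour described above can be reproduced with a much smaller grammar. This is a reduced sketch, not the original parser: the names "reduced" and "try_parse" are invented here, and only one directional keyword is included.

```python
from pyparsing import CaselessKeyword, Optional, ParseException, Word, alphas

# Reduced grammar: an optional directional followed by a required word.
reduced = (Optional(CaselessKeyword("south")("predirectional"))
           + Word(alphas)("streetname"))


def try_parse(text):
    """Return the named results as a dict, or None if parsing fails."""
    try:
        return reduced.parseString(text).asDict()
    except ParseException:
        return None


# Optional consumes "SOUTH"; the required Word then finds nothing left,
# and the parser does not go back and retry with the Optional skipped.
print(try_parse("SOUTH MAIN"))
print(try_parse("MAIN"))
print(try_parse("SOUTH"))
```

The first two inputs parse; the lone "SOUTH" comes back as None because the ParseException is raised once the Optional has already taken the only word.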

Pyparsing does some backup, but I'm not clear on how much,
or how to force it to happen.  There's some discussion at
http://www.mail-archive.com/python-list@python.org/msg169559.html.
Apparently the Or operator will force some backup, but it's not
clear how much lookahead and backtracking is supported.

John Nagle


Re: Getting pyparsing to backtrack

2010-07-05 Thread John Nagle

On 7/5/2010 3:19 PM, John Nagle wrote:

> I'm working on street address parsing again, and I'm trying to deal
> with some of the harder cases.


The approach below works for the cases given.  The Or operator (^) 
supports backtracking, but Optional() apparently does not.



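# A directional keyword with an optional trailing period, e.g. "N" or "N."
direction = Combine(MatchFirst([CaselessKeyword(d) for d in directionals]) +
                    Optional(".").suppress())

streetNameOnly = Combine(OneOrMore(Word(alphanums)), adjacent=False,
                         joinString=" ").setResultsName("streetname")

# Try both alternatives; Or (^) picks the one with the longest match
streetNameParser = (
    (direction.setResultsName("predirectional") + streetNameOnly)
    ^ streetNameOnly)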
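
For anyone who wants to run it, here is a self-contained sketch of the Or-based approach with imports and a few test inputs. The helper "parse_street" is added only for illustration. As far as I understand it, Or tries every alternative and keeps the longest match, and on a tie the alternative listed first appears to win, which is why "N WEBB" still yields a predirectional.

```python
from pyparsing import (CaselessKeyword, Combine, MatchFirst, OneOrMore,
                       Optional, Word, alphanums)

directionals = ['southeast', 'northeast', 'north', 'northwest',
                'west', 'east', 'south', 'southwest',
                'SE', 'NE', 'N', 'NW', 'W', 'E', 'S', 'SW']

# A directional keyword with an optional trailing period.
direction = Combine(MatchFirst([CaselessKeyword(d) for d in directionals])
                    + Optional(".").suppress())

# One or more words, joined with single spaces into one street name.
streetNameOnly = Combine(OneOrMore(Word(alphanums)), adjacent=False,
                         joinString=" ").setResultsName("streetname")

# Either "directional + name" or "name" alone; Or evaluates both.
streetNameParser = (
    (direction.setResultsName("predirectional") + streetNameOnly)
    ^ streetNameOnly)


def parse_street(text):
    """Hypothetical helper: return the named results as a plain dict."""
    return streetNameParser.parseString(text).asDict()


for line in ("N WEBB", "SOUTH", "E. South Mtn Ave"):
    print(line, "->", parse_street(line))
```

Note that CaselessKeyword reports the spelling it was defined with, so an input of "south main" or "SOUTH MAIN" both give "south" as the predirectional.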



John Nagle