Re: PyParsing and Headaches

2006-11-23 Thread Bytter
Heya there,

Ok, found the solution. I just needed to use leaveWhitespace() in the
places where I want pyparsing to take the spaces into consideration.
Thanks for the help.

Cheers!

Hugo Ferreira
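
A minimal sketch of that fix, assuming a current pyparsing release in which the method is spelled leaveWhitespace(). The helper name accepts is hypothetical and exists only for this illustration; the grammar names come from the original question.

```python
from pyparsing import Optional, ParseException, Word, alphas, oneOf

include_bool = Optional(oneOf("+ -"))
# leaveWhitespace() tells this element not to skip whitespace before matching.
literal = Word(alphas).leaveWhitespace()
term = include_bool + literal


def accepts(text):
    """Return True if `term` parses `text` (hypothetical helper for this sketch)."""
    try:
        term.parseString(text)
        return True
    except ParseException:
        return False


tokens = term.parseString("+a").asList()
print(tokens)          # the sign and the literal stay separate tokens
print(accepts("+ a"))  # the space after the sign is no longer skipped
```

Unlike Combine(), this keeps include_bool and literal as two tokens while still rejecting a space between them.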

On Nov 23, 11:57 am, Bytter [EMAIL PROTECTED] wrote:
> (This message has already been sent to the mailing list, but I'm not
> sure it is arriving properly since it doesn't show up on Usenet,
> so I'm posting it through here now.)
>
> Chris,
>
> Thanks for your quick answer. That changes a lot of stuff, and now I'm
> able to do my parsing as I intended to.
>
> Still, there's a remaining problem. By using Combine(), everything is
> interpreted as a single token. What I need is for 'include_bool' and
> 'literal' to be parsed as separate tokens, but without a space in
> between...
>
> Paul,
>
> Thanks for your detailed explanation. One of the things I think is
> missing from the documentation (or that I couldn't find easily) is the
> kind of explanation you give about 'The Way of PyParsing'. For example,
> it took me a while to understand that I could easily implement simple
> recursions using OneOrMore(Group()). Or maybe things were out there and
> I didn't search enough...
>
> Still, fwiw, congratulations on the library. PyParsing allowed me to
> do in just a couple of hours, including learning about its API (minus
> this little inconvenience), what would have taken me a couple of days
> with, for example, ANTLR (in fact, I've already put aside ANTLR more
> than once in the past for a built-from-scratch parser).
>
> Cheers,
>
> Hugo Ferreira

-- 
http://mail.python.org/mailman/listinfo/python-list


PyParsing and Headaches

2006-11-22 Thread Hugo Ferreira

Hi,

I'm trying to construct a parser, but I'm stuck with some basic stuff... For
example, I want to match the following:

letter = "A"..."Z" | "a"..."z"
literal = letter+
include_bool := "+" | "-"
term = [include_bool] literal

So I defined this as:

literal = Word(alphas)
include_bool = Optional(oneOf("+ -"))
term = include_bool + literal

The problem is that:

term.parseString("+a") -> (['+', 'a'], {}) # OK
term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't recognize any
token since I didn't say the SPACE was allowed between include_bool and
literal.

Can anyone give me a hand here?

Cheers!

Hugo Ferreira

BTW, the following is the complete grammar I'm trying to implement with
pyparsing:

## L ::= expr | expr L
## expr ::= term | binary_expr
## binary_expr ::= term " " binary_op " " term
## binary_op ::= "*" | "OR" | "AND"
## include_bool ::= "+" | "-"
## term ::= ([include_bool] [modifier ":"] (literal | range)) | ("~" literal)
## modifier ::= (letter | "_")+
## literal ::= word | quoted_words
## quoted_words ::= '"' word (" " word)* '"'
## word ::= (letter | digit | "_")+
## number ::= digit+
## range ::= number (".." | "...") number
## letter ::= "A"..."Z" | "a"..."z"
## digit ::= "0"..."9"

And this is what I've got so far:

word = Word(nums + alphas + "_")
binary_op = oneOf("* and or", caseless=True).setResultsName("operator")
include_bool = oneOf("+ -")
literal = (word | quotedString).setResultsName("literal")
modifier = Word(alphas + "_")
rng = Word(nums) + (Literal("..") | Literal("...")) + Word(nums)
term = ((Optional(include_bool) + Optional(modifier + ":") + (literal |
rng)) | ("~" + literal)).setResultsName("Term")
binary_expr = (term + binary_op + term).setResultsName("binary")
expr = (binary_expr | term).setResultsName("Expr")
L = OneOrMore(expr)


--
GPG Fingerprint: B0D7 1249 447D F5BB 22C5  5B9B 078C 2615 504B 7B85


Re: PyParsing and Headaches

2006-11-22 Thread Chris Lambacher
On Wed, Nov 22, 2006 at 11:17:52AM -0800, Bytter wrote:
> Hi,
>
> I'm trying to construct a parser, but I'm stuck with some basic
> stuff... For example, I want to match the following:
>
> letter = "A"..."Z" | "a"..."z"
> literal = letter+
> include_bool := "+" | "-"
> term = [include_bool] literal
>
> So I defined this as:
>
> literal = Word(alphas)
> include_bool = Optional(oneOf("+ -"))
> term = include_bool + literal

The "+" operator here means that whitespace is allowed between the two
expressions.  You need to override this explicitly.
Try:

term = Combine(include_bool + literal)
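
A short sketch of what this suggestion does in practice, assuming a current pyparsing release; the variable names combined and space_accepted are introduced only for this illustration.

```python
from pyparsing import Combine, Optional, ParseException, Word, alphas, oneOf

literal = Word(alphas)
include_bool = Optional(oneOf("+ -"))
# Combine requires its parts to be adjacent and joins them into one token.
term = Combine(include_bool + literal)

combined = term.parseString("+a").asList()
print(combined)  # the sign and the word come back as a single string

try:
    term.parseString("+ a")
    space_accepted = True
except ParseException:
    space_accepted = False
print(space_accepted)  # the space between sign and word is rejected
```

Note the side effect: the sign is no longer a token of its own, which matters if later code needs to inspect it separately.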

 
 The problem is that:
 
 term.parseString(+a) - (['+', 'a'], {}) # OK
 term.parseString(+ a) - (['+', 'a'], {}) # KO. It shouldn't
 recognize any token since I didn't said the SPACE was allowed between
 include_bool and literal.
 
 Can anyone give me an hand here?
 
 Cheers!
 
 Hugo Ferreira
 
 BTW, the following is the complete grammar I'm trying to implement with
 pyparsing:
 
 ## L ::= expr | expr L
 ## expr ::= term | binary_expr
 ## binary_expr ::= term   binary_op   term
 ## binary_op ::= * | OR | AND
 ## include_bool ::= + | -
 ## term ::= ([include_bool] [modifier :] (literal | range)) | (~
 literal)
 ## modifier ::= (letter | _)+
 ## literal ::= word | quoted_words
 ## quoted_words ::= '' word (  word)* ''
 ## word ::= (letter | digit | _)+
 ## number ::= digit+
 ## range ::= number (.. | ...) number
 ## letter ::= A...Z | a...z
 ## digit ::= 0...9
 
 And this is where I got so far:
 
 word = Word(nums + alphas + _)
 binary_op = oneOf(* and or, caseless=True).setResultsName(operator)
 include_bool = oneOf(+ -)
 literal = (word | quotedString).setResultsName(literal)
 modifier = Word(alphas + _)
 rng = Word(nums) + (Literal(..) | Literal(...)) + Word(nums)
 term = ((Optional(include_bool) + Optional(modifier + :) + (literal |
 rng)) | (~ + literal)).setResultsName(Term)
 binary_expr = (term + binary_op + term).setResultsName(binary)
 expr = (binary_expr | term).setResultsName(Expr)
 L = OneOrMore(expr)
 
 
 -- 
 GPG Fingerprint: B0D7 1249 447D F5BB 22C5  5B9B 078C 2615 504B 7B85
 
 -- 
 http://mail.python.org/mailman/listinfo/python-list
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PyParsing and Headaches

2006-11-22 Thread Paul McGuire
Bytter [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
> Hi,
>
> I'm trying to construct a parser, but I'm stuck with some basic
> stuff... For example, I want to match the following:
>
> letter = "A"..."Z" | "a"..."z"
> literal = letter+
> include_bool := "+" | "-"
> term = [include_bool] literal
>
> So I defined this as:
>
> literal = Word(alphas)
> include_bool = Optional(oneOf("+ -"))
> term = include_bool + literal
>
> The problem is that:
>
> term.parseString("+a") -> (['+', 'a'], {}) # OK
> term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't
> recognize any token since I didn't say the SPACE was allowed between
> include_bool and literal.


As Chris pointed out in his post, the most direct way to fix this is to use
Combine.  Note that Combine does two things: it requires the expressions to
be adjacent, and it combines the results into a single token.  For instance,
when defining the expression for a real number, something like:

realnum = Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums)

Pyparsing would parse "3.14159" into the separate tokens ['', '3', '.',
'14159'].  For this grammar, pyparsing would also accept "2. 23" as ['',
'2', '.', '23'], even though there is a space between the decimal point and
"23".  But by wrapping it inside Combine, as in:

realnum = Combine(Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums))

we accomplish two things: pyparsing only matches if all the elements are
adjacent, with no whitespace or comments; and the matched token is returned
as ['3.14159'].  (Yes, I left off scientific notation, but it is an
extension of the same issue.)

Pyparsing in general does implicit whitespace skipping; it is part of the
zen of pyparsing, and distinguishes it from conventional regexps (although I
think there is a new '?' switch for re's that puts '\s*'s between re terms
for you).  This is to simplify the grammar definition, so that it doesn't
need to be littered with "optional whitespace or comments could go here"
expressions; instead, whitespace and comments (or "ignorables" in pyparsing
terminology) are parsed over before every grammar expression.  I instituted
this out of recoil from a previous project, in which a co-developer
implemented a boolean parser by first tokenizing by whitespace, then parsing
out the tokens.  Unfortunately, this meant that "color=='blue' &&
size=='medium'" would not parse successfully, instead requiring "color ==
'blue' && size == 'medium'".  It doesn't seem like much, but our support
guys got many calls asking why the boolean clauses weren't matching.  I
decided that when I wrote a parser, "y=m*x+b" would be just as parseable as
"y = m * x + b".  For that matter, you'd be surprised where whitespace and
comments sneak into people's source code: spaces after left parentheses and
comments after semicolons, for example, are easily forgotten when spec'ing
out the syntax for a C "for" statement; whitespace inside HTML tags is
another unanticipated surprise.
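
The difference between splitting on whitespace and pyparsing's implicit skipping can be sketched as follows, assuming a current pyparsing release. The grammar is a deliberately tiny stand-in for the y=m*x+b example, not a general expression parser, and the names name and line are introduced only here.

```python
from pyparsing import Word, alphas

# Tokenizing by whitespace first gives a single unusable chunk.
print("y=m*x+b".split())

# A pyparsing grammar skips whitespace before each element instead.
name = Word(alphas)
line = name + "=" + name + "*" + name + "+" + name

compact = line.parseString("y=m*x+b").asList()
spaced = line.parseString("y = m * x + b").asList()
print(compact)
print(spaced)
print(compact == spaced)  # both spellings yield the same tokens
```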

So looking at your grammar, you say you don't want to have this be a
successful parse:
term.parseString("+ a") -> (['+', 'a'], {})

because, "It shouldn't recognize any token since I didn't say the SPACE was
allowed between include_bool and literal."  In fact, pyparsing allows spaces
by default; that's why the given parse succeeds.  I would turn this question
around, and ask you in terms of your grammar - what SHOULD be allowed
between include_bool and literal?  If spaces are not a problem, then your
grammar as-is is sufficient.  If spaces are absolutely verboten, then there
are 2 or 3 different techniques in pyparsing to disable the
whitespace-skipping behavior, depending on whether you want all whitespace
skipping disabled, just for literals of a certain type, or just for literals
when following a leading include_bool sign.
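
One possible reading of the narrowest option (whitespace forbidden only between a sign and its literal) is sketched below. It assumes a current pyparsing release and is not necessarily the technique Paul had in mind; the names sign, tight_literal, tagged and accepts are hypothetical.

```python
from pyparsing import ParseException, Word, alphas, oneOf

sign = oneOf("+ -")
literal = Word(alphas)  # skips leading whitespace as usual
# A copy that must start exactly at the current position.
tight_literal = literal.copy().leaveWhitespace()

# Whitespace is forbidden only between a sign and its literal.
term = (sign + tight_literal) | literal
# Elsewhere the ordinary literal is used, so spaces remain legal.
tagged = Word(alphas) + ":" + literal


def accepts(expr, text):
    """Return True if `expr` matches all of `text` (hypothetical helper)."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


print(accepts(term, "+a"))        # sign directly followed by literal
print(accepts(term, "+ a"))       # space after the sign
print(accepts(term, "a"))         # unsigned literal
print(accepts(tagged, "tag: a"))  # unaffected by the tight copy
```

Using copy() keeps the original literal untouched, so only the signed branch becomes whitespace-sensitive.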

Thanks for giving pyparsing a try; if you want further help, you can post 
here, or on the pyparsing wiki - the discussion threads on the Home page are 
a pretty good support and message log.

-- Paul




Re: PyParsing and Headaches

2006-11-22 Thread Hugo Ferreira

Chris,

Thanks for your quick answer. That changes a lot of stuff, and now I'm able
to do my parsing as I intended to.

Paul,

Thanks for your detailed explanation. One of the things I think is missing
from the documentation (or that I couldn't find easily) is the kind of
explanation you give about 'The Way of PyParsing'. For example, it took me a
while to understand that I could easily implement simple recursions using
OneOrMore(Group()). Or maybe things were out there and I didn't search
enough...
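
For readers wondering what that pattern looks like, here is a small sketch using pyparsing's OneOrMore and Group classes, assuming a current release. It replaces a right-recursive rule such as L ::= expr | expr L with simple repetition; the reduced term grammar and the names terms and parsed are illustrative only.

```python
from pyparsing import Group, OneOrMore, Optional, Word, alphas, oneOf

# Each term is grouped so its tokens stay together in the results.
term = Group(Optional(oneOf("+ -")) + Word(alphas))
# Repetition stands in for the recursive rule "L ::= expr | expr L".
terms = OneOrMore(term)

parsed = terms.parseString("+a -b c").asList()
print(parsed)  # one nested list per term
```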

Still, fwiw, congratulations on the library. PyParsing allowed me to do in
just a couple of hours, including learning about its API (minus this little
inconvenience), what would have taken me a couple of days with, for example,
ANTLR (in fact, I've already put aside ANTLR more than once in the past for
a built-from-scratch parser).

Cheers,

Hugo Ferreira
