Re: PyParsing and Headaches

2006-11-23 Thread Bytter
Heya there,

OK, I found the solution. I just needed to call leaveWhitespace() on the
expressions where I want pyparsing to take the spaces into account.
Thanks for the help.
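
For readers following along, here is a minimal sketch of that fix applied to
the small grammar from the original question. The helper name parses() is
made up for illustration, and the thread does not show exactly which
expressions leaveWhitespace() was called on, so applying it to `literal` is
an assumption. Newer pyparsing releases also offer snake_case spellings
(leave_whitespace, parse_string, one_of) of the same methods.

```python
from pyparsing import Optional, ParseException, Word, alphas, oneOf

include_bool = Optional(oneOf("+ -"))
# leaveWhitespace() stops pyparsing from skipping whitespace before
# `literal`, so the word must start exactly where the sign (if any) ended.
literal = Word(alphas).leaveWhitespace()
term = include_bool + literal


def parses(text):
    """Return the token list, or None if `text` does not match `term`."""
    try:
        return term.parseString(text).asList()
    except ParseException:
        return None


print(parses("+a"))   # ['+', 'a']  sign and word stay separate tokens
print(parses("a"))    # ['a']       the sign is optional
print(parses("+ a"))  # None        a space after the sign is rejected
```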

Cheers!

Hugo Ferreira

On Nov 23, 11:57 am, "Bytter" <[EMAIL PROTECTED]> wrote:
> (This message has already been sent to the mailing list, but I'm not
> sure it is arriving properly since it doesn't show up on Usenet,
> so I'm posting it through here now.)
>
> Chris,
>
> Thanks for your quick answer. That changes a lot, and now I'm
> able to do my parsing as I intended.
>
> Still, one problem remains. With Combine(), everything is
> returned as a single token. What I need is for 'include_bool'
> and 'literal' to be parsed as separate tokens, but with no
> space allowed between them...
>
> Paul,
>
> Thanks for your detailed explanation. One of the things I think is
> missing from the documentation (or that I couldn't find easily) is the
> kind of explanation you give about 'The Way of PyParsing'. For example,
> it took me a while to understand that I could easily implement simple
> recursions using OneOrMore(Group()). Or maybe it was all out there and
> I didn't search enough...
>
> Still, fwiw, congratulations on the library. PyParsing allowed me to
> do in just a couple of hours, including learning its API (minus
> this little inconvenience), what would have taken me a couple of days
> with, for example, ANTLR (in fact, I've already put ANTLR aside more
> than once in the past in favor of a built-from-scratch parser).
>
> Cheers,
>
> Hugo Ferreira
>
> On Nov 22, 7:50 pm, Chris Lambacher <[EMAIL PROTECTED]> wrote:
>
> > On Wed, Nov 22, 2006 at 11:17:52AM -0800, Bytter wrote:
> > > Hi,
>
> > > I'm trying to construct a parser, but I'm stuck with some basic
> > > stuff... For example, I want to match the following:
>
> > > letter = "A"..."Z" | "a"..."z"
> > > literal = letter+
> > > include_bool := "+" | "-"
> > > term = [include_bool] literal
>
> > > So I defined this as:
>
> > > literal = Word(alphas)
> > > include_bool = Optional(oneOf("+ -"))
> > > term = include_bool + literal
> > + here means that you allow a space.  You need to explicitly override this.
> > Try:
>
> > term = Combine(include_bool + literal)
>
> > > The problem is that:
>
> > > term.parseString("+a") -> (['+', 'a'], {}) # OK
> > > term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't
> > > recognize any token since I didn't said the SPACE was allowed between
> > > include_bool and literal.
>
> > > Can anyone give me an hand here?
>
> > > Cheers!
>
> > > Hugo Ferreira
>
> > > BTW, the following is the complete grammar I'm trying to implement with
> > > pyparsing:
>
> > > ## L ::= expr | expr L
> > > ## expr ::= term | binary_expr
> > > ## binary_expr ::= term " " binary_op " " term
> > > ## binary_op ::= "*" | "OR" | "AND"
> > > ## include_bool ::= "+" | "-"
> > > ## term ::= ([include_bool] [modifier ":"] (literal | range)) | ("~"
> > > literal)
> > > ## modifier ::= (letter | "_")+
> > > ## literal ::= word | quoted_words
> > > ## quoted_words ::= '"' word (" " word)* '"'
> > > ## word ::= (letter | digit | "_")+
> > > ## number ::= digit+
> > > ## range ::= number (".." | "...") number
> > > ## letter ::= "A"..."Z" | "a"..."z"
> > > ## digit ::= "0"..."9"
>
> > > And this is where I got so far:
>
> > > word = Word(nums + alphas + "_")
> > > binary_op = oneOf("* and or", caseless=True).setResultsName("operator")
> > > include_bool = oneOf("+ -")
> > > literal = (word | quotedString).setResultsName("literal")
> > > modifier = Word(alphas + "_")
> > > rng = Word(nums) + (Literal("...") | Literal("..")) + Word(nums)
> > > term = ((Optional(include_bool) + Optional(modifier + ":") + (literal |
> > > rng)) | ("~" + literal)).setResultsName("Term")
> > > binary_expr = (term + binary_op + term).setResultsName("binary")
> > > expr = (binary_expr | term).setResultsName("Expr")
> > > L = OneOrMore(expr)
>
> > > --
> > > GPG Fingerprint: B0D7 1249 447D F5BB 22C5  5B9B 078C 2615 504B 7B85
> 
> > > --
> > >http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PyParsing and Headaches

2006-11-23 Thread Bytter
(This message has already been sent to the mailing list, but I'm not
sure it is arriving properly since it doesn't show up on Usenet,
so I'm posting it through here now.)

Chris,

Thanks for your quick answer. That changes a lot, and now I'm
able to do my parsing as I intended.

Still, one problem remains. With Combine(), everything is
returned as a single token. What I need is for 'include_bool'
and 'literal' to be parsed as separate tokens, but with no
space allowed between them...

Paul,

Thanks for your detailed explanation. One of the things I think is
missing from the documentation (or that I couldn't find easily) is the
kind of explanation you give about 'The Way of PyParsing'. For example,
it took me a while to understand that I could easily implement simple
recursions using OneOrMore(Group()). Or maybe it was all out there and
I didn't search enough...
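
As a rough illustration of that last point, a right-recursive rule such as
L ::= term | term L can be written as plain repetition. The sketch below
uses the simplified `term` from the start of the thread rather than the full
grammar, and the sample input is made up.

```python
from pyparsing import Group, OneOrMore, Optional, Word, alphas, oneOf

# Group() keeps each term's tokens together in their own sub-list.
term = Group(Optional(oneOf("+ -")) + Word(alphas))

# L ::= term | term L   expressed as "one or more terms".
L = OneOrMore(term)

result = L.parseString("+foo -bar baz").asList()
print(result)  # [['+', 'foo'], ['-', 'bar'], ['baz']]
```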

Still, fwiw, congratulations on the library. PyParsing allowed me to
do in just a couple of hours, including learning its API (minus
this little inconvenience), what would have taken me a couple of days
with, for example, ANTLR (in fact, I've already put ANTLR aside more
than once in the past in favor of a built-from-scratch parser).

Cheers,

Hugo Ferreira

On Nov 22, 7:50 pm, Chris Lambacher <[EMAIL PROTECTED]> wrote:
> On Wed, Nov 22, 2006 at 11:17:52AM -0800, Bytter wrote:
> > Hi,
>
> > I'm trying to construct a parser, but I'm stuck with some basic
> > stuff... For example, I want to match the following:
>
> > letter = "A"..."Z" | "a"..."z"
> > literal = letter+
> > include_bool := "+" | "-"
> > term = [include_bool] literal
>
> > So I defined this as:
>
> > literal = Word(alphas)
> > include_bool = Optional(oneOf("+ -"))
> > term = include_bool + literal
> + here means that you allow a space.  You need to explicitly override this.
> Try:
>
> term = Combine(include_bool + literal)
>
>
>
> > The problem is that:
>
> > term.parseString("+a") -> (['+', 'a'], {}) # OK
> > term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't
> > recognize any token since I didn't said the SPACE was allowed between
> > include_bool and literal.
>
> > Can anyone give me an hand here?
>
> > Cheers!
>
> > Hugo Ferreira
>
> > BTW, the following is the complete grammar I'm trying to implement with
> > pyparsing:
>
> > ## L ::= expr | expr L
> > ## expr ::= term | binary_expr
> > ## binary_expr ::= term " " binary_op " " term
> > ## binary_op ::= "*" | "OR" | "AND"
> > ## include_bool ::= "+" | "-"
> > ## term ::= ([include_bool] [modifier ":"] (literal | range)) | ("~"
> > literal)
> > ## modifier ::= (letter | "_")+
> > ## literal ::= word | quoted_words
> > ## quoted_words ::= '"' word (" " word)* '"'
> > ## word ::= (letter | digit | "_")+
> > ## number ::= digit+
> > ## range ::= number (".." | "...") number
> > ## letter ::= "A"..."Z" | "a"..."z"
> > ## digit ::= "0"..."9"
>
> > And this is where I got so far:
>
> > word = Word(nums + alphas + "_")
> > binary_op = oneOf("* and or", caseless=True).setResultsName("operator")
> > include_bool = oneOf("+ -")
> > literal = (word | quotedString).setResultsName("literal")
> > modifier = Word(alphas + "_")
> > rng = Word(nums) + (Literal("...") | Literal("..")) + Word(nums)
> > term = ((Optional(include_bool) + Optional(modifier + ":") + (literal |
> > rng)) | ("~" + literal)).setResultsName("Term")
> > binary_expr = (term + binary_op + term).setResultsName("binary")
> > expr = (binary_expr | term).setResultsName("Expr")
> > L = OneOrMore(expr)
>
> > --
> > GPG Fingerprint: B0D7 1249 447D F5BB 22C5  5B9B 078C 2615 504B 7B85
> 
> > --
> >http://mail.python.org/mailman/listinfo/python-list



Re: PyParsing and Headaches

2006-11-22 Thread Hugo Ferreira

Chris,

Thanks for your quick answer. That changes a lot, and now I'm able
to do my parsing as I intended.

Paul,

Thanks for your detailed explanation. One of the things I think is missing
from the documentation (or that I couldn't find easily) is the kind of
explanation you give about 'The Way of PyParsing'. For example, it took me a
while to understand that I could easily implement simple recursions using
OneOrMore(Group()). Or maybe it was all out there and I didn't search
enough...

Still, fwiw, congratulations on the library. PyParsing allowed me to do in
just a couple of hours, including learning its API (minus this little
inconvenience), what would have taken me a couple of days with, for example,
ANTLR (in fact, I've already put ANTLR aside more than once in the past in
favor of a built-from-scratch parser).

Cheers,

Hugo Ferreira

On 11/22/06, Paul McGuire <[EMAIL PROTECTED]> wrote:


"Bytter" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Hi,
>
> I'm trying to construct a parser, but I'm stuck with some basic
> stuff... For example, I want to match the following:
>
> letter = "A"..."Z" | "a"..."z"
> literal = letter+
> include_bool := "+" | "-"
> term = [include_bool] literal
>
> So I defined this as:
>
> literal = Word(alphas)
> include_bool = Optional(oneOf("+ -"))
> term = include_bool + literal
>
> The problem is that:
>
> term.parseString("+a") -> (['+', 'a'], {}) # OK
> term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't
> recognize any token since I didn't said the SPACE was allowed between
> include_bool and literal.
>

As Chris pointed out in his post, the most direct way to fix this is to use
Combine.  Note that Combine does two things: it requires the expressions to
be adjacent, and it combines the results into a single token.  For instance,
when defining the expression for a real number, something like:

realnum = Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums)

Pyparsing would parse "3.14159" into the separate tokens ['', '3', '.',
'14159'].  For this grammar, pyparsing would also accept "2. 23" as ['',
'2', '.', '23'], even though there is a space between the decimal point and
"23".  But by wrapping it inside Combine, as in:

realnum = Combine(Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums))

we accomplish two things: pyparsing only matches if all the elements are
adjacent, with no whitespace or comments; and the matched token is returned
as ['3.14159'].  (Yes, I left off scientific notation, but it is an
extension of the same issue.)

Pyparsing in general does implicit whitespace skipping; it is part of the
zen of pyparsing, and distinguishes it from conventional regexps (although I
think there is a new '?' switch for re's that puts '\s*'s between re terms
for you).  This is to simplify the grammar definition, so that it doesn't
need to be littered with "optional whitespace or comments could go here"
expressions; instead, whitespace and comments (or "ignorables" in pyparsing
terminology) are parsed over before every grammar expression.  I instituted
this out of recoil from a previous project, in which a co-developer
implemented a boolean parser by first tokenizing by whitespace, then parsing
out the tokens.  Unfortunately, this meant that "color=='blue' &&
size=='medium'" would not parse successfully, instead requiring "color ==
'blue' && size == 'medium'".  It doesn't seem like much, but our support
guys got many calls asking why the boolean clauses weren't matching.  I
decided that when I wrote a parser, "y=m*x+b" would be just as parseable as
"y = m * x + b".  For that matter, you'd be surprised where whitespace and
comments sneak in to people's source code: spaces after left parentheses and
comments after semicolons, for example, are easily forgotten when spec'ing
out the syntax for a C "for" statement; whitespace inside HTML tags is
another unanticipated surprise.

So looking at your grammar, you say you don't want to have this be a
successful parse:
term.parseString("+ a") -> (['+', 'a'], {})

because, "It shouldn't recognize any token since I didn't said the SPACE was
allowed between include_bool and literal."  In fact, pyparsing allows spaces
by default, that's why the given parse succeeds.  I would turn this question
around, and ask you in terms of your grammar - what SHOULD be allowed
between include_bool and literal?  If spaces are not a problem, then your
grammar as-is is sufficient.  If spaces are absolutely verboten, then there
are 2 or 3 different techniques in pyparsing to disable the
whitespace-skipping behavior, depending on whether you want all whitespace
skipping disabled, just for literals of a certain type, or just for literals
when following a leading include_bool sign.

Thanks for giving pyparsing a try; if you want further help, you can post
here, or on the pyparsing wiki - the discussion threads on the Home page are
a pretty good support and message log.

-- Paul



Re: PyParsing and Headaches

2006-11-22 Thread Paul McGuire
"Bytter" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Hi,
>
> I'm trying to construct a parser, but I'm stuck with some basic
> stuff... For example, I want to match the following:
>
> letter = "A"..."Z" | "a"..."z"
> literal = letter+
> include_bool := "+" | "-"
> term = [include_bool] literal
>
> So I defined this as:
>
> literal = Word(alphas)
> include_bool = Optional(oneOf("+ -"))
> term = include_bool + literal
>
> The problem is that:
>
> term.parseString("+a") -> (['+', 'a'], {}) # OK
> term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't
> recognize any token since I didn't said the SPACE was allowed between
> include_bool and literal.
>

As Chris pointed out in his post, the most direct way to fix this is to use 
Combine.  Note that Combine does two things: it requires the expressions to 
be adjacent, and it combines the results into a single token.  For instance, 
when defining the expression for a real number, something like:

realnum = Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums)

Pyparsing would parse "3.14159" into the separate tokens ['', '3', '.', 
'14159'].  For this grammar, pyparsing would also accept "2. 23" as ['', 
'2', '.', '23'], even though there is a space between the decimal point and 
"23".  But by wrapping it inside Combine, as in:

realnum = Combine(Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums))

we accomplish two things: pyparsing only matches if all the elements are 
adjacent, with no whitespace or comments; and the matched token is returned 
as ['3.14159'].  (Yes, I left off scientific notation, but it is an 
extension of the same issue.)
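
The two realnum definitions can be compared side by side with a short
sketch like the one below. The helper names number_parts() and try_parse()
are invented for illustration. One caveat: in the pyparsing releases this
sketch was checked against, an Optional that does not match contributes no
token at all, so the un-Combined lists come out without a leading empty
string; older releases may have behaved as described above.

```python
from pyparsing import Combine, Optional, ParseException, Word, nums, oneOf


def number_parts():
    """Build a fresh sign/digits/point/digits sequence."""
    return Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums)


def try_parse(expr, text):
    """Return the token list, or None if the text is rejected."""
    try:
        return expr.parseString(text).asList()
    except ParseException:
        return None


loose = number_parts()            # whitespace skipped between the parts
strict = Combine(number_parts())  # parts must be adjacent, then are joined

print(try_parse(loose, "3.14159"))   # ['3', '.', '14159']
print(try_parse(loose, "2. 23"))     # ['2', '.', '23']
print(try_parse(strict, "3.14159"))  # ['3.14159']
print(try_parse(strict, "2. 23"))    # None
```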

Pyparsing in general does implicit whitespace skipping; it is part of the 
zen of pyparsing, and distinguishes it from conventional regexps (although I 
think there is a new '?' switch for re's that puts '\s*'s between re terms 
for you).  This is to simplify the grammar definition, so that it doesn't 
need to be littered with "optional whitespace or comments could go here" 
expressions; instead, whitespace and comments (or "ignorables" in pyparsing 
terminology) are parsed over before every grammar expression.  I instituted 
this out of recoil from a previous project, in which a co-developer 
implemented a boolean parser by first tokenizing by whitespace, then parsing 
out the tokens.  Unfortunately, this meant that "color=='blue' && 
size=='medium'" would not parse successfully, instead requiring "color == 
'blue' && size == 'medium'".  It doesn't seem like much, but our support 
guys got many calls asking why the boolean clauses weren't matching.  I 
decided that when I wrote a parser, "y=m*x+b" would be just as parseable as 
"y = m * x + b".  For that matter, you'd be surprised where whitespace and 
comments sneak in to people's source code: spaces after left parentheses and 
comments after semicolons, for example, are easily forgotten when spec'ing 
out the syntax for a C "for" statement; whitespace inside HTML tags is 
another unanticipated surprise.

So looking at your grammar, you say you don't want to have this be a 
successful parse:
term.parseString("+ a") -> (['+', 'a'], {})

because, "It shouldn't recognize any token since I didn't said the SPACE was 
allowed between include_bool and literal."  In fact, pyparsing allows spaces 
by default, that's why the given parse succeeds.  I would turn this question 
around, and ask you in terms of your grammar - what SHOULD be allowed 
between include_bool and literal?  If spaces are not a problem, then your 
grammar as-is is sufficient.  If spaces are absolutely verboten, then there 
are 2 or 3 different techniques in pyparsing to disable the 
whitespace-skipping behavior, depending on whether you want all whitespace 
skipping disabled, just for literals of a certain type, or just for literals 
when following a leading include_bool sign.
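
The post does not spell those techniques out, so the following is only a
guess at the narrowest one: requiring adjacency just for a literal that
follows a sign. It copies `literal` and calls leaveWhitespace() on the copy
only; tokens_or_none() and `signed` are names invented for this sketch.
Calling leaveWhitespace() on `literal` itself would cover every use of it,
and ParserElement.setDefaultWhitespaceChars("") is presumably the global
switch, though that is an assumption about what was meant.

```python
from pyparsing import ParseException, Word, alphas, oneOf


def tokens_or_none(expr, text):
    """Return the token list, or None if the text is rejected."""
    try:
        return expr.parseString(text).asList()
    except ParseException:
        return None


include_bool = oneOf("+ -")
literal = Word(alphas)

# Only the copy used after a sign refuses leading whitespace; the plain
# `literal` keeps pyparsing's default whitespace skipping.
signed = include_bool + literal.copy().leaveWhitespace()
term = signed | literal

print(tokens_or_none(term, "+a"))   # ['+', 'a']
print(tokens_or_none(term, "+ a"))  # None
print(tokens_or_none(term, "  a"))  # ['a']
```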

Thanks for giving pyparsing a try; if you want further help, you can post 
here, or on the pyparsing wiki - the discussion threads on the Home page are 
a pretty good support and message log.

-- Paul




Re: PyParsing and Headaches

2006-11-22 Thread Chris Lambacher
On Wed, Nov 22, 2006 at 11:17:52AM -0800, Bytter wrote:
> Hi,
> 
> I'm trying to construct a parser, but I'm stuck with some basic
> stuff... For example, I want to match the following:
> 
> letter = "A"..."Z" | "a"..."z"
> literal = letter+
> include_bool := "+" | "-"
> term = [include_bool] literal
> 
> So I defined this as:
> 
> literal = Word(alphas)
> include_bool = Optional(oneOf("+ -"))
> term = include_bool + literal
+ here means that you allow a space.  You need to explicitly override this.
Try:

term = Combine(include_bool + literal)
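
A quick way to see what this suggestion does is the sketch below, which
reuses the definitions from the question. Note that besides rejecting the
space, Combine also merges the sign and the word into one token.

```python
from pyparsing import Combine, Optional, ParseException, Word, alphas, oneOf

literal = Word(alphas)
include_bool = Optional(oneOf("+ -"))
term = Combine(include_bool + literal)

combined = term.parseString("+a").asList()
print(combined)  # ['+a']  one merged token, not ['+', 'a']

try:
    term.parseString("+ a")
    space_rejected = False
except ParseException:
    space_rejected = True
print(space_rejected)  # True
```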

> 
> The problem is that:
> 
> term.parseString("+a") -> (['+', 'a'], {}) # OK
> term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't
> recognize any token since I didn't said the SPACE was allowed between
> include_bool and literal.
> 
> Can anyone give me an hand here?
> 
> Cheers!
> 
> Hugo Ferreira
> 
> BTW, the following is the complete grammar I'm trying to implement with
> pyparsing:
> 
> ## L ::= expr | expr L
> ## expr ::= term | binary_expr
> ## binary_expr ::= term " " binary_op " " term
> ## binary_op ::= "*" | "OR" | "AND"
> ## include_bool ::= "+" | "-"
> ## term ::= ([include_bool] [modifier ":"] (literal | range)) | ("~"
> literal)
> ## modifier ::= (letter | "_")+
> ## literal ::= word | quoted_words
> ## quoted_words ::= '"' word (" " word)* '"'
> ## word ::= (letter | digit | "_")+
> ## number ::= digit+
> ## range ::= number (".." | "...") number
> ## letter ::= "A"..."Z" | "a"..."z"
> ## digit ::= "0"..."9"
> 
> And this is where I got so far:
> 
> word = Word(nums + alphas + "_")
> binary_op = oneOf("* and or", caseless=True).setResultsName("operator")
> include_bool = oneOf("+ -")
> literal = (word | quotedString).setResultsName("literal")
> modifier = Word(alphas + "_")
> rng = Word(nums) + (Literal("...") | Literal("..")) + Word(nums)
> term = ((Optional(include_bool) + Optional(modifier + ":") + (literal |
> rng)) | ("~" + literal)).setResultsName("Term")
> binary_expr = (term + binary_op + term).setResultsName("binary")
> expr = (binary_expr | term).setResultsName("Expr")
> L = OneOrMore(expr)
> 
> 
> -- 
> GPG Fingerprint: B0D7 1249 447D F5BB 22C5  5B9B 078C 2615 504B 7B85
> 
> -- 
> http://mail.python.org/mailman/listinfo/python-list