Re: PyParsing and Headaches
Heya there,

OK, found the solution. I just needed to call leaveWhitespace() on the elements where I want pyparsing to take the spaces into account.

Thanks for the help.

Cheers!

Hugo Ferreira

On Nov 23, 11:57 am, "Bytter" <[EMAIL PROTECTED]> wrote:
> [...]
> Still, there's a remaining problem. By using Combine(), everything is
> interpreted as a single token. What I need is for 'include_bool' and
> 'literal' to be parsed as separate tokens, but without a space in
> the middle...
> [...]

--
http://mail.python.org/mailman/listinfo/python-list
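The fix described in this message can be sketched as follows. This is a minimal illustration, not Hugo's actual code: the variable names follow the simplified grammar from the original question, and calling leaveWhitespace() on `literal` is an assumption about where the call was placed. It uses the camelCase pyparsing names from the thread; newer pyparsing releases also offer snake_case equivalents such as `leave_whitespace` and `parse_string`.

```python
from pyparsing import Optional, ParseException, Word, alphas, oneOf

include_bool = Optional(oneOf("+ -"))
# leaveWhitespace() stops this element from skipping leading whitespace,
# so it has to start exactly where the previous element ended.
literal = Word(alphas).leaveWhitespace()
term = include_bool + literal

# Adjacent input still yields two separate tokens (unlike Combine).
adjacent_tokens = term.parseString("+a").asList()
print(adjacent_tokens)

# A space between the sign and the word is now a parse error.
try:
    term.parseString("+ a")
    space_rejected = False
except ParseException:
    space_rejected = True
print(space_rejected)
```

Compared with Combine(), this keeps '+' and 'a' as distinct tokens while still refusing the space between them.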
Re: PyParsing and Headaches
(This message has already been sent to the mailing list, but I'm not sure it is arriving, since it doesn't show up on Usenet, so I'm posting it through here now.)

Chris,

Thanks for your quick answer. That changes a lot of stuff, and now I'm able to do my parsing as I intended to.

Still, there's a remaining problem. By using Combine(), everything is interpreted as a single token. What I need is for 'include_bool' and 'literal' to be parsed as separate tokens, but without a space in the middle...

Paul,

Thanks for your detailed explanation. One of the things I think is missing from the documentation (or that I couldn't find easily) is the kind of explanation you give about 'The Way of PyParsing'. For example, it took me a while to understand that I could easily implement simple recursions using OneOrMore(Group()). Or maybe things were out there and I didn't search enough...

Still, fwiw, congratulations on the library. PyParsing allowed me to do in just a couple of hours, including learning its API (minus this little inconvenience), what would have taken me a couple of days with, for example, ANTLR (in fact, I've already put ANTLR aside more than once in the past in favor of a built-from-scratch parser).

Cheers,

Hugo Ferreira

On Nov 22, 7:50 pm, Chris Lambacher <[EMAIL PROTECTED]> wrote:
> On Wed, Nov 22, 2006 at 11:17:52AM -0800, Bytter wrote:
> > [...]
> > literal = Word(alphas)
> > include_bool = Optional(oneOf("+ -"))
> > term = include_bool + literal
>
> + here means that you allow a space. You need to explicitly override
> this. Try:
>
> term = Combine(include_bool + literal)
>
> [...]
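The "remaining problem" with Combine() mentioned in this message can be reproduced with a short sketch. The names mirror the simplified grammar from the question; nothing here is taken from Hugo's real parser.

```python
from pyparsing import Combine, Optional, Word, alphas, oneOf

include_bool = Optional(oneOf("+ -"))
literal = Word(alphas)

# Without Combine: two tokens, but whitespace between them is accepted.
separate = (include_bool + literal).parseString("+a").asList()

# With Combine: the pieces must be adjacent, but they are merged
# into one string token.
merged = Combine(include_bool + literal).parseString("+a").asList()

print(separate)
print(merged)
```

So Combine() solves the adjacency requirement at the cost of losing the boundary between the sign and the word, which is exactly the trade-off described above.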
Re: PyParsing and Headaches
Chris,

Thanks for your quick answer. That changes a lot of stuff, and now I'm able to do my parsing as I intended to.

Paul,

Thanks for your detailed explanation. One of the things I think is missing from the documentation (or that I couldn't find easily) is the kind of explanation you give about 'The Way of PyParsing'. For example, it took me a while to understand that I could easily implement simple recursions using OneOrMore(Group()). Or maybe things were out there and I didn't search enough...

Still, fwiw, congratulations on the library. PyParsing allowed me to do in just a couple of hours, including learning its API (minus this little inconvenience), what would have taken me a couple of days with, for example, ANTLR (in fact, I've already put ANTLR aside more than once in the past in favor of a built-from-scratch parser).

Cheers,

Hugo Ferreira

On 11/22/06, Paul McGuire <[EMAIL PROTECTED]> wrote:
> As Chris pointed out in his post, the most direct way to fix this is to
> use Combine. Note that Combine does two things: it requires the
> expressions to be adjacent, and it combines the results into a single
> token.
>
> [...]
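For readers unfamiliar with the OneOrMore(Group()) idiom mentioned in this message, here is a small sketch of what it does. The `term` definition is borrowed from the simplified grammar in the question, and the sample input is made up for illustration.

```python
from pyparsing import Group, OneOrMore, Optional, Word, alphas, oneOf

term = Optional(oneOf("+ -")) + Word(alphas)

# Without Group, the tokens of all repeated terms end up in one flat list.
flat = OneOrMore(term).parseString("+a -b c").asList()

# Group() wraps the tokens of each match in its own sub-list,
# so the boundaries between repeated terms are preserved.
nested = OneOrMore(Group(term)).parseString("+a -b c").asList()

print(flat)
print(nested)
```

The nested form is usually easier to post-process, since each sub-list corresponds to exactly one term.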
Re: PyParsing and Headaches
"Bytter" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hi,
>
> I'm trying to construct a parser, but I'm stuck with some basic
> stuff... For example, I want to match the following:
>
> letter = "A"..."Z" | "a"..."z"
> literal = letter+
> include_bool := "+" | "-"
> term = [include_bool] literal
>
> So I defined this as:
>
> literal = Word(alphas)
> include_bool = Optional(oneOf("+ -"))
> term = include_bool + literal
>
> The problem is that:
>
> term.parseString("+a") -> (['+', 'a'], {}) # OK
> term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't
> recognize any token since I didn't say the SPACE was allowed between
> include_bool and literal.
>

As Chris pointed out in his post, the most direct way to fix this is to use Combine. Note that Combine does two things: it requires the expressions to be adjacent, and it combines the results into a single token.

For instance, when defining the expression for a real number, something like:

    realnum = Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums)

Pyparsing would parse "3.14159" into the separate tokens ['3', '.', '14159'] (the unmatched optional sign contributes no token). For this grammar, pyparsing would also accept "2. 23" as ['2', '.', '23'], even though there is a space between the decimal point and "23". But by wrapping it inside Combine, as in:

    realnum = Combine(Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums))

we accomplish two things: pyparsing only matches if all the elements are adjacent, with no whitespace or comments; and the matched token is returned as ['3.14159']. (Yes, I left off scientific notation, but it is an extension of the same issue.)

Pyparsing in general does implicit whitespace skipping; it is part of the zen of pyparsing, and distinguishes it from conventional regexps (although I think there is a new '?' switch for re's that puts '\s*'s between re terms for you). This is to simplify the grammar definition, so that it doesn't need to be littered with "optional whitespace or comments could go here" expressions; instead, whitespace and comments (or "ignorables" in pyparsing terminology) are parsed over before every grammar expression.

I instituted this out of recoil from a previous project, in which a co-developer implemented a boolean parser by first tokenizing by whitespace, then parsing out the tokens. Unfortunately, this meant that "color=='blue' && size=='medium'" would not parse successfully, instead requiring "color == 'blue' && size == 'medium'". It doesn't seem like much, but our support guys got many calls asking why the boolean clauses weren't matching. I decided that when I wrote a parser, "y=m*x+b" would be just as parseable as "y = m * x + b". For that matter, you'd be surprised where whitespace and comments sneak into people's source code: spaces after left parentheses and comments after semicolons, for example, are easily forgotten when spec'ing out the syntax for a C "for" statement; whitespace inside HTML tags is another unanticipated surprise.

So looking at your grammar, you say you don't want to have this be a successful parse:

    term.parseString("+ a") -> (['+', 'a'], {})

because, "It shouldn't recognize any token since I didn't say the SPACE was allowed between include_bool and literal." In fact, pyparsing allows spaces by default; that's why the given parse succeeds. I would turn this question around, and ask you in terms of your grammar - what SHOULD be allowed between include_bool and literal?

If spaces are not a problem, then your grammar as-is is sufficient. If spaces are absolutely verboten, then there are 2 or 3 different techniques in pyparsing to disable the whitespace-skipping behavior, depending on whether you want all whitespace skipping disabled, just for literals of a certain type, or just for literals when following a leading include_bool sign.

Thanks for giving pyparsing a try; if you want further help, you can post here, or on the pyparsing wiki - the discussion threads on the Home page are a pretty good support and message log.

-- Paul
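The two effects of Combine described in this message can be checked with a short sketch. The helper function `realnum_parts` is a made-up name, used only so that each parser gets its own fresh elements; the expressions themselves are the ones from the message.

```python
from pyparsing import Combine, Optional, ParseException, Word, nums, oneOf


def realnum_parts():
    """Build a fresh sign/digits/dot/digits sequence (illustrative helper)."""
    return Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums)


loose_realnum = realnum_parts()            # default whitespace skipping
strict_realnum = Combine(realnum_parts())  # adjacent elements, one token

# The loose form skips the space after the decimal point.
loose_tokens = loose_realnum.parseString("2. 23").asList()

# The combined form returns a single string token...
strict_tokens = strict_realnum.parseString("3.14159").asList()

# ...and refuses input with whitespace between the elements.
try:
    strict_realnum.parseString("2. 23")
    strict_accepts_space = True
except ParseException:
    strict_accepts_space = False

print(loose_tokens, strict_tokens, strict_accepts_space)
```

The loose parser is the right default for most grammars; Combine is worth reaching for when the pieces only make sense as one lexical unit, as with numbers.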
Re: PyParsing and Headaches
On Wed, Nov 22, 2006 at 11:17:52AM -0800, Bytter wrote:
> Hi,
>
> I'm trying to construct a parser, but I'm stuck with some basic
> stuff... For example, I want to match the following:
>
> letter = "A"..."Z" | "a"..."z"
> literal = letter+
> include_bool := "+" | "-"
> term = [include_bool] literal
>
> So I defined this as:
>
> literal = Word(alphas)
> include_bool = Optional(oneOf("+ -"))
> term = include_bool + literal

The + here means that you allow a space between the two expressions. You need to explicitly override this. Try:

    term = Combine(include_bool + literal)

>
> The problem is that:
>
> term.parseString("+a") -> (['+', 'a'], {}) # OK
> term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't
> recognize any token since I didn't say the SPACE was allowed between
> include_bool and literal.
>
> Can anyone give me a hand here?
>
> Cheers!
>
> Hugo Ferreira
>
> BTW, the following is the complete grammar I'm trying to implement with
> pyparsing:
>
> ## L ::= expr | expr L
> ## expr ::= term | binary_expr
> ## binary_expr ::= term " " binary_op " " term
> ## binary_op ::= "*" | "OR" | "AND"
> ## include_bool ::= "+" | "-"
> ## term ::= ([include_bool] [modifier ":"] (literal | range)) | ("~"
> literal)
> ## modifier ::= (letter | "_")+
> ## literal ::= word | quoted_words
> ## quoted_words ::= '"' word (" " word)* '"'
> ## word ::= (letter | digit | "_")+
> ## number ::= digit+
> ## range ::= number (".." | "...") number
> ## letter ::= "A"..."Z" | "a"..."z"
> ## digit ::= "0"..."9"
>
> And this is where I got so far:
>
> word = Word(nums + alphas + "_")
> binary_op = oneOf("* and or", caseless=True).setResultsName("operator")
> include_bool = oneOf("+ -")
> literal = (word | quotedString).setResultsName("literal")
> modifier = Word(alphas + "_")
> rng = Word(nums) + (Literal("..") | Literal("...")) + Word(nums)
> term = ((Optional(include_bool) + Optional(modifier + ":") + (literal |
> rng)) | ("~" + literal)).setResultsName("Term")
> binary_expr = (term + binary_op + term).setResultsName("binary")
> expr = (binary_expr | term).setResultsName("Expr")
> L = OneOrMore(expr)
>
> --
> GPG Fingerprint: B0D7 1249 447D F5BB 22C5 5B9B 078C 2615 504B 7B85
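A side note on the quoted grammar that the thread itself does not raise: in the `rng` definition, the `|` operator tries its alternatives in order and keeps the first one that matches, so Literal("..") also consumes the first two dots of "...". The sketch below isolates that rule; the name `rng_reordered` and the sample inputs are illustrative, and putting the longer literal first is only one possible fix, assuming both spellings are meant to be accepted.

```python
from pyparsing import Literal, ParseException, Word, nums

number = Word(nums)

# As posted: ".." is tried first and also matches the start of "...".
rng_as_posted = number + (Literal("..") | Literal("...")) + number

# Reordered so the longer alternative is tried first.
rng_reordered = number + (Literal("...") | Literal("..")) + number

try:
    rng_as_posted.parseString("1...5")
    posted_accepts_three_dots = True
except ParseException:
    # After ".." is consumed, the next character is "." rather than a digit.
    posted_accepts_three_dots = False

three_dots = rng_reordered.parseString("1...5").asList()
two_dots = rng_reordered.parseString("1..5").asList()

print(posted_accepts_three_dots, three_dots, two_dots)
```

Pyparsing does not go back and retry the second alternative once a later element fails, which is why the order of the two literals matters here.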