Re: PyParsing and Headaches
Heya there,

Ok, found the solution. I just needed to use leaveWhitespace() in the places where I want pyparsing to take the spaces into consideration.

Thx for the help.

Cheers!

Hugo Ferreira

On Nov 23, 11:57 am, Bytter [EMAIL PROTECTED] wrote:
> (This message has already been sent to the mailing-list, but I'm not sure it is arriving properly since it doesn't show up on Usenet, so I'm posting it through here now.)
>
> Chris,
>
> Thanks for your quick answer. That changes a lot of stuff, and now I'm able to do my parsing as I intended to.
>
> Still, there's a remaining problem. By using Combine(), everything is interpreted as a single token. What I need is for 'include_bool' and 'literal' to be parsed as separate tokens, but without a space in between...
>
> Paul,
>
> Thanks for your detailed explanation. One of the things I think is missing from the documentation (or that I couldn't find easily) is the kind of explanation you give about 'The Way of PyParsing'. For example, it took me a while to understand that I could easily implement simple recursions using OneOrMore(Group()). Or maybe things were out there and I didn't search enough...
>
> Still, fwiw, congratulations on the library. PyParsing allowed me to do in just a couple of hours, including learning about its API (minus this little inconvenience), what would have taken me a couple of days with, for example, ANTLR (in fact, I've already put aside ANTLR more than once in the past for a built-from-scratch parser).
>
> Cheers,
>
> Hugo Ferreira
>
> On Nov 22, 7:50 pm, Chris Lambacher [EMAIL PROTECTED] wrote:
> > On Wed, Nov 22, 2006 at 11:17:52AM -0800, Bytter wrote:
> > > Hi,
> > >
> > > I'm trying to construct a parser, but I'm stuck with some basic stuff... For example, I want to match the following:
> > >
> > >     letter = "A"..."Z" | "a"..."z"
> > >     literal = letter+
> > >     include_bool := "+" | "-"
> > >     term = [include_bool] literal
> > >
> > > So I defined this as:
> > >
> > >     literal = Word(alphas)
> > >     include_bool = Optional(oneOf("+ -"))
> > >     term = include_bool + literal
> >
> > + here means that you allow a space. You need to explicitly override this. Try:
> >
> >     term = Combine(include_bool + literal)
> >
> > > The problem is that:
> > >
> > >     term.parseString("+a") -> (['+', 'a'], {}) # OK
> > >     term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't recognize any token since I didn't say the SPACE was allowed between include_bool and literal.
> > >
> > > Can anyone give me a hand here?
> > >
> > > Cheers!
> > >
> > > Hugo Ferreira
> > >
> > > BTW, the following is the complete grammar I'm trying to implement with pyparsing:
> > >
> > >     ## L ::= expr | expr L
> > >     ## expr ::= term | binary_expr
> > >     ## binary_expr ::= term binary_op term
> > >     ## binary_op ::= "*" | "OR" | "AND"
> > >     ## include_bool ::= "+" | "-"
> > >     ## term ::= ([include_bool] [modifier ":"] (literal | range)) | ("~" literal)
> > >     ## modifier ::= (letter | "_")+
> > >     ## literal ::= word | quoted_words
> > >     ## quoted_words ::= '"' word (" " word)* '"'
> > >     ## word ::= (letter | digit | "_")+
> > >     ## number ::= digit+
> > >     ## range ::= number (".." | "...") number
> > >     ## letter ::= "A"..."Z" | "a"..."z"
> > >     ## digit ::= "0"..."9"
> > >
> > > And this is where I got so far:
> > >
> > >     word = Word(nums + alphas + "_")
> > >     binary_op = oneOf("* and or", caseless=True).setResultsName("operator")
> > >     include_bool = oneOf("+ -")
> > >     literal = (word | quotedString).setResultsName("literal")
> > >     modifier = Word(alphas + "_")
> > >     rng = Word(nums) + (Literal("..") | Literal("...")) + Word(nums)
> > >     term = ((Optional(include_bool) + Optional(modifier + ":") + (literal | rng)) | ("~" + literal)).setResultsName("Term")
> > >     binary_expr = (term + binary_op + term).setResultsName("binary")
> > >     expr = (binary_expr | term).setResultsName("Expr")
> > >     L = OneOrMore(expr)
> > >
> > > --
> > > GPG Fingerprint: B0D7 1249 447D F5BB 22C5 5B9B 078C 2615 504B 7B85

--
http://mail.python.org/mailman/listinfo/python-list
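For readers who want to see the leaveWhitespace() fix in action, here is a minimal sketch. The message does not say exactly where the call was placed, so applying it to `literal` is an assumption, and the helper `try_parse` is a hypothetical name used only for this illustration. The classic camelCase method names from the thread are used; newer pyparsing releases also accept snake_case spellings such as `leave_whitespace`.

```python
from pyparsing import Optional, ParseException, Word, alphas, oneOf

include_bool = Optional(oneOf("+ -"))
# Assumption: disabling whitespace skipping on `literal` only, so that it
# must start exactly where the previous expression ended.
literal = Word(alphas).leaveWhitespace()
term = include_bool + literal


def try_parse(text):
    """Return the token list, or None if the text does not parse."""
    try:
        return term.parseString(text).asList()
    except ParseException:
        return None


print(try_parse("+a"))   # ['+', 'a']  (two separate tokens)
print(try_parse("+ a"))  # None        (the space is no longer skipped)
print(try_parse("a"))    # ['a']       (the sign stays optional)
```

Unlike Combine(), this keeps the sign and the word as separate tokens while still rejecting a space between them.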
PyParsing and Headaches
Hi,

I'm trying to construct a parser, but I'm stuck with some basic stuff... For example, I want to match the following:

    letter = "A"..."Z" | "a"..."z"
    literal = letter+
    include_bool := "+" | "-"
    term = [include_bool] literal

So I defined this as:

    literal = Word(alphas)
    include_bool = Optional(oneOf("+ -"))
    term = include_bool + literal

The problem is that:

    term.parseString("+a") -> (['+', 'a'], {}) # OK
    term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't recognize any token since I didn't say the SPACE was allowed between include_bool and literal.

Can anyone give me a hand here?

Cheers!

Hugo Ferreira

BTW, the following is the complete grammar I'm trying to implement with pyparsing:

    ## L ::= expr | expr L
    ## expr ::= term | binary_expr
    ## binary_expr ::= term binary_op term
    ## binary_op ::= "*" | "OR" | "AND"
    ## include_bool ::= "+" | "-"
    ## term ::= ([include_bool] [modifier ":"] (literal | range)) | ("~" literal)
    ## modifier ::= (letter | "_")+
    ## literal ::= word | quoted_words
    ## quoted_words ::= '"' word (" " word)* '"'
    ## word ::= (letter | digit | "_")+
    ## number ::= digit+
    ## range ::= number (".." | "...") number
    ## letter ::= "A"..."Z" | "a"..."z"
    ## digit ::= "0"..."9"

And this is where I got so far:

    word = Word(nums + alphas + "_")
    binary_op = oneOf("* and or", caseless=True).setResultsName("operator")
    include_bool = oneOf("+ -")
    literal = (word | quotedString).setResultsName("literal")
    modifier = Word(alphas + "_")
    rng = Word(nums) + (Literal("..") | Literal("...")) + Word(nums)
    term = ((Optional(include_bool) + Optional(modifier + ":") + (literal | rng)) | ("~" + literal)).setResultsName("Term")
    binary_expr = (term + binary_op + term).setResultsName("binary")
    expr = (binary_expr | term).setResultsName("Expr")
    L = OneOrMore(expr)

--
GPG Fingerprint: B0D7 1249 447D F5BB 22C5 5B9B 078C 2615 504B 7B85
Re: PyParsing and Headaches
On Wed, Nov 22, 2006 at 11:17:52AM -0800, Bytter wrote:
> Hi,
>
> I'm trying to construct a parser, but I'm stuck with some basic stuff... For example, I want to match the following:
>
>     letter = "A"..."Z" | "a"..."z"
>     literal = letter+
>     include_bool := "+" | "-"
>     term = [include_bool] literal
>
> So I defined this as:
>
>     literal = Word(alphas)
>     include_bool = Optional(oneOf("+ -"))
>     term = include_bool + literal

+ here means that you allow a space. You need to explicitly override this. Try:

    term = Combine(include_bool + literal)

> The problem is that:
>
>     term.parseString("+a") -> (['+', 'a'], {}) # OK
>     term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't recognize any token since I didn't say the SPACE was allowed between include_bool and literal.
>
> Can anyone give me a hand here?
>
> Cheers!
>
> Hugo Ferreira
>
> BTW, the following is the complete grammar I'm trying to implement with pyparsing:
>
>     ## L ::= expr | expr L
>     ## expr ::= term | binary_expr
>     ## binary_expr ::= term binary_op term
>     ## binary_op ::= "*" | "OR" | "AND"
>     ## include_bool ::= "+" | "-"
>     ## term ::= ([include_bool] [modifier ":"] (literal | range)) | ("~" literal)
>     ## modifier ::= (letter | "_")+
>     ## literal ::= word | quoted_words
>     ## quoted_words ::= '"' word (" " word)* '"'
>     ## word ::= (letter | digit | "_")+
>     ## number ::= digit+
>     ## range ::= number (".." | "...") number
>     ## letter ::= "A"..."Z" | "a"..."z"
>     ## digit ::= "0"..."9"
>
> And this is where I got so far:
>
>     word = Word(nums + alphas + "_")
>     binary_op = oneOf("* and or", caseless=True).setResultsName("operator")
>     include_bool = oneOf("+ -")
>     literal = (word | quotedString).setResultsName("literal")
>     modifier = Word(alphas + "_")
>     rng = Word(nums) + (Literal("..") | Literal("...")) + Word(nums)
>     term = ((Optional(include_bool) + Optional(modifier + ":") + (literal | rng)) | ("~" + literal)).setResultsName("Term")
>     binary_expr = (term + binary_op + term).setResultsName("binary")
>     expr = (binary_expr | term).setResultsName("Expr")
>     L = OneOrMore(expr)
>
> --
> GPG Fingerprint: B0D7 1249 447D F5BB 22C5 5B9B 078C 2615 504B 7B85
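The Combine() suggestion can be tried out with a short, self-contained sketch like the one below. The helper `parse_or_none` is a hypothetical name introduced only for this illustration; everything else follows the definitions quoted in the message.

```python
from pyparsing import Combine, Optional, ParseException, Word, alphas, oneOf

literal = Word(alphas)
include_bool = Optional(oneOf("+ -"))
# Combine requires its parts to be adjacent and joins them into one token.
term = Combine(include_bool + literal)


def parse_or_none(expr, text):
    """Return the token list, or None if the text does not parse."""
    try:
        return expr.parseString(text).asList()
    except ParseException:
        return None


print(parse_or_none(term, "+a"))   # ['+a']  (a single combined token)
print(parse_or_none(term, "+ a"))  # None    (the space is rejected)
```

Note the trade-off: the space is now rejected, but the sign and the word come back merged into one string rather than as two tokens.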
Re: PyParsing and Headaches
Bytter [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> Hi,
>
> I'm trying to construct a parser, but I'm stuck with some basic stuff... For example, I want to match the following:
>
>     letter = "A"..."Z" | "a"..."z"
>     literal = letter+
>     include_bool := "+" | "-"
>     term = [include_bool] literal
>
> So I defined this as:
>
>     literal = Word(alphas)
>     include_bool = Optional(oneOf("+ -"))
>     term = include_bool + literal
>
> The problem is that:
>
>     term.parseString("+a") -> (['+', 'a'], {}) # OK
>     term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't recognize any token since I didn't say the SPACE was allowed between include_bool and literal.

As Chris pointed out in his post, the most direct way to fix this is to use Combine. Note that Combine does two things: it requires the expressions to be adjacent, and it combines the results into a single token. For instance, when defining the expression for a real number, something like:

    realnum = Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums)

Pyparsing would parse "3.14159" into the separate tokens ['3', '.', '14159'] (the unmatched optional sign contributes no token). For this grammar, pyparsing would also accept "2. 23" as ['2', '.', '23'], even though there is a space between the decimal point and "23". But by wrapping it inside Combine, as in:

    realnum = Combine(Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums))

we accomplish two things: pyparsing only matches if all the elements are adjacent, with no whitespace or comments; and the matched token is returned as ['3.14159']. (Yes, I left off scientific notation, but it is an extension of the same issue.)

Pyparsing in general does implicit whitespace skipping; it is part of the zen of pyparsing, and distinguishes it from conventional regexps (although I think there is a new '?' switch for re's that puts '\s*'s between re terms for you). This is to simplify the grammar definition, so that it doesn't need to be littered with "optional whitespace" or "comments could go here" expressions; instead, whitespace and comments (or "ignorables" in pyparsing terminology) are parsed over before every grammar expression.

I instituted this out of recoil from a previous project, in which a co-developer implemented a boolean parser by first tokenizing by whitespace, then parsing out the tokens. Unfortunately, this meant that "color=='blue' size=='medium'" would not parse successfully, instead requiring "color == 'blue' size == 'medium'". It doesn't seem like much, but our support guys got many calls asking why the boolean clauses weren't matching. I decided that when I wrote a parser, "y=m*x+b" would be just as parseable as "y = m * x + b". For that matter, you'd be surprised where whitespace and comments sneak into people's source code: spaces after left parentheses and comments after semicolons, for example, are easily forgotten when spec'ing out the syntax for a C "for" statement; whitespace inside HTML tags is another unanticipated surprise.

So looking at your grammar, you say you don't want to have this be a successful parse:

    term.parseString("+ a") -> (['+', 'a'], {})

because, "It shouldn't recognize any token since I didn't say the SPACE was allowed between include_bool and literal." In fact, pyparsing allows spaces by default; that's why the given parse succeeds. I would turn this question around, and ask you in terms of your grammar - what SHOULD be allowed between include_bool and literal? If spaces are not a problem, then your grammar as-is is sufficient. If spaces are absolutely verboten, then there are 2 or 3 different techniques in pyparsing to disable the whitespace-skipping behavior, depending on whether you want all whitespace skipping disabled, just for literals of a certain type, or just for literals when following a leading include_bool sign.

Thanks for giving pyparsing a try; if you want further help, you can post here, or on the pyparsing wiki - the discussion threads on the Home page are a pretty good support and message log.

-- Paul
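The real-number comparison described above can be reproduced with a small sketch. The names `loose_real`, `strict_real`, and `tokens_or_none` are hypothetical, chosen only for this illustration; the expressions themselves mirror the two `realnum` definitions in the message. As far as this sketch shows, an unmatched Optional adds no token to the results.

```python
from pyparsing import Combine, Optional, ParseException, Word, nums, oneOf

# Without Combine: whitespace is skipped between the parts, which are
# returned as separate tokens.
loose_real = Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums)

# With Combine: the parts must be adjacent and are joined into one token.
strict_real = Combine(Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums))


def tokens_or_none(expr, text):
    """Return the token list, or None if the text does not parse."""
    try:
        return expr.parseString(text).asList()
    except ParseException:
        return None


print(tokens_or_none(loose_real, "3.14159"))   # ['3', '.', '14159']
print(tokens_or_none(loose_real, "2. 23"))     # ['2', '.', '23']
print(tokens_or_none(strict_real, "3.14159"))  # ['3.14159']
print(tokens_or_none(strict_real, "2. 23"))    # None
```

The single combined string is also convenient to pass straight to float() in a parse action, which is one common reason for reaching for Combine with numbers.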
Re: PyParsing and Headaches
Chris,

Thanks for your quick answer. That changes a lot of stuff, and now I'm able to do my parsing as I intended to.

Paul,

Thanks for your detailed explanation. One of the things I think is missing from the documentation (or that I couldn't find easily) is the kind of explanation you give about 'The Way of PyParsing'. For example, it took me a while to understand that I could easily implement simple recursions using OneOrMore(Group()). Or maybe things were out there and I didn't search enough...

Still, fwiw, congratulations on the library. PyParsing allowed me to do in just a couple of hours, including learning about its API (minus this little inconvenience), what would have taken me a couple of days with, for example, ANTLR (in fact, I've already put aside ANTLR more than once in the past for a built-from-scratch parser).

Cheers,

Hugo Ferreira

On 11/22/06, Paul McGuire [EMAIL PROTECTED] wrote:
> Bytter [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> > Hi,
> >
> > I'm trying to construct a parser, but I'm stuck with some basic stuff... For example, I want to match the following:
> >
> >     letter = "A"..."Z" | "a"..."z"
> >     literal = letter+
> >     include_bool := "+" | "-"
> >     term = [include_bool] literal
> >
> > So I defined this as:
> >
> >     literal = Word(alphas)
> >     include_bool = Optional(oneOf("+ -"))
> >     term = include_bool + literal
> >
> > The problem is that:
> >
> >     term.parseString("+a") -> (['+', 'a'], {}) # OK
> >     term.parseString("+ a") -> (['+', 'a'], {}) # KO. It shouldn't recognize any token since I didn't say the SPACE was allowed between include_bool and literal.
>
> As Chris pointed out in his post, the most direct way to fix this is to use Combine. Note that Combine does two things: it requires the expressions to be adjacent, and it combines the results into a single token. For instance, when defining the expression for a real number, something like:
>
>     realnum = Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums)
>
> Pyparsing would parse "3.14159" into the separate tokens ['3', '.', '14159'] (the unmatched optional sign contributes no token). For this grammar, pyparsing would also accept "2. 23" as ['2', '.', '23'], even though there is a space between the decimal point and "23". But by wrapping it inside Combine, as in:
>
>     realnum = Combine(Optional(oneOf("+ -")) + Word(nums) + "." + Word(nums))
>
> we accomplish two things: pyparsing only matches if all the elements are adjacent, with no whitespace or comments; and the matched token is returned as ['3.14159']. (Yes, I left off scientific notation, but it is an extension of the same issue.)
>
> Pyparsing in general does implicit whitespace skipping; it is part of the zen of pyparsing, and distinguishes it from conventional regexps (although I think there is a new '?' switch for re's that puts '\s*'s between re terms for you). This is to simplify the grammar definition, so that it doesn't need to be littered with "optional whitespace" or "comments could go here" expressions; instead, whitespace and comments (or "ignorables" in pyparsing terminology) are parsed over before every grammar expression.
>
> I instituted this out of recoil from a previous project, in which a co-developer implemented a boolean parser by first tokenizing by whitespace, then parsing out the tokens. Unfortunately, this meant that "color=='blue' size=='medium'" would not parse successfully, instead requiring "color == 'blue' size == 'medium'". It doesn't seem like much, but our support guys got many calls asking why the boolean clauses weren't matching. I decided that when I wrote a parser, "y=m*x+b" would be just as parseable as "y = m * x + b". For that matter, you'd be surprised where whitespace and comments sneak into people's source code: spaces after left parentheses and comments after semicolons, for example, are easily forgotten when spec'ing out the syntax for a C "for" statement; whitespace inside HTML tags is another unanticipated surprise.
>
> So looking at your grammar, you say you don't want to have this be a successful parse:
>
>     term.parseString("+ a") -> (['+', 'a'], {})
>
> because, "It shouldn't recognize any token since I didn't say the SPACE was allowed between include_bool and literal." In fact, pyparsing allows spaces by default; that's why the given parse succeeds. I would turn this question around, and ask you in terms of your grammar - what SHOULD be allowed between include_bool and literal? If spaces are not a problem, then your grammar as-is is sufficient. If spaces are absolutely verboten, then there are 2 or 3 different techniques in pyparsing to disable the whitespace-skipping behavior, depending on whether you want all whitespace skipping disabled, just for literals of a certain type, or just for literals when following a leading include_bool sign.
>
> Thanks for giving pyparsing a try; if you want further help, you can post here, or on the pyparsing wiki - the discussion threads on the Home page are a pretty good support and message log.
>
> -- Paul

--
GPG Fingerprint: B0D7 1249 447D F5BB 22C5 5B9B 078C
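The OneOrMore(Group()) idiom mentioned in this reply can be illustrated with a short sketch. It is a simplified stand-in, not the poster's actual grammar: the list rule `L ::= expr | expr L` is expressed as repetition rather than explicit recursion, and `term` and `query` are hypothetical names that cover only an optional sign followed by a word.

```python
from pyparsing import Group, OneOrMore, Optional, Word, alphas, oneOf

# Each term is wrapped in Group so its tokens stay together as a sub-list.
term = Group(Optional(oneOf("+ -")) + Word(alphas))

# "L ::= expr | expr L" becomes simple repetition instead of recursion.
query = OneOrMore(term)

result = query.parseString("+foo -bar baz").asList()
print(result)  # [['+', 'foo'], ['-', 'bar'], ['baz']]
```

Without Group the same input would come back as one flat list, so the grouping is what preserves which sign belongs to which word.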