Re: Pretty Scheme, ??? Python
On 2007-07-02, Paul McGuire [EMAIL PROTECTED] wrote: On Jul 2, 3:56 pm, Neil Cerutti [EMAIL PROTECTED] wrote: from pyparsing import * It's always good when your messages start like that. ;) Ok, here is the step-by-step, beginning with your posted BNF. (Based on your test cases, I think the '{}'s are really supposed to be '()'s.) Yeah, the Scheme version of WAE uses curlies in examples just so it looks slightly different from Scheme, although since the Scheme version is built on the Scheme read function, it actually accepts several different kinds of delimiters. Python didn't need to do that, I thought. My first impulse when programming this exercise was to ape the Scheme strategy, going with a syntax analogous to Python's, using Python's code or AST modules. But it turns out I'm not a clever enough language designer. Moreover, the fun of the book I mentioned is in designing the semantics of the programs. The book basically punts parsing, leaving it up to the 'read' function. So basically, Python gets up to speed (except for the define-type and type-case macros) simply by implementing a read with enough functionality for each mini-langauge. ; WAE ::= ; num ; | { + WAE WAE } ; | { - WAE WAE } ; | {with {id WAE} WAE} ; | id The most basic building blocks in pyparsing are Literal and Word. With these, you compose compound elements using And and MatchFirst, which are bound to the operators '+' and '|' (on occasion, Or is required, bound to operator '^', but not for this simple parser). Since you have a recursive grammar, you will also need Forward. Whitespace is skipped implicitly. Only slightly more advanced is the Group class, which will impart hierarchy and structure to the results - otherwise, everything just comes out as one flat list of tokens. You may be able to remove these in the final parser, depending on your results after steps 1 and 2 in the left for the student part below, but they are here to help show structure of the parsed tokens. As convenience functions go, I think the most common are oneOf and delimitedList. oneOf might be useful here if you want to express id as a single-char variable; otherwise, just use Word(alphas). At this point you should be able to write a parser for this WAE grammar. Like the following 9-liner: LPAR = Literal(().suppress() RPAR = Literal()).suppress() wae = Forward() num = Word(nums) id = oneOf( list(alphas) ) The above shadows 'id'; I suppose 'ident' would be better. addwae = Group( LPAR + + + wae + wae + RPAR ) subwae = Group( LPAR + - + wae + wae + RPAR ) withwae = Group( LPAR + with + LPAR + id + wae + RPAR + wae + RPAR ) wae (addwae | subwae | withwae | num | id) tests = \ 3 (+ 3 4) (with (x (+ 5 5)) (+ x x)).splitlines() for t in tests: print t waeTree = wae.parseString(t) print waeTree.asList() print If you extract and run this script, here are the results: 3 ['3'] (+ 3 4) [['+', '3', '4']] (with (x (+ 5 5)) (+ x x)) [['with', 'x', ['+', '5', '5'], ['+', 'x', 'x']]] How can I make it barf for testcases like '(+ 2 3))'? It doesn't seem to expect an Eof. Left as an exercise for the student: 1. Define classes NumWAE, IdWAE, AddWAE, SubWAE, and WithWAE whose __init__ methods take a ParseResults object named tokens (which you can treat as a list of tokens), and each with a calc() method to evaluate them accordingly. 2. Hook each class to the appropriate WAE class using setParseAction. Hint: here is one done for you: num.setParseAction(NumWAE) 3. Modify the test loop to insert an evaluation of the parsed tree. I use doctest, so it looks quite different. On the other hand, it actually checks that the results are correct. ;) Extra credit: why is id last in the set of alternatives defined for the wae expression? I'm not sure. When I moved it to first all my valid tests still passed, but one of the deliberately erroneous ones caused a different exception, e.g., '(+ 2)'. Writing my own parser does make error handling more predictable, but probably PyParsing can be configured to do what I want. My AST's from the first version of the program translated easily into your program, with almost no muss or fuss. The muss is that, since all the __init__ functions now expect a token list instead of named arguments, they are now cryptic, and creating AST's manually became annoying. The fuss is that I do have to create one in With's calc function. It should be unnecessary for the AST objects to be so dependent upon the grammar to work correctly. I suppose a solution would be to add another layer of abstraction in between. Read it and weep. The program hasn't changed much. It's still uglier than the Scheme. Attached at the end is my original sexp and WAE parser, for comparison with the PyParsing portion of the program. This time I've included all the tests, but no other useful docstrings. A PyParsing solution for WAE, composed by Paul MaGuire,
Re: Pretty Scheme, ??? Python
Neil - The above shadows 'id'; I suppose 'ident' would be better. Doh! I found the id() shadowing later, changed my var to id_ so as not to stray from your BNF too much. How can I make it barf for testcases like '(+ 2 3))'? It doesn't seem to expect an Eof. To force parsing to the end of string, add a StringEnd instance where you expect there to be the end of the input string. Change: waeTree = wae.parseString(t) to: waeTree = (wae + StringEnd()).parseString(t) The muss is that, since all the __init__ functions now expect a token list instead of named arguments, they are now cryptic, and creating AST's manually became annoying. The fuss is that I do have to create one in With's calc function. It should be unnecessary for the AST objects to be so dependent upon the grammar to work correctly. I agree 1000%. The pyparsing method for this is to use setResultsName. Here is the grammar, with results names defined to match those in your original. And you are absolutely correct, using named fields like this makes your code MUCH more robust, and less dependent on the grammar. num = Combine( Optional(-) + Word(nums) ).setResultsName(n) id_ = oneOf( list(alphas) ).setResultsName(v) addwae = Group( LPAR + + + wae.setResultsName(lhs) + wae.setResultsName(rhs) + RPAR ) subwae = Group( LPAR + - + wae.setResultsName(lhs) + wae.setResultsName(rhs) + RPAR ) withwae = Group( LPAR + with + LPAR + id_.setResultsName(bound_id) + wae.setResultsName(named_expr) + RPAR + wae.setResultsName(bound_body) + RPAR ) Now your calc methods can refer to them using: self.tokens.lhs self.tokens.bound_id etc. Here is my alternative solution (not using results names). I used the base WAE class to accept the tokens as the initialization var, then unpack the list into variables in each respective calc() method. I did not see the need for a subst() method. There is a complete s-exp parser at the pyparsing wiki examples page: http://pyparsing.wikispaces.com/space/showimage/sexpParser.py -- Paul class WAE(object): ids = {} def __init__(self,tokens): # need to deref element 0 because of Groups self.tokens = tokens[0] class NumWAE(WAE): def calc(self): return int(self.tokens) class IdWAE(WAE): def getId(self): return self.tokens def calc(self): return WAE.ids[self.getId()][-1] class BinOpWAE(WAE): def calc(self): op,a,b = self.tokens return self.opfn(a.calc(), b.calc()) class AddWAE(BinOpWAE): opfn = staticmethod(lambda a,b:a+b) class SubWAE(BinOpWAE): opfn = staticmethod(lambda a,b:a-b) class WithWAE(WAE): def calc(self): op,varid,varexpr,withexpr = self.tokens varname = varid.getId() if varname not in WAE.ids: WAE.ids[varname] = [] WAE.ids[varname].append( varexpr.calc() ) ret = withexpr.calc() WAE.ids[varname].pop() return ret for expr,cls in zip((num,id_,addwae,subwae,withwae), (NumWAE,IdWAE,AddWAE,SubWAE,WithWAE)): expr.setParseAction(cls) for t in tests: print t waeTree = wae.parseString(t)[0] print waeTree.calc() print -- http://mail.python.org/mailman/listinfo/python-list
Re: Pretty Scheme, ??? Python
On 2007-07-03, Paul McGuire [EMAIL PROTECTED] wrote: Here is my alternative solution (not using results names). I used the base WAE class to accept the tokens as the initialization var, then unpack the list into variables in each respective calc() method. I did not see the need for a subst() method. I used recursion to effect substitution of id's with their values, whereas you chose to use a dictionary of stacks. I prefer the functional solution for being simpler, and state-less. Thanks for the nice example. I think I've seen enough to create something satisfying. There seems to be a bug in my current grammar. The factory for BinOp is not getting the correct named results. Can you see something I've messed up? LPAR = Literal(().suppress() RPAR = Literal()).suppress() wae = Forward() num = Word(nums).setResultsName('number') id_ = Word(alphas).setResultsName('name') binop = Group( LPAR + oneOf(+ -).setResultsName('op') + wae.setResultsName('lhs') + wae.setResultsName('rhs') + RPAR ) with_ = Group( LPAR + with + LPAR + id_.setResultsName('bound_id') + wae.setResultsName('named_expr') + RPAR + wae.setResultsName('bound_body') + RPAR ) wae (binop | with_ | num | id_) num.setParseAction(lambda t: Num(int(t.number))) id_.setParseAction(lambda t: Id(t.name)) binop.setParseAction(lambda t: BinOp(t.op, t.lhs, t.rhs)) with_.setParseAction(lambda t: With(t.bound_id, t.named_expr, t.bound_body)) def parse(s): return (wae + StringEnd()).parseString(s).asList()[0] For test case: '(+ 3 45)' I get: C:\WINNT\system32\cmd.exe /c python wae.py ** File wae.py, line 6, in __main__ Failed example: parse('(+ 3 45)') Exception raised: Traceback (most recent call last): File c:\edconn32\python25\lib\doctest.py, line 1212, in __run compileflags, 1) in test.globs File doctest __main__[1], line 1, in module parse('(+ 3 45)') File wae.py, line 122, in parse return (wae + StringEnd()).parseString(s).asList()[0] File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 906, in p arseString loc, tokens = self._parse( instring.expandtabs(), 0 ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 784, in _ parseNoCache loc,tokens = self.parseImpl( instring, preloc, doActions ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 1961, in parseImpl loc, resultlist = self.exprs[0]._parse( instring, loc, doActions, callPr eParse=False ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 784, in _ parseNoCache loc,tokens = self.parseImpl( instring, preloc, doActions ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 2204, in parseImpl return self.expr._parse( instring, loc, doActions, callPreParse=False ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 784, in _ parseNoCache loc,tokens = self.parseImpl( instring, preloc, doActions ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 2070, in parseImpl ret = e._parse( instring, loc, doActions ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 810, in _ parseNoCache tokens = fn( instring, tokensStart, retTokens ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 658, in t mp return f(t) File wae.py, line 118, in lambda binop.setParseAction(lambda t: BinOp(t.op, t.lhs, t.rhs)) File wae.py, line 73, in __init__ '-': (operator.sub, 'Sub')}[op] KeyError: '' ** op ought to be '+' or '-'. In fact, testing showed than none of the result names for binop are being set correctly. -- Neil Cerutti The word genius isn't applicable in football. A genius is a guy like Norman Einstein. --Joe Theisman -- http://mail.python.org/mailman/listinfo/python-list
Re: Pretty Scheme, ??? Python
On Jul 3, 11:08 am, Neil Cerutti [EMAIL PROTECTED] wrote: C:\WINNT\system32\cmd.exe /c python wae.py ** File wae.py, line 6, in __main__ Failed example: parse('(+ 3 45)') Exception raised: Traceback (most recent call last): File c:\edconn32\python25\lib\doctest.py, line 1212, in __run compileflags, 1) in test.globs File doctest __main__[1], line 1, in module parse('(+ 3 45)') File wae.py, line 122, in parse return (wae + StringEnd()).parseString(s).asList()[0] File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 906, in p arseString loc, tokens = self._parse( instring.expandtabs(), 0 ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 784, in _ parseNoCache loc,tokens = self.parseImpl( instring, preloc, doActions ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 1961, in parseImpl loc, resultlist = self.exprs[0]._parse( instring, loc, doActions, callPr eParse=False ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 784, in _ parseNoCache loc,tokens = self.parseImpl( instring, preloc, doActions ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 2204, in parseImpl return self.expr._parse( instring, loc, doActions, callPreParse=False ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 784, in _ parseNoCache loc,tokens = self.parseImpl( instring, preloc, doActions ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 2070, in parseImpl ret = e._parse( instring, loc, doActions ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 810, in _ parseNoCache tokens = fn( instring, tokensStart, retTokens ) File C:\edconn32\Python25\Lib\site-packages\pyparsing.py, line 658, in t mp return f(t) File wae.py, line 118, in lambda binop.setParseAction(lambda t: BinOp(t.op, t.lhs, t.rhs)) File wae.py, line 73, in __init__ '-': (operator.sub, 'Sub')}[op] KeyError: '' ** op ought to be '+' or '-'. In fact, testing showed than none of the result names for binop are being set correctly. -- Neil Cerutti The word genius isn't applicable in football. A genius is a guy like Norman Einstein. --Joe Theisman I think the problem is with your Groups, that they are wrapping the ParseResults into a single-element list. Try either of the following (but not both!): 1. Remove Group from the definitions of binop and with_. binop = ( LPAR + oneOf(+ -).setResultsName('op') + wae.setResultsName('lhs') + wae.setResultsName('rhs') + RPAR ) with_ = ( LPAR + with + LPAR + id_.setResultsName('bound_id') + wae.setResultsName('named_expr') + RPAR + wae.setResultsName('bound_body') + RPAR ) 2. Change the parse actions to deref the 0'th element of t for binop and with_. num.setParseAction(lambda t: Num(int(t.number))) id_.setParseAction(lambda t: Id(t.name)) binop.setParseAction(lambda t: BinOp(t[0].op, t[0].lhs, t[0].rhs)) with_.setParseAction(lambda t: With(t[0].bound_id, t[0].named_expr, t[0].bound_body)) As a troubleshooting measure, you can also create an explicit method, and then decorate it with pyparsing's built-in @traceParseAction decorator. @traceParseAction def test(t): return BinOp(t.op,t.lhs,t.rhs) This should print out information on the arguments being passed to test, and the results being returned. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: Pretty Scheme, ??? Python
By the way, the next release of pyparsing will allow you to shortcut all of these .setResultsName calls, using this notation instead: binop = ( LPAR + oneOf(+ -)('op') + wae('lhs') + wae('rhs') + RPAR ) with_ = ( LPAR + with + LPAR + id_('bound_id') + wae('named_expr') + RPAR + wae('bound_body') + RPAR ) This is implemented in the latest version in the SF SVN repository. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: Pretty Scheme, ??? Python
On 2007-07-03, Paul McGuire [EMAIL PROTECTED] wrote: 2. Change the parse actions to deref the 0'th element of t for binop and with_. num.setParseAction(lambda t: Num(int(t.number))) id_.setParseAction(lambda t: Id(t.name)) binop.setParseAction(lambda t: BinOp(t[0].op, t[0].lhs, t[0].rhs)) with_.setParseAction(lambda t: With(t[0].bound_id, t[0].named_expr, t[0].bound_body)) Thanks, that solves it. Now to get on with this exercise and find out which version turns out to be easier to modify and extend as the book progresses through all the different programming language features it covers. Scheme is bound to have an unfair advantage, though, as the book was composed for Scheme. I plan to see if I can create a class factory that behaves like the define-type macro, though. That should prove rather useful. The most tedious thing about the Python version is the cruft required by the class hierarchy. -- Neil Cerutti I pulled into a lay-by with smoke coming from under the bonnet. I realized the car was on fire so took my dog and smothered it with a blanket. --Insurance Claim Blooper -- http://mail.python.org/mailman/listinfo/python-list
Re: Pretty Scheme, ??? Python
Neil Cerutti wrote: ... How can I make the Python more idiomatic Python? Have you taken a look at pyparsing ? http://pyparsing.wikispaces.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: Pretty Scheme, ??? Python
On 2007-07-02, Laurent Pointal [EMAIL PROTECTED] wrote: Neil Cerutti wrote: How can I make the Python more idiomatic Python? Have you taken a look at pyparsing ? Yes, I have it. PyParsing has, well, so many convenience features they seem to shout down whatever the core features are, and I don't know quite how to get started as a result. Hardest of all was modifying a working PyParsing program. As a result, I've found writing my own recursive descent parsers much easier. I'm probably wrong, though. ;) -- Neil Cerutti -- http://mail.python.org/mailman/listinfo/python-list
Re: Pretty Scheme, ??? Python
On Jul 2, 3:56 pm, Neil Cerutti [EMAIL PROTECTED] wrote: On 2007-07-02, Laurent Pointal [EMAIL PROTECTED] wrote: Neil Cerutti wrote: How can I make the Python more idiomatic Python? Have you taken a look at pyparsing ? Yes, I have it. PyParsing has, well, so many convenience features they seem to shout down whatever the core features are, and I don't know quite how to get started as a result. Hardest of all was modifying a working PyParsing program. As a result, I've found writing my own recursive descent parsers much easier. I'm probably wrong, though. ;) -- Neil Cerutti from pyparsing import * Neil - Ok, here is the step-by-step, beginning with your posted BNF. (Based on your test cases, I think the '{}'s are really supposed to be '()'s.) ; WAE ::= ; num ; | { + WAE WAE } ; | { - WAE WAE } ; | {with {id WAE} WAE} ; | id The most basic building blocks in pyparsing are Literal and Word. With these, you compose compound elements using And and MatchFirst, which are bound to the operators '+' and '|' (on occasion, Or is required, bound to operator '^', but not for this simple parser). Since you have a recursive grammar, you will also need Forward. Whitespace is skipped implicitly. Only slightly more advanced is the Group class, which will impart hierarchy and structure to the results - otherwise, everything just comes out as one flat list of tokens. You may be able to remove these in the final parser, depending on your results after steps 1 and 2 in the left for the student part below, but they are here to help show structure of the parsed tokens. As convenience functions go, I think the most common are oneOf and delimitedList. oneOf might be useful here if you want to express id as a single-char variable; otherwise, just use Word(alphas). At this point you should be able to write a parser for this WAE grammar. Like the following 9-liner: LPAR = Literal(().suppress() RPAR = Literal()).suppress() wae = Forward() num = Word(nums) id = oneOf( list(alphas) ) addwae = Group( LPAR + + + wae + wae + RPAR ) subwae = Group( LPAR + - + wae + wae + RPAR ) withwae = Group( LPAR + with + LPAR + id + wae + RPAR + wae + RPAR ) wae (addwae | subwae | withwae | num | id) tests = \ 3 (+ 3 4) (with (x (+ 5 5)) (+ x x)).splitlines() for t in tests: print t waeTree = wae.parseString(t) print waeTree.asList() print If you extract and run this script, here are the results: 3 ['3'] (+ 3 4) [['+', '3', '4']] (with (x (+ 5 5)) (+ x x)) [['with', 'x', ['+', '5', '5'], ['+', 'x', 'x']]] Left as an exercise for the student: 1. Define classes NumWAE, IdWAE, AddWAE, SubWAE, and WithWAE whose __init__ methods take a ParseResults object named tokens (which you can treat as a list of tokens), and each with a calc() method to evaluate them accordingly. 2. Hook each class to the appropriate WAE class using setParseAction. Hint: here is one done for you: num.setParseAction(NumWAE) 3. Modify the test loop to insert an evaluation of the parsed tree. Extra credit: why is id last in the set of alternatives defined for the wae expression? -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: Pretty Scheme, ??? Python
On Jul 2, 6:28 pm, Paul McGuire [EMAIL PROTECTED] wrote: On Jul 2, 3:56 pm, Neil Cerutti [EMAIL PROTECTED] wrote: On 2007-07-02, Laurent Pointal [EMAIL PROTECTED] wrote: Neil Cerutti wrote: How can I make the Python more idiomatic Python? Have you taken a look at pyparsing ? Yes, I have it. PyParsing has, well, so many convenience features they seem to shout down whatever the core features are, and I don't know quite how to get started as a result. Hardest of all was modifying a working PyParsing program. As a result, I've found writing my own recursive descent parsers much easier. I'm probably wrong, though. ;) -- Neil Cerutti from pyparsing import * Neil - Ok, here is the step-by-step, beginning with your posted BNF. (Based on your test cases, I think the '{}'s are really supposed to be '()'s.) ; WAE ::= ; num ; | { + WAE WAE } ; | { - WAE WAE } ; | {with {id WAE} WAE} ; | id The most basic building blocks in pyparsing are Literal and Word. With these, you compose compound elements using And and MatchFirst, which are bound to the operators '+' and '|' (on occasion, Or is required, bound to operator '^', but not for this simple parser). Since you have a recursive grammar, you will also need Forward. Whitespace is skipped implicitly. Only slightly more advanced is the Group class, which will impart hierarchy and structure to the results - otherwise, everything just comes out as one flat list of tokens. You may be able to remove these in the final parser, depending on your results after steps 1 and 2 in the left for the student part below, but they are here to help show structure of the parsed tokens. As convenience functions go, I think the most common are oneOf and delimitedList. oneOf might be useful here if you want to express id as a single-char variable; otherwise, just use Word(alphas). At this point you should be able to write a parser for this WAE grammar. Like the following 9-liner: LPAR = Literal(().suppress() RPAR = Literal()).suppress() wae = Forward() num = Word(nums) id = oneOf( list(alphas) ) addwae = Group( LPAR + + + wae + wae + RPAR ) subwae = Group( LPAR + - + wae + wae + RPAR ) withwae = Group( LPAR + with + LPAR + id + wae + RPAR + wae + RPAR ) wae (addwae | subwae | withwae | num | id) tests = \ 3 (+ 3 4) (with (x (+ 5 5)) (+ x x)).splitlines() for t in tests: print t waeTree = wae.parseString(t) print waeTree.asList() print If you extract and run this script, here are the results: 3 ['3'] (+ 3 4) [['+', '3', '4']] (with (x (+ 5 5)) (+ x x)) [['with', 'x', ['+', '5', '5'], ['+', 'x', 'x']]] Left as an exercise for the student: 1. Define classes NumWAE, IdWAE, AddWAE, SubWAE, and WithWAE whose __init__ methods take a ParseResults object named tokens (which you can treat as a list of tokens), and each with a calc() method to evaluate them accordingly. 2. Hook each class to the appropriate WAE class using setParseAction. Hint: here is one done for you: num.setParseAction(NumWAE) 3. Modify the test loop to insert an evaluation of the parsed tree. Extra credit: why is id last in the set of alternatives defined for the wae expression? -- Paul - Hide quoted text - - Show quoted text - Oops, that should be a 10-liner - I forgot the from pyparsing import * line. -- Paul -- http://mail.python.org/mailman/listinfo/python-list