Re: parser recommendation
On 6 Jun., 01:58, Alan Isaac <[EMAIL PROTECTED]> wrote:
> One other possibility: SimpleParse (for speed).
> URL: http://simpleparse.sourceforge.net/
> It is very nice.
>
> Alan Isaac

How does SimpleParse handle left factoring, left recursion, and other ambiguities? For example, according to [1] there are two non-terminals, UNICODEESCAPEDCHAR_16 and UNICODEESCAPEDCHAR_32, with an identical initial section of four tokens. How does SimpleParse detect when it has to use the second production?

[1] http://simpleparse.sourceforge.net/simpleparse_grammars.html

--
http://mail.python.org/mailman/listinfo/python-list
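As a rough illustration of the common-prefix problem Kay describes, consider two rules that agree on their first several tokens, so one token of lookahead cannot choose between them. One standard resolution is ordered choice with backtracking: try the longer production first and fall back on failure. The two rule shapes below are hypothetical stand-ins for the simpleparse productions, not their actual definitions:

```python
import re

# Hypothetical stand-ins for the two productions: both begin with the
# same "\u" prefix and share at least four tokens of hex digits, so the
# rules are indistinguishable at the start.
ESCAPE_32 = re.compile(r"\\u[0-9a-fA-F]{8}")   # try the longer rule first
ESCAPE_16 = re.compile(r"\\u[0-9a-fA-F]{4}")

def parse_escape(s, pos=0):
    """Return (rule_name, matched_text) using ordered choice with fallback."""
    for name, rule in (("escape_32", ESCAPE_32), ("escape_16", ESCAPE_16)):
        m = rule.match(s, pos)
        if m:
            return name, m.group()
    raise SyntaxError("no production matches at position %d" % pos)
```

With eight hex digits available the longer production wins; otherwise the parser falls back to the four-digit rule.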
Re: parser recommendation
One other possibility: SimpleParse (for speed).
URL: http://simpleparse.sourceforge.net/

It is very nice.

Alan Isaac
parser recommendation
I have a project that uses a proprietary format, and I've been using regexes to extract information from it. I haven't hit any roadblocks yet, but I'd like to use a parsing library rather than maintain my own code base of complicated regexes. I've been intrigued by the parsers available in Python, which may add some much-needed flexibility.

I've briefly looked at PLY and pyparsing. There are several others, but too many to enumerate. My understanding is that PLY (although more difficult to use) has much more flexibility than pyparsing. I'm basically looking to make an informed choice, not just for this project but for the long haul. I'm not afraid of using a parser that is difficult to use or learn if it buys me something like portability (with other languages) or flexibility.

I've been to a few websites that enumerate the parsers, but they weren't all that helpful when it came to comparisons:

http://nedbatchelder.com/text/python-parsers.html
http://www.python.org/community/sigs/retired/parser-sig/towards-standard/

I'm not looking to start a flame war... I'd just like some honest opinions. ;)

thanks,
filipe
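For context, the kind of regex-based extraction being replaced might look something like this. The record layout and field names are invented for illustration; the original format is proprietary and not shown in the thread:

```python
import re

# Hypothetical record in a made-up proprietary format:
#   NAME=widget;QTY=12;PRICE=3.50
RECORD = re.compile(
    r"NAME=(?P<name>[^;]+);"
    r"QTY=(?P<qty>\d+);"
    r"PRICE=(?P<price>\d+(?:\.\d+)?)"
)

def extract(line):
    """Pull named fields out of one record line."""
    m = RECORD.match(line)
    if m is None:
        raise ValueError("unrecognized record: %r" % line)
    d = m.groupdict()
    d["qty"] = int(d["qty"])
    d["price"] = float(d["price"])
    return d
```

This works fine for flat records, but once fields nest or repeat, a pile of such patterns becomes the maintenance burden the poster describes, which is where a grammar-based parser starts to pay off.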
Re: parser recommendation
On Jun 3, 8:43 am, Filipe Fernandes <[EMAIL PROTECTED]> wrote:
> I've briefly looked at PLY and pyparsing. There are several others,
> but too many to enumerate. My understanding is that PLY (although
> more difficult to use) has much more flexibility than pyparsing. I'm
> basically looking to make an informed choice. Not just for this
> project, but for the long haul. I'm not afraid of using a difficult
> (to use or learn) parser either if it buys me something like
> portability (with other languages) or flexibility.

Short answer: try them both. The learning curve for pyparsing is about a day, maybe two, and if you are already familiar with regexes, PLY should not seem too much of a stretch. PLY parsers will probably run faster than pyparsing parsers, but I think pyparsing parsers will be quicker to work up and get running.

Longer answer: PLY is of the lex/yacc school of parsing libraries (PLY = Python Lex-Yacc). You use regular expressions to define terminal token specifications (a la lex), then use t_XXX and p_XXX methods to build up the parsing logic; docstrings in these methods capture the regex or BNF grammar definitions.

In contrast, pyparsing is of the combinator school of parsers. Within your Python code, you compose your parser using '+' and '|' operations, building it up from pyparsing classes such as Literal, Word, OneOrMore, Group, etc. Also, pyparsing is 100% Python, so you won't have any portability issues (I don't know about PLY).

Here is a link to a page with both a PLY and a pyparsing example (although not strictly a side-by-side comparison): http://www.rexx.com/~dkuhlman/python_201/
For comparison, here is a pyparsing version of the PLY parser on that page (this is a recursive grammar, not necessarily a good beginner's example for pyparsing):

===
term = Word(alphas, alphanums)
func_call = Forward()
func_call_list = Forward()
comma = Literal(",").suppress()
func_call_list << Group( func_call + Optional(comma + func_call_list) )
lpar = Literal("(").suppress()
rpar = Literal(")").suppress()
func_call << Group( term + lpar + Optional(func_call_list, default=[""]) + rpar )
command = func_call
prog = OneOrMore(command)
comment = "#" + restOfLine
prog.ignore( comment )
===

With the data set given at Dave Kuhlman's web page, here is the output:

[['aaa', ['']], ['bbb', [['ccc', ['', ['ddd', [['eee', ['']], [['fff', [['ggg', ['']], [['hhh', ['']], [['iii', ['']]

Pyparsing makes some judicious assumptions about how you will want to parse, the most significant being that whitespace can be ignored during parsing (this *can* be overridden in the parser definition). Pyparsing also supports token grouping (for building parse trees), parse-time callbacks (called 'parse actions'), and assigning names within subexpressions (called 'results names'), which really helps in working with the tokens returned from the parsing process.

If you learn both, you may find that pyparsing is a good way to quickly prototype a particular parsing problem, which you can then convert to PLY for performance if necessary. The pyparsing prototype will be an efficient way to work out what the grammar kinks are, so that when you get around to PLY-ifying it, you already have a clear picture of what the parser needs to do.

But, really, more flexible? I wouldn't say that is the big difference between the two.

Cheers,
-- Paul
(More pyparsing info at http://pyparsing.wikispaces.com.)
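The "combinator school" Paul describes can be imitated in a few lines of plain Python, purely to show the idea of composing parsers with '+' and '|' via operator overloading. This is a toy sketch, not pyparsing's actual implementation:

```python
class P:
    """Toy parser combinator: fn(s, i) returns (tokens, new_index) or None."""
    def __init__(self, fn):
        self.fn = fn

    def __add__(self, other):          # '+' means sequence
        def seq(s, i):
            r1 = self.fn(s, i)
            if r1 is None:
                return None
            v1, i1 = r1
            r2 = other.fn(s, i1)
            if r2 is None:
                return None
            v2, i2 = r2
            return v1 + v2, i2
        return P(seq)

    def __or__(self, other):           # '|' means ordered alternative
        def alt(s, i):
            return self.fn(s, i) or other.fn(s, i)
        return P(alt)

def Lit(t):
    """Match a literal string at the current position."""
    return P(lambda s, i: ([t], i + len(t)) if s.startswith(t, i) else None)

# Compose a small grammar the way pyparsing expressions are composed:
greeting = (Lit("hello") | Lit("hi")) + Lit(",") + Lit("world")
```

Calling `greeting.fn("hi,world", 0)` yields the matched token list and the position consumed, or None on failure; real combinator libraries add whitespace skipping, error reporting, and grouping on top of this skeleton.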
Re: parser recommendation
On 3 Jun., 15:43, Filipe Fernandes <[EMAIL PROTECTED]> wrote:
> I have a project that uses a proprietary format and I've been using
> regex to extract information from it. [...] I've briefly looked at
> PLY and pyparsing. There are several others, but too many to
> enumerate. [...] I'm not looking to start a flame war... I'd just
> like some honest opinions.. ;)

Trail [1], which comes with EasyExtend 3, is not on Batchelder's list. Trail is an EBNF-based, top-down, one-token-of-lookahead, non-backtracking parser which is strictly more powerful than LL(1). For LL(1) languages, Trail *is* an LL(1) parser.

Trail isn't well researched yet, so I can say little about its performance characteristics when parsing non-LL(1) languages. They vary somewhat with the size of the automata generated by Trail. There are also classes of grammars which are not accepted by Trail; only a few of them are known.

I used a Trail-based parser to replace the regular-expression-based tokenizer for Python tokenization in EasyExtend 3. There are definitely performance issues with the pure Python implementation, and Trail is an order of magnitude slower than tokenizer.py.
On the other hand, EBNF grammars are compositional, and one can easily add new rules.

[1] http://www.fiber-space.de/EasyExtend/doc/EE.html
    http://www.fiber-space.de/EasyExtend/doc/trail/Trail.html
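To make the LL(1) terminology concrete, here is a minimal recursive-descent parser for a toy nested-list grammar, where every decision is made from a single token of lookahead with no backtracking. The grammar is invented for illustration and has nothing to do with Trail's internals:

```python
import re

def tokenize(s):
    # One token per symbol: parentheses, commas, and names.
    return re.findall(r"[(),]|[A-Za-z_]\w*", s)

def parse_list(toks, i=0):
    """list := "(" [ item { "," item } ] ")"

    LL(1): each branch below is chosen by inspecting only toks[i]."""
    assert toks[i] == "(", "expected '('"
    i += 1
    items = []
    if toks[i] != ")":                  # lookahead: empty list or not?
        item, i = parse_item(toks, i)
        items.append(item)
        while toks[i] == ",":           # lookahead: more items?
            item, i = parse_item(toks, i + 1)
            items.append(item)
    assert toks[i] == ")", "expected ')'"
    return items, i + 1

def parse_item(toks, i):
    if toks[i] == "(":                  # lookahead: nested list
        return parse_list(toks, i)
    return toks[i], i + 1               # otherwise a plain name
```

A grammar is LL(1) precisely when such single-token decisions always suffice; rules sharing longer common prefixes (as in the SimpleParse question earlier in the thread) are what break this property.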
Re: parser recommendation
On Tue, Jun 3, 2008 at 10:41 AM, Paul McGuire <[EMAIL PROTECTED]> wrote:
> If you learn both, you may find that pyparsing is a good way to
> quickly prototype a particular parsing problem, which you can then
> convert to PLY for performance if necessary. The pyparsing prototype
> will be an efficient way to work out what the grammar kinks are, so
> that when you get around to PLY-ifying it, you already have a clear
> picture of what the parser needs to do.

Thanks (both Paul and Kay) for responding. I'm still looking at Trail in EasyExtend, and pyparsing is very nicely object-oriented, but PLY does seem to have the speed advantage, so I'm leaning towards PLY.

But I do have more questions... when reading the ply.py header (in 2.5) I found the following paragraph:

# The current implementation is only somewhat object-oriented. The
# LR parser itself is defined in terms of an object (which allows multiple
# parsers to co-exist). However, most of the variables used during table
# construction are defined in terms of global variables. Users shouldn't
# notice unless they are trying to define multiple parsers at the same
# time using threads (in which case they should have their head examined).

Now, I'm invariably going to have to use threads... I'm not exactly sure what the author is alluding to, but my guess is that to overcome this limitation I need to acquire a thread lock before defining/creating a parser object. Has anyone run into this issue?

This would definitely be a showstopper (for PLY anyway) if I couldn't create multiple parsers because of threads. I'm not saying I need more than one; I'm just not comfortable with that limitation. I have a feeling I'm just misunderstanding, since it doesn't seem to hold you back from creating multiple parsers under a single process.

filipe
Re: parser recommendation
On 3 Jun., 19:34, Filipe Fernandes <[EMAIL PROTECTED]> wrote:
> Now, I'm invariably going to have to use threads... I'm not exactly
> sure what the author is alluding to, but my guess is that to overcome
> this limitation I need to acquire a thread lock first before
> defining/creating a parser object before I can use it?

Nope. It just says that the parser-table construction itself relies on global state. But you will most likely build your parser offline, in a separate run.
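The "build offline" idea Kay mentions amounts to computing the expensive table once and merely loading it at run time. Sketched generically with the stdlib below; the cache-file name and table contents are placeholders, not PLY's actual table format (PLY manages its own table cache):

```python
import os
import pickle

TABLE_FILE = "parser.tab"   # hypothetical cache-file name

def build_table():
    """Stand-in for the expensive, global-state-using table construction."""
    return {"states": [0, 1, 2], "action": {(0, "NAME"): ("shift", 1)}}

def load_table():
    """Build once (e.g. in a separate offline run), then just unpickle."""
    if not os.path.exists(TABLE_FILE):
        with open(TABLE_FILE, "wb") as f:
            pickle.dump(build_table(), f)
    with open(TABLE_FILE, "rb") as f:
        return pickle.load(f)
```

Once the table exists on disk, threads only ever read the finished table, so the global state used during construction is never touched at run time.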
Re: parser recommendation
On Jun 3, 12:34 pm, Filipe Fernandes <[EMAIL PROTECTED]> wrote:
> Now, I'm invariably going to have to use threads... I'm not exactly
> sure what the author is alluding to, but my guess is that to overcome
> this limitation I need to acquire a thread lock first before
> defining/creating a parser object before I can use it?
>
> Has anyone run into this issue? This would definitely be a
> showstopper (for PLY anyway), if I couldn't create multiple parsers
> because of threads.

You can use pyparsing from any thread, and you can create multiple parsers each running in a separate thread, but you cannot concurrently use one parser from two different threads. Some users work around this by instantiating a separate parser per thread, using pickle to quickly construct the parser at thread start time.

-- Paul
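The workaround Paul describes can be sketched as follows: serialize the constructed parser once, then have each thread deserialize its own private copy at start time. A plain class stands in for a real pyparsing grammar here, since whether a given pyparsing object pickles cleanly is not something the thread confirms:

```python
import pickle
import threading

class ToyParser:
    """Stand-in for an expensively constructed, non-thread-shareable parser."""
    def __init__(self, sep):
        self.sep = sep
    def parse(self, text):
        return text.split(self.sep)

# Construct and serialize the parser once, up front...
BLUEPRINT = pickle.dumps(ToyParser(","))
results = {}

def worker(name, text):
    # ...then each thread unpickles a private copy, so no parser
    # instance is ever used concurrently from two threads.
    parser = pickle.loads(BLUEPRINT)
    results[name] = parser.parse(text)

threads = [threading.Thread(target=worker, args=("t%d" % i, "a,b,c"))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Unpickling is typically much cheaper than rebuilding the parser from its definition, which is the point of the trick.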
Re: parser recommendation
On Tue, Jun 3, 2008 at 1:53 PM, Kay Schluehr <[EMAIL PROTECTED]> wrote:
> Nope. It just says that the parser-table construction itself relies
> on global state. But you will most likely build your parser offline
> in a separate run.

Thanks, Kay, for the context. I misunderstood completely, but your last sentence, coupled with a few running examples, cleared things right up.

On Tue, Jun 3, 2008 at 4:36 PM, Paul McGuire <[EMAIL PROTECTED]> wrote:
> You can use pyparsing from any thread, and you can create multiple
> parsers each running in a separate thread, but you cannot
> concurrently use one parser from two different threads. Some users
> work around this by instantiating a separate parser per thread using
> pickle to quickly construct the parser at thread start time.

I didn't know that pyparsing wasn't thread-safe; I kind of just assumed it was because of its OO approach. Thanks for the workaround.

I haven't given up on pyparsing, although I'm now heavily leaning towards PLY as an end solution, since lex and yacc parsing is available on other platforms as well.

Thanks, Kay and Paul, for the advice. I'm still considering the first two parsers I started looking at, but I'm much more confident in the choices made.

filipe
Re: parser recommendation
On Jun 3, 2:55 pm, Filipe Fernandes <[EMAIL PROTECTED]> wrote:
> I haven't given up on pyparsing, although I'm now heavily leaning
> towards PLY as an end solution since lex and yacc parsing is
> available on other platforms as well.

Keep in mind that PLY's compatibility with yacc is functional, not syntactical. That is, you cannot take a yacc file, replace the actions with Python actions, and feed it to PLY. It's a shame that the Python world has no truly yacc-compatible parser like YAPP in the Perl world.