This looks really useful! I have a project going right now which uses regular expressions as a sub-matcher for PEGs (I modified a GLL parser-generator to have PEG semantics, and it uses a regular expression-based lexer), and I've been running into some bugs with the interface between the two, so if you do remember any of those citations I'd be interested to see them.
Cheers, Aaron Moss From: PEG [mailto:peg-boun...@lists.csail.mit.edu] On Behalf Of goodman....@gmail.com Sent: Wednesday, September 29, 2021 23:26 To: peg@lists.csail.mit.edu Subject: [PEG] pe: a new PEG implementation for Python EXTERNAL EMAIL: This email originated from outside the University of Portland. Do not click links or open attachments unless you recognize the sender and know the content is safe. ________________________________ Hello all, I'd like to announce the release of 'pe', a new PEG parser for Python. While Python has no shortage of PEG parsers already, I think pe has enough interesting features that it's worth mentioning here. https://github.com/goodmami/pe<https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fgoodmami%2Fpe&data=04%7C01%7Cmossa%40up.edu%7C4ffd36014b6b4c1da06208d983db3f44%7Cea8f3949231c40b6a33f56873af96f87%7C0%7C0%7C637685800729543305%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&sdata=Oj74pXG%2Fc6yjwibfTvhV9l7w10BQqLX4WFlndGt2JAc%3D&reserved=0> It's "new" in that I've never properly announced it anywhere, but I've been developing and using it for several years. Here's a summary: * Low learning curve for users: the API is closely modeled after Python's re module and the metasyntax is a superset of Ford's original with only a few extensions * Two options for parser engines: (1) a pure-Python memoizing recursive-descent parser and (2) a compiled Cython (currently non-memoizing) state machine parser inspired by Medeiros and Ierusalimschy, 2008 and LPeg * It's fast (by Python standards): aggressive grammar optimizations and low overhead show it significantly faster than LALR (SLY, Lark) and PEG (Parsimonious, PyParsing) parsers in several tests (todo: more tests) I achieved the speed in the compiled Cython parser rather conventionally: static typing, reducing function calls, etc., along with some basic grammar optimizations and some state machine optimizations from Medeiros and Ierusalimschy. In the pure-Python parser, I sped things up primarily by replacing large portions of the grammar with equivalent regular expressions, for which Python has a backend written in C. However, constantly switching between Python and C code adds overhead and can actually make things *slower*, so I put as much into a single regex as I can by first inlining everything in the grammar that isn't recursive and doesn't have a semantic effect on the expression (captures, etc.). In addition, to replicate PEG behavior in a regular expression, I avoided backtracking with a combination of lookaheads and backreferences (I think there's an academic citation for this technique, but I'm sorry I don't have it now). The resulting regexes (and optimized grammar) are utterly illegible, but they perform well. Finally, I'm not concerned with drastically changing the AST because I don't produce ASTs: any grammar transformation that accepts the same strings and has the same semantic effect is fair game. Thanks for reading. I welcome any comments or questions. -- -Michael Wayne Goodman
_______________________________________________ PEG mailing list PEG@lists.csail.mit.edu https://lists.csail.mit.edu/mailman/listinfo/peg