This looks really useful!

I have a project going right now that uses regular expressions as a 
sub-matcher for PEGs (I modified a GLL parser generator to have PEG semantics, 
and it uses a regular-expression-based lexer). I've been running into some 
bugs at the interface between the two, so if you do remember any of those 
citations, I'd be interested to see them.

Cheers,
Aaron Moss

From: PEG [mailto:peg-boun...@lists.csail.mit.edu] On Behalf Of 
goodman....@gmail.com
Sent: Wednesday, September 29, 2021 23:26
To: peg@lists.csail.mit.edu
Subject: [PEG] pe: a new PEG implementation for Python

Hello all,

I'd like to announce the release of 'pe', a new PEG parser for Python. While 
Python has no shortage of PEG parsers already, I think pe has enough 
interesting features that it's worth mentioning here.

https://github.com/goodmami/pe

It's "new" in that I've never properly announced it anywhere, but I've been 
developing and using it for several years. Here's a summary:

* Low learning curve for users: the API is closely modeled after Python's re 
module (see the sketch after this list) and the metasyntax is a superset of 
Ford's original with only a few extensions

* Two options for parser engines: (1) a pure-Python memoizing recursive-descent 
parser and (2) a compiled Cython (currently non-memoizing) state machine parser 
inspired by Medeiros and Ierusalimschy, 2008 and LPeg

* It's fast (by Python standards): aggressive grammar optimizations and low 
overhead make it significantly faster than LALR (SLY, Lark) and PEG 
(Parsimonious, PyParsing) parsers in several tests (todo: more tests)
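
A rough sketch of the re-style usage: the exact names here (pe.match, 
pe.compile) and the grammar snippet are illustrative only, so check the 
README for the real API.

    import pe

    # Module-level matching, analogous to re.match(pattern, string);
    # the pattern is a PEG expression rather than a regex.
    m = pe.match(r'[0-9]+ "." [0-9]+', '3.14 and then some')

    # Compiling a multi-rule grammar up front, analogous to re.compile().
    float_parser = pe.compile(r'''
        Float    <- Integer Fraction?
        Integer  <- "-"? [0-9]+
        Fraction <- "." [0-9]+
    ''')
    m = float_parser.match('-1.5')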

I achieved the speed in the compiled Cython parser rather conventionally: 
static typing, reducing function calls, etc., along with some basic grammar 
optimizations and some state machine optimizations from Medeiros and 
Ierusalimschy.

In the pure-Python parser, I sped things up primarily by replacing large 
portions of the grammar with equivalent regular expressions, for which Python 
has a backend written in C. However, constantly switching between Python and C 
code adds overhead and can actually make things *slower*, so I put as much into 
a single regex as possible by first inlining everything in the grammar that isn't 
recursive and doesn't have a semantic effect on the expression (captures, 
etc.). In addition, to replicate PEG behavior in a regular expression, I 
avoided backtracking with a combination of lookaheads and backreferences (I 
think there's an academic citation for this technique, but I'm sorry I don't 
have it now). The resulting regexes (and optimized grammar) are utterly 
illegible, but they perform well. Finally, I'm not concerned with drastically 
changing the AST because I don't produce ASTs: any grammar transformation that 
accepts the same strings and has the same semantic effect is fair game.
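
For the curious, the core of that lookahead/backreference trick can be shown 
with plain re, independent of pe's (much hairier) generated patterns: matching 
a repetition inside a capturing lookahead and then consuming the capture with 
a backreference makes the repetition possessive, which is how PEG repetition 
behaves.

    import re

    # Ordinary regex repetition backtracks: 'a*' gives back one 'a' so the
    # final 'a' can still match, and the whole pattern succeeds.
    print(re.fullmatch(r'a*a', 'aaa'))          # matches 'aaa'

    # PEG's "a"* "a" can never succeed: the repetition consumes every 'a'
    # and never gives any back. To emulate that, match the repetition inside
    # a lookahead, capture it, and consume the capture with a backreference;
    # re does not backtrack into a lookahead once it has succeeded.
    print(re.fullmatch(r'(?=(a*))\1a', 'aaa'))  # None, same as the PEG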

Thanks for reading. I welcome any comments or questions.

--
-Michael Wayne Goodman
_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg
