A few days ago, I posted my unfinished rendition of a PEG parser generator on github, accessible at http://github.com/bmillare/simple-peg/ (Note, if you are looking for a working lib for parsing expression grammars (PEGs), please refer to http://www.lithinos.com/clj-peg/ ) I find writing parsers to be interesting but find writing parser generators to be quite challenging. I hope sharing my experiences will provide some value to the clojure community but ultimately I wish to complete a useful open source project so we can use it in our projects, so I am seeking feedback.
If you are unfamiliar with PEGs, please check the wiki, but in a nutshell, they provide another way to describe parsers without ambiguity because of the "ordered-choice" operator instead of "or". As far as implementation goes, I took the most straightforward approach I could think of. I started by taking a look at the wikipedia definition for a PEG and defined the operators. Then, I used the idea mentioned by Xach about generating a tree of closures instead of using eval or compile. I wrote a make-parser function that takes a sequence that describes the PEG in an s- expression like format, and returns the head function of a closure tree that parses sequences according to the PEG. This has the advantage of not having to recompile unnecessarily and avoid using the slow eval operator. The PEG sequence format is defined as follows: The first item is a keyword with a name that is the same as the operator of the same name. The remaining items in the sequence are either terminal, like a character or an object, or non-terminals, which is just another PEG sequence. To make things more convenient, I defined a set of macros that generate the PEG sequence in a way that is less verbose then typing out the full name of the operator. The advantage of using this method is the sequence can be generated by other clojure code which is more flexible then describing the PEG as a string, which then has to be parsed. Although, it should be quite easy to add that functionality since we would only need a function that generates the sequences given a string. Lastly, I implemented a macro that to the user, acts like a let form but for PEGs. PEGs usually are described using a set of named non- terminals and the non-terminals can refer to other non-terminals. This is slightly more complicated than a simple let form because PEGs can have cyclic dependencies where as let forms cannot. letfn's can refer to each other freely so I wrapped the PEG sequences in functions. The user sees PEGs being binded to names but in actuality, functions are bound to names, and those names used in the PEGs are transformed into their function calls using symbol macros. To prevent stack overflows with cycles, all the functions return lazy sequences. The macro then returns the parser function that takes an input sequence and outputs either the rest of the unparsed input, or nil if it fails. Note that this means if it outputs the empty sequence, it succeeded because the empty sequence is distinct from nil. I would like to say everything works correctly but currently PEGs with cycles create slightly incorrect parsers. I can't seem to find the problem, the binding macro seems to generate the proper code. Running the make-parser on simple PEGs seems to work fine. I've decorated many of the functions with println's to help provide some debugging info. I suppose I could use some of the trace libs to make this easier. After this problem is solved, I plan on implementing a way to interleave user provided clojure code to be executed as the parser parses the input or output the AST (which I believe can be implemented as the default code to be run as the parser parses) If it sounds like I'm asking others to help me debug my code, that's because that's exactly what I'm doing. Given that this is an open source project, I don't see that being a problem. I'm trying out the "if you feel you're ready to release, you've released to late" philosophy. Best, Brent Millare -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en