A few days ago, I posted my unfinished rendition of a PEG parser
generator on github, accessible at http://github.com/bmillare/simple-peg/
(Note, if you are looking for a working lib for parsing expression
grammars (PEGs), please refer to http://www.lithinos.com/clj-peg/ ) I
find writing parsers to be interesting but find writing parser
generators to be quite challenging. I hope sharing my experiences will
provide some value to the clojure community but ultimately I wish to
complete a useful open source project so we can use it in our
projects, so I am seeking feedback.

If you are unfamiliar with PEGs, please check the wiki, but in a
nutshell, they provide another way to describe parsers without
ambiguity because of the "ordered-choice" operator instead of "or".

As far as implementation goes, I took the most straightforward
approach I could think of. I started by taking a look at the wikipedia
definition for a PEG and defined the operators.

Then, I used the idea mentioned by Xach about generating a tree of
closures instead of using eval or compile. I wrote a make-parser
function that takes a sequence that describes the PEG in an s-
expression like format, and returns the head function of a closure
tree that parses sequences according to the PEG. This has the
advantage of not having to recompile unnecessarily and avoid using the
slow eval operator.

The PEG sequence format is defined as follows: The first item is a
keyword with a name that is the same as the operator of the same name.
The remaining items in the sequence are either terminal, like a
character or an object, or non-terminals, which is just another PEG
sequence. To make things more convenient, I defined a set of macros
that generate the PEG sequence in a way that is less verbose then
typing out the full name of the operator. The advantage of using this
method is the sequence can be generated by other clojure code which is
more flexible then describing the PEG as a string, which then has to
be parsed. Although, it should be quite easy to add that functionality
since we would only need a function that generates the sequences given
a string.

Lastly, I implemented a macro that to the user, acts like a let form
but for PEGs. PEGs usually are described using a set of named non-
terminals and the non-terminals can refer to other non-terminals. This
is slightly more complicated than a simple let form because PEGs can
have cyclic dependencies where as let forms cannot. letfn's can refer
to each other freely so I wrapped the PEG sequences in functions. The
user sees PEGs being binded to names but in actuality, functions are
bound to names, and those names used in the PEGs are transformed into
their function calls using symbol macros. To prevent stack overflows
with cycles, all the functions return lazy sequences. The macro then
returns the parser function that takes an input sequence and outputs
either the rest of the unparsed input, or nil if it fails. Note that
this means if it outputs the empty sequence, it succeeded because the
empty sequence is distinct from nil.

I would like to say everything works correctly but currently PEGs with
cycles create slightly incorrect parsers. I can't seem to find the
problem, the binding macro seems to generate the proper code. Running
the make-parser on simple PEGs seems to work fine. I've decorated many
of the functions with println's to help provide some debugging info. I
suppose I could use some of the trace libs to make this easier.

After this problem is solved, I plan on implementing a way to
interleave user provided clojure code to be executed as the parser
parses the input or output the AST (which I believe can be implemented
as the default code to be run as the parser parses)

If it sounds like I'm asking others to help me debug my code, that's
because that's exactly what I'm doing. Given that this is an open
source project, I don't see that being a problem. I'm trying out the
"if you feel you're ready to release, you've released to late"
philosophy.

Best,
Brent Millare

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to