Re: [ANN] Instaparse 1.0.0

Mark Engelberg Fri, 12 Apr 2013 22:48:08 -0700

On Fri, Apr 12, 2013 at 5:55 PM, Brandon Bloom <brandon.d.bl...@gmail.com>wrote:


> Your readme says "I had difficulty getting his Parsing with Derivatives
> technique to work in a performant way".  I was wondering if you could
> please elaborate.
>
> What kind of performance did you achieve?
> How does that compare to the GLL parser you implemented?
> Did you implement memoization/compaction/fixed-point/etc from the latest
> research?
> How do the implementations compare in terms of code size and readability?
>
>
>
"Parsing with Derivatives" is a very interesting algorithm: each time it
processes a character, it creates a new grammar for the remainder of the
text to be parsed.  The grammar is then compacted.  That process of
creating a new grammar and then compacting it each time you process a
character proved to be rather memory intensive, and I felt the time lag was
pretty noticeable, even with only a few thousand characters.

I did try to follow the latest research.  I had quite a bit of difficulty
getting the fixed point process to work right.  The Racket sample code used
a form of dynamic variable that they call "parameter" that doesn't map
exactly to what Clojure allows you to do with vars.  I tried a few
approaches, but none quite worked the way I expected.  I never got this
implemented satisfactorily.

Another complication is that in Racket, it's easier to create
self-referential and mutually recursive data structures than in Clojure,
and the reference implementation relied pretty heavily on this aspect of
Racket.  I was playing around with kotarak/lazymap as an attempt to solve
the problem, and I did have some success with that.

But perhaps the biggest problem occurred when I tried to work on the
version of the parser that returns a parse tree (as opposed to just
"recognizing" the input as satisfying the grammar).  The way that process
works is that with each transformation of the parser, you build up a huge
continuation of what you plan to create when the end is reached.  In
Racket, the stack is effectively unlimited so this is no problem.  But in
Clojure, when you reach the end and try to execute that continuation --
boom, stack overflow.  The length of the input was limited by the stack;
that's no good.

So I tried to build up the continuation in a way that could be executed in
a trampoline, but with the laziness and multiple prongs of execution, it
got ugly real fast.  Never really got the stack overflow resolved
completely, which made it difficult to test the Clojure version on
realistic-sized inputs.

I'm not saying these problems are insurmountable, I'm just saying that I
really struggled with them.  I was getting extremely frustrated and close
to giving up when I found out about the GLL algorithm.

The Derivatives code was definitely shorter and more readable, but I felt
it was a lot easier to reason about the GLL algorithm and figure out how to
tune it for Clojure and add features to it.  I was able to quickly get it
to the point where I could handle large inputs without stack overflow and,
on extremely simple grammars, could parse 200,000 characters in a couple
seconds on my crappy laptop.  I knew that was fast enough to be useful for
a lot of tasks and a night and day difference from what I had been
achieving after investing far more effort on the Derivatives code.  So I
dropped the Derivatives implementation and didn't look back.

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [ANN] Instaparse 1.0.0

Reply via email to