On Fri, Apr 12, 2013 at 5:55 PM, Brandon Bloom <brandon.d.bl...@gmail.com>wrote:
> Your readme says "I had difficulty getting his Parsing with Derivatives > technique to work in a performant way". I was wondering if you could > please elaborate. > > What kind of performance did you achieve? > How does that compare to the GLL parser you implemented? > Did you implement memoization/compaction/fixed-point/etc from the latest > research? > How do the implementations compare in terms of code size and readability? > > > "Parsing with Derivatives" is a very interesting algorithm: each time it processes a character, it creates a new grammar for the remainder of the text to be parsed. The grammar is then compacted. That process of creating a new grammar and then compacting it each time you process a character proved to be rather memory intensive, and I felt the time lag was pretty noticeable, even with only a few thousand characters. I did try to follow the latest research. I had quite a bit of difficulty getting the fixed point process to work right. The Racket sample code used a form of dynamic variable that they call "parameter" that doesn't map exactly to what Clojure allows you to do with vars. I tried a few approaches, but none quite worked the way I expected. I never got this implemented satisfactorily. Another complication is that in Racket, it's easier to create self-referential and mutually recursive data structures than in Clojure, and the reference implementation relied pretty heavily on this aspect of Racket. I was playing around with kotarak/lazymap as an attempt to solve the problem, and I did have some success with that. But perhaps the biggest problem occurred when I tried to work on the version of the parser that returns a parse tree (as opposed to just "recognizing" the input as satisfying the grammar). The way that process works is that with each transformation of the parser, you build up a huge continuation of what you plan to create when the end is reached. In Racket, the stack is effectively unlimited so this is no problem. But in Clojure, when you reach the end and try to execute that continuation -- boom, stack overflow. The length of the input was limited by the stack; that's no good. So I tried to build up the continuation in a way that could be executed in a trampoline, but with the laziness and multiple prongs of execution, it got ugly real fast. Never really got the stack overflow resolved completely, which made it difficult to test the Clojure version on realistic-sized inputs. I'm not saying these problems are insurmountable, I'm just saying that I really struggled with them. I was getting extremely frustrated and close to giving up when I found out about the GLL algorithm. The Derivatives code was definitely shorter and more readable, but I felt it was a lot easier to reason about the GLL algorithm and figure out how to tune it for Clojure and add features to it. I was able to quickly get it to the point where I could handle large inputs without stack overflow and, on extremely simple grammars, could parse 200,000 characters in a couple seconds on my crappy laptop. I knew that was fast enough to be useful for a lot of tasks and a night and day difference from what I had been achieving after investing far more effort on the Derivatives code. So I dropped the Derivatives implementation and didn't look back. -- -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.