Here is a compliant JSON parser in 99 LOC, implemented with *pex, a new parsing library*. [1]
I'm making public an early version of pex [2], a parsing library in the PEG family. This is alpha software. For those of you familiar with Lua's LPEG library, pex is similar, but with a Clojure-y twist. Like LPEG, pex uses a virtual machine to execute grammar rules (as opposed to parser combinators or packrat parsing). Pex operates on traditional character types (as opposed to generic sequences or data structures).

Here is another tiny example grammar, which matches floating-point numbers:

    (def Number
      '{number     [digits (? fractional) (? exponent)]
        fractional ["." digits]
        exponent   ["e" (? (/ "+" "-")) digits]
        digits     [(class num) (* (class num))]})

The only other input this particular grammar needs is a definition of the `num` character class. (There is an interface you can implement to match characters, plus several helpers; I'm planning to ship several common classes out of the box.) You also need to tell the grammar compiler which rule to start with (`number`).

The grammar format supports user-defined *macros*, which let you hide a lot of boilerplate or write higher-order rules. For example, it's very common to chew whitespace after rules, so hiding that behind a macro is useful.

There are also *captures* and *actions* that operate on a virtual "value stack". For example, while parsing a JSON array, you push all the captured values from the array onto the stack, then reduce them into a vector with an action.

It's very early, but pex's completely unoptimized engine can parse a 1.5MB JSON file in ~58ms, versus ~9ms for Cheshire (backed by Jackson, a handwritten, highly tuned JSON parser with many thousands of lines of code behind it). I plan to close that gap by a) implementing some of LPEG's compiler optimizations and b) improving some of the terribly naive implementations in the parser. The win here is *high expressive power per unit of performance*, not raw performance.

Internally, the grammar data structure is analyzed, compiled into special parsing bytecode, and then run inside a virtual machine [3].
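For readers who haven't looked inside LPEG, here is a rough sketch of what "compiled into parsing bytecode and run inside a virtual machine" can look like for the Number grammar. It is Java, like pex's VM [3], but the instruction set (`CHAR`, `DIGIT`, `CHOICE`, `COMMIT`, `CALL`, `RETURN`, `END`), the class name `NumberVm` and the hand-compiled program are assumptions modeled on the LPEG parsing machine, not pex's actual bytecode. `DIGIT` stands in for `(class num)` and accepts ASCII digits only. The `/* n */` comments are instruction addresses, which the jump targets refer to.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical LPEG-style parsing machine, hand-compiled for the Number grammar.
// Illustration only: this is not pex's real instruction set.
final class NumberVm {
    enum Op { CHAR, DIGIT, CHOICE, COMMIT, CALL, RETURN, END }

    static final class Instr {
        final Op op;
        final char ch;     // literal for CHAR
        final int target;  // jump target for CHOICE, COMMIT, CALL
        Instr(Op op, char ch, int target) { this.op = op; this.ch = ch; this.target = target; }
    }

    // Stack entry: a backtrack point (pc, pos) or a return address (pos == -1).
    static final class Entry {
        final int pc, pos;
        Entry(int pc, int pos) { this.pc = pc; this.pos = pos; }
        boolean isBacktrack() { return pos >= 0; }
    }

    private static Instr simple(Op op) { return new Instr(op, '\0', -1); }
    private static Instr lit(char c) { return new Instr(Op.CHAR, c, -1); }
    private static Instr jump(Op op, int target) { return new Instr(op, '\0', target); }

    static final Instr[] PROGRAM = {
        /*  0 */ jump(Op.CALL, 2),     // start rule: number
        /*  1 */ simple(Op.END),
        // number <- digits fractional? exponent?
        /*  2 */ jump(Op.CALL, 22),
        /*  3 */ jump(Op.CHOICE, 6),   // fractional?
        /*  4 */ jump(Op.CALL, 10),
        /*  5 */ jump(Op.COMMIT, 6),
        /*  6 */ jump(Op.CHOICE, 9),   // exponent?
        /*  7 */ jump(Op.CALL, 13),
        /*  8 */ jump(Op.COMMIT, 9),
        /*  9 */ simple(Op.RETURN),
        // fractional <- "." digits
        /* 10 */ lit('.'),
        /* 11 */ jump(Op.CALL, 22),
        /* 12 */ simple(Op.RETURN),
        // exponent <- "e" ("+" / "-")? digits
        /* 13 */ lit('e'),
        /* 14 */ jump(Op.CHOICE, 20),  // optional sign
        /* 15 */ jump(Op.CHOICE, 18),  // "+" / "-"
        /* 16 */ lit('+'),
        /* 17 */ jump(Op.COMMIT, 19),
        /* 18 */ lit('-'),
        /* 19 */ jump(Op.COMMIT, 20),
        /* 20 */ jump(Op.CALL, 22),
        /* 21 */ simple(Op.RETURN),
        // digits <- num num*
        /* 22 */ simple(Op.DIGIT),
        /* 23 */ jump(Op.CHOICE, 26),  // num* loop
        /* 24 */ simple(Op.DIGIT),
        /* 25 */ jump(Op.COMMIT, 23),
        /* 26 */ simple(Op.RETURN),
    };

    // Returns how many characters of s the grammar matches from the start, or -1 on failure.
    static int match(String s) {
        Deque<Entry> stack = new ArrayDeque<>();
        int pc = 0, i = 0;
        while (true) {
            Instr ins = PROGRAM[pc];
            boolean failed = false;
            switch (ins.op) {
                case CHAR:
                    if (i < s.length() && s.charAt(i) == ins.ch) { i++; pc++; } else failed = true;
                    break;
                case DIGIT: // stands in for (class num), ASCII digits only
                    if (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') { i++; pc++; } else failed = true;
                    break;
                case CHOICE: stack.push(new Entry(ins.target, i)); pc++; break;
                case COMMIT: stack.pop(); pc = ins.target; break; // drop the backtrack point
                case CALL:   stack.push(new Entry(pc + 1, -1)); pc = ins.target; break;
                case RETURN: pc = stack.pop().pc; break;
                case END:    return i;
            }
            if (failed) {
                // Unwind: discard return addresses until a backtrack point restores (pc, i).
                Entry e;
                do {
                    if (stack.isEmpty()) return -1;
                    e = stack.pop();
                } while (!e.isBacktrack());
                pc = e.pc;
                i = e.pos;
            }
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "3.14e-2", "1e", "abc"}) {
            System.out.println(s + " -> " + match(s));
        }
    }
}
```

In this sketch, ordered choice, `?` and `*` all compile to the same `CHOICE`/`COMMIT` pair. `CHOICE` pushes a backtrack point, `COMMIT` discards it once the alternative succeeds, and any failure unwinds the stack to the most recent backtrack point. Running `main` prints `42 -> 2`, `3.14e-2 -> 7`, `1e -> 1` and `abc -> -1`. Note that `1e` matches only `1`, because the incomplete exponent backtracks rather than failing the whole parse.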
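The captures-and-actions idea is easier to see with a concrete stack. Below is a hypothetical Java sketch in which every name is illustrative, not pex API. A small hand-written recursive-descent parser stands in for grammar rules such as `array <- "[" (value ("," value)*)? "]"`. A capture pushes each parsed integer onto a shared value stack. When an array closes, its action pops every value above the mark recorded at `[` and pushes a single list in their place. Ordered choice also truncates the stack when it backtracks, so a failed branch leaves no stray values behind. The sketch handles only integers and nested arrays, with no whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a value stack driven by captures and actions.
// Hand-written recursive descent stands in for grammar rules; none of this is pex API.
final class ValueStackDemo {
    private final String s;
    private int i = 0;
    private final List<Object> values = new ArrayList<>(); // the value stack

    private ValueStackDemo(String s) { this.s = s; }

    // value <- array / integer   (ordered choice; a failed branch drops its captures)
    private boolean value() {
        int savedPos = i, savedSize = values.size();
        if (array()) return true;
        i = savedPos;
        values.subList(savedSize, values.size()).clear();
        return integer();
    }

    // array <- "[" (value ("," value)*)? "]"   with a "reduce to list" action
    private boolean array() {
        if (!lit('[')) return false;
        int mark = values.size(); // this array's captures start here
        if (value()) {
            while (lit(',')) {
                if (!value()) return false;
            }
        }
        if (!lit(']')) return false;
        // Action: pop every value above the mark and push one List in their place.
        List<Object> top = values.subList(mark, values.size());
        List<Object> items = new ArrayList<>(top);
        top.clear();
        values.add(items);
        return true;
    }

    // integer <- [0-9]+   with a capture that pushes the parsed Long
    private boolean integer() {
        int start = i;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
        if (i == start) return false;
        values.add(Long.parseLong(s.substring(start, i)));
        return true;
    }

    private boolean lit(char c) {
        if (i < s.length() && s.charAt(i) == c) { i++; return true; }
        return false;
    }

    static Object parse(String s) {
        ValueStackDemo p = new ValueStackDemo(s);
        if (!p.value() || p.i != s.length() || p.values.size() != 1) {
            throw new IllegalArgumentException("no parse: " + s);
        }
        return p.values.get(0);
    }

    public static void main(String[] args) {
        System.out.println(parse("[1,[2,3],4]")); // prints [1, [2, 3], 4]
    }
}
```

Because each array only reduces the values above its own mark, nesting falls out naturally. The inner `[2,3]` collapses into one list before the outer array's action runs.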
Hope you find this useful in your data-munging endeavors. Next up: CSV and EDN example parsers, performance tuning, better grammar debugging, and more docs and tests. I welcome any feedback.

[1] https://github.com/ghadishayban/pex/blob/master/src/com/champbacon/pex/examples/json.clj#L7-L39
[2] https://github.com/ghadishayban/pex
[3] https://github.com/ghadishayban/pex/blob/master/src-java/com/champbacon/pex/impl/PEGByteCodeVM.java#L247-L280
_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg