I've been working on the sister project to JSON Template, "JSON Pattern" (yes, I need some non-bland names).
http://code.google.com/p/json-pattern/

JSON Template takes a dictionary -> string, and this takes a string -> dictionary. You can describe it roughly as annotating a big regular expression with a (JSON) tree structure.

There are a bunch of things that still need polishing; the main issue is making the syntax as readable as possible. I've gone through about 3 iterations there, with some more tweaks to make. Suggestions welcome.

Simple example of parsing "ls -al" (subpattern definitions are omitted here; the next 2 links have them):

http://chubot.org/json-pattern/test-cases/testLs_NewSyntax.html

Mini tutorial that explains the parts:

http://chubot.org/json-pattern/test-cases/testMiniTutorial.html

Parsing a big Perforce change description (from Google's open source work; scroll to the end for the big pattern and a nice hierarchical structure):

http://chubot.org/json-pattern/test-cases/testFullChangeDesc_NewSyntax.html

Summary

* Like JSON Template, it's meant to be a language-independent specification
* Can be built on top of any regex engine, even JavaScript's relatively weak one
* The API is data, rather than a procedural API
* ~1000 lines of code, so it's easy to port, yet still powerful
* A well-defined (and fast) execution model
* Readable syntax (still improving here). Regular expressions are very powerful, but hobbled by their obscure and inconsistent syntax.
* A small number of orthogonal concepts
  * Blocks (e.g. for expressing repeated capture)
  * Filters (extensible through the host language)
  * Subpatterns (a pattern reuse mechanism)
* Composes with other components
  * The interpreter implements a binary operator (I think of it like ~=)
  * You can easily imagine a pipeline of text -> JSON Pattern -> structured data manipulation -> JSON Template -> text

What does this add over regular expressions?

* The ability to capture named, hierarchical data structures.
  Regular expressions can only capture flat data, and in some engines like JavaScript, the data can't be named.
* Can capture integers and booleans, not just strings, via filters.
* Reuse of regular expressions. This is fairly common in practice, e.g. when writing ad hoc lexers.
* More readable syntax, using line prefixes.

Applications

* Exposing system stats from command line tools over the network, e.g. web services for system administration
* Quick and dirty parsing of some network formats, like DNS, HTTP headers, etc.
* Parsing little languages like *itself* and JSON Template. This should be possible, since there are no operators with precedence and such.

Caveats

* In most cases you wouldn't use this for HTML scraping. For HTML scraping, you want something that knows about the tree structure of the document, like jQuery's selector language.

TODO

* Allow filters to stop the match by returning None
* Subpatterns can also be filters, for structure refinement! (both are functions from string -> JSON). Like the "Templates as Formatters" idea, this language turned out to be unexpectedly rich.
* Perhaps allow hooks for executing procedural code, not just filters (Perl does this in a messy way).
* Embedding a library of patterns in "JSON Config"
* Need lots of docs!
* Code cleanup, test cleanup
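To make "annotating a big regular expression with a (JSON) tree structure" a bit more concrete, here is a minimal Python sketch of the idea. It is NOT JSON Pattern's actual syntax or API: the `{field:subpattern|filter}` placeholder notation, the `("line", ...)` / `("repeat", ...)` steps, and the names `FILTERS`, `SUBPATTERNS`, `compile_line`, `match_line` and `parse` are all hypothetical. It applies the three concepts from the summary to a simplified `ls -al` listing: subpatterns expand into named groups, filters turn captured strings into integers, and a "repeat" step plays the role of a block that collects repeated captures into a list.

```python
import re
from typing import Any, Callable, Dict, List, Optional

# Hypothetical filters: functions from a captured string to a JSON value.
FILTERS: Dict[str, Callable[[str], Any]] = {
    "int": int,
    "bool": lambda s: s.lower() in ("true", "yes", "1"),  # unused below
}

# Hypothetical subpatterns: named regex fragments that patterns can reuse.
SUBPATTERNS: Dict[str, str] = {
    "perms": r"[-dlrwxsStT]{10}",
    "num": r"\d+",
    "word": r"\S+",
}

# Placeholder syntax assumed for this sketch: {field:subpattern} or {field:subpattern|filter}
PLACEHOLDER = re.compile(r"\{(\w+):(\w+)(?:\|(\w+))?\}")


def compile_line(template: str):
    """Expand each placeholder into a named group and remember which fields get filters."""
    filters: Dict[str, Callable[[str], Any]] = {}

    def expand(m):
        field, sub, filt = m.group(1), m.group(2), m.group(3)
        if filt:
            filters[field] = FILTERS[filt]
        return "(?P<%s>%s)" % (field, SUBPATTERNS[sub])

    return re.compile(PLACEHOLDER.sub(expand, template)), filters


def match_line(compiled, line: str) -> Optional[Dict[str, Any]]:
    """Match one whole line, then apply filters to the captured strings."""
    regex, filters = compiled
    m = regex.fullmatch(line)
    if m is None:
        return None
    return {k: (filters[k](v) if k in filters else v) for k, v in m.groupdict().items()}


def parse(pattern: List[tuple], text: str) -> Optional[Dict[str, Any]]:
    """Run a pattern (a list of steps) over the lines of text; None means no match."""
    lines = text.splitlines()
    result: Dict[str, Any] = {}
    i = 0
    for step in pattern:
        if step[0] == "line":  # exactly one line; captures merge into the current dict
            captured = match_line(compile_line(step[1]), lines[i]) if i < len(lines) else None
            if captured is None:
                return None
            result.update(captured)
            i += 1
        elif step[0] == "repeat":  # a "block": zero or more lines, each becoming a dict in a list
            compiled = compile_line(step[2])
            items = []
            while i < len(lines):
                captured = match_line(compiled, lines[i])
                if captured is None:
                    break
                items.append(captured)
                i += 1
            result[step[1]] = items
    return result if i == len(lines) else None  # all input must be consumed


LS_PATTERN = [
    ("line", r"total {total:num|int}"),
    ("repeat", "files",
     r"{perms:perms} +{links:num|int} +{owner:word} +{group:word} +{size:num|int}"
     r" +\S+ +\S+ +\S+ +{name:word}"),
]

SAMPLE = """total 8
drwxr-xr-x 2 alice staff 4096 Jan 5 10:00 .
-rw-r--r-- 1 alice staff   42 Jan 5 10:01 notes.txt"""

if __name__ == "__main__":
    print(parse(LS_PATTERN, SAMPLE))
```

Because the pattern here is plain data (lists, tuples and strings), it could in principle be serialized as JSON and interpreted by a port in another language, in the spirit of the "API is data" point. Matching strictly line by line is a simplification chosen for brevity.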

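The text -> JSON Pattern -> structured data manipulation -> JSON Template -> text pipeline can be sketched with the Python standard library alone, here on HTTP-style header lines. In this sketch plain `re` stands in for JSON Pattern and `string.Template` stands in for JSON Template; neither project's real API is used, and `parse_headers`, `normalize` and `render` are hypothetical names. Note that the first stage has to assemble the list of headers by hand, since `groupdict()` returns one flat dictionary per match.

```python
import re
from string import Template

# Stage 1, standing in for JSON Pattern: text -> structured data.
# Plain `re` yields one flat dict per match, so the list of headers is built by hand.
HEADER_LINE = re.compile(r"^(?P<name>[A-Za-z-]+):[ \t]*(?P<value>.*)$", re.MULTILINE)


def parse_headers(text):
    return {"headers": [m.groupdict() for m in HEADER_LINE.finditer(text)]}


# Stage 2: structured data manipulation (lower-case the names, strip values, sort by name).
def normalize(data):
    cleaned = [{"name": h["name"].lower(), "value": h["value"].strip()} for h in data["headers"]]
    return {"headers": sorted(cleaned, key=lambda h: h["name"])}


# Stage 3, standing in for JSON Template: structured data -> text.
ROW = Template("$name = $value")


def render(data):
    return "\n".join(ROW.substitute(h) for h in data["headers"])


RAW = "Host: example.com\nContent-Type: text/html\nAccept:  */*\n"

if __name__ == "__main__":
    print(render(normalize(parse_headers(RAW))))
```

Each stage only sees plain dictionaries and lists, so any stage can be swapped out independently, which is the kind of composition the pipeline idea is after.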