Eh, I'm still intuitively opposed to pull parsing. Okay, so there are some useful libraries these days... if you are using the right language. What if you're using Ruby and don't want to depend on native C code, just as an example? Seems like we want to arrive at something easy enough to interpret _anywhere_, since you never know where you'll need to process MARC.

Simply reading a list of JSON records one at a time -- it seems like it's not too much to ask for a solution that doesn't require complicated code that has only been written for some platforms. This doesn't seem like a hard enough problem that we should have to resort to complicated solutions like JSON pull parsers.

Newline-delimited is certainly one simple solution, even though the aggregate file is not valid JSON. Does that matter? I'm not sure there are any simple solutions that still give you valid JSON, but if there aren't, I'd rather sacrifice valid JSON (for which it's unclear there's any important use case anyway) than sacrifice simplicity.
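To make the simplicity argument concrete: here's a minimal sketch of reading newline-delimited MARC-in-JSON with nothing but Ruby's standard JSON library. The helper name `each_ndj_record` and the sample records are hypothetical; the reader assumes only one JSON object per line.

```ruby
require 'json'
require 'stringio'

# Yield one parsed record per line of newline-delimited JSON.
# No pull parser, no C extension -- just the stdlib.
def each_ndj_record(io)
  io.each_line do |line|
    line = line.strip
    next if line.empty?
    yield JSON.parse(line)
  end
end

# Usage with an in-memory "file" of two (toy) MARC-in-JSON records:
ndj = StringIO.new(<<~NDJ)
  {"leader":"00000cam a2200000 a 4500","fields":[]}
  {"leader":"00000cam a2200000 a 4500","fields":[]}
NDJ

count = 0
each_ndj_record(ndj) { |rec| count += 1 }
```

The whole reader is a handful of lines and would port almost verbatim to any language with a line-oriented file API and a plain JSON parser.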

On 12/1/2011 2:47 PM, Bill Dueber wrote:
I was a strong proponent of NDJ at one point, but I've grown less strident
and more weary since then.

Brad Baxter has a good overview of some options[1]. I'm assuming it's a
given we'd all prefer to work with valid JSON files if the pain-point can
be brought down far enough.

A couple years have passed since we first talked about this stuff, and the
state of JSON pull-parsers is better than it once was:

   * yajl[2] is a super-fast C library for parsing JSON and supports stream
parsing. Bindings for Ruby, Node, Python, and Perl are linked from the
home page. I found one PHP binding[3] on GitHub, which is broken/abandoned,
and no other pull-parser for PHP that I can find. Sadly, the Ruby wrapper
doesn't actually expose the callbacks necessary for pull-parsing, although
there is a pull request[4] and at least one other option[5].
   * Perl's JSON::XS supports incremental parsing
   * the Jackson Java library[6] is excellent and has an easy-to-use
pull-parser. There are a few simplistic efforts to wrap it for JRuby/Jython
use as well.

Pull-parsing is ugly, but no longer astoundingly difficult or slow, with
the possible exception of PHP. And output is simple enough.

As much as it makes me shudder, I think we're probably better off trying to
do pull parsers and have a marc-in-json document be a valid JSON array.

We could easily adopt a *convention* of, essentially, one-record-per-line,
but wrap it in '[]' to make it valid JSON. That would allow folks with a
pull-parser to write a real streaming reader, and folks without to "cheat"
(ditch the leading and trailing [], and read the rest as
one-record-per-line) until such a time as they can start using a more
full-featured JSON parser.
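The "cheat" described above could be sketched like this in Ruby, using only the stdlib. This assumes the proposed convention (one record per line, the whole file wrapped in '[' and ']' on their own lines); the helper name `each_wrapped_record` and the sample records are hypothetical.

```ruby
require 'json'
require 'stringio'

# Line-oriented reader for the bracket-wrapped convention: skip the
# '[' and ']' lines, drop the trailing comma on each record line,
# and parse each line as a standalone JSON object.
def each_wrapped_record(io)
  io.each_line do |line|
    line = line.strip
    next if line == '[' || line == ']' || line.empty?
    yield JSON.parse(line.chomp(','))
  end
end

doc = <<~WRAPPED
  [
  {"leader":"00000cam a2200000 a 4500","fields":[]},
  {"leader":"00000nam a2200000 a 4500","fields":[]}
  ]
WRAPPED

leaders = []
each_wrapped_record(StringIO.new(doc)) { |rec| leaders << rec['leader'] }
```

The same file is also a valid JSON array, so anyone with a full JSON parser can just do `JSON.parse(doc)` and get the identical records.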

1.
http://en.wikipedia.org/wiki/User:Baxter.brad/Drafts/JSON_Document_Streaming_Proposal
2. http://lloyd.github.com/yajl/
3. https://github.com/sfalvo/php-yajl
4. https://github.com/brianmario/yajl-ruby/pull/50
5. http://dgraham.github.com/json-stream/
6. http://wiki.fasterxml.com/JacksonHome



On Thu, Dec 1, 2011 at 12:56 PM, Michael B. Klein<mbkl...@gmail.com>  wrote:

+1 to marc-in-json
+1 to newline-delimited records
+1 to read support
+1 to edsu, rsinger, BillDueber, gmcharlt, and the other module maintainers

On Thu, Dec 1, 2011 at 9:31 AM, Keith Jenkins<k...@cornell.edu>  wrote:

On Thu, Dec 1, 2011 at 11:56 AM, Gabriel Farrell<gsf...@gmail.com>  wrote:
> I suspect newline-delimited will win this race.

Yes.  Everyone please cast a vote for newline-delimited JSON.

Is there any consensus on the appropriate MIME type for NDJ?

Keith
