Re: [CODE4LIB] MARCXML - What is it for?

MJ Suhonos Mon, 25 Oct 2010 13:17:42 -0700

JSON++

I routinely re-index about 2.5M JSON records (originally from binary MARC), and 
it's several orders of magnitude faster than XML (measured in single-digit 
minutes rather than double-digit hours).  I'm not sure if it's in the same 
range as binary MARC, but as Tim says, it's plenty fast enough for pragmatic 
purposes.


Unfortunately JSON doesn't have as many mature tools for manipulation as XML 
(yet?), but I'd be inclined to call it the best of both worlds rather than a 
middle-ground or compromise.
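
For illustration, here is a minimal Python sketch of why plain JSON is easy to work with using only standard tools. It assumes one record per line, laid out roughly in the shape of the proposal linked in the quoted message below (a "leader" string plus a "fields" list of one-key objects). The sample records and the helper names (iter_records, subfield_values) are made up for this example and are not taken from any real indexer.

```python
import json

# Hypothetical sample data: one MARC record per line, serialized as JSON.
# The layout (leader + list of single-key field objects) is an assumption.
SAMPLE_LINES = [
    '{"leader": "00000nam a2200000 a 4500", "fields": ['
    '{"001": "rec1"}, '
    '{"245": {"ind1": "1", "ind2": "0", "subfields": ['
    '{"a": "First title /"}, {"c": "A. Author."}]}}]}',
    '{"leader": "00000nam a2200000 a 4500", "fields": ['
    '{"001": "rec2"}, '
    '{"245": {"ind1": "0", "ind2": "0", "subfields": ['
    '{"a": "Second title."}]}}]}',
]


def iter_records(lines):
    """Parse one JSON record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def subfield_values(record, tag, code):
    """Yield each value of subfield `code` in data fields tagged `tag`."""
    for field in record.get("fields", []):
        for field_tag, value in field.items():
            # Control fields (e.g. 001) hold a plain string, not a dict.
            if field_tag != tag or not isinstance(value, dict):
                continue
            for subfield in value.get("subfields", []):
                if code in subfield:
                    yield subfield[code]


# Collect every 245$a (title) across the sample records.
titles = [
    title
    for record in iter_records(SAMPLE_LINES)
    for title in subfield_values(record, "245", "a")
]
```

With real data, iter_records would be handed an open file object instead of the in-memory list, so records are parsed one line at a time rather than loaded all at once.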

MJ

> Marc in JSON can be a nice middle-ground: faster/smaller than MarcXML 
> (although still probably not as fast/small as binary), based on a standard 
> low-level data format so easier to work with using existing tools (and 
> developers' eyes) than binary, and with no maximum record length. 
> There have been a couple of competing attempts to define a 
> marc-expressed-in-json 'standard', but none has really caught on yet. I like 
> Ross's latest attempt:  
> http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/
> 
> Patrick Hochstenbach wrote:
>> Dear Nate,
>> 
>> There is a trade-off: if you want very fast processing of data -> go for 
>> binary data. If you want to share your data globally and easily in many 
>> (not necessarily library-related) environments -> go for XML/RDF. Open your 
>> data and do both :-)
>> 
>> Pat
>> 
>> Sent from my iPhone
>> 
>> On 25 Oct 2010, at 20:39, "Nate Vack" <[email protected]> wrote:
>> 
>>  
>>> Hi all,
>>> 
>>> I've just spent the last couple of weeks delving into and decoding a
>>> binary file format. This, in turn, got me thinking about MARCXML.
>>> 
>>> In a nutshell, it looks like it's supposed to contain the exact same
>>> data as a normal MARC record, except in XML form. As in, it should be
>>> round-trippable.
>>> 
>>> What's the advantage to this? I can see using a human-readable format
>>> for poorly-documented file formats -- they're relatively easy to read
>>> and understand. But MARC is well, well-documented, with more than one
>>> free implementation in cursory searching. And once you know a binary
>>> file's format, it's no harder to parse than XML, and the data's
>>> smaller and processing faster.
>>> 
>>> So... why the XML?
>>> 
>>> Curious,
>>> -Nate
>>>    
>> 
>>
