What is it that makes you think JSON has to be inefficient?

- Repeated value parsing?
- Redundant data labels?
- Generic "parsing must be slow" prejudice?

On 5/22/08 1:54 PM, "Stuart Sierra" <[EMAIL PROTECTED]> wrote:

> Hello, I'm still getting my head around how Hadoop works. A survey
> question: what kind of serialization do you use to output structured
> data from your map/reduce jobs?
>
> When both key and value are primitive types, either TextOutputFormat
> or SequenceFileOutputFormat is easy. But what if you want to store a
> more complex data structure as the value?
>
> By "complex data structure," I mean some combination of lists/arrays,
> dictionaries/hashes, and primitives (int, float, string, boolean,
> null).
>
> I've tried using JSON to store structured data in TextOutputFormat,
> which works but is not very efficient. Any better suggestions?
>
> Thanks, all,
> -Stuart
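For what it's worth, here is a minimal, self-contained sketch of the alternative to JSON-in-Text: a compact binary encoding following the write(DataOutput)/readFields(DataInput) pattern that a custom Hadoop Writable uses. It deliberately uses only java.io (no Hadoop on the classpath) so it runs standalone; the RecordValue class and its fields are made up for illustration, and a real version would implement org.apache.hadoop.io.Writable and be stored as the value in a SequenceFile.

```java
import java.io.*;
import java.util.*;

// Hypothetical "complex" value: a string, a list of ints, and a
// string-to-double map -- the kind of structure the question describes.
public class RecordValue {
    String name;
    List<Integer> counts;
    Map<String, Double> scores;

    // Mirrors Writable.write(DataOutput): collections are length-prefixed.
    void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(counts.size());
        for (int c : counts) out.writeInt(c);
        out.writeInt(scores.size());
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeDouble(e.getValue());
        }
    }

    // Mirrors Writable.readFields(DataInput): read fields back in the same order.
    void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        int n = in.readInt();
        counts = new ArrayList<>();
        for (int i = 0; i < n; i++) counts.add(in.readInt());
        int m = in.readInt();
        scores = new HashMap<>();
        for (int i = 0; i < m; i++) scores.put(in.readUTF(), in.readDouble());
    }

    public static void main(String[] args) throws IOException {
        RecordValue v = new RecordValue();
        v.name = "example";
        v.counts = Arrays.asList(1, 2, 3);
        v.scores = new HashMap<>();
        v.scores.put("a", 0.5);

        // Round-trip through an in-memory buffer, the way Hadoop would
        // round-trip the value through a SequenceFile.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        v.write(new DataOutputStream(buf));

        RecordValue w = new RecordValue();
        w.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(w.name + " " + w.counts + " " + w.scores);
    }
}
```

Note there are no field names or delimiters on the wire, which is where the size win over repeated JSON labels comes from; the cost is that reader and writer must agree on the field order.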