Hello, I'm still getting my head around how Hadoop works.  A survey
question: what kind of serialization do you use to output structured
data from your map/reduce jobs?

When both key and value are primitive types, either TextOutputFormat
or SequenceFileOutputFormat is easy.  But what if you want to store a
more complex data structure as the value?

By "complex data structure," I mean some combination of lists/arrays,
dictionaries/hashes, and primitives (int, float, string, boolean,
null).

I've tried using JSON to store structured data in TextOutputFormat,
which works but is not very efficient.  Any better suggestions?
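
For concreteness, the JSON approach looks roughly like the sketch
below. It is written in plain Python rather than against the Hadoop
API, and it only imitates the line layout that TextOutputFormat
produces by default (key, tab, value, newline). The function names
and the sample record are made up for illustration.

```python
import json


def format_record(key, value):
    """Render one output line: key, a tab, the JSON-encoded value, a newline."""
    # Compact separators drop optional whitespace; sort_keys keeps the
    # output deterministic. json.dumps escapes tabs and newlines inside
    # strings, so the encoded value always stays on a single line.
    encoded = json.dumps(value, separators=(",", ":"), sort_keys=True)
    return f"{key}\t{encoded}\n"


def parse_record(line):
    """Split a line at the first tab and decode the JSON value."""
    key, encoded = line.rstrip("\n").split("\t", 1)
    return key, json.loads(encoded)


# Hypothetical value mixing lists, dictionaries, and primitives.
record = {
    "tags": ["a", "b"],
    "counts": {"x": 1},
    "score": 0.5,
    "active": True,
    "parent": None,
}

line = format_record("doc-1", record)
```

Every record repeats its field names as text and has to be re-parsed
when it is read back, which is where the overhead comes from.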

Thanks, all,
-Stuart
