I'm joining the conversation late, but another option is Hadoop's own
Record I/O. As with Thrift, you need compiler-generated stubs to
read and write records, and it also supports schemas. You can
de/serialize schemas separately from content, which gives you a lot of
flexibility.
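
For anyone who hasn't seen it, records are described in a small DDL
file that the rcc compiler translates into Java stub classes. A rough
sketch (the module and field names here are made up for illustration):

```
// logentry.jr -- hypothetical record definition for Hadoop Record I/O
module org.example.records {
    class LogEntry {
        ustring host;        // UTF-8 string
        long timestamp;
        vector<int> codes;   // repeated field
    }
}
```

Running rcc over a file like this generates a Java class with typed
getters/setters plus serialize/deserialize methods, which is where the
typing guarantees come from.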

> -----Original Message-----
> From: Bryan Duxbury [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, May 24, 2008 12:13 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Serialization format for structured data
> 
> 
> On May 23, 2008, at 9:51 AM, Ted Dunning wrote:
> > Relative to thrift, JSON has the advantage of not requiring a schema
> > as well as the disadvantage of not having a schema.  The advantage is
> > that the data is more fluid and I don't have to generate code to
> > handle the records.  The disadvantage is that I lose some data
> > completeness and typing guarantees.
> > On balance, I would like to use JSON-like data quite a bit in ad hoc
> > data streams and in logs where the producer and consumer of the data
> > are not visible to parts of the data processing chain.
> 
> That about sums it up. If you want schema, Thrift is your 
> friend. If you don't, JSON probably will do pretty well for you.
> 
> -Bryan
> 
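
To make Ted's trade-off concrete, here's a tiny sketch (plain Python,
hypothetical field names) of what schema-less JSON gives up. Any
well-formed document parses, so producers can evolve freely, but type
errors only surface at read time:

```python
import json

# No schema: this parses fine even though "count" arrived as a string
# instead of the int the consumer expected.
record = json.loads('{"host": "node1", "count": "7"}')

# The consumer has to defend against missing or mistyped fields itself;
# with generated stubs, the serializer would have enforced the type.
count = int(record.get("count", 0))
print(count)
```

With a stub-based format, that mismatch would have been rejected when
the record was written, not discovered downstream.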
