Hi,

It seems that Avro is poised to become "the" file format. Is that still the 
case?

We've looked at Text, RCFile and Avro. Text is nice, but we'd really need to 
extend it. RCFile is great for Hive, but it has been a challenge using it 
outside of Hive. Avro has a great feature set, but in our testing it was 
significantly slower and larger on disk than RCFile. Still, if it has the 
highest rate of development, it may be the right choice.

If you were choosing a file format today to build a general-purpose cluster 
(general-purpose in the sense of using all the Hadoop tools, not just Hive), 
what would you choose? Developing a custom format is one of the options.

Thanks,

Mike
