Hi,

It seems that Avro is poised to become "the" file format. Is that still the case?
We've looked at Text, RCFile and Avro. Text is nice, but we'd really need to extend it. RCFile is great for Hive, but it has been a challenge to use outside of Hive. Avro has a great feature set, but in our testing it was significantly slower and larger on disk than RCFile. Still, if it has the highest rate of development, it may be the right choice.

If you were choosing a file format today to build a general-purpose cluster (general purpose in the sense of using all the Hadoop tools, not just Hive), what would you choose? (One of the choices being development of a custom format.)

Thanks,
Mike