Hi,
In my MapReduce program, I have my model defined in Avro and have been
using the AvroParquet input/output format classes to write Parquet
files from the Avro model. I have faced no issues with that.
I'm being told that using an Avro model to write Parquet is
inefficient and that writing directly to Parquet is a better option.
I have two questions:
1) What are the advantages, if any, of writing directly to Parquet and not
through Avro?
2) The majority of the material on the web about Parquet covers writing
it through one of the available WriteSupport implementations, such as
Avro's. Are there any examples of, or pointers to, code that writes and
reads Parquet files directly?

PS: My data model is not very complex. It has a bunch of primitives, some
Maps (String -> Number) and Lists (of Strings). No multi-level nested
structures.
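For reference, here is a minimal sketch of what "writing directly" usually looks like in parquet-mr (parquet-java). ParquetWriter always takes a WriteSupport, so instead of using AvroWriteSupport you implement your own that pushes values straight into Parquet's RecordConsumer. This assumes parquet-hadoop 1.8 or later (org.apache.parquet packages) and hadoop-common on the classpath. DirectParquetWriter, MyRecord, the field names and the schema are hypothetical stand-ins for a model like the one above, not an existing API.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.api.WriteSupport;
import org.apache.parquet.io.api.Binary;
import org.apache.parquet.io.api.RecordConsumer;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

final class DirectParquetWriter {
  // Hypothetical record type: a primitive, a String -> Number map, a list of Strings.
  static final class MyRecord {
    final long id;
    final Map<String, Double> metrics; // map values are assumed to be non-null
    final List<String> tags;
    MyRecord(long id, Map<String, Double> metrics, List<String> tags) {
      this.id = id; this.metrics = metrics; this.tags = tags;
    }
  }

  // Hypothetical schema using Parquet's standard three-level MAP and LIST layouts.
  static final MessageType SCHEMA = MessageTypeParser.parseMessageType(
      "message my_record {"
      + "  required int64 id;"
      + "  optional group metrics (MAP) {"
      + "    repeated group key_value { required binary key (UTF8); required double value; }"
      + "  }"
      + "  optional group tags (LIST) {"
      + "    repeated group list { required binary element (UTF8); }"
      + "  }"
      + "}");

  // WriteSupport that emits each record's values directly to the RecordConsumer.
  static final class MyWriteSupport extends WriteSupport<MyRecord> {
    private RecordConsumer out;
    @Override public WriteContext init(Configuration conf) {
      return new WriteContext(SCHEMA, Collections.<String, String>emptyMap());
    }
    @Override public void prepareForWrite(RecordConsumer recordConsumer) {
      this.out = recordConsumer;
    }
    @Override public void write(MyRecord r) {
      out.startMessage();
      out.startField("id", 0);
      out.addLong(r.id);
      out.endField("id", 0);
      // Null collections are omitted entirely; empty ones are written as a group
      // with no entries, since Parquet rejects a started field with no values.
      if (r.metrics != null) {
        out.startField("metrics", 1);
        out.startGroup();
        if (!r.metrics.isEmpty()) {
          out.startField("key_value", 0);
          for (Map.Entry<String, Double> e : r.metrics.entrySet()) {
            out.startGroup();
            out.startField("key", 0);
            out.addBinary(Binary.fromString(e.getKey()));
            out.endField("key", 0);
            out.startField("value", 1);
            out.addDouble(e.getValue());
            out.endField("value", 1);
            out.endGroup();
          }
          out.endField("key_value", 0);
        }
        out.endGroup();
        out.endField("metrics", 1);
      }
      if (r.tags != null) {
        out.startField("tags", 2);
        out.startGroup();
        if (!r.tags.isEmpty()) {
          out.startField("list", 0);
          for (String t : r.tags) {
            out.startGroup();
            out.startField("element", 0);
            out.addBinary(Binary.fromString(t));
            out.endField("element", 0);
            out.endGroup();
          }
          out.endField("list", 0);
        }
        out.endGroup();
        out.endField("tags", 2);
      }
      out.endMessage();
    }
  }

  // Builder subclass so the usual ParquetWriter options (codec, block size) still apply.
  static final class Builder extends ParquetWriter.Builder<MyRecord, Builder> {
    Builder(Path path) { super(path); }
    @Override protected Builder self() { return this; }
    @Override protected WriteSupport<MyRecord> getWriteSupport(Configuration conf) {
      return new MyWriteSupport();
    }
  }
}
```

A writer would then be created with new DirectParquetWriter.Builder(path).build() and fed records with write(...); close() writes the file footer. To inspect the output, the example object model's GroupReadSupport is enough; a fully custom read path would need a ReadSupport that returns a RecordMaterializer with its own converters.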

Thanks & Regards
MK