You can write to Parquet using the RecordConsumer model. It's lower level, so not everyone will have an appetite for it, but it can be more efficient depending on your particular application.
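As a rough, untested sketch of what that looks like: you extend WriteSupport and drive the RecordConsumer yourself instead of going through Avro. Package names here assume a recent org.apache.parquet build (older releases use the bare parquet.* packages), and the Event class and its fields are just placeholders for whatever your record type is:

```java
import java.util.HashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.hadoop.api.WriteSupport;
import org.apache.parquet.io.api.Binary;
import org.apache.parquet.io.api.RecordConsumer;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

// Hypothetical record type, standing in for your own model class.
class Event {
    String name;
    long count;
    Event(String name, long count) { this.name = name; this.count = count; }
}

// WriteSupport that writes Event objects straight to the RecordConsumer,
// with no Avro (or other object model) in between.
public class EventWriteSupport extends WriteSupport<Event> {

    // Parquet schema declared directly, instead of being derived from an Avro schema.
    private static final MessageType SCHEMA = MessageTypeParser.parseMessageType(
        "message Event {\n" +
        "  required binary name (UTF8);\n" +
        "  required int64 count;\n" +
        "}");

    private RecordConsumer recordConsumer;

    @Override
    public WriteContext init(Configuration configuration) {
        return new WriteContext(SCHEMA, new HashMap<String, String>());
    }

    @Override
    public void prepareForWrite(RecordConsumer recordConsumer) {
        this.recordConsumer = recordConsumer;
    }

    @Override
    public void write(Event record) {
        recordConsumer.startMessage();

        recordConsumer.startField("name", 0);
        recordConsumer.addBinary(Binary.fromString(record.name));
        recordConsumer.endField("name", 0);

        recordConsumer.startField("count", 1);
        recordConsumer.addLong(record.count);
        recordConsumer.endField("count", 1);

        recordConsumer.endMessage();
    }
}
```

From a mapreduce job you'd point ParquetOutputFormat at it with ParquetOutputFormat.setWriteSupportClass, or wrap it in a ParquetWriter if you're writing files outside of a job. Maps and lists work the same way, just with startGroup()/endGroup() around the nested fields.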
On Tue, Apr 7, 2015 at 6:22 PM, Alex Levenson <[email protected]> wrote:
> You have to write to parquet through *some* object model. Whether it's
> thrift, avro, or plain java objects, you need some way to represent a
> schema. While using plain java objects might seem more direct, the plain
> java object support is done via reflection, so using avro makes more sense
> when you've already got an avro schema.
>
> Does that make sense?
>
> On Tue, Apr 7, 2015 at 6:13 PM, Karthikeyan Muthukumar <[email protected]> wrote:
>
> > Hi,
> > In my mapreduce program, I have my model defined in Avro and have been
> > using the AvroParquet Input/Output format classes to serialize Parquet
> > files with the Avro model. I have faced no issues with that.
> > I'm being told that using an Avro model and writing to Parquet is
> > inefficient and that writing directly to Parquet is a better option.
> > I have two questions:
> > 1) What are the advantages, if any, of writing directly to Parquet and
> > not through Avro?
> > 2) The majority of the material on the web about Parquet is about writing
> > to Parquet using one of the available WriteSupport classes like Avro. Are
> > there any examples/pointers to code related to writing/reading direct
> > Parquet files?
> >
> > PS: My data model is not very complex. It has a bunch of primitives, some
> > Maps (String -> Number) and Lists (of Strings). No multi-level nested
> > structures.
> >
> > Thanks & Regards
> > MK
>
>
> --
> Alex Levenson
> @THISWILLWORK
