Thanks Jacques and Alex.
I have been successfully using the Avro model to write Parquet files and find that quite logical, because Avro is quite rich.
Are there any functional or performance impacts of using Avro-model-based Parquet files, specifically w.r.t. accessing the generated Parquet files through other tools like Drill, SparkSQL, etc.?

Thanks & Regards
MK

On Tue, Apr 7, 2015 at 9:30 PM, Jacques Nadeau <[email protected]> wrote:

> You can write to Parquet using the RecordConsumer model. It's lower level,
> so not everyone will have the appetite for it, but it can be more efficient
> depending on your particular application.
>
> On Tue, Apr 7, 2015 at 6:22 PM, Alex Levenson <[email protected]> wrote:
>
> > You have to write to parquet through *some* object model. Whether it's
> > thrift, avro, or plain java objects, you need some way to represent a
> > schema. While using plain java objects might seem more direct, the plain
> > java object support is done via reflection, so using avro makes more sense
> > when you've already got an avro schema.
> >
> > Does that make sense?
> >
> > On Tue, Apr 7, 2015 at 6:13 PM, Karthikeyan Muthukumar <[email protected]> wrote:
> >
> > > Hi,
> > > In my mapreduce program, I have my model defined in Avro and have been
> > > using the AvroParquet Input/Output format classes to serialize Parquet
> > > files with the Avro model. I have faced no issues with that.
> > > I'm being told that using an Avro model and writing to Parquet is
> > > inefficient and that writing directly to Parquet is a better option.
> > > I have two questions:
> > > 1) What are the advantages, if any, of writing directly to Parquet and
> > > not through Avro?
> > > 2) The majority of the material on the web about Parquet is about writing
> > > to Parquet using one of the available WriteSupport implementations, like
> > > Avro. Are there any examples/pointers to code related to writing/reading
> > > Parquet files directly?
> > >
> > > PS: My data model is not very complex. It has a bunch of primitives, some
> > > Maps (String -> Number) and Lists (of Strings). No multi-level nested
> > > structures.
> > >
> > > Thanks & Regards
> > > MK
> >
> > --
> > Alex Levenson
> > @THISWILLWORK
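
For readers wondering what "lower level" means for the RecordConsumer model mentioned in the thread, here is a rough sketch of the event sequence a writer would emit for a record shaped like the one MK describes (a primitive, a list of strings, a String -> Number map). To keep it runnable without Parquet on the classpath, `LoggingConsumer` is a hypothetical stand-in that only logs calls; its method names are modelled on Parquet's `RecordConsumer`, except `addString`, which is an assumption standing in for `addBinary`. The field names (`id`, `tags`, `counts`) and the plain repeated-field layout are also assumptions for illustration, not the annotated LIST/MAP layout that the Avro write support produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Parquet's RecordConsumer: it only records each
// call as a string, so the sketch runs without any Parquet dependency.
class LoggingConsumer {
    final List<String> events = new ArrayList<>();

    void startMessage() { events.add("startMessage"); }
    void endMessage() { events.add("endMessage"); }
    void startField(String name, int index) { events.add("startField(" + name + ", " + index + ")"); }
    void endField(String name, int index) { events.add("endField(" + name + ", " + index + ")"); }
    void startGroup() { events.add("startGroup"); }
    void endGroup() { events.add("endGroup"); }
    void addLong(long value) { events.add("addLong(" + value + ")"); }
    // Assumption: the real API takes a Binary via addBinary; a String keeps this simple.
    void addString(String value) { events.add("addString(" + value + ")"); }
}

class RecordEventsSketch {
    // Emits one record: id (long), tags (repeated string), counts (repeated key/value group).
    static List<String> emit(long id, List<String> tags, Map<String, Long> counts) {
        LoggingConsumer c = new LoggingConsumer();
        c.startMessage();

        c.startField("id", 0);
        c.addLong(id);
        c.endField("id", 0);

        // A field with no values is skipped entirely rather than opened and closed empty.
        if (!tags.isEmpty()) {
            c.startField("tags", 1);
            for (String tag : tags) {
                c.addString(tag); // one value per list element
            }
            c.endField("tags", 1);
        }

        if (!counts.isEmpty()) {
            c.startField("counts", 2);
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                c.startGroup(); // one group per map entry
                c.startField("key", 0);
                c.addString(e.getKey());
                c.endField("key", 0);
                c.startField("value", 1);
                c.addLong(e.getValue());
                c.endField("value", 1);
                c.endGroup();
            }
            c.endField("counts", 2);
        }

        c.endMessage();
        return c.events;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("clicks", 7L);
        // Prints the emitted events, one per line.
        emit(42L, Arrays.asList("a", "b"), counts).forEach(System.out::println);
    }
}
```

With the real library, calls like these would live inside a custom `WriteSupport` implementation, and the schema would be declared separately as a Parquet message type. Whether that extra work pays off over the Avro route depends on the application, as Jacques notes.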
