I think the AvroWriteSupport class already saves the Avro schema as part of the Parquet metadata. You could consider using parquet-mr <https://github.com/Parquet/parquet-mr> directly.
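As a minimal sketch of what using parquet-mr directly could look like: AvroParquetWriter (which uses AvroWriteSupport internally) writes Avro records and stores the Avro schema in the Parquet file footer. This assumes the pre-1.7 `parquet.avro` package names from that era; the record schema, field names, and output path here are hypothetical, for illustration only.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import parquet.avro.AvroParquetWriter

object AvroToParquetSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical Avro schema for illustration.
    val schemaJson =
      """{"type": "record", "name": "User", "fields": [
        |  {"name": "name", "type": "string"},
        |  {"name": "age",  "type": "int"}
        |]}""".stripMargin
    val schema = new Schema.Parser().parse(schemaJson)

    // AvroParquetWriter delegates to AvroWriteSupport, which writes the
    // Avro schema into the Parquet file metadata alongside the data.
    val writer =
      new AvroParquetWriter[GenericRecord](new Path("/tmp/users.parquet"), schema)
    try {
      val record = new GenericData.Record(schema)
      record.put("name", "alice")
      record.put("age", 30)
      writer.write(record)
    } finally {
      writer.close()
    }
  }
}
```

Any Avro-aware Parquet reader can then recover the schema from the file itself, without needing it supplied out of band.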
Raghavendra

On Fri, Jan 9, 2015 at 10:32 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Raghavendra,
>
> This makes a lot of sense. Thank you.
> The problem is that I'm using Spark SQL right now to generate the Parquet
> file.
>
> What I think I need to do is to use Spark directly, transform all rows
> from the SchemaRDD into Avro objects, and pass them to saveAsNewAPIHadoopFile
> (from PairRDD). From there, I can supply the Avro schema to Parquet via
> AvroParquetOutputFormat.
>
> It is not difficult, just not as simple as I would like, because a SchemaRDD
> can write to a Parquet file using its own schema; if I could supply the Avro
> schema to Parquet directly, it would save me the transformation step to Avro
> objects.
>
> I'm thinking of overriding the saveAsParquetFile method to allow me to
> persist the Avro schema inside the Parquet file. Is this possible at all?
>
> Best Regards,
>
> Jerry
>
>
> On Fri, Jan 9, 2015 at 2:05 AM, Raghavendra Pandey <
> raghavendra.pan...@gmail.com> wrote:
>
>> I came across this:
>> http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/. You can take
>> a look.
>>
>>
>> On Fri, Jan 09, 2015 at 12:08:49 PM, Raghavendra Pandey <
>> raghavendra.pan...@gmail.com> wrote:
>>
>>> I have a similar kind of requirement, where I want to push Avro data
>>> into Parquet, but it seems you have to do it on your own. There
>>> is the parquet-mr project, which uses Hadoop to do so. I am trying to
>>> write a Spark job to do a similar kind of thing.
>>>
>>> On Fri, Jan 9, 2015 at 3:20 AM, Jerry Lam <chiling...@gmail.com> wrote:
>>>
>>>> Hi Spark users,
>>>>
>>>> I'm using Spark SQL to create Parquet files on HDFS. I would like to
>>>> store the Avro schema in the Parquet metadata so that non-Spark-SQL
>>>> applications can unmarshall the data with the Avro Parquet reader
>>>> without needing the Avro schema separately. Currently,
>>>> schemaRDD.saveAsParquetFile does not allow me to do that. Is there
>>>> another API that lets me do this?
>>>>
>>>> Best Regards,
>>>>
>>>> Jerry
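The approach Jerry describes in his reply, mapping SchemaRDD rows to Avro GenericRecords and writing them through saveAsNewAPIHadoopFile with AvroParquetOutputFormat, could be sketched roughly as below. This is a hedged sketch, not a tested implementation: it assumes Spark 1.2-era APIs and the pre-1.7 `parquet.avro` package names, and the schema, field names, and paths are hypothetical.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // PairRDDFunctions implicits (Spark 1.2)
import parquet.avro.AvroParquetOutputFormat

object SchemaRddToAvroParquet {
  // Hypothetical Avro schema for illustration.
  val schemaJson =
    """{"type": "record", "name": "User", "fields": [
      |  {"name": "name", "type": "string"},
      |  {"name": "age",  "type": "int"}
      |]}""".stripMargin

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("avro-parquet"))
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    // Any SchemaRDD works here; parquetFile is just one way to obtain one.
    val rows = sqlContext.parquetFile("/input/users.parquet")

    // Map each Row to an Avro GenericRecord. The schema is re-parsed once
    // per partition because Avro's Schema does not ship well in closures.
    val records = rows.mapPartitions { iter =>
      val schema = new Schema.Parser().parse(schemaJson)
      iter.map { row =>
        val rec = new GenericData.Record(schema)
        rec.put("name", row.getString(0))
        rec.put("age", row.getInt(1))
        (null.asInstanceOf[Void], rec.asInstanceOf[GenericRecord])
      }
    }

    // Registering the schema on the job is what embeds it in the Parquet
    // file metadata, so Avro-aware readers can recover it later.
    val job = new Job(sc.hadoopConfiguration)
    AvroParquetOutputFormat.setSchema(job, new Schema.Parser().parse(schemaJson))

    records.saveAsNewAPIHadoopFile(
      "/output/users.parquet",
      classOf[Void],
      classOf[GenericRecord],
      classOf[AvroParquetOutputFormat],
      job.getConfiguration)

    sc.stop()
  }
}
```

The per-row conversion is the cost Jerry wanted to avoid; overriding saveAsParquetFile instead would mean injecting the Avro schema into the footer metadata that Spark SQL's own Parquet write path produces.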