I came across this: http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/. You can take a look.
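For reference, writing Avro records into Parquet with parquet-mr (so the Avro schema ends up in the Parquet file metadata, readable later via AvroParquetReader) can be sketched roughly like this. This is only a sketch, assuming parquet-avro and hadoop-client jars are on the classpath; the output path and the record schema here are made up for illustration:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
// parquet-mr 1.x (pre-Apache) package name; newer releases use org.apache.parquet.avro
import parquet.avro.AvroParquetWriter;

public class AvroToParquet {
    public static void main(String[] args) throws Exception {
        // Illustrative Avro schema (made up for this sketch)
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"name\",\"type\":\"string\"}]}");

        // AvroParquetWriter stores the Avro schema string in the Parquet
        // footer metadata, so non-Spark readers can marshall records
        // without shipping the schema separately.
        AvroParquetWriter<GenericRecord> writer =
            new AvroParquetWriter<GenericRecord>(new Path("users.parquet"), schema);
        try {
            GenericRecord record = new GenericData.Record(schema);
            record.put("id", 1L);
            record.put("name", "jerry");
            writer.write(record);
        } finally {
            writer.close();
        }
    }
}
```

Any AvroParquetReader (outside Spark SQL) can then open users.parquet and recover the schema from the file's metadata.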
On Fri Jan 09 2015 at 12:08:49 PM Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote:

> I have a similar requirement: I want to push Avro data into Parquet, but
> it seems you have to do it on your own. The parquet-mr project uses
> Hadoop to do so. I am trying to write a Spark job to do something
> similar.
>
> On Fri, Jan 9, 2015 at 3:20 AM, Jerry Lam <chiling...@gmail.com> wrote:
>
>> Hi Spark users,
>>
>> I'm using Spark SQL to create Parquet files on HDFS. I would like to
>> store the Avro schema in the Parquet metadata so that non-Spark-SQL
>> applications can marshall the data without the Avro schema, using the
>> Avro Parquet reader. Currently, schemaRDD.saveAsParquetFile does not
>> allow this. Is there another API that lets me do it?
>>
>> Best Regards,
>>
>> Jerry