Re: Spark SQL: Storing AVRO Schema in Parquet

2015-01-11 Thread Raghavendra Pandey
I think the AvroWriteSupport class already saves the Avro schema as part of the Parquet metadata. You can think of using parquet-mr (https://github.com/Parquet/parquet-mr) directly. Raghavendra
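For reference, a minimal sketch of writing Avro records through parquet-mr directly. The User schema and output path here are made up for illustration, and it assumes the pre-Apache parquet.avro package that the 2015-era parquet-mr releases shipped; AvroWriteSupport embeds the Avro schema string in the Parquet footer's key/value metadata as it writes:

    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericData, GenericRecord}
    import org.apache.hadoop.fs.Path
    import parquet.avro.AvroParquetWriter

    // Hypothetical Avro schema for illustration.
    val schema = new Schema.Parser().parse(
      """{"type": "record", "name": "User", "fields": [
        |  {"name": "name", "type": "string"},
        |  {"name": "age", "type": "int"}
        |]}""".stripMargin)

    // AvroWriteSupport (used under the hood) stores the Avro schema
    // in the footer metadata, so readers need no external schema.
    val writer = new AvroParquetWriter[GenericRecord](
      new Path("hdfs:///tmp/users.parquet"), schema)

    val record = new GenericData.Record(schema)
    record.put("name", "alice")
    record.put("age", 30)
    writer.write(record)
    writer.close()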

Re: Spark SQL: Storing AVRO Schema in Parquet

2015-01-09 Thread Jerry Lam
Hi Raghavendra, this makes a lot of sense. Thank you. The problem is that I'm using Spark SQL right now to generate the Parquet file. What I think I need to do is use Spark directly, transform all the rows from the SchemaRDD into Avro objects, and supply them to saveAsNewAPIHadoopFile (from the …
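A sketch of what that could look like, not tested: it assumes the SchemaRDD from the post has (name: String, age: Int) columns, a hypothetical schema and output path, the 2015-era parquet.avro package, and the Spark 1.2 SchemaRDD API:

    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericData, GenericRecord}
    import org.apache.hadoop.mapreduce.Job
    import parquet.avro.AvroParquetOutputFormat

    // Hypothetical Avro schema matching the SchemaRDD's columns.
    val avroSchema = new Schema.Parser().parse(
      """{"type": "record", "name": "User", "fields": [
        |  {"name": "name", "type": "string"},
        |  {"name": "age", "type": "int"}
        |]}""".stripMargin)

    // Register the Avro schema with the output format so
    // AvroWriteSupport can store it in the footer metadata.
    val job = Job.getInstance(sc.hadoopConfiguration)
    AvroParquetOutputFormat.setSchema(job, avroSchema)

    // Convert each Row to an Avro record; the output format ignores the key.
    val avroRdd = schemaRDD.map { row =>
      val rec = new GenericData.Record(avroSchema)
      rec.put("name", row.getString(0))
      rec.put("age", row.getInt(1))
      (null: Void, rec: GenericRecord)
    }

    avroRdd.saveAsNewAPIHadoopFile(
      "hdfs:///tmp/users.parquet",
      classOf[Void],
      classOf[GenericRecord],
      classOf[AvroParquetOutputFormat],
      job.getConfiguration)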

Re: Spark SQL: Storing AVRO Schema in Parquet

2015-01-08 Thread Raghavendra Pandey
I came across this: http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/. You can take a look.

Re: Spark SQL: Storing AVRO Schema in Parquet

2015-01-08 Thread Raghavendra Pandey
I have a similar kind of requirement: I want to push Avro data into Parquet, but it seems you have to do it on your own. There is the parquet-mr project, which uses Hadoop to do so. I am trying to write a Spark job to do a similar kind of thing.

Spark SQL: Storing AVRO Schema in Parquet

2015-01-08 Thread Jerry Lam
Hi Spark users, I'm using Spark SQL to create Parquet files on HDFS. I would like to store the Avro schema in the Parquet metadata so that non-Spark-SQL applications can unmarshal the data with the Avro Parquet reader without needing the Avro schema separately. Currently, schemaRDD.saveAsParquetFile does not allow one to do …
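For context, this is the read side the question is after: once the Avro schema sits in the Parquet footer metadata, a non-Spark application can read the file without supplying a schema. A minimal, hypothetical sketch (path made up) against the 2015-era parquet.avro package:

    import org.apache.avro.generic.GenericRecord
    import org.apache.hadoop.fs.Path
    import parquet.avro.AvroParquetReader

    // No schema argument needed: AvroReadSupport recovers the Avro
    // schema from the key/value metadata in the Parquet footer.
    val reader = new AvroParquetReader[GenericRecord](
      new Path("hdfs:///tmp/users.parquet"))
    var record = reader.read()
    while (record != null) {
      println(s"${record.get("name")}: ${record.get("age")}")
      record = reader.read()
    }
    reader.close()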