I think the AvroWriteSupport class already saves the Avro schema as part of
the Parquet metadata. You can think of using parquet-mr
<https://github.com/Parquet/parquet-mr> directly.
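For reference, a minimal sketch of what using parquet-mr directly could look
like -- the schema, path, and field names are placeholders, and this assumes
the parquet.avro package layout of the 1.x releases:

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import parquet.avro.AvroParquetWriter

// Illustrative Avro schema; replace with your own.
val schemaJson =
  """{"type":"record","name":"User","fields":[
    |  {"name":"name","type":"string"},
    |  {"name":"age","type":"int"}]}""".stripMargin
val schema = new Schema.Parser().parse(schemaJson)

// AvroWriteSupport (used under the hood) stores the Avro schema
// in the Parquet file footer's key/value metadata automatically.
val writer = new AvroParquetWriter[GenericRecord](
  new Path("/tmp/users.parquet"), schema)

val record = new GenericData.Record(schema)
record.put("name", "jerry")
record.put("age", 30)
writer.write(record)
writer.close()
```

Any Avro-aware Parquet reader can then recover the schema from the file
footer without needing it out of band.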

Raghavendra

On Fri, Jan 9, 2015 at 10:32 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Raghavendra,
>
> This makes a lot of sense. Thank you.
> The problem is that I'm using Spark SQL right now to generate the parquet
> file.
>
> What I think I need to do is to use Spark directly, transform all rows of
> the SchemaRDD into Avro objects, and pass them to saveAsNewAPIHadoopFile
> (from the PairRDD). From there, I can supply the Avro schema to Parquet via
> AvroParquetOutputFormat.
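> A rough sketch of that plan might look like the following -- the field
> mapping, path, and variable names (sc, schemaRDD, avroSchema) are
> illustrative, and it assumes the parquet.avro package layout of
> parquet-mr 1.x:
>
> ```scala
> import org.apache.avro.generic.{GenericData, GenericRecord}
> import org.apache.hadoop.mapreduce.Job
> import parquet.avro.AvroParquetOutputFormat
>
> // Tell the output format which Avro schema to embed in the Parquet file.
> val job = new Job(sc.hadoopConfiguration)
> AvroParquetOutputFormat.setSchema(job, avroSchema)
>
> // Convert each Row to an Avro GenericRecord; the key is ignored by
> // ParquetOutputFormat, so null is fine.
> val records = schemaRDD.map { row =>
>   val rec = new GenericData.Record(avroSchema)
>   rec.put("name", row.getString(0)) // field mapping is illustrative
>   rec.put("age", row.getInt(1))
>   (null, rec)
> }
>
> records.saveAsNewAPIHadoopFile(
>   "/tmp/users.parquet",
>   classOf[Void],
>   classOf[GenericRecord],
>   classOf[AvroParquetOutputFormat],
>   job.getConfiguration)
> ```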
>
> It is not difficult, just not as simple as I would like: SchemaRDD can
> already write to a Parquet file using its own schema, so if I could supply
> the Avro schema to Parquet directly, it would save me the transformation
> step into Avro objects.
>
> I'm thinking of overriding the saveAsParquetFile method to allow me to
> persist the Avro schema inside the Parquet file. Is this possible at all?
>
> Best Regards,
>
> Jerry
>
>
> On Fri, Jan 9, 2015 at 2:05 AM, Raghavendra Pandey <
> raghavendra.pan...@gmail.com> wrote:
>
>> I came across this
>> http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/. You can take
>> a look.
>>
>>
>> On Fri Jan 09 2015 at 12:08:49 PM Raghavendra Pandey <
>> raghavendra.pan...@gmail.com> wrote:
>>
>>> I have a similar requirement where I want to push Avro data into
>>> parquet, but it seems you have to do it on your own. There is the
>>> parquet-mr project that uses Hadoop to do so. I am trying to write a
>>> Spark job to do the same kind of thing.
>>>
>>> On Fri, Jan 9, 2015 at 3:20 AM, Jerry Lam <chiling...@gmail.com> wrote:
>>>
>>>> Hi spark users,
>>>>
>>>> I'm using Spark SQL to create Parquet files on HDFS. I would like to
>>>> store the Avro schema in the Parquet metadata so that non-Spark-SQL
>>>> applications can unmarshal the data using the Avro Parquet reader,
>>>> without needing the Avro schema separately. Currently,
>>>> schemaRDD.saveAsParquetFile does not allow that. Is there another API
>>>> that allows me to do this?
>>>>
>>>> Best Regards,
>>>>
>>>> Jerry
>>>>
>>>
>>>
>
