I should have phrased it differently: an Avro schema carries additional
properties, such as whether a field is required. Right now the JSON data
that I have gets stored as optional fields in the Parquet file. Is there a
way to model the Parquet file schema so that it stays close to the Avro
schema (see the sketch after the quoted text below)? I tried using the
>
> But when I tried using Spark Streaming I could not find a way to store the
> data with the Avro schema information. The closest that I got was to create
> a DataFrame from the JSON RDDs and store it as Parquet. Here the Parquet
> files had a Spark-specific schema in their footer.
>
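To illustrate what I mean by modelling required fields, here is a minimal
sketch using Avro's SchemaBuilder. The record and field names are made up
for illustration; the point is that in Avro a field is optional only when
its type is a union with null, and parquet-avro maps non-null fields to
'required' Parquet fields:

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class SchemaSketch {
    public static void main(String[] args) {
        // "id" is non-nullable, so parquet-avro should write it as a
        // 'required' Parquet field; "nickname" is a union with null,
        // so it stays 'optional'.
        Schema schema = SchemaBuilder.record("User").namespace("example")
                .fields()
                .requiredString("id")
                .optionalString("nickname")
                .endRecord();
        System.out.println(schema.toString(true));
    }
}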
Does this …
Hi,
Which version of Spark are you using?
On Mon, Mar 21, 2016 at 12:28 PM, Sebastian Piu wrote:
> We use this, but not sure how the schema is stored
>
> Job job = Job.getInstance();
> ParquetOutputFormat.setWriteSupportClass(job, AvroWriteSupport.class);
>
We use this, but not sure how the schema is stored:

Job job = Job.getInstance();
ParquetOutputFormat.setWriteSupportClass(job, AvroWriteSupport.class);
AvroParquetOutputFormat.setSchema(job, schema);
LazyOutputFormat.setOutputFormatClass(job, ParquetOutputFormat.class);
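Fleshing that out a bit, this is only a sketch of how the wiring could look
for a streaming job, assuming a JavaRDD<GenericRecord> such as one obtained
inside foreachRDD; the helper name and output path are placeholders:

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroParquetOutputFormat;
import org.apache.parquet.avro.AvroWriteSupport;
import org.apache.parquet.hadoop.ParquetOutputFormat;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

public class ParquetSink {
    // Writes the RDD as Parquet with the given Avro schema embedded.
    static void writeAsParquet(JavaRDD<GenericRecord> rdd, Schema schema,
                               String path) throws IOException {
        Job job = Job.getInstance();
        ParquetOutputFormat.setWriteSupportClass(job, AvroWriteSupport.class);
        AvroParquetOutputFormat.setSchema(job, schema);
        // Parquet's Hadoop output format ignores the key, so pair each
        // record with null and write via the new Hadoop API.
        rdd.mapToPair(r -> new Tuple2<Void, GenericRecord>(null, r))
           .saveAsNewAPIHadoopFile(path, Void.class, GenericRecord.class,
                                   ParquetOutputFormat.class,
                                   job.getConfiguration());
    }
}

With AvroWriteSupport the Avro schema should end up in the key/value
metadata of the Parquet footer, which is what lets Avro-aware readers
recover it later.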
Hi All,
In my current project there is a requirement to store Avro data
(JSON format) as Parquet files.
I was able to use AvroParquetWriter separately to create the Parquet
files. Along with the data, the Parquet files also had the 'avro schema'
stored in them as part of their footer.
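For reference, the standalone usage looked roughly like this. It is a
minimal sketch assuming a parquet-avro version with the builder API; the
output path, schema string, and field name are placeholders:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class StandaloneWriter {
    public static void main(String[] args) throws Exception {
        // Placeholder schema with a single string field "id".
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":"
            + "[{\"name\":\"id\",\"type\":\"string\"}]}");
        try (ParquetWriter<GenericRecord> writer =
                 AvroParquetWriter.<GenericRecord>builder(
                         new Path("/tmp/events.parquet"))
                     .withSchema(schema)
                     .build()) {
            GenericRecord record = new GenericData.Record(schema);
            record.put("id", "1");
            writer.write(record);
        }
        // The Avro schema is embedded in the Parquet file footer metadata.
    }
}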