On 09/07/2014 01:45 PM, Rafeeq S wrote:
I am new to Parquet.
Please suggest an example of writing data into a *parquet* file using
ParquetFileWriter.
I have tried the example code below, which writes data into a parquet file
using *ParquetWriter*.
http://php.sabscape.com/blog/?p=623
The above example uses ParquetWriter, but I want to use ParquetFileWriter
to write data efficiently to parquet files.
Please suggest an example of how to write parquet files using
*ParquetFileWriter*.
Regards,
Rafeeq S
Hi Rafeeq,
ParquetFileWriter is actually an internal implementation that's used by
higher-level interfaces, like parquet-avro, parquet-thrift, and others.
The reason is that Parquet doesn't have its own object model that it
makes you use. It has an API so that you can use whatever model you want
backed by the Parquet format. The Avro classes are a good demonstration,
where you use Avro runtime objects and Schemas, but the results are
stored as Parquet files.
This is great if you're moving from another serialization library to
Parquet because there is very little code to change and you don't have
to translate. But if you just want to store data in Parquet, then you
first need to choose what library you want for runtime objects.
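To make that layering concrete, here is a toy sketch in plain Java. None
of these names (ColumnarFileSketch, ObjectModelAdapter, SketchWriter,
User) come from Parquet; they are hypothetical stand-ins I made up to
show the shape of the design: a low-level writer that only knows about
columns, and a small adapter per object model that sits on top of it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the low-level file writer: it only knows
// about named columns and values, not about application objects.
final class ColumnarFileSketch {
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();

    void append(String column, Object value) {
        columns.computeIfAbsent(column, k -> new ArrayList<>()).add(value);
    }

    List<Object> column(String name) {
        return columns.getOrDefault(name, new ArrayList<>());
    }
}

// Hypothetical adapter interface: one implementation per object model.
interface ObjectModelAdapter<T> {
    void write(T record, ColumnarFileSketch out);
}

// Example application class; callers keep using their own type.
final class User {
    final String name;
    final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Translates a User into column values for the low-level writer.
final class UserAdapter implements ObjectModelAdapter<User> {
    @Override
    public void write(User user, ColumnarFileSketch out) {
        out.append("name", user.name);
        out.append("age", user.age);
    }
}

// High-level writer: the only class application code needs to call.
final class SketchWriter<T> {
    private final ObjectModelAdapter<T> adapter;
    private final ColumnarFileSketch file = new ColumnarFileSketch();

    SketchWriter(ObjectModelAdapter<T> adapter) {
        this.adapter = adapter;
    }

    void write(T record) {
        adapter.write(record, file);
    }

    ColumnarFileSketch file() {
        return file;
    }
}

class LayeringDemo {
    public static void main(String[] args) {
        SketchWriter<User> writer = new SketchWriter<>(new UserAdapter());
        writer.write(new User("ada", 36));
        writer.write(new User("linus", 28));
        System.out.println(writer.file().column("name")); // [ada, linus]
        System.out.println(writer.file().column("age"));  // [36, 28]
    }
}
```

The point of the split is that supporting a different object model only
means writing another adapter; the column-level writer stays untouched,
which is why application code rarely needs to call it directly.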
I highly recommend using Avro objects because both Avro and Parquet are
splittable Hadoop-friendly formats. Plus, Avro has a lot of flexibility:
you can use generic objects, generate objects from your data schema, or
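generate a schema from Java classes. parquet-avro expects the top-level
Avro schema to be a record, so to write Strings you wrap each one in a
single-field record (the names "Line" and "value" are placeholders):
    Schema schema = SchemaBuilder.record("Line").fields()
        .requiredString("value").endRecord();
    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<GenericRecord>(
            new Path("/file/path.parquet"), schema);
    GenericRecord record = new GenericData.Record(schema);
    record.put("value", "a string");
    writer.write(record);
    writer.close();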
Of course, the Avro objects can be a lot more complicated than this,
and they should work just fine as long as each object matches the Schema
you provide to build the writer.
Does this help?
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.