On 03/13/2015 10:07 AM, shef wrote:
I've spent an hour going through the code and the links and the presentations
and haven't found example code to get started.
All I want to do is read and write records. I don't need any of the
integrations (Hadoop, Pig, etc.). Just want to use it to store and query data.
Can someone point me in the right direction?
Hi Shef,
I understand how it can be a little difficult to get started, since there
are quite a few choices. The reason is that Parquet doesn't require you to
use one particular set of objects that it knows how to interact with; it
supports several object models. In the long run, that's more flexible,
but it makes it harder to know where to start.
I recommend starting with parquet-avro, which uses the well-documented
Avro object model and can be used as a drop-in replacement for Avro
input or output formats. It would look like this:
// Schema is an Avro Schema object, and GenericRecord is Avro, too.
// file is an org.apache.hadoop.fs.Path pointing at the output file.
AvroParquetWriter<GenericRecord> writer =
    new AvroParquetWriter<GenericRecord>(file, schema);
writer.write(record);
writer.close();
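Since you asked about reading as well, here is a rough round-trip sketch
built around the writer snippet above. It assumes parquet-avro, Avro, and
the Hadoop client libraries are on the classpath, and it uses the
org.apache.parquet.avro package names (older releases use parquet.avro
instead). The User schema, the ParquetRoundTrip class, and its method names
are made up for illustration, so treat this as a starting point rather than
the one right way to do it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetReader;

public class ParquetRoundTrip {
    // Hypothetical example schema: a "User" record with two required fields.
    static final Schema SCHEMA = SchemaBuilder.record("User").fields()
            .requiredLong("id")
            .requiredString("name")
            .endRecord();

    // Writes a single record to a new Parquet file.
    // The target file must not exist yet; the writer does not overwrite.
    static void write(Path file, long id, String name) throws IOException {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("id", id);
        record.put("name", name);

        AvroParquetWriter<GenericRecord> writer =
                new AvroParquetWriter<GenericRecord>(file, SCHEMA);
        try {
            writer.write(record);
        } finally {
            writer.close(); // the file footer is only written on close
        }
    }

    // Reads every record in the file back as Avro GenericRecords.
    static List<GenericRecord> readAll(Path file) throws IOException {
        List<GenericRecord> records = new ArrayList<GenericRecord>();
        ParquetReader<GenericRecord> reader =
                new AvroParquetReader<GenericRecord>(file);
        try {
            GenericRecord next;
            // read() returns null once the end of the file is reached
            while ((next = reader.read()) != null) {
                records.add(next);
            }
        } finally {
            reader.close();
        }
        return records;
    }
}
```

Note that Avro usually hands strings back as Utf8 objects rather than
java.lang.String, so call toString() on string fields before comparing
them. Newer parquet-avro releases also offer builder-style construction
for the writer and reader, which you may prefer if the constructors above
show up as deprecated in your version.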
You can also use higher-level frameworks that manage collections of
files, like Kite (with a CLI to convert to Parquet) or Hive (convert to
Parquet with SQL).
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.