On 03/13/2015 10:07 AM, shef wrote:
I've spent an hour going through the code and the links and the presentations 
and haven't found example code to get started.
All I want to do is read and write records. I don't need any of the 
integrations (Hadoop, Pig, etc.). Just want to use it to store and query data.
Can someone point me in the right direction?

Hi Shef,

I understand how it can be a little difficult to get started, since there are quite a few choices. The reason is that Parquet doesn't require you to use a specific set of objects that it knows how to interact with. In the long run that's more flexible, but it makes it harder to know where to start.

I recommend starting with parquet-avro, which uses the well-documented Avro object model and can be used as a drop-in replacement for Avro input or output formats. It would look like this:

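  // Schema is an Avro Schema object, and GenericRecord is Avro, too.
  // file is a Hadoop Path (org.apache.hadoop.fs.Path) for the output file.
  AvroParquetWriter<GenericRecord> writer =
      new AvroParquetWriter<GenericRecord>(file, schema);
  writer.write(record);  // call once per record
  writer.close();        // flushes buffered data and writes the file footer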
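
Since the question was about both reading and writing, here is a rough, self-contained sketch of a round trip. Treat it as an illustration under assumptions rather than the canonical way to do it: the User schema, the ParquetRoundTrip class, and its write/readAll helpers are made up for this example, the imports assume the org.apache.parquet.avro package naming (older releases used parquet.avro), and it relies on the Path-based constructors, which newer releases deprecate in favor of builders. It needs parquet-avro, Avro, and the Hadoop client libraries on the classpath.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroParquetWriter;

public class ParquetRoundTrip {
  // Hypothetical two-field schema, used only for this illustration.
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"long\"},"
      + "{\"name\":\"name\",\"type\":\"string\"}]}");

  // Writes a single record to a new Parquet file.
  static void write(Path file, long id, String name) throws IOException {
    GenericRecord record = new GenericData.Record(SCHEMA);
    record.put("id", id);
    record.put("name", name);
    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<GenericRecord>(file, SCHEMA);
    try {
      writer.write(record);
    } finally {
      writer.close();
    }
  }

  // Reads every record back; read() returns null at the end of the file.
  static List<GenericRecord> readAll(Path file) throws IOException {
    List<GenericRecord> records = new ArrayList<GenericRecord>();
    AvroParquetReader<GenericRecord> reader =
        new AvroParquetReader<GenericRecord>(file);
    try {
      GenericRecord next;
      while ((next = reader.read()) != null) {
        records.add(next);
      }
    } finally {
      reader.close();
    }
    return records;
  }

  public static void main(String[] args) throws IOException {
    // Write into a fresh temp directory so the output file does not exist yet.
    String dir = java.nio.file.Files.createTempDirectory("parquet-demo").toString();
    Path file = new Path(dir, "users.parquet");
    write(file, 1L, "shef");
    for (GenericRecord r : readAll(file)) {
      System.out.println(r);
    }
  }
}
```

Two things to watch for: string fields come back as Avro's Utf8 type, so call toString() before comparing them to a Java String, and with the default settings the writer fails if the target file already exists.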

You can also use higher-level frameworks that manage collections of files, like Kite (with a CLI to convert to Parquet) or Hive (convert to Parquet with SQL).

rb

--
Ryan Blue
Software Engineer
Cloudera, Inc.
