Hi Ping,
There's an experimental new object model that should speed things up a bit:
https://github.com/apache/incubator-parquet-mr/pull/15
Alternatively, you could use the Avro or ProtoBuf support to see if it's
faster.
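
Writing through the Avro object model would look roughly like the sketch
below. The schema, field names, and output path are placeholders I made up;
your real schema would carry all 30 flat fields. Whether this actually beats
DataWritableWriteSupport for your workload is something only a benchmark will
tell, but it's a cheap experiment:

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import parquet.avro.AvroParquetWriter;

public class AvroWriteExample {
  public static void main(String[] args) throws Exception {
    // Made-up flat schema standing in for the real 30-field record.
    Schema schema = SchemaBuilder.record("Event").fields()
        .requiredInt("id")
        .requiredLong("ts")
        .requiredFloat("value")
        .endRecord();

    // Placeholder path; point this at your HDFS location instead.
    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<GenericRecord>(
            new Path("/tmp/events.parquet"), schema);
    try {
      // Build and write one record; in your pipeline this would be the
      // per-record loop that is currently the bottleneck.
      GenericRecord record = new GenericData.Record(schema);
      record.put("id", 1);
      record.put("ts", System.currentTimeMillis());
      record.put("value", 1.0f);
      writer.write(record);
    } finally {
      writer.close();
    }
  }
}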


On Wed, Jul 23, 2014 at 8:46 AM, Ping Hao <[email protected]> wrote:

> We are using the parquet-mr API to continuously write Parquet files to
> HDFS, which become the data files of predefined Hive & Impala tables so
> they can be queried. The problem is that the throughput of the Parquet
> writer has become the bottleneck of our pipeline; a local test on a Linux
> box (Dell T5600) shows that about 150K records/sec can be written with one
> Parquet writer thread. I wonder if there is any way we could boost the
> performance.
>
> For more context: each record has 30 fields (no nesting) with primitive
> types like tinyint, int, bigint, and float. All records have values
> assigned for all fields. We use DataWritableWriteSupport and
> ParquetWriter<ArrayWritable> to serialize records.
>
> Any suggestions?
