Hi Ping,

There's an experimental new model that should speed things up a bit: https://github.com/apache/incubator-parquet-mr/pull/15. Alternatively, you could try the Avro or ProtoBuf support to see if it's faster.
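
If you go the Avro route, a minimal sketch of the write side with parquet-avro's AvroParquetWriter looks roughly like this (the schema, field names, and output path are made up for illustration, and the packages are the pre-graduation parquet.* ones):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import parquet.avro.AvroParquetWriter;

public class AvroWriteSketch {
  public static void main(String[] args) throws Exception {
    // Flat Avro schema standing in for your 30 primitive columns.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"int\"},"
        + "{\"name\":\"ts\",\"type\":\"long\"},"
        + "{\"name\":\"value\",\"type\":\"float\"}]}");

    // The two-arg constructor uses the default block/page sizes and codec.
    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<GenericRecord>(
            new Path("hdfs:///tmp/events.parquet"), schema);
    try {
      GenericRecord rec = new GenericData.Record(schema);
      rec.put("id", 1);
      rec.put("ts", System.currentTimeMillis());
      rec.put("value", 0.5f);
      writer.write(rec);
    } finally {
      writer.close();
    }
  }
}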
On Wed, Jul 23, 2014 at 8:46 AM, Ping Hao <[email protected]> wrote:

> We are using the parquet-mr API to continuously write Parquet files to
> HDFS, which become the data files of predefined Hive & Impala tables so
> they can be queried. The problem is that the throughput of the Parquet
> writer has become the bottleneck of our pipeline; a local test on a Linux
> box (Dell T5600) shows that about 150K records/sec can be written with one
> Parquet writer thread. I wonder if there is any way we could boost the
> performance.
>
> More context: the records have 30 fields (no nesting) with primitive types
> like tinyint, int, bigint, and float. All records have values assigned for
> all fields. We use DataWritableWriteSupport and ParquetWriter<ArrayWritable>
> to serialize the records.
>
> Any suggestions?
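
One more thing worth checking: the long ParquetWriter constructor exposes the knobs that matter most for raw write throughput (row-group size, page size, codec, dictionary encoding, validation). Below is a rough, self-contained sketch of that constructor using the example Group API bundled with parquet-mr rather than Hive's DataWritableWriteSupport, whose schema wiring varies by Hive version; paths, schema, and sizes are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import parquet.column.ParquetProperties;
import parquet.example.data.Group;
import parquet.example.data.simple.SimpleGroupFactory;
import parquet.hadoop.ParquetWriter;
import parquet.hadoop.example.GroupWriteSupport;
import parquet.hadoop.metadata.CompressionCodecName;
import parquet.schema.MessageType;
import parquet.schema.MessageTypeParser;

public class TunedWriterSketch {
  public static void main(String[] args) throws Exception {
    // Flat schema standing in for the 30 primitive columns described above.
    MessageType schema = MessageTypeParser.parseMessageType(
        "message event { required int32 id; required int64 ts; required float value; }");

    Configuration conf = new Configuration();
    GroupWriteSupport.setSchema(schema, conf);  // GroupWriteSupport reads it back in init()

    ParquetWriter<Group> writer = new ParquetWriter<Group>(
        new Path("hdfs:///tmp/events.parquet"),
        new GroupWriteSupport(),
        CompressionCodecName.SNAPPY,            // cheaper on CPU than gzip
        256 * 1024 * 1024,                      // row-group (block) size
        ParquetWriter.DEFAULT_PAGE_SIZE,
        ParquetWriter.DEFAULT_PAGE_SIZE,        // dictionary page size
        true,                                   // dictionary encoding on
        false,                                  // skip record validation for speed
        ParquetProperties.WriterVersion.PARQUET_1_0,
        conf);

    SimpleGroupFactory factory = new SimpleGroupFactory(schema);
    for (int i = 0; i < 1000; i++) {
      writer.write(factory.newGroup()
          .append("id", i)
          .append("ts", System.currentTimeMillis())
          .append("value", 0.5f));
    }
    writer.close();
  }
}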
