-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35950/
-----------------------------------------------------------
(Updated June 28, 2015, 12:29 a.m.)
Review request for hive, Ryan Blue, cheng xu, and Dong Chen.
Bugs: HIVE-11131
https://issues.apache.org/jira/browse/HIVE-11131
Repository: hive-git
Description
-------
Implemented per-data-type writers that are created before the first Hive row
is written to Parquet. Each writer holds the object inspector and schema
information for one specific data type, and calls the addXXXX() method that
Parquet uses for that data type.
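To illustrate the idea, here is a minimal, self-contained sketch of resolving
one writer per column up front and reusing it for every row. It is not the code
in the diff: Sink is a hypothetical stand-in for Parquet's RecordConsumer (only
three add methods are modeled, and startField()/endField() calls are left out),
and TypeWriterSketch, DataWriter, createWriter, createWriters and writeRow are
names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeWriterSketch {
    /** Hypothetical stand-in for Parquet's RecordConsumer (assumption, simplified). */
    interface Sink {
        void addInteger(int v);
        void addLong(long v);
        void addBoolean(boolean v);
    }

    /** One writer per column; the type-specific call is bound when it is created. */
    interface DataWriter {
        void write(Object value);
    }

    /** Resolves the type dispatch a single time, before any row is written. */
    static DataWriter createWriter(String typeName, Sink sink) {
        switch (typeName) {
            case "int":     return v -> sink.addInteger((Integer) v);
            case "bigint":  return v -> sink.addLong((Long) v);
            case "boolean": return v -> sink.addBoolean((Boolean) v);
            default:
                throw new IllegalArgumentException("Unsupported type: " + typeName);
        }
    }

    /** Builds the writer array from the column types of the schema. */
    static DataWriter[] createWriters(String[] columnTypes, Sink sink) {
        DataWriter[] writers = new DataWriter[columnTypes.length];
        for (int i = 0; i < columnTypes.length; i++) {
            writers[i] = createWriter(columnTypes[i], sink);
        }
        return writers;
    }

    /** Per-row path: no type inspection, just the pre-built writers. Nulls are skipped. */
    static void writeRow(DataWriter[] writers, Object[] row) {
        for (int i = 0; i < writers.length; i++) {
            if (row[i] != null) {
                writers[i].write(row[i]);
            }
        }
    }

    /** Sink that records each call as text so the dispatch can be observed. */
    static class RecordingSink implements Sink {
        final List<String> calls = new ArrayList<>();
        public void addInteger(int v)     { calls.add("addInteger(" + v + ")"); }
        public void addLong(long v)       { calls.add("addLong(" + v + ")"); }
        public void addBoolean(boolean v) { calls.add("addBoolean(" + v + ")"); }
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        DataWriter[] writers =
            createWriters(new String[] {"int", "bigint", "boolean"}, sink);
        writeRow(writers, new Object[] {1, 2L, true});
        System.out.println(sink.calls);
    }
}
```

The point of the sketch is that the switch on the type runs once per column
when the writers are built, not once per value on every row.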
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
c195c3ec3ddae19bf255fc2c9633f8bf4390f428
Diff: https://reviews.apache.org/r/35950/diff/
Testing (updated)
-------
Tests from TestDataWritableWriter pass.
I also ran micro-benchmarks, and the new implementation gave better results.
Throughput when writing 1 million records, using repeated rows across the
file:
         bigint   boolean  double   float    int      string
before   7.598    7.491    7.488    7.588    7.53     0.270
after    10.137   11.511   10.155   10.297   10.242   0.286
Throughput when writing 1 million records, using random rows across the
file:
         bigint   boolean  double   float    int      string
before   5.268    7.723    4.107    4.173    4.729    0.20
after    6.236    10.466   5.944    4.749    5.234    0.22
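For reference, the relative gain per type can be derived from the figures above
as (after - before) / before. A small sketch of that arithmetic follows;
SpeedupSketch and gainPercent are names made up for this example, and the
arrays simply repeat the numbers reported in this message.

```java
import java.util.Locale;

public class SpeedupSketch {
    static final String[] TYPES = {"bigint", "boolean", "double", "float", "int", "string"};

    /** Relative throughput gain in percent: (after - before) / before * 100. */
    static double gainPercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** Prints one line per data type with its relative gain. */
    static void report(String label, double[] before, double[] after) {
        System.out.println(label);
        for (int i = 0; i < TYPES.length; i++) {
            System.out.println(String.format(Locale.ROOT, "  %-8s %6.1f%%",
                TYPES[i], gainPercent(before[i], after[i])));
        }
    }

    public static void main(String[] args) {
        // Figures copied from the two result tables in this message.
        report("repeated rows",
            new double[] {7.598, 7.491, 7.488, 7.588, 7.53, 0.270},
            new double[] {10.137, 11.511, 10.155, 10.297, 10.242, 0.286});
        report("random rows",
            new double[] {5.268, 7.723, 4.107, 4.173, 4.729, 0.20},
            new double[] {6.236, 10.466, 5.944, 4.749, 5.234, 0.22});
    }
}
```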
Thanks,
Sergio Pena