-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35950/
-----------------------------------------------------------
Review request for hive, Ryan Blue, cheng xu, and Dong Chen.
Bugs: HIVE-11131
https://issues.apache.org/jira/browse/HIVE-11131
Repository: hive-git
Description
-------
Implemented data type writers that are created before the first Hive row is
written to Parquet. Each writer holds the object inspector and schema
information for a specific data type, and calls the specific addXXXX() method
that Parquet uses for that data type.
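As a rough illustration of the idea (not the code in the patch), the sketch
below builds one writer per column up front and then reuses it for every row,
so the type dispatch happens once instead of once per value. RecordingConsumer
is a hypothetical stand-in for Parquet's RecordConsumer that only logs the
calls it receives; DataWriter, createWriter and writeRows are illustrative
names, and skipping null fields is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class DataTypeWriterSketch {

    /** Hypothetical stand-in for Parquet's RecordConsumer: it only logs calls. */
    public static class RecordingConsumer {
        final List<String> calls = new ArrayList<>();
        void startField(String name, int index) { calls.add("startField(" + name + ")"); }
        void endField(String name, int index)   { calls.add("endField(" + name + ")"); }
        void addInteger(int v)     { calls.add("addInteger(" + v + ")"); }
        void addBoolean(boolean v) { calls.add("addBoolean(" + v + ")"); }
        void addDouble(double v)   { calls.add("addDouble(" + v + ")"); }
    }

    /** Illustrative per-type writer; each instance is bound to one add method. */
    public interface DataWriter {
        void write(Object value);
    }

    /** Resolves the type once and returns a writer bound to the matching add method. */
    public static DataWriter createWriter(String hiveType, RecordingConsumer consumer) {
        switch (hiveType) {
            case "int":     return v -> consumer.addInteger((Integer) v);
            case "boolean": return v -> consumer.addBoolean((Boolean) v);
            case "double":  return v -> consumer.addDouble((Double) v);
            default:
                throw new IllegalArgumentException("Unsupported type: " + hiveType);
        }
    }

    /** Creates the writers before the first row, then reuses them for all rows. */
    public static List<String> writeRows(String[] names, String[] types, Object[][] rows) {
        RecordingConsumer consumer = new RecordingConsumer();
        DataWriter[] writers = new DataWriter[types.length];
        for (int i = 0; i < types.length; i++) {
            writers[i] = createWriter(types[i], consumer);
        }
        for (Object[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (row[i] == null) {
                    continue; // assumption for this sketch: null fields are not written
                }
                consumer.startField(names[i], i);
                writers[i].write(row[i]); // no per-value type lookup here
                consumer.endField(names[i], i);
            }
        }
        return consumer.calls;
    }

    public static void main(String[] args) {
        List<String> calls = writeRows(
            new String[] {"id", "flag"},
            new String[] {"int", "boolean"},
            new Object[][] {{1, true}, {2, null}});
        calls.forEach(System.out::println);
    }
}
```

The point of the design is that the per-row loop only invokes a writer that
was already resolved, rather than inspecting the data type for each value.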
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
c195c3ec3ddae19bf255fc2c9633f8bf4390f428
Diff: https://reviews.apache.org/r/35950/diff/
Testing
-------
Tests from TestDataWritableWriter run OK.
I also ran micro-benchmarks, and the new implementation gave better
results:
Using repeated rows across the file, the speed increased by:
  bigint   boolean  double   float    int      string
  33.42%   53.66%   35.62%   35.70%   36.02%   5.93%
Using random rows across the file, the speed increased by:
  bigint   boolean  double   float    int      string
  18.38%   35.52%   44.73%   13.80%   10.68%   10.00%
Thanks,
Sergio Pena