-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30281/
-----------------------------------------------------------
(Updated Ene. 27, 2015, 6:47 p.m.)
Review request for hive, Ryan Blue, cheng xu, and Dong Chen.
Changes
-------
Here's the second patch with Ferd/Brock feedback.
- Return compressionType back to ParquetHiveSerDe. We will follow up this
legacy code in another JIRA ticket.
- Clean DataWritableWriter a little to make writeMap more readable.
Bugs: HIVE-9333
https://issues.apache.org/jira/browse/HIVE-9333
Repository: hive-git
Description
-------
This patch moves the ParquetHiveSerDe.serialize() implementation to
DataWritableWriter class in order to save time in materializing data on
serialize().
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java
ea4109d358f7c48d1e2042e5da299475de4a0a29
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
9caa4ed169ba92dbd863e4a2dc6d06ab226a4465
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java
060b1b722d32f3b2f88304a1a73eb249e150294b
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
41b5f1c3b0ab43f734f8a211e3e03d5060c75434
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java
e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java
a693aff18516d133abf0aae4847d3fe00b9f1c96
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java
667d3671547190d363107019cd9a2d105d26d336
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java
007a665529857bcec612f638a157aa5043562a15
serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java
PRE-CREATION
Diff: https://reviews.apache.org/r/30281/diff/
Testing
-------
The tests run were the following:
1. JMH (Java microbenchmark)
This benchmark called parquet serialize/write methods using text writable
objects.
Class.method Before Change (ops/s) After Change (ops/s)
-------------------------------------------------------------------------------
ParquetHiveSerDe.serialize: 19,113 249,528 -> 19x
speed increase
DataWritableWriter.write: 5,033 5,201 ->
3.34% speed increase
2. Write 20 million rows (~1GB file) from Text to Parquet
I wrote a ~1Gb file in Textfile format, then convert it to a Parquet format
using the following
statement: CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text;
Time (s) it took to write the whole file BEFORE changes: 93.758 s
Time (s) it took to write the whole file AFTER changes: 83.903 s
It got a 10% of speed inscrease.
Thanks,
Sergio Pena