-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30281/
-----------------------------------------------------------

(Updated Jan. 29, 2015, 5:12 p.m.)


Review request for hive, Ryan Blue, cheng xu, and Dong Chen.


Changes
-------

Patch with the changes Ferd recommended.
I am also checking the inspector category in writeValue() in order to pass the 
correct object inspector to the rest of the methods. I think this makes the 
other methods cleaner.
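
The category check described above might look roughly like the following. This is a minimal, self-contained sketch and not the code in the patch: the Category enum and the Inspector types are simplified, hypothetical stand-ins for Hive's ObjectInspector classes, and the write methods return strings instead of emitting Parquet records so the example can run on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Hive's ObjectInspector hierarchy; not the real API.
class CategoryDispatchSketch {
    enum Category { PRIMITIVE, LIST, MAP }

    interface Inspector {
        Category getCategory();
    }

    static final class PrimitiveInspector implements Inspector {
        public Category getCategory() { return Category.PRIMITIVE; }
    }

    static final class ListInspector implements Inspector {
        final Inspector element;
        ListInspector(Inspector element) { this.element = element; }
        public Category getCategory() { return Category.LIST; }
    }

    static final class MapInspector implements Inspector {
        final Inspector key;
        final Inspector value;
        MapInspector(Inspector key, Inspector value) { this.key = key; this.value = value; }
        public Category getCategory() { return Category.MAP; }
    }

    // The only place where the category is inspected. Each helper below
    // receives an inspector that has already been narrowed to its own type.
    static String writeValue(Object value, Inspector inspector) {
        switch (inspector.getCategory()) {
            case PRIMITIVE:
                return writePrimitive(value, (PrimitiveInspector) inspector);
            case LIST:
                return writeList((List<?>) value, (ListInspector) inspector);
            case MAP:
                return writeMap((Map<?, ?>) value, (MapInspector) inspector);
            default:
                throw new IllegalArgumentException("Unsupported category: " + inspector.getCategory());
        }
    }

    static String writePrimitive(Object value, PrimitiveInspector inspector) {
        return String.valueOf(value);
    }

    static String writeList(List<?> values, ListInspector inspector) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sb.append(",");
            sb.append(writeValue(values.get(i), inspector.element));
        }
        return sb.append("]").toString();
    }

    static String writeMap(Map<?, ?> entries, MapInspector inspector) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<?, ?> e : entries.entrySet()) {
            if (!first) sb.append(",");
            first = false;
            sb.append(writeValue(e.getKey(), inspector.key))
              .append("=")
              .append(writeValue(e.getValue(), inspector.value));
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Inspector listOfInts = new ListInspector(new PrimitiveInspector());
        System.out.println(writeValue(Arrays.asList(1, 2, 3), listOfInts)); // prints [1,2,3]
    }
}
```

Because the cast happens once in writeValue(), the helper methods need no instanceof or category checks of their own.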


Bugs: HIVE-9333
    https://issues.apache.org/jira/browse/HIVE-9333


Repository: hive-git


Description
-------

This patch moves the ParquetHiveSerDe.serialize() implementation to the 
DataWritableWriter class in order to save the time spent materializing data 
in serialize().


Diffs (updated)
-----

  
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9caa4ed169ba92dbd863e4a2dc6d06ab226a4465 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c3b0ab43f734f8a211e3e03d5060c75434 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15 
  serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java PRE-CREATION 

Diff: https://reviews.apache.org/r/30281/diff/


Testing
-------

The tests run were the following:

1. JMH (Java microbenchmark)

This benchmark calls the Parquet serialize/write methods using Text writable 
objects.

Class.method                  Before (ops/s)   After (ops/s)   Change
---------------------------------------------------------------------------------
ParquetHiveSerDe.serialize        19,113          249,528      ~13x speed increase
DataWritableWriter.write           5,033            5,201      3.34% speed increase


2. Write 20 million rows (~1 GB file) from Text to Parquet

I wrote a ~1 GB file in Textfile format, then converted it to Parquet format 
using the following statement:

CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text;

Time (s) it took to write the whole file BEFORE changes: 93.758 s
Time (s) it took to write the whole file AFTER changes: 83.903 s

This is a speed increase of roughly 10%.
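
The percentage can be rechecked from the two timings above. The following is only a small arithmetic sketch; the class and method names are hypothetical and are not part of the patch or the benchmark. It expresses the same measurement in two ways: as the share of write time saved (about 10.5%) and as a throughput ratio (about 1.12x).

```java
// Hypothetical helper for rechecking the reported write times; not part of the patch.
class WriteTimeSpeedup {
    // Share of wall-clock time saved, relative to the time before the change.
    static double timeReductionPercent(double beforeSeconds, double afterSeconds) {
        return (beforeSeconds - afterSeconds) / beforeSeconds * 100.0;
    }

    // How many times faster the same job completes after the change.
    static double speedupFactor(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        double before = 93.758; // seconds, BEFORE changes (from the timings above)
        double after = 83.903;  // seconds, AFTER changes (from the timings above)
        System.out.printf("time reduction: %.1f%%%n", timeReductionPercent(before, after));
        System.out.printf("speedup factor: %.2fx%n", speedupFactor(before, after));
    }
}
```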


Thanks,

Sergio Pena
