-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12480/
-----------------------------------------------------------
(Updated Aug. 7, 2013, 2:13 a.m.)
Review request for hive, Ashutosh Chauhan and Jakob Homan.
Changes
-------
Add logic to avoid excessive logging for each record.
Bugs: HIVE-4732
https://issues.apache.org/jira/browse/HIVE-4732
Repository: hive-git
Description
-------
From our performance analysis, we found that AvroSerde's schema.equals() call
consumed a substantial amount (nearly 40%) of the time. This patch aims to
minimize the number of schema.equals() calls by deferring the check as late as
possible and performing it as rarely as possible.
First, we added a unique ID for each record reader, which is then included in
every AvroGenericRecordWritable. Then we introduced two new data structures
(one HashSet and one HashMap) to store intermediate data and avoid duplicate
checks. The HashSet contains the IDs of all record readers that don't need any
re-encoding. The HashMap, on the other hand, contains the re-encoders already
in use; it works as a cache and allows re-encoders to be reused. With this
change, our tests show a nearly 40% reduction in Avro record reading time.
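
The caching idea described above can be sketched roughly as follows. This is
only an illustration, not the code in the patch: the class name
ReEncoderCacheSketch, the field names, the use of java.util.UUID as the reader
ID, and keying the HashMap by reader ID are all assumptions made for the
example. Plain strings stand in for Avro schemas and records so that the sketch
has no Avro dependency, and String.equals() plays the role of the expensive
schema.equals() call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical sketch: strings stand in for Avro schemas and records.
class ReEncoderCacheSketch {
    // IDs of record readers whose file schema already matched the reader schema.
    private final Set<UUID> noEncodingNeeded = new HashSet<>();
    // Re-encoders built so far, keyed by record reader ID (an assumption here).
    private final Map<UUID, Function<String, String>> reEncoderCache = new HashMap<>();
    // The schema the deserializer expects.
    private final String readerSchema;
    // Counts how often the stand-in for schema.equals() actually runs.
    int equalsCalls = 0;

    ReEncoderCacheSketch(String readerSchema) {
        this.readerSchema = readerSchema;
    }

    String deserialize(UUID readerId, String fileSchema, String record) {
        // Fast path 1: this reader is already known to need no re-encoding.
        if (noEncodingNeeded.contains(readerId)) {
            return record;
        }
        // Fast path 2: a re-encoder for this reader is already cached.
        Function<String, String> cached = reEncoderCache.get(readerId);
        if (cached != null) {
            return cached.apply(record);
        }
        // Slow path: compare schemas once per reader, then remember the outcome.
        equalsCalls++;
        if (fileSchema.equals(readerSchema)) {
            noEncodingNeeded.add(readerId);
            return record;
        }
        // Placeholder "re-encoding": tag the record with the target schema.
        Function<String, String> reEncoder = r -> r + "@" + readerSchema;
        reEncoderCache.put(readerId, reEncoder);
        return reEncoder.apply(record);
    }
}
```

In this sketch the comparison runs at most once per record reader rather than
once per record, which is the effect the description attributes to the two new
data structures.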
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java
ed2a9af
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
e994411
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
66f0348
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
3828940
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java
9af751b
serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb
Diff: https://reviews.apache.org/r/12480/diff/
Testing
-------
Thanks,
Mohammad Islam