-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12480/#review25537
-----------------------------------------------------------

One issue in the testing and a few formatting issues. Otherwise looks good.


serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
<https://reviews.apache.org/r/12480/#comment49986>

Weird spacing... 2x below as well.


serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java
<https://reviews.apache.org/r/12480/#comment49984>

These should never be null, not even in testing. It's better to change the tests to correctly populate the data structure.


serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java
<https://reviews.apache.org/r/12480/#comment49985>

And this would indicate a bug.


- Jakob Homan


On Aug. 6, 2013, 7:13 p.m., Mohammad Islam wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12480/
> -----------------------------------------------------------
>
> (Updated Aug. 6, 2013, 7:13 p.m.)
>
>
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
>
>
> Bugs: HIVE-4732
>     https://issues.apache.org/jira/browse/HIVE-4732
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> From our performance analysis, we found that AvroSerde's schema.equals()
> call consumed a substantial amount (nearly 40%) of time. This patch intends
> to minimize the number of schema.equals() calls by deferring the check as
> late as possible and performing it as rarely as possible.
>
> First, we added a unique ID for each record reader, which is then included
> in every AvroGenericRecordWritable. Then we introduced two new data
> structures (one HashSet and one HashMap) to store intermediate data and
> avoid duplicate checks. The HashSet contains the IDs of all record readers
> that don't need any re-encoding. The HashMap, on the other hand, contains
> the re-encoders already in use; it works as a cache and allows re-encoders
> to be reused. With this change, our tests show a nearly 40% reduction in
> Avro record reading time.
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java ed2a9af
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java e994411
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java 66f0348
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 3828940
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 9af751b
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb
>
> Diff: https://reviews.apache.org/r/12480/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Mohammad Islam
>
>
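
For readers following the thread, the caching scheme in the quoted description can be sketched roughly as below. This is a minimal stand-alone illustration, not the code under review: `ReEncoderCacheSketch`, `FakeSchema` and `TaggedRecord` are hypothetical stand-ins (no Avro or Hive classes are used), `UUID` is assumed as the reader ID type, and the "re-encoder" is just a string transformation. The point it tries to show is that the expensive schema comparison runs once per record reader rather than once per record.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of per-reader caching; not the actual Hive classes. */
final class ReEncoderCacheSketch {

    /** Stand-in for a schema whose equals() is expensive; counts the calls. */
    static final class FakeSchema {
        static int equalsCalls = 0;
        final String definition;

        FakeSchema(String definition) {
            this.definition = definition;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof FakeSchema
                    && ((FakeSchema) o).definition.equals(definition);
        }

        @Override
        public int hashCode() {
            return definition.hashCode();
        }
    }

    /** Stand-in for a record tagged with the ID of the reader that produced it. */
    static final class TaggedRecord {
        final UUID readerId;
        final FakeSchema writerSchema;
        final String payload;

        TaggedRecord(UUID readerId, FakeSchema writerSchema, String payload) {
            this.readerId = readerId;
            this.writerSchema = writerSchema;
            this.payload = payload;
        }
    }

    private final FakeSchema targetSchema;
    // IDs of readers whose records already match the target schema.
    private final Set<UUID> noReEncodingNeeded = new HashSet<>();
    // Re-encoders already built, keyed by reader ID, so they can be reused.
    private final Map<UUID, UnaryOperator<String>> reEncoders = new HashMap<>();

    ReEncoderCacheSketch(FakeSchema targetSchema) {
        this.targetSchema = targetSchema;
    }

    String deserialize(TaggedRecord record) {
        // Cheap ID lookups first; no schema comparison on these paths.
        if (noReEncodingNeeded.contains(record.readerId)) {
            return record.payload;
        }
        UnaryOperator<String> cached = reEncoders.get(record.readerId);
        if (cached != null) {
            return cached.apply(record.payload);
        }
        // First record seen from this reader: the only expensive comparison.
        if (record.writerSchema.equals(targetSchema)) {
            noReEncodingNeeded.add(record.readerId);
            return record.payload;
        }
        // Placeholder re-encoder: tags the payload with the target schema.
        UnaryOperator<String> reEncoder =
                payload -> payload + " (re-encoded to " + targetSchema.definition + ")";
        reEncoders.put(record.readerId, reEncoder);
        return reEncoder.apply(record.payload);
    }

    public static void main(String[] args) {
        ReEncoderCacheSketch cache = new ReEncoderCacheSketch(new FakeSchema("v2"));
        UUID readerA = UUID.randomUUID(); // writes the target schema
        UUID readerB = UUID.randomUUID(); // writes an older schema
        for (int i = 0; i < 1000; i++) {
            cache.deserialize(new TaggedRecord(readerA, new FakeSchema("v2"), "row" + i));
            cache.deserialize(new TaggedRecord(readerB, new FakeSchema("v1"), "row" + i));
        }
        System.out.println("schema.equals() calls: " + FakeSchema.equalsCalls);
    }
}
```

In this sketch, 2,000 records from two readers trigger only two schema comparisons, one per reader. Whether a reader ID can safely stand in for schema identity depends on every record from a given reader sharing one schema, which the sketch simply assumes.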