----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30739/#review71546 -----------------------------------------------------------
High-level comments: 1. File name convention needs to take care of multiple threads doing the same thing in a single JVM. 2. Avoid creating Tuple2 objects. There is no need to have this object, but we create them just to wrap key/value, likely increase GC pressure. 3. Disk writing can be more performing. We should be able to put keys/values as bytes in a large byte[] (as the buffer). The we need to spill, we write the whole byte[] to disk in one shot. 4. Related to #3, but I'm wondering why we need kryo at all. 5. We need to take care of file closing in case of exceptions. The call usually goes into a finally block. 6. It's better to allocate the buffer in the beginning and fail right way in case of OOM. We don't want the job to fail when only the last row needs to spill and incur OOM then. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java <https://reviews.apache.org/r/30739/#comment117291> private final? - Xuefu Zhang On Feb. 7, 2015, 3:09 a.m., Jimmy Xiang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/30739/ > ----------------------------------------------------------- > > (Updated Feb. 7, 2015, 3:09 a.m.) > > > Review request for hive, Rui Li and Xuefu Zhang. > > > Bugs: HIVE-9574 > https://issues.apache.org/jira/browse/HIVE-9574 > > > Repository: hive-git > > > Description > ------- > > Result KV cache doesn't use RowContainer any more since it has logic we don't > need, which is some overhead. We don't do lazy computing right away, instead > we wait a little till the cache is close to spill. > > > Diffs > ----- > > > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java > 78ab680 > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java > 8ead0cb > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java > 7a09b4d > > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java > e92e299 > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java > 070ea4d > > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java > d4ff37c > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java > 286816b > ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestHiveKVResultCache.java > 0df4598 > > Diff: https://reviews.apache.org/r/30739/diff/ > > > Testing > ------- > > Unit test, test on cluster > > > Thanks, > > Jimmy Xiang > >
