Re: Review Request 33251: HIVE-10302 Cache small tables in memory [Spark Branch]

Jimmy Xiang Wed, 22 Apr 2015 09:37:14 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33251/
-----------------------------------------------------------


(Updated April 22, 2015, 4:36 p.m.)


Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.


Changes
-------

Addressed Xuefu's review comments: removed the thread-local variable, added some 
javadoc, and fixed some code clarity issues.
In this patch, we still clean up the cache based on work id, so that other works 
in the same job do not incur extra memory usage. Unfortunately, this means that 
if other works run in parallel with the mapjoin work, the cache may be released 
even though it could still have been kept for a while.


Bugs: HIVE-10302
    https://issues.apache.org/jira/browse/HIVE-10302


Repository: hive-git


Description
-------

Cache the small table container so that mapjoin tasks can reuse it when they 
are executed on the same Spark executor.
The cache is released after the mapjoin job is done, right before the next job 
starts.
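
For readers without the diff at hand, here is a minimal sketch of the idea, 
not the code in this patch. The class name SmallTableCacheSketch, the get() 
signature, and the use of a String path as the cache key are all assumptions 
made for illustration. In a real executor the cache would presumably be a 
JVM-wide singleton; the sketch uses a plain instance to stay self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical per-executor cache of small-table containers.
 * Illustrative only: names and signatures are not taken from the patch.
 */
public final class SmallTableCacheSketch<T> {
    // Loaded small tables, keyed by the (assumed) path they were read from.
    private final Map<String, T> tables = new HashMap<>();
    // Id of the work that populated the cache; null until first use.
    private String currentWorkId;

    /**
     * Returns the cached table for path, invoking loader only on a miss.
     * If workId differs from the work that filled the cache, all cached
     * entries are dropped first, so no memory is held for other works.
     */
    public synchronized T get(String workId, String path, Supplier<T> loader) {
        if (!workId.equals(currentWorkId)) {
            tables.clear();
            currentWorkId = workId;
        }
        return tables.computeIfAbsent(path, p -> loader.get());
    }

    /** Number of tables currently cached. */
    public synchronized int size() {
        return tables.size();
    }
}
```

In this sketch, two tasks of the same work that land on the same executor 
load the table once and share it. A task from a different work clears the 
cache on its first call, which is the early-release behavior noted under 
Changes for works that run in parallel with the mapjoin work.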


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java fe108c4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HivePairFlatMapFunction.java 2f137f9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 72ab913 

Diff: https://reviews.apache.org/r/33251/diff/


Testing
-------

Ran several queries on a live cluster. ptest is pending.


Thanks,

Jimmy Xiang
