Re: Review Request 33251: HIVE-10302 Cache small tables in memory [Spark Branch]

Jimmy Xiang Wed, 22 Apr 2015 09:28:06 -0700


> On April 21, 2015, 2:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java, line 
> > 120
> > <https://reviews.apache.org/r/33251/diff/2/?file=937278#file937278line120>
> >
> >     1. For clarity, it might be good to put this in a separate private 
> > method.
> >     2. Does it work if we just synchronize on mapJoinTables[pos]?


1. Have put it in a separate method. 2. Probabl it does work to synchronize on 
mapJoinTables[pos] since it is not the same object in different tasks.


> On April 21, 2015, 2:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java, 
> > line 52
> > <https://reviews.apache.org/r/33251/diff/2/?file=937280#file937280line52>
> >
> >     Method naming, see below.

Fixed.  Changed the method a little so that it doesn't do more than clean up 
small table caches.


> On April 21, 2015, 2:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 
> > 51
> > <https://reviews.apache.org/r/33251/diff/2/?file=937281#file937281line51>
> >
> >     Using thread-local makes me a little nervous, but let's discuss about 
> > this offline.

Per our discussion, we don't use thread-local any more.


> On April 21, 2015, 2:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 
> > 60
> > <https://reviews.apache.org/r/33251/diff/2/?file=937281#file937281line60>
> >
> >     The method name suggests no indication of a side effect of setting 
> > thread local value. We'd better put this outside of this method.
> >     
> >     In addition, the method name seems also a little confusing in that it 
> > suggests cleanup is for sure but in fact it's conditional.

Removed the thread local variable usage.  As to the method name, it means we 
clean up the cache a little, it doesn't mean we remove all the cahced contents. 
I added some javadoc to make it a little clear.


- Jimmy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33251/#review80969
-----------------------------------------------------------


On April 21, 2015, 1:37 a.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33251/
> -----------------------------------------------------------
> 
> (Updated April 21, 2015, 1:37 a.m.)
> 
> 
> Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10302
>     https://issues.apache.org/jira/browse/HIVE-10302
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Cached the small table containter so that mapjoin tasks can use it if the 
> task is executed on the same Spark executor.
> The cache is released right before the next job after the mapjoin job is done.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 
> fe108c4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
> 3f240f5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> 97b3471 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
> 72ab913 
> 
> Diff: https://reviews.apache.org/r/33251/diff/
> 
> 
> Testing
> -------
> 
> Ran several queries in live cluster. ptest pending.
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>

Re: Review Request 33251: HIVE-10302 Cache small tables in memory [Spark Branch]

Reply via email to