> On Sept. 20, 2014, 1:03 a.m., Brock Noland wrote:
> > Awesome work!!!! I have a few minor comments that can be addressed in a
> > *follow on* patch.
Thanks, Brock, for the comments! I've attached the updated patch.

> On Sept. 20, 2014, 1:03 a.m., Brock Noland wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java, line 92
> > <https://reviews.apache.org/r/25394/diff/4/?file=698349#file698349line92>
> >
> > It sounds like we'll be creating a multi-insert-specific context? In
> > that context, can we make all the members private?

Yes, I'll do that in the following patch.

- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/#review54065
-----------------------------------------------------------


On Sept. 20, 2014, 1:33 a.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25394/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2014, 1:33 a.m.)
> 
> 
> Review request for hive, Brock Noland and Xuefu Zhang.
> 
> 
> Bugs: HIVE-7503
>     https://issues.apache.org/jira/browse/HIVE-7503
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For Hive's multi-insert query
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there
> may be an MR job for each insert. When we achieve this with Spark, it would
> be nice if all the inserts could happen concurrently.
> It seems that this functionality isn't available in Spark. To make things
> worse, the source of the insert may be re-computed unless it's staged. Even
> with staging, the inserts will happen sequentially, making performance
> suffer.
> This task is to find out what it takes in Spark to enable this without
> requiring staging of the source and sequential insertion. If this has to be
> solved in Hive, find out an optimal way to do this.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a07 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf64 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/insert1.q.out 49fb1d4 
>   ql/src/test/results/clientpositive/spark/union18.q.out 9a40807 
>   ql/src/test/results/clientpositive/spark/union19.q.out 131591f 
>   ql/src/test/results/clientpositive/spark/union_remove_6.q.out 1bc55f4 
> 
> Diff: https://reviews.apache.org/r/25394/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Chao Sun
> 
>
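For context, the multi-insert pattern that the description above refers to looks roughly like the following HiveQL sketch, based on the LanguageManual DML page linked in the review. The table names `src`, `dest1`, and `dest2` are illustrative placeholders, not tables from the patch; the point is that a single scan of the source feeds multiple INSERT clauses, each of which has traditionally been compiled into its own MR job.

```sql
-- One pass over src feeds two independent inserts.
-- Without concurrent execution (or staging of src), the scan of src
-- may be re-computed once per INSERT clause.
FROM src
INSERT OVERWRITE TABLE dest1
  SELECT key, value WHERE key < 100
INSERT OVERWRITE TABLE dest2
  SELECT key, value WHERE key >= 100;
```

The patch under review restructures the Spark plan compiler (GenSparkProcContext, SparkMultiInsertionProcessor, etc.) so that such queries can be handled without a separate sequential job per insert.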