-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/
-----------------------------------------------------------
(Updated Sept. 5, 2014, 6:18 p.m.)
Review request for hive, Brock Noland and Xuefu Zhang.
Bugs: HIVE-7503
https://issues.apache.org/jira/browse/HIVE-7503
Repository: hive-git
Description
-------
For Hive's multi-insert queries
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there
may be an MR job for each insert. When we implement this on Spark, it would be
nice if all the inserts could happen concurrently.
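For reference, a multi-insert query scans the source once and feeds several independent inserts (the table and column names below are illustrative, not from the patch):

```sql
-- One scan of src feeds two independent inserts.
FROM src
INSERT OVERWRITE TABLE dest1 SELECT key, value WHERE key < 100
INSERT OVERWRITE TABLE dest2 SELECT key, value WHERE key >= 100;
```

With MapReduce each INSERT branch can become its own job; the question here is how to run the branches concurrently on Spark without re-scanning src.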
It seems this functionality isn't directly available in Spark. To make things
worse, the source of the inserts may be re-computed unless it's staged, and
even with staging, the inserts still happen sequentially, hurting performance.
This task is to find out what it takes in Spark to enable concurrent inserts
without requiring staging of the source or sequential insertion. If this has to
be solved in Hive, find an optimal way to do it.
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf64
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION
Diff: https://reviews.apache.org/r/25394/diff/
Testing
-------
Thanks,
Chao Sun