[ https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chao updated HIVE-7503:
-----------------------
Attachment: HIVE-7503.6-spark.patch
This patch mainly changes how multi-table insertion is detected.
> Support Hive's multi-table insert query with Spark [Spark Branch]
> -----------------------------------------------------------------
>
> Key: HIVE-7503
> URL: https://issues.apache.org/jira/browse/HIVE-7503
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Chao
> Labels: spark-m1
> Attachments: HIVE-7503.1-spark.patch, HIVE-7503.2-spark.patch,
> HIVE-7503.3-spark.patch, HIVE-7503.4-spark.patch, HIVE-7503.5-spark.patch,
> HIVE-7503.6-spark.patch
>
>
> For Hive's multi-insert query
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there
> may be an MR job for each insert. When we achieve this with Spark, it would
> be nice if all the inserts could happen concurrently.
> It seems that this functionality isn't available in Spark. To make things
> worse, the source of the inserts may be recomputed unless it is staged. Even
> with staging, the inserts happen sequentially, which hurts performance.
> This task is to find out what it takes in Spark to enable this without
> requiring the source to be staged or the inserts to run sequentially. If this
> has to be solved in Hive, find an optimal way to do it.
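As a rough illustration of the goal described above, the following plain-Python sketch models a multi-insert: the shared source is scanned once and kept in memory ("staged"), and each insert then filters that staged copy concurrently in a thread pool. This is only an analogy; it is not Spark or Hive code and does not show how either project implements the feature. The table names, rows, and the `scan_source` and `multi_insert` helpers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source rows; in Hive this would be the FROM clause shared by
# all INSERT clauses of a multi-insert query.
SOURCE_ROWS = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

# Counts how many times the (simulated) expensive source scan runs.
stats = {"scans": 0}


def scan_source():
    """Simulate an expensive scan of the shared source."""
    stats["scans"] += 1
    return list(SOURCE_ROWS)


def multi_insert(predicates):
    """Run one 'insert' per target against a single staged copy of the source.

    predicates: dict mapping a hypothetical target table name to a row filter.
    Returns a dict mapping each target name to the rows it would receive.
    """
    staged = scan_source()  # stage the source once instead of once per insert

    def insert(item):
        name, pred = item
        return name, [row for row in staged if pred(row)]

    # Run the inserts concurrently rather than one after another.
    with ThreadPoolExecutor(max_workers=max(1, len(predicates))) as pool:
        return dict(pool.map(insert, predicates.items()))


targets = multi_insert({
    "dest_even": lambda r: r[0] % 2 == 0,
    "dest_odd": lambda r: r[0] % 2 == 1,
})
print(targets)
print(stats["scans"])
```

The source is scanned exactly once no matter how many targets are listed, which is the property the task asks for; whether Spark can provide the same behavior natively is the open question above.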
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)