[
https://issues.apache.org/jira/browse/HIVE-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209856#comment-14209856
]
Xuefu Zhang commented on HIVE-8841:
-----------------------------------
Patch looks good. +1
{quote}
do you know how I can verify this?
{quote}
Currently we can only verify this manually. For instance, if a MapWork is split
and thus cloned, we should read from the source only once. Similarly, for a
ReduceWork, the shuffle before the ReduceWork should happen only once.
I'm going to create a JIRA to visualize the Spark plan so that we know where
caching is turned on.
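The expectation above can be illustrated with a toy model. This is plain Python, not actual Hive or Spark code; the `Source`, `RDD`, `scan`, and `collect` names are all illustrative. It shows why caching the shared parent matters when a work is split into clones: without caching, each clone re-scans the source; with caching, the source is scanned only once.

```python
class Source:
    """Counts how many times the underlying data is actually read."""
    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def scan(self):
        self.reads += 1
        return list(self.rows)


class RDD:
    """Lazy dataset; cache() memoizes the first materialization."""
    def __init__(self, compute):
        self._compute = compute
        self._cache_on = False
        self._cached = None

    def cache(self):
        self._cache_on = True
        return self

    def collect(self):
        if self._cache_on:
            if self._cached is None:
                self._cached = self._compute()
            return self._cached
        return self._compute()


source = Source([1, 2, 3])
shared = RDD(source.scan).cache()  # shared parent of the cloned works

# Two cloned works (as in a multi-insert) consume the same parent.
out_a = [x * 10 for x in shared.collect()]
out_b = [x + 1 for x in shared.collect()]

print(source.reads)  # with cache(): 1 -- the source was scanned only once
```

Without the `cache()` call, `source.reads` would be 2, which is exactly the duplicated work the patch is meant to avoid.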
> Make RDD caching work for multi-insert after HIVE-8793 when map join is
> involved [Spark Branch]
> -----------------------------------------------------------------------------------------------
>
> Key: HIVE-8841
> URL: https://issues.apache.org/jira/browse/HIVE-8841
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Rui Li
> Attachments: HIVE-8841.1-spark.patch
>
>
> Splitting SparkWork now happens before MapJoinResolver. As MapJoinResolver may
> further spin off a dependent SparkWork for the small tables of a join, we need
> to make Spark RDD caching continue to work even across SparkWorks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)