[
https://issues.apache.org/jira/browse/HIVE-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203758#comment-14203758
]
Xuefu Zhang commented on HIVE-8622:
-----------------------------------
Here is my pseudocode showing my attempt to solve this seemingly complex problem:
{code}
// Notation:
// MJWork  - a work with a map join operator
// HTSWork - a work with a HashTableSinkOperator
// Each MJWork will build a SparkWork for its small-table works. This info is
// held in a map <MJWork, SparkWork>, originally empty and named childSparkWorkMap.
Map<MJWork, SparkWork> childSparkWorkMap = new HashMap<MJWork, SparkWork>();
// Each work, including an MJWork, also belongs to a parent SparkWork.
// Originally, all works belong to the original SparkWork.
// This info is held in another map <BaseWork, SparkWork> named parentSparkWorkMap.
Map<BaseWork, SparkWork> parentSparkWorkMap = new HashMap<BaseWork, SparkWork>();
// sparkWork is the original SparkWork to be split.
List<BaseWork> works = sparkWork.getAllWorks();
for (BaseWork work : works) {
  parentSparkWorkMap.put(work, sparkWork);
}
// Dependency map among all SparkWorks. This is our final result.
Map<SparkWork, SparkWork> dependencyMap = new HashMap<SparkWork, SparkWork>();
// Process the original SparkWork from leaves backwards to roots.
List<BaseWork> leaves = sparkWork.getLeaves();
for (BaseWork leaf : leaves) {
  move(leaf, sparkWork);
}
/**
 * Move a work from the original SparkWork to the target SparkWork.
 */
void move(BaseWork work, SparkWork target) {
  List<BaseWork> parents = sparkWork.getParents(work);
  SparkWork currentParentSparkWork = parentSparkWorkMap.get(work);
  if (currentParentSparkWork != target) {
    // TODO: move the work from currentParent to target.
    parentSparkWorkMap.put(work, target); // update new parent
  }
  if (!(work instanceof MJWork)) {
    for (BaseWork parent : parents) {
      // Move each parent to the same parent SparkWork as work.
      move(parent, target);
    }
  } else {
    // It's an MJWork: its small-table (HTSWork) parents go to a new child
    // SparkWork, and all other parents stay with the target.
    SparkWork childSparkWork = new SparkWork();
    dependencyMap.put(target, childSparkWork);
    childSparkWorkMap.put(work, childSparkWork);
    for (BaseWork parent : parents) {
      if (parent instanceof HTSWork) {
        move(parent, childSparkWork);
      } else {
        move(parent, target);
      }
    }
  }
}
{code}
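To make the idea easier to try out, here is a small self-contained sketch of the same traversal on toy classes. Everything in it is hypothetical: Work, Plan, Kind and SplitSketch are stand-ins I made up for illustration, not Hive's BaseWork/SparkWork API. It also deviates from the pseudocode in two places, both of which are my assumptions: the dependency map holds a list of children per plan (so one plan can depend on several small-table plans), and a map-join work that is reached twice reuses its child plan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types exist in Hive.
class SplitSketch {
    enum Kind { PLAIN, MAP_JOIN, HASH_TABLE_SINK }

    static final class Work {               // stands in for BaseWork
        final String name;
        final Kind kind;
        final List<Work> parents = new ArrayList<>();
        Work(String name, Kind kind, Work... parents) {
            this.name = name;
            this.kind = kind;
            this.parents.addAll(Arrays.asList(parents));
        }
    }

    static final class Plan {               // stands in for SparkWork
        final String name;
        final Set<Work> works = new LinkedHashSet<>();
        Plan(String name) { this.name = name; }
    }

    final Map<Work, Plan> parentPlan = new HashMap<>();          // parentSparkWorkMap
    final Map<Work, Plan> childPlan = new HashMap<>();           // childSparkWorkMap
    final Map<Plan, List<Plan>> dependencies = new HashMap<>();  // dependencyMap (multi-valued: assumption)
    private int counter = 0;

    /** Puts every work reachable from the leaves into one plan, then splits it. */
    Plan split(List<Work> leaves) {
        Plan original = new Plan("original");
        for (Work leaf : leaves) register(leaf, original);
        for (Work leaf : leaves) move(leaf, original);
        return original;
    }

    private void register(Work work, Plan plan) {
        if (parentPlan.putIfAbsent(work, plan) == null) {
            plan.works.add(work);
            for (Work parent : work.parents) register(parent, plan);
        }
    }

    /** Walks from a work back towards the roots, reassigning works as it goes. */
    private void move(Work work, Plan target) {
        Plan current = parentPlan.get(work);
        if (current != target) {
            current.works.remove(work);
            target.works.add(work);
            parentPlan.put(work, target);
        }
        if (work.kind != Kind.MAP_JOIN) {
            for (Work parent : work.parents) move(parent, target);
            return;
        }
        // Map join: hash-table-sink parents go to a child plan that must run first.
        Plan child = childPlan.get(work);
        if (child == null) {
            child = new Plan("child-" + (++counter));
            childPlan.put(work, child);
            dependencies.computeIfAbsent(target, k -> new ArrayList<>()).add(child);
        }
        for (Work parent : work.parents) {
            move(parent, parent.kind == Kind.HASH_TABLE_SINK ? child : target);
        }
    }

    public static void main(String[] args) {
        Work smallScan = new Work("smallScan", Kind.PLAIN);
        Work hts = new Work("hts", Kind.HASH_TABLE_SINK, smallScan);
        Work bigScan = new Work("bigScan", Kind.PLAIN);
        Work mapJoin = new Work("mapJoin", Kind.MAP_JOIN, bigScan, hts);
        Work reduce = new Work("reduce", Kind.PLAIN, mapJoin);

        SplitSketch sketch = new SplitSketch();
        Plan original = sketch.split(Arrays.asList(reduce));
        for (Work w : original.works) System.out.println(original.name + ": " + w.name);
        for (Plan child : sketch.dependencies.get(original)) {
            for (Work w : child.works) System.out.println(child.name + ": " + w.name);
        }
    }
}
```

In the example in main, the small-table side (hts and smallScan) should end up in the child plan, while reduce, mapJoin and bigScan stay in the original plan, which then depends on the child.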
> Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
> ----------------------------------------------------------------
>
> Key: HIVE-8622
> URL: https://issues.apache.org/jira/browse/HIVE-8622
> Project: Hive
> Issue Type: Sub-task
> Reporter: Suhas Satish
> Assignee: Chao
> Attachments: HIVE-8622.2-spark.patch, HIVE-8622.3-spark.patch,
> HIVE-8622.patch
>
>
> This is a sub-task of map-join for spark
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616