> On Nov. 8, 2014, 3:15 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java,
> > line 214
> > <https://reviews.apache.org/r/27627/diff/3/?file=754597#file754597line214>
> >
> > This assumes that the resulting SparkWorks will be linearly dependent on each
> > other, which isn't true in general. Let's say there are two works (w1 and w2),
> > each having a map join operator. w1 and w2 are connected to w3 via HTS. w3
> > also contains a map join operator. The dependency in this scenario will be a
> > graph rather than linear.
>
> Chao Sun wrote:
> I was thinking that in this case, if there's no dependency between w1 and w2,
> they can be put in the same SparkWork, right?
> Otherwise, they will form a linear dependency too.
>
> Xuefu Zhang wrote:
> w1 and w2 are fine. They will be in the same SparkWork. This SparkWork
> will depend on both the SparkWork generated for w1 and the SparkWork generated
> for w2. This dependency is not linear.
>
> To add more detail: for each work that has a map join op, we need to
> create a SparkWork to handle its small tables. So both w1 and w2 will need
> to create such a SparkWork. w1 and w2 are in the same SparkWork, and that
> SparkWork depends on both of the SparkWorks created for their small tables.
I don't follow why "This dependency is not linear." Can you give a
counterexample?
Suppose w1 (MJ_1), w2 (MJ_2), and w3 (MJ_3) look like the following:
HTS_1   HTS_2     HTS_3   HTS_4
    \   /             \   /
     \ /               \ /
    MJ_1              MJ_2
     |                 |
     |                 |
   HTS_5             HTS_6
       \             /
        \           /
         \         /
          \       /
           \     /
            MJ_3
What I'm doing is putting HTS_1, HTS_2, HTS_3, and HTS_4 in the same
SparkWork, say SW_1. Then MJ_1, MJ_2, HTS_5, and HTS_6 go into another
SparkWork, SW_2, and MJ_3 into a third SparkWork, SW_3. The resulting
dependency is linear:
SW_1 -> SW_2 -> SW_3.
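To make the two readings concrete, here is a small Java sketch. It models the diagram as a hypothetical map from operator names to their inputs (plain strings, not Hive's Operator or SparkWork classes) and derives the SparkWork-level dependencies for two groupings. The first is a hypothetical stage-based grouping (one SparkWork per number of upstream map joins), which reproduces SW_1 -> SW_2 -> SW_3 above. The second is the per-map-join grouping described earlier, where the small tables of w1 and of w2 each get their own SparkWork. The names SW_small_w1, SW_small_w2, SW_w1_w2, and SW_w3 are made up for illustration.

```java
import java.util.*;

// Hypothetical model of the diagram above: operators are plain strings,
// not Hive's Operator or SparkWork classes.
public class SparkWorkGrouping {

    // Operator -> its upstream inputs, copied from the diagram.
    static final Map<String, List<String>> INPUTS = new LinkedHashMap<>();
    static {
        for (String hts : List.of("HTS_1", "HTS_2", "HTS_3", "HTS_4")) {
            INPUTS.put(hts, List.of());
        }
        INPUTS.put("MJ_1", List.of("HTS_1", "HTS_2"));
        INPUTS.put("MJ_2", List.of("HTS_3", "HTS_4"));
        INPUTS.put("HTS_5", List.of("MJ_1"));
        INPUTS.put("HTS_6", List.of("MJ_2"));
        INPUTS.put("MJ_3", List.of("HTS_5", "HTS_6"));
    }

    // Number of map joins on the longest path ending at op, including op itself.
    static int stage(String op) {
        int s = 0;
        for (String in : INPUTS.get(op)) {
            s = Math.max(s, stage(in));
        }
        return op.startsWith("MJ_") ? s + 1 : s;
    }

    // Hypothetical stage-based grouping: stage k goes into SparkWork SW_(k+1).
    static Map<String, String> groupByStage() {
        Map<String, String> group = new LinkedHashMap<>();
        for (String op : INPUTS.keySet()) {
            group.put(op, "SW_" + (stage(op) + 1));
        }
        return group;
    }

    // Per-map-join grouping: separate small-table SparkWorks for w1 and w2,
    // then w1 and w2 together, then w3.
    static Map<String, String> groupPerMapJoin() {
        Map<String, String> group = new LinkedHashMap<>();
        for (String op : List.of("HTS_1", "HTS_2")) group.put(op, "SW_small_w1");
        for (String op : List.of("HTS_3", "HTS_4")) group.put(op, "SW_small_w2");
        for (String op : List.of("MJ_1", "HTS_5", "MJ_2", "HTS_6")) group.put(op, "SW_w1_w2");
        group.put("MJ_3", "SW_w3");
        return group;
    }

    // SparkWork -> parent SparkWorks, induced by operator edges that cross groups.
    static Map<String, Set<String>> parents(Map<String, String> group) {
        Map<String, Set<String>> parents = new TreeMap<>();
        for (String sw : group.values()) {
            parents.putIfAbsent(sw, new TreeSet<>());
        }
        for (Map.Entry<String, List<String>> e : INPUTS.entrySet()) {
            String child = group.get(e.getKey());
            for (String in : e.getValue()) {
                String parent = group.get(in);
                if (!parent.equals(child)) {
                    parents.get(child).add(parent);
                }
            }
        }
        return parents;
    }

    // A chain has exactly one root and no SparkWork with two parents or two children.
    static boolean isChain(Map<String, Set<String>> parents) {
        Map<String, Integer> childCount = new HashMap<>();
        long roots = 0;
        for (Set<String> ps : parents.values()) {
            if (ps.isEmpty()) roots++;
            if (ps.size() > 1) return false;
            for (String p : ps) childCount.merge(p, 1, Integer::sum);
        }
        return roots == 1 && childCount.values().stream().allMatch(c -> c <= 1);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> byStage = parents(groupByStage());
        Map<String, Set<String>> byMapJoin = parents(groupPerMapJoin());
        System.out.println("stage-based:  " + byStage + " chain=" + isChain(byStage));
        System.out.println("per-map-join: " + byMapJoin + " chain=" + isChain(byMapJoin));
    }
}
```

Under these assumptions the stage-based grouping yields a chain, while in the per-map-join grouping SW_w1_w2 has two parents. So whether the result is linear depends on whether the small-table works of independent map joins are merged into one SparkWork.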
- Chao
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27627/#review60482
-----------------------------------------------------------
On Nov. 7, 2014, 6:07 p.m., Chao Sun wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27627/
> -----------------------------------------------------------
>
> (Updated Nov. 7, 2014, 6:07 p.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE-8622
> https://issues.apache.org/jira/browse/HIVE-8622
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This is a sub-task of map join for Spark
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616
>
>
> Diffs
> -----
>
>
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
> PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6
>
> Diff: https://reviews.apache.org/r/27627/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Chao Sun
>
>