> On Nov. 8, 2014, 3:15 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java, line 214
> > <https://reviews.apache.org/r/27627/diff/3/?file=754597#file754597line214>
> >
> >     This assumes that the resulting SparkWorks will be linearly dependent on
> >     each other, which isn't true in general. Let's say there are two works
> >     (w1 and w2), each having a map join operator. w1 and w2 are connected to
> >     w3 via HTS. w3 also contains a map join operator. The dependency in this
> >     scenario will be a graph rather than a linear chain.
>
> Chao Sun wrote:
>     I was thinking, in this case, if there's no dependency between w1 and w2,
>     they can be put in the same SparkWork, right?
>     Otherwise, they will form a linear dependency too.
>
> Xuefu Zhang wrote:
>     w1 and w2 are fine. They will be in the same SparkWork. This SparkWork
>     will depend on both the SparkWork generated at w1 and the SparkWork
>     generated at w2. This dependency is not linear.
>
>     To give more detail: for each work that has a map join op, we need to
>     create a SparkWork to handle its small tables. So, both w1 and w2 will
>     need to create such a SparkWork. While w1 and w2 are in the same
>     SparkWork, this SparkWork depends on the two SparkWorks created.
>
> Chao Sun wrote:
>     I'm not getting it. Why is "this dependency not linear"? Can you give a
>     counterexample?
>     Suppose w1 (MJ_1), w2 (MJ_2), and w3 (MJ_3) are laid out as follows:
>
>       HTS_1   HTS_2     HTS_3   HTS_4
>          \     /           \     /
>           \   /             \   /
>           MJ_1              MJ_2
>             |                 |
>             |                 |
>           HTS_5             HTS_6
>              \               /
>               \             /
>                \           /
>                 \         /
>                  \       /
>                    MJ_3
>
>     Then, what I'm doing is putting HTS_1, HTS_2, HTS_3, and HTS_4 in the
>     same SparkWork, say SW_1. Then MJ_1, MJ_2, HTS_5, and HTS_6 will be in
>     another SparkWork, SW_2, and MJ_3 in a third SparkWork, SW_3:
>     SW_1 -> SW_2 -> SW_3.
>
> Xuefu Zhang wrote:
>     I don't think we should put (HTS_1, HTS_2) and (HTS_3, HTS_4) in the same
>     SparkWork. They belong to different MJs handling different sets of small
>     tables. This will complicate things, making HashTableSinkOperator and
>     HashTableLoader more complicated.
>
>     In terms of dependencies, MJ_1 doesn't need to wait for HTS_3/HTS_4 in
>     order to run, and vice versa.
>
>     Please refer to the pseudo code posted in the JIRA for implementation
>     ideas. Thanks.
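For reference, the non-linear dependency described in the thread above can be sketched in a few lines of Java. This is only an illustration under my reading of the discussion, not code from the patch: the class `SparkWorkDeps`, the names SW_A through SW_D, and the grouping of operators into them are all hypothetical, and nothing here uses Hive's actual SparkWork API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; none of these names come from Hive. */
final class SparkWorkDeps {

    /** Maps each (hypothetical) SparkWork to the SparkWorks it must wait for. */
    static Map<String, Set<String>> example() {
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("SW_A", new LinkedHashSet<>());                               // HTS_1, HTS_2: small tables for MJ_1
        deps.put("SW_B", new LinkedHashSet<>());                               // HTS_3, HTS_4: small tables for MJ_2
        deps.put("SW_C", new LinkedHashSet<>(Arrays.asList("SW_A", "SW_B")));  // MJ_1, MJ_2, HTS_5, HTS_6
        deps.put("SW_D", new LinkedHashSet<>(Arrays.asList("SW_C")));          // MJ_3
        return deps;
    }

    /**
     * True if the graph (assumed acyclic) is a single chain: at most one root,
     * and every node has at most one parent and at most one child.
     */
    static boolean isLinear(Map<String, Set<String>> deps) {
        Map<String, Integer> childCount = new HashMap<>();
        int roots = 0;
        for (Set<String> parents : deps.values()) {
            if (parents.isEmpty()) roots++;
            if (parents.size() > 1) return false;
            for (String parent : parents) {
                if (childCount.merge(parent, 1, Integer::sum) > 1) return false;
            }
        }
        return roots <= 1;
    }

    /** Kahn's algorithm: returns one valid execution order, or throws on a cycle. */
    static List<String> executionOrder(Map<String, Set<String>> deps) {
        Map<String, Integer> remaining = new LinkedHashMap<>();
        Map<String, List<String>> children = new LinkedHashMap<>();
        for (String node : deps.keySet()) {
            remaining.put(node, deps.get(node).size());
            children.put(node, new ArrayList<>());
        }
        // Assumes every parent also appears as a key of the map.
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            for (String parent : e.getValue()) {
                children.get(parent).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String next = ready.poll();
            order.add(next);
            for (String child : children.get(next)) {
                if (remaining.merge(child, -1, Integer::sum) == 0) ready.add(child);
            }
        }
        if (order.size() != deps.size()) {
            throw new IllegalStateException("dependency cycle detected");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = example();
        System.out.println("linear? " + isLinear(deps));
        System.out.println("order:  " + executionOrder(deps));
    }
}
```

In this sketch SW_C has two parents, so `isLinear` returns false, while a topological traversal still yields a valid order in which SW_A and SW_B have no ordering constraint between them.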

Resolved via an offline chat.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27627/#review60482
-----------------------------------------------------------


On Nov. 9, 2014, 10:39 p.m., Chao Sun wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27627/
> -----------------------------------------------------------
>
> (Updated Nov. 9, 2014, 10:39 p.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE-8622
>     https://issues.apache.org/jira/browse/HIVE-8622
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This is a sub-task of map-join for spark
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 46d02bf
>
> Diff: https://reviews.apache.org/r/27627/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Chao Sun
>
>