> On Dec. 12, 2014, 7:45 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkMapJoinProcessor.java, line 78
> > <https://reviews.apache.org/r/28889/diff/2/?file=789801#file789801line78>
> >
> > nit: grandParentOps.get(0) is repeated in the next line. nice to have a var for it.

Sure. Will fix.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28889/#review64959
-----------------------------------------------------------


On Dec. 11, 2014, 10:36 p.m., Chao Sun wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28889/
> -----------------------------------------------------------
>
> (Updated Dec. 11, 2014, 10:36 p.m.)
>
>
> Review request for hive, Szehon Ho and Xuefu Zhang.
>
>
> Bugs: HIVE-8911
>     https://issues.apache.org/jira/browse/HIVE-8911
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Basically, the idea is to reuse as much code as possible from MR.
>
> The issue is that, in MR's MapJoinProcessor, after the join op is converted
> to a mapjoin op, all the parent ReduceSinkOperators are removed. However, for
> our Spark branch, we need to preserve those, because they serve as boundaries
> between BaseWorks, and SparkReduceSinkMapJoinProc triggers upon them.
>
> Initially I tried to move this part of the logic to SparkMapJoinOptimizer,
> which happens at a later stage. However, although this works, I'm worried it
> may have too much effect on the smb join w/ hint, because we then have to
> move that part of the logic to SparkMapJoinOptimizer too. In general, I want
> to minimize the effect on the code path.
>
> This patch makes changes to MapJoinProcessor. I created a separate method,
> convertMapJoinForSpark, which doesn't remove the ReduceSinkOperators for
> small tables. Then the transform method decides which method to call based
> on the execution engine.
>
> I also have to disable several tests related to smb join w/ hints. They can
> be activated once HIVE-8640 is resolved.
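
For readers following the description above, here is a minimal, self-contained Java sketch of the dispatch it describes. It is not the code in the patch: the `Op` class, the string operator kinds, the `bigTablePos` parameter, and all method signatures are hypothetical stand-ins, and the only name taken from the description is `convertMapJoinForSpark`. The sketch also assumes that the ReduceSink on the big-table branch is still removed on Spark, which is one reading of "doesn't remove the ReduceSinkOperators, for small tables".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MapJoinDispatchSketch {

    // Hypothetical stand-in for a plan operator; this is NOT Hive's Operator class.
    static final class Op {
        final String kind; // "TS" (table scan), "RS" (reduce sink), "JOIN", "MAPJOIN"
        final List<Op> parents = new ArrayList<>();

        Op(String kind, Op... parents) {
            this.kind = kind;
            this.parents.addAll(Arrays.asList(parents));
        }
    }

    // MR-style conversion: every ReduceSink parent is spliced out of the plan.
    static Op convertMapJoin(Op join) {
        Op mapJoin = new Op("MAPJOIN");
        for (Op parent : join.parents) {
            addParent(mapJoin, parent, true);
        }
        return mapJoin;
    }

    // Spark-style conversion (assumed reading of the description): only the
    // big-table branch loses its ReduceSink; small-table branches keep theirs.
    static Op convertMapJoinForSpark(Op join, int bigTablePos) {
        Op mapJoin = new Op("MAPJOIN");
        for (int pos = 0; pos < join.parents.size(); pos++) {
            addParent(mapJoin, join.parents.get(pos), pos == bigTablePos);
        }
        return mapJoin;
    }

    // Attach a parent, optionally replacing a ReduceSink by its own parents.
    private static void addParent(Op mapJoin, Op parent, boolean removeReduceSink) {
        if (removeReduceSink && parent.kind.equals("RS")) {
            mapJoin.parents.addAll(parent.parents);
        } else {
            mapJoin.parents.add(parent);
        }
    }

    // Pick the conversion by engine name (hypothetical signature).
    static Op transform(String engine, Op join, int bigTablePos) {
        return "spark".equals(engine)
                ? convertMapJoinForSpark(join, bigTablePos)
                : convertMapJoin(join);
    }

    // Helper for inspecting the result: the kinds of an operator's parents.
    static List<String> parentKinds(Op op) {
        List<String> kinds = new ArrayList<>();
        for (Op parent : op.parents) {
            kinds.add(parent.kind);
        }
        return kinds;
    }

    public static void main(String[] args) {
        Op join = new Op("JOIN", new Op("RS", new Op("TS")), new Op("RS", new Op("TS")));
        System.out.println(parentKinds(transform("mr", join, 0)));    // prints [TS, TS]
        System.out.println(parentKinds(transform("spark", join, 0))); // prints [TS, RS]
    }
}
```

With two ReduceSink parents and the big table at position 0, the MR path leaves the map join reading from both table scans directly, while the Spark path keeps the small-table ReduceSink as the boundary that a later rule can trigger on.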
>
>
> Diffs
> -----
>
>   data/conf/spark/hive-site.xml 44eac86
>   itests/src/test/resources/testconfiguration.properties 2348e06
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 773c827
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java a8a3d86
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkMapJoinProcessor.java PRE-CREATION
>   ql/src/test/results/clientpositive/spark/bucket_map_join_1.q.out f24ae73
>   ql/src/test/results/clientpositive/spark/bucket_map_join_2.q.out 33e9e8b
>   ql/src/test/results/clientpositive/spark/bucketmapjoin1.q.out aaa0151
>   ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out 9954b77
>   ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out ad8f0a5
>   ql/src/test/results/clientpositive/spark/bucketmapjoin12.q.out aa3e2b6
>   ql/src/test/results/clientpositive/spark/bucketmapjoin13.q.out 44233f6
>   ql/src/test/results/clientpositive/spark/bucketmapjoin2.q.out c4702ef
>   ql/src/test/results/clientpositive/spark/bucketmapjoin3.q.out 7c31e05
>   ql/src/test/results/clientpositive/spark/bucketmapjoin4.q.out a8e892e
>   ql/src/test/results/clientpositive/spark/bucketmapjoin5.q.out 041ba12
>   ql/src/test/results/clientpositive/spark/bucketmapjoin7.q.out 54c4be3
>   ql/src/test/results/clientpositive/spark/bucketmapjoin8.q.out da9fe1c
>   ql/src/test/results/clientpositive/spark/bucketmapjoin9.q.out 5a5e3f6
>   ql/src/test/results/clientpositive/spark/bucketmapjoin_negative.q.out 5ac3f4c
>   ql/src/test/results/clientpositive/spark/bucketmapjoin_negative2.q.out e4ff965
>   ql/src/test/results/clientpositive/spark/bucketmapjoin_negative3.q.out fce5566
>   ql/src/test/results/clientpositive/spark/join25.q.out 284c97d
>   ql/src/test/results/clientpositive/spark/join26.q.out e271184
>   ql/src/test/results/clientpositive/spark/join27.q.out d31f29e
>   ql/src/test/results/clientpositive/spark/join30.q.out 7fbbcfa
>   ql/src/test/results/clientpositive/spark/join36.q.out f1317ea
>   ql/src/test/results/clientpositive/spark/join37.q.out 448e983
>   ql/src/test/results/clientpositive/spark/join38.q.out 735d7ea
>   ql/src/test/results/clientpositive/spark/join39.q.out 0734d4b
>   ql/src/test/results/clientpositive/spark/join40.q.out 60ef13d
>   ql/src/test/results/clientpositive/spark/join_map_ppr.q.out 59fdb99
>   ql/src/test/results/clientpositive/spark/mapjoin1.q.out 80e38b9
>   ql/src/test/results/clientpositive/spark/mapjoin_distinct.q.out dc7241c
>   ql/src/test/results/clientpositive/spark/mapjoin_filter_on_outerjoin.q.out 3b80437
>   ql/src/test/results/clientpositive/spark/mapjoin_test_outer.q.out fdf8f24
>   ql/src/test/results/clientpositive/spark/semijoin.q.out 2b8e04b
>   ql/src/test/results/clientpositive/spark/skewjoin.q.out 56b78be
>
> Diff: https://reviews.apache.org/r/28889/diff/
>
>
> Testing
> -------
>
> bucket_map_join_1.q
> bucket_map_join_2.q
> bucketmapjoin1.q
> bucketmapjoin10.q
> bucketmapjoin11.q
> bucketmapjoin12.q
> bucketmapjoin13.q
> bucketmapjoin2.q
> bucketmapjoin3.q
> bucketmapjoin4.q
> bucketmapjoin5.q
> bucketmapjoin7.q
> bucketmapjoin8.q
> bucketmapjoin9.q
> bucketmapjoin_negative.q
> bucketmapjoin_negative2.q
> column_access_stats.q
> join25.q
> join26.q
> join27.q
> join30.q
> join36.q
> join37.q
> join38.q
> join39.q
> join40.q
> join_empty.q
> join_filters_overlap.q
> join_map_ppr.q
> mapjoin1.q
> mapjoin_distinct.q
> mapjoin_filter_on_outerjoin.q
> mapjoin_hook.q
> mapjoin_test_outer.q
> semijoin.q
> skewjoin.q
> table_access_keys_stats.q
>
>
> Thanks,
>
> Chao Sun
>
>