> On Dec. 17, 2014, midnight, Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java,
> >  line 207
> > <https://reviews.apache.org/r/29111/diff/1/?file=793109#file793109line207>
> >
> >     should we remove this variable completely?

Yes, I'll remove it completely.


> On Dec. 17, 2014, midnight, Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java, 
> > line 93
> > <https://reviews.apache.org/r/29111/diff/1/?file=793110#file793110line93>
> >
> >     Original name seems more meaningful.

OK, will fix.


> On Dec. 17, 2014, midnight, Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java, line 98
> > <https://reviews.apache.org/r/29111/diff/1/?file=793112#file793112line98>
> >
> >     Should we keep it?

You're right - this was a mistake.


> On Dec. 17, 2014, midnight, Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java, 
> > line 212
> > <https://reviews.apache.org/r/29111/diff/1/?file=793108#file793108line212>
> >
> >     An assert here would be good.

OK, will add.


> On Dec. 17, 2014, midnight, Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java, 
> > line 133
> > <https://reviews.apache.org/r/29111/diff/1/?file=793108#file793108line133>
> >
> >     An assert here would be good.

OK, will add.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29111/#review65257
-----------------------------------------------------------


On Dec. 16, 2014, 7:02 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29111/
> -----------------------------------------------------------
> 
> (Updated Dec. 16, 2014, 7:02 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9041
>     https://issues.apache.org/jira/browse/HIVE-9041
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This JIRA removes UnionWork from Spark plan.
> 
> UnionWork right now is just a dummy work - in execution, it is translated to 
> IdentityTran, which does nothing.
> The actually union operation is implemented with rdd.union, which happens 
> when a BaseWork has multiple parent BaseWorks. For instance:
> 
>      MW_1    MW_2
>         \    /
>          \  /
>          RW_1
>          
> In this case, MW_1 and MW_2 both translates to RDD_1 and RDD_2, and then we 
> create another RDD_3 which is the
> result of rdd.union(RDD_1, RDD_2). We then create RDD_4 for RW_1, whose 
> parent is RDD_3.
> 
> *Changes on GenSparkWork*
> 
> To remove the UnionWork, most changes are in GenSparkWork. I got rid of a 
> chunk of code that creates UnionWork and link the work with parent works. 
> But, I still kept `currentUnionOperators` and `workWithUnionOperators`, since 
> they are needed for removing union operators later.
> 
> I also changed how `followingWork` is handled. This happens when we have the 
> following operator tree:
> 
>          TS_0      TS_1
>            \       /
>             \     /
>              UNION_2
>               /
>              RS_3
>             /
>            FS_4
>            
> (You can see that I ignored quite a few operators here. They are not required 
> to illustrate the problem)
> 
> In this plan, we will reach `RS_3` via two different paths: `TS_0` and `TS_1`.
> The first time we get to `RS_3`, say via `TS_0`, we would break `RS_3` with 
> its child, and create a work
> for the path `TS_0 -> UNION_2 -> RS_3`. Let's say the work is `MW_1`.
> 
> We then proceed to `FS_4`, create another ReduceWork `RW_2` for it, and link 
> `RW_2` with `MW_1`.
> We then will visit to `RS_3` for the second time, from `TS_1`, and create 
> another work for the path
> `TS_1 -> UNION_2 -> RS_3`, say `MW_3`.
> 
> But, the problem is that `RS_3` is already disconnected with `FS_4`. In order 
> to link `MW_3` with `RW_2`,
> we need to save that information somewhere.
> 
> This is why we need `leafOpToChildWorkInfo`. It is actually changed from 
> `leafOpToFollowingWork`.
> But, I found that we also need to have the edge property between `RS_3` and 
> its child saved, in order to connect.
> 
> I also encountered a case where two BaseWorks may be connected twice. I've 
> explained that in the comments for the source code.
> 
> *Changes on SparkPlanGenerator*
> 
> Without UnionWork, SparkPlanGenerator can be a bit cleaner. The changes on 
> this class are mostly refactoring.
> I got rid of some redundant code in `generate(SparkWork)` method, and 
> combined `generate(MapWork)` and `generate(ReduceWork)` into one.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java 
> eb758e09888d7864acc9d88c7186ae2de48bc8f7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
> 438efabb062112da8fefc1bed9d8bd90ade26c67 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java
>  78cbc6d2eebef5b8edc10fe693a1b580a6ee389c 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
> ad6b09be83a33c0cd97ab9c3bc7d02adb928f1f3 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> 654ba333969cacaafddec38c3c3f45ccb4b81d4a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 
> 137df65d2bb2de20bca06e47b9e1386ddf511c68 
>   ql/src/test/results/clientpositive/spark/auto_join27.q.out 
> fb48351bea5df3a19c14c755eb3a3fbb7f503e61 
>   ql/src/test/results/clientpositive/spark/auto_sortmerge_join_10.q.out 
> 8472df914b8f5bdcb7974fc6689313d33975a4ad 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 
> 72b2bd7e9b48033ca9cb1bd96facad42f12b6450 
>   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 
> 1757d16a736741f90c5d84b7a0cc0c168cb7d3ad 
>   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
> 04f481d4a304fdc8a86e8cfd305899084bab2e8d 
>   ql/src/test/results/clientpositive/spark/join34.q.out 
> 9a58a228002a2b704541dfed1c713b3880e71f35 
>   ql/src/test/results/clientpositive/spark/join35.q.out 
> 851a98128dca74f0008c20faf717d3cc974150e0 
>   ql/src/test/results/clientpositive/spark/load_dyn_part13.q.out 
> 92693e69a08d1ab2ea0c019f2b7f0634316d1eaf 
>   ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out 
> 060745dcc80d69b5d17101c2641c228b949c2fb8 
>   ql/src/test/results/clientpositive/spark/multi_insert.q.out 
> 0a38beab815fb50fbb991d6228f48bb02b009998 
>   
> ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
>  639f4bd729587ce21b509bd8e3595107c0cf71bc 
>   ql/src/test/results/clientpositive/spark/multi_join_union.q.out 
> d8dc110c3562e5c1e925553df86fba8ceda55b4a 
>   ql/src/test/results/clientpositive/spark/skewjoin_union_remove_1.q.out 
> db92598ecaba27bd95c8134c1e887cb35e5049ae 
>   ql/src/test/results/clientpositive/spark/skewjoin_union_remove_2.q.out 
> bbe60a72833aae4530336565dfdfb187c86bd9b9 
>   ql/src/test/results/clientpositive/spark/skewjoinopt1.q.out 
> c3d550b09779fa1488fbd558ca7d74e1f54fe7e3 
>   ql/src/test/results/clientpositive/spark/skewjoinopt10.q.out 
> 4bb908ad72bc6a6fb1d3ef86a398bcf6b8191377 
>   ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out 
> bb1111a9f10d59621eb2347c25487ed5fd367506 
>   ql/src/test/results/clientpositive/spark/skewjoinopt12.q.out 
> 468139cfb9da1d80ac56dfdc841ea25efebd7662 
>   ql/src/test/results/clientpositive/spark/skewjoinopt14.q.out 
> b3dcb0abd61e9bad100d81e8734275484a86c768 
>   ql/src/test/results/clientpositive/spark/skewjoinopt15.q.out 
> e62e886bda9e15d593b11dc3a9818b53bf716ca4 
>   ql/src/test/results/clientpositive/spark/skewjoinopt16.q.out 
> 771b16ec1b44a386bbeca8013b95d6e2e56cb1e2 
>   ql/src/test/results/clientpositive/spark/skewjoinopt17.q.out 
> f59f1e48e9d6d410c0b227a142c8ea5e28b3a50e 
>   ql/src/test/results/clientpositive/spark/skewjoinopt19.q.out 
> bac15f62eec9a83f4fda38d4f2d836752971a22c 
>   ql/src/test/results/clientpositive/spark/skewjoinopt2.q.out 
> f6a47de50f86ee03eba2edd92d67ecda09ecaef2 
>   ql/src/test/results/clientpositive/spark/skewjoinopt20.q.out 
> 4150c23d09bf6942eafe2061acc537e41ceb4a79 
>   ql/src/test/results/clientpositive/spark/skewjoinopt3.q.out 
> 5640384a8f27932c8427dfb18af48351399d617c 
>   ql/src/test/results/clientpositive/spark/skewjoinopt4.q.out 
> 353e1b8e7d6fa7cdd797f4fec9dff6a68df74267 
>   ql/src/test/results/clientpositive/spark/skewjoinopt5.q.out 
> 4fe594c41bdf51aa2b844100518eb6ed20659cf5 
>   ql/src/test/results/clientpositive/spark/skewjoinopt6.q.out 
> fc1e81ca8c901320d09444ad4e6f41a9c1aa5d8f 
>   ql/src/test/results/clientpositive/spark/skewjoinopt7.q.out 
> f76ebbd214f190fb34ff532de827af1819e78769 
>   ql/src/test/results/clientpositive/spark/skewjoinopt8.q.out 
> 466d70e167334400401e345a6239eca25c0077db 
>   ql/src/test/results/clientpositive/spark/skewjoinopt9.q.out 
> bc0b1f74428d63c14738f9615f99771b9283abc3 
>   ql/src/test/results/clientpositive/spark/stats1.q.out 
> ba22d9a8732023c3e28b7498bd68f3632a5b7404 
>   ql/src/test/results/clientpositive/spark/temp_table.q.out 
> feb7711c59e6c226ac7c66c45858106b362c1f99 
>   ql/src/test/results/clientpositive/spark/union.q.out 
> 7d8d452824aeefbefec0797927084842281ac340 
>   ql/src/test/results/clientpositive/spark/union10.q.out 
> 40a43c6c61f2dea939e4e0966aff7f372074a9b8 
>   ql/src/test/results/clientpositive/spark/union11.q.out 
> 068f7dc28e178117f24606a5841a46fc874162f0 
>   ql/src/test/results/clientpositive/spark/union13.q.out 
> 27de88bec6401e591c1c6ef195dd4f522190268e 
>   ql/src/test/results/clientpositive/spark/union14.q.out 
> 47f4ac1b7ecaec156a76e7691b8ffff77b9252a7 
>   ql/src/test/results/clientpositive/spark/union15.q.out 
> 487fe534edbf0be13071f53837fe2bcceaf0295b 
>   ql/src/test/results/clientpositive/spark/union16.q.out 
> c35ed10915ec8439e9b955ba99b21985e6b2dd41 
>   ql/src/test/results/clientpositive/spark/union18.q.out 
> d2bcd704df3f5c12f53ee6d141ffa6417178d48e 
>   ql/src/test/results/clientpositive/spark/union19.q.out 
> 13fb395bc15a23ed06a9cfa24ab5893a00b4d7f3 
>   ql/src/test/results/clientpositive/spark/union2.q.out 
> da8d154bb21089b3d4ab7a671a3f5310fab167c3 
>   ql/src/test/results/clientpositive/spark/union23.q.out 
> 606153aa66129c4677c0e8226c5ae3e6999e6328 
>   ql/src/test/results/clientpositive/spark/union25.q.out 
> c439b1ae722b26c3e6659ad82840bc23d221e496 
>   ql/src/test/results/clientpositive/spark/union28.q.out 
> b478a779d0d76cff7a3cd6397e2c39a5efeb57a4 
>   ql/src/test/results/clientpositive/spark/union29.q.out 
> da224560cc65fcfdf0135381477a0c1a2017f471 
>   ql/src/test/results/clientpositive/spark/union3.q.out 
> 8240654b75ab52120744d14876855cf97a860cc0 
>   ql/src/test/results/clientpositive/spark/union30.q.out 
> c4eeb8d358246031b6699a766a8f604d0836feb3 
>   ql/src/test/results/clientpositive/spark/union33.q.out 
> b5a3f99645fcc542f69e95026412dde8e0e3979a 
>   ql/src/test/results/clientpositive/spark/union4.q.out 
> 255df7608a2b96633ba0d99af787f49b539a3147 
>   ql/src/test/results/clientpositive/spark/union5.q.out 
> 2d8a3c446709c24ebefd964d54e8d0ac8506ab5e 
>   ql/src/test/results/clientpositive/spark/union6.q.out 
> eb1c75140c43d6794a662fb7f573eebb19cfd6cc 
>   ql/src/test/results/clientpositive/spark/union7.q.out 
> 5fb37da43d6b90129c76d2c9cab467f5f5ba71f5 
>   ql/src/test/results/clientpositive/spark/union8.q.out 
> 4e5cec5c5a2a890c2ae98d2dbd106cb5968a39b6 
>   ql/src/test/results/clientpositive/spark/union9.q.out 
> db144777ef9fdbe9d696b196e6dd16ceb1b9201b 
>   ql/src/test/results/clientpositive/spark/union_ppr.q.out 
> b90795387f7747f95697adf785bdc4b71e5e84f4 
>   ql/src/test/results/clientpositive/spark/union_remove_1.q.out 
> be6e0e907bf5f3d3a3b7b52dcaa6a1a6fc4da1f6 
>   ql/src/test/results/clientpositive/spark/union_remove_10.q.out 
> 98cad44600f0b711a1644fdc5e54ed94d4c59a3e 
>   ql/src/test/results/clientpositive/spark/union_remove_11.q.out 
> 4accb54e696514749596917b3c3950b7a42c3a97 
>   ql/src/test/results/clientpositive/spark/union_remove_15.q.out 
> e7b2cd9ca67a8472d3f4045c3e2c41a691c12358 
>   ql/src/test/results/clientpositive/spark/union_remove_16.q.out 
> 88387284dd92b3c42a26340ce4ab93e0466d430e 
>   ql/src/test/results/clientpositive/spark/union_remove_17.q.out 
> 823cbafae8d629c4dba235c27b81fdc6bab0da32 
>   ql/src/test/results/clientpositive/spark/union_remove_18.q.out 
> 28f1b003c7c174814449a2f3d7c93797d793419d 
>   ql/src/test/results/clientpositive/spark/union_remove_19.q.out 
> 301aad4bb8731936ebf548bc8f9ff776bd217c16 
>   ql/src/test/results/clientpositive/spark/union_remove_2.q.out 
> 2b05b7be6e46835be703d6fec95b16795400948e 
>   ql/src/test/results/clientpositive/spark/union_remove_20.q.out 
> c67f47bd011883906366ffa8e743533345c07310 
>   ql/src/test/results/clientpositive/spark/union_remove_21.q.out 
> 6b119bad2bd236cdd4bcdb2ff6c3deac148c983e 
>   ql/src/test/results/clientpositive/spark/union_remove_24.q.out 
> 5ed88b4a9d36d4a15be0b3ec0931511753798820 
>   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 
> 944508042007fb6541fc65c5840a40791054c217 
>   ql/src/test/results/clientpositive/spark/union_remove_3.q.out 
> 09b8636184595d665696ed0f6b129012e0a32589 
>   ql/src/test/results/clientpositive/spark/union_remove_4.q.out 
> 65d2aa14aae6877077d85d09580311fede1f97ce 
>   ql/src/test/results/clientpositive/spark/union_remove_5.q.out 
> 41271b2b0d70ec7c92bde69942ec1d66880525a7 
>   ql/src/test/results/clientpositive/spark/union_remove_6.q.out 
> d7cd40ba54714fd34297e3587961313815dc1f51 
>   ql/src/test/results/clientpositive/spark/union_remove_7.q.out 
> b3be93218fe4d874e4604c4c73b8980de8c0baa4 
>   ql/src/test/results/clientpositive/spark/union_remove_8.q.out 
> 8773535e628176706ecf91ecafb8da4d190c744a 
>   ql/src/test/results/clientpositive/spark/union_remove_9.q.out 
> 8dc6dd840b3cfb08e22aa9c362454b393443b209 
> 
> Diff: https://reviews.apache.org/r/29111/diff/
> 
> 
> Testing
> -------
> 
> I tested this patch with all tests under 
> ql/src/test/results/clientpositive/spark.
> Some outputs changed:
> 
> ql/src/test/results/clientpositive/spark/auto_join27.q.out
> ql/src/test/results/clientpositive/spark/auto_sortmerge_join_10.q.out
> ql/src/test/results/clientpositive/spark/column_access_stats.q.out
> ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out
> ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out
> ql/src/test/results/clientpositive/spark/join34.q.out
> ql/src/test/results/clientpositive/spark/join35.q.out
> ql/src/test/results/clientpositive/spark/load_dyn_part13.q.out
> ql/src/test/results/clientpositive/spark/load_dyn_part14.q.out
> ql/src/test/results/clientpositive/spark/multi_insert.q.out
> ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
> ql/src/test/results/clientpositive/spark/multi_join_union.q.out
> ql/src/test/results/clientpositive/spark/skewjoin_union_remove_1.q.out
> ql/src/test/results/clientpositive/spark/skewjoin_union_remove_2.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt1.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt10.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt11.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt12.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt14.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt15.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt16.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt17.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt19.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt2.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt20.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt3.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt4.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt5.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt6.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt7.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt8.q.out
> ql/src/test/results/clientpositive/spark/skewjoinopt9.q.out
> ql/src/test/results/clientpositive/spark/stats1.q.out
> ql/src/test/results/clientpositive/spark/temp_table.q.out
> ql/src/test/results/clientpositive/spark/union.q.out
> ql/src/test/results/clientpositive/spark/union10.q.out
> ql/src/test/results/clientpositive/spark/union11.q.out
> ql/src/test/results/clientpositive/spark/union13.q.out
> ql/src/test/results/clientpositive/spark/union14.q.out
> ql/src/test/results/clientpositive/spark/union15.q.out
> ql/src/test/results/clientpositive/spark/union16.q.out
> ql/src/test/results/clientpositive/spark/union18.q.out
> ql/src/test/results/clientpositive/spark/union19.q.out
> ql/src/test/results/clientpositive/spark/union2.q.out
> ql/src/test/results/clientpositive/spark/union23.q.out
> ql/src/test/results/clientpositive/spark/union25.q.out
> ql/src/test/results/clientpositive/spark/union28.q.out
> ql/src/test/results/clientpositive/spark/union29.q.out
> ql/src/test/results/clientpositive/spark/union3.q.out
> ql/src/test/results/clientpositive/spark/union30.q.out
> ql/src/test/results/clientpositive/spark/union33.q.out
> ql/src/test/results/clientpositive/spark/union4.q.out
> ql/src/test/results/clientpositive/spark/union5.q.out
> ql/src/test/results/clientpositive/spark/union6.q.out
> ql/src/test/results/clientpositive/spark/union7.q.out
> ql/src/test/results/clientpositive/spark/union8.q.out
> ql/src/test/results/clientpositive/spark/union9.q.out
> ql/src/test/results/clientpositive/spark/union_ppr.q.out
> ql/src/test/results/clientpositive/spark/union_remove_1.q.out
> ql/src/test/results/clientpositive/spark/union_remove_10.q.out
> ql/src/test/results/clientpositive/spark/union_remove_11.q.out
> ql/src/test/results/clientpositive/spark/union_remove_15.q.out
> ql/src/test/results/clientpositive/spark/union_remove_16.q.out
> ql/src/test/results/clientpositive/spark/union_remove_17.q.out
> ql/src/test/results/clientpositive/spark/union_remove_18.q.out
> ql/src/test/results/clientpositive/spark/union_remove_19.q.out
> ql/src/test/results/clientpositive/spark/union_remove_2.q.out
> ql/src/test/results/clientpositive/spark/union_remove_20.q.out
> ql/src/test/results/clientpositive/spark/union_remove_21.q.out
> ql/src/test/results/clientpositive/spark/union_remove_24.q.out
> ql/src/test/results/clientpositive/spark/union_remove_25.q.out
> ql/src/test/results/clientpositive/spark/union_remove_3.q.out
> ql/src/test/results/clientpositive/spark/union_remove_4.q.out
> ql/src/test/results/clientpositive/spark/union_remove_5.q.out
> ql/src/test/results/clientpositive/spark/union_remove_6.q.out
> ql/src/test/results/clientpositive/spark/union_remove_7.q.out
> ql/src/test/results/clientpositive/spark/union_remove_8.q.out
> ql/src/test/results/clientpositive/spark/union_remove_9.q.out
> 
> 
> Thanks,
> 
> Chao Sun
> 
>

Reply via email to