Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote:

ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q, line 1
https://reviews.apache.org/r/26706/diff/4/?file=724864#file724864line1

Could we make this test Spark-only, since splitting doesn't apply to MR or Tez? I think we have a dir for Spark-only tests.

Chao Sun wrote:

I also wanted to make this a Spark-only test, but the feature hasn't been implemented yet (I think Szehon is working on it). I made the file name start with spark_ so that in the future we can move it to the Spark-only test directory. Currently, however, there is no test dir for Spark, only a result dir.

Xuefu Zhang wrote:

In that case, let's rename the test to a generic name. It's a valid test case for MR as well, and also a special case for Spark.

OK, thanks. I've updated the patch accordingly.

- Chao

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/#review57286
---

On Oct. 19, 2014, 12:46 a.m., Chao Sun wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/
---

(Updated Oct. 19, 2014, 12:46 a.m.)

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The split is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437).
Diffs - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d
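The clone-and-prune approach in the description above can be sketched in plain Java. The types and names here (Op, split, deepCopy) are hypothetical simplifications for illustration only — the actual patch works on Hive's Operator/BaseWork classes, and real cloning also has to handle operator state and parent links:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for an operator tree node.
class Op {
    final String name;
    final List<Op> children = new ArrayList<>();

    Op(String name) { this.name = name; }

    Op add(Op c) { children.add(c); return this; }

    // Deep-copy this operator subtree (the "cloning" step).
    Op deepCopy() {
        Op copy = new Op(name);
        for (Op c : children) {
            copy.children.add(c.deepCopy());
        }
        return copy;
    }
}

public class SplitWorkSketch {
    // For a root that fans out into several branches, produce one clone per
    // branch, each keeping only that branch (the "remove unwanted branches" step).
    static List<Op> split(Op root) {
        List<Op> clones = new ArrayList<>();
        for (int i = 0; i < root.children.size(); i++) {
            Op clone = root.deepCopy();
            Op keep = clone.children.get(i);
            clone.children.clear();
            clone.children.add(keep);
            clones.add(clone);
        }
        return clones;
    }

    public static void main(String[] args) {
        // A table scan feeding two insert branches, as in a multi-insert query.
        Op root = new Op("TS").add(new Op("INSERT-1")).add(new Op("INSERT-2"));
        for (Op clone : split(root)) {
            System.out.println(clone.name + " -> " + clone.children.get(0).name);
        }
    }
}
```

As the description notes, a real implementation would additionally record the mapping from each original work to its clones, since SparkPlan generation (HIVE-8437) consumes that mapping.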
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/
---

(Updated Oct. 20, 2014, 5:38 p.m.)

Review request for hive and Xuefu Zhang.

Changes
---

Renamed the test to make it a generic one.

Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The split is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437).

Diffs (updated)
---

itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 
93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/multi_insert_split_work.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out b07c582 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out fd477ca ql/src/test/results/clientpositive/spark/multi_insert_split_work.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b Diff: https://reviews.apache.org/r/26706/diff/ Testing --- All multi-insertion related results are regenerated, and manually
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/#review57412
---

itests/src/test/resources/testconfiguration.properties
https://reviews.apache.org/r/26706/#comment98113
"split" in the name seems a little confusing. Could we call it multi_insert_mixed.q?

ql/src/test/queries/clientpositive/multi_insert_split_work.q
https://reviews.apache.org/r/26706/#comment98111
Could we update the comments here? I guess the test case is special in that some inserts are map-only while others involve a shuffle.

- Xuefu Zhang

On Oct. 20, 2014, 5:38 p.m., Chao Sun wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/
---

(Updated Oct. 20, 2014, 5:38 p.m.)

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The split is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437).
Diffs - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/multi_insert_split_work.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 
e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out b07c582 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out fd477ca
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/
---

(Updated Oct. 20, 2014, 9:10 p.m.)

Review request for hive and Xuefu Zhang.

Changes
---

Changed the test file name and comments, and rebased onto the latest update.

Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The split is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437).

Diffs (updated)
---

itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/multi_insert_mixed.q PRE-CREATION ql/src/test/results/clientpositive/multi_insert_mixed.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 310f2fe ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out e6054c9 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out d0f3e76 ql/src/test/results/clientpositive/spark/groupby_cube1.q.out d40c7bb ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out b4ded62 ql/src/test/results/clientpositive/spark/groupby_position.q.out d2529bb ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 7fa6130 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 4a4070b ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 62c179e ql/src/test/results/clientpositive/spark/input12.q.out a4b7a3c ql/src/test/results/clientpositive/spark/input13.q.out 5c799dc ql/src/test/results/clientpositive/spark/input1_limit.q.out 1105ed8 ql/src/test/results/clientpositive/spark/input_part2.q.out 514f54a ql/src/test/results/clientpositive/spark/insert1.q.out 1b88026 ql/src/test/results/clientpositive/spark/insert_into3.q.out 5b2aa78 ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out cbf7204 ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 3905d84 ql/src/test/results/clientpositive/spark/multi_insert.q.out 0404119 ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 903e966 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 730fb4f ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 1f31f56 ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 4ded9d2 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 2b63321 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 16bfac1 ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 05d719a ql/src/test/results/clientpositive/spark/union18.q.out ce3e20c ql/src/test/results/clientpositive/spark/union19.q.out ac28e36 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 1836150 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 179edd1 Diff: https://reviews.apache.org/r/26706/diff/ Testing --- All multi-insertion related results are regenerated,
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/#review57445
---

itests/src/test/resources/testconfiguration.properties
https://reviews.apache.org/r/26706/#comment98143
We might need to change this as well.

- Xuefu Zhang

On Oct. 20, 2014, 9:10 p.m., Chao Sun wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/
---

(Updated Oct. 20, 2014, 9:10 p.m.)

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The split is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437).
Diffs - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/multi_insert_mixed.q PRE-CREATION ql/src/test/results/clientpositive/multi_insert_mixed.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 310f2fe ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out e6054c9 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out d0f3e76 ql/src/test/results/clientpositive/spark/groupby_cube1.q.out d40c7bb ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out b4ded62 ql/src/test/results/clientpositive/spark/groupby_position.q.out d2529bb ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 7fa6130 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 4a4070b 
ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 62c179e ql/src/test/results/clientpositive/spark/input12.q.out a4b7a3c ql/src/test/results/clientpositive/spark/input13.q.out 5c799dc ql/src/test/results/clientpositive/spark/input1_limit.q.out 1105ed8 ql/src/test/results/clientpositive/spark/input_part2.q.out 514f54a ql/src/test/results/clientpositive/spark/insert1.q.out 1b88026 ql/src/test/results/clientpositive/spark/insert_into3.q.out 5b2aa78 ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out cbf7204 ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 3905d84 ql/src/test/results/clientpositive/spark/multi_insert.q.out 0404119 ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 903e966 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 730fb4f ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 1f31f56 ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 4ded9d2 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 2b63321 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 16bfac1
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
On Oct. 20, 2014, 9:52 p.m., Xuefu Zhang wrote:

itests/src/test/resources/testconfiguration.properties, line 509
https://reviews.apache.org/r/26706/diff/7/?file=726397#file726397line509

We might need to change this as well.

Can't believe I missed this. Sorry for the sloppiness!

- Chao

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/#review57445
---

On Oct. 20, 2014, 9:10 p.m., Chao Sun wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/
---

(Updated Oct. 20, 2014, 9:10 p.m.)

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The split is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437).
Diffs - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/multi_insert_mixed.q PRE-CREATION ql/src/test/results/clientpositive/multi_insert_mixed.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 310f2fe ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out e6054c9 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out d0f3e76 ql/src/test/results/clientpositive/spark/groupby_cube1.q.out d40c7bb ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out b4ded62 ql/src/test/results/clientpositive/spark/groupby_position.q.out d2529bb ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 7fa6130 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 4a4070b 
ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 62c179e ql/src/test/results/clientpositive/spark/input12.q.out a4b7a3c ql/src/test/results/clientpositive/spark/input13.q.out 5c799dc ql/src/test/results/clientpositive/spark/input1_limit.q.out 1105ed8 ql/src/test/results/clientpositive/spark/input_part2.q.out 514f54a ql/src/test/results/clientpositive/spark/insert1.q.out 1b88026 ql/src/test/results/clientpositive/spark/insert_into3.q.out 5b2aa78 ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out cbf7204 ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 3905d84 ql/src/test/results/clientpositive/spark/multi_insert.q.out 0404119 ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 903e966 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 730fb4f ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 1f31f56 ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 4ded9d2
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/
---

(Updated Oct. 20, 2014, 10:04 p.m.)

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436

Repository: hive-git

Description
---

Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The split is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437).

Diffs (updated)
---

itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc 
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/multi_insert_mixed.q PRE-CREATION ql/src/test/results/clientpositive/multi_insert_mixed.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 310f2fe ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out e6054c9 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out d0f3e76 ql/src/test/results/clientpositive/spark/groupby_cube1.q.out d40c7bb ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out b4ded62 ql/src/test/results/clientpositive/spark/groupby_position.q.out d2529bb ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 7fa6130 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 4a4070b ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 62c179e ql/src/test/results/clientpositive/spark/input12.q.out a4b7a3c ql/src/test/results/clientpositive/spark/input13.q.out 5c799dc ql/src/test/results/clientpositive/spark/input1_limit.q.out 1105ed8 ql/src/test/results/clientpositive/spark/input_part2.q.out 514f54a ql/src/test/results/clientpositive/spark/insert1.q.out 1b88026 ql/src/test/results/clientpositive/spark/insert_into3.q.out 5b2aa78 ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out cbf7204 ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 3905d84 ql/src/test/results/clientpositive/spark/multi_insert.q.out 0404119 ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 903e966 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 730fb4f ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out 1f31f56 
ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 4ded9d2 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 2b63321 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 16bfac1 ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 05d719a ql/src/test/results/clientpositive/spark/union18.q.out ce3e20c ql/src/test/results/clientpositive/spark/union19.q.out ac28e36 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 1836150 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 179edd1 Diff: https://reviews.apache.org/r/26706/diff/ Testing --- All multi-insertion related results are regenerated, and manually checked against the old results. Also I created a new test
Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/#review57286 --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/26706/#comment97861 Could you remove this if it is not applicable? ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java https://reviews.apache.org/r/26706/#comment97862 Could we reuse this as a utility? I think we have the same or a similar thing somewhere. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/26706/#comment97863 Let's keep the import organization consistent with the rest of Hive: imports usually go at the top, in alphabetical order. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/26706/#comment97864 Can we rename this to generate... instead of get... to be more precise? ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/26706/#comment97866 Nit: it might be better if this line comes after the following if statement. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/26706/#comment97867 The if check here is unnecessary. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/26706/#comment97857 Do we need to disconnect it, or does remove do this automatically? ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/26706/#comment97856 Nit: code style. ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java https://reviews.apache.org/r/26706/#comment97858 Some Javadoc would be helpful. ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q https://reviews.apache.org/r/26706/#comment97859 Could we put this test as Spark-only, since splitting doesn't apply to MR or Tez? I think we have a dir for Spark-only tests. 
ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q https://reviews.apache.org/r/26706/#comment97860 Nit: trailing space. - Xuefu Zhang On Oct. 17, 2014, 9:24 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 17, 2014, 9:24 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting of the operator tree is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). 
Diffs - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b
On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java, line 64 https://reviews.apache.org/r/26706/diff/4/?file=724853#file724853line64 Could we reuse this as a utility? I think we have the same or a similar thing somewhere. You're right - HiveBaseFunctionResultList has the same method. I've put it in SparkUtilities. On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java, line 250 https://reviews.apache.org/r/26706/diff/4/?file=724854#file724854line250 Do we need to disconnect it, or does remove do this automatically? Yes, remove also removes all edges connected to this node. On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote: ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q, line 1 https://reviews.apache.org/r/26706/diff/4/?file=724864#file724864line1 Could we put this test as Spark-only, since splitting doesn't apply to MR or Tez? I think we have a dir for Spark-only tests. I also wanted to make this a Spark-only test. But the feature hasn't been implemented yet (I think Szehon is working on it). I made the file name start with spark_ so that in the future we can move it to a Spark-only test directory. But currently, there's no test dir for Spark, only a result dir. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/#review57286 --- On Oct. 17, 2014, 9:24 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 17, 2014, 9:24 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. 
The splitting of the operator tree is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). Diffs - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa 
ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 19, 2014, 12:46 a.m.) Review request for hive and Xuefu Zhang. Changes --- Updated the patch according to the comments. Thanks, Xuefu! Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting of the operator tree is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). Diffs (updated) - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out b07c582 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out fd477ca ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 
ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d ql/src/test/results/clientpositive/spark/spark_multi_insert_split_work.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b ql/src/test/results/clientpositive/spark_multi_insert_split_work.q.out PRE-CREATION Diff: https://reviews.apache.org/r/26706/diff/ Testing --- All multi-insertion related
On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote: ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q, line 1 https://reviews.apache.org/r/26706/diff/4/?file=724864#file724864line1 Could we put this test as Spark-only, since splitting doesn't apply to MR or Tez? I think we have a dir for Spark-only tests. Chao Sun wrote: I also wanted to make this a Spark-only test. But the feature hasn't been implemented yet (I think Szehon is working on it). I made the file name start with spark_ so that in the future we can move it to a Spark-only test directory. But currently, there's no test dir for Spark, only a result dir. In that case, let's rename the test to have a generic name. It's a valid test case for MR as well, but also a special case for Spark. - Xuefu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/#review57286 --- On Oct. 19, 2014, 12:46 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 19, 2014, 12:46 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting of the operator tree is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). 
Diffs - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java c956101 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 17, 2014, 6:04 p.m.) Review request for hive and Xuefu Zhang. Changes --- Added a test to check that splitting a work doesn't create duplicate FSs. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting of the operator tree is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a 
ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out b07c582 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out fd477ca ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d ql/src/test/results/clientpositive/spark/spark_multi_insert_split_work.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b Diff: https://reviews.apache.org/r/26706/diff/ Testing --- Thanks, Chao Sun
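The clone-and-prune idea the description outlines (clone the work once per child work, then keep only the operator-tree branch that feeds that child in each clone) can be sketched as below. This is a minimal, self-contained illustration: the names (Work, splitWork) and the string-based branch model are made up for the sketch and stand in for Hive's actual BaseWork/SparkWork classes; it is not the patch's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model: a work whose operator tree has one branch per child work.
class Work {
    final String name;
    final List<String> branches; // one operator-tree branch per child work

    Work(String name, List<String> branches) {
        this.name = name;
        this.branches = branches;
    }

    // Split a work connected to multiple children: clone it once per child,
    // keeping only the branch that leads to that child in each clone.
    static List<Work> splitWork(Work original) {
        if (original.branches.size() <= 1) {
            return Collections.singletonList(original); // nothing to split
        }
        List<Work> clones = new ArrayList<>();
        for (int i = 0; i < original.branches.size(); i++) {
            clones.add(new Work(original.name + "-clone" + i,
                Collections.singletonList(original.branches.get(i))));
        }
        return clones;
    }

    public static void main(String[] args) {
        Work w = new Work("Map 1", Arrays.asList("toReducer1", "toReducer2"));
        List<Work> clones = Work.splitWork(w);
        System.out.println(clones.size() + " clones"); // 2 clones
    }
}
```

The key property the test in this patch checks for (no duplicate FileSinks) corresponds here to each branch appearing in exactly one clone.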
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 17, 2014, 9:22 p.m.) Review request for hive and Xuefu Zhang. Changes --- Included a qfile result for MR mode. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting of the operator tree is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). Diffs (updated) - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out b07c582 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out fd477ca ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d ql/src/test/results/clientpositive/spark/spark_multi_insert_split_work.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b ql/src/test/results/clientpositive/spark_multi_insert_split_work.q.out PRE-CREATION Diff: https://reviews.apache.org/r/26706/diff/ Testing --- Thanks, Chao Sun
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 17, 2014, 9:24 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting of the operator tree is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep the information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). Diffs - itests/src/test/resources/testconfiguration.properties 558dd02 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a 
ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out b07c582 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out fd477ca ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d ql/src/test/results/clientpositive/spark/spark_multi_insert_split_work.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b ql/src/test/results/clientpositive/spark_multi_insert_split_work.q.out PRE-CREATION Diff: https://reviews.apache.org/r/26706/diff/ Testing (updated) --- All multi-insertion-related results are regenerated and manually checked against the old results. Also, I created a new test, spark_multi_insert_split_work.q, to check that splitting won't generate duplicate FSs. Thanks, Chao Sun
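The description also calls for recording which clones came from which original work, since that mapping is consulted later during SparkPlan generation (HIVE-8437). A bare-bones sketch of that bookkeeping is below; the class and method names (CloneRegistry, recordClone) are illustrative stand-ins, not Hive's actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative bookkeeping: map each original work to the clones produced
// when it was split, so a later planning stage can look them up.
class CloneRegistry {
    private final Map<String, List<String>> clonesOf = new LinkedHashMap<>();

    // Record that cloneWork was produced by splitting originalWork.
    void recordClone(String originalWork, String cloneWork) {
        clonesOf.computeIfAbsent(originalWork, k -> new ArrayList<>()).add(cloneWork);
    }

    // Return all clones of originalWork, or an empty list if it was never split.
    List<String> getClones(String originalWork) {
        return clonesOf.getOrDefault(originalWork, Collections.emptyList());
    }

    public static void main(String[] args) {
        CloneRegistry reg = new CloneRegistry();
        reg.recordClone("Map 1", "Map 1-clone0");
        reg.recordClone("Map 1", "Map 1-clone1");
        System.out.println(reg.getClones("Map 1")); // [Map 1-clone0, Map 1-clone1]
    }
}
```

A LinkedHashMap keeps the lookup order matching the order in which works were split, which makes plan generation deterministic in this sketch.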
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 16, 2014, 1:25 a.m.) Review request for hive and Xuefu Zhang. Changes --- Addressing the comments. Also, I'm thinking about adding another test for multi-insert in another JIRA, specifically checking whether the plan after splitting is in the correct shape. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). 
Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 399fe41 ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out b07c582 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out fd477ca ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b Diff: https://reviews.apache.org/r/26706/diff/ Testing --- Thanks, Chao Sun
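The splitting scheme described above — one pruned clone of the operator tree per child work — can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `Op`, `cloneKeeping`, `split`, and the TS/FIL/RS names are simplified stand-ins for Hive's operator and work classes, not the actual HIVE-8436 implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Tiny stand-in for an operator-tree node; Hive's real Operator class is far richer.
class Op {
    final String name;
    final List<Op> children = new ArrayList<>();
    Op(String name) { this.name = name; }
    Op child(Op c) { children.add(c); return this; }
}

public class SplitSketch {
    // Deep-copy the tree, keeping only branches that end in the wanted leaf.
    static Op cloneKeeping(Op node, String wantedLeaf) {
        if (node.children.isEmpty()) {
            return node.name.equals(wantedLeaf) ? new Op(node.name) : null;
        }
        Op copy = new Op(node.name);
        for (Op c : node.children) {
            Op kept = cloneKeeping(c, wantedLeaf);
            if (kept != null) {
                copy.children.add(kept);
            }
        }
        return copy.children.isEmpty() ? null : copy;
    }

    // One pruned clone per leaf, i.e. per child work the original work feeds.
    static List<Op> split(Op root, List<String> leaves) {
        List<Op> clones = new ArrayList<>();
        for (String leaf : leaves) {
            clones.add(cloneKeeping(root, leaf));
        }
        return clones;
    }

    // Render a tree as e.g. "TS(FIL(RS1))" for easy inspection.
    static String render(Op op) {
        StringBuilder sb = new StringBuilder(op.name);
        if (!op.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < op.children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(render(op.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A map-side tree feeding two reduce works: TS -> FIL -> {RS1, RS2}.
        Op root = new Op("TS")
            .child(new Op("FIL").child(new Op("RS1")).child(new Op("RS2")));
        for (Op clone : split(root, Arrays.asList("RS1", "RS2"))) {
            System.out.println(render(clone)); // TS(FIL(RS1)) then TS(FIL(RS2))
        }
    }
}
```

The point of the sketch is only the shape of the result: a work with N child works becomes N clones, each retaining exactly one outgoing branch, which is what the multi-insert result files above are regenerated against.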
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/#review56640 --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/26706/#comment97030 Can we try a generic method so that we only have one method doing cloning for both? ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java https://reviews.apache.org/r/26706/#comment97031 I think the input param can be just BytesWritable. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/26706/#comment97033 I think we should use add() instead. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/26706/#comment97035 The design doc explicitly specifies that the first clone is handled differently than the rest, but I didn't see such handling here. We may have a problem with this implementation. ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java https://reviews.apache.org/r/26706/#comment97036 Let's not use * in imports. - Xuefu Zhang On Oct. 14, 2014, 9:17 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 14, 2014, 9:17 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep information about the original work and its clones. 
Such information will be needed during SparkPlan generation (HIVE-8437). Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/results/clientpositive/spark/groupby7_map.q.out 95d7b59 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out b425c67 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out dc713b3 ql/src/test/results/clientpositive/spark/groupby_cube1.q.out cd8e85e ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 801ac8a ql/src/test/results/clientpositive/spark/groupby_position.q.out b04e55c ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4bde6ea ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out ab2fe84 ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 5c1cbc4 ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 90bc8ea 
ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 7964802 ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out 31ebbeb ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 0a983d8 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out
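Xuefu's first comment above asks for a single generic method instead of one cloning method per work type. A minimal sketch of what such a method might look like, assuming the work objects are serializable; `cloneWork` is a hypothetical name, and java.io serialization is used here purely for illustration (Hive clones plans through its own plan-serialization machinery):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CloneUtil {
    // One generic method that deep-copies any Serializable work object via a
    // serialize/deserialize round trip, so MapWork and ReduceWork would not
    // need separate cloning methods.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T cloneWork(T work) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(work);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return (T) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Failed to clone work", e);
        }
    }

    public static void main(String[] args) {
        java.util.ArrayList<String> original =
            new java.util.ArrayList<>(java.util.Arrays.asList("TS", "FIL", "RS"));
        java.util.ArrayList<String> copy = cloneWork(original);
        // The copy is equal in content but a distinct object graph.
        System.out.println(copy.equals(original) && copy != original); // prints true
    }
}
```

The design choice the comment points at is simply that the type parameter `<T>` lets one method return a `MapWork` when given a `MapWork` and a `ReduceWork` when given a `ReduceWork`, with no duplicated cloning logic.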
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/#review56647 --- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/26706/#comment97038 Please throw IllegalStateException, prefix the message with AssertionError, and append work.getClass().getName() to the message. - Brock Noland On Oct. 14, 2014, 9:17 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26706/ --- (Updated Oct. 14, 2014, 9:17 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8436 https://issues.apache.org/jira/browse/HIVE-8436 Repository: hive-git Description --- Based on the design doc, we need to split the operator tree of a work in SparkWork if the work is connected to multiple child works. The splitting is performed by cloning the original work and removing unwanted branches from the operator tree. Please refer to the design doc for details. This process should be done right before we generate the SparkPlan. We should have a utility method that takes the original SparkWork and returns a modified SparkWork. This process should also keep information about the original work and its clones. Such information will be needed during SparkPlan generation (HIVE-8437). 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 5153885 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 126cb9f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java d7744e9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 1d01040 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java 93940bc ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 20eb344 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java a62643a ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 ql/src/test/results/clientpositive/spark/groupby7_map.q.out 95d7b59 ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out b425c67 ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out dc713b3 ql/src/test/results/clientpositive/spark/groupby_cube1.q.out cd8e85e ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 801ac8a ql/src/test/results/clientpositive/spark/groupby_position.q.out b04e55c ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4bde6ea ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out ab2fe84 ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 5c1cbc4 ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 ql/src/test/results/clientpositive/spark/input13.q.out 260a65a ql/src/test/results/clientpositive/spark/input1_limit.q.out 90bc8ea ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb ql/src/test/results/clientpositive/spark/insert_into3.q.out 7964802 ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d ql/src/test/results/clientpositive/spark/multi_insert.q.out 31ebbeb ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 0a983d8 ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 68b1312 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out f7867ac ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out dbb78a6 ql/src/test/results/clientpositive/spark/orc_analyze.q.out a0af7ba ql/src/test/results/clientpositive/spark/parallel.q.out acd418f ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 169d2f1 ql/src/test/results/clientpositive/spark/ppd_transform.q.out 54b8a8a ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 6f8066d ql/src/test/results/clientpositive/spark/union18.q.out 07ea2c5 ql/src/test/results/clientpositive/spark/union19.q.out 2fefe8e
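Brock's comment above can be read as asking for a fail-fast check along these lines. This is a sketch with stub classes — `checkWorkType` and the nested `MapWork`/`ReduceWork` stubs are hypothetical stand-ins, not the actual SparkPlanGenerator code:

```java
public class WorkCheck {
    // Stub stand-ins for Hive's MapWork / ReduceWork plan classes.
    static class MapWork {}
    static class ReduceWork {}

    // Fail fast with an IllegalStateException whose message is prefixed with
    // "AssertionError" and names the unexpected class, as the review suggests.
    static void checkWorkType(Object work) {
        if (!(work instanceof MapWork) && !(work instanceof ReduceWork)) {
            throw new IllegalStateException(
                "AssertionError: expected MapWork or ReduceWork, but found "
                + work.getClass().getName());
        }
    }

    public static void main(String[] args) {
        checkWorkType(new MapWork()); // accepted silently
        try {
            checkWorkType("not a work");
        } catch (IllegalStateException e) {
            // prints: AssertionError: expected MapWork or ReduceWork, but found java.lang.String
            System.out.println(e.getMessage());
        }
    }
}
```

Including `work.getClass().getName()` in the message is what makes the failure actionable: the log immediately shows which unexpected work type reached the generator.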