Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-20 Thread Chao Sun


 On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote:
  ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q, line 1
  https://reviews.apache.org/r/26706/diff/4/?file=724864#file724864line1
 
  Could we put this test as Spark-only, since splitting doesn't apply to MR or 
  Tez? I think we have a dir for Spark-only tests.
 
 Chao Sun wrote:
 I also wanted to make this a Spark-only test.
 But the feature hasn't been implemented yet (I think Szehon is working on 
 it).
 I made the file name start with spark_ so that in the future we can move it 
 to the Spark-only test directory.
 But currently, there's no test dir for Spark, only a result dir.
 
 Xuefu Zhang wrote:
 In that case, let's rename the test to have a generic name. It's a valid 
 test case for MR as well, and also a special case for Spark.

OK, thanks. I've updated the patch accordingly.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review57286
---


On Oct. 19, 2014, 12:46 a.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26706/
 ---
 
 (Updated Oct. 19, 2014, 12:46 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-8436
 https://issues.apache.org/jira/browse/HIVE-8436
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Based on the design doc, we need to split the operator tree of a work in 
 SparkWork if the work is connected to multiple child works. The splitting of 
 the operator tree is performed by cloning the original work and removing 
 unwanted branches from the operator tree. Please refer to the design doc for 
 details.
 This process should be done right before we generate the SparkPlan. We should 
 have a utility method that takes the original SparkWork and returns a modified 
 SparkWork.
 This process should also keep information about the original work and its 
 clones. Such information will be needed during SparkPlan generation 
 (HIVE-8437).
 
 
 Diffs
 -
 
   itests/src/test/resources/testconfiguration.properties 558dd02 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
  c956101 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
 5153885 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 126cb9f 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
 3773dcb 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 d7744e9 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 280edde 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 644c681 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  1d01040 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  93940bc 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  20eb344 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  a62643a 
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
   ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b 
   ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa 
   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
 399fe41 
   ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 
   ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 
   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e 
   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
 a43921e 
   ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
   ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
   ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 
   ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
   ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b 
   ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
   ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
   

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-20 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 20, 2014, 5:38 p.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Renamed the test to make it a generic one.


Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436


Repository: hive-git


Description
---

Based on the design doc, we need to split the operator tree of a work in 
SparkWork if the work is connected to multiple child works. The splitting of 
the operator tree is performed by cloning the original work and removing 
unwanted branches from the operator tree. Please refer to the design doc for 
details.
This process should be done right before we generate the SparkPlan. We should have 
a utility method that takes the original SparkWork and returns a modified 
SparkWork.
This process should also keep information about the original work and its 
clones. Such information will be needed during SparkPlan generation (HIVE-8437).
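
The clone-and-prune idea above can be sketched roughly as follows. This is a
hypothetical illustration, not Hive's actual API: the `Op` tree and the
`cloneKeeping`/`splitWork` names are made up for the example, while the real
patch operates on BaseWork/SparkWork objects as described in the design doc.

```java
import java.util.*;

/**
 * Sketch: if a work's operator tree feeds N child works, produce N clones,
 * each keeping only the branch that reaches one child. Illustrative only.
 */
class Op {
    final String name;
    final List<Op> children = new ArrayList<>();

    Op(String name) { this.name = name; }
    Op child(Op c) { children.add(c); return this; }

    /** Deep-copy the subtree, keeping only paths that reach the wanted sink. */
    static Op cloneKeeping(Op node, String sink) {
        if (node.name.equals(sink)) {
            return new Op(node.name);
        }
        Op copy = null;
        for (Op c : node.children) {
            Op kept = cloneKeeping(c, sink);
            if (kept != null) {
                if (copy == null) copy = new Op(node.name);
                copy.children.add(kept);  // keep only branches reaching the sink
            }
        }
        return copy;  // null if this branch never reaches the sink
    }

    /** One pruned clone per sink, i.e. per child work the original fed. */
    static List<Op> splitWork(Op root, List<String> sinks) {
        List<Op> clones = new ArrayList<>();
        for (String s : sinks) {
            clones.add(cloneKeeping(root, s));
        }
        return clones;
    }

    /** Concatenated leaf names, handy for inspecting a clone. */
    String leaves() {
        if (children.isEmpty()) return name;
        StringBuilder sb = new StringBuilder();
        for (Op c : children) sb.append(c.leaves());
        return sb.toString();
    }
}
```

For a table scan feeding two sinks, splitting on the two sink names yields two
clones, each ending in exactly one of the original branches; a mapping from the
original work to its clones would then be recorded for HIVE-8437.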


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 558dd02 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
 c956101 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
5153885 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
126cb9f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
d7744e9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 
1d01040 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 93940bc 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
20eb344 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
a62643a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
  ql/src/test/queries/clientpositive/multi_insert_split_work.q PRE-CREATION 
  ql/src/test/results/clientpositive/multi_insert_split_work.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b 
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
399fe41 
  ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e 
  ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
  ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
  ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 
  ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
  ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b 
  ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
  ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f 
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
b07c582 
  
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
 fd477ca 
  ql/src/test/results/clientpositive/spark/multi_insert_split_work.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d 
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 
  ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b 
  ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 
  ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b 

Diff: https://reviews.apache.org/r/26706/diff/


Testing
---

All multi-insertion related results are regenerated, and manually 

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-20 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review57412
---



itests/src/test/resources/testconfiguration.properties
https://reviews.apache.org/r/26706/#comment98113

"split" in the name seems a little confusing. Could we call it 
multi_insert_mixed.q?



ql/src/test/queries/clientpositive/multi_insert_split_work.q
https://reviews.apache.org/r/26706/#comment98111

Could we update the comments here? I guess the test case is special in that 
some inserts are map-only while others involve a shuffle.


- Xuefu Zhang


On Oct. 20, 2014, 5:38 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26706/
 ---
 
 (Updated Oct. 20, 2014, 5:38 p.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-8436
 https://issues.apache.org/jira/browse/HIVE-8436
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Based on the design doc, we need to split the operator tree of a work in 
 SparkWork if the work is connected to multiple child works. The splitting of 
 the operator tree is performed by cloning the original work and removing 
 unwanted branches from the operator tree. Please refer to the design doc for 
 details.
 This process should be done right before we generate the SparkPlan. We should 
 have a utility method that takes the original SparkWork and returns a modified 
 SparkWork.
 This process should also keep information about the original work and its 
 clones. Such information will be needed during SparkPlan generation 
 (HIVE-8437).
 
 
 Diffs
 -
 
   itests/src/test/resources/testconfiguration.properties 558dd02 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
  c956101 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
 5153885 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 126cb9f 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
 3773dcb 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 d7744e9 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 280edde 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 644c681 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  1d01040 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  93940bc 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  20eb344 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  a62643a 
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
   ql/src/test/queries/clientpositive/multi_insert_split_work.q PRE-CREATION 
   ql/src/test/results/clientpositive/multi_insert_split_work.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b 
   ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa 
   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
 399fe41 
   ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 
   ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 
   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e 
   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
 a43921e 
   ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
   ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
   ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 
   ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
   ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b 
   ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
   ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
   ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f 
   ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 
   ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
 b07c582 
   
 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
  fd477ca 
   

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-20 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 20, 2014, 9:10 p.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Changed the test file name and comments; also rebased onto the latest update.


Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436


Repository: hive-git


Description
---

Based on the design doc, we need to split the operator tree of a work in 
SparkWork if the work is connected to multiple child works. The splitting of 
the operator tree is performed by cloning the original work and removing 
unwanted branches from the operator tree. Please refer to the design doc for 
details.
This process should be done right before we generate the SparkPlan. We should have 
a utility method that takes the original SparkWork and returns a modified 
SparkWork.
This process should also keep information about the original work and its 
clones. Such information will be needed during SparkPlan generation (HIVE-8437).


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 558dd02 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
 c956101 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
5153885 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
126cb9f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
d7744e9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 
1d01040 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 93940bc 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
20eb344 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
a62643a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
  ql/src/test/queries/clientpositive/multi_insert_mixed.q PRE-CREATION 
  ql/src/test/results/clientpositive/multi_insert_mixed.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out 310f2fe 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out e6054c9 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out d0f3e76 
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out d40c7bb 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
b4ded62 
  ql/src/test/results/clientpositive/spark/groupby_position.q.out d2529bb 
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 7fa6130 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 4a4070b 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 62c179e 
  ql/src/test/results/clientpositive/spark/input12.q.out a4b7a3c 
  ql/src/test/results/clientpositive/spark/input13.q.out 5c799dc 
  ql/src/test/results/clientpositive/spark/input1_limit.q.out 1105ed8 
  ql/src/test/results/clientpositive/spark/input_part2.q.out 514f54a 
  ql/src/test/results/clientpositive/spark/insert1.q.out 1b88026 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 5b2aa78 
  ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out cbf7204 
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 3905d84 
  ql/src/test/results/clientpositive/spark/multi_insert.q.out 0404119 
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 903e966 
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
730fb4f 
  ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
 1f31f56 
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 4ded9d2 
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 2b63321 
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out 16bfac1 
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 05d719a 
  ql/src/test/results/clientpositive/spark/union18.q.out ce3e20c 
  ql/src/test/results/clientpositive/spark/union19.q.out ac28e36 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out 1836150 
  ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 179edd1 

Diff: https://reviews.apache.org/r/26706/diff/


Testing
---

All multi-insertion related results are regenerated, 

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-20 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review57445
---



itests/src/test/resources/testconfiguration.properties
https://reviews.apache.org/r/26706/#comment98143

We might need to change this as well.


- Xuefu Zhang


On Oct. 20, 2014, 9:10 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26706/
 ---
 
 (Updated Oct. 20, 2014, 9:10 p.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-8436
 https://issues.apache.org/jira/browse/HIVE-8436
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Based on the design doc, we need to split the operator tree of a work in 
 SparkWork if the work is connected to multiple child works. The splitting of 
 the operator tree is performed by cloning the original work and removing 
 unwanted branches from the operator tree. Please refer to the design doc for 
 details.
 This process should be done right before we generate the SparkPlan. We should 
 have a utility method that takes the original SparkWork and returns a modified 
 SparkWork.
 This process should also keep information about the original work and its 
 clones. Such information will be needed during SparkPlan generation 
 (HIVE-8437).
 
 
 Diffs
 -
 
   itests/src/test/resources/testconfiguration.properties 558dd02 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
  c956101 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
 5153885 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 126cb9f 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
 3773dcb 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 d7744e9 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 280edde 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 644c681 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  1d01040 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  93940bc 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  20eb344 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  a62643a 
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
   ql/src/test/queries/clientpositive/multi_insert_mixed.q PRE-CREATION 
   ql/src/test/results/clientpositive/multi_insert_mixed.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 310f2fe 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out e6054c9 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out d0f3e76 
   ql/src/test/results/clientpositive/spark/groupby_cube1.q.out d40c7bb 
   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
 b4ded62 
   ql/src/test/results/clientpositive/spark/groupby_position.q.out d2529bb 
   ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 7fa6130 
   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 4a4070b 
   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
 62c179e 
   ql/src/test/results/clientpositive/spark/input12.q.out a4b7a3c 
   ql/src/test/results/clientpositive/spark/input13.q.out 5c799dc 
   ql/src/test/results/clientpositive/spark/input1_limit.q.out 1105ed8 
   ql/src/test/results/clientpositive/spark/input_part2.q.out 514f54a 
   ql/src/test/results/clientpositive/spark/insert1.q.out 1b88026 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 5b2aa78 
   ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out cbf7204 
   ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 3905d84 
   ql/src/test/results/clientpositive/spark/multi_insert.q.out 0404119 
   ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 903e966 
   ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
 730fb4f 
   ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out 
 PRE-CREATION 
   
 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
  1f31f56 
   ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 
 4ded9d2 
   ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 2b63321 
   ql/src/test/results/clientpositive/spark/ppd_transform.q.out 16bfac1 
   

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-20 Thread Chao Sun


 On Oct. 20, 2014, 9:52 p.m., Xuefu Zhang wrote:
  itests/src/test/resources/testconfiguration.properties, line 509
  https://reviews.apache.org/r/26706/diff/7/?file=726397#file726397line509
 
  We might need to change this as well.

Can't believe I missed this. Sorry for the sloppiness!


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review57445
---


On Oct. 20, 2014, 9:10 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26706/
 ---
 
 (Updated Oct. 20, 2014, 9:10 p.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-8436
 https://issues.apache.org/jira/browse/HIVE-8436
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Based on the design doc, we need to split the operator tree of a work in 
 SparkWork if the work is connected to multiple child works. The splitting of 
 the operator tree is performed by cloning the original work and removing 
 unwanted branches from the operator tree. Please refer to the design doc for 
 details.
 This process should be done right before we generate the SparkPlan. We should 
 have a utility method that takes the original SparkWork and returns a modified 
 SparkWork.
 This process should also keep information about the original work and its 
 clones. Such information will be needed during SparkPlan generation 
 (HIVE-8437).
 
 
 Diffs
 -
 
   itests/src/test/resources/testconfiguration.properties 558dd02 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
  c956101 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
 5153885 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 126cb9f 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
 3773dcb 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 d7744e9 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 280edde 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 644c681 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  1d01040 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  93940bc 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  20eb344 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  a62643a 
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
   ql/src/test/queries/clientpositive/multi_insert_mixed.q PRE-CREATION 
   ql/src/test/results/clientpositive/multi_insert_mixed.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 310f2fe 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out e6054c9 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out d0f3e76 
   ql/src/test/results/clientpositive/spark/groupby_cube1.q.out d40c7bb 
   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
 b4ded62 
   ql/src/test/results/clientpositive/spark/groupby_position.q.out d2529bb 
   ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 7fa6130 
   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 4a4070b 
   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
 62c179e 
   ql/src/test/results/clientpositive/spark/input12.q.out a4b7a3c 
   ql/src/test/results/clientpositive/spark/input13.q.out 5c799dc 
   ql/src/test/results/clientpositive/spark/input1_limit.q.out 1105ed8 
   ql/src/test/results/clientpositive/spark/input_part2.q.out 514f54a 
   ql/src/test/results/clientpositive/spark/insert1.q.out 1b88026 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 5b2aa78 
   ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out cbf7204 
   ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 3905d84 
   ql/src/test/results/clientpositive/spark/multi_insert.q.out 0404119 
   ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 903e966 
   ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
 730fb4f 
   ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out 
 PRE-CREATION 
   
 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
  1f31f56 
   ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 
 4ded9d2 
   

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-20 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 20, 2014, 10:04 p.m.)


Review request for hive and Xuefu Zhang.


Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436


Repository: hive-git


Description
---

Based on the design doc, we need to split the operator tree of a work in 
SparkWork if the work is connected to multiple child works. The splitting of 
the operator tree is performed by cloning the original work and removing 
unwanted branches from the operator tree. Please refer to the design doc for 
details.
This process should be done right before we generate the SparkPlan. We should have 
a utility method that takes the original SparkWork and returns a modified 
SparkWork.
This process should also keep information about the original work and its 
clones. Such information will be needed during SparkPlan generation (HIVE-8437).


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 558dd02 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
 c956101 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
5153885 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
126cb9f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
d7744e9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 
1d01040 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 93940bc 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
20eb344 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
a62643a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
  ql/src/test/queries/clientpositive/multi_insert_mixed.q PRE-CREATION 
  ql/src/test/results/clientpositive/multi_insert_mixed.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out 310f2fe 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out e6054c9 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out d0f3e76 
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out d40c7bb 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
b4ded62 
  ql/src/test/results/clientpositive/spark/groupby_position.q.out d2529bb 
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 7fa6130 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 4a4070b 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 62c179e 
  ql/src/test/results/clientpositive/spark/input12.q.out a4b7a3c 
  ql/src/test/results/clientpositive/spark/input13.q.out 5c799dc 
  ql/src/test/results/clientpositive/spark/input1_limit.q.out 1105ed8 
  ql/src/test/results/clientpositive/spark/input_part2.q.out 514f54a 
  ql/src/test/results/clientpositive/spark/insert1.q.out 1b88026 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 5b2aa78 
  ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out cbf7204 
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 3905d84 
  ql/src/test/results/clientpositive/spark/multi_insert.q.out 0404119 
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 903e966 
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
730fb4f 
  ql/src/test/results/clientpositive/spark/multi_insert_mixed.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
 1f31f56 
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 4ded9d2 
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 2b63321 
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out 16bfac1 
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 05d719a 
  ql/src/test/results/clientpositive/spark/union18.q.out ce3e20c 
  ql/src/test/results/clientpositive/spark/union19.q.out ac28e36 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out 1836150 
  ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 179edd1 

Diff: https://reviews.apache.org/r/26706/diff/


Testing
---

All multi-insertion-related results have been regenerated and manually checked 
against the old results.
Also, I created a new test 

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-18 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review57286
---



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/26706/#comment97861

Could you remove this if it is not applicable?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java
https://reviews.apache.org/r/26706/#comment97862

Could we reuse this as a utility? I think we have the same or a similar thing 
somewhere.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
https://reviews.apache.org/r/26706/#comment97863

Let's keep the import organization consistent with the rest of Hive. Imports 
usually go at the top, in alphabetical order.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
https://reviews.apache.org/r/26706/#comment97864

Can we rename this to generate... instead of get... to be more precise?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
https://reviews.apache.org/r/26706/#comment97866

Nit: it might be better if this line comes after the following if statement.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
https://reviews.apache.org/r/26706/#comment97867

The if check here is unnecessary.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
https://reviews.apache.org/r/26706/#comment97857

Do we need to disconnect it, or does remove do this automatically?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
https://reviews.apache.org/r/26706/#comment97856

Nit: code style.



ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java
https://reviews.apache.org/r/26706/#comment97858

Some Javadoc would be helpful.



ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q
https://reviews.apache.org/r/26706/#comment97859

Could we put this test as Spark-only, since splitting doesn't apply to MR or 
Tez? I think we have a dir for Spark-only tests.



ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q
https://reviews.apache.org/r/26706/#comment97860

Nit: trailing space.


- Xuefu Zhang


On Oct. 17, 2014, 9:24 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26706/
 ---
 
 (Updated Oct. 17, 2014, 9:24 p.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-8436
 https://issues.apache.org/jira/browse/HIVE-8436
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Based on the design doc, we need to split the operator tree of a work in 
 SparkWork if the work is connected to multiple child works. The splitting is 
 performed by cloning the original work and removing unwanted branches from the 
 operator tree. Please refer to the design doc for details.
 This process should be done right before we generate the SparkPlan. We should 
 have a utility method that takes the original SparkWork and returns a modified 
 SparkWork.
 This process should also keep the information about the original work and its 
 clones. Such information will be needed during SparkPlan generation 
 (HIVE-8437).
 
 
 Diffs
 -
 
   itests/src/test/resources/testconfiguration.properties 558dd02 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
 5153885 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 126cb9f 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 d7744e9 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 280edde 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 644c681 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  1d01040 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  93940bc 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  20eb344 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  a62643a 
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
   ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b 
   

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-18 Thread Chao Sun


 On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java, line 64
  https://reviews.apache.org/r/26706/diff/4/?file=724853#file724853line64
 
  Could we reuse this as a utility? I think we have the same or a similar 
  thing somewhere.

You're right - HiveBaseFunctionResultList has the same method.
I've moved it into SparkUtilities.


 On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java, 
  line 250
  https://reviews.apache.org/r/26706/diff/4/?file=724854#file724854line250
 
  Do we need to disconnect it, or does remove do this automatically?

Yes, remove also removes all edges connected to this node.
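
The semantics discussed here (removing a node also drops its incident edges, so no separate disconnect call is needed) can be illustrated with a toy graph. This is a hypothetical sketch; `TinyGraph` and its methods are not the actual SparkWork API.

```java
import java.util.*;

// Toy directed graph illustrating remove-also-drops-edges semantics.
class TinyGraph {
  private final Set<String> nodes = new LinkedHashSet<>();
  private final Set<String> edges = new LinkedHashSet<>(); // stored as "a->b"

  void connect(String a, String b) {
    nodes.add(a);
    nodes.add(b);
    edges.add(a + "->" + b);
  }

  // Removing a node also removes every edge touching it.
  void remove(String n) {
    nodes.remove(n);
    edges.removeIf(e -> e.startsWith(n + "->") || e.endsWith("->" + n));
  }

  int edgeCount() { return edges.size(); }
}

public class RemoveDemo {
  public static void main(String[] args) {
    TinyGraph g = new TinyGraph();
    g.connect("Map1", "Reducer1");
    g.connect("Map1", "Reducer2");
    g.remove("Map1");
    // Both edges went through Map1, so none remain after the remove.
    System.out.println(g.edgeCount());
  }
}
```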


 On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote:
  ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q, line 1
  https://reviews.apache.org/r/26706/diff/4/?file=724864#file724864line1
 
  Could we put this test as Spark-only, since splitting doesn't apply to MR 
  or Tez? I think we have a dir for Spark-only tests.

I also wanted to make this a Spark-only test.
But the feature hasn't been implemented yet (I think Szehon is working on it).
I made the file name start with spark_ so that in the future we can move it to 
the Spark-only test directory.
But currently, there's no test directory for Spark, only a result directory.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review57286
---


On Oct. 17, 2014, 9:24 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26706/
 ---
 
 (Updated Oct. 17, 2014, 9:24 p.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-8436
 https://issues.apache.org/jira/browse/HIVE-8436
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Based on the design doc, we need to split the operator tree of a work in 
 SparkWork if the work is connected to multiple child works. The splitting is 
 performed by cloning the original work and removing unwanted branches from the 
 operator tree. Please refer to the design doc for details.
 This process should be done right before we generate the SparkPlan. We should 
 have a utility method that takes the original SparkWork and returns a modified 
 SparkWork.
 This process should also keep the information about the original work and its 
 clones. Such information will be needed during SparkPlan generation 
 (HIVE-8437).
 
 
 Diffs
 -
 
   itests/src/test/resources/testconfiguration.properties 558dd02 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
 5153885 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 126cb9f 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 d7744e9 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 280edde 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 644c681 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  1d01040 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  93940bc 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  20eb344 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  a62643a 
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
   ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b 
   ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa 
   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
 399fe41 
   ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 
   ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 
   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e 
   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
 a43921e 
   ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
   ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
   ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 
   ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
   

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-18 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 19, 2014, 12:46 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Updated the patch according to comments. Thanks Xuefu!


Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436


Repository: hive-git


Description
---

Based on the design doc, we need to split the operator tree of a work in 
SparkWork if the work is connected to multiple child works. The splitting is 
performed by cloning the original work and removing unwanted branches from the 
operator tree. Please refer to the design doc for details.
This process should be done right before we generate the SparkPlan. We should 
have a utility method that takes the original SparkWork and returns a modified 
SparkWork.
This process should also keep the information about the original work and its 
clones. Such information will be needed during SparkPlan generation (HIVE-8437).


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 558dd02 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
 c956101 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
5153885 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
126cb9f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 3773dcb 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
d7744e9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 
1d01040 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 93940bc 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
20eb344 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
a62643a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
  ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b 
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
399fe41 
  ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e 
  ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
  ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
  ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 
  ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
  ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b 
  ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
  ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f 
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
b07c582 
  
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
 fd477ca 
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d 
  ql/src/test/results/clientpositive/spark/spark_multi_insert_split_work.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 
  ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b 
  ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 
  ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b 
  ql/src/test/results/clientpositive/spark_multi_insert_split_work.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/26706/diff/


Testing
---

All multi-insertion related 

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-18 Thread Xuefu Zhang


 On Oct. 19, 2014, 12:15 a.m., Xuefu Zhang wrote:
  ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q, line 1
  https://reviews.apache.org/r/26706/diff/4/?file=724864#file724864line1
 
  Could we put this test as Spark-only, since splitting doesn't apply to MR 
  or Tez? I think we have a dir for Spark-only tests.
 
 Chao Sun wrote:
 I also wanted to make this a Spark-only test.
 But the feature hasn't been implemented yet (I think Szehon is working on 
 it).
 I made the file name start with spark_ so that in the future we can move 
 it to the Spark-only test directory.
 But currently, there's no test directory for Spark, only a result directory.

In that case, let's rename the test to have a generic name. It's a valid test 
case for MR as well, but also a special case for Spark.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review57286
---


On Oct. 19, 2014, 12:46 a.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26706/
 ---
 
 (Updated Oct. 19, 2014, 12:46 a.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-8436
 https://issues.apache.org/jira/browse/HIVE-8436
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Based on the design doc, we need to split the operator tree of a work in 
 SparkWork if the work is connected to multiple child works. The splitting is 
 performed by cloning the original work and removing unwanted branches from the 
 operator tree. Please refer to the design doc for details.
 This process should be done right before we generate the SparkPlan. We should 
 have a utility method that takes the original SparkWork and returns a modified 
 SparkWork.
 This process should also keep the information about the original work and its 
 clones. Such information will be needed during SparkPlan generation 
 (HIVE-8437).
 
 
 Diffs
 -
 
   itests/src/test/resources/testconfiguration.properties 558dd02 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java
  c956101 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
 5153885 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 126cb9f 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 
 3773dcb 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 d7744e9 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 280edde 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 644c681 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  1d01040 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  93940bc 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  20eb344 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  a62643a 
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
   ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q 
 PRE-CREATION 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b 
   ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa 
   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
 399fe41 
   ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 
   ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 
   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e 
   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
 a43921e 
   ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
   ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
   ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 
   ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
   ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b 
   ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
   ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
   ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f 
   

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-17 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 17, 2014, 6:04 p.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Added a test to check that splitting a work doesn't create duplicate FileSink 
operators.


Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436


Repository: hive-git


Description
---

Based on the design doc, we need to split the operator tree of a work in 
SparkWork if the work is connected to multiple child works. The splitting is 
performed by cloning the original work and removing unwanted branches from the 
operator tree. Please refer to the design doc for details.
This process should be done right before we generate the SparkPlan. We should 
have a utility method that takes the original SparkWork and returns a modified 
SparkWork.
This process should also keep the information about the original work and its 
clones. Such information will be needed during SparkPlan generation (HIVE-8437).


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
5153885 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
126cb9f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
d7744e9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 
1d01040 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 93940bc 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
20eb344 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
a62643a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
  ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b 
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
399fe41 
  ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e 
  ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
  ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
  ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 
  ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
  ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b 
  ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
  ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f 
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
b07c582 
  
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
 fd477ca 
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d 
  ql/src/test/results/clientpositive/spark/spark_multi_insert_split_work.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 
  ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b 
  ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 
  ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b 

Diff: https://reviews.apache.org/r/26706/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-17 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 17, 2014, 9:22 p.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Included a qfile result for MR mode.


Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436


Repository: hive-git


Description
---

Based on the design doc, we need to split the operator tree of a work in 
SparkWork if the work is connected to multiple child works. The splitting is 
performed by cloning the original work and removing unwanted branches from the 
operator tree. Please refer to the design doc for details.
This process should be done right before we generate the SparkPlan. We should 
have a utility method that takes the original SparkWork and returns a modified 
SparkWork.
This process should also keep the information about the original work and its 
clones. Such information will be needed during SparkPlan generation (HIVE-8437).


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 558dd02 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
5153885 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
126cb9f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
d7744e9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 
1d01040 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 93940bc 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
20eb344 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
a62643a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
  ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b 
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
399fe41 
  ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e 
  ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
  ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
  ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 
  ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
  ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b 
  ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
  ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f 
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
b07c582 
  
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
 fd477ca 
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d 
  ql/src/test/results/clientpositive/spark/spark_multi_insert_split_work.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 
  ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b 
  ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 
  ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b 
  ql/src/test/results/clientpositive/spark_multi_insert_split_work.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/26706/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-17 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 17, 2014, 9:24 p.m.)


Review request for hive and Xuefu Zhang.


Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436


Repository: hive-git


Description
---

Based on the design doc, we need to split the operator tree of a work in 
SparkWork if the work is connected to multiple child works. The splitting is 
performed by cloning the original work and removing unwanted branches from the 
operator tree. Please refer to the design doc for details.
This process should be done right before we generate the SparkPlan. We should 
have a utility method that takes the original SparkWork and returns a modified 
SparkWork.
This process should also keep the information about the original work and its 
clones. Such information will be needed during SparkPlan generation (HIVE-8437).
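The splitting procedure described above can be illustrated with a toy model. This is only a sketch: the class, method, and work names below are illustrative assumptions, not Hive's actual SparkWork API. Each work with more than one child is replaced by one clone per child branch, and the original-to-clones mapping is recorded for later use (HIVE-8437).

```java
import java.util.*;

// Toy sketch of the split: works are named nodes, edges go parent -> children.
// A work with N > 1 children is cloned N times (one clone per child branch),
// and the original-to-clones mapping is kept for later plan generation.
public class SplitSketch {

    // children: work name -> list of child work names
    // returns:  original work name -> names of the clones that replace it
    public static Map<String, List<String>> split(Map<String, List<String>> children) {
        Map<String, List<String>> cloneMap = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : children.entrySet()) {
            List<String> kids = e.getValue();
            if (kids.size() > 1) {
                List<String> clones = new ArrayList<>();
                for (int i = 0; i < kids.size(); i++) {
                    // in the real patch each clone would also drop the operator
                    // branches feeding the other children
                    clones.add(e.getKey() + "-clone-" + i);
                }
                cloneMap.put(e.getKey(), clones);
            }
        }
        return cloneMap;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("Map 1", Arrays.asList("Reduce 1", "Reduce 2")); // multi-insert fan-out
        g.put("Reduce 1", Collections.<String>emptyList());
        g.put("Reduce 2", Collections.<String>emptyList());
        System.out.println(split(g)); // only "Map 1" has multiple children, so only it is cloned
    }
}
```

In a real multi-insert plan the fan-out work would be a MapWork feeding several ReduceWorks; here only the clone bookkeeping is modeled.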


Diffs
---

  itests/src/test/resources/testconfiguration.properties 558dd02 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
5153885 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
126cb9f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
d7744e9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 
1d01040 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 93940bc 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
20eb344 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
a62643a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
  ql/src/test/queries/clientpositive/spark_multi_insert_split_work.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b 
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
399fe41 
  ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e 
  ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
  ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
  ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 
  ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
  ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b 
  ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
  ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f 
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
b07c582 
  
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
 fd477ca 
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d 
  ql/src/test/results/clientpositive/spark/spark_multi_insert_split_work.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 
  ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b 
  ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 
  ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b 
  ql/src/test/results/clientpositive/spark_multi_insert_split_work.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/26706/diff/


Testing (updated)
---

All multi-insertion related results are regenerated and manually checked 
against the old results.
I also created a new test, spark_multi_insert_split_work.q, to check that 
splitting won't generate duplicate FSs (FileSinkOperators).


Thanks,

Chao Sun



Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-15 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/
---

(Updated Oct. 16, 2014, 1:25 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Addressing the comments. Also, I'm thinking about adding another test for 
multi-insert in another JIRA, specifically to check whether the plan after 
splitting has the correct shape.


Bugs: HIVE-8436
https://issues.apache.org/jira/browse/HIVE-8436


Repository: hive-git


Description
---

Based on the design doc, we need to split the operator tree of a work in 
SparkWork if the work is connected to multiple child works. The split is 
performed by cloning the original work and removing unwanted branches from the 
operator tree. Please refer to the design doc for details.
This process should be done right before we generate the SparkPlan. We should 
have a utility method that takes the original SparkWork and returns a modified 
SparkWork.
It should also keep information about the original work and its clones, as this 
information will be needed during SparkPlan generation (HIVE-8437).


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
5153885 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
126cb9f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
d7744e9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 280edde 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 644c681 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 
1d01040 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 93940bc 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
20eb344 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
a62643a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out 2d99a81 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out ca73985 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out 2d2c55b 
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out 942cdaa 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
399fe41 
  ql/src/test/results/clientpositive/spark/groupby_position.q.out 5e68807 
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4259412 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out e0e882e 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out a43921e 
  ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
  ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
  ql/src/test/results/clientpositive/spark/input1_limit.q.out 1f3b484 
  ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
  ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out 5318a8b 
  ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
  ql/src/test/results/clientpositive/spark/multi_insert.q.out bae325f 
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 280a893 
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
b07c582 
  
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
 fd477ca 
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 44991e3 
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 96f2c06 
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out 7ec5d8d 
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 2b4a331 
  ql/src/test/results/clientpositive/spark/union18.q.out f94fa0b 
  ql/src/test/results/clientpositive/spark/union19.q.out 8dcb543 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out 6730010 
  ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 909378b 

Diff: https://reviews.apache.org/r/26706/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-14 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review56640
---



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/26706/#comment97030

Can we try a generic method so that we only have one method doing cloning 
for both?
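One way to collapse the two cloning methods into a single generic one, as suggested, is a serialization round-trip. The standalone sketch below uses plain Java serialization; Hive's Utilities has its own serialization machinery, so the helper name and mechanism here are assumptions, not the actual patch.

```java
import java.io.*;
import java.util.*;

public class CloneSketch {
    // A single generic deep-clone helper usable for any Serializable plan
    // object (e.g. both MapWork-like and ReduceWork-like payloads), instead
    // of one near-identical clone method per work type.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T deepClone(T obj)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);            // serialize the original object
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (T) ois.readObject();     // deserialize an independent copy
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> ops = new ArrayList<>(Arrays.asList("TS", "FIL", "FS"));
        ArrayList<String> copy = deepClone(ops);
        // deep copy: equal in content, but a distinct object graph
        System.out.println(copy.equals(ops) && copy != ops);
    }
}
```

The type parameter `<T extends Serializable>` is what lets one method serve both work types without casting at the call site.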



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java
https://reviews.apache.org/r/26706/#comment97031

I think input param can be just BytesWritable.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
https://reviews.apache.org/r/26706/#comment97033

I think we should use add() instead.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
https://reviews.apache.org/r/26706/#comment97035

The design doc explicitly specifies that the first clone is handled 
differently from the rest, but I didn't see such handling here. We may have 
a problem with this implementation.



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java
https://reviews.apache.org/r/26706/#comment97036

Let's not use * in imports.


- Xuefu Zhang


On Oct. 14, 2014, 9:17 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26706/
 ---
 
 (Updated Oct. 14, 2014, 9:17 p.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-8436
 https://issues.apache.org/jira/browse/HIVE-8436
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Based on the design doc, we need to split the operator tree of a work in 
 SparkWork if the work is connected to multiple child works. The split is 
 performed by cloning the original work and removing unwanted branches from 
 the operator tree. Please refer to the design doc for details.
 This process should be done right before we generate the SparkPlan. We should 
 have a utility method that takes the original SparkWork and returns a 
 modified SparkWork.
 It should also keep information about the original work and its clones, as 
 this information will be needed during SparkPlan generation 
 (HIVE-8437).
 
 
 Diffs
 ---
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
 5153885 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 126cb9f 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 d7744e9 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 280edde 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 644c681 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  1d01040 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  93940bc 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  20eb344 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  a62643a 
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 95d7b59 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out b425c67 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out dc713b3 
   ql/src/test/results/clientpositive/spark/groupby_cube1.q.out cd8e85e 
   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
 801ac8a 
   ql/src/test/results/clientpositive/spark/groupby_position.q.out b04e55c 
   ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4bde6ea 
   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out ab2fe84 
   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
 5c1cbc4 
   ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
   ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
   ql/src/test/results/clientpositive/spark/input1_limit.q.out 90bc8ea 
   ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
   ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7964802 
   ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
   ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
   ql/src/test/results/clientpositive/spark/multi_insert.q.out 31ebbeb 
   ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 0a983d8 
   ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
 

Re: Review Request 26706: HIVE-8436 - Modify SparkWork to split works with multiple child works [Spark Branch]

2014-10-14 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26706/#review56647
---



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
https://reviews.apache.org/r/26706/#comment97038

Please throw IllegalStateException, prefix the message with "AssertionError", 
and append work.getClass().getName() to it.
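The suggested fail-fast change might look like the sketch below. The method context and the expected work types are assumptions; only the message format follows the comment.

```java
// Hypothetical helper building the failure suggested in the review comment:
// an IllegalStateException whose message starts with "AssertionError" and
// ends with the actual class name of the unexpected work object.
public class FailFastSketch {
    public static IllegalStateException unexpectedWork(Object work) {
        return new IllegalStateException(
            "AssertionError: expected MapWork or ReduceWork, but found "
            + work.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(unexpectedWork(new Object()).getMessage());
    }
}
```

Including the concrete class name makes the log actionable when an unexpected work type slips through plan generation.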


- Brock Noland


On Oct. 14, 2014, 9:17 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26706/
 ---
 
 (Updated Oct. 14, 2014, 9:17 p.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-8436
 https://issues.apache.org/jira/browse/HIVE-8436
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Based on the design doc, we need to split the operator tree of a work in 
 SparkWork if the work is connected to multiple child works. The split is 
 performed by cloning the original work and removing unwanted branches from 
 the operator tree. Please refer to the design doc for details.
 This process should be done right before we generate the SparkPlan. We should 
 have a utility method that takes the original SparkWork and returns a 
 modified SparkWork.
 It should also keep information about the original work and its clones, as 
 this information will be needed during SparkPlan generation 
 (HIVE-8437).
 
 
 Diffs
 ---
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7d9feac 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
 5153885 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 3fd37a0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 126cb9f 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 d7744e9 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 280edde 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ac94ea0 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 644c681 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  1d01040 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  93940bc 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  20eb344 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  a62643a 
   ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java 05be1f1 
   ql/src/test/results/clientpositive/spark/groupby7_map.q.out 95d7b59 
   ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out b425c67 
   ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out dc713b3 
   ql/src/test/results/clientpositive/spark/groupby_cube1.q.out cd8e85e 
   ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
 801ac8a 
   ql/src/test/results/clientpositive/spark/groupby_position.q.out b04e55c 
   ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out 4bde6ea 
   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out ab2fe84 
   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
 5c1cbc4 
   ql/src/test/results/clientpositive/spark/input12.q.out 4b0cf44 
   ql/src/test/results/clientpositive/spark/input13.q.out 260a65a 
   ql/src/test/results/clientpositive/spark/input1_limit.q.out 90bc8ea 
   ql/src/test/results/clientpositive/spark/input_part2.q.out f2f3a2d 
   ql/src/test/results/clientpositive/spark/insert1.q.out 65032cb 
   ql/src/test/results/clientpositive/spark/insert_into3.q.out 7964802 
   ql/src/test/results/clientpositive/spark/load_dyn_part1.q.out 3b669fc 
   ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 50c052d 
   ql/src/test/results/clientpositive/spark/multi_insert.q.out 31ebbeb 
   ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out 0a983d8 
   ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
 68b1312 
   
 ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
  f7867ac 
   ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 
 dbb78a6 
   ql/src/test/results/clientpositive/spark/orc_analyze.q.out a0af7ba 
   ql/src/test/results/clientpositive/spark/parallel.q.out acd418f 
   ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out 169d2f1 
   ql/src/test/results/clientpositive/spark/ppd_transform.q.out 54b8a8a 
   ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 6f8066d 
   ql/src/test/results/clientpositive/spark/union18.q.out 07ea2c5 
   ql/src/test/results/clientpositive/spark/union19.q.out 2fefe8e