Review Request 26296: HIVE-8331 - HIVE-8303 followup, investigate result diff [Spark Branch]

2014-10-02 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26296/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-8331
https://issues.apache.org/jira/browse/HIVE-8331


Repository: hive-git


Description
---

The HIVE-8303 patch introduced result diffs in several Spark tests. We need to 
investigate these, including parallel_join0.q, union22.q, 
vectorized_shufflejoin.q, union_remove_18.q, and possibly more.
The investigation also covers the test failures related to Spark; 
specifically, union_remove_18.q produced results in random order.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java fed6ccd 
  ql/src/test/results/clientpositive/spark/union_remove_18.q.out 60ab60b 

Diff: https://reviews.apache.org/r/26296/diff/


Testing
---


Thanks,

Chao Sun



Review Request 26181: HIVE-8262 - Create CacheTran that transforms the input RDD by caching it [Spark Branch]

2014-09-30 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26181/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-8262
https://issues.apache.org/jira/browse/HIVE-8262


Repository: hive-git


Description
---

In a few cases we need to cache an RDD to avoid recomputing it, for better 
performance. However, caching a map input RDD is different from caching a 
regular RDD, due to SPARK-3693. The way to cache a Hadoop RDD, which is the 
input to a MapWork, is to cache the result RDD that is transformed from the 
original Hadoop RDD by applying a map function in which the key/value pairs 
are copied. Caching an intermediate RDD, such as one produced by a shuffle, 
is just a matter of calling .cache().
This task is to create a CacheTran to capture this, which can be plugged into 
the Spark plan when caching is desirable. 
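
To make the distinction concrete, here is a minimal sketch of the two cases 
(illustrative only; class and method names are hypothetical, and the actual 
CacheTran in the patch may differ):

  import java.util.Arrays;

  import org.apache.hadoop.io.BytesWritable;
  import org.apache.spark.api.java.JavaPairRDD;
  import scala.Tuple2;

  public final class CacheTranSketch {

    // A Hadoop input RDD reuses its Writable objects across records
    // (SPARK-3693), so each key/value pair must be copied in a map step
    // before the RDD can safely be cached.
    public static JavaPairRDD<BytesWritable, BytesWritable> cacheMapInput(
        JavaPairRDD<BytesWritable, BytesWritable> hadoopRDD) {
      return hadoopRDD
          .mapToPair(kv -> new Tuple2<>(copy(kv._1()), copy(kv._2())))
          .cache();
    }

    // An intermediate RDD (e.g. the output of a shuffle) owns its objects,
    // so caching it is just a call to .cache().
    public static <K, V> JavaPairRDD<K, V> cacheIntermediate(
        JavaPairRDD<K, V> rdd) {
      return rdd.cache();
    }

    private static BytesWritable copy(BytesWritable w) {
      return new BytesWritable(Arrays.copyOf(w.getBytes(), w.getLength()));
    }
  }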


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CachedTran.java PRE-CREATION 

Diff: https://reviews.apache.org/r/26181/diff/


Testing
---


Thanks,

Chao Sun



Review Request 26211: HIVE-8314 - Restore thrift string interning of HIVE-7975

2014-09-30 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26211/
---

Review request for hive and Szehon Ho.


Bugs: HIVE-8314
https://issues.apache.org/jira/browse/HIVE-8314


Repository: hive-git


Description
---

HIVE-7975 added string interning to the thrift-generated code by having the 
google replacer plugin, run with -Pthriftif, perform the replacements.
In the commit of HIVE-7482 this was undone, since the code was regenerated 
without that plugin. The thrift code should be regenerated.
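
For illustration, this is the shape of the rewrite the plugin applies to the 
generated beans (a sketch only: shown with plain String.intern(), while the 
actual helper invoked by the -Pthriftif profile may differ):

  // Illustrative only: not the actual generated code. Plain thrift output
  // assigns the setter argument directly (this.name = name); the interned
  // variant deduplicates the many repeated strings (column names, types,
  // locations) held by the metastore.
  public class FieldSchemaSketch {
    private String name;

    public void setName(String name) {
      this.name = (name == null) ? null : name.intern();
    }

    public String getName() {
      return name;
    }
  }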


Diffs
-

  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FieldSchema.java
 a993810 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java
 312807e 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SerDeInfo.java
 24d65bb 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java
 d0b9843 

Diff: https://reviews.apache.org/r/26211/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 26120: HIVE-8278 - Restoring a graph representation of SparkPlan [Spark Branch]

2014-09-29 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26120/
---

(Updated Sept. 29, 2014, 8:16 p.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Updated according to Xuefu's suggestions.


Bugs: HIVE-8278
https://issues.apache.org/jira/browse/HIVE-8278


Repository: hive-git


Description
---

HIVE-8249 greatly simplified the SparkPlan model and the SparkPlanGenerator 
logic. As a side effect, however, the visual representation of the SparkPlan 
was lost. Such a representation is helpful for debugging and performance 
profiling. In addition, it would also be good to separate plan generation 
from plan execution.
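
As a rough illustration of the idea (names and structure hypothetical, not 
the actual SparkPlan in this patch): keeping the plan as an explicit graph 
makes it printable for debugging and lets execution run as a separate, 
ordered pass.

  import java.util.ArrayDeque;
  import java.util.ArrayList;
  import java.util.Deque;
  import java.util.HashMap;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;

  class SparkPlanGraphSketch<T> {
    private final Map<T, List<T>> children = new LinkedHashMap<>();

    void connect(T parent, T child) {
      children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
      children.computeIfAbsent(child, k -> new ArrayList<>());
    }

    // Debug view: one line per edge, e.g. "MapTran-1 -> ReduceTran-2".
    String toGraphString() {
      StringBuilder sb = new StringBuilder();
      for (Map.Entry<T, List<T>> e : children.entrySet()) {
        for (T c : e.getValue()) {
          sb.append(e.getKey()).append(" -> ").append(c).append('\n');
        }
      }
      return sb.toString();
    }

    // Execution is a separate phase from generation: visit trans in
    // topological order so every parent runs before its children.
    List<T> executionOrder() {
      Map<T, Integer> inDegree = new HashMap<>();
      for (T n : children.keySet()) {
        inDegree.put(n, 0);
      }
      for (List<T> cs : children.values()) {
        for (T c : cs) {
          inDegree.merge(c, 1, Integer::sum);
        }
      }
      Deque<T> ready = new ArrayDeque<>();
      for (Map.Entry<T, Integer> e : inDegree.entrySet()) {
        if (e.getValue() == 0) {
          ready.add(e.getKey());
        }
      }
      List<T> order = new ArrayList<>();
      while (!ready.isEmpty()) {
        T n = ready.poll();
        order.add(n);
        for (T c : children.get(n)) {
          if (inDegree.merge(c, -1, Integer::sum) == 0) {
            ready.add(c);
          }
        }
      }
      return order;
    }
  }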


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java f8b3283 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
15af0f9 

Diff: https://reviews.apache.org/r/26120/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 26120: HIVE-8278 - Restoring a graph representation of SparkPlan [Spark Branch]

2014-09-29 Thread Chao Sun


 On Sept. 29, 2014, 11:34 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java, line 34
  https://reviews.apache.org/r/26120/diff/2/?file=708344#file708344line34
 
  Do we need rootTrans?

I thought rootTrans is just the same as the keyset of mapInputs, so I removed 
it. Correct me if I'm wrong.


 On Sept. 29, 2014, 11:34 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java, line 72
  https://reviews.apache.org/r/26120/diff/2/?file=708344#file708344line72
 
  So we are not doing the union of those any more?

I forgot ... now fixed.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26120/#review54904
---


On Sept. 29, 2014, 8:16 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26120/
 ---
 
 (Updated Sept. 29, 2014, 8:16 p.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-8278
 https://issues.apache.org/jira/browse/HIVE-8278
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-8249 greatly simplified the SparkPlan model and the SparkPlanGenerator 
 logic. As a side effect, however, the visual representation of the SparkPlan 
 was lost. Such a representation is helpful for debugging and performance 
 profiling. In addition, it would also be good to separate plan generation 
 from plan execution.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java f8b3283 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 15af0f9 
 
 Diff: https://reviews.apache.org/r/26120/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 26120: HIVE-8278 - Restoring a graph representation of SparkPlan [Spark Branch]

2014-09-29 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26120/
---

(Updated Sept. 30, 2014, 12:06 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Thanks Xuefu for the comments. Here is an updated patch.


Bugs: HIVE-8278
https://issues.apache.org/jira/browse/HIVE-8278


Repository: hive-git


Description
---

HIVE-8249 greatly simplified the SparkPlan model and the SparkPlanGenerator 
logic. As a side effect, however, the visual representation of the SparkPlan 
was lost. Such a representation is helpful for debugging and performance 
profiling. In addition, it would also be good to separate plan generation 
from plan execution.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java f8b3283 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
15af0f9 

Diff: https://reviews.apache.org/r/26120/diff/


Testing
---


Thanks,

Chao Sun



Review Request 26120: HIVE-8278 - Restoring a graph representation of SparkPlan [Spark Branch]

2014-09-27 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26120/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-8278
https://issues.apache.org/jira/browse/HIVE-8278


Repository: hive-git


Description
---

HIVE-8249 greatly simplified the SparkPlan model and the SparkPlanGenerator 
logic. As a side effect, however, the visual representation of the SparkPlan 
was lost. Such a representation is helpful for debugging and performance 
profiling. In addition, it would also be good to separate plan generation 
from plan execution.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java f8b3283 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
15af0f9 

Diff: https://reviews.apache.org/r/26120/diff/


Testing
---


Thanks,

Chao Sun



Review Request 26047: HIVE-8256 - Add SORT_QUERY_RESULTS for test that doesn't guarantee order #2

2014-09-25 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26047/
---

Review request for hive, Brock Noland and Xuefu Zhang.


Bugs: HIVE-8256
https://issues.apache.org/jira/browse/HIVE-8256


Repository: hive-git


Description
---

Following HIVE-8035, we need to add SORT_QUERY_RESULTS to a few more 
tests that don't guarantee output order.


Diffs
-

  ql/src/test/queries/clientpositive/groupby7.q 1235e3c 
  ql/src/test/queries/clientpositive/groupby_complex_types.q bb1e6d2 
  ql/src/test/queries/clientpositive/table_access_keys_stats.q 23209d8 
  ql/src/test/results/clientpositive/groupby7.q.out ee0153a 
  ql/src/test/results/clientpositive/groupby_complex_types.q.out 1697dd9 
  ql/src/test/results/clientpositive/table_access_keys_stats.q.out adea0f6 

Diff: https://reviews.apache.org/r/26047/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 26001: HIVE-8233 - multi-table insertion doesn't work with ForwardOperator [Spark Branch]

2014-09-25 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26001/
---

(Updated Sept. 25, 2014, 6:42 p.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Changes
---

I forgot that I need to change groupby11.q as well.


Bugs: hive-8233
https://issues.apache.org/jira/browse/hive-8233


Repository: hive-git


Description
---

Right now, for multi-table insertion, we start from the multiple 
FileSinkOperators and break the plan at their lowest common ancestor (LCA), 
adding a temporary FileSinkOperator and TableScanOperators. A special case is 
when the LCA is a ForwardOperator, in which case we don't break the plan, 
since it has already been optimized.
However, there's an issue. Consider the following plan:

         ...
        RS_0
          |
         FOR
        /    \
    GBY_1    GBY_2
      |        |
     ...      ...
      |        |
    RS_1     RS_2
      |        |
     ...      ...
      |        |
    FS_1     FS_2

which may result in:

      RW
     /  \
    RW    RW

Hence, because of the issues in HIVE-7731 and HIVE-8118, both downstream 
branches will get the same, duplicated input.
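
To make the break step concrete, here is a toy sketch (toy Op type and 
hypothetical names, not Hive's Operator API; the real logic lives in the 
Spark compiler processors) of cutting the tree at a non-FORWARD LCA:

  import java.util.ArrayList;
  import java.util.List;

  class Op {
    final String name;
    final List<Op> children = new ArrayList<>();
    Op(String name) { this.name = name; }
    Op add(Op child) { children.add(child); return child; }
  }

  class BreakAtLcaSketch {
    // Cut the tree at a non-FORWARD LCA: stage its output once through a
    // temporary FileSink, then give every downstream branch its own
    // temporary TableScan over that staged data. Each TS-rooted tree
    // becomes a separate work in the Spark plan.
    static List<Op> breakPlan(Op lca) {
      List<Op> branches = new ArrayList<>(lca.children);
      lca.children.clear();
      lca.add(new Op("FS_tmp"));           // temporary FileSinkOperator
      List<Op> newRoots = new ArrayList<>();
      for (Op branch : branches) {
        Op ts = new Op("TS_tmp");          // temporary TableScanOperator
        ts.add(branch);
        newRoots.add(ts);
      }
      return newRoots;
    }
  }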


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties cd83998 
  ql/src/test/queries/clientpositive/spark_groupby11.q PRE-CREATION 
  
ql/src/test/queries/clientpositive/spark_groupby7_noskew_multi_single_reducer.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8_map.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8_map_skew.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8_noskew.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby9.q PRE-CREATION 
  
ql/src/test/queries/clientpositive/spark_groupby_multi_insert_common_distinct.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_union17.q PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby11.q.out PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/spark_groupby7_noskew_multi_single_reducer.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8_map.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8_map_skew.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8_noskew.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby9.q.out PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/spark_groupby_multi_insert_common_distinct.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_union17.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/26001/diff/


Testing
---


Thanks,

Chao Sun



Review Request 26001: HIVE-8233 - multi-table insertion doesn't work with ForwardOperator [Spark Branch]

2014-09-24 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26001/
---

Review request for hive, Brock Noland and Xuefu Zhang.


Bugs: hive-8233
https://issues.apache.org/jira/browse/hive-8233


Repository: hive-git


Description
---

Right now, for multi-table insertion, we start from the multiple 
FileSinkOperators and break the plan at their lowest common ancestor (LCA), 
adding a temporary FileSinkOperator and TableScanOperators. A special case is 
when the LCA is a ForwardOperator, in which case we don't break the plan, 
since it has already been optimized.
However, there's an issue. Consider the following plan:

         ...
        RS_0
          |
         FOR
        /    \
    GBY_1    GBY_2
      |        |
     ...      ...
      |        |
    RS_1     RS_2
      |        |
     ...      ...
      |        |
    FS_1     FS_2

which may result in:

      RW
     /  \
    RW    RW

Hence, because of the issues in HIVE-7731 and HIVE-8118, both downstream 
branches will get the same, duplicated input.


Diffs
-

  itests/src/test/resources/testconfiguration.properties 637fbc1 
  
ql/src/test/queries/clientpositive/spark_groupby7_noskew_multi_single_reducer.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8_map.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8_map_skew.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8_noskew.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby9.q PRE-CREATION 
  
ql/src/test/queries/clientpositive/spark_groupby_multi_insert_common_distinct.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_union17.q PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/spark_groupby7_noskew_multi_single_reducer.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8_map.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8_map_skew.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8_noskew.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby9.q.out PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/spark_groupby_multi_insert_common_distinct.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_union17.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/26001/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 26001: HIVE-8233 - multi-table insertion doesn't work with ForwardOperator [Spark Branch]

2014-09-24 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26001/
---

(Updated Sept. 24, 2014, 9:04 p.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Changes
---

Made these qfiles Spark-only tests.


Bugs: hive-8233
https://issues.apache.org/jira/browse/hive-8233


Repository: hive-git


Description
---

Right now, for multi-table insertion, we start from the multiple 
FileSinkOperators and break the plan at their lowest common ancestor (LCA), 
adding a temporary FileSinkOperator and TableScanOperators. A special case is 
when the LCA is a ForwardOperator, in which case we don't break the plan, 
since it has already been optimized.
However, there's an issue. Consider the following plan:

         ...
        RS_0
          |
         FOR
        /    \
    GBY_1    GBY_2
      |        |
     ...      ...
      |        |
    RS_1     RS_2
      |        |
     ...      ...
      |        |
    FS_1     FS_2

which may result in:

      RW
     /  \
    RW    RW

Hence, because of the issues in HIVE-7731 and HIVE-8118, both downstream 
branches will get the same, duplicated input.


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 637fbc1 
  
ql/src/test/queries/clientpositive/spark_groupby7_noskew_multi_single_reducer.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8_map.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8_map_skew.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby8_noskew.q PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_groupby9.q PRE-CREATION 
  
ql/src/test/queries/clientpositive/spark_groupby_multi_insert_common_distinct.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/spark_union17.q PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/spark_groupby7_noskew_multi_single_reducer.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8_map.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8_map_skew.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby8_noskew.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_groupby9.q.out PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/spark_groupby_multi_insert_common_distinct.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/spark_union17.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/26001/diff/


Testing
---


Thanks,

Chao Sun



Review Request 26007: HIVE-8249 - Refactoring SparkPlan and SparkPlanGenerator [Spark Branch]

2014-09-24 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26007/
---

Review request for hive, Brock Noland and Xuefu Zhang.


Bugs: hive-8249
https://issues.apache.org/jira/browse/hive-8249


Repository: hive-git


Description
---

Currently, the code for SparkPlanGenerator seems a little bit messy, and the 
logic is not quite clear. This JIRA is created to refactor this and related 
classes.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GraphTran.java acd42be 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 46e4b6d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
7ab2ca0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/UnionTran.java 546b448 

Diff: https://reviews.apache.org/r/26007/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 26007: HIVE-8249 - Refactoring SparkPlan and SparkPlanGenerator [Spark Branch]

2014-09-24 Thread Chao Sun


 On Sept. 24, 2014, 11:51 p.m., Xuefu Zhang wrote:
 

OK, I'll change it in the next patch, once the test results are out.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26007/#review54481
---


On Sept. 24, 2014, 10:50 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26007/
 ---
 
 (Updated Sept. 24, 2014, 10:50 p.m.)
 
 
 Review request for hive, Brock Noland and Xuefu Zhang.
 
 
 Bugs: hive-8249
 https://issues.apache.org/jira/browse/hive-8249
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Currently, the code for SparkPlanGenerator seems a little bit messy, and the 
 logic is not quite clear. This JIRA is created to refactor this and related 
 classes.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GraphTran.java acd42be 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 46e4b6d 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
 7ab2ca0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/UnionTran.java 546b448 
 
 Diff: https://reviews.apache.org/r/26007/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 26007: HIVE-8249 - Refactoring SparkPlan and SparkPlanGenerator [Spark Branch]

2014-09-24 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26007/
---

(Updated Sept. 25, 2014, 1:02 a.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Changes
---

Changed UnionTran to IdentityTran. Thanks Xuefu for the suggestion!


Bugs: hive-8249
https://issues.apache.org/jira/browse/hive-8249


Repository: hive-git


Description
---

Currently, the code for SparkPlanGenerator seems a little bit messy, and the 
logic is not quite clear. This JIRA is created to refactor this and related 
classes.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GraphTran.java acd42be 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 46e4b6d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
7ab2ca0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/UnionTran.java 546b448 

Diff: https://reviews.apache.org/r/26007/diff/


Testing
---


Thanks,

Chao Sun



Review Request 25943: HIVE-8207 - Add .q tests for multi-table insertion [Spark Branch]

2014-09-23 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25943/
---

Review request for hive, Brock Noland and Xuefu Zhang.


Bugs: HIVE-8207
https://issues.apache.org/jira/browse/HIVE-8207


Repository: hive-git


Description
---

Now that multi-table insertion is committed to the branch, we should enable 
the related qtests.


Diffs
-

  ql/src/test/queries/clientpositive/groupby10.q 7750cb9 
  ql/src/test/queries/clientpositive/groupby11.q 0bf92ac 
  ql/src/test/queries/clientpositive/groupby7.q 1235e3c 
  ql/src/test/queries/clientpositive/groupby_complex_types.q bb1e6d2 
  ql/src/test/queries/clientpositive/subquery_multiinsert.q ed36d9e 
  ql/src/test/queries/clientpositive/table_access_keys_stats.q 23209d8 
  ql/src/test/results/clientpositive/spark/add_part_multiple.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/column_access_stats.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/date_udf.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby11.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby3_map.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby3_map_multi_distinct.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby3_noskew.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby3_noskew_multi_distinct.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/groupby7_map_multi_single_reducer.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_complex_types.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer2.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer3.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_position.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_ppr.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/innerjoin.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input12.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input13.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input14.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input17.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input18.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input1_limit.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input_part2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/join_nullsafe.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/metadata_only_queries_with_filters.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multi_insert_gby.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multi_insert_gby2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/parallel.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/table_access_keys_stats.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/25943/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 25943: HIVE-8207 - Add .q tests for multi-table insertion [Spark Branch]

2014-09-23 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25943/
---

(Updated Sept. 24, 2014, 1:15 a.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Changes
---

Fixing the last patch - sorry, I shouldn't have modified the qfiles in this 
JIRA, since that affects results for MR/Tez (thanks to Xuefu for pointing 
this out). Also, I didn't change testconfiguration.properties in the last 
patch.


Bugs: HIVE-8207
https://issues.apache.org/jira/browse/HIVE-8207


Repository: hive-git


Description
---

Now that multi-table insertion is committed to the branch, we should enable 
the related qtests.


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties aa04c0a 
  ql/src/test/results/clientpositive/spark/add_part_multiple.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/column_access_stats.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/date_udf.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby3_map.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby3_map_multi_distinct.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby3_noskew.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby3_noskew_multi_distinct.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/groupby7_map_multi_single_reducer.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer2.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer3.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_position.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_ppr.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/innerjoin.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input12.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input13.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input14.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input17.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input18.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input1_limit.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/input_part2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/insert_into3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/join_nullsafe.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/metadata_only_queries_with_filters.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multi_insert.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multi_insert_gby.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multi_insert_gby2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/parallel.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/25943/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]

2014-09-19 Thread Chao Sun


 On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote:
  Nice work.
  
  Besides the comments below, I think there are some improvements that can 
  be made, either here or in a different patch:
  
  1. If we have a module that can compile an op tree (given by its top ops) 
  into a Spark task, then we can reuse it after the original op tree is 
  broken into several trees: from each tree we compile a Spark task, and in 
  the end we hook up the parent-child relationships. The current logic is a 
  little complicated and hard to understand.
  2. Tests 
  3. Optimizations

I agree. I can do these in separate follow-up patches.


 On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java,
   line 142
  https://reviews.apache.org/r/25394/diff/3/?file=693788#file693788line142
 
   Here we are mapping the children of the lca to the lca itself. Why is this 
   necessary, as you can find the children of the lca later without the map? 
   Can't we just store the lca here?

The problem is that we are only generating one FS but multiple TSs. After 
the FS and the first TS are generated, the child-parent relation is lost 
(since the op tree is modified), and hence we need to store this information 
somewhere else, to be used when processing the remaining TSs.


 On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java,
   line 140
  https://reviews.apache.org/r/25394/diff/3/?file=693788#file693788line140
 
   This seems to cover only the case where all FSs have a common FORWARD 
   parent. What if only some of them share a FORWARD parent, but the other 
   FSs and the FORWARD operator share some common parent?
   
   I think the rule for whether to break the plan goes like this:
   
   A plan needs to be broken if and only if there is more than one 
   FileSinkOperator that can be traced back to a common parent, and the 
   tracing has to pass a ReduceSinkOperator on the way.

In this case the LCA is not a FOR, so breaking at this point is safe (though 
maybe not optimal), is that right?
Personally, after so many attempts, I'm a bit inclined to just do what MR 
does: go top-down and keep the first RS in the same SparkWork. For the 
remaining RSs, just break the plan.


 On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java,
   line 120
  https://reviews.apache.org/r/25394/diff/3/?file=693788#file693788line120
 
  I feel that the logic here can be simplified. Could we just pop all 
  paths and then check if the root is the same and keep doing so until the 
  common parent is found?

I'm not quite sure. I would happily accept a better algorithm if you have 
one :) (the one I'm using is just the standard algorithm for finding the 
LCA).
The LCA could be at a different position in each path, so how would popping 
all the paths proceed? Also, there could be multiple common parents, and we 
need to identify the lowest one.
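
For reference, a minimal sketch of the standard path-based LCA being 
discussed (toy generic types, not Hive's Operator API; it assumes each FS 
contributes a single root-to-sink path, which is the simple case, while the 
multiple-common-parents situation mentioned above is what makes the real 
problem harder):

  import java.util.List;

  class LcaSketch {
    // Each inner list is the root-to-FS path for one FileSink. Advance
    // through all paths in lockstep; the last depth at which every path
    // holds the same node (identity comparison) is the LCA.
    static <T> T lowestCommonAncestor(List<List<T>> rootToSinkPaths) {
      List<T> first = rootToSinkPaths.get(0);
      T lca = null;
      for (int i = 0; i < first.size(); i++) {
        T candidate = first.get(i);
        for (List<T> path : rootToSinkPaths) {
          if (i >= path.size() || path.get(i) != candidate) {
            return lca;                 // paths diverged at depth i
          }
        }
        lca = candidate;                // all paths still agree
      }
      return lca;
    }
  }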


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/#review53871
---


On Sept. 18, 2014, 6:38 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25394/
 ---
 
 (Updated Sept. 18, 2014, 6:38 p.m.)
 
 
 Review request for hive, Brock Noland and Xuefu Zhang.
 
 
 Bugs: HIVE-7503
 https://issues.apache.org/jira/browse/HIVE-7503
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 For Hive's multi insert query 
 (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
 may be an MR job for each insert. When we achieve this with Spark, it would 
 be nice if all the inserts can happen concurrently.
 It seems that this functionality isn't available in Spark. To make things 
 worse, the source of the insert may be re-computed unless it's staged. Even 
 with this, the inserts will happen sequentially, making the performance 
 suffer.
  This task is to find out what it takes in Spark to enable this without 
  requiring staging of the source or sequential insertion. If this has to be 
  solved in Hive, find out an optimal way to do it.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 4211a0703f5b6bfd8a628b13864fac75ef4977cf 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 695d8b90cb1989805a7ff4e39a9635bbcea9c66c 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 
 864965e03a3f9d665e21e1c1b10b19dc286b842f 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 76fc290f00430dbc34dbbc1a0cef0d0eb59e6029 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark

Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]

2014-09-19 Thread Chao Sun


 On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java,
   line 142
  https://reviews.apache.org/r/25394/diff/3/?file=693788#file693788line142
 
   Here we are mapping the children of the lca to the lca itself. Why is this 
   necessary, as you can find the children of the lca later without the map? 
   Can't we just store the lca here?
 
 Chao Sun wrote:
  The problem is that we are only generating one FS but multiple TSs. After 
  the FS and the first TS are generated, the child-parent relation is lost 
  (since the op tree is modified), and hence we need to store this 
  information somewhere else, to be used when processing the remaining TSs.

It might be tricky to just store the LCA. When the graph walker reaches a 
node, it needs to check whether that node is a child of the LCA, and if so, 
break the plan.
You could say that since we have the LCA, we have all of its children's info. 
However, after the first child is processed, the LCA's children have changed, 
so we need to store this info somewhere, IMHO.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/#review53871
---


On Sept. 18, 2014, 6:38 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25394/
 ---
 
 (Updated Sept. 18, 2014, 6:38 p.m.)
 
 
 Review request for hive, Brock Noland and Xuefu Zhang.
 
 
 Bugs: HIVE-7503
 https://issues.apache.org/jira/browse/HIVE-7503
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 For Hive's multi insert query 
 (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
 may be an MR job for each insert. When we achieve this with Spark, it would 
 be nice if all the inserts can happen concurrently.
 It seems that this functionality isn't available in Spark. To make things 
 worse, the source of the insert may be re-computed unless it's staged. Even 
 with this, the inserts will happen sequentially, making the performance 
 suffer.
  This task is to find out what it takes in Spark to enable this without 
  requiring staging of the source or sequential insertion. If this has to be 
  solved in Hive, find out an optimal way to do it.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 4211a0703f5b6bfd8a628b13864fac75ef4977cf 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 695d8b90cb1989805a7ff4e39a9635bbcea9c66c 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 
 864965e03a3f9d665e21e1c1b10b19dc286b842f 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 76fc290f00430dbc34dbbc1a0cef0d0eb59e6029 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  5fcaf643a0e90fc4acc21187f6d78cefdb1b691a 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25394/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]

2014-09-19 Thread Chao Sun


 On Sept. 19, 2014, 8:14 p.m., Xuefu Zhang wrote:
 

Fixed most of the issues through an offline chat with Xuefu. Thanks!


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/#review54004
---


On Sept. 18, 2014, 6:38 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25394/
 ---
 
 (Updated Sept. 18, 2014, 6:38 p.m.)
 
 
 Review request for hive, Brock Noland and Xuefu Zhang.
 
 
 Bugs: HIVE-7503
 https://issues.apache.org/jira/browse/HIVE-7503
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 For Hive's multi insert query 
 (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
 may be an MR job for each insert. When we achieve this with Spark, it would 
 be nice if all the inserts can happen concurrently.
 It seems that this functionality isn't available in Spark. To make things 
 worse, the source of the insert may be re-computed unless it's staged. Even 
 with this, the inserts will happen sequentially, making the performance 
 suffer.
  This task is to find out what it takes in Spark to enable this without 
  requiring staging of the source or sequential insertion. If this has to be 
  solved in Hive, find out an optimal way to do it.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 4211a0703f5b6bfd8a628b13864fac75ef4977cf 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 695d8b90cb1989805a7ff4e39a9635bbcea9c66c 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 
 864965e03a3f9d665e21e1c1b10b19dc286b842f 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 76fc290f00430dbc34dbbc1a0cef0d0eb59e6029 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  5fcaf643a0e90fc4acc21187f6d78cefdb1b691a 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25394/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]

2014-09-19 Thread Chao Sun


 On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote:
  Nice work.
  
  Besides the comments below, I think there are some improvements that can 
  be made, either here or in a different patch:
  
  1. If we have a module that can compile an op tree (given by its top ops) 
  into a Spark task, then we can reuse it after the original op tree is 
  broken into several trees: from each tree we compile a Spark task, and in 
  the end we hook up the parent-child relationships. The current logic is a 
  little complicated and hard to understand.
  2. Tests 
  3. Optimizations
 
 Chao Sun wrote:
  I agree. I can do these in separate follow-up patches.

Following up on the discussion with Xuefu:

4. We should create a separate context specifically for multi-insertion. This 
can be done in a separate JIRA.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/#review53871
---


On Sept. 18, 2014, 6:38 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25394/
 ---
 
 (Updated Sept. 18, 2014, 6:38 p.m.)
 
 
 Review request for hive, Brock Noland and Xuefu Zhang.
 
 
 Bugs: HIVE-7503
 https://issues.apache.org/jira/browse/HIVE-7503
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 For Hive's multi insert query 
 (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
 may be an MR job for each insert. When we achieve this with Spark, it would 
 be nice if all the inserts can happen concurrently.
 It seems that this functionality isn't available in Spark. To make things 
 worse, the source of the insert may be re-computed unless it's staged. Even 
 with this, the inserts will happen sequentially, making the performance 
 suffer.
  This task is to find out what it takes in Spark to enable this without 
  requiring staging of the source or sequential insertion. If this has to be 
  solved in Hive, find out an optimal way to do it.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 4211a0703f5b6bfd8a628b13864fac75ef4977cf 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 695d8b90cb1989805a7ff4e39a9635bbcea9c66c 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 
 864965e03a3f9d665e21e1c1b10b19dc286b842f 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 76fc290f00430dbc34dbbc1a0cef0d0eb59e6029 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  5fcaf643a0e90fc4acc21187f6d78cefdb1b691a 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25394/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]

2014-09-19 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/
---

(Updated Sept. 20, 2014, 12:04 a.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Changes
---

Made some changes according to Xuefu's suggestions, and also added more 
comments.


Bugs: HIVE-7503
https://issues.apache.org/jira/browse/HIVE-7503


Repository: hive-git


Description
---

For Hive's multi insert query 
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
may be an MR job for each insert. When we achieve this with Spark, it would be 
nice if all the inserts can happen concurrently.
It seems that this functionality isn't available in Spark. To make things 
worse, the source of the insert may be re-computed unless it's staged. Even 
with this, the inserts will happen sequentially, making the performance suffer.
This task is to find out what it takes in Spark to enable this without 
requiring staging of the source or sequential insertion. If this has to be 
solved in Hive, find out an optimal way to do it.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
4211a07 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
5fcaf64 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/25394/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]

2014-09-19 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/
---

(Updated Sept. 20, 2014, 1:33 a.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Bugs: HIVE-7503
https://issues.apache.org/jira/browse/HIVE-7503


Repository: hive-git


Description
---

For Hive's multi insert query 
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
may be an MR job for each insert. When we achieve this with Spark, it would be 
nice if all the inserts can happen concurrently.
It seems that this functionality isn't available in Spark. To make things 
worse, the source of the insert may be re-computed unless it's staged. Even 
with this, the inserts will happen sequentially, making the performance suffer.
This task is to find out what it takes in Spark to enable this without 
requiring staging of the source or sequential insertion. If this has to be 
solved in Hive, find out an optimal way to do it.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
4211a07 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
5fcaf64 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
PRE-CREATION 
  ql/src/test/results/clientpositive/spark/insert1.q.out 49fb1d4 
  ql/src/test/results/clientpositive/spark/union18.q.out 9a40807 
  ql/src/test/results/clientpositive/spark/union19.q.out 131591f 
  ql/src/test/results/clientpositive/spark/union_remove_6.q.out 1bc55f4 

Diff: https://reviews.apache.org/r/25394/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]

2014-09-19 Thread Chao Sun


 On Sept. 20, 2014, 1:03 a.m., Brock Noland wrote:
  Awesome work! I have a few minor comments that can be addressed in a 
  *follow-on* patch.

Thanks Brock for the comments! I've attached the updated patch.


 On Sept. 20, 2014, 1:03 a.m., Brock Noland wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java, 
  line 92
  https://reviews.apache.org/r/25394/diff/4/?file=698349#file698349line92
 
   It sounds like we'll be creating a multi-insert-specific context? In that 
   context, can we make all the members private?

Yes, I'll do that in the following patch.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/#review54065
---


On Sept. 20, 2014, 1:33 a.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25394/
 ---
 
 (Updated Sept. 20, 2014, 1:33 a.m.)
 
 
 Review request for hive, Brock Noland and Xuefu Zhang.
 
 
 Bugs: HIVE-7503
 https://issues.apache.org/jira/browse/HIVE-7503
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 For Hive's multi insert query 
 (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
 may be an MR job for each insert. When we achieve this with Spark, it would 
 be nice if all the inserts can happen concurrently.
 It seems that this functionality isn't available in Spark. To make things 
 worse, the source of the insert may be re-computed unless it's staged. Even 
 with this, the inserts will happen sequentially, making the performance 
 suffer.
  This task is to find out what it takes in Spark to enable this without 
  requiring staging of the source or sequential insertion. If this has to be 
  solved in Hive, find out an optimal way to do it.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
 4211a07 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
 695d8b9 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e 
   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
 76fc290 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
  5fcaf64 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java
  PRE-CREATION 
   ql/src/test/results/clientpositive/spark/insert1.q.out 49fb1d4 
   ql/src/test/results/clientpositive/spark/union18.q.out 9a40807 
   ql/src/test/results/clientpositive/spark/union19.q.out 131591f 
   ql/src/test/results/clientpositive/spark/union_remove_6.q.out 1bc55f4 
 
 Diff: https://reviews.apache.org/r/25394/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]

2014-09-18 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/
---

(Updated Sept. 18, 2014, 6:38 p.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Changes
---

Mainly changed the way the multi-insertion pattern is detected.


Bugs: HIVE-7503
https://issues.apache.org/jira/browse/HIVE-7503


Repository: hive-git


Description
---

For Hive's multi insert query 
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
may be an MR job for each insert. When we achieve this with Spark, it would be 
nice if all the inserts can happen concurrently.
It seems that this functionality isn't available in Spark. To make things 
worse, the source of the insert may be re-computed unless it's staged. Even 
with this, the inserts will happen sequentially, making the performance suffer.
This task is to find out what it takes in Spark to enable this without 
requiring staging of the source or sequential insertion. If this has to be 
solved in Hive, find out an optimal way to do it.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
4211a0703f5b6bfd8a628b13864fac75ef4977cf 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
695d8b90cb1989805a7ff4e39a9635bbcea9c66c 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 
864965e03a3f9d665e21e1c1b10b19dc286b842f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
76fc290f00430dbc34dbbc1a0cef0d0eb59e6029 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
5fcaf643a0e90fc4acc21187f6d78cefdb1b691a 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/25394/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]

2014-09-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/
---

(Updated Sept. 5, 2014, 6:18 p.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Bugs: HIVE-7503
https://issues.apache.org/jira/browse/HIVE-7503


Repository: hive-git


Description
---

For Hive's multi insert query 
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
may be an MR job for each insert. When we achieve this with Spark, it would be 
nice if all the inserts can happen concurrently.
It seems that this functionality isn't available in Spark. To make things 
worse, the source of the insert may be re-computed unless it's staged. Even 
with this, the inserts will happen sequentially, making the performance suffer.
This task is to find out what it takes in Spark to enable this without 
requiring staging of the source or sequential insertion. If this has to be 
solved in Hive, find out an optimal way to do it.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
5ddc16d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
5fcaf64 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/25394/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 25280: Refactoring GraphTran to make it conform to SparkTran interface. [Spark Branch]

2014-09-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25280/
---

(Updated Sept. 5, 2014, 6:20 p.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Bugs: HIVE-7939
https://issues.apache.org/jira/browse/HIVE-7939


Repository: hive-git


Description
---

Currently, GraphTran uses its own execute method, which executes the operator 
plan in a DFS fashion and does something special for unions. The goal of this 
JIRA is to do some refactoring and make it conform to the SparkTran interface.
The initial idea is to use varargs for SparkTran::transform.
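
A minimal sketch of that varargs idea (simplified types; the real SparkTran 
in the branch is typed over Hadoop key/value RDDs and carries more state):

  import org.apache.spark.api.java.JavaPairRDD;

  // The tran interface reduced to the varargs idea: a union-like tran can
  // accept several parent RDDs, while a map/reduce tran would simply use
  // the first argument.
  interface VarargsTran<K, V> {
    JavaPairRDD<K, V> transform(JavaPairRDD<K, V>... inputs);
  }

  // Union is the case that motivates varargs: it folds all parents into one.
  class UnionTranSketch<K, V> implements VarargsTran<K, V> {
    @SafeVarargs
    @Override
    public final JavaPairRDD<K, V> transform(JavaPairRDD<K, V>... inputs) {
      JavaPairRDD<K, V> result = inputs[0];
      for (int i = 1; i < inputs.length; i++) {
        result = result.union(inputs[i]);
      }
      return result;
    }
  }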


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GraphTran.java 5d4414a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java b03a51c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 76b74e7 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 46e4b6d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
9b11fe4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 19894b0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/UnionTran.java 5ec7d0f 
  ql/src/test/results/clientpositive/spark/union17.q.out.sorted PRE-CREATION 
  ql/src/test/results/clientpositive/spark/union20.q.out.sorted PRE-CREATION 
  ql/src/test/results/clientpositive/spark/union21.q.out.sorted PRE-CREATION 
  ql/src/test/results/clientpositive/spark/union27.q.out.sorted PRE-CREATION 

Diff: https://reviews.apache.org/r/25280/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]

2014-09-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25394/
---

(Updated Sept. 5, 2014, 8:35 p.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Bugs: HIVE-7503
https://issues.apache.org/jira/browse/HIVE-7503


Repository: hive-git


Description
---

For Hive's multi-insert query 
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
may be an MR job for each insert. When we achieve this with Spark, it would be 
nice if all the inserts could happen concurrently.
It seems that this functionality isn't available in Spark. To make things 
worse, the source of the insert may be re-computed unless it's staged, and 
even with staging, the inserts happen sequentially, hurting performance.
This task is to find out what it takes in Spark to enable this without 
requiring staging of the source or sequential insertion. If this has to be 
solved in Hive, find an optimal way to do it.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
5ddc16d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/25394/diff/


Testing
---


Thanks,

Chao Sun



Review Request 25404: NPE while reading null decimal value

2014-09-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25404/
---

Review request for hive, Brock Noland and Xuefu Zhang.


Repository: hive-git


Description
---

Say you have this table dec_test:
dec decimal(10,0)   
If the table has a row that is 99.5, and if we do
select * from dec_test;
it will crash with NPE:
2014-09-05 14:08:56,023 ERROR [main]: CliDriver 
(SessionState.java:printError(545)) - Failed with exception 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1531)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
  at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
  at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
  at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:544)
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:536)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
  ... 12 more
Caused by: java.lang.NullPointerException
  at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:265)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
  at 
org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
  at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:70)
  at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:39)
  at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87)
  ... 19 more
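
Reading the trace, the null comes out of the decimal path before 
serialization: HiveDecimal.create may return null when a value cannot be 
represented, and the serializer then dereferences it. A minimal sketch of the 
defensive pattern (an illustration, not the committed fix):

    import java.math.BigDecimal;

    import org.apache.hadoop.hive.common.type.HiveDecimal;

    public class DecimalNullSketch {
      public static void main(String[] args) {
        HiveDecimal dec = HiveDecimal.create(new BigDecimal("99.5"));
        if (dec == null) {
          // Emit Hive's null marker instead of letting
          // LazyUtils.writePrimitiveUTF8 hit a NullPointerException.
          System.out.println("\\N");
        } else {
          System.out.println(dec.toString());
        }
      }
    }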


Diffs
-

  common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 
00ea481c2eed84de12815eedb079e965aa2ee701 

Diff: https://reviews.apache.org/r/25404/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 25404: NPE while reading null decimal value

2014-09-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25404/
---

(Updated Sept. 5, 2014, 11:06 p.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Bugs: HIVE-8008
https://issues.apache.org/jira/browse/HIVE-8008


Repository: hive-git


Description
---

Say you have this table dec_test:
dec decimal(10,0)   
If the table has a row that is 99.5, and if we do
select * from dec_test;
it will crash with NPE:
2014-09-05 14:08:56,023 ERROR [main]: CliDriver 
(SessionState.java:printError(545)) - Failed with exception 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1531)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
  at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
  at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
  at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:544)
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:536)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
  ... 12 more
Caused by: java.lang.NullPointerException
  at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:265)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
  at 
org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
  at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:70)
  at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:39)
  at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87)
  ... 19 more


Diffs
-

  common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 
00ea481c2eed84de12815eedb079e965aa2ee701 

Diff: https://reviews.apache.org/r/25404/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 25404: NPE while reading null decimal value

2014-09-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25404/
---

(Updated Sept. 5, 2014, 11:38 p.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Changes
---

Sorry for the extra blank line at the end of the test method.


Bugs: HIVE-8008
https://issues.apache.org/jira/browse/HIVE-8008


Repository: hive-git


Description
---

Say you have this table dec_test:
dec decimal(10,0)   
If the table has a row that is 99.5, and if we do
select * from dec_test;
it will crash with NPE:
2014-09-05 14:08:56,023 ERROR [main]: CliDriver 
(SessionState.java:printError(545)) - Failed with exception 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1531)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
  at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
  at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
  at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:544)
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:536)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
  ... 12 more
Caused by: java.lang.NullPointerException
  at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:265)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
  at 
org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
  at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:70)
  at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:39)
  at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87)
  ... 19 more


Diffs (updated)
-

  HIVE-8008.patch2 PRE-CREATION 
  common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 
00ea481c2eed84de12815eedb079e965aa2ee701 
  common/src/test/org/apache/hadoop/hive/common/type/TestHiveDecimal.java 
769410d474fdc0ecbd63c7fe8944b2f6d23d5e5a 

Diff: https://reviews.apache.org/r/25404/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 25404: NPE while reading null decimal value

2014-09-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25404/
---

(Updated Sept. 5, 2014, 11:54 p.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Changes
---

Sorry, that last patch accidentally had a patch file in it. 


Bugs: HIVE-8008
https://issues.apache.org/jira/browse/HIVE-8008


Repository: hive-git


Description
---

Say you have this table dec_test:
dec decimal(10,0)   
If the table has a row that is 99.5, and if we do
select * from dec_test;
it will crash with NPE:
2014-09-05 14:08:56,023 ERROR [main]: CliDriver 
(SessionState.java:printError(545)) - Failed with exception 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1531)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
  at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
  at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
  at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:544)
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:536)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
  ... 12 more
Caused by: java.lang.NullPointerException
  at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:265)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
  at 
org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
  at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:70)
  at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:39)
  at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87)
  ... 19 more


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 
00ea481c2eed84de12815eedb079e965aa2ee701 
  common/src/test/org/apache/hadoop/hive/common/type/TestHiveDecimal.java 
769410d474fdc0ecbd63c7fe8944b2f6d23d5e5a 

Diff: https://reviews.apache.org/r/25404/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 25404: NPE while reading null decimal value

2014-09-05 Thread Chao Sun


 On Sept. 6, 2014, 12:03 a.m., Lars Francke wrote:
  Good find! Only two minor style issues. +1

Thanks! Sorry I wasn't paying enough attention to long lines - will fix those.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25404/#review52529
---


On Sept. 5, 2014, 11:54 p.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25404/
 ---
 
 (Updated Sept. 5, 2014, 11:54 p.m.)
 
 
 Review request for hive, Brock Noland and Xuefu Zhang.
 
 
 Bugs: HIVE-8008
 https://issues.apache.org/jira/browse/HIVE-8008
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Say you have this table dec_test:
 dec   decimal(10,0)   
 If the table has a row that is 99.5, and if we do
 select * from dec_test;
 it will crash with NPE:
 2014-09-05 14:08:56,023 ERROR [main]: CliDriver 
 (SessionState.java:printError(545)) - Failed with exception 
 java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
 java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151)
   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1531)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:544)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:536)
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
   ... 12 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:265)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
   at 
 org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
   at 
 org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:70)
   at 
 org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:39)
   at 
 org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87)
   ... 19 more
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 
 00ea481c2eed84de12815eedb079e965aa2ee701 
   common/src/test/org/apache/hadoop/hive/common/type/TestHiveDecimal.java 
 769410d474fdc0ecbd63c7fe8944b2f6d23d5e5a 
 
 Diff: https://reviews.apache.org/r/25404/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 25404: NPE while reading null decimal value

2014-09-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25404/
---

(Updated Sept. 6, 2014, 12:44 a.m.)


Review request for hive, Brock Noland and Xuefu Zhang.


Changes
---

Fixing style issue: wrapping long lines.


Bugs: HIVE-8008
https://issues.apache.org/jira/browse/HIVE-8008


Repository: hive-git


Description
---

Say you have this table dec_test:
dec decimal(10,0)   
If the table has a row that is 99.5, and if we do
select * from dec_test;
it will crash with NPE:
2014-09-05 14:08:56,023 ERROR [main]: CliDriver 
(SessionState.java:printError(545)) - Failed with exception 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1531)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
  at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
  at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
  at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:544)
  at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:536)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
  ... 12 more
Caused by: java.lang.NullPointerException
  at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:265)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
  at 
org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71)
  at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
  at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:70)
  at 
org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:39)
  at 
org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87)
  ... 19 more


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 
00ea481c2eed84de12815eedb079e965aa2ee701 
  common/src/test/org/apache/hadoop/hive/common/type/TestHiveDecimal.java 
769410d474fdc0ecbd63c7fe8944b2f6d23d5e5a 

Diff: https://reviews.apache.org/r/25404/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 24297: Spark Explain should give useful information on dependencies

2014-08-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24297/
---

(Updated Aug. 5, 2014, 6:09 a.m.)


Review request for hive and Brock Noland.


Bugs: HIVE-7607
https://issues.apache.org/jira/browse/HIVE-7607


Repository: hive-git


Description
---

Currently, when using Explain under Spark mode, it displays dependency 
information like this:

 STAGE PLANS:
   Stage: Stage-1
     Spark
       Edges:
         Reducer 2 [org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@29a09c49,
                    org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@6f7491f8]
       DagName: chao_20140804145151_acc57d5a-27fa-44c0-aabc-052b318ed832:2

I think it should be improved by giving more information on the dependencies, 
such as work information and edge type.
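
For comparison, a Tez-style rendering of the same plan would spell out the 
parent works and edge types, roughly like this (illustrative output only; the 
edge labels here are placeholders):

     STAGE PLANS:
       Stage: Stage-1
         Spark
           Edges:
             Reducer 2 <- Map 1 (SORT), Map 3 (SORT)
           DagName: chao_20140804145151_acc57d5a-27fa-44c0-aabc-052b318ed832:2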


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java e238ff1 

Diff: https://reviews.apache.org/r/24297/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 24297: Spark Explain should give useful information on dependencies

2014-08-05 Thread Chao Sun


 On Aug. 5, 2014, 6:15 a.m., Brock Noland wrote:
  Hi Chao,
  
  Can you share what the output looks like with the patch?
  
  Hive has thousands of .q file tests 
  (https://github.com/apache/hive/tree/trunk/ql/src/test/queries/clientpositive)
   and most of them do an EXPLAIN. Thus I think this change might modify 
  quite a few .q file tests. In which case it might be better to do a smaller 
  change which only impacts Spark.

I was worried about that too - after a little grep I found it might affect 
quite a few places in MapWork. Perhaps we'll have to do something like Tez 
does. Sorry, I'll make the change.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24297/#review49570
---


On Aug. 5, 2014, 6:09 a.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24297/
 ---
 
 (Updated Aug. 5, 2014, 6:09 a.m.)
 
 
 Review request for hive and Brock Noland.
 
 
 Bugs: HIVE-7607
 https://issues.apache.org/jira/browse/HIVE-7607
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Currently, when using Explain under Spark mode, it displays dependency 
 information like this:
 
  STAGE PLANS:
    Stage: Stage-1
      Spark
        Edges:
          Reducer 2 [org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@29a09c49,
                     org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@6f7491f8]
        DagName: chao_20140804145151_acc57d5a-27fa-44c0-aabc-052b318ed832:2
 
 I think it should be improved by giving more information on the dependencies, 
 such as work information and edge type.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java e238ff1 
 
 Diff: https://reviews.apache.org/r/24297/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 24297: Spark Explain should give useful information on dependencies

2014-08-05 Thread Chao Sun


 On Aug. 5, 2014, 6:15 a.m., Brock Noland wrote:
  Hi Chao,
  
  Can you share what the output looks like with the patch?
  
  Hive has thousands of .q file tests 
  (https://github.com/apache/hive/tree/trunk/ql/src/test/queries/clientpositive)
   and most of them do an EXPLAIN. Thus I think this change might modify 
  quite a few .q file tests. In which case it might be better to do a smaller 
  change which only impacts Spark.
 
 Chao Sun wrote:
 I was worried about that too - after a little grep I found it might affect 
 quite a few places in MapWork. Perhaps we'll have to do something like Tez 
 does. Sorry, I'll make the change.

Hi Brock,

I've updated the patch. It is basically the same as what Tez does. Another 
option would be to override toString() in SparkWork, without modifying 
ExplainWork.
Please let me know which one you think is better. Thanks.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24297/#review49570
---


On Aug. 5, 2014, 6:09 a.m., Chao Sun wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24297/
 ---
 
 (Updated Aug. 5, 2014, 6:09 a.m.)
 
 
 Review request for hive and Brock Noland.
 
 
 Bugs: HIVE-7607
 https://issues.apache.org/jira/browse/HIVE-7607
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Currently, when using Explain under Spark mode, it displays dependency 
 information like this:
 
  STAGE PLANS:
    Stage: Stage-1
      Spark
        Edges:
          Reducer 2 [org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@29a09c49,
                     org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@6f7491f8]
        DagName: chao_20140804145151_acc57d5a-27fa-44c0-aabc-052b318ed832:2
 
 I think it should be improved by giving more information on the dependencies, 
 such as work information and edge type.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java e238ff1 
 
 Diff: https://reviews.apache.org/r/24297/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chao Sun
 




Re: Review Request 24297: Spark Explain should give useful information on dependencies

2014-08-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24297/
---

(Updated Aug. 5, 2014, 5:43 p.m.)


Review request for hive and Brock Noland.


Bugs: HIVE-7607
https://issues.apache.org/jira/browse/HIVE-7607


Repository: hive-git


Description
---

Currently, when using Explain under Spark mode, it displays dependency 
information like this:

 STAGE PLANS:
   Stage: Stage-1
     Spark
       Edges:
         Reducer 2 [org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@29a09c49,
                    org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@6f7491f8]
       DagName: chao_20140804145151_acc57d5a-27fa-44c0-aabc-052b318ed832:2

I think it should be improved by giving more information on the dependencies, 
such as work information and edge type.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java e238ff1 

Diff: https://reviews.apache.org/r/24297/diff/


Testing
---


Thanks,

Chao Sun



Review Request 24352: StarterProject: Fix exception handling in POC code

2014-08-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24352/
---

Review request for hive and Brock Noland.


Repository: hive-git


Description
---

The POC code just printed exceptions to stderr. We should either:
1) LOG at INFO/WARN/ERROR, or
2) rethrow (perhaps wrapped in a runtime exception) anything that is a fatal 
error. A sketch of both options is below.
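
A minimal sketch of the two options (the names here are hypothetical; Hive 
uses commons-logging):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class ExceptionHandlingSketch {
      private static final Log LOG =
          LogFactory.getLog(ExceptionHandlingSketch.class);

      // Hypothetical stand-ins for the POC's failure modes and work.
      static class RecoverableException extends Exception {}
      static void doWork() throws Exception {}

      void run() {
        try {
          doWork();
        } catch (RecoverableException e) {
          // Option 1: log at an appropriate level rather than
          // printing the stack trace to stderr.
          LOG.warn("recoverable failure, continuing", e);
        } catch (Exception e) {
          // Option 2: anything fatal is rethrown, wrapped in a
          // runtime exception so callers cannot silently ignore it.
          throw new RuntimeException(e);
        }
      }
    }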


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 20a1938 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 358cbc7 

Diff: https://reviews.apache.org/r/24352/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 24352: StarterProject: Fix exception handling in POC code

2014-08-05 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24352/
---

(Updated Aug. 6, 2014, 4:53 a.m.)


Review request for hive and Brock Noland.


Changes
---

Hi Brock,

Thanks for the comments! I've addressed these issues and updated the patch. 
Please take a look.


Bugs: HIVE-7560
https://issues.apache.org/jira/browse/HIVE-7560


Repository: hive-git


Description
---

The POC code just printed exceptions to stderr. We should either:
1) LOG at INFO/WARN/ERROR, or
2) rethrow (perhaps wrapped in a runtime exception) anything that is a fatal 
error


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 20a1938 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 358cbc7 

Diff: https://reviews.apache.org/r/24352/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 24195: StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark

2014-08-04 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24195/
---

(Updated Aug. 4, 2014, 9:50 p.m.)


Review request for hive.


Changes
---

Hi Brock,

Thanks for the suggestions! I've updated the diff. Please have a look. :)


Bugs: HIVE-7561
https://issues.apache.org/jira/browse/HIVE-7561


Repository: hive-git


Description
---

Hive uses the assert keyword all over the place. The problem is that 
assertions rarely take effect, since they have to be explicitly enabled with 
the -ea JVM flag. In the Spark code, e.g. GenSparkUtils, let's use 
Preconditions.*.
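
A minimal sketch of the replacement pattern ('work' and 'children' are 
hypothetical stand-ins for the kinds of values checked in GenSparkUtils and 
friends):

    import java.util.List;

    import com.google.common.base.Preconditions;

    final class PreconditionsSketch {
      static void validate(Object work, List<?> children) {
        // Before: assert work != null;  (skipped unless the JVM runs with -ea)
        Preconditions.checkNotNull(work, "work cannot be null");
        // Before: assert children.size() == 1;
        Preconditions.checkArgument(children.size() == 1,
            "expected exactly one child, found %s", children.size());
      }
    }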


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java
 8c58333 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 25eea14 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ceb7b6c 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
3a0f4c9 
  ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 86d14f1 

Diff: https://reviews.apache.org/r/24195/diff/


Testing
---


Thanks,

Chao Sun



Review Request 24195: StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark

2014-08-01 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24195/
---

Review request for hive.


Repository: hive-git


Description
---

Hive uses the assert keyword all over the place. The problem is that 
assertions rarely take effect, since they have to be explicitly enabled with 
the -ea JVM flag. In the Spark code, e.g. GenSparkUtils, let's use 
Preconditions.*.


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java
 8c58333 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 25eea14 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ceb7b6c 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
3a0f4c9 
  ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 86d14f1 

Diff: https://reviews.apache.org/r/24195/diff/


Testing
---


Thanks,

Chao Sun



Re: Review Request 24195: StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark

2014-08-01 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24195/
---

(Updated Aug. 1, 2014, 11:45 p.m.)


Review request for hive.


Bugs: HIVE-7561
https://issues.apache.org/jira/browse/HIVE-7561


Repository: hive-git


Description
---

Hive uses the assert keyword all over the place. The problem is that 
assertions rarely take effect, since they have to be explicitly enabled with 
the -ea JVM flag. In the Spark code, e.g. GenSparkUtils, let's use 
Preconditions.*.


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java
 8c58333 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 25eea14 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ceb7b6c 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 
3a0f4c9 
  ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 86d14f1 

Diff: https://reviews.apache.org/r/24195/diff/


Testing
---


Thanks,

Chao Sun



Review Request 24127: Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination

2014-07-30 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24127/
---

Review request for hive.


Repository: hive-git


Description
---

An attempt to fix the last patch by moving the groupBy op to ShuffleTran.
Also, since SparkTran::transform may now have input/output value types other 
than BytesWritable, we need to make it generic as well.
Also added a CompTran class, which is basically a composition of 
transformations; it offers better type compatibility than ChainedTran.
This is NOT the perfect solution and may be subject to further change.
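
A minimal, hypothetical sketch of what type-safe composition buys here: the 
intermediate type is checked by the compiler instead of being fixed to a 
single key/value type or cast at runtime.

    interface Tran<I, O> {
      O transform(I input);
    }

    final class CompTranSketch {
      // Compose two transformations; the intermediate type M ties the
      // first link's output to the second link's input at compile time.
      static <I, M, O> Tran<I, O> compose(final Tran<I, M> first,
                                          final Tran<M, O> second) {
        return new Tran<I, O>() {
          @Override
          public O transform(I input) {
            return second.transform(first.transform(input));
          }
        };
      }
    }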


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ChainedTran.java 4991568 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CompTran.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 01a70e9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
841db87 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 98d08e6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java d1af86d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java 33e7d45 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java cf85af1 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
440dd93 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 6aa732f 

Diff: https://reviews.apache.org/r/24127/diff/


Testing
---


Thanks,

Chao Sun



Review Request 23530: HIVE-6560: varchar and char types cannot be cast to binary

2014-07-15 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23530/
---

Review request for hive.


Bugs: HIVE-6560
https://issues.apache.org/jira/browse/HIVE-6560


Repository: hive-git


Description
---

HIVE-6560: varchar and char types cannot be cast to binary


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToBinary.java 
b31b81b 
  ql/src/test/queries/clientpositive/udf_binary.q PRE-CREATION 
  ql/src/test/results/clientpositive/udf_binary.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/23530/diff/


Testing
---

N/A


Thanks,

Chao Sun


