Review Request 26296: HIVE-8331 - HIVE-8303 followup, investigate result diff [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26296/
---

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8331
    https://issues.apache.org/jira/browse/HIVE-8331

Repository: hive-git

Description
---
The HIVE-8303 patch introduced result diffs in some Spark tests. We need to investigate those, including parallel_join0.q, union22.q, vectorized_shufflejoin.q, union_remove_18.q, and possibly more. The investigation also covers the Spark-related test failures. In particular, union_remove_18.q shows random output order.

Diffs
---
  ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java fed6ccd
  ql/src/test/results/clientpositive/spark/union_remove_18.q.out 60ab60b

Diff: https://reviews.apache.org/r/26296/diff/

Testing
---

Thanks,
Chao Sun
Review Request 26181: HIVE-8262 - Create CacheTran that transforms the input RDD by caching it [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26181/
---

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8262
    https://issues.apache.org/jira/browse/HIVE-8262

Repository: hive-git

Description
---
In a few cases we need to cache an RDD to avoid recomputing it, for better performance. However, caching a map-input RDD is different from caching a regular RDD, due to SPARK-3693. To cache a Hadoop RDD (the input to a MapWork), we cache not the original Hadoop RDD itself but the result RDD obtained by applying a map function to it in which the key/value pairs are copied. To cache an intermediate RDD, such as the output of a shuffle, we can simply call .cache(). This task is to create a CacheTran that captures both cases and can be plugged into a SparkPlan when caching is desirable.

Diffs
---
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CachedTran.java PRE-CREATION

Diff: https://reviews.apache.org/r/26181/diff/

Testing
---

Thanks,
Chao Sun
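SPARK-3693 concerns Hadoop input RDDs handing out the same reused Writable objects for every record, so caching the raw RDD would cache many references to one mutated object. A minimal plain-Java sketch of the pitfall (no Spark dependency; `reusingReader` and the `int[]` holder are hypothetical stand-ins for a Hadoop RecordReader and its reused key/value Writables, and `cacheWithCopy` mimics the map-side copy the description calls for):

```java
import java.util.*;

public class CopyBeforeCache {
    // Simulates a record reader that, like a Hadoop RDD iterator,
    // reuses one mutable holder for every record it returns.
    static Iterator<int[]> reusingReader(int n) {
        int[] holder = new int[1];          // the single reused object
        return new Iterator<int[]>() {
            int i = 0;
            public boolean hasNext() { return i < n; }
            public int[] next() { holder[0] = i++; return holder; }
        };
    }

    // "Caching" by collecting references: every cached element aliases
    // the same holder, so all entries end up with the last value.
    static List<int[]> cacheWithoutCopy(Iterator<int[]> it) {
        List<int[]> cached = new ArrayList<>();
        while (it.hasNext()) cached.add(it.next());
        return cached;
    }

    // The fix sketched in the description: copy each record (here via
    // clone) before caching it.
    static List<int[]> cacheWithCopy(Iterator<int[]> it) {
        List<int[]> cached = new ArrayList<>();
        while (it.hasNext()) cached.add(it.next().clone());
        return cached;
    }

    public static void main(String[] args) {
        List<int[]> bad = cacheWithoutCopy(reusingReader(3));
        List<int[]> good = cacheWithCopy(reusingReader(3));
        // Without the copy, every cached record shows the last value.
        System.out.println(bad.get(0)[0] + " " + bad.get(1)[0]);   // 2 2
        System.out.println(good.get(0)[0] + " " + good.get(1)[0]); // 0 1
    }
}
```

Intermediate RDDs such as shuffle outputs contain freshly materialized objects, which is presumably why a plain .cache() suffices there.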
Review Request 26211: HIVE-8314 - Restore thrift string interning of HIVE-7975
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26211/
---

Review request for hive and Szehon Ho.

Bugs: HIVE-8314
    https://issues.apache.org/jira/browse/HIVE-8314

Repository: hive-git

Description
---
HIVE-7975 added string interning to the thrift-generated code by having a google-replacer plugin, run with -Pthriftif, perform the replacements. When HIVE-7482 was committed, the code was regenerated without that plugin, so the interning was lost. The thrift code should be regenerated.

Diffs
---
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FieldSchema.java a993810
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java 312807e
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SerDeInfo.java 24d65bb
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java d0b9843

Diff: https://reviews.apache.org/r/26211/diff/

Testing
---

Thanks,
Chao Sun
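As a rough illustration of what the interning replacement buys (this `InternDemo` class is hypothetical, not the generated code; the real change rewrites the assignments in generated classes such as FieldSchema so that repeated metadata strings share one instance):

```java
public class InternDemo {
    private String type;

    // Mimics a thrift-generated setter after the interning replacement:
    // store s.intern() instead of s, so equal values share one instance.
    public void setType(String s) {
        this.type = (s == null) ? null : s.intern();
    }

    public String getType() { return type; }

    public static void main(String[] args) {
        InternDemo a = new InternDemo();
        InternDemo b = new InternDemo();
        // Two distinct String objects with equal contents, as would come
        // from deserializing many FieldSchema/Partition objects.
        a.setType(new String("string"));
        b.setType(new String("string"));
        // After interning, both fields reference the same instance, so
        // metadata for thousands of partitions shares one copy per value.
        System.out.println(a.getType() == b.getType()); // true
    }
}
```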
Re: Review Request 26120: HIVE-8278 - Restoring a graph representation of SparkPlan [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26120/
---

(Updated Sept. 29, 2014, 8:16 p.m.)

Review request for hive and Xuefu Zhang.

Changes
---
Updated according to Xuefu's suggestions.

Bugs: HIVE-8278
    https://issues.apache.org/jira/browse/HIVE-8278

Repository: hive-git

Description
---
HIVE-8249 greatly simplified the SparkPlan model and the SparkPlanGenerator logic. As a side effect, however, the visual representation of a SparkPlan was lost. Such a representation is helpful for debugging and performance profiling. In addition, it would also be good to separate plan generation from plan execution.

Diffs (updated)
---
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java f8b3283
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 15af0f9

Diff: https://reviews.apache.org/r/26120/diff/

Testing
---

Thanks,
Chao Sun
Re: Review Request 26120: HIVE-8278 - Restoring a graph representation of SparkPlan [Spark Branch]
On Sept. 29, 2014, 11:34 p.m., Xuefu Zhang wrote:
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java, line 34
> https://reviews.apache.org/r/26120/diff/2/?file=708344#file708344line34
>
> Do we need rootTrans?

I thought rootTrans was just the same as the key set of mapInputs, so I removed it. Correct me if I'm wrong.

On Sept. 29, 2014, 11:34 p.m., Xuefu Zhang wrote:
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java, line 72
> https://reviews.apache.org/r/26120/diff/2/?file=708344#file708344line72
>
> So we are not doing the union of those any more?

I forgot ... now fixed.

- Chao

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26120/#review54904
---

On Sept. 29, 2014, 8:16 p.m., Chao Sun wrote:
> ---
> This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26120/
> ---
>
> (Updated Sept. 29, 2014, 8:16 p.m.)
>
> Review request for hive and Xuefu Zhang.
>
> Bugs: HIVE-8278
>     https://issues.apache.org/jira/browse/HIVE-8278
>
> Repository: hive-git
>
> Description
> ---
> HIVE-8249 greatly simplified the SparkPlan model and the SparkPlanGenerator logic. As a side effect, however, the visual representation of a SparkPlan was lost. Such a representation is helpful for debugging and performance profiling. In addition, it would also be good to separate plan generation from plan execution.
>
> Diffs
> ---
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java f8b3283
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 15af0f9
>
> Diff: https://reviews.apache.org/r/26120/diff/
>
> Testing
> ---
>
> Thanks,
> Chao Sun
Re: Review Request 26120: HIVE-8278 - Restoring a graph representation of SparkPlan [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26120/
---

(Updated Sept. 30, 2014, 12:06 a.m.)

Review request for hive and Xuefu Zhang.

Changes
---
Thanks Xuefu for the comments. Here is an updated patch.

Bugs: HIVE-8278
    https://issues.apache.org/jira/browse/HIVE-8278

Repository: hive-git

Description
---
HIVE-8249 greatly simplified the SparkPlan model and the SparkPlanGenerator logic. As a side effect, however, the visual representation of a SparkPlan was lost. Such a representation is helpful for debugging and performance profiling. In addition, it would also be good to separate plan generation from plan execution.

Diffs (updated)
---
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java f8b3283
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 15af0f9

Diff: https://reviews.apache.org/r/26120/diff/

Testing
---

Thanks,
Chao Sun
Review Request 26120: HIVE-8278 - Restoring a graph representation of SparkPlan [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26120/
---

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8278
    https://issues.apache.org/jira/browse/HIVE-8278

Repository: hive-git

Description
---
HIVE-8249 greatly simplified the SparkPlan model and the SparkPlanGenerator logic. As a side effect, however, the visual representation of a SparkPlan was lost. Such a representation is helpful for debugging and performance profiling. In addition, it would also be good to separate plan generation from plan execution.

Diffs
---
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java f8b3283
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 15af0f9

Diff: https://reviews.apache.org/r/26120/diff/

Testing
---

Thanks,
Chao Sun
Review Request 26047: HIVE-8256 - Add SORT_QUERY_RESULTS for test that doesn't guarantee order #2
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26047/
---

Review request for hive, Brock Noland and Xuefu Zhang.

Bugs: HIVE-8256
    https://issues.apache.org/jira/browse/HIVE-8256

Repository: hive-git

Description
---
Following HIVE-8035, we need to add SORT_QUERY_RESULTS to a few more tests that don't guarantee output order.

Diffs
---
  ql/src/test/queries/clientpositive/groupby7.q 1235e3c
  ql/src/test/queries/clientpositive/groupby_complex_types.q bb1e6d2
  ql/src/test/queries/clientpositive/table_access_keys_stats.q 23209d8
  ql/src/test/results/clientpositive/groupby7.q.out ee0153a
  ql/src/test/results/clientpositive/groupby_complex_types.q.out 1697dd9
  ql/src/test/results/clientpositive/table_access_keys_stats.q.out adea0f6

Diff: https://reviews.apache.org/r/26047/diff/

Testing
---

Thanks,
Chao Sun
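The SORT_QUERY_RESULTS directive makes the qtest harness sort result rows before diffing them against the golden file. A simplified sketch of why that makes tests with unordered output deterministic (`sameResults` is a hypothetical helper, not the harness API):

```java
import java.util.*;

public class SortedDiff {
    // Order-insensitive comparison: sort both result sets before
    // diffing, as SORT_QUERY_RESULTS effectively does for a qtest.
    static boolean sameResults(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("1\tval_1", "2\tval_2");
        // A GROUP BY without ORDER BY may emit the same rows in any
        // order, and different engines (MR, Tez, Spark) often do.
        List<String> actual = Arrays.asList("2\tval_2", "1\tval_1");
        System.out.println(sameResults(expected, actual)); // true
    }
}
```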
Re: Review Request 26001: HIVE-8233 - multi-table insertion doesn't work with ForwardOperator [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26001/
---

(Updated Sept. 25, 2014, 6:42 p.m.)

Review request for hive, Brock Noland and Xuefu Zhang.

Changes
---
I forgot that I needed to change groupby11.q as well.

Bugs: HIVE-8233
    https://issues.apache.org/jira/browse/HIVE-8233

Repository: hive-git

Description
---
Right now, for multi-table insertion, we start from the multiple FileSinkOperators and break the plan at their lowest common ancestor (LCA), adding a temporary FileSinkOperator and TableScanOperators. A special case is when the LCA is a ForwardOperator, in which case we don't break the plan, since it has already been optimized. However, there is an issue. Consider the following plan:

       ...
      RS_0
       |
      FOR
     /    \
  GBY_1   GBY_2
    |       |
   ...     ...
    |       |
  RS_1    RS_2
    |       |
   ...     ...
    |       |
  FS_1    FS_2

which may result in:

     RW
    /  \
   RW   RW

Because of the issues in HIVE-7731 and HIVE-8118, both downstream branches will then get duplicated (and identical) input.

Diffs (updated)
---
  itests/src/test/resources/testconfiguration.properties cd83998
  ql/src/test/queries/clientpositive/spark_groupby11.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby7_noskew_multi_single_reducer.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8_map.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8_map_skew.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8_noskew.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby9.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby_multi_insert_common_distinct.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_union17.q PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby11.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby7_noskew_multi_single_reducer.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8_map.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8_map_skew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8_noskew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby9.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby_multi_insert_common_distinct.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_union17.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/26001/diff/

Testing
---

Thanks,
Chao Sun
Review Request 26001: HIVE-8233 - multi-table insertion doesn't work with ForwardOperator [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26001/
---

Review request for hive, Brock Noland and Xuefu Zhang.

Bugs: HIVE-8233
    https://issues.apache.org/jira/browse/HIVE-8233

Repository: hive-git

Description
---
Right now, for multi-table insertion, we start from the multiple FileSinkOperators and break the plan at their lowest common ancestor (LCA), adding a temporary FileSinkOperator and TableScanOperators. A special case is when the LCA is a ForwardOperator, in which case we don't break the plan, since it has already been optimized. However, there is an issue. Consider the following plan:

       ...
      RS_0
       |
      FOR
     /    \
  GBY_1   GBY_2
    |       |
   ...     ...
    |       |
  RS_1    RS_2
    |       |
   ...     ...
    |       |
  FS_1    FS_2

which may result in:

     RW
    /  \
   RW   RW

Because of the issues in HIVE-7731 and HIVE-8118, both downstream branches will then get duplicated (and identical) input.

Diffs
---
  itests/src/test/resources/testconfiguration.properties 637fbc1
  ql/src/test/queries/clientpositive/spark_groupby7_noskew_multi_single_reducer.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8_map.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8_map_skew.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8_noskew.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby9.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby_multi_insert_common_distinct.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_union17.q PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby7_noskew_multi_single_reducer.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8_map.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8_map_skew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8_noskew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby9.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby_multi_insert_common_distinct.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_union17.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/26001/diff/

Testing
---

Thanks,
Chao Sun
Re: Review Request 26001: HIVE-8233 - multi-table insertion doesn't work with ForwardOperator [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26001/
---

(Updated Sept. 24, 2014, 9:04 p.m.)

Review request for hive, Brock Noland and Xuefu Zhang.

Changes
---
Made these qfiles spark-only tests.

Bugs: HIVE-8233
    https://issues.apache.org/jira/browse/HIVE-8233

Repository: hive-git

Description
---
Right now, for multi-table insertion, we start from the multiple FileSinkOperators and break the plan at their lowest common ancestor (LCA), adding a temporary FileSinkOperator and TableScanOperators. A special case is when the LCA is a ForwardOperator, in which case we don't break the plan, since it has already been optimized. However, there is an issue. Consider the following plan:

       ...
      RS_0
       |
      FOR
     /    \
  GBY_1   GBY_2
    |       |
   ...     ...
    |       |
  RS_1    RS_2
    |       |
   ...     ...
    |       |
  FS_1    FS_2

which may result in:

     RW
    /  \
   RW   RW

Because of the issues in HIVE-7731 and HIVE-8118, both downstream branches will then get duplicated (and identical) input.

Diffs (updated)
---
  itests/src/test/resources/testconfiguration.properties 637fbc1
  ql/src/test/queries/clientpositive/spark_groupby7_noskew_multi_single_reducer.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8_map.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8_map_skew.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby8_noskew.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby9.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_groupby_multi_insert_common_distinct.q PRE-CREATION
  ql/src/test/queries/clientpositive/spark_union17.q PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby7_noskew_multi_single_reducer.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8_map.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8_map_skew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby8_noskew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby9.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_groupby_multi_insert_common_distinct.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_union17.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/26001/diff/

Testing
---

Thanks,
Chao Sun
Review Request 26007: HIVE-8249 - Refactoring SparkPlan and SparkPlanGenerator [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26007/
---

Review request for hive, Brock Noland and Xuefu Zhang.

Bugs: HIVE-8249
    https://issues.apache.org/jira/browse/HIVE-8249

Repository: hive-git

Description
---
Currently, the code for SparkPlanGenerator seems a little bit messy, and the logic is not quite clear. This JIRA is created to refactor it and related classes.

Diffs
---
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GraphTran.java acd42be
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 46e4b6d
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 7ab2ca0
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/UnionTran.java 546b448

Diff: https://reviews.apache.org/r/26007/diff/

Testing
---

Thanks,
Chao Sun
Re: Review Request 26007: HIVE-8249 - Refactoring SparkPlan and SparkPlanGenerator [Spark Branch]
On Sept. 24, 2014, 11:51 p.m., Xuefu Zhang wrote:

OK, I'll change it in the next patch, once the test result is out.

- Chao

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26007/#review54481
---

On Sept. 24, 2014, 10:50 p.m., Chao Sun wrote:
> ---
> This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26007/
> ---
>
> (Updated Sept. 24, 2014, 10:50 p.m.)
>
> Review request for hive, Brock Noland and Xuefu Zhang.
>
> Bugs: HIVE-8249
>     https://issues.apache.org/jira/browse/HIVE-8249
>
> Repository: hive-git
>
> Description
> ---
> Currently, the code for SparkPlanGenerator seems a little bit messy, and the logic is not quite clear. This JIRA is created to refactor it and related classes.
>
> Diffs
> ---
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GraphTran.java acd42be
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 46e4b6d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 7ab2ca0
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/UnionTran.java 546b448
>
> Diff: https://reviews.apache.org/r/26007/diff/
>
> Testing
> ---
>
> Thanks,
> Chao Sun
Re: Review Request 26007: HIVE-8249 - Refactoring SparkPlan and SparkPlanGenerator [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26007/
---

(Updated Sept. 25, 2014, 1:02 a.m.)

Review request for hive, Brock Noland and Xuefu Zhang.

Changes
---
Changed UnionTran to IdentityTran. Thanks Xuefu for the suggestion!

Bugs: HIVE-8249
    https://issues.apache.org/jira/browse/HIVE-8249

Repository: hive-git

Description
---
Currently, the code for SparkPlanGenerator seems a little bit messy, and the logic is not quite clear. This JIRA is created to refactor it and related classes.

Diffs (updated)
---
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GraphTran.java acd42be
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 46e4b6d
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 7ab2ca0
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/UnionTran.java 546b448

Diff: https://reviews.apache.org/r/26007/diff/

Testing
---

Thanks,
Chao Sun
Review Request 25943: HIVE-8207 - Add .q tests for multi-table insertion [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25943/
---

Review request for hive, Brock Noland and Xuefu Zhang.

Bugs: HIVE-8207
    https://issues.apache.org/jira/browse/HIVE-8207

Repository: hive-git

Description
---
Now that multi-table insertion is committed to the branch, we should enable the related qtests.

Diffs
---
  ql/src/test/queries/clientpositive/groupby10.q 7750cb9
  ql/src/test/queries/clientpositive/groupby11.q 0bf92ac
  ql/src/test/queries/clientpositive/groupby7.q 1235e3c
  ql/src/test/queries/clientpositive/groupby_complex_types.q bb1e6d2
  ql/src/test/queries/clientpositive/subquery_multiinsert.q ed36d9e
  ql/src/test/queries/clientpositive/table_access_keys_stats.q 23209d8
  ql/src/test/results/clientpositive/spark/add_part_multiple.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/column_access_stats.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/date_udf.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby11.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby3_map.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby3_map_multi_distinct.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby3_noskew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby3_noskew_multi_distinct.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby7.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby7_map_multi_single_reducer.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_complex_types.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer3.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_position.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_ppr.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/innerjoin.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input12.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input13.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input14.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input17.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input18.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input1_limit.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input_part2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/insert_into3.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/join_nullsafe.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/metadata_only_queries_with_filters.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multi_insert_gby.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multi_insert_gby2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/parallel.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/table_access_keys_stats.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/25943/diff/

Testing
---

Thanks,
Chao Sun
Re: Review Request 25943: HIVE-8207 - Add .q tests for multi-table insertion [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25943/
---

(Updated Sept. 24, 2014, 1:15 a.m.)

Review request for hive, Brock Noland and Xuefu Zhang.

Changes
---
Fixing the last patch - sorry, I shouldn't have modified qfiles in this JIRA, since that affects the results for MR/Tez (thanks to Xuefu for pointing this out). Also, I didn't change testconfiguration.properties in the last patch.

Bugs: HIVE-8207
    https://issues.apache.org/jira/browse/HIVE-8207

Repository: hive-git

Description
---
Now that multi-table insertion is committed to the branch, we should enable the related qtests.

Diffs (updated)
---
  itests/src/test/resources/testconfiguration.properties aa04c0a
  ql/src/test/results/clientpositive/spark/add_part_multiple.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/column_access_stats.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/date_udf.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby3_map.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby3_map_multi_distinct.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby3_noskew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby3_noskew_multi_distinct.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby7_map.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby7_map_multi_single_reducer.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby7_map_skew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby7_noskew.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_cube1.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_multi_single_reducer3.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_position.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_ppr.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_rollup1.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/innerjoin.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input12.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input13.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input14.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input17.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input18.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input1_limit.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/input_part2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/insert_into3.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/join_nullsafe.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/metadata_only_queries_with_filters.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multi_insert.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multi_insert_gby.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multi_insert_gby2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multi_insert_gby3.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multi_insert_lateral_view.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multi_insert_move_tasks_share_dependencies.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/multigroupby_singlemr.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/parallel.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/ppd_multi_insert.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/ppd_transform.q.out PRE-CREATION
  ql/src/test/results/clientpositive/spark/subquery_multiinsert.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/25943/diff/

Testing
---

Thanks,
Chao Sun
Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote:
> Nice work. Besides the comments below, I think there are some improvements that can be done, either here or in a different patch:
> 1. If we have a module that can compile an op tree (given by its top ops) into a Spark task, then we can reuse it after the original op tree is broken into several trees: from each tree we compile a Spark task, and in the end we hook up the parent-child relationships. The current logic is a little complicated and hard to understand.
> 2. Tests
> 3. Optimizations

I agree. I can do these in separate follow-up patches.

On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote:
> ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java, line 142
> https://reviews.apache.org/r/25394/diff/3/?file=693788#file693788line142
>
> Here we are mapping the children of the LCA to the LCA itself. Why is this necessary, as you can find the children of the LCA later without the map? Can't we just store the LCA here?

The problem is that we are only generating one FS but multiple TSs. After the FS and the first TS are generated, the child-parent relation is lost (since the op tree is modified), and hence we need to store this information somewhere else, to be used when processing the rest of the TSs.

On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote:
> ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java, line 140
> https://reviews.apache.org/r/25394/diff/3/?file=693788#file693788line140
>
> This seems to cover only the case where all FSs have a common FORWARD parent. What if only some of them share a FORWARD parent, but the other FSs and the FORWARD operator share some common parent? I think the rule for whether to break the plan goes like this: a plan needs to be broken if and only if there is more than one FileSinkOperator that can be traced back to a common parent, and the tracing has to pass a ReduceSinkOperator on the way.

In that case the LCA is not a FOR, so breaking at that point is safe (though maybe not optimal), is that right? Personally, after so many attempts, I'm a bit inclined to just do what MR does: go top-down and keep the first RS in the same SparkWork; for the remaining RSs, just break the plan.

On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote:
> ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java, line 120
> https://reviews.apache.org/r/25394/diff/3/?file=693788#file693788line120
>
> I feel that the logic here can be simplified. Could we just pop all paths and then check if the root is the same, and keep doing so until the common parent is found?

I'm not quite sure. I would happily accept a better algorithm :) (the one I'm using is just the standard algorithm for finding the LCA). The LCA could be at a different position in each path, so how do you proceed to pop all paths? Also, there could be multiple common parents, and we need to identify the lowest one.

- Chao

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/#review53871
---

On Sept. 18, 2014, 6:38 p.m., Chao Sun wrote:
> ---
> This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/
> ---
>
> (Updated Sept. 18, 2014, 6:38 p.m.)
>
> Review request for hive, Brock Noland and Xuefu Zhang.
>
> Bugs: HIVE-7503
>     https://issues.apache.org/jira/browse/HIVE-7503
>
> Repository: hive-git
>
> Description
> ---
> For Hive's multi-insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts could happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it is staged. Even with staging, the inserts will happen sequentially, making performance suffer.
>
> This task is to find out what it takes in Spark to enable this without requiring staging the source and sequential insertion. If this has to be solved in Hive, find out an optimal way to do it.
>
> Diffs
> ---
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a0703f5b6bfd8a628b13864fac75ef4977cf
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b90cb1989805a7ff4e39a9635bbcea9c66c
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e03a3f9d665e21e1c1b10b19dc286b842f
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290f00430dbc34dbbc1a0cef0d0eb59e6029
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark
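The path-based LCA algorithm Chao mentions can be sketched as follows (`LcaSketch` is a hypothetical illustration, simplified to a single-parent tree, whereas real Hive operator graphs can have multi-parent nodes): build each root-to-node path, then walk the paths in lockstep and keep the last position where all of them still agree.

```java
import java.util.*;

public class LcaSketch {
    // Build the root-to-node path by following child -> parent edges.
    static List<String> pathFromRoot(Map<String, String> parent, String node) {
        Deque<String> path = new ArrayDeque<>();
        for (String cur = node; cur != null; cur = parent.get(cur)) {
            path.addFirst(cur);
        }
        return new ArrayList<>(path);
    }

    // Standard path-based LCA: compare the paths position by position and
    // return the last node on which every path agrees.
    static String lca(Map<String, String> parent, List<String> nodes) {
        List<List<String>> paths = new ArrayList<>();
        for (String n : nodes) paths.add(pathFromRoot(parent, n));
        String lca = null;
        for (int i = 0; ; i++) {
            String candidate = null;
            for (List<String> p : paths) {
                if (i >= p.size()) return lca;       // a path ran out
                if (candidate == null) candidate = p.get(i);
                else if (!candidate.equals(p.get(i))) return lca; // diverged
            }
            lca = candidate;                          // still agreeing
        }
    }

    public static void main(String[] args) {
        // The plan from the HIVE-8233 description:
        // RS_0 -> FOR -> { GBY_1 -> RS_1 -> FS_1, GBY_2 -> RS_2 -> FS_2 }
        Map<String, String> parent = new HashMap<>();
        parent.put("FOR", "RS_0");
        parent.put("GBY_1", "FOR");  parent.put("GBY_2", "FOR");
        parent.put("RS_1", "GBY_1"); parent.put("RS_2", "GBY_2");
        parent.put("FS_1", "RS_1");  parent.put("FS_2", "RS_2");
        System.out.println(lca(parent, Arrays.asList("FS_1", "FS_2"))); // FOR
    }
}
```

This also illustrates Chao's objection to "popping all paths": the divergence point sits at a different depth in each path, so the paths have to be scanned from the root rather than popped from the leaves.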
Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java, line 142 https://reviews.apache.org/r/25394/diff/3/?file=693788#file693788line142 Here we are mapping the children of lca to lca itself. Why is this necessary, as you can find the children of lca later without the map? Can't we just store lca here? Chao Sun wrote: The problem is that we are only generating one FS but multiple TSs. After the FS and the first TS are generated, the child-parent relation is lost (since the optree is modified), and hence we need to store this information somewhere else, to be used when processing the rest of the TSs. It might be tricky to just store the LCA. When the graph walker reaches a node, it needs to check whether that node is a child of the LCA, and if so, break the plan. You could say that since we have the LCA, we have all its children's info. However, after the first child, the children of the LCA are changed, so we need to store this info somewhere, IMHO. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/#review53871 --- On Sept. 18, 2014, 6:38 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/ --- (Updated Sept. 18, 2014, 6:38 p.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7503 https://issues.apache.org/jira/browse/HIVE-7503 Repository: hive-git Description --- For Hive's multi insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts can happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even with this, the inserts will happen sequentially, making the performance suffer. 
This task is to find out what takes in Spark to enable this without requiring staging the source and sequential insertion. If this has to be solved in Hive, find out an optimum way to do this. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a0703f5b6bfd8a628b13864fac75ef4977cf ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b90cb1989805a7ff4e39a9635bbcea9c66c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e03a3f9d665e21e1c1b10b19dc286b842f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290f00430dbc34dbbc1a0cef0d0eb59e6029 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf643a0e90fc4acc21187f6d78cefdb1b691a ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION Diff: https://reviews.apache.org/r/25394/diff/ Testing --- Thanks, Chao Sun
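The design choice argued for above (recording child-to-LCA links before the op tree is mutated, rather than re-deriving the LCA's children later) can be illustrated with a small sketch. This is not Hive's actual code; the string node names and the `childToLca` helper are hypothetical, standing in for Hive operator references held in a processing context.

```java
import java.util.*;

// Illustrative only: snapshot, before the op tree is rewritten, which nodes
// are children of the LCA, so later graph walks can still detect "break the
// plan here" even after the LCA's live child list has changed.
public class LcaChildMap {
    static Map<String, String> childToLca(String lca, List<String> childrenAtSnapshotTime) {
        Map<String, String> m = new HashMap<>();
        for (String child : childrenAtSnapshotTime) {
            m.put(child, lca);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> m = childToLca("FIL", Arrays.asList("SEL1", "SEL2"));
        // Even if FIL's child list is later rewritten while processing the
        // first TS, the snapshot still tells the walker that reaching SEL2
        // means cutting the plan between FIL and SEL2.
        System.out.println(m.get("SEL2"));  // FIL
    }
}
```

The point of the map, as Chao explains, is that it is a snapshot: storing only the LCA would force the walker to consult the LCA's current children, which are no longer the original ones once the tree has been modified.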
Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
On Sept. 19, 2014, 8:14 p.m., Xuefu Zhang wrote: Fixed most of the issues through an offline chat with Xuefu. Thanks! - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/#review54004 --- On Sept. 18, 2014, 6:38 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/ --- (Updated Sept. 18, 2014, 6:38 p.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7503 https://issues.apache.org/jira/browse/HIVE-7503 Repository: hive-git Description --- For Hive's multi insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts can happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even with this, the inserts will happen sequentially, making the performance suffer. This task is to find out what it takes in Spark to enable this without requiring staging the source and sequential insertion. If this has to be solved in Hive, find out an optimal way to do this. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a0703f5b6bfd8a628b13864fac75ef4977cf ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b90cb1989805a7ff4e39a9635bbcea9c66c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e03a3f9d665e21e1c1b10b19dc286b842f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290f00430dbc34dbbc1a0cef0d0eb59e6029 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf643a0e90fc4acc21187f6d78cefdb1b691a ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION Diff: https://reviews.apache.org/r/25394/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
On Sept. 19, 2014, 5:45 p.m., Xuefu Zhang wrote: Nice work. Besides the comments below, I think there are some improvements that can be done, either here or in a different patch: 1. If we have a module that can compile an op tree (given by top ops) into a spark task, then we can reuse it after the original op tree is broken into several trees. From each tree, we compile it into a spark task. In the end, we hook up the parent-child relationships. The current logic is a little complicated and hard to understand. 2. Tests 3. Optimizations Chao Sun wrote: I agree. I can do these in separate follow-up patches. Following up on the discussion with Xuefu: 4. We should create a separate context specifically for multi-insertion. This can be done in a separate JIRA. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/#review53871 --- On Sept. 18, 2014, 6:38 p.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/ --- (Updated Sept. 18, 2014, 6:38 p.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7503 https://issues.apache.org/jira/browse/HIVE-7503 Repository: hive-git Description --- For Hive's multi insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts can happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even with this, the inserts will happen sequentially, making the performance suffer. This task is to find out what it takes in Spark to enable this without requiring staging the source and sequential insertion. If this has to be solved in Hive, find out an optimal way to do this. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a0703f5b6bfd8a628b13864fac75ef4977cf ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b90cb1989805a7ff4e39a9635bbcea9c66c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e03a3f9d665e21e1c1b10b19dc286b842f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290f00430dbc34dbbc1a0cef0d0eb59e6029 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf643a0e90fc4acc21187f6d78cefdb1b691a ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION Diff: https://reviews.apache.org/r/25394/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/ --- (Updated Sept. 20, 2014, 12:04 a.m.) Review request for hive, Brock Noland and Xuefu Zhang. Changes --- Made some changes according to suggestions by Xuefu, also added more comments. Bugs: HIVE-7503 https://issues.apache.org/jira/browse/HIVE-7503 Repository: hive-git Description --- For Hive's multi insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts can happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even with this, the inserts will happen sequentially, making the performance suffer. This task is to find out what takes in Spark to enable this without requiring staging the source and sequential insertion. If this has to be solved in Hive, find out an optimum way to do this. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a07 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf64 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION Diff: https://reviews.apache.org/r/25394/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/ --- (Updated Sept. 20, 2014, 1:33 a.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7503 https://issues.apache.org/jira/browse/HIVE-7503 Repository: hive-git Description --- For Hive's multi insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts can happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even with this, the inserts will happen sequentially, making the performance suffer. This task is to find out what takes in Spark to enable this without requiring staging the source and sequential insertion. If this has to be solved in Hive, find out an optimum way to do this. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a07 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf64 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION ql/src/test/results/clientpositive/spark/insert1.q.out 49fb1d4 ql/src/test/results/clientpositive/spark/union18.q.out 9a40807 ql/src/test/results/clientpositive/spark/union19.q.out 131591f ql/src/test/results/clientpositive/spark/union_remove_6.q.out 1bc55f4 Diff: https://reviews.apache.org/r/25394/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
On Sept. 20, 2014, 1:03 a.m., Brock Noland wrote: Awesome work I have a few minor comments that can be addressed in a *follow on* patch. Thanks brock for the comments! I've attached the updated patch. On Sept. 20, 2014, 1:03 a.m., Brock Noland wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java, line 92 https://reviews.apache.org/r/25394/diff/4/?file=698349#file698349line92 it sounds like we'll be creating a multi-insert specific context? In that context can we make all the members private? Yes, I'll do that in the following patch. - Chao --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/#review54065 --- On Sept. 20, 2014, 1:33 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/ --- (Updated Sept. 20, 2014, 1:33 a.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7503 https://issues.apache.org/jira/browse/HIVE-7503 Repository: hive-git Description --- For Hive's multi insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts can happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even with this, the inserts will happen sequentially, making the performance suffer. This task is to find out what takes in Spark to enable this without requiring staging the source and sequential insertion. If this has to be solved in Hive, find out an optimum way to do this. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a07 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b9 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf64 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION ql/src/test/results/clientpositive/spark/insert1.q.out 49fb1d4 ql/src/test/results/clientpositive/spark/union18.q.out 9a40807 ql/src/test/results/clientpositive/spark/union19.q.out 131591f ql/src/test/results/clientpositive/spark/union_remove_6.q.out 1bc55f4 Diff: https://reviews.apache.org/r/25394/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/ --- (Updated Sept. 18, 2014, 6:38 p.m.) Review request for hive, Brock Noland and Xuefu Zhang. Changes --- Mainly changed the way of detecting the multi-insertion pattern. Bugs: HIVE-7503 https://issues.apache.org/jira/browse/HIVE-7503 Repository: hive-git Description --- For Hive's multi insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts can happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even with this, the inserts will happen sequentially, making the performance suffer. This task is to find out what it takes in Spark to enable this without requiring staging the source and sequential insertion. If this has to be solved in Hive, find out an optimal way to do this. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4211a0703f5b6bfd8a628b13864fac75ef4977cf ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 695d8b90cb1989805a7ff4e39a9635bbcea9c66c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e03a3f9d665e21e1c1b10b19dc286b842f ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290f00430dbc34dbbc1a0cef0d0eb59e6029 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMergeTaskProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf643a0e90fc4acc21187f6d78cefdb1b691a ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION Diff: https://reviews.apache.org/r/25394/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/ --- (Updated Sept. 5, 2014, 6:18 p.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7503 https://issues.apache.org/jira/browse/HIVE-7503 Repository: hive-git Description --- For Hive's multi insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts can happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even with this, the inserts will happen sequentially, making the performance suffer. This task is to find out what takes in Spark to enable this without requiring staging the source and sequential insertion. If this has to be solved in Hive, find out an optimum way to do this. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 5fcaf64 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION Diff: https://reviews.apache.org/r/25394/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 25280: Refactoring GraphTran to make it conform to SparkTran interface. [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25280/ --- (Updated Sept. 5, 2014, 6:20 p.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7939 https://issues.apache.org/jira/browse/HIVE-7939 Repository: hive-git Description --- Currently, GraphTran uses its own execute method, which executes the operator plan in a DFS fashion, and does something special for union. The goal for this JIRA is to do some refactoring and make it conform to the SparkTran interface. The initial idea is to use varargs for SparkTran::transform. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GraphTran.java 5d4414a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java b03a51c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 76b74e7 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java 46e4b6d ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 9b11fe4 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 19894b0 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/UnionTran.java 5ec7d0f ql/src/test/results/clientpositive/spark/union17.q.out.sorted PRE-CREATION ql/src/test/results/clientpositive/spark/union20.q.out.sorted PRE-CREATION ql/src/test/results/clientpositive/spark/union21.q.out.sorted PRE-CREATION ql/src/test/results/clientpositive/spark/union27.q.out.sorted PRE-CREATION Diff: https://reviews.apache.org/r/25280/diff/ Testing --- Thanks, Chao Sun
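The "varargs for SparkTran::transform" idea in the description above could look roughly like the sketch below. This is hypothetical and simplified: the real SparkTran in hive-git is typed over Spark's JavaPairRDD, so `FakeRdd` here is a self-contained stand-in, and the lambda union is only an illustration of why a variable-arity `transform` lets union (many inputs) and map/reduce (one input) share a single interface.

```java
import java.util.*;

// Self-contained stand-in for an RDD; the real interface would use JavaPairRDD.
class FakeRdd {
    final List<Integer> data;
    FakeRdd(List<Integer> data) { this.data = data; }
}

// A transformation over a variable number of input RDDs, so that union-style
// trans (multiple inputs) and map/reduce trans (one input) share one API.
interface SparkTran {
    FakeRdd transform(FakeRdd... inputs);
}

public class TranSketch {
    // Union as a varargs SparkTran: concatenates all of its inputs.
    static final SparkTran UNION = inputs -> {
        List<Integer> out = new ArrayList<>();
        for (FakeRdd in : inputs) {
            out.addAll(in.data);
        }
        return new FakeRdd(out);
    };

    public static void main(String[] args) {
        FakeRdd a = new FakeRdd(Arrays.asList(1, 2));
        FakeRdd b = new FakeRdd(Arrays.asList(3));
        System.out.println(UNION.transform(a, b).data);  // [1, 2, 3]
    }
}
```

With this shape, a plan executor no longer needs a union-specific code path: it just gathers each tran's parent outputs and passes them all to `transform`.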
Re: Review Request 25394: HIVE-7503: Support Hive's multi-table insert query with Spark [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25394/ --- (Updated Sept. 5, 2014, 8:35 p.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7503 https://issues.apache.org/jira/browse/HIVE-7503 Repository: hive-git Description --- For Hive's multi insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts can happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even with this, the inserts will happen sequentially, making the performance suffer. This task is to find out what takes in Spark to enable this without requiring staging the source and sequential insertion. If this has to be solved in Hive, find out an optimum way to do this. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java 864965e ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkMultiInsertionProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkTableScanProcessor.java PRE-CREATION Diff: https://reviews.apache.org/r/25394/diff/ Testing --- Thanks, Chao Sun
Review Request 25404: NPE while reading null decimal value
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25404/ --- Review request for hive, Brock Noland and Xuefu Zhang. Repository: hive-git Description --- Say you have this table dec_test: dec decimal(10,0) If the table has a row that is 99.5, and if we do select * from dec_test; it will crash with NPE: 2014-09-05 14:08:56,023 ERROR [main]: CliDriver (SessionState.java:printError(545)) - Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1531) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:544) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:536) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137) ... 12 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:265) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439) at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423) at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:70) at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:39) at org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87) ... 19 more Diffs - common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 00ea481c2eed84de12815eedb079e965aa2ee701 Diff: https://reviews.apache.org/r/25404/diff/ Testing --- Thanks, Chao Sun
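The failure pattern behind the stack trace above is that enforcing a decimal's precision/scale can yield a null value, which a downstream serializer then dereferences. A minimal illustration of that pattern, using plain `java.math.BigDecimal` rather than Hive's `HiveDecimal` (the `enforce` and `serialize` helpers are hypothetical analogues, not the actual Hive fix):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrates the bug shape: a precision/scale-enforcing factory that returns
// null for values it cannot represent, and a caller that must handle that
// null instead of dereferencing it (the NullPointerException in the trace).
public class DecimalNullSketch {
    // Hypothetical analogue of a precision/scale enforcement step.
    static BigDecimal enforce(BigDecimal d, int precision, int scale) {
        BigDecimal rounded = d.setScale(scale, RoundingMode.HALF_UP);
        // Too many significant digits to fit: signal failure with null.
        return rounded.precision() > precision ? null : rounded;
    }

    // Null-safe serialization: emit SQL NULL rather than calling a method
    // on a null decimal.
    static String serialize(BigDecimal d) {
        return d == null ? "NULL" : d.toPlainString();
    }

    public static void main(String[] args) {
        BigDecimal ok = enforce(new BigDecimal("99.5"), 10, 0);
        System.out.println(serialize(ok));   // 100 (99.5 rounds up at scale 0)
        BigDecimal bad = enforce(new BigDecimal("12345678901"), 10, 0);
        System.out.println(serialize(bad));  // NULL (11 digits exceed precision 10)
    }
}
```

Whatever the exact semantics HiveDecimal chooses for a value like 99.5 in decimal(10,0), the robust contract is the same: any code path that can receive a null decimal must check before serializing.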
Re: Review Request 25404: NPE while reading null decimal value
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25404/ --- (Updated Sept. 5, 2014, 11:06 p.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-8008 https://issues.apache.org/jira/browse/HIVE-8008 Repository: hive-git Description --- Say you have this table dec_test: dec decimal(10,0) If the table has a row that is 99.5, and if we do select * from dec_test; it will crash with NPE: 2014-09-05 14:08:56,023 ERROR [main]: CliDriver (SessionState.java:printError(545)) - Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1531) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at 
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:544) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:536) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137) ... 12 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:265) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439) at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423) at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:70) at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:39) at org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87) ... 19 more Diffs - common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 00ea481c2eed84de12815eedb079e965aa2ee701 Diff: https://reviews.apache.org/r/25404/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 25404: NPE while reading null decimal value
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25404/ --- (Updated Sept. 5, 2014, 11:38 p.m.) Review request for hive, Brock Noland and Xuefu Zhang. Changes --- Sorry for the extra blank line at end of the test method.. Bugs: HIVE-8008 https://issues.apache.org/jira/browse/HIVE-8008 Repository: hive-git Description --- Say you have this table dec_test: dec decimal(10,0) If the table has a row that is 99.5, and if we do select * from dec_test; it will crash with NPE: 2014-09-05 14:08:56,023 ERROR [main]: CliDriver (SessionState.java:printError(545)) - Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:151) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1531) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:90) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:544) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:536) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137) ... 12 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:265) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439) at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:71) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423) at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:70) at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:39) at org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:87) ... 19 more Diffs (updated) - HIVE-8008.patch2 PRE-CREATION common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 00ea481c2eed84de12815eedb079e965aa2ee701 common/src/test/org/apache/hadoop/hive/common/type/TestHiveDecimal.java 769410d474fdc0ecbd63c7fe8944b2f6d23d5e5a Diff: https://reviews.apache.org/r/25404/diff/ Testing --- Thanks, Chao Sun
Re: Review Request 25404: NPE while reading null decimal value
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25404/ --- (Updated Sept. 5, 2014, 11:54 p.m.) Review request for hive, Brock Noland and Xuefu Zhang.

Changes --- Sorry, the last diff accidentally included the patch file itself.

Bugs: HIVE-8008 https://issues.apache.org/jira/browse/HIVE-8008

Repository: hive-git

Description --- Say you have a table dec_test with a single column dec of type decimal(10,0). If the table has a row with the value 99.5, then select * from dec_test; crashes with an NPE (full stack trace in the original request above).

Diffs (updated)
- common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 00ea481c2eed84de12815eedb079e965aa2ee701
- common/src/test/org/apache/hadoop/hive/common/type/TestHiveDecimal.java 769410d474fdc0ecbd63c7fe8944b2f6d23d5e5a

Diff: https://reviews.apache.org/r/25404/diff/

Testing ---

Thanks, Chao Sun
Re: Review Request 25404: NPE while reading null decimal value
On Sept. 6, 2014, 12:03 a.m., Lars Francke wrote: Good find! Only two minor style issues. +1

Thanks! Sorry, I wasn't paying enough attention to long lines - will fix those. - Chao

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25404/#review52529 ---
Re: Review Request 25404: NPE while reading null decimal value
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25404/ --- (Updated Sept. 6, 2014, 12:44 a.m.) Review request for hive, Brock Noland and Xuefu Zhang.

Changes --- Fixing the style issue: wrapping long lines.

Bugs: HIVE-8008 https://issues.apache.org/jira/browse/HIVE-8008

Repository: hive-git

Description --- Say you have a table dec_test with a single column dec of type decimal(10,0). If the table has a row with the value 99.5, then select * from dec_test; crashes with an NPE (full stack trace in the original request above).

Diffs (updated)
- common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 00ea481c2eed84de12815eedb079e965aa2ee701
- common/src/test/org/apache/hadoop/hive/common/type/TestHiveDecimal.java 769410d474fdc0ecbd63c7fe8944b2f6d23d5e5a

Diff: https://reviews.apache.org/r/25404/diff/

Testing ---

Thanks, Chao Sun
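The failure mode in the trace — a serializer dereferencing a null decimal — can be illustrated with a small, self-contained sketch. This is not the actual HiveDecimal or LazyUtils code: the enforce() helper and its rounding rule are stand-ins that merely demonstrate how precision/scale enforcement can yield null, and why the serialization path must check for it.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalNpeSketch {
    // Stand-in for precision/scale enforcement: returns null when the
    // rounded value no longer fits the declared precision.
    static BigDecimal enforce(BigDecimal d, int precision, int scale) {
        BigDecimal rounded = d.setScale(scale, RoundingMode.HALF_UP);
        if (rounded.precision() - rounded.scale() > precision - scale) {
            return null; // value does not fit decimal(precision, scale)
        }
        return rounded;
    }

    // Null-unsafe serialization, analogous to the crashing path:
    static String serializeUnsafe(BigDecimal d) {
        return d.toPlainString(); // throws NPE when d is null
    }

    // Null-safe variant: emit SQL NULL instead of dereferencing null.
    static String serializeSafe(BigDecimal d) {
        return d == null ? "NULL" : d.toPlainString();
    }

    public static void main(String[] args) {
        // 99.5 rounds to 100, which fits decimal(10,0).
        BigDecimal fits = enforce(new BigDecimal("99.5"), 10, 0);
        System.out.println(serializeSafe(fits));

        // An 11-digit value overflows precision 10 and becomes null.
        BigDecimal overflow = enforce(new BigDecimal("99999999999.5"), 10, 0);
        System.out.println(serializeSafe(overflow));
    }
}
```

Calling serializeUnsafe(overflow) here reproduces the shape of the reported crash; the fix in the patch is the null-aware variant of this path.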
Re: Review Request 24297: Spark Explain should give useful information on dependencies
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24297/ --- (Updated Aug. 5, 2014, 6:09 a.m.) Review request for hive and Brock Noland.

Bugs: HIVE-7607 https://issues.apache.org/jira/browse/HIVE-7607

Repository: hive-git

Description --- Currently, EXPLAIN in Spark mode displays dependency information like this:

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 [org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@29a09c49, org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@6f7491f8]
      DagName: chao_20140804145151_acc57d5a-27fa-44c0-aabc-052b318ed832:2

This should be improved by giving more information on the dependencies, such as work information and edge type.

Diffs
- ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java e238ff1

Diff: https://reviews.apache.org/r/24297/diff/

Testing ---

Thanks, Chao Sun
Re: Review Request 24297: Spark Explain should give useful information on dependencies
On Aug. 5, 2014, 6:15 a.m., Brock Noland wrote: Hi Chao, Can you share what the output looks like with the patch? Hive has thousands of .q file tests (https://github.com/apache/hive/tree/trunk/ql/src/test/queries/clientpositive) and most of them do an EXPLAIN, so this change might modify quite a few .q file tests. In that case it might be better to make a smaller change that only impacts Spark.

I worried about that too - after a little grepping I found it might affect quite a few places in MapWork. Perhaps we'll have to do something like what Tez does. Sorry, I'll make changes to it. - Chao

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24297/#review49570 ---
Re: Review Request 24297: Spark Explain should give useful information on dependencies
On Aug. 5, 2014, 6:15 a.m., Brock Noland wrote: Hi Chao, Can you share what the output looks like with the patch? Hive has thousands of .q file tests (https://github.com/apache/hive/tree/trunk/ql/src/test/queries/clientpositive) and most of them do an EXPLAIN, so this change might modify quite a few .q file tests. In that case it might be better to make a smaller change that only impacts Spark.

Chao Sun wrote: I worried about that too - after a little grepping I found it might affect quite a few places in MapWork. Perhaps we'll have to do something like what Tez does. Sorry, I'll make changes to it.

Hi Brock, I've updated the patch. It is basically the same as what Tez does. Another option would be to override toString() in SparkWork, without modifying ExplainWork. Please let me know which one you think is better. Thanks. - Chao

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24297/#review49570 ---
Re: Review Request 24297: Spark Explain should give useful information on dependencies
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24297/ --- (Updated Aug. 5, 2014, 5:43 p.m.) Review request for hive and Brock Noland.

Bugs: HIVE-7607 https://issues.apache.org/jira/browse/HIVE-7607

Repository: hive-git

Description --- Currently, EXPLAIN in Spark mode displays dependency information like this:

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 [org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@29a09c49, org.apache.hadoop.hive.ql.plan.SparkWork$Dependency@6f7491f8]
      DagName: chao_20140804145151_acc57d5a-27fa-44c0-aabc-052b318ed832:2

This should be improved by giving more information on the dependencies, such as work information and edge type.

Diffs (updated)
- ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java e238ff1

Diff: https://reviews.apache.org/r/24297/diff/

Testing ---

Thanks, Chao Sun
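The unreadable SparkWork$Dependency@29a09c49 output comes from Java's default Object.toString(), which prints ClassName@hexHashCode. The toString() option discussed in the thread can be sketched as follows; the Dependency fields and the output format here are hypothetical simplifications, not the actual SparkWork code.

```java
public class ToStringSketch {
    enum EdgeType { SHUFFLE, SORT }

    // Hypothetical simplified Dependency: without an overridden
    // toString() it would print as e.g. "ToStringSketch$Dependency@1b6d3586".
    static class Dependency {
        final String parentName;
        final EdgeType edgeType;

        Dependency(String parentName, EdgeType edgeType) {
            this.parentName = parentName;
            this.edgeType = edgeType;
        }

        @Override
        public String toString() {
            // Human-readable form: which work this edge comes from, and how.
            return parentName + " (" + edgeType + ")";
        }
    }

    public static void main(String[] args) {
        Dependency d = new Dependency("Map 1", EdgeType.SHUFFLE);
        System.out.println(d); // prints "Map 1 (SHUFFLE)"
    }
}
```

With such an override, the Edges: line in the plan would render each dependency as readable text without touching ExplainTask at all, which is the trade-off Chao raises against the Tez-style change.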
Review Request 24352: StarterProject: Fix exception handling in POC code
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24352/ --- Review request for hive and Brock Noland.

Repository: hive-git

Description --- The POC code just printed exceptions to stderr. We should either:
1) log at INFO/WARN/ERROR, or
2) rethrow anything that is a fatal error (perhaps wrapped in a runtime exception).

Diffs
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 20a1938
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 358cbc7

Diff: https://reviews.apache.org/r/24352/diff/

Testing ---

Thanks, Chao Sun
Re: Review Request 24352: StarterProject: Fix exception handling in POC code
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24352/ --- (Updated Aug. 6, 2014, 4:53 a.m.) Review request for hive and Brock Noland. Changes --- Hi Brock, Thanks for the comments! I've addressed these issues and updated the patch. Please take a look. Bugs: HIVE-7560 https://issues.apache.org/jira/browse/HIVE-7560 Repository: hive-git Description --- The POC code just printed exceptions to stderr. We should either: 1) LOG at INFO/WARN/ERROR 2) Or rethrow (perhaps wrapped in runtime exception) anything is a fatal error Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 20a1938 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 358cbc7 Diff: https://reviews.apache.org/r/24352/diff/ Testing --- Thanks, Chao Sun
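The two options from the description — log, or rethrow wrapped in a runtime exception — look roughly like this. The class and method names are illustrative, and java.util.logging stands in for whatever logging facade the Hive code actually uses; this is a sketch of the pattern, not the patched SparkClient code.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExceptionHandlingSketch {
    private static final Logger LOG =
        Logger.getLogger(ExceptionHandlingSketch.class.getName());

    // Throwing work unit; local stand-in so the sketch is self-contained.
    interface Work { void call() throws IOException; }

    // Option 1: recoverable problem — log it at an appropriate level
    // instead of e.printStackTrace() to stderr.
    static void logAndContinue(Runnable work) {
        try {
            work.run();
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Non-fatal failure, continuing", e);
        }
    }

    // Option 2: fatal problem — rethrow, wrapping the checked exception
    // in a RuntimeException so callers cannot silently ignore it.
    static void rethrowFatal(Work work) {
        try {
            work.call();
        } catch (IOException e) {
            throw new RuntimeException("Fatal I/O error", e);
        }
    }

    public static void main(String[] args) {
        logAndContinue(() -> { throw new IllegalStateException("transient"); });
        try {
            rethrowFatal(() -> { throw new IOException("disk gone"); });
        } catch (RuntimeException e) {
            System.out.println("caught: " + e.getMessage()); // caught: Fatal I/O error
        }
    }
}
```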
Re: Review Request 24195: StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24195/ --- (Updated Aug. 4, 2014, 9:50 p.m.) Review request for hive.

Changes --- Hi Brock, Thanks for the suggestions! I've updated the diff. Please have a look. :)

Bugs: HIVE-7561 https://issues.apache.org/jira/browse/HIVE-7561

Repository: hive-git

Description --- Hive uses the assert keyword all over the place. The problem is that assertions are rarely in effect, since they must be explicitly enabled at JVM startup. In the Spark code, e.g. GenSparkUtils, let's use Preconditions.*.

Diffs (updated)
- ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java 8c58333
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 25eea14
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ceb7b6c
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 3a0f4c9
- ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 86d14f1

Diff: https://reviews.apache.org/r/24195/diff/

Testing ---

Thanks, Chao Sun
Review Request 24195: StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24195/ --- Review request for hive.

Repository: hive-git

Description --- Hive uses the assert keyword all over the place. The problem is that assertions are rarely in effect, since they must be explicitly enabled at JVM startup. In the Spark code, e.g. GenSparkUtils, let's use Preconditions.*.

Diffs
- ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java 8c58333
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 25eea14
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ceb7b6c
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 3a0f4c9
- ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 86d14f1

Diff: https://reviews.apache.org/r/24195/diff/

Testing ---

Thanks, Chao Sun
Re: Review Request 24195: StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24195/ --- (Updated Aug. 1, 2014, 11:45 p.m.) Review request for hive.

Bugs: HIVE-7561 https://issues.apache.org/jira/browse/HIVE-7561

Repository: hive-git

Description --- Hive uses the assert keyword all over the place. The problem is that assertions are rarely in effect, since they must be explicitly enabled at JVM startup. In the Spark code, e.g. GenSparkUtils, let's use Preconditions.*.

Diffs
- ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkReduceSinkMapJoinProc.java 8c58333
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 25eea14
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java ceb7b6c
- ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java 3a0f4c9
- ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 86d14f1

Diff: https://reviews.apache.org/r/24195/diff/

Testing ---

Thanks, Chao Sun
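The motivation for this change is that a plain assert statement is a no-op unless the JVM is started with -ea, so the checks silently vanish in normal runs. A minimal before/after sketch follows; java.util.Objects stands in here so the snippet is self-contained, while the actual patch uses Guava's Preconditions.checkNotNull/checkState, whose behavior the stand-ins mirror.

```java
import java.util.Objects;

public class PreconditionsSketch {
    // Before: only enforced when the JVM runs with -ea; in normal runs a
    // null argument slips past the assert and surfaces later as an NPE.
    static int lengthWithAssert(String s) {
        assert s != null : "s must not be null";
        return s.length();
    }

    // After: always enforced. Objects.requireNonNull plays the role of
    // Guava's Preconditions.checkNotNull in this stdlib-only sketch.
    static int lengthWithCheck(String s) {
        Objects.requireNonNull(s, "s must not be null");
        return s.length();
    }

    // Stand-in for Guava's Preconditions.checkState, for state invariants.
    static void checkState(boolean expression, String message) {
        if (!expression) {
            throw new IllegalStateException(message);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthWithCheck("hive")); // 4
        checkState(2 + 2 == 4, "arithmetic broke");
        try {
            lengthWithCheck(null);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```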
Review Request 24127: Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24127/ --- Review request for hive.

Repository: hive-git

Description --- An attempt to fix the last patch by moving the groupBy op into ShuffleTran. Also, since SparkTran::transform may now have input/output value types other than BytesWritable, we need to make it generic as well. Also added a CompTran class, which is basically a composition of transformations; it offers better type compatibility than ChainedTran. This is not a perfect solution and may be subject to further change.

Diffs
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ChainedTran.java 4991568
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CompTran.java PRE-CREATION
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 01a70e9
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 841db87
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java PRE-CREATION
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 98d08e6
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java d1af86d
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java 33e7d45
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java cf85af1
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 440dd93
- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 6aa732f

Diff: https://reviews.apache.org/r/24127/diff/

Testing ---

Thanks, Chao Sun
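The type-compatibility point — a composition of transformations whose intermediate types are checked to line up — can be sketched with plain generics. The real SparkTran operates over RDDs; this sketch uses simple values, and the CompTran-style composer below is an illustrative reduction of the idea rather than the actual class.

```java
public class CompTranSketch {
    // Generic transformation, analogous in shape to a generified SparkTran<T, U>.
    interface Tran<T, U> {
        U transform(T input);
    }

    // Compose two transformations. The compiler enforces that the output
    // type M of the first matches the input type of the second — the
    // static guarantee an untyped chain of trans cannot provide.
    static <T, M, U> Tran<T, U> compose(Tran<T, M> first, Tran<M, U> second) {
        return input -> second.transform(first.transform(input));
    }

    public static void main(String[] args) {
        Tran<String, Integer> parse = Integer::parseInt;  // stand-in "map" stage
        Tran<Integer, Integer> doubled = x -> x * 2;      // stand-in "shuffle" stage
        Tran<String, Integer> pipeline = compose(parse, doubled);
        System.out.println(pipeline.transform("21")); // 42
    }
}
```

A mismatched chain — say, composing a Tran<String, Integer> with a Tran<String, String> — fails at compile time here, whereas a BytesWritable-only chain would only fail at runtime.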
Review Request 23530: HIVE-6560: varchar and char types cannot be cast to binary
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23530/ --- Review request for hive. Bugs: HIVE-6560 https://issues.apache.org/jira/browse/HIVE-6560 Repository: hive-git Description --- HIVE-6560: varchar and char types cannot be cast to binary Diffs - ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToBinary.java b31b81b ql/src/test/queries/clientpositive/udf_binary.q PRE-CREATION ql/src/test/results/clientpositive/udf_binary.q.out PRE-CREATION Diff: https://reviews.apache.org/r/23530/diff/ Testing --- N/A Thanks, Chao Sun