[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns
[ https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282246#comment-14282246 ]

Ashutosh Chauhan commented on HIVE-4809:

+1

> ReduceSinkOperator of PTFOperator can have redundant key columns
>
>                 Key: HIVE-4809
>                 URL: https://issues.apache.org/jira/browse/HIVE-4809
>             Project: Hive
>          Issue Type: Improvement
>          Components: PTF-Windowing
>   Affects Versions: 0.11.0
>            Reporter: Yin Huai
>            Assignee: Navis
>         Attachments: HIVE-4809.1.patch.txt
>
> For example, we have a simple query like this ...
> {code:sql}
> SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
> {code}
> The plan of it is ...
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         x
>           TableScan
>             alias: x
>             Reduce Output Operator
>               key expressions:
>                     expr: a
>                     type: int
>                     expr: a
>                     type: int
>               sort order: ++
>               Map-reduce partition columns:
>                     expr: a
>                     type: int
>               tag: -1
>               value expressions:
>                     expr: a
>                     type: int
>                     expr: b
>                     type: string
>       Reduce Operator Tree:
>         Extract
>           PTF Operator
>             Select Operator
>               expressions:
>                     expr: _col0
>                     type: int
>                     expr: _col1
>                     type: string
>                     expr: _wcol0
>                     type: bigint
>               outputColumnNames: _col0, _col1, _col2
>               File Output Operator
>                 compressed: false
>                 GlobalTableId: 0
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {code}
> The ReduceSinkOperator has two "a" in its key columns. This redundancy can
> increase the size of map output.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns
[ https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281680#comment-14281680 ]

Ashutosh Chauhan commented on HIVE-4809:

Can you create a Review Board (RB) request for this?
[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns
[ https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281613#comment-14281613 ]

Hive QA commented on HIVE-4809:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12692892/HIVE-4809.1.patch.txt

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 7231 tests executed

*Failed tests:*
{noformat}
TestMiniTezCliDriver-script_pipe.q-insert_values_non_partitioned.q-insert_update_delete.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-scriptfile1.q-union2.q-vectorized_bucketmapjoin1.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_decimal_10_0.q-vector_decimal_trailing.q-lvj_mapjoin.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_partitioned_date_time.q-vector_non_string_partition.q-tez_dml.q-and-12-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file
TestNegativeMinimrCliDriver-mapreduce_stack_trace_hadoop20.q - did not produce a TEST-*.xml file
TestNegativeMinimrCliDriver-udf_local_resource.q-mapreduce_stack_trace_turnoff_hadoop20.q-mapreduce_stack_trace.q-and-5-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2409/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2409/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2409/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12692892 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns
[ https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699294#comment-13699294 ]

Yin Huai commented on HIVE-4809:

For an OVER clause, we can have partitioning columns (specified by PARTITION BY) and ordering columns (specified by ORDER BY). In the current implementation, we use the key columns of ReduceSinkOperator (RS) to take care of both grouping (for the partitioning columns) and ordering (for the ordering columns). So, we first add all partitioning columns and then add all ordering columns to the key columns of the RS. If we do not specify ordering columns, we use the partitioning columns as ordering columns.

It seems we cannot completely remove those duplicate key columns right now (because the key columns of the RS need to take care of both grouping and ordering). But we can optimize certain cases. For example, if ordering columns are not specified, we do not assign the partitioning columns to the ordering columns.

> ReduceSinkOperator of PTFOperator can have redundant key columns
>
>                 Key: HIVE-4809
>                 URL: https://issues.apache.org/jira/browse/HIVE-4809
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Yin Huai
>            Assignee: Yin Huai
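
The key-column construction described in Yin Huai's comment can be illustrated with a minimal sketch. This is not Hive's code: the function name build_rs_key_columns and the flag skip_implicit_order are hypothetical, chosen only to show why the plan for the example query lists "a" twice and what the suggested special case would change.

```python
def build_rs_key_columns(partition_cols, order_cols, skip_implicit_order=False):
    """Model the key columns of a ReduceSink feeding a windowing operator.

    Hypothetical illustration only; names and the flag are assumptions,
    not taken from the Hive source.
    """
    # Partitioning columns always come first (they drive grouping).
    keys = list(partition_cols)
    if order_cols:
        # Explicit ORDER BY columns are appended after the partitioning columns.
        keys.extend(order_cols)
    elif not skip_implicit_order:
        # Behaviour described in the comment: with no ORDER BY, the
        # partitioning columns are reused as ordering columns.
        keys.extend(partition_cols)
    return keys


# OVER (PARTITION BY a) with the behaviour described in the comment.
current = build_rs_key_columns(["a"], [])
# Same clause with the suggested special case enabled.
optimized = build_rs_key_columns(["a"], [], skip_implicit_order=True)
# OVER (PARTITION BY a ORDER BY b) is unaffected by the special case.
explicit = build_rs_key_columns(["a"], ["b"], skip_implicit_order=True)

print(current, optimized, explicit)
```

In this model the first call yields two key entries for "a", matching the duplicated key expressions and the "++" sort order in the plan, while the second yields one. The sketch does not address the case where PARTITION BY and ORDER BY name the same column.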