[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns

2015-01-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282246#comment-14282246
 ] 

Ashutosh Chauhan commented on HIVE-4809:


+1

> ReduceSinkOperator of PTFOperator can have redundant key columns
> 
>
> Key: HIVE-4809
> URL: https://issues.apache.org/jira/browse/HIVE-4809
> Project: Hive
> Issue Type: Improvement
> Components: PTF-Windowing
> Affects Versions: 0.11.0
> Reporter: Yin Huai
> Assignee: Navis
> Attachments: HIVE-4809.1.patch.txt
>
>
> For example, we have a simple query like this ...
> {code:sql}
> SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
> {code}
> The plan of it is ...
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         x
>           TableScan
>             alias: x
>             Reduce Output Operator
>               key expressions:
>                     expr: a
>                     type: int
>                     expr: a
>                     type: int
>               sort order: ++
>               Map-reduce partition columns:
>                     expr: a
>                     type: int
>               tag: -1
>               value expressions:
>                     expr: a
>                     type: int
>                     expr: b
>                     type: string
>       Reduce Operator Tree:
>         Extract
>           PTF Operator
>             Select Operator
>               expressions:
>                     expr: _col0
>                     type: int
>                     expr: _col1
>                     type: string
>                     expr: _wcol0
>                     type: bigint
>               outputColumnNames: _col0, _col1, _col2
>               File Output Operator
>                 compressed: false
>                 GlobalTableId: 0
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {code}
> The ReduceSinkOperator has two "a" in its key columns. This redundancy can 
> increase the size of map output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns

2015-01-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281680#comment-14281680
 ] 

Ashutosh Chauhan commented on HIVE-4809:


Can you create an RB (Review Board) request for this?






[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns

2015-01-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281613#comment-14281613
 ] 

Hive QA commented on HIVE-4809:
---



{color:red}Overall{color}: -1, at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12692892/HIVE-4809.1.patch.txt

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 7231 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-script_pipe.q-insert_values_non_partitioned.q-insert_update_delete.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-scriptfile1.q-union2.q-vectorized_bucketmapjoin1.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_decimal_10_0.q-vector_decimal_trailing.q-lvj_mapjoin.q-and-12-more
 - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_partitioned_date_time.q-vector_non_string_partition.q-tez_dml.q-and-12-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestNegativeMinimrCliDriver-mapreduce_stack_trace_hadoop20.q
 - did not produce a TEST-*.xml file
TestNegativeMinimrCliDriver-udf_local_resource.q-mapreduce_stack_trace_turnoff_hadoop20.q-mapreduce_stack_trace.q-and-5-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2409/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2409/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2409/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12692892 - PreCommit-HIVE-TRUNK-Build


[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns

2013-07-03 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699294#comment-13699294
 ] 

Yin Huai commented on HIVE-4809:


For an OVER clause, we can have partitioning columns (specified by PARTITION BY)
and ordering columns (specified by ORDER BY). In the current implementation, the
key columns of the ReduceSinkOperator (RS) handle both grouping (for the
partitioning columns) and ordering (for the ordering columns). So we first add
all partitioning columns and then all ordering columns to the key columns of the
RS. If no ordering columns are specified, the partitioning columns are used as
the ordering columns, which is why "a" appears twice in the key expressions of
the plan. It seems we cannot completely remove these duplicate key columns right
now, because the key columns of the RS must serve both grouping and ordering.
But we can optimize certain cases. For example, if ordering columns are not
specified, we need not add the partitioning columns again as ordering columns.
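
The key-building rule described above can be sketched roughly as follows. This is a simplified Python model for illustration, not Hive's actual code: the function names `current_rs_keys` and `optimized_rs_keys` are hypothetical, ordering columns are modeled as (name, direction) pairs, and partitioning columns are assumed to sort ascending ("+").

```python
from typing import List, Tuple

OrderCol = Tuple[str, str]  # (column name, "+" for ascending or "-" for descending)


def current_rs_keys(partition_cols: List[str],
                    order_cols: List[OrderCol]) -> Tuple[List[str], str]:
    """Model of the behaviour described in the comment: partitioning columns
    first, then ordering columns; with no ORDER BY, the partitioning columns
    are reused as ordering columns."""
    effective_order = order_cols if order_cols else [(c, "+") for c in partition_cols]
    keys = list(partition_cols) + [name for name, _ in effective_order]
    sort_order = "+" * len(partition_cols) + "".join(d for _, d in effective_order)
    return keys, sort_order


def optimized_rs_keys(partition_cols: List[str],
                      order_cols: List[OrderCol]) -> Tuple[List[str], str]:
    """Model of the suggested optimization: with no ORDER BY, do not add the
    partitioning columns a second time."""
    keys = list(partition_cols) + [name for name, _ in order_cols]
    sort_order = "+" * len(partition_cols) + "".join(d for _, d in order_cols)
    return keys, sort_order


# count(x.b) OVER (PARTITION BY x.a): no ordering columns specified.
print(current_rs_keys(["a"], []))
print(optimized_rs_keys(["a"], []))
```

For the query in the description, the first call models the duplicated key list and the "++" sort order seen in the plan, while the second keeps a single "a". When ORDER BY is given explicitly, both functions return the same keys.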

> ReduceSinkOperator of PTFOperator can have redundant key columns
> 
>
> Key: HIVE-4809
> URL: https://issues.apache.org/jira/browse/HIVE-4809
> Project: Hive
> Issue Type: Improvement
> Reporter: Yin Huai
> Assignee: Yin Huai
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira