[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns

2015-01-19 Thread Ashutosh Chauhan (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282246#comment-14282246 ]

Ashutosh Chauhan commented on HIVE-4809:


+1

 ReduceSinkOperator of PTFOperator can have redundant key columns
 

 Key: HIVE-4809
 URL: https://issues.apache.org/jira/browse/HIVE-4809
 Project: Hive
  Issue Type: Improvement
  Components: PTF-Windowing
Affects Versions: 0.11.0
Reporter: Yin Huai
Assignee: Navis
 Attachments: HIVE-4809.1.patch.txt


 For example, consider this simple query:
 {code:sql}
 SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
 {code}
 Its plan is:
 {code}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
     Map Reduce
       Alias -> Map Operator Tree:
         x
           TableScan
             alias: x
             Reduce Output Operator
               key expressions:
                     expr: a
                     type: int
                     expr: a
                     type: int
               sort order: ++
               Map-reduce partition columns:
                     expr: a
                     type: int
               tag: -1
               value expressions:
                     expr: a
                     type: int
                     expr: b
                     type: string
       Reduce Operator Tree:
         Extract
           PTF Operator
             Select Operator
               expressions:
                     expr: _col0
                     type: int
                     expr: _col1
                     type: string
                     expr: _wcol0
                     type: bigint
               outputColumnNames: _col0, _col1, _col2
               File Output Operator
                 compressed: false
                 GlobalTableId: 0
                 table:
                     input format: org.apache.hadoop.mapred.TextInputFormat
                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Stage: Stage-0
     Fetch Operator
       limit: -1
 {code}
 The ReduceSinkOperator has column a twice in its key expressions (hence sort order ++). This redundancy can increase the size of the map output.
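A minimal Java sketch showing why the second a in the key carries no information. It assumes a key is modeled as a list of (expression, direction) pairs. KeyCol, dropRedundantKeys and sortOrder are hypothetical names used only for illustration. They are not Hive APIs and are not the attached patch.

Keeping only the first occurrence of each deterministic key expression leaves the row order and grouping unchanged. The reason is that rows which tie on the first a also tie on the repeat.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RedundantKeySketch {
    /** Hypothetical key column: an expression plus a sort direction ('+' asc, '-' desc). */
    record KeyCol(String expr, char order) {}

    /**
     * Keeps only the first occurrence of each key expression. For a deterministic
     * expression such as a column reference, a later duplicate can never break a
     * tie: rows equal on the first occurrence are equal on the repeat as well.
     */
    static List<KeyCol> dropRedundantKeys(List<KeyCol> keys) {
        Set<String> seen = new HashSet<>();
        List<KeyCol> result = new ArrayList<>();
        for (KeyCol k : keys) {
            if (seen.add(k.expr())) { // add() returns false if the expression was already seen
                result.add(k);
            }
        }
        return result;
    }

    /** Renders the directions the way EXPLAIN prints "sort order". */
    static String sortOrder(List<KeyCol> keys) {
        StringBuilder sb = new StringBuilder();
        for (KeyCol k : keys) {
            sb.append(k.order());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Key of the ReduceSinkOperator in the plan: (a, a) with sort order "++".
        List<KeyCol> keys = List.of(new KeyCol("a", '+'), new KeyCol("a", '+'));
        List<KeyCol> deduped = dropRedundantKeys(keys);
        System.out.println(sortOrder(keys) + " -> " + sortOrder(deduped)); // prints "++ -> +"
    }
}
```

The sort order string shrinks in step with the key list, so each remaining key column keeps its original direction.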



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns

2015-01-17 Thread Ashutosh Chauhan (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281680#comment-14281680 ]

Ashutosh Chauhan commented on HIVE-4809:


Can you create an RB (Review Board request) for this?



[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns

2015-01-17 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281613#comment-14281613 ]

Hive QA commented on HIVE-4809:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12692892/HIVE-4809.1.patch.txt

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 7231 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-script_pipe.q-insert_values_non_partitioned.q-insert_update_delete.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-scriptfile1.q-union2.q-vectorized_bucketmapjoin1.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_decimal_10_0.q-vector_decimal_trailing.q-lvj_mapjoin.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_partitioned_date_time.q-vector_non_string_partition.q-tez_dml.q-and-12-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file
TestNegativeMinimrCliDriver-mapreduce_stack_trace_hadoop20.q - did not produce a TEST-*.xml file
TestNegativeMinimrCliDriver-udf_local_resource.q-mapreduce_stack_trace_turnoff_hadoop20.q-mapreduce_stack_trace.q-and-5-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2409/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2409/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2409/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12692892 - PreCommit-HIVE-TRUNK-Build


[jira] [Commented] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns

2013-07-03 Thread Yin Huai (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699294#comment-13699294 ]

Yin Huai commented on HIVE-4809:


For an OVER clause, we can have partitioning columns (specified by PARTITION BY)
and ordering columns (specified by ORDER BY). In the current implementation, the
key columns of the ReduceSinkOperator (RS) handle both grouping (for the
partitioning columns) and ordering (for the ordering columns): we first add all
partitioning columns and then all ordering columns to the key columns of the RS.
If no ordering columns are specified, the partitioning columns are used as the
ordering columns. It seems we cannot completely remove these duplicate key
columns right now, because the key columns of the RS must handle both grouping
and ordering. But we can optimize certain cases. For example, if no ordering
columns are specified, we do not need to reuse the partitioning columns as
ordering columns.
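A minimal Java sketch of the key-building order described in this comment, together with the proposed special case. PtfKeySketch and buildKeys are hypothetical names used for illustration and are not Hive's code. The reuseWhenNoOrder flag is an assumption that lets one method model both the current and the proposed behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class PtfKeySketch {
    /**
     * Builds the RS key list for one OVER clause as the comment describes it:
     * partitioning columns first, then ordering columns.
     * Hypothetical helper for illustration; not Hive's implementation.
     *
     * @param reuseWhenNoOrder true mimics the current behavior, where a missing
     *                         ORDER BY falls back to the partitioning columns
     */
    static List<String> buildKeys(List<String> partitionCols,
                                  List<String> orderCols,
                                  boolean reuseWhenNoOrder) {
        List<String> keys = new ArrayList<>(partitionCols);
        if (!orderCols.isEmpty()) {
            keys.addAll(orderCols);
        } else if (reuseWhenNoOrder) {
            // Current behavior: PARTITION BY a with no ORDER BY gives key (a, a).
            keys.addAll(partitionCols);
        }
        // Proposed: with no ORDER BY, the partitioning columns alone form the key.
        return keys;
    }

    public static void main(String[] args) {
        List<String> part = List.of("a");
        System.out.println(buildKeys(part, List.of(), true));     // [a, a]
        System.out.println(buildKeys(part, List.of(), false));    // [a]
        System.out.println(buildKeys(part, List.of("b"), false)); // [a, b]
        System.out.println(buildKeys(part, List.of("a"), false)); // [a, a]
    }
}
```

The last call (PARTITION BY a ORDER BY a) still yields (a, a). This sketch covers only the missing-ORDER-BY case singled out above, not every duplicate.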

 ReduceSinkOperator of PTFOperator can have redundant key columns
 

 Key: HIVE-4809
 URL: https://issues.apache.org/jira/browse/HIVE-4809
 Project: Hive
  Issue Type: Improvement
Reporter: Yin Huai
Assignee: Yin Huai


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira