[ https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966046#action_12966046 ]
Sreekanth Ramakrishnan commented on HIVE-1695:
----------------------------------------------
Group By operator Plan:
{noformat}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF test a) (TOK_TABREF test1 b) (= (. (TOK_TABLE_OR_COL a) key) (. (TOK_TABLE_OR_COL b) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_HINTLIST (TOK_HINT TOK_MAPJOIN (TOK_HINTARGLIST b))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) key))) (TOK_GROUPBY (. (TOK_TABLE_OR_COL a) key))))

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-1 depends on stages: Stage-4
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        b
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        b
          TableScan
            alias: b
            HashTable Sink Operator
              condition expressions:
                0 {key}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              Position of Big Table: 0

  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a
          TableScan
            alias: a
            Map Join Operator
              condition map:
                Inner Join 0 to 1
              condition expressions:
                0 {key}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              outputColumnNames: _col0
              Position of Big Table: 0
              Reduce Output Operator
                key expressions:
                  expr: _col0
                  type: int
                sort order: +
                Map-reduce partition columns:
                  expr: _col0
                  type: int
                tag: -1
      Local Work:
        Map Reduce Local Work
      Reduce Operator Tree:
        Group By Operator
          bucketGroup: false
          keys:
            expr: KEY._col0
            type: int
          mode: mergepartial
          outputColumnNames: _col0
          Select Operator
            expressions:
              expr: _col0
              type: int
            outputColumnNames: _col0
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
{noformat}
I have successfully got group by and order by working with the new NodeProcessor
that I implemented, and cross-checked the query results from before and after
the plan alterations were made.
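
For readers following along, here is a minimal sketch of the plan alteration in a toy model. It is not the NodeProcessor from the patch and does not use any Hive API: the `Job` class, the `merge_mapjoin_into_next` function, and the merge rule itself are hypothetical, and the "before" plan is simplified so that the second job's map side is only a ReduceSink. It only illustrates how a map-only MapJoin job and the Map-Reduce job after it collapse into one job shaped like Stage-1 above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """One MapReduce job: operator names on the map side and the reduce side."""
    map_ops: List[str]
    reduce_ops: List[str] = field(default_factory=list)

    @property
    def is_map_only(self) -> bool:
        return not self.reduce_ops


def merge_mapjoin_into_next(jobs: List[Job]) -> List[Job]:
    """Fold a map-only job ending in a MapJoin into the job that follows it.

    Hypothetical rule, for illustration only: the merge applies when the
    following job's map side does nothing but a ReduceSink.
    """
    merged: List[Job] = []
    for job in jobs:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.is_map_only
            and prev.map_ops
            and prev.map_ops[-1] == "MapJoin"
            and job.map_ops == ["ReduceSink"]
        ):
            # Run the ReduceSink in the same map task as the MapJoin.
            merged[-1] = Job(prev.map_ops + job.map_ops, list(job.reduce_ops))
        else:
            merged.append(job)
    return merged


# Simplified "before" plan: a map-only join job, then a Map-Reduce job.
before = [
    Job(["TableScan", "MapJoin"]),
    Job(["ReduceSink"], ["GroupBy", "Select", "FileSink"]),
]
after = merge_mapjoin_into_next(before)
print(len(before), "->", len(after))
print(after[0].map_ops, after[0].reduce_ops)
```

The merged job has the same shape as Stage-1 in the plan: TableScan, Map Join and Reduce Output on the map side, with Group By, Select and File Output on the reduce side.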
> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
> Key: HIVE-1695
> URL: https://issues.apache.org/jira/browse/HIVE-1695
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Amareshwari Sriramadasu
>
> Currently, MapJoin followed by ReduceSink runs as two MapReduce jobs: a
> map-only job followed by a Map-Reduce job. The two can be combined into a
> single MapReduce job.