[
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572273#comment-13572273
]
Phabricator commented on HIVE-2340:
-----------------------------------
navis has commented on the revision "HIVE-2340 [jira] optimize orderby followed
by a groupby".
INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:138
ok.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:787
I wish I could, but CommonJoinResolver is a physical optimizer, which means
there is no RS-RS operator tree that could be merged at that stage.
I'm thinking of disabling this optimization if the user has configured
hive.auto.convert.join=true or hive.auto.convert.join.noconditionaltask=true.
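A rough sketch of the guard I have in mind, not the actual patch. The class and method names are hypothetical, java.util.Properties stands in for the real Hive configuration object, and the "false" defaults are assumed only for illustration:

```java
import java.util.Properties;

// Hypothetical sketch: skip RS deduplication when auto map-join conversion is on.
final class AutoJoinGuardSketch {
    // Returns true when neither auto-convert-join setting is enabled.
    // Assumption: an unset key is treated as "false" here.
    static boolean dedupAllowed(Properties conf) {
        boolean autoJoin = Boolean.parseBoolean(
            conf.getProperty("hive.auto.convert.join", "false"));
        boolean noConditionalTask = Boolean.parseBoolean(
            conf.getProperty("hive.auto.convert.join.noconditionaltask", "false"));
        return !(autoJoin || noConditionalTask);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("hive.auto.convert.join", "true");
        System.out.println(dedupAllowed(conf));
    }
}
```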
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:251
I'll add more explanation to hive-default.xml.template.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java:99
For rules with the same cost, DefaultRuleDispatcher selects the last one, with
logic something like this:
{code}
// "<=" (not "<") lets a later rule with an equal cost replace the earlier one
if ((cost >= 0) && (cost <= minCost)) {
  minCost = cost;
  rule = r;
}
{code}
So R2 will be selected.
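To make the tie-breaking concrete, here is a minimal standalone sketch of the same comparison. It is not the DefaultRuleDispatcher source: the class, the selectRule helper, and the name-to-cost map are hypothetical, and skipping negative costs is simply what the cost >= 0 check above implies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of "last rule wins on equal cost".
final class RuleTieBreakSketch {
    // Iterates rules in registration order and keeps the last one with minimal cost.
    static String selectRule(Map<String, Integer> costs) {
        int minCost = Integer.MAX_VALUE;
        String rule = null;
        for (Map.Entry<String, Integer> e : costs.entrySet()) {
            int cost = e.getValue();
            // Same condition as in the snippet above: ties go to the later rule.
            if ((cost >= 0) && (cost <= minCost)) {
                minCost = cost;
                rule = e.getKey();
            }
        }
        return rule;
    }

    public static void main(String[] args) {
        Map<String, Integer> costs = new LinkedHashMap<>(); // preserves insertion order
        costs.put("R1", 3);
        costs.put("R2", 3);
        System.out.println(selectRule(costs)); // prints R2
    }
}
```

With a strict "<" comparison the first rule (R1) would be kept instead.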
conf/hive-default.xml.template:1034
I commented on this at
https://issues.apache.org/jira/browse/HIVE-2340?focusedCommentId=13568361&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13568361
This optimization merges two RSs by moving the key/parts/num-reducers of the
child-RS to the parent-RS, which means that if the num-reducers of the child-RS
is fixed (order by or forced bucketing) and small, it can result in a very
slow, single MR job. To prevent this, the configuration sets a minimum
threshold for applying this optimization. It's not good enough, but I cannot
think of a better idea.
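Roughly, the check amounts to something like the sketch below. The names are hypothetical, and I am assuming here that the threshold is compared against the child-RS reducer count and that a negative count means "not fixed"; the real patch may differ in detail.

```java
// Hypothetical sketch of the minimum-threshold guard.
final class ReducerThresholdSketch {
    // childReducers: reducer count of the child-RS; negative is assumed to mean "not fixed".
    // minReducers: the configured minimum threshold for applying the optimization.
    static boolean shouldMerge(int childReducers, int minReducers) {
        if (childReducers < 0) {
            return true; // count not fixed, so merging cannot pin the job to few reducers
        }
        // A fixed but small count (e.g. 1 for order by) would make the merged job slow.
        return childReducers >= minReducers;
    }

    public static void main(String[] args) {
        System.out.println(shouldMerge(1, 4));  // order by: fixed single reducer
        System.out.println(shouldMerge(-1, 4)); // reducer count not fixed
    }
}
```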
REVISION DETAIL
https://reviews.facebook.net/D1209
To: JIRA, navis
Cc: hagleitn, njain
> optimize orderby followed by a groupby
> --------------------------------------
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
> Issue Type: Sub-task
> Components: Query Processor
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
> Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt,
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch,
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY
> optimizer(cluster-by following group-by).