[
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570645#comment-13570645
]
Gunther Hagleitner commented on HIVE-2340:
------------------------------------------
[~navis]: I think in general the logic should be to copy numReducers from
parent to child, not the other way around. If Hive makes a decent estimate of
reducers for the parent, that's probably the number you want to carry into the
combined reduce stage, because it means each reducer is doing the desired
amount of work. Buckets and order by are the only special cases I can think of
where the number needs to be fixed.
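
A minimal sketch of that rule, for illustration only: the class and method
names are hypothetical and are not taken from the HIVE-2340 patch. It assumes
the child's requirement (bucketing, order by) is expressed as an optional
fixed reducer count.

```java
import java.util.OptionalInt;

// Hypothetical helper, not Hive code: picks the reducer count for the
// combined reduce stage once parent and child have been merged.
public final class MergedReducerCount {
    private MergedReducerCount() {}

    /**
     * @param parentEstimate reducers estimated for the parent stage
     * @param childFixed     reducer count the child must use (for example
     *                       bucketing or order by), or empty if not fixed
     */
    public static int choose(int parentEstimate, OptionalInt childFixed) {
        // A fixed child count wins; otherwise carry the parent's estimate down.
        return childFixed.isPresent() ? childFixed.getAsInt() : parentEstimate;
    }

    public static void main(String[] args) {
        System.out.println(choose(32, OptionalInt.empty())); // prints 32
        System.out.println(choose(32, OptionalInt.of(1)));   // prints 1
    }
}
```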
For those special cases, without knowing the cardinalities of the joins, group
bys, and tables, it's indeed difficult to guess whether the optimization should
be on or off. However, what do you think of using a max ratio of parent
reducers to child reducers instead of a fixed minimum number of reducers for
the child? With a default of 4, maybe. I.e., if the parent has fewer than 4
times as many reducers as the child, collapse (assuming another job will be
more expensive than running with the lower number of reducers); otherwise
leave it alone. The optimization is only good if the input sizes of the child
and parent reducers are similar, and expressing this as a ratio of reducer
counts is probably the closest we can get right now.
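
The ratio check could look roughly like the sketch below. The names are
hypothetical, the default of 4 is the value suggested above, and treating a
non-positive reducer count as "unknown, do not collapse" is an assumption of
this sketch rather than something the patch does.

```java
// Hypothetical sketch, not Hive code: decide whether to collapse the child
// reduce stage into its parent based on the ratio of reducer counts.
public final class ReducerRatioCheck {
    /** Suggested default: collapse only below a 4:1 parent-to-child ratio. */
    public static final int DEFAULT_MAX_RATIO = 4;

    private ReducerRatioCheck() {}

    public static boolean shouldCollapse(int parentReducers, int childReducers, int maxRatio) {
        // Assumption: non-positive counts mean "not known"; leave the plan alone.
        if (parentReducers <= 0 || childReducers <= 0 || maxRatio <= 0) {
            return false;
        }
        // Collapse when parent < maxRatio * child; long arithmetic avoids overflow.
        return (long) parentReducers < (long) maxRatio * childReducers;
    }

    public static void main(String[] args) {
        System.out.println(shouldCollapse(3, 1, DEFAULT_MAX_RATIO));   // true:  3 < 4
        System.out.println(shouldCollapse(4, 1, DEFAULT_MAX_RATIO));   // false: 4 is not < 4
        System.out.println(shouldCollapse(40, 16, DEFAULT_MAX_RATIO)); // true:  40 < 64
    }
}
```

With a child fixed at one reducer (a global order by, say), this collapses
only when the parent was going to use at most three reducers anyway.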
This would enable the optimization for a larger body of queries (small tables,
single input split, empty group by expr, etc.).
> optimize orderby followed by a groupby
> --------------------------------------
>
> Key: HIVE-2340
> URL: https://issues.apache.org/jira/browse/HIVE-2340
> Project: Hive
> Issue Type: Sub-task
> Components: Query Processor
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
> Labels: perfomance
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt,
> HIVE-2340.D1209.10.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch,
> HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt
>
>
> Before implementing an optimizer for JOIN-GBY, try to implement an RS-GBY
> optimizer (cluster-by following group-by).