[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030340#comment-14030340 ]
Lefty Leverenz commented on HIVE-7158:
--------------------------------------
Does the design doc need guidance about this (or is it time to add Tez
documentation to the user docs)?
* [Hive on Tez | https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez]
At a minimum, Configuration Properties needs to document these parameters:
* new parameter: hive.tez.auto.reducer.parallelism
* new parameter: hive.tez.max.partition.factor
* new parameter: hive.tez.min.partition.factor
* new default for [hive.exec.reducers.bytes.per.reducer | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.reducers.bytes.per.reducer] (with version information)
* new default for [hive.exec.reducers.max | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.reducers.max] (with version information)
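The two existing parameters above shape the reducer count that Hive estimates from statistics. A minimal sketch follows, with simplified arithmetic that is not Hive's actual code. It assumes a byte estimate for the reduce stage is already available. The rule is one reducer per hive.exec.reducers.bytes.per.reducer bytes, capped at hive.exec.reducers.max and never below one. The function name estimate_reducers is hypothetical.

```python
def estimate_reducers(estimated_bytes: int, bytes_per_reducer: int, max_reducers: int) -> int:
    """Hypothetical helper: estimate a reducer count from a data-size estimate.

    estimated_bytes stands in for what the statistics predict will reach the
    reduce stage; bytes_per_reducer and max_reducers mirror
    hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max.
    """
    if bytes_per_reducer <= 0 or max_reducers < 1:
        raise ValueError("bytes_per_reducer and max_reducers must be positive")
    # Ceiling division: one reducer per bytes_per_reducer chunk of data.
    wanted = -(-estimated_bytes // bytes_per_reducer)
    # At least one reducer, never more than the configured cap.
    return max(1, min(max_reducers, wanted))


# About 1 GB of estimated data at 256 MB per reducer needs 4 reducers.
print(estimate_reducers(1_000_000_000, 256_000_000, 1009))  # prints 4
```

Lowering bytes.per.reducer raises the estimate, so the cap matters more. That is one reason the defaults of both parameters should be documented together.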
> Use Tez auto-parallelism in Hive
> --------------------------------
>
> Key: HIVE-7158
> URL: https://issues.apache.org/jira/browse/HIVE-7158
> Project: Hive
> Issue Type: Bug
> Reporter: Gunther Hagleitner
> Assignee: Gunther Hagleitner
> Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch,
> HIVE-7158.4.patch, HIVE-7158.5.patch
>
>
> Tez can optionally sample data from a fraction of the tasks of a vertex and
> use that information to choose the number of downstream tasks for any given
> scatter gather edge.
> Hive estimates the number of reducers by looking at stats and estimates for
> each operator in the operator pipeline leading up to the reducer. However, if
> this estimate turns out to be too large, Tez can rein in the resources used
> to compute the reducer.
> It does so by combining partitions of the upstream vertex. It cannot,
> however, add reducers at this stage.
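The partition-combining step can be pictured with a simplified sketch. It is not the logic of Tez's ShuffleVertexManager. It assumes the upstream vertex wrote one partition per potential reducer, and that Tez then gives each downstream task a contiguous run of those partitions. The function name group_partitions is hypothetical. A partition is never split, so the task count can only shrink. This is why reducers cannot be added at this stage.

```python
def group_partitions(num_partitions: int, num_tasks: int) -> list[range]:
    """Hypothetical helper: assign upstream partitions to downstream tasks.

    Each task gets a contiguous range; the first (num_partitions % num_tasks)
    tasks take one extra partition. Asking for more tasks than partitions is
    rejected because a partition cannot be split across tasks.
    """
    if not 1 <= num_tasks <= num_partitions:
        raise ValueError("num_tasks must be between 1 and num_partitions")
    base, extra = divmod(num_partitions, num_tasks)
    ranges, start = [], 0
    for task in range(num_tasks):
        size = base + (1 if task < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


# Upstream wrote 10 partitions; after sampling, 4 reducers are enough.
for task, parts in enumerate(group_partitions(10, 4)):
    print(task, list(parts))
# 0 [0, 1, 2]
# 1 [3, 4, 5]
# 2 [6, 7]
# 3 [8, 9]
```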
> I'm proposing to let users specify whether they want to use auto-parallelism
> or not. If they do, there will be scaling factors to determine the max and min
> number of reducers Tez can choose from. We will then partition by max
> reducers, letting Tez sample and rein in the count down to the specified min.
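The proposal can be sketched as two steps under stated assumptions. The rounding and clamping choices are illustrative and are not Hive's or Tez's exact code.

First, max_factor and min_factor play the roles of hive.tez.max.partition.factor and hive.tez.min.partition.factor. They scale Hive's reducer estimate into a [min, max] range, capped by hive.exec.reducers.max.

Second, the data is partitioned max ways. Tez then extrapolates total output from a sampled fraction of upstream tasks and picks a count inside that range.

Both function names are hypothetical. The factor values 2.0 and 0.25 are illustrative; they are not claimed defaults.

```python
import math


def reducer_bounds(estimate: int, max_factor: float, min_factor: float,
                   reducers_max: int) -> tuple[int, int]:
    """Hypothetical helper: scale Hive's estimate into (min, max) reducers."""
    upper = max(1, min(reducers_max, math.ceil(estimate * max_factor)))
    lower = max(1, min(upper, math.ceil(estimate * min_factor)))
    return lower, upper


def choose_reducers(sampled_bytes: int, sampled_fraction: float,
                    bytes_per_reducer: int, lower: int, upper: int) -> int:
    """Hypothetical helper: pick a reducer count from a sample.

    sampled_bytes is the output seen from the fraction of upstream tasks that
    finished. The result never exceeds upper (the data was partitioned that
    many ways) and never drops below lower.
    """
    projected = sampled_bytes / sampled_fraction
    wanted = math.ceil(projected / bytes_per_reducer)
    return max(lower, min(upper, wanted))


lower, upper = reducer_bounds(estimate=40, max_factor=2.0, min_factor=0.25,
                              reducers_max=1009)
print(lower, upper)  # prints 10 80
# Far less data than estimated: the count is reined in, but only down to min.
print(choose_reducers(300_000_000, 0.25, 256_000_000, lower, upper))  # prints 10
# Somewhat less data than estimated: a count between min and max.
print(choose_reducers(2_000_000_000, 0.25, 256_000_000, lower, upper))  # prints 32
```

Partitioning by the max rather than by the raw estimate gives Tez headroom, because it can merge partitions but cannot split them.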
--
This message was sent by Atlassian JIRA
(v6.2#6252)