[
https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Sherman reassigned HIVE-17935:
-
Assignee: (was: Andrew Sherman)
> Turn on hive.optimize.sort.dynamic.partition by default
> ---
>
> Key: HIVE-17935
> URL: https://issues.apache.org/jira/browse/HIVE-17935
> Project: Hive
> Issue Type: Bug
> Reporter: Andrew Sherman
> Priority: Major
> Attachments: HIVE-17935.1.patch, HIVE-17935.2.patch,
> HIVE-17935.3.patch, HIVE-17935.4.patch, HIVE-17935.5.patch,
> HIVE-17935.6.patch, HIVE-17935.7.patch, HIVE-17935.8.patch
>
>
> The config option hive.optimize.sort.dynamic.partition is an optimization for
> Hive’s dynamic partitioning feature. It was originally implemented in
> [HIVE-6455|https://issues.apache.org/jira/browse/HIVE-6455]. With this
> optimization, the dynamic partition columns and bucketing columns (in case of
> bucketed tables) are sorted before being fed to the reducers. Since the
> partitioning and bucketing columns are sorted, each reducer can keep only one
> record writer open at any time, thereby reducing the memory pressure on the
> reducers. There were some early problems with this optimization and it was
> disabled by default in HiveConf in
> [HIVE-8151|https://issues.apache.org/jira/browse/HIVE-8151]. Since then
> setting hive.optimize.sort.dynamic.partition=true has been used to solve
> problems where dynamic partitioning produces (1) too many small files on
> HDFS, which is bad for the cluster and can increase overhead for future Hive
> queries over those partitions, and (2) OOM issues in the map tasks because
> they are trying to write to 100 different files simultaneously.
> It now seems that the feature is probably mature enough that it can be
> enabled by default.
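
The memory argument in the description can be illustrated with a toy model. The sketch below is not Hive code: the function name, the partition keys, and the row counts are all hypothetical, and a "writer" is modelled as nothing more than a set entry. It only shows why a stream sorted on the partition key needs one open record writer at a time, while an unsorted stream may need one per partition.

```python
import random


def peak_open_writers(partition_keys, input_is_sorted):
    """Return the peak number of simultaneously open writers (toy model).

    partition_keys: the partition key of each incoming row, in arrival order.
    input_is_sorted: if True, rows for one partition arrive contiguously,
    so the previous writer can be closed as soon as the key changes.
    """
    open_writers = set()
    peak = 0
    previous = None
    for key in partition_keys:
        if input_is_sorted and previous is not None and key != previous:
            # The previous partition is complete; its writer can be closed.
            open_writers.discard(previous)
        open_writers.add(key)
        peak = max(peak, len(open_writers))
        previous = key
    return peak


# 30 hypothetical partitions with 100 rows each, arriving in random order.
keys = [f"dt=2017-11-{day:02d}" for day in range(1, 31)]
rows = keys * 100
random.seed(0)
random.shuffle(rows)

unsorted_peak = peak_open_writers(rows, input_is_sorted=False)
sorted_peak = peak_open_writers(sorted(rows), input_is_sorted=True)
print(unsorted_peak, sorted_peak)  # 30 1
```

In this model the unsorted stream keeps a writer open for every partition it has seen, whereas the sorted stream never holds more than one. The cost not modelled here is the sort itself.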