[jira] [Assigned] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default

2018-12-11 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman reassigned HIVE-17935:
-

Assignee: (was: Andrew Sherman)

> Turn on hive.optimize.sort.dynamic.partition by default
> ---
>
> Key: HIVE-17935
> URL: https://issues.apache.org/jira/browse/HIVE-17935
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Priority: Major
> Attachments: HIVE-17935.1.patch, HIVE-17935.2.patch, 
> HIVE-17935.3.patch, HIVE-17935.4.patch, HIVE-17935.5.patch, 
> HIVE-17935.6.patch, HIVE-17935.7.patch, HIVE-17935.8.patch
>
>
> The config option hive.optimize.sort.dynamic.partition is an optimization for 
> Hive’s dynamic partitioning feature. It was originally implemented in 
> [HIVE-6455|https://issues.apache.org/jira/browse/HIVE-6455]. With this 
> optimization, the dynamic partition columns and bucketing columns (in case of 
> bucketed tables) are sorted before being fed to the reducers. Since the 
> partitioning and bucketing columns are sorted, each reducer can keep only one 
> record writer open at any time thereby reducing the memory pressure on the 
> reducers. There were some early problems with this optimization and it was 
> disabled by default in HiveConf in 
> [HIVE-8151|https://issues.apache.org/jira/browse/HIVE-8151]. Since then 
> setting hive.optimize.sort.dynamic.partition=true has been used to solve 
> problems where dynamic partitioning produces with (1) too many small files on 
> HDFS, which is bad for the cluster and can increase overhead for future Hive 
> queries over those partitions, and (2) OOM issues in the map tasks because it 
> trying to simultaneously write to 100 different files. 
> It now seems that the feature is probably mature enough that it can be 
> enabled by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default

2017-10-30 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman reassigned HIVE-17935:
-


> Turn on hive.optimize.sort.dynamic.partition by default
> ---
>
> Key: HIVE-17935
> URL: https://issues.apache.org/jira/browse/HIVE-17935
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
>
> The config option hive.optimize.sort.dynamic.partition is an optimization for 
> Hive’s dynamic partitioning feature. It was originally implemented in 
> [HIVE-6455|https://issues.apache.org/jira/browse/HIVE-6455]. With this 
> optimization, the dynamic partition columns and bucketing columns (in case of 
> bucketed tables) are sorted before being fed to the reducers. Since the 
> partitioning and bucketing columns are sorted, each reducer can keep only one 
> record writer open at any time thereby reducing the memory pressure on the 
> reducers. There were some early problems with this optimization and it was 
> disabled by default in HiveConf in 
> [HIVE-8151|https://issues.apache.org/jira/browse/HIVE-8151]. Since then 
> setting hive.optimize.sort.dynamic.partition=true has been used to solve 
> problems where dynamic partitioning produces with (1) too many small files on 
> HDFS, which is bad for the cluster and can increase overhead for future Hive 
> queries over those partitions, and (2) OOM issues in the map tasks because it 
> trying to simultaneously write to 100 different files. 
> It now seems that the feature is probably mature enough that it can be 
> enabled by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)