[
https://issues.apache.org/jira/browse/HIVE-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941454#comment-13941454
]
Vikram Dixit K commented on HIVE-6455:
--------------------------------------
Changes look good. +1 pending tests.
> Scalable dynamic partitioning and bucketing optimization
> --------------------------------------------------------
>
> Key: HIVE-6455
> URL: https://issues.apache.org/jira/browse/HIVE-6455
> Project: Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.13.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Labels: optimization
> Attachments: HIVE-6455.1.patch, HIVE-6455.1.patch,
> HIVE-6455.10.patch, HIVE-6455.10.patch, HIVE-6455.11.patch,
> HIVE-6455.12.patch, HIVE-6455.13.patch, HIVE-6455.13.patch,
> HIVE-6455.14.patch, HIVE-6455.15.patch, HIVE-6455.16.patch,
> HIVE-6455.17.patch, HIVE-6455.17.patch.txt, HIVE-6455.18.patch,
> HIVE-6455.19.patch, HIVE-6455.2.patch, HIVE-6455.3.patch, HIVE-6455.4.patch,
> HIVE-6455.4.patch, HIVE-6455.5.patch, HIVE-6455.6.patch, HIVE-6455.7.patch,
> HIVE-6455.8.patch, HIVE-6455.9.patch, HIVE-6455.9.patch
>
>
> The current implementation of dynamic partitioning keeps at least one
> record writer open per dynamic partition directory. With bucketing,
> multispray file writers add further to the number of open record
> writers. The record writers of column-oriented file formats (such as
> ORC and RCFile) keep in-memory buffers (value buffers or compression
> buffers) open at all times to buffer rows and compress them before
> flushing them to disk. Because these buffers are maintained per column,
> the amount of memory required at runtime grows with the number of
> partitions and the number of columns per partition. This often leads to
> OutOfMemory (OOM) exceptions in mappers or reducers, depending on the
> number of open record writers. Users often increase the JVM heap size
> (runtime memory) to work around such OOM issues.
> With this optimization, rows are sorted on the dynamic partition columns
> and the bucketing columns (in the case of bucketed tables) before being
> fed to the reducers. Since the input is sorted on the partitioning and
> bucketing columns, each reducer needs to keep only one record writer
> open at any time, thereby reducing the memory pressure on the reducers.
> This optimization scales well as the number of partitions and the number
> of columns per partition increase, at the cost of sorting on those
> columns.
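
To illustrate the idea in the description above, here is a minimal sketch that
counts how many record writers are open at the same time, with and without
sorting on the partition key. It is a toy simulation, not Hive code: the row
shape, the function names, and the sample data are all assumptions made for
illustration.

```python
from typing import List, Tuple

# Hypothetical row shape: (partition_key, value).
Row = Tuple[str, int]


def peak_writers_unsorted(rows: List[Row]) -> int:
    """Rows arrive in arbitrary order, so a writer is opened for every
    distinct partition and must stay open until the input is exhausted."""
    open_writers = set()
    peak = 0
    for key, _ in rows:
        open_writers.add(key)
        peak = max(peak, len(open_writers))
    return peak


def peak_writers_sorted(rows: List[Row]) -> int:
    """Rows are sorted on the partition key first, so the writer for the
    previous partition can be closed as soon as the key changes."""
    open_writers = set()
    peak = 0
    for key, _ in sorted(rows, key=lambda row: row[0]):
        if key not in open_writers:
            open_writers.clear()  # close the writer of the previous partition
            open_writers.add(key)
        peak = max(peak, len(open_writers))
    return peak


# Made-up sample input with three distinct partitions, interleaved.
ROWS: List[Row] = [
    ("2014-01-01", 1),
    ("2014-01-02", 2),
    ("2014-01-01", 3),
    ("2014-01-03", 4),
]

if __name__ == "__main__":
    print("unsorted peak:", peak_writers_unsorted(ROWS))
    print("sorted peak:", peak_writers_sorted(ROWS))
```

In this sketch the unsorted pass holds one writer per distinct partition, while
the sorted pass never holds more than one, which is the memory trade-off the
description makes in exchange for the sort.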