[ https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-2082:
---------------------------------
Component/s: Query Processor
Fix Version/s: 0.8.0
> Reduce memory consumption in preparing MapReduce job
> ----------------------------------------------------
>
> Key: HIVE-2082
> URL: https://issues.apache.org/jira/browse/HIVE-2082
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Fix For: 0.8.0
>
> Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> The Hive client consumes a lot of memory when the number of input partitions
> is large. One reason is that each partition maintains a list of FieldSchema
> objects intended to handle schema evolution. However, these lists are not
> currently used: Hive uses the table-level schema for all partitions. This will
> be fixed in HIVE-2050, which will reduce the memory consumed by this part by
> almost half (1.2GB to 700MB for 20k partitions).
> Another large chunk of memory is consumed in the MapReduce job setup phase,
> when a PartitionDesc is created from each Partition object. Each PartitionDesc
> holds a Properties object containing the full list of columns and their
> types. For the same reason as above, these are always identical to the
> table-level schema. Deserializer initialization also takes a large amount of
> memory and should be avoided. In my initial testing, these optimizations cut
> memory consumption by more than half (700MB to 300MB for 20k partitions).
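The idea of letting every partition reuse the table-level schema instead of carrying its own copy can be sketched with `java.util.Properties`, whose `Properties(Properties defaults)` constructor layers a small per-partition table over a shared one. This is a minimal illustration under stated assumptions, not the HIVE-2082 patch: `PartitionDescSketch`, the `location` key, and the sample paths are hypothetical stand-ins, not Hive's `PartitionDesc` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class SharedSchemaSketch {
    // Hypothetical stand-in for a per-partition descriptor; NOT Hive's PartitionDesc.
    static final class PartitionDescSketch {
        final Properties props;

        PartitionDescSketch(String location, Properties tableProps) {
            // Partition-specific entries live in a small table of their own;
            // lookups that miss fall through to the shared table-level defaults,
            // so the column list and types are never copied per partition.
            this.props = new Properties(tableProps);
            this.props.setProperty("location", location); // hypothetical key
        }
    }

    public static void main(String[] args) {
        // Table-level schema, stored once.
        Properties tableProps = new Properties();
        tableProps.setProperty("columns", "id,name,ds");
        tableProps.setProperty("columns.types", "int:string:string");

        List<PartitionDescSketch> parts = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            parts.add(new PartitionDescSketch("/warehouse/t/ds=" + i, tableProps));
        }

        PartitionDescSketch p = parts.get(42);
        System.out.println(p.props.getProperty("columns"));  // id,name,ds (from defaults)
        System.out.println(p.props.getProperty("location")); // /warehouse/t/ds=42
        // size() counts only the partition's own entries, not the defaults.
        System.out.println(p.props.size());                  // 1
    }
}
```

One consequence of this design is that a later change to the shared table-level `Properties` is visible to every partition, which matches the assumption above that partitions always use the table schema, but would be wrong once per-partition schema evolution is honored.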
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira