[ https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016515#comment-13016515 ]

Namit Jain commented on HIVE-2082:
----------------------------------

Minor comments on Review Board.

> Reduce memory consumption in preparing MapReduce job
> ----------------------------------------------------
>
>                 Key: HIVE-2082
>                 URL: https://issues.apache.org/jira/browse/HIVE-2082
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> The Hive client consumes a lot of memory when the number of input partitions
> is large. One reason is that each partition maintains its own list of
> FieldSchema objects, which is intended to support schema evolution. However,
> these lists are not currently used: Hive applies the table-level schema to all
> partitions. This will be fixed in HIVE-2050, which should cut the memory
> consumed by this part almost in half (from 1.2GB to 700MB for 20k partitions).
> Another large chunk of memory is consumed in the MapReduce job setup phase,
> when a PartitionDesc is created from each Partition object. Each PartitionDesc
> maintains a Properties object containing the full list of columns and their
> types. For the same reason, these should be identical to the table-level
> schema. Deserializer initialization also takes a large amount of memory and
> should be avoided. My initial testing of these optimizations cut memory
> consumption by more than half (from 700MB to 300MB for 20k partitions).
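To make the PartitionDesc point concrete, here is a minimal Java sketch of the difference between giving every partition its own copy of the column/type properties and letting all partitions reference one table-level Properties object. It is not Hive's code: the class PartDescSketch, the helper methods, and the property keys "columns" and "columns.types" are hypothetical stand-ins chosen only for illustration. The sketch counts distinct Properties objects by identity, which is what drives the per-partition memory cost when there are 20k partitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class SharedSchemaSketch {

    // Hypothetical stand-in for a partition descriptor; NOT Hive's PartitionDesc.
    static final class PartDescSketch {
        final String partName;
        final Properties schemaProps;

        PartDescSketch(String partName, Properties schemaProps) {
            this.partName = partName;
            this.schemaProps = schemaProps;
        }
    }

    // Table-level schema; the keys "columns" and "columns.types" are assumptions.
    static Properties tableSchema() {
        Properties p = new Properties();
        p.setProperty("columns", "id,name,ds");
        p.setProperty("columns.types", "bigint:string:string");
        return p;
    }

    // Memory-hungry approach: every partition gets its own copy of columns/types.
    static List<PartDescSketch> copyPerPartition(Properties table, int n) {
        List<PartDescSketch> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            Properties copy = new Properties();
            copy.putAll(table);
            out.add(new PartDescSketch("ds=" + i, copy));
        }
        return out;
    }

    // Optimized approach: all partitions reference the single table-level object.
    static List<PartDescSketch> shareTableSchema(Properties table, int n) {
        List<PartDescSketch> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(new PartDescSketch("ds=" + i, table));
        }
        return out;
    }

    // Counts distinct Properties objects by reference identity, not equals().
    static int distinctSchemaObjects(List<PartDescSketch> parts) {
        Set<Properties> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (PartDescSketch d : parts) {
            seen.add(d.schemaProps);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Properties table = tableSchema();
        int n = 20_000;
        System.out.println("copied: " + distinctSchemaObjects(copyPerPartition(table, n)));
        System.out.println("shared: " + distinctSchemaObjects(shareTableSchema(table, n)));
    }
}
```

Running main prints 20000 distinct objects for the copying approach and 1 for the shared approach. One trade-off of sharing: Properties is mutable, so any code that modifies one partition's properties would silently change them for every partition; a real implementation would need to treat the shared object as read-only.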

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
