[ https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016435#comment-13016435 ]
Edward Capriolo commented on HIVE-2082:
---------------------------------------

I am curious as to how this is compatible with https://issues.apache.org/jira/browse/HIVE-1913.

> Reduce memory consumption in preparing MapReduce job
> ----------------------------------------------------
>
>                 Key: HIVE-2082
>                 URL: https://issues.apache.org/jira/browse/HIVE-2082
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> The Hive client consumes a lot of memory when the number of input partitions
> is large. One reason is that each partition maintains a list of FieldSchema
> objects, which are intended to deal with schema evolution. However, they are
> not currently used, and Hive uses the table-level schema for all partitions.
> This will be fixed in HIVE-2050. The memory consumed by this part will be
> reduced by almost half (1.2GB to 700MB for 20k partitions).
> Another large chunk of memory is consumed in the MapReduce job setup phase,
> when a PartitionDesc is created from each Partition object. A property object
> is maintained in PartitionDesc which contains a full list of columns and
> types. For the same reason, these should be the same as in the table-level
> schema. Also, the deserializer initialization takes a large amount of memory,
> which should be avoided. My initial testing of these optimizations cut the
> memory consumption in half (700MB to 300MB for 20k partitions).
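The sharing idea in the issue description can be illustrated with a minimal sketch. It is not Hive code and is not taken from the attached patches: `TableSchema`, `PartitionInfo`, `build_with_copies`, `build_shared`, and `distinct_schemas` are hypothetical names chosen for illustration, and Python is used only to keep the example short. The sketch assumes, as the description states, that every partition currently uses the same column list as the table.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-ins for illustration only; these are not Hive's classes.
Column = Tuple[str, str]  # (column name, type name)


@dataclass
class TableSchema:
    columns: List[Column]


@dataclass
class PartitionInfo:
    name: str
    schema: TableSchema


def build_with_copies(table: TableSchema, names: List[str]) -> List[PartitionInfo]:
    # Each partition gets its own copy of the column list,
    # so memory grows with (partitions x columns).
    return [PartitionInfo(n, TableSchema(list(table.columns))) for n in names]


def build_shared(table: TableSchema, names: List[str]) -> List[PartitionInfo]:
    # Each partition holds a reference to the single table-level schema,
    # so the column list is stored once regardless of partition count.
    return [PartitionInfo(n, table) for n in names]


def distinct_schemas(parts: List[PartitionInfo]) -> int:
    # Count how many separate schema objects are kept alive by the partitions.
    return len({id(p.schema) for p in parts})


if __name__ == "__main__":
    table = TableSchema([("id", "bigint"), ("payload", "string")])
    names = [f"ds={i}" for i in range(20000)]
    print(distinct_schemas(build_with_copies(table, names)))  # 20000
    print(distinct_schemas(build_shared(table, names)))       # 1
```

The shared variant is only safe while no partition needs a schema that differs from the table's. Once per-partition schema evolution is actually used, a partition with a different column list would need its own schema object.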