Reduce memory consumption in preparing MapReduce job
----------------------------------------------------

                 Key: HIVE-2082
                 URL: https://issues.apache.org/jira/browse/HIVE-2082
             Project: Hive
          Issue Type: Improvement
            Reporter: Ning Zhang
            Assignee: Ning Zhang


The Hive client side consumes a lot of memory when the number of input partitions is 
large. One reason is that each partition maintains its own list of FieldSchema 
objects, which is intended to support schema evolution. However, these lists are not 
currently used, and Hive uses the table-level schema for all partitions. This will be 
fixed in HIVE-2050, which should reduce the memory consumption from this part by 
almost half (1.2GB to 700MB for 20k partitions). 
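To illustrate the kind of sharing this implies, here is a minimal sketch in plain Java. The class names (ColSchema, PartitionInfo) are simplified stand-ins for the metastore objects, not the actual Hive implementation; the point is only that every partition references the single table-level schema list instead of holding its own copy:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a FieldSchema entry.
class ColSchema {
    final String name;
    final String type;
    ColSchema(String name, String type) { this.name = name; this.type = type; }
}

// Hypothetical, simplified stand-in for a partition's metadata.
class PartitionInfo {
    // The partition points at the shared table-level schema instead of a per-partition copy.
    final List<ColSchema> cols;
    PartitionInfo(List<ColSchema> tableCols) { this.cols = tableCols; }
}

public class SharedSchemaDemo {
    public static void main(String[] args) {
        // One table-level schema, built once.
        List<ColSchema> tableCols = List.of(
            new ColSchema("ds", "string"),
            new ColSchema("value", "bigint"));

        // 20k partitions all reference the same list; no per-partition FieldSchema copies.
        List<PartitionInfo> partitions = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            partitions.add(new PartitionInfo(tableCols));
        }
        System.out.println("partitions: " + partitions.size()
            + ", distinct schema objects: 1");
    }
}
{code}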

Another large chunk of memory is consumed in the MapReduce job setup phase, 
when a PartitionDesc is created from each Partition object. Each PartitionDesc 
maintains a Properties object that contains a full list of column names and types. 
For the same reason as above, these should be identical to the table-level schema 
and can be shared. The deserializer initialization also takes a large amount of 
memory and should be avoided at setup time. My initial testing of these 
optimizations cut the memory consumption in half (700MB to 300MB for 20k partitions). 
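A rough sketch of the two ideas, shared column/type Properties plus a lazily initialized deserializer, is below. The class and method names (PartDesc, getDeserializer) are hypothetical and only illustrate the approach; the deserializer construction is a placeholder rather than the real SerDe code:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for PartitionDesc, for illustration only.
class PartDesc {
    // Shared, table-level properties (columns, column types) instead of a per-partition copy.
    private final Properties tableProps;
    // Deserializer is created on first use, not during job setup.
    private Object deserializer;

    PartDesc(Properties tableProps) { this.tableProps = tableProps; }

    Properties getProperties() { return tableProps; }

    synchronized Object getDeserializer() {
        if (deserializer == null) {
            // Placeholder for real deserializer construction; deferred until a task needs it.
            deserializer = new Object();
        }
        return deserializer;
    }
}

public class LazyPartitionDescDemo {
    public static void main(String[] args) {
        // Build the column/type properties once, at the table level.
        Properties tableProps = new Properties();
        tableProps.setProperty("columns", "ds,value");
        tableProps.setProperty("columns.types", "string:bigint");

        // Every partition descriptor shares the same Properties object,
        // and no deserializer is instantiated during job setup.
        List<PartDesc> descs = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            descs.add(new PartDesc(tableProps));
        }
        System.out.println("created " + descs.size() + " partition descriptors, "
            + "deserializers initialized: 0");
    }
}
{code}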

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
