[ https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999218#comment-12999218 ]
MengWang commented on MAPREDUCE-2345:
-------------------------------------

The JobTracker's memory is mainly used by TaskInProgress objects. We submitted a job with 100,087 tasks, and the JT's memory usage was as follows:

org.apache.hadoop.mapred.TaskInProgress
  Objects:       100,087
  Shallow size:  29,625,752
  Retained size: 325,065,944 (96%)

Our optimization work is as follows:

(1) Reduce duplicated strings
The JobTracker stores too many duplicated strings, for example: splitClass name, split locations, counter group names, counter names, display names, the jtIdentifier of JobID, and the jobdir of MapOutputFile. Using a StringCache reduced memory usage by nearly 15%.

(2) Counters should be lazily initialized
TIPs with no task attempt assigned should not create Counters.

(3) Restructure completed TIPs' counters
When a task completes, its TIP becomes bigger because of its counters. To speed up counter updates and lookups, Counters uses a HashMap and a cache, which cost too much memory. So we separated the counter values from the Counters structure: all tasks share one CounterMap object, which maps <CounterGroupName, CounterName> -> index into a long array, and every TIP stores an array of its own counter values. With this method, the JT's memory usage dropped by nearly 50%.

> Optimize jobtracker's memory usage
> ----------------------------------
>
>                 Key: MAPREDUCE-2345
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2345
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.21.0
>            Reporter: MengWang
>              Labels: hadoop
>             Fix For: 0.23.0
>
>         Attachments: jt-memory-useage.bmp
>
>
> Too many tasks will eat up a considerable amount of the JobTracker's heap space.
> According to our observation, a 50GB heap can support up to 5,000,000 tasks,
> so we should optimize the JobTracker's memory usage to handle more jobs and tasks.
> A YourKit Java profile shows that counters, duplicate strings, and Task objects
> waste too much memory. Our optimizations around these three points reduced the
> JobTracker's memory usage to 1/3.
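
For readers trying to picture the three points in the comment above, here is a rough sketch of how a string cache, lazily allocated counter storage, and a shared <group, counter> -> index map might fit together. It is not taken from the attached work or from the Hadoop source: the class names JtMemorySketch, StringCache, CounterIndex, and TaskCounterValues, and all of their methods, are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Hadoop or from the patch.
final class JtMemorySketch {

    /** Point (1): hand back one canonical instance per distinct string value. */
    static final class StringCache {
        private final Map<String, String> cache = new HashMap<>();

        synchronized String intern(String s) {
            if (s == null) {
                return null;
            }
            String existing = cache.putIfAbsent(s, s);
            return existing != null ? existing : s;
        }
    }

    /** Point (3): shared by all tasks; maps (group, counter) to an array slot. */
    static final class CounterIndex {
        private final Map<String, Integer> slots = new HashMap<>();

        private static String key(String group, String counter) {
            // Assumes names never contain the NUL character.
            return group + '\0' + counter;
        }

        /** Returns the slot for this counter, registering it if it is new. */
        synchronized int slotFor(String group, String counter) {
            String k = key(group, counter);
            Integer slot = slots.get(k);
            if (slot == null) {
                slot = slots.size();
                slots.put(k, slot);
            }
            return slot;
        }

        /** Returns the slot for this counter, or -1 if it was never registered. */
        synchronized int find(String group, String counter) {
            Integer slot = slots.get(key(group, counter));
            return slot == null ? -1 : slot;
        }

        synchronized int size() {
            return slots.size();
        }
    }

    /** Points (2) and (3): per-task values are a bare long[], created on first use. */
    static final class TaskCounterValues {
        private final CounterIndex index;
        private long[] values; // stays null until the first increment

        TaskCounterValues(CounterIndex index) {
            this.index = index;
        }

        boolean isAllocated() {
            return values != null;
        }

        void increment(String group, String counter, long delta) {
            int slot = index.slotFor(group, counter);
            if (values == null) {
                values = new long[slot + 1];
            } else if (slot >= values.length) {
                values = Arrays.copyOf(values, slot + 1);
            }
            values[slot] += delta;
        }

        long get(String group, String counter) {
            int slot = index.find(group, counter);
            if (values == null || slot < 0 || slot >= values.length) {
                return 0L;
            }
            return values[slot];
        }
    }
}
```

In this sketch each task pays for one long per counter it has actually touched, while the group and counter name strings and the hash map live once in the shared index. Whether the real change handles concurrency, display names, or array sizing this way is not stated in the comment.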