[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999218#comment-12999218
 ] 

MengWang commented on MAPREDUCE-2345:
-------------------------------------

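The JobTracker's memory is used mainly by TaskInProgress objects. We submitted 
a job with 100,087 tasks; the JobTracker's memory usage (sizes in bytes) was 
as follows:
  Class:         org.apache.hadoop.mapred.TaskInProgress
  Objects:       100,087
  Shallow size:  29,625,752
  Retained size: 325,065,944 (96%)
That is 296 bytes shallow and roughly 3.2 KB retained per TaskInProgress.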

Our optimizations are as follows:
(1) Reduce duplicated strings
   The JobTracker stores many duplicated strings, for example: the split class 
name, split locations, counter group names, counter names, display names, the 
jtIdentifier of JobID, and the jobdir of MapOutputFile. 
   Using a StringCache reduced memory by nearly 15%.
(2) Initialize Counters lazily
   TIPs with no task attempt assigned should not create Counters.
(3) Restructure the counters of completed TIPs
   When a task completes, its TIP grows because of its counters. To speed up 
counter updates and lookups, Counters uses a HashMap and a cache, which cost 
too much memory. So we separated the counter values from the Counters 
structure: all tasks share a CounterMap object, which maps <CounterGroupName, 
CounterName> -> index into a long array, and every TIP stores an array of its 
counter values.
   With this method, the JobTracker's memory was reduced by nearly 50%.
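
To make the three points above concrete, here is a minimal sketch of the idea, 
not the code from the patch. The class names JobTrackerMemorySketch and 
TaskCounterValues, and all method names, are hypothetical; StringCache and 
CounterMap borrow the names used above, but their shape here is an assumption. 
The group and counter names in main are only sample strings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; these are not the classes from the actual patch. */
class JobTrackerMemorySketch {

    /** Hypothetical StringCache: returns one canonical instance per distinct value. */
    static final class StringCache {
        private final Map<String, String> canonical = new HashMap<>();

        synchronized String intern(String s) {
            if (s == null) {
                return null;
            }
            String existing = canonical.putIfAbsent(s, s);
            return existing != null ? existing : s;
        }
    }

    /** Shared by all tasks: maps (group name, counter name) to a slot index. */
    static final class CounterMap {
        private final Map<String, Integer> slots = new HashMap<>();

        private static String key(String group, String counter) {
            return group + '\u0000' + counter;
        }

        /** Returns the slot for this counter, assigning the next free one if new. */
        synchronized int indexOf(String group, String counter) {
            String k = key(group, counter);
            Integer slot = slots.get(k);
            if (slot == null) {
                slot = slots.size();
                slots.put(k, slot);
            }
            return slot;
        }

        /** Returns the slot for this counter, or -1 if no task has recorded it. */
        synchronized int find(String group, String counter) {
            Integer slot = slots.get(key(group, counter));
            return slot == null ? -1 : slot;
        }
    }

    /** Per-task counter values: a long[] indexed through the shared CounterMap. */
    static final class TaskCounterValues {
        private final CounterMap shared;
        private long[] values; // stays null until the first counter is recorded

        TaskCounterValues(CounterMap shared) {
            this.shared = shared;
        }

        void set(String group, String counter, long value) {
            int slot = shared.indexOf(group, counter);
            if (values == null) {
                values = new long[slot + 1];
            } else if (slot >= values.length) {
                values = Arrays.copyOf(values, slot + 1);
            }
            values[slot] = value;
        }

        /** Returns the recorded value, or 0 if this task never set the counter. */
        long get(String group, String counter) {
            int slot = shared.find(group, counter);
            return (values == null || slot < 0 || slot >= values.length) ? 0L : values[slot];
        }

        boolean isAllocated() {
            return values != null;
        }
    }

    public static void main(String[] args) {
        StringCache cache = new StringCache();
        String a = cache.intern(new String("FileSystemCounters"));
        String b = cache.intern(new String("FileSystemCounters"));
        System.out.println("same instance: " + (a == b));

        CounterMap shared = new CounterMap();
        TaskCounterValues t1 = new TaskCounterValues(shared);
        TaskCounterValues t2 = new TaskCounterValues(shared);
        t1.set(a, "HDFS_BYTES_READ", 1024L);
        System.out.println("t1 value: " + t1.get(b, "HDFS_BYTES_READ"));
        System.out.println("t2 allocated: " + t2.isAllocated());
    }
}
```

In this sketch the per-task cost is one long[] plus a reference to the shared 
map, instead of a HashMap and cache per task, and a task that has not recorded 
any counter holds no array at all.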

> Optimize jobtracker's memory usage
> -------------------------------------
>
>                 Key: MAPREDUCE-2345
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2345
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.21.0
>            Reporter: MengWang
>              Labels: hadoop
>             Fix For: 0.23.0
>
>         Attachments: jt-memory-useage.bmp
>
>
> Too many tasks eat up a considerable amount of the JobTracker's heap space. 
> According to our observation, a 50GB heap can support up to 5,000,000 tasks, 
> so we should optimize the JobTracker's memory usage to support more jobs and 
> tasks. A YourKit Java profile shows that counters, duplicate strings, and 
> Task objects waste too much memory. Our optimizations around these three 
> points reduced the JobTracker's memory usage to 1/3. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
