[
https://issues.apache.org/jira/browse/HADOOP-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664806#action_12664806
]
Devaraj Das commented on HADOOP-4766:
-------------------------------------
After having done some experiments and talking offline to some folks, I have
begun to think that FOR NOW the better way to solve this issue is to keep
track of the total number of tasks held in memory (as Runping & Koji had
suggested earlier). Values like freeMemory and totalMemory seem to be quite
dependent on the GC in use. It is also not guaranteed that usedMemory will go
down once some jobs are removed, since it is entirely up to the JVM when to
actually do a full GC. Until that happens, we might end up purging every
newly completed job even though memory is available (since the
totalMemory - freeMemory check in the patch might still show memory usage
above the threshold). That would mean users start hitting the JobHistory for
viewing jobs much more frequently, and JobHistory file loading/parsing is
quite an expensive operation that should be avoided if possible.
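To illustrate the count-based idea, here is a minimal sketch of capping the
total number of retained tasks instead of consulting
Runtime.freeMemory()/totalMemory(). All names here (CompletedJobCache,
maxRetainedTasks, JobInfo) are hypothetical and not taken from the actual
patch; this is just one way such a budget could work.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: retain completed jobs until the sum of their task
// counts exceeds a fixed budget, then evict the oldest jobs. This avoids
// relying on heap statistics that are only accurate after a full GC.
public class CompletedJobCache {
    static class JobInfo {
        final String jobId;
        final int numTasks;
        JobInfo(String jobId, int numTasks) {
            this.jobId = jobId;
            this.numTasks = numTasks;
        }
    }

    private final Deque<JobInfo> jobs = new ArrayDeque<>();
    private final int maxRetainedTasks;
    private int totalTasks = 0;

    public CompletedJobCache(int maxRetainedTasks) {
        this.maxRetainedTasks = maxRetainedTasks;
    }

    // Record a newly completed job; purge the oldest retained jobs
    // until the total task count fits within the budget again.
    public void add(String jobId, int numTasks) {
        jobs.addLast(new JobInfo(jobId, numTasks));
        totalTasks += numTasks;
        while (totalTasks > maxRetainedTasks && jobs.size() > 1) {
            JobInfo evicted = jobs.removeFirst();
            totalTasks -= evicted.numTasks;
        }
    }

    public int retainedTasks() { return totalTasks; }
    public int retainedJobs() { return jobs.size(); }
}
```

The deterministic counter makes eviction decisions independent of GC timing,
which is the point of the suggestion above.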
As a follow-up JIRA, we might move the JobHistory server to a completely
separate process outside the JobTracker and always purge completed jobs
immediately. That would keep the UI work of serving completed jobs entirely
outside the JobTracker and should really help the JobTracker overall.
> Hadoop performance degrades significantly as more and more jobs complete
> ------------------------------------------------------------------------
>
> Key: HADOOP-4766
> URL: https://issues.apache.org/jira/browse/HADOOP-4766
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.18.2, 0.19.0
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Priority: Blocker
> Attachments: HADOOP-4766-v1.patch, HADOOP-4766-v2.10.patch,
> HADOOP-4766-v2.4.patch, HADOOP-4766-v2.6.patch, HADOOP-4766-v2.7-0.18.patch,
> HADOOP-4766-v2.7-0.19.patch, HADOOP-4766-v2.7.patch,
> HADOOP-4766-v2.8-0.18.patch, HADOOP-4766-v2.8-0.19.patch,
> HADOOP-4766-v2.8.patch, map_scheduling_rate.txt
>
>
> When I ran the gridmix 2 benchmark load on a fresh cluster of 500 nodes with
> hadoop trunk,
> the gridmix load, consisting of 202 map/reduce jobs of various sizes,
> completed in 32 minutes.
> Then I ran the same set of jobs on the same cluster; they completed in 43
> minutes.
> When I ran them a third time, it took (almost) forever --- the job tracker
> became non-responsive.
> The job tracker's heap size was set to 2GB.
> The cluster is configured to keep up to 500 jobs in memory.
> The job tracker kept one CPU busy all the time. Looks like it was due to GC.
> I believe releases 0.18 and 0.19 have similar behavior.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.