[
https://issues.apache.org/jira/browse/HADOOP-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amar Kamat updated HADOOP-4766:
-------------------------------
Attachment: HADOOP-4766-v2.8-0.18.patch
HADOOP-4766-v2.8-0.19.patch
HADOOP-4766-v2.8.patch
Attaching a patch incorporating Devaraj's offline comment. Devaraj pointed out
that if the JobTracker is running low on memory and there are jobs eligible for
retirement, the old patch would retire those jobs while ignoring the memory
issue and go back to sleep; only in the next pass would it detect that other
completed jobs need to be cleared. The current patch first clears old jobs,
and if the JobTracker is still low on memory, it clears more completed jobs
until memory is under control.
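A minimal sketch of the retirement strategy described above, for illustration only: it is not the actual JobTracker code, and all names here (RetireSketch, memoryLow, the deque of finish times, the size-based memory stand-in) are hypothetical. Age-based retirement runs first; if memory is still low, the oldest completed jobs keep getting cleared in the same pass.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the two-phase retirement pass; not the real
// JobTracker implementation. Jobs are represented by their finish
// timestamps, oldest first.
public class RetireSketch {
    private final Deque<Long> completedJobs = new ArrayDeque<>();
    private final long maxAgeMs;   // age threshold for normal retirement
    private final int memoryCap;   // stand-in for a real heap-usage check

    public RetireSketch(long maxAgeMs, int memoryCap) {
        this.maxAgeMs = maxAgeMs;
        this.memoryCap = memoryCap;
    }

    public void add(long finishTime) {
        completedJobs.addLast(finishTime);
    }

    // Stand-in for checking JobTracker heap usage.
    private boolean memoryLow() {
        return completedJobs.size() > memoryCap;
    }

    /**
     * First retire jobs older than maxAgeMs; then, if memory is still
     * low, keep retiring the oldest completed jobs until it is not.
     * Returns the number of jobs retired in this pass.
     */
    public int retire(long now) {
        int retired = 0;
        // Phase 1: normal age-based retirement.
        while (!completedJobs.isEmpty()
               && now - completedJobs.peekFirst() > maxAgeMs) {
            completedJobs.removeFirst();
            retired++;
        }
        // Phase 2: keep clearing until memory is under control,
        // instead of going back to sleep and waiting for the next pass.
        while (memoryLow() && !completedJobs.isEmpty()) {
            completedJobs.removeFirst();
            retired++;
        }
        return retired;
    }

    public int size() {
        return completedJobs.size();
    }
}
```

The point of the second loop is that a single pass never exits while memory is still low and retirable jobs remain, which is the behavior change the patch comment describes.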
> Hadoop performance degrades significantly as more and more jobs complete
> ------------------------------------------------------------------------
>
> Key: HADOOP-4766
> URL: https://issues.apache.org/jira/browse/HADOOP-4766
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.18.2, 0.19.0
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Priority: Blocker
> Fix For: 0.18.3, 0.19.1, 0.20.0
>
> Attachments: HADOOP-4766-v1.patch, HADOOP-4766-v2.4.patch,
> HADOOP-4766-v2.6.patch, HADOOP-4766-v2.7-0.18.patch,
> HADOOP-4766-v2.7-0.19.patch, HADOOP-4766-v2.7.patch,
> HADOOP-4766-v2.8-0.18.patch, HADOOP-4766-v2.8-0.19.patch,
> HADOOP-4766-v2.8.patch, map_scheduling_rate.txt
>
>
> When I ran the gridmix 2 benchmark load on a fresh cluster of 500 nodes with
> hadoop trunk,
> the gridmix load, consisting of 202 map/reduce jobs of various sizes,
> completed in 32 minutes.
> Then I ran the same set of jobs on the same cluster; they completed in 43
> minutes.
> When I ran them a third time, it took (almost) forever --- the job tracker
> became non-responsive.
> The job tracker's heap size was set to 2GB.
> The cluster is configured to keep up to 500 jobs in memory.
> The job tracker kept one cpu busy all the time. Looks like it was due to GC.
> I believe releases 0.18 and 0.19 have similar behavior.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.