[jira] Commented: (HADOOP-3813) RPC queue overload of JobTracker

Hadoop QA (JIRA) Wed, 23 Jul 2008 19:08:54 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616328#action_12616328 ]


Hadoop QA commented on HADOOP-3813:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12386711/patch-3813.txt
  against trunk revision 679202.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2929/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2929/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2929/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2929/console

This message is automatically generated.

> RPC queue overload of JobTracker
> --------------------------------
>
>                 Key: HADOOP-3813
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3813
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.1
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-3813.txt
>
>
> On a cluster with about 1700 nodes, when a job with about 100,000 maps and 
> 10,000 reduces completed, the JobTracker, even with 80 handlers, could not 
> handle the rpc call load during promotion of the job, such that at the end, 
> because of the discarded heartbeats, the JobTracker lost nearly all 
> TaskTrackers (about 10 TaskTrackers left). Promotion took more than 40 
> minutes.
> The TaskTrackers reconnected and everything recovered, but this might have been just luck.
> Shouldn't there be adaptive throttling of the rate of heartbeats and 
> TaskCompletionEvents?
> Sample messages:
> 2008-07-22 18:21:55,831 WARN org.apache.hadoop.ipc.Server: Call queue 
> overflow discarding oldest call heartbeat([EMAIL PROTECTED], false, true, 
> 18137) from xxx
> 2008-07-22 18:21:55,834 WARN org.apache.hadoop.ipc.Server: Call queue overflow 
> discarding oldest call getTaskCompletionEvents(job_200807190635_0012, 119567, 
> 50) from yyy
> ...
> 2008-07-22 19:02:28,821 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 1 on 9020, call heartbeat([EMAIL PROTECTED], false, true, 18199) from zzz: 
> discarded for being too old (40936)
> 2008-07-22 19:02:28,821 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 34 on 9020, call getTaskCompletionEvents(job_200807190635_0012, 119567, 50) 
> from uuu: discarded for being too old (40978)
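
The adaptive throttling asked about in the description could look roughly like the sketch below. It is only an illustration under stated assumptions, not the attached patch and not Hadoop code: the class HeartbeatThrottle, its method, and the 0.75 / 0.25 queue-fill thresholds are all hypothetical. The idea is that the server doubles the heartbeat interval it hands back to clients while its call queue is nearly full, and shortens it again in small steps once the queue drains.

```java
// Hypothetical sketch: none of these names exist in Hadoop.
final class HeartbeatThrottle {
    private final long minIntervalMs;   // interval used when the server is idle
    private final long maxIntervalMs;   // upper bound while overloaded
    private final long recoveryStepMs;  // how fast the interval shrinks again
    private long currentIntervalMs;

    HeartbeatThrottle(long minIntervalMs, long maxIntervalMs, long recoveryStepMs) {
        if (minIntervalMs <= 0 || maxIntervalMs < minIntervalMs || recoveryStepMs <= 0) {
            throw new IllegalArgumentException("invalid throttle bounds");
        }
        this.minIntervalMs = minIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.recoveryStepMs = recoveryStepMs;
        this.currentIntervalMs = minIntervalMs;
    }

    /**
     * Returns the interval clients should wait before their next heartbeat.
     * queueFill is the fraction of the RPC call queue in use (0.0 to 1.0).
     * The thresholds below are assumed values chosen for illustration.
     */
    synchronized long nextIntervalMs(double queueFill) {
        if (queueFill > 0.75) {
            // Queue close to overflowing: back off quickly by doubling.
            currentIntervalMs = Math.min(maxIntervalMs, currentIntervalMs * 2);
        } else if (queueFill < 0.25) {
            // Queue mostly empty: recover gradually toward the minimum.
            currentIntervalMs = Math.max(minIntervalMs, currentIntervalMs - recoveryStepMs);
        }
        // Between the thresholds the interval is left unchanged.
        return currentIntervalMs;
    }
}
```

With bounds of 3000 ms and 60000 ms and a 1000 ms recovery step, two consecutive overloaded readings would raise the interval from 3000 ms to 12000 ms, and a later quiet reading would lower it to 11000 ms. Doubling under load and recovering in small steps is one way to avoid oscillating between overload and idle; other policies are equally plausible.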

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
