[ 
https://issues.apache.org/jira/browse/HADOOP-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HADOOP-1651:
--------------------------------

    Attachment: 1651.2.patch

This patch incorporates some offline comments from Owen:
1) An InterruptedException now causes the progress reporting thread to exit. In 
the current code base, the exception is simply ignored.
2) The call to defaultConf.addFinalResource in TaskTracker.java has been put 
back. Removing it might have some non-obvious implications for the framework's 
ipc module, and we might break something by doing so.
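
For illustration only, here is a minimal sketch of the behaviour described in 
point 1. It is not the code in 1651.2.patch; the class name, the counter, and 
the interval parameter are all hypothetical. It shows a reporting loop that 
treats an interrupt as a request to stop rather than ignoring it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not the code in 1651.2.patch.
class InterruptibleReporter implements Runnable {
    private final long intervalMs;
    // Counts simulated reports so the behaviour can be observed.
    final AtomicInteger reports = new AtomicInteger();

    InterruptibleReporter(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    @Override
    public void run() {
        while (true) {
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException ie) {
                // Treat the interrupt as a request to stop, instead of
                // swallowing it and continuing to loop.
                return;
            }
            reports.incrementAndGet(); // stand-in for a real progress report
        }
    }
}
```

With this shape, calling interrupt() on the reporting thread is enough to shut 
it down cleanly.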

> Some improvements in progress reporting
> ---------------------------------------
>
>                 Key: HADOOP-1651
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1651
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.15.0
>
>         Attachments: 1651.1.patch, 1651.2.patch, 1651.patch
>
>
> Some improvements that can be done:
> 1) The progress reporting interval can be made slightly larger. It is 
> currently 1 second. I propose making it 3 seconds to reduce the load on the 
> TaskTracker.
> 2) Progress reports can potentially be missed. In the loop, if the first 
> attempt at reporting progress doesn't go through, it is not retried. The 
> next communication will be a 'ping'.
> 3) If there is an exception while reporting progress or doing a ping, the 
> client should sleep for some time before retrying.
> 4) The TaskUmbilicalProtocol client can always stay connected to the server. 
> Currently, the default idle timeout on the IPC client is set to 1000 msec 
> (this means that the client will disconnect if the connection has been idle 
> for 1000 msec). This might lead to unnecessary tearing-down/setting-up of 
> connections for the TaskUmbilicalProtocol and can be avoided by having a high 
> idle timeout for this protocol. The idea behind having the idle timeout was 
> to not hold on to server connections unnecessarily and hence be more scalable 
> when there are thousands of clients, especially for those protocols 
> involving the JT and the NameNode. We don't run into scalability issues with 
> TaskUmbilicalProtocol, since it is limited to a few Tasks and the 
> corresponding TaskTracker.
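
Items 2 and 3 of the quoted description can be sketched roughly as follows. 
This is an assumption-laden illustration, not the patch: the Umbilical 
interface is a hypothetical stand-in for the real TaskUmbilicalProtocol, and 
RetryingReporter, reportOnce, and retrySleepMs are invented names. The idea is 
that an unsent progress value stays pending until a call succeeds, and that a 
failed call is followed by a sleep before the next attempt.

```java
import java.io.IOException;

// Hypothetical stand-in for the task-to-TaskTracker calls; this is not the
// real TaskUmbilicalProtocol interface.
interface Umbilical {
    void progress(float value) throws IOException;
    void ping() throws IOException;
}

// Hypothetical illustration of retrying a missed progress report.
class RetryingReporter {
    private final Umbilical umbilical;
    private final long retrySleepMs;
    private boolean pending; // true while a progress value is still unsent
    private float latest;

    RetryingReporter(Umbilical umbilical, long retrySleepMs) {
        this.umbilical = umbilical;
        this.retrySleepMs = retrySleepMs;
    }

    synchronized void setProgress(float value) {
        latest = value;
        pending = true;
    }

    // One iteration of the reporting loop; returns true if the call succeeded.
    boolean reportOnce() throws InterruptedException {
        boolean sendProgress;
        float value;
        synchronized (this) {
            sendProgress = pending;
            value = latest;
        }
        try {
            if (sendProgress) {
                umbilical.progress(value);
                synchronized (this) {
                    // Clear the flag only if no newer value arrived meanwhile.
                    if (latest == value) {
                        pending = false;
                    }
                }
            } else {
                umbilical.ping();
            }
            return true;
        } catch (IOException e) {
            // The flag stays set, so the next iteration retries the progress
            // report instead of falling back to a ping. Sleep before retrying.
            Thread.sleep(retrySleepMs);
            return false;
        }
    }
}
```

Because the pending flag is cleared only after a successful call, a failed 
progress report is retried on the next iteration rather than being replaced by 
a ping.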

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
