[ https://issues.apache.org/jira/browse/HADOOP-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515943 ]

Arun C Murthy commented on HADOOP-1651:
---------------------------------------

I don't think we need to log 'Ping thread started' at *info* level, do we?

Else, +1 for the patch.

Somewhat related gripe: it might be useful to stop the {{Task}} -> 
{{TaskTracker}} communication thread on task completion, i.e. in {{Task.done}}. 
What do others think? I'm fine with this being done as part of a separate 
issue, of course.
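
A minimal sketch of the suggestion above, assuming the Task side runs a dedicated daemon thread that pings the TaskTracker at a fixed interval. {{Umbilical}}, {{TaskCommunicator}} and its {{done()}} hook are hypothetical names for illustration, not Hadoop's actual classes. Setting a volatile flag, interrupting the sleep and joining the thread is one common way to shut such a thread down cleanly once the task finishes.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the Task -> TaskTracker RPC; not Hadoop's real interface. */
interface Umbilical {
  void ping() throws Exception;
}

/** Sketch of a communication thread that a (hypothetical) Task.done() shuts down. */
class TaskCommunicator {
  private final Umbilical umbilical;
  private final long intervalMs;
  private final Thread thread;
  private volatile boolean taskDone = false;

  TaskCommunicator(Umbilical umbilical, long intervalMs) {
    this.umbilical = umbilical;
    this.intervalMs = intervalMs;
    this.thread = new Thread(this::run, "communication thread");
    this.thread.setDaemon(true); // never keeps the task JVM alive on its own
  }

  void start() {
    thread.start();
  }

  boolean isAlive() {
    return thread.isAlive();
  }

  private void run() {
    while (!taskDone) {
      try {
        Thread.sleep(intervalMs);
        if (!taskDone) {
          umbilical.ping();
        }
      } catch (InterruptedException e) {
        // done() interrupts the sleep; the loop condition decides whether to exit
      } catch (Exception e) {
        // a failed ping is simply retried on the next iteration
      }
    }
  }

  /** What a hypothetical Task.done() would call: stop pinging and wait for the thread. */
  void done() throws InterruptedException {
    taskDone = true;
    thread.interrupt();
    thread.join();
  }
}

class TaskCommunicatorDemo {
  public static void main(String[] args) throws Exception {
    AtomicInteger pings = new AtomicInteger();
    TaskCommunicator comm = new TaskCommunicator(pings::incrementAndGet, 10);
    comm.start();
    Thread.sleep(100);
    comm.done(); // in the proposal this would happen inside Task.done
    System.out.println("thread alive after done: " + comm.isAlive());
  }
}
```

Because {{done()}} joins the thread, no ping can be sent after it returns, so the TaskTracker stops hearing from a task as soon as the task reports completion.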

> Some improvements in progress reporting
> ---------------------------------------
>
>                 Key: HADOOP-1651
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1651
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.15.0
>
>         Attachments: 1651.1.patch, 1651.2.patch, 1651.patch
>
>
> Some improvements that can be done:
> 1) The progress reporting interval can be made slightly larger. It is 
> currently 1 second; propose making it 3 seconds to reduce the load on the 
> TaskTracker.
> 2) Progress reports can potentially be missed. In the loop, if the first 
> attempt at reporting progress doesn't go through, it is not retried; the 
> next communication will be a 'ping'.
> 3) If there is an exception while reporting progress or pinging, the client 
> should sleep for some time before retrying.
> 4) The TaskUmbilicalProtocol client can always stay connected to the server. 
> Currently, the default idle timeout on the IPC client is 1000 msec (i.e., the 
> client disconnects once the connection has been idle for 1000 msec). This 
> might lead to unnecessary tearing down and setting up of connections for the 
> TaskUmbilicalProtocol, which can be avoided by giving this protocol a high 
> idle timeout. The idle timeout exists so that clients do not hold on to 
> server connections unnecessarily, which keeps the servers scalable when there 
> are thousands of clients; this applies especially to the protocols involving 
> the JobTracker (JT) and the NameNode. The TaskUmbilicalProtocol does not run 
> into such scalability issues, since its connections are limited to a few 
> Tasks and their TaskTracker.

-- 
This message is automatically generated by JIRA.