[ 
https://issues.apache.org/jira/browse/HADOOP-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515340
 ] 

Vivek Ratan commented on HADOOP-1651:
-------------------------------------

The pseudo-code in my previous comment was not formatted correctly. Here it is 
again: 

{code}
while (true) {
  sleep();
  save the current value of progressFlag;
  try {
    call sendProgress() or ping();
  }
  catch (exception) {
    restore progressFlag to the saved value;
    decrement retry count; quit if we've retried enough
  }
}
{code}
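
For illustration, the loop above could look roughly like this in Java. This is only a sketch under stated assumptions, not the code in the attached patch: the class name, the Umbilical interface and the helper methods are hypothetical stand-ins for the real TaskUmbilicalProtocol calls, and resetting the retry count after a successful call is my assumption, since the pseudo-code does not say either way.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the reporting loop; names are illustrative only.
class ProgressReporterSketch {
    /** Stand-in for the two umbilical calls; not the real TaskUmbilicalProtocol. */
    interface Umbilical {
        void sendProgress() throws IOException;
        void ping() throws IOException;
    }

    private final Umbilical umbilical;
    private final long sleepMillis;
    private final int maxRetries;
    private final AtomicBoolean progressFlag = new AtomicBoolean(false);
    private volatile boolean done = false;

    ProgressReporterSketch(Umbilical umbilical, long sleepMillis, int maxRetries) {
        this.umbilical = umbilical;
        this.sleepMillis = sleepMillis;
        this.maxRetries = maxRetries;
    }

    /** Called by the task whenever there is new progress to report. */
    void setProgressFlag() { progressFlag.set(true); }

    boolean hasPendingProgress() { return progressFlag.get(); }

    void stop() { done = true; }

    /** Returns true if stopped normally, false if retries ran out or the thread was interrupted. */
    boolean run() {
        int retriesLeft = maxRetries;
        while (!done) {
            try {
                // Sleeping first also spaces out the retries after a failure.
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            // Remember the flag as it was at the beginning of the try block.
            boolean hadProgress = progressFlag.getAndSet(false);
            try {
                if (hadProgress) {
                    umbilical.sendProgress();
                } else {
                    umbilical.ping();
                }
                retriesLeft = maxRetries; // assumption: a success resets the count
            } catch (IOException e) {
                // Restore the flag so a missed progress report is sent again
                // on the next pass instead of being downgraded to a ping.
                if (hadProgress) {
                    progressFlag.set(true);
                }
                if (--retriesLeft <= 0) {
                    return false; // we've retried enough
                }
            }
        }
        return true;
    }
}
```

The flag is only set back to true when it was true before the call, so progress recorded by the task while a ping was failing is not overwritten.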

> Some improvements in progress reporting
> ---------------------------------------
>
>                 Key: HADOOP-1651
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1651
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.15.0
>
>         Attachments: 1651.patch
>
>
> Some improvements that can be done:
> 1) The progress reporting interval can be made slightly larger. It is 
> currently 1 second. Propose making it 3 seconds to reduce the load on the 
> TaskTracker.
> 2) Progress reports can potentially be missed. In the loop, if the first 
> attempt at reporting progress doesn't go through, it is not retried. The 
> next communication will be a 'ping'. 
> 3) If there is an exception while reporting progress or doing a ping, the 
> client should sleep for some time before retrying.
> 4) The TaskUmbilicalProtocol client can always stay connected to the server. 
> Currently, the default idle timeout on the IPC client is set to 1000 msec 
> (this means that the client will disconnect if the connection has been idle 
> for 1000 msec). This might lead to unnecessary tearing-down/setting-up of 
> connections for the TaskUmbilicalProtocol and can be avoided by having a high 
> idle timeout for this protocol. The idea behind having the idle timeout was 
> to not hold on to server connections unnecessarily and hence be more scalable 
> when there are 1000s of clients, especially applicable to those protocols 
> involving the JT and the NameNode.  We don't run into scalability issues with 
> TaskUmbilical protocol since it is limited to a few Tasks and the 
> corresponding TaskTracker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
