Some improvements in progress reporting
---------------------------------------

                 Key: HADOOP-1651
                 URL: https://issues.apache.org/jira/browse/HADOOP-1651
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
            Reporter: Devaraj Das
            Assignee: Devaraj Das
             Fix For: 0.14.0


Some improvements that can be done:
1) The progress reporting interval can be made slightly larger. It is currently 
1 second. I propose making it 3 seconds to reduce the load on the TaskTracker.
2) Progress reports can potentially be missed. In the loop, if the first 
attempt at reporting progress doesn't go through, it is not retried. The next 
communication will be a 'ping'.
3) If there is an exception while reporting progress or doing a ping, the 
client should sleep for some time before retrying.
4) The TaskUmbilicalProtocol client can always stay connected to the server. 
Currently, the default idle timeout on the IPC client is set to 1000 msec (this 
means that the client will disconnect if the connection has been idle for 1000 
msec). This might lead to unnecessary tearing-down/setting-up of connections 
for the TaskUmbilicalProtocol and can be avoided by having a high idle timeout 
for this protocol. The idea behind having the idle timeout was to not hold on 
to server connections unnecessarily and hence be more scalable when there are 
thousands of clients, which applies especially to protocols involving the JT 
and the NameNode. We don't run into scalability issues with the 
TaskUmbilicalProtocol since it is limited to a few Tasks and the corresponding 
TaskTracker.
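
Points 1 to 3 could be sketched roughly as below. This is only an illustration, 
not the actual Task code: the Umbilical interface, the class name, and the 1000 
msec retry sleep are hypothetical stand-ins, and only the 3 second interval 
comes from the proposal above. The idea is that a failed progress report stays 
pending, so the next attempt retries it rather than sending a 'ping', and a 
failure is followed by a sleep before retrying.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: names here are hypothetical, not Hadoop's real classes.
class ProgressReporterSketch {

    /** Hypothetical stand-in for the two umbilical calls discussed above. */
    interface Umbilical {
        void progress(float value) throws IOException;
        void ping() throws IOException;
    }

    static final long REPORT_INTERVAL_MS = 3000; // proposed in point 1
    static final long RETRY_SLEEP_MS = 1000;     // assumed value for point 3

    private final Umbilical umbilical;
    private final long intervalMs;
    private final long retrySleepMs;
    private volatile float progress;
    // True while there is a progress value that has not been delivered yet.
    private final AtomicBoolean pending = new AtomicBoolean(false);

    ProgressReporterSketch(Umbilical umbilical, long intervalMs, long retrySleepMs) {
        this.umbilical = umbilical;
        this.intervalMs = intervalMs;
        this.retrySleepMs = retrySleepMs;
    }

    /** Called by the task whenever its progress changes. */
    void setProgress(float value) {
        progress = value;
        pending.set(true);
    }

    /** One loop iteration: report pending progress, otherwise ping. */
    boolean reportOnce() {
        boolean sendProgress = pending.getAndSet(false);
        try {
            if (sendProgress) {
                umbilical.progress(progress);
            } else {
                umbilical.ping();
            }
            return true;
        } catch (IOException e) {
            if (sendProgress) {
                // Point 2: keep the report pending so it is retried,
                // instead of being replaced by a ping next time around.
                pending.set(true);
            }
            return false;
        }
    }

    /** Reporting loop; runs until 'done' is set by the caller. */
    void run(AtomicBoolean done) throws InterruptedException {
        while (!done.get()) {
            boolean ok = reportOnce();
            // Point 3: sleep before retrying after a failure; otherwise
            // wait for the normal reporting interval.
            Thread.sleep(ok ? intervalMs : retrySleepMs);
        }
    }
}
```

A caller would construct it with something like 
new ProgressReporterSketch(umbilical, REPORT_INTERVAL_MS, RETRY_SLEEP_MS) and 
run the loop on its own thread. Point 4 (the idle timeout) is a connection 
setting and is not covered by this sketch.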

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
