[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6224:
---------------------------------
    Attachment: MAPREDUCE-6224.branch-1.000.patch

> resolve the hosts in DNSToSwitchMapping before inter tracker server start to 
> avoid IPC timeout in Task Tracker heartbeat
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6224
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6224
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: MAPREDUCE-6224.branch-1.000.patch
>
>
> Resolve the hosts to fill the cache in CachedDNSToSwitchMapping before the 
> inter-tracker server starts, to avoid IPC timeouts in Task Tracker heartbeats.
> We saw IPC timeouts in Task Tracker heartbeats on a large MR1 cluster that 
> uses a topology script (run via ShellCommandExecutor) to resolve the network 
> topology of Task Tracker hosts in ScriptBasedMapping.
> The reason is as follows:
> Right after the inter-tracker server starts in the Job Tracker, the Job 
> Tracker receives a large number of heartbeats from the Task Trackers. 
> The heartbeat function calls resolveAndAddToTopology to resolve the network 
> topology of the Task Tracker host in ScriptBasedMapping, which extends 
> CachedDNSToSwitchMapping.
> ScriptBasedMapping#resolve checks whether the host is in the cache.
> If the host is not in the cache, it runs the topology script via 
> ShellCommandExecutor to get the host's network topology. Running the topology 
> script is normally time-consuming, which may cause IPC timeouts if too many 
> heartbeats arrive at the same time on a large MR1 cluster.
> The solution is to resolve the network topology for all hosts in the hosts 
> list from HostsFileReader before receiving any heartbeat from a Task Tracker. 
> This fills the cache in ScriptBasedMapping, so when a heartbeat calls 
> resolveAndAddToTopology, it gets the result from the cache instead of 
> running the topology script.
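
The cache pre-warming idea described above could be sketched roughly as
follows. This is a minimal, self-contained illustration and not the code in
the attached patch: TopologyCacheWarmup, CachingRackResolver, and
slowLookupCount are hypothetical names, the slow lookup function stands in for
the topology script, and the host names are made up. Unlike Hadoop's own
cached mapping, this sketch resolves uncached hosts one at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class TopologyCacheWarmup {

    /** Hypothetical stand-in for a caching host-to-rack mapping. */
    static class CachingRackResolver {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        // Assumption: models the expensive topology-script invocation.
        private final Function<String, String> slowLookup;
        private final AtomicInteger slowLookups = new AtomicInteger();

        CachingRackResolver(Function<String, String> slowLookup) {
            this.slowLookup = slowLookup;
        }

        /** Returns one rack per host; only cache misses trigger the slow lookup. */
        List<String> resolve(List<String> hosts) {
            List<String> racks = new ArrayList<>(hosts.size());
            for (String host : hosts) {
                racks.add(cache.computeIfAbsent(host, h -> {
                    slowLookups.incrementAndGet();
                    return slowLookup.apply(h);
                }));
            }
            return racks;
        }

        /** Number of times the slow lookup has actually run. */
        int slowLookupCount() {
            return slowLookups.get();
        }
    }

    public static void main(String[] args) {
        // Assumption: in the real system this list would come from the hosts file.
        List<String> knownHosts = Arrays.asList(
                "tt1.example.com", "tt2.example.com", "tt3.example.com");

        CachingRackResolver resolver =
                new CachingRackResolver(host -> "/default-rack");

        // Warm the cache before the server starts accepting heartbeats.
        resolver.resolve(knownHosts);
        int afterWarmUp = resolver.slowLookupCount();

        // Simulated heartbeats: each one resolves its own host from the cache.
        for (String host : knownHosts) {
            resolver.resolve(Arrays.asList(host));
        }

        System.out.println("slow lookups after warm-up: " + afterWarmUp);
        System.out.println("slow lookups after heartbeats: "
                + resolver.slowLookupCount());
    }
}
```

In this sketch the two printed counts are equal, because every simulated
heartbeat is served from the cache. A host that is missing from the hosts list
would still fall through to the slow lookup on its first heartbeat.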



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
