[ https://issues.apache.org/jira/browse/HADOOP-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572581#action_12572581 ]
Devaraj Das commented on HADOOP-1985:
-------------------------------------
Actually, it is easier than that: only one check is required, and no extra
space per TIP is needed. If the last element of the cache list in question is
the TIP that we are trying to insert, we don't insert it. Here is what that
looks like in the earlier inner 'for' loop inside createCache. The modified
code there:
{noformat}
+ if (hostMaps == null) {
+ hostMaps = new ArrayList<TaskInProgress>();
+ cache.put(node, hostMaps);
+ hostMaps.add(maps[i]);
+ }
+ //check whether the hostMaps already contains an entry for a TIP
+ //This will be true for nodes that are racks and multiple nodes in
+ //the rack contain the input for a tip. Note that if it already
+ //exists in the hostMaps, it must be the last element there since
+ //we process one TIP at a time sequentially in the split-size order
+ if (hostMaps.get(hostMaps.size() - 1) != maps[i]) {
+ hostMaps.add(maps[i]);
+ }
{noformat}
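To make the invariant concrete, here is a minimal, self-contained Java sketch of the same idea outside Hadoop. It is not the actual createCache code: the `Tip` class, the `rackOf` host-to-rack map and the `addOnce` helper are hypothetical stand-ins for TaskInProgress, the topology lookup and the inlined check. Each TIP is added to every host holding its split and to that host's rack. Because TIPs are processed one at a time, a duplicate entry for a rack can only ever be the last element, so a single comparison suffices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CreateCacheSketch {
    // Hypothetical stand-in for TaskInProgress: an id plus the hosts holding its split.
    static final class Tip {
        final String id;
        final String[] hosts;
        Tip(String id, String... hosts) { this.id = id; this.hosts = hosts; }
        @Override public String toString() { return id; }
    }

    // Builds node -> TIPs, adding each TIP to its hosts and to the racks of those hosts.
    // Hypothetical: rackOf maps a host name to a rack name such as "/rack1".
    static Map<String, List<Tip>> createCache(Tip[] maps, Map<String, String> rackOf) {
        Map<String, List<Tip>> cache = new LinkedHashMap<>();
        for (Tip tip : maps) {               // TIPs are processed one at a time, in order
            for (String host : tip.hosts) {
                addOnce(cache, host, tip);
                addOnce(cache, rackOf.get(host), tip);
            }
        }
        return cache;
    }

    // Adds tip to the node's list unless it is already the last element. Since TIPs
    // are processed sequentially, any earlier entry for this TIP must be the last one.
    static void addOnce(Map<String, List<Tip>> cache, String node, Tip tip) {
        List<Tip> hostMaps = cache.computeIfAbsent(node, k -> new ArrayList<>());
        if (hostMaps.isEmpty() || hostMaps.get(hostMaps.size() - 1) != tip) {
            hostMaps.add(tip);
        }
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("h1", "/rack1");
        rackOf.put("h2", "/rack1");
        rackOf.put("h3", "/rack2");
        Tip[] maps = { new Tip("tip0", "h1", "h2"), new Tip("tip1", "h2", "h3") };
        Map<String, List<Tip>> cache = createCache(maps, rackOf);
        System.out.println(cache.get("/rack1")); // [tip0, tip1]
        System.out.println(cache.get("h2"));     // [tip0, tip1]
    }
}
```

Here tip0 has its input on both h1 and h2, which share /rack1, yet /rack1 lists tip0 only once.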
> Abstract node to switch mapping into a topology service class used by
> namenode and jobtracker
> ---------------------------------------------------------------------------------------------
>
> Key: HADOOP-1985
> URL: https://issues.apache.org/jira/browse/HADOOP-1985
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs, mapred
> Reporter: eric baldeschwieler
> Assignee: Devaraj Das
> Fix For: 0.17.0
>
> Attachments: 1985.new.patch, 1985.v1.patch, 1985.v10.patch,
> 1985.v11.patch, 1985.v19.patch, 1985.v2.patch, 1985.v20.patch,
> 1985.v23.patch, 1985.v3.patch, 1985.v4.patch, 1985.v5.patch, 1985.v6.patch,
> 1985.v9.patch, jobinprogress.patch
>
>
> In order to implement switch locality in MapReduce, we need to have switch
> location in both the namenode and job tracker. Currently the namenode asks
> the data nodes for this info and they run a local script to answer this
> question. In our environment, and others that I know of, there is no reason to
> push this to each node. It is easier to maintain a centralized script that
> maps node DNS names to switch strings.
> I propose that we build a new class that caches known DNS-name-to-switch
> mappings and invokes a loadable class or a configurable system call to
> resolve unknown DNS names to switches. We can then add this to the namenode
> to support the current block-to-switch mapping needs and simplify the data
> nodes. We can also add this same callout to the job tracker and then
> implement rack-locality logic there without needing to change the filesystem
> API or the split planning API.
> Not only is this the least intrusive path to building rack-local MapReduce
> that I can identify, it is also compatible with future infrastructures that
> may derive topology on the fly, etc.
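A minimal sketch of the proposed caching class, assuming the fallback can be modeled as a plain `java.util.function.Function<String, String>` from DNS name to switch string (standing in for the loadable class or the configurable system call). The class name `CachingSwitchMapper` and its methods are hypothetical, not an existing Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch: caches DNS name -> switch mappings, resolving only misses. */
public class CachingSwitchMapper {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the loadable class or configurable system call.
    private final Function<String, String> resolver;
    // Counts fallback invocations, only to illustrate the effect of caching.
    private final AtomicInteger resolverCalls = new AtomicInteger();

    public CachingSwitchMapper(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    /** Returns the switch string for each name, calling the resolver only on cache misses. */
    public List<String> resolve(List<String> dnsNames) {
        List<String> switches = new ArrayList<>(dnsNames.size());
        for (String name : dnsNames) {
            switches.add(cache.computeIfAbsent(name, n -> {
                resolverCalls.incrementAndGet();
                return resolver.apply(n);
            }));
        }
        return switches;
    }

    public int resolverCalls() {
        return resolverCalls.get();
    }

    public static void main(String[] args) {
        // Hypothetical resolver: hosts starting with "a" sit behind /switch1, others /switch2.
        CachingSwitchMapper mapper = new CachingSwitchMapper(
                host -> host.startsWith("a") ? "/switch1" : "/switch2");
        System.out.println(mapper.resolve(List.of("a1", "b1", "a1"))); // [/switch1, /switch2, /switch1]
        System.out.println(mapper.resolverCalls());                    // 2
    }
}
```

Only cache misses reach the resolver, so a centralized script would run once per newly seen host rather than once per lookup, and the same class could sit in both the namenode and the job tracker.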