[ 
https://issues.apache.org/jira/browse/HADOOP-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488787
 ] 

Owen O'Malley commented on HADOOP-968:
--------------------------------------

1. I notice that a lot of your iterators are not typed, which forces you to 
cast the result of itr.next().
2. In many cases, the for-each loop "for (Item item : itemSet) {..}" is easier 
to read and more concise.
3. Maps should be iterated through using:
      for (Map.Entry<Key,Value> item : myMap.entrySet()) {...}
   rather than:
      Iterator itr = myMap.keySet().iterator();
      while (itr.hasNext()) {
         Value value = myMap.get(itr.next());
         ...
      }
4. It looks like each reduce from a job will cause its job's FetchState to be 
added to the list multiple times, so it will fetch multiple times per loop.
5. I'd remove the sleep from queryJobTracker and move it to the 
MapEventsFetcherThread's run loop.
6. doFetch is badly named, since it doesn't actually do the fetch. It should 
be called findReduces or something similar.
7. The first parameter of TaskUmbilicalProtocol.getMapCompletionEvents is 
named "taskid", but in fact it is a job id.
8. The MapEventsFetcherThread's name doesn't need to include the task in the 
normal case, but I guess for unit tests it might be useful.
9. I assume that the shuffle code in ReduceTask matches the old code in 
ReduceTaskRunner. *smile*
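
To illustrate points 1 through 3, here is a minimal, self-contained sketch 
contrasting the two iteration styles. The class name, method names, and the 
String/Integer map are made up for the example and are not taken from the 
patch; the only claim is that both methods compute the same result, with the 
second avoiding the cast and the extra lookup per key.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example class; nothing here comes from the HADOOP-968 patch.
public class MapIterationSketch {

  // Style to avoid: an untyped iterator over keySet() needs a cast on the
  // looked-up value and performs a second map lookup for every key.
  @SuppressWarnings({"rawtypes", "unchecked"})
  public static int sumWithRawIterator(Map counts) {
    int total = 0;
    Iterator itr = counts.keySet().iterator();
    while (itr.hasNext()) {
      Integer value = (Integer) counts.get(itr.next());
      total += value;
    }
    return total;
  }

  // Preferred style: a typed for-each loop over entrySet() needs no casts
  // and reads each key/value pair in a single pass.
  public static int sumWithEntrySet(Map<String, Integer> counts) {
    int total = 0;
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      total += entry.getValue();
    }
    return total;
  }

  public static void main(String[] args) {
    // Illustrative data only.
    Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
    counts.put("first", 3);
    counts.put("second", 4);
    System.out.println(sumWithRawIterator(counts));
    System.out.println(sumWithEntrySet(counts));
  }
}
```

Both calls in main print the same total; the difference is only in 
readability and in the number of lookups per element.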

> Reduce shuffle and merge should be done a child JVM
> ---------------------------------------------------
>
>                 Key: HADOOP-968
>                 URL: https://issues.apache.org/jira/browse/HADOOP-968
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>             Fix For: 0.13.0
>
>         Attachments: 968.apr06.patch, 968.apr10.patch, 968.patch
>
>
> The Reduce's shuffle and initial merge are done in the TaskTracker's JVM. It 
> would be better to have them run in the Task's child JVM. The advantages are:
>   1. The class path and environment would be set up correctly.
>   2. User code doesn't need to be loaded into the TaskTracker.
>   3. Lower memory usage and contention in the TaskTracker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
