On Feb 9, 2010, at 9:47 PM, psdc1978 wrote:

Hi,

I have some questions about the MapRed ports and how a reduce task knows where to fetch the map output from.

I know that MapRed uses Jetty as a web server.

- Does the JobTracker send tasks to the TaskTracker for execution through port 50060?


The TT sends a heartbeat RPC to the JT periodically; the response to it contains any new tasks to be launched.
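
The pull model described above can be sketched roughly as follows. This is a toy simulation, not Hadoop code: ToyJobTracker, ToyTaskTracker, heartbeat() and send_heartbeat() are hypothetical names, and the slot accounting is an assumption made for illustration. The point it shows is that the JT never opens a connection to the TT; tasks travel only in the heartbeat response.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyJobTracker:
    """Hypothetical stand-in for the JT: it only answers heartbeats."""
    pending: deque = field(default_factory=deque)

    def heartbeat(self, tracker_name: str, free_slots: int) -> list:
        # The response carries at most `free_slots` tasks to launch.
        to_launch = []
        while self.pending and len(to_launch) < free_slots:
            to_launch.append(self.pending.popleft())
        return to_launch


@dataclass
class ToyTaskTracker:
    """Hypothetical stand-in for a TT with a fixed number of task slots."""
    name: str
    slots: int
    running: list = field(default_factory=list)

    def send_heartbeat(self, jt: ToyJobTracker) -> None:
        # The TT initiates the call and starts whatever the response contains.
        free = self.slots - len(self.running)
        self.running.extend(jt.heartbeat(self.name, free))


jt = ToyJobTracker(deque(["map_0", "map_1", "map_2"]))
tt = ToyTaskTracker(name="tt1", slots=2)
tt.send_heartbeat(jt)
```

After one heartbeat the toy TT holds two tasks and one remains queued on the JT until a later heartbeat reports a free slot.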

- Which port does the TaskTracker use to send status about the tasks it is executing to the JobTracker? Is it port 50030?


The TT uses the JT's RPC port (which is *not* 50030 by default), configured by mapred.job.tracker.
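
To make the distinction between the RPC port and the web ports concrete, here is a small sketch. The mapred.job.tracker value (host and port 9001) is a made-up example, and the two HTTP addresses are believed to be the 0.20-era defaults; check mapred-default.xml for your version. parse_host_port() is a hypothetical helper, not a Hadoop API.

```python
# Toy configuration map; only the property names come from Hadoop.
conf = {
    # RPC endpoint used for heartbeats (hypothetical host and port).
    "mapred.job.tracker": "jt.example.com:9001",
    # JT web UI served by Jetty (assumed default).
    "mapred.job.tracker.http.address": "0.0.0.0:50030",
    # TT web server served by Jetty (assumed default).
    "mapred.task.tracker.http.address": "0.0.0.0:50060",
}


def parse_host_port(address: str) -> tuple:
    """Split a 'host:port' string into (host, int(port))."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


jt_rpc = parse_host_port(conf["mapred.job.tracker"])
jt_http = parse_host_port(conf["mapred.job.tracker.http.address"])
tt_http = parse_host_port(conf["mapred.task.tracker.http.address"])
```

In this sketch the heartbeat would go to jt_rpc, while jt_http and tt_http are the Jetty endpoints.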

- The reduce task must copy the map outputs in the shuffle phase. Which class contains the code where the reduce fetches the map output? Is this code executed by the TaskTracker process?


The reduce task itself (in a separate JVM from the TT) fetches the map outputs; look at o.a.h.mapred.ReduceTask:ReduceCopier.fetchOutputs().

- Is the location of the map output that the reduce task uses sent by the JobTracker? If so, this means that the JobTracker was informed by the TaskTracker where a map ran, right?


The JT knows where each successful map task was scheduled; the reduce task gets this information via TaskCompletionEvents (ReduceTask.ReduceCopier.GetMapEventsThread).
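
The event-driven lookup described above might look roughly like this. It is a toy model under stated assumptions: ToyCompletionEvent, ToyEventLog and ToyReduceCopier are hypothetical names, the status strings and the host names are invented, and the real GetMapEventsThread is more involved. The sketch only shows the idea of polling an append-only event list from a remembered index and keeping the location of each successful map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCompletionEvent:
    map_id: str
    status: str        # "SUCCEEDED" or "FAILED" in this toy model
    tracker_http: str  # host:port the map output would be fetched from


class ToyEventLog:
    """Hypothetical append-only list of completion events."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def events_since(self, from_index, max_events=10):
        return self._events[from_index:from_index + max_events]


class ToyReduceCopier:
    """Hypothetical reduce-side poller that tracks map output locations."""

    def __init__(self, log):
        self.log = log
        self.next_index = 0      # first event not yet seen
        self.map_locations = {}  # map_id -> tracker_http

    def poll(self):
        events = self.log.events_since(self.next_index)
        self.next_index += len(events)
        for ev in events:
            if ev.status == "SUCCEEDED":
                self.map_locations[ev.map_id] = ev.tracker_http
            else:
                # Forget a location if that map attempt did not succeed.
                self.map_locations.pop(ev.map_id, None)
        return len(events)


log = ToyEventLog()
copier = ToyReduceCopier(log)
log.append(ToyCompletionEvent("map_0", "SUCCEEDED", "tt1.example.com:50060"))
log.append(ToyCompletionEvent("map_1", "FAILED", "tt2.example.com:50060"))
copier.poll()
log.append(ToyCompletionEvent("map_1", "SUCCEEDED", "tt3.example.com:50060"))
copier.poll()
```

After the second poll the toy copier knows one location per map and could fetch each output over HTTP from the listed tracker.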

- Is the class org.apache.hadoop.mapred.ReduceTask used? If so, which process uses this class? Is it the TaskTracker process?


That code runs in the child JVM of the reduce task, not in the TT process.

Arun
