On Feb 9, 2010, at 9:47 PM, psdc1978 wrote:
Hi,
I have some questions about the MapRed ports and how a reduce knows
where to fetch the map output from.
I know that MapRed uses Jetty as a web server.
- Does the JobTracker send tasks to the TaskTracker for execution
through port 50060?
The JT does not push tasks. The TT sends a heartbeat RPC to the JT
periodically, and the response contains the new tasks to be launched.
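
To make the pull model concrete, here is a minimal sketch in plain Java. It is a toy, not Hadoop code: ToyJobTracker, submit() and heartbeat() are made-up names, and the real heartbeat RPC carries far more state than a free-slot count.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of pull-based task assignment. None of these names are Hadoop classes.
class HeartbeatSketch {

    /** Hypothetical stand-in for the JobTracker: holds tasks waiting to be assigned. */
    static class ToyJobTracker {
        private final Queue<String> pending = new ArrayDeque<>();

        void submit(String taskId) {
            pending.add(taskId);
        }

        /** Answers a heartbeat with at most freeSlots tasks for the caller to launch. */
        List<String> heartbeat(String trackerName, int freeSlots) {
            List<String> toLaunch = new ArrayList<>();
            while (toLaunch.size() < freeSlots && !pending.isEmpty()) {
                toLaunch.add(pending.poll());
            }
            return toLaunch;
        }
    }

    public static void main(String[] args) {
        ToyJobTracker jt = new ToyJobTracker();
        jt.submit("map_0");
        jt.submit("map_1");
        jt.submit("reduce_0");

        // The tracker always initiates; the JT only ever replies.
        System.out.println(jt.heartbeat("tracker_a", 2)); // [map_0, map_1]
        System.out.println(jt.heartbeat("tracker_a", 2)); // [reduce_0]
        System.out.println(jt.heartbeat("tracker_a", 2)); // []
    }
}
```

The point of the sketch is only the direction of the call: the tracker asks, and work arrives in the reply.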
- Which port does the TaskTracker use to send status about the tasks
it is executing to the JobTracker? Is it port 50030?
The TT uses the JT's RPC port (which is *not* 50030 by default),
configured by mapred.job.tracker.
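
As an illustration, mapred.job.tracker normally holds a host:port pair, and that port is the RPC port the TT talks to. The sketch below splits such a value in plain Java. The class name, the helper and the example value jt.example.com:9001 are assumptions for illustration only, and this is not how Hadoop itself parses the property.

```java
import java.net.InetSocketAddress;

// Illustrative only: splits a "host:port" string such as a mapred.job.tracker value.
class JobTrackerAddressSketch {

    /** Hypothetical helper; rejects values without a host:port shape (e.g. "local"). */
    static InetSocketAddress parse(String value) {
        int colon = value.lastIndexOf(':');
        if (colon <= 0 || colon == value.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got: " + value);
        }
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        // createUnresolved avoids a DNS lookup; we only want the two parts.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress rpc = parse("jt.example.com:9001"); // made-up example value
        System.out.println(rpc.getHostName() + " " + rpc.getPort());
    }
}
```

Whatever port appears in that value is the one heartbeats and status updates go to; 50030 does not need to appear there at all.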
- The reduce task must copy the map outputs in the shuffle phase. In
which class is the code where the reduce fetches the map output? Is
this code executed by the TaskTracker process?
The reduce task itself (in a separate JVM from the TT) fetches map
outputs; look at o.a.h.mapred.ReduceTask.ReduceCopier.fetchOutputs().
- Is the location of the map output that the reduce task uses sent by
the JobTracker? If so, this means that the JobTracker was informed by
the TaskTracker where a map ran, right?
The JT knows where each successful map task was scheduled; the reduce
task gets this information via TaskCompletionEvents
(ReduceTask.ReduceCopier.GetMapEventsThread).
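
A rough sketch of that event-polling idea, again as a self-contained toy in plain Java: the reducer remembers how many events it has seen and asks only for newer ones. All class and method names here are invented, the node names are made up, and the toy lets the reducer call the JT directly, so whatever RPC hops sit in between in Hadoop are not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of learning map-output locations from completion events.
class MapEventsSketch {

    /** Hypothetical event: which map finished and where its output can be fetched. */
    static class ToyCompletionEvent {
        final String mapTaskId;
        final String trackerHttp;

        ToyCompletionEvent(String mapTaskId, String trackerHttp) {
            this.mapTaskId = mapTaskId;
            this.trackerHttp = trackerHttp;
        }
    }

    /** Hypothetical JT side: an append-only log of successful maps. */
    static class ToyJobTracker {
        private final List<ToyCompletionEvent> events = new ArrayList<>();

        void mapSucceeded(String mapTaskId, String trackerHttp) {
            events.add(new ToyCompletionEvent(mapTaskId, trackerHttp));
        }

        /** Returns up to max events starting at fromIndex. */
        List<ToyCompletionEvent> getEvents(int fromIndex, int max) {
            int to = Math.min(events.size(), fromIndex + max);
            return new ArrayList<>(events.subList(fromIndex, to));
        }
    }

    /** Hypothetical reduce side: polls for new events and records locations. */
    static class ToyReducer {
        private int nextEvent = 0;
        private final Map<String, String> locations = new LinkedHashMap<>();

        void poll(ToyJobTracker jt) {
            List<ToyCompletionEvent> fresh = jt.getEvents(nextEvent, 10);
            nextEvent += fresh.size(); // never re-read events already seen
            for (ToyCompletionEvent e : fresh) {
                locations.put(e.mapTaskId, e.trackerHttp);
            }
        }

        Map<String, String> locations() {
            return locations;
        }
    }

    public static void main(String[] args) {
        ToyJobTracker jt = new ToyJobTracker();
        ToyReducer reducer = new ToyReducer();

        jt.mapSucceeded("map_0", "http://node1:50060");
        reducer.poll(jt);
        jt.mapSucceeded("map_1", "http://node2:50060");
        reducer.poll(jt);

        System.out.println(reducer.locations());
        // {map_0=http://node1:50060, map_1=http://node2:50060}
    }
}
```

Once the reducer holds a location for a map, fetching that output is a separate step, which is what the copier code mentioned earlier does.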
- Is the class org.apache.hadoop.mapred.ReduceTask used? If so,
which process uses this class? Is it the TaskTracker process?
That code runs in the child JVM of the reduce task, not in the TT
process.
Arun