Hello, I recall asking this question before, but this is in addition to what I've asked.
Firstly, to recap my question and Arun's specific response:
On May 20, 2008, at 9:03 AM, Saptarshi Guha wrote:
> Hello,
> Does the "Data-local map tasks" counter mean the number of tasks that had the input data already present on the machine they are running on? i.e. there wasn't a need to ship the data to them.

Response from Arun:
Yes. Your understanding is correct. More specifically, it means that the map task got scheduled on a machine on which one of the replicas of its input-split block was present and was served by the datanode running on that machine. *smile*
Arun
Now, is Hadoop designed to schedule a map task on a machine that has one of the replicas of its input-split block? Failing that, does it then assign the map task to a machine close to one that contains a replica of its input-split block?
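To make the question concrete, the behaviour I have in mind could be sketched roughly as below. This is only an illustration in Python: the names Node, locality_level and pick_split are hypothetical, the three levels (same machine, same rack, anywhere else) are my assumption of what "close" would mean, and none of it is taken from Hadoop's scheduler code. Whether Hadoop actually behaves this way is exactly what I am asking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A machine in the cluster (hypothetical model, not a Hadoop class)."""
    host: str
    rack: str


# Lower number means the task would run closer to its data.
LOCALITY_ORDER = {"data-local": 0, "rack-local": 1, "off-rack": 2}


def locality_level(node, replica_nodes):
    """Classify how close `node` is to the nearest replica of a split."""
    if any(node.host == r.host for r in replica_nodes):
        return "data-local"   # a replica sits on this very machine
    if any(node.rack == r.rack for r in replica_nodes):
        return "rack-local"   # a replica sits on another machine in the same rack
    return "off-rack"         # the data would have to cross racks


def pick_split(node, pending_splits):
    """Pick the pending split whose replicas are closest to `node`.

    `pending_splits` maps a split id to the list of Nodes holding its replicas.
    Returns (split_id, locality_level), or None if nothing is pending.
    """
    if not pending_splits:
        return None
    best = min(
        pending_splits,
        key=lambda s: LOCALITY_ORDER[locality_level(node, pending_splits[s])],
    )
    return best, locality_level(node, pending_splits[best])
```

In this sketch, a free machine that holds a replica of some pending split would get that split first; otherwise it would get one whose replica is in the same rack, and only then one from elsewhere.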
Are there any performance metrics for this scheduling behaviour?

Many thanks
Saptarshi

Saptarshi Guha | [EMAIL PROTECTED] | http://www.stat.purdue.edu/~sguha