Data-local tasks

Saptarshi Guha Mon, 30 Jun 2008 06:42:06 -0700

Hello,

I recall asking a related question earlier; this one follows up on what I asked then.

        Firstly, to recap my question and Arun's specific response:


--      On May 20, 2008, at 9:03 AM, Saptarshi Guha wrote:

-- Does the "Data-local map tasks" counter mean the number of tasks that had their input data already present on the machine they are running on?

--      i.e. there wasn't a need to ship the data to them.

        Response from Arun

-- Yes. Your understanding is correct. More specifically, it means that the map task got scheduled on a machine on which one of the replicas of its input-split block was present and was served by the datanode running on that machine. *smile* Arun

Now, is Hadoop designed to schedule a map task on a machine that has one of the replicas of its input split block? Failing that, does it then assign the map task to a machine close to one that contains a replica of its input split block?

        Are there any performance metrics for this?
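
To make the scheduling question above concrete, here is a minimal sketch of the preference order I have in mind. It is not Hadoop's scheduler code: the function names, the locality labels, and the node-to-rack map are all hypothetical, and "close" is assumed here to mean "on the same rack".

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical preference order: lower number means more preferred.
LOCALITY_RANK = {"data-local": 0, "rack-local": 1, "off-rack": 2}


def classify_locality(node: str, replica_nodes: Set[str],
                      rack_of: Dict[str, str]) -> str:
    """Classify how close `node` is to any replica of a task's input block."""
    if node in replica_nodes:
        # A replica lives on this very machine: no data needs to be shipped.
        return "data-local"
    node_rack = rack_of.get(node)
    if node_rack is not None and any(
        rack_of.get(r) == node_rack for r in replica_nodes
    ):
        # A replica lives on another machine in the same (assumed) rack.
        return "rack-local"
    return "off-rack"


def pick_task(node: str, pending: Dict[str, Set[str]],
              rack_of: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Pick the pending task whose input is closest to `node`.

    `pending` maps a task id to the set of nodes holding replicas of its
    input split block. Returns (task_id, locality), or None if nothing
    is pending.
    """
    best: Optional[Tuple[str, str]] = None
    for task_id, replicas in pending.items():
        locality = classify_locality(node, replicas, rack_of)
        if best is None or LOCALITY_RANK[locality] < LOCALITY_RANK[best[1]]:
            best = (task_id, locality)
    return best
```

Under this sketch, a free node would be given a task with a replica on that node if one exists, otherwise a task with a replica on the same rack, and only then any other task. Whether Hadoop actually follows this order is exactly what I am asking.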

        Many thanks
        Saptarshi


Saptarshi Guha | [EMAIL PROTECTED] | http://www.stat.purdue.edu/~sguha
