MR does not read the files in the front-end (unless a partitioner such as the TotalOrderPartitioner demands it). The actual block-level reads are done via the DFSClient class (through its inner classes DFSInputStream and DFSOutputStream; the first one is where your interest lies).
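To make the read path concrete, here is a simplified, self-contained sketch (not the actual HDFS code) of the bookkeeping a client-side stream like DFSInputStream does: mapping a byte offset in the file to the block that holds it, before contacting the datanode that stores that block. The block size and file length are illustrative.

```java
public class BlockLocator {
    // Classic HDFS default block size (64 MB in Hadoop 1.x).
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    // Returns the index of the block containing byte `offset` of a
    // file laid out in fixed-size blocks (the last block may be short).
    static int blockIndexFor(long offset, long fileLength) {
        if (offset < 0 || offset >= fileLength) {
            throw new IllegalArgumentException("offset out of range");
        }
        return (int) (offset / BLOCK_SIZE);
    }

    public static void main(String[] args) {
        long fileLength = 200L * 1024 * 1024; // 200 MB file -> 4 blocks
        System.out.println(blockIndexFor(0, fileLength));                  // 0
        System.out.println(blockIndexFor(BLOCK_SIZE, fileLength));         // 1
        System.out.println(blockIndexFor(150L * 1024 * 1024, fileLength)); // 2
    }
}
```

The real DFSInputStream does the same lookup against the LocatedBlocks metadata it fetched from the namenode, then opens a socket to one of the hosts listed for that block.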
All MR cares about is scheduling tasks data-locally, so it just takes the block locations (metadata) to conjure up split objects for the scheduler and the task, and sends those across.

On Thu, Sep 13, 2012 at 5:40 AM, Vivi Lang <sqlxwei...@gmail.com> wrote:
> Hi all,
>
> Is there anyone who can tell me that when we launch a mapreduce task, for
> example, wordcount, after the JobClient has obtained the block locations (the
> related hosts/datanodes are stored in the specified split), which
> function/class will be called for reading those blocks from the datanode?
>
> Thanks,
> Vivian

-- 
Harsh J
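The front-end split planning described above can be sketched as follows. This is a simplified stand-in, not the real FileInputFormat class, but computeSplitSize mirrors the max(minSize, min(maxSize, blockSize)) shape Hadoop uses; note the client only transforms (fileLength, blockSize) metadata into split descriptors and never touches the file data itself.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPlanner {
    // Same shape as FileInputFormat's split-size rule:
    // clamp the block size between the configured min and max.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Produce (offset, length) pairs; a real InputSplit also carries the
    // block's host list so the scheduler can place tasks data-locally.
    static List<long[]> planSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLength;
        while (remaining > 0) {
            long offset = fileLength - remaining;
            long length = Math.min(splitSize, remaining);
            splits.add(new long[] { offset, length });
            remaining -= length;
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;
        long splitSize = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        // A 200 MB file with 64 MB blocks yields 4 splits (64+64+64+8).
        System.out.println(planSplits(200L * 1024 * 1024, splitSize).size()); // 4
    }
}
```

Only once a map task actually runs does its RecordReader open the file, at which point the DFSClient/DFSInputStream machinery above takes over.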