Well, I think you are asking: if you have 3 machines and you want to start 3 maps, one for each input file, will each map run on a different machine?
The answer is: not necessarily. When responding to a heartbeat from a task tracker, the task scheduler tries to assign one local map task for each job in the queue. If no local map task is available, the scheduler assigns a non-local map task. So, no matter whether you have 1 or 3 replicas of your file in HDFS, there is a chance that one machine takes 2 (or 3) maps.

Usually, your 3 machines heartbeat to the job tracker at almost the same time and each gets a local map. This is the most likely outcome. But if one of your machines is stuck for a while for some reason, then, depending on how long a map takes, another machine may take its map.

On Fri, Mar 4, 2011 at 9:36 AM, maha <m...@umail.ucsb.edu> wrote:
> Hi,
>
> Using 3 Machines, each has an input-File ' f ' in its local disk in
> addition to HDFS , assuming my program spawns a mapper/file .
>
> Does that mean that mappers will be running on different machines?
>
> Thank you,
> Maha
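
To make the scheduling behaviour described in the reply concrete, here is a toy model in Python. It is only a sketch and not Hadoop's scheduler code: the names pick_map_task and simulate, the host names m1..m3 and the file names f1..f3 are made up for illustration, and it assumes one map is handed out per heartbeat and that a heartbeating machine always has a free slot.

```python
def pick_map_task(host, pending, replicas):
    """Return a pending map for `host`, preferring one whose input is local."""
    for task in pending:
        if host in replicas[task]:
            return task  # local map available
    # No local map: fall back to any pending (non-local) map.
    return pending[0] if pending else None


def simulate(heartbeats, replicas):
    """Hand out one map per heartbeat, in the given heartbeat order."""
    pending = sorted(replicas)
    placement = {}
    for host in heartbeats:
        task = pick_map_task(host, pending, replicas)
        if task is None:
            break  # nothing left to schedule
        pending.remove(task)
        placement[task] = host
    return placement


# Hypothetical layout: each input file is stored on exactly one machine.
replicas = {"f1": {"m1"}, "f2": {"m2"}, "f3": {"m3"}}

# Usual case: all three machines heartbeat in turn, each gets its local map.
even = simulate(["m1", "m2", "m3"], replicas)

# m3 is stuck, so m1 heartbeats again first and takes the non-local map f3.
skewed = simulate(["m1", "m2", "m1"], replicas)

print(even)
print(skewed)
```

In the first run every map lands on the machine that holds its file; in the second, m1 ends up running two maps because m3 never asked for work in time.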