The short answer is yes, it can be worth it: your job can finish sooner if you do not restrict it to local mappers. But it is of course a trade-off. Running only local mappers gives the best overall throughput (though not the best latency for an individual job), because no input has to be read over the network. You should read about delay scheduling, which lets you choose where the 'best' point lies by setting how long a task may wait for a local slot before it is allowed to run on a remote node. The fair scheduler has it in Hadoop 1, and the capacity scheduler has it too, but in Hadoop 2.
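To make the trade-off concrete, here is a minimal sketch of the delay scheduling idea in Python. It is not Hadoop code: the `Job` class, the `assign` function and the `max_skips` parameter are hypothetical names made up for illustration, and the real schedulers express the delay through their own configuration rather than through a function like this.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical model: pending task name -> set of nodes holding its input.
    pending: dict
    # Number of scheduling opportunities this job has declined so far.
    skips: int = 0


def assign(job, node, max_skips):
    """Return the task to launch on `node`, or None if the job keeps waiting."""
    # Prefer a task whose input is stored on the node offering the slot.
    for task, hosts in job.pending.items():
        if node in hosts:
            del job.pending[task]
            job.skips = 0
            return task
    # No local task: accept a remote read only once the job has waited enough.
    if job.pending and job.skips >= max_skips:
        task = next(iter(job.pending))
        del job.pending[task]
        return task
    # Otherwise decline this slot and hope a local one frees up soon.
    job.skips += 1
    return None


# f1 lives on m1 and f2 on m2, as in the question; only m1 offers slots here.
job = Job(pending={"f1": {"m1"}, "f2": {"m2"}})
offers = [assign(job, "m1", max_skips=2) for _ in range(4)]
print(offers)
```

With `max_skips=0` the job takes any free slot immediately (lowest latency, more remote reads); with a very large value it runs only local mappers. Values in between are the knob I was referring to.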

Regards

Bertrand

On Thu, Dec 6, 2012 at 6:14 AM, <jayunit...@gmail.com> wrote:
> If there is a job with files f1 and f2, and a Mapper (m1) is running
> against a file (f2) which is far from the local machine(m1), will the
> overhead of copying f2 over to m1 be worth it?.
>
> That is .... - is the amount of resources required to read data off a
> remote machine (m2) worth it? Or would it be better if that remote (m2)
> now simply processed both files (f1, f2) in turn?
>
> Jay Vyas
> http://jayunit100.blogspot.com

--
Bertrand Dechoux