As far as I understand, MapReduce waits for all Mappers to finish before it starts running Reduce tasks. Am I mistaken here? If not, then I do not see any more synchrony being introduced than there already is (no locks required). Of course I am not aware of all the internals, but MapReduce works with a single JobTracker, which distributes Reduce tasks to the different nodes (see http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Overview). So the only point where my "theory" would break is if Reducers started before the Mappers finish. Otherwise the JobTracker should be able to schedule Reduce tasks in a specific order.
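To make clear what I mean by the barrier, here is a toy sketch in plain Java (not Hadoop internals, just a model of the behavior I am assuming): the "reduce" thread blocks on a latch until every "mapper" has reported completion, so no locking between reducers is needed for my argument.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class BarrierSketch {
    // Records the order in which the simulated phases run.
    static final List<String> events =
            Collections.synchronizedList(new ArrayList<>());

    public static void main(String[] args) throws InterruptedException {
        int mappers = 3;
        CountDownLatch mapDone = new CountDownLatch(mappers);

        List<Thread> mapThreads = new ArrayList<>();
        for (int i = 0; i < mappers; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                events.add("map-" + id);
                mapDone.countDown(); // mapper reports completion
            });
            mapThreads.add(t);
            t.start();
        }

        Thread reducer = new Thread(() -> {
            try {
                mapDone.await(); // reduce blocks until all mappers are done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            events.add("reduce");
        });
        reducer.start();

        for (Thread t : mapThreads) t.join();
        reducer.join();
        System.out.println(events);
    }
}
```

Whatever order the map threads run in, "reduce" is always the last event, which is the property my argument relies on.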
On Mon, Dec 20, 2010 at 4:45 AM, Harsh J <qwertyman...@gmail.com> wrote:
> You could use sort of a distributed lock service to achieve this
> (ZooKeeper can help). But such things ought to be avoided as David
> pointed out above.
>
> On Sun, Dec 19, 2010 at 9:09 PM, Martin Becker <_martinbec...@web.de> wrote:
>> Hello everybody,
>>
>> is there a possibility to make sure that certain/all reduce tasks,
>> i.e. the reducers to certain keys, are executed in a specified order?
>> This is Job internal, so the Job Scheduler is probably the wrong place to
>> start?
>> Does the order induced by the Comparable interface influence the
>> execution order at all?
>>
>> Thanks in advance,
>> Martin
>>
>
> --
> Harsh J
> www.harshj.com
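P.S. For reference, the sequencing that the distributed lock service Harsh mentions would give you boils down to something like the following in-process stand-in (plain Java; the class and field names are mine, and a real ZooKeeper-based version would coordinate across machines rather than threads): reducer k only proceeds once reducers 0..k-1 have finished, regardless of the order in which they were started.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderedReducers {
    // Toy stand-in for an external sequencing service: a shared monitor
    // and a turn counter. A distributed lock service would provide the
    // same guarantee across nodes instead of across threads.
    static final Object monitor = new Object();
    static int nextTurn = 0;
    static final List<Integer> finished = new ArrayList<>();

    static void runReducer(int id) throws InterruptedException {
        synchronized (monitor) {
            while (nextTurn != id) {
                monitor.wait(); // block until it is this reducer's turn
            }
            finished.add(id);   // the actual reduce work would go here
            nextTurn++;
            monitor.notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 4;
        List<Thread> threads = new ArrayList<>();
        // Start the reducers in reverse order to show that they still
        // complete in key order 0, 1, 2, 3.
        for (int id = n - 1; id >= 0; id--) {
            final int rid = id;
            Thread t = new Thread(() -> {
                try {
                    runReducer(rid);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
        System.out.println(finished); // [0, 1, 2, 3]
    }
}
```

This is exactly the extra synchrony I would rather avoid if the JobTracker's own scheduling is enough.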