As far as I understand, MapReduce waits for all Mappers to finish
before it starts running the Reduce tasks. Am I mistaken here? If I am
not, then I do not see any more synchrony being introduced than there
already is (no locks required). Of course, I am not aware of all the
internals, but MapReduce works with a single JobTracker, which
distributes Reduce tasks to the different nodes (see
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Overview).
So the only point where my "theory" would break is if Reducers start
before the Mappers finish. Otherwise, the JobTracker should be able to
schedule Reduce tasks in a specific order.
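One related point worth separating from the scheduling question: within a
single reduce task, Hadoop does feed keys to reduce() in the sort order
defined by the key's compareTo (from WritableComparable). So with a single
reducer, key order is controllable even though cross-task execution order
is not guaranteed. A minimal plain-Java sketch simulating that shuffle-sort
behavior (no Hadoop dependencies; the EventKey class and the data are
hypothetical, standing in for a WritableComparable key type):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical key type standing in for a WritableComparable.
// compareTo sorts descending by timestamp, so "later" keys reduce first.
class EventKey implements Comparable<EventKey> {
    final int timestamp;

    EventKey(int timestamp) {
        this.timestamp = timestamp;
    }

    @Override
    public int compareTo(EventKey other) {
        // Descending order: larger timestamps sort first.
        return Integer.compare(other.timestamp, this.timestamp);
    }

    @Override
    public String toString() {
        return "t=" + timestamp;
    }
}

public class ShuffleSortSketch {
    public static void main(String[] args) {
        // Simulated map outputs, arriving in arbitrary order.
        List<EventKey> mapOutputs = new ArrayList<>(Arrays.asList(
                new EventKey(3), new EventKey(1), new EventKey(2)));

        // The framework merge-sorts map outputs by key before calling
        // reduce(); Collections.sort stands in for that merge here.
        Collections.sort(mapOutputs);

        // Keys reach reduce() in compareTo order: t=3, then t=2, then t=1.
        for (EventKey key : mapOutputs) {
            System.out.println("reduce(" + key + ")");
        }
    }
}
```

With more than one reduce task, this ordering only holds per partition,
which is why the question about cross-task execution order remains open.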

On Mon, Dec 20, 2010 at 4:45 AM, Harsh J <qwertyman...@gmail.com> wrote:
> You could use sort of a distributed lock service to achieve this
> (ZooKeeper can help). But such things ought to be avoided as David
> pointed out above.
>
> On Sun, Dec 19, 2010 at 9:09 PM, Martin Becker <_martinbec...@web.de> wrote:
>> Hello everybody,
>>
>> is there a possibility to make sure that certain/all reduce tasks,
>> i.e. the reducers for certain keys, are executed in a specified order?
>> This is job-internal, so the Job Scheduler is probably the wrong place
>> to start?
>> Does the order induced by the Comparable interface influence the
>> execution order at all?
>>
>> Thanks in advance,
>> Martin
>>
>
>
>
> --
> Harsh J
> www.harshj.com
>
