On Mar 18, 2009, at 10:26 AM, Stuart White wrote:

> I'd like to implement some coordination between Mapper tasks running
> on the same node.  I was thinking of using ZooKeeper to provide this
> coordination.

This is a very bad idea in the general case. It can be made to work, but only on a dedicated cluster, where you can be sure that all of the maps are active at the same time. Otherwise, you have no guarantee that the maps run concurrently, and a map could end up waiting on one that has not been scheduled yet.

In most cases, you are much better off relying on the standard communication between the maps and reduces and running multiple passes of jobs.
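
As a rough illustration of the multi-pass approach, here is a toy, single-process sketch in plain Python. It is not Hadoop code: `run_job`, the mapper and reducer names, and the sample lines are all made up for the example. The only point is that the second pass sees the complete output of the first, so no map ever has to wait on another map.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Toy stand-in for one MapReduce job (hypothetical, not the Hadoop API)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": group map output by key
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Pass 1: count how often each word occurs.
def count_mapper(line):
    for word in line.split():
        yield word, 1

def count_reducer(word, ones):
    yield word, sum(ones)

# Pass 2: takes the output of pass 1 as its input and groups words by count.
def by_count_mapper(pair):
    word, count = pair
    yield count, word

def by_count_reducer(count, words):
    yield count, sorted(words)

lines = ["a b a", "b c a"]  # made-up input
counts = run_job(lines, count_mapper, count_reducer)
grouped = run_job(counts, by_count_mapper, by_count_reducer)
```

Anything the maps would have had to agree on through a side channel is instead computed by the first job and handed to the second as ordinary input.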

> I think I remember hearing that MapReduce and/or HDFS use ZooKeeper
> under-the-covers.

Neither uses it today. There are no immediate plans to implement HA.

-- Owen
