For load distribution, you can start by reading about the different types of
Hadoop schedulers. I have not yet studied MapReduce implementations other than
Hadoop, but a very simplified version of the distribution concept is the
following:

a) The TaskTracker asks for work (its heartbeat contains the status of the
worker node, e.g. the number of free map/reduce slots)
b) The JobTracker picks a job from a list sorted according to the configured
policy (fair scheduling, FIFO, LIFO, or another SLA-based policy)
c) The TaskTracker executes the map/reduce task
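
Very roughly, and with made-up class names (this is not Hadoop's actual
JobTracker/TaskTracker code, just a toy FIFO version of steps a)-c)), the
exchange could be sketched like this in Java:

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Queue;

    class JobTrackerSketch {

        // a) what a TaskTracker reports in its heartbeat
        static final class Heartbeat {
            final String tracker;
            final int freeSlots;
            Heartbeat(String tracker, int freeSlots) {
                this.tracker = tracker;
                this.freeSlots = freeSlots;
            }
        }

        // pending tasks kept in submission order, i.e. a plain FIFO policy
        private final Queue<String> pendingTasks = new ArrayDeque<>();

        void submitJob(List<String> taskIds) {
            pendingTasks.addAll(taskIds);
        }

        // b) on each heartbeat, hand out at most `freeSlots` tasks
        List<String> onHeartbeat(Heartbeat hb) {
            List<String> assigned = new ArrayList<>();
            while (assigned.size() < hb.freeSlots && !pendingTasks.isEmpty()) {
                assigned.add(pendingTasks.poll());
            }
            return assigned;  // c) the TaskTracker then runs these map/reduce tasks
        }
    }

The real JobTracker of course tracks job/task state, retries, locality and
more; the point here is only the heartbeat-driven pull model.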

As mentioned before, there are a lot more details. In b), for example, the
fair scheduler implements delay scheduling, which improves throughput by
taking the location of a job's input data into account when assigning tasks.
There is also a preemption mechanism that regulates fairness between pools, etc.
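
The delay-scheduling idea can be sketched like this (again hypothetical code,
not the real FairScheduler; MAX_SKIPS and the class names are assumptions):

    import java.util.Set;

    class DelaySchedulingSketch {

        // assumed bound on how many offers a job may skip while waiting for locality
        static final int MAX_SKIPS = 3;

        static final class Job {
            int skipCount = 0;
            final Set<String> inputHosts;  // hosts holding this job's input blocks
            Job(Set<String> inputHosts) { this.inputHosts = inputHosts; }
        }

        // Decide whether `job` should launch a task on `host` right now.
        static boolean shouldLaunch(Job job, String host) {
            if (job.inputHosts.contains(host)) {  // data-local: take the slot
                job.skipCount = 0;
                return true;
            }
            if (job.skipCount < MAX_SKIPS) {      // pass, hoping a local slot frees up
                job.skipCount++;
                return false;
            }
            return true;                          // waited long enough, run non-locally
        }
    }

In other words, a job briefly declines non-local slots so that a slot on a
node holding its input data has a chance to become free.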

A good start is the book that Prashant mentioned...

On 23 April 2012 23:49, Prashant Kommireddi <prash1...@gmail.com> wrote:

> Shailesh, there's a lot that goes into distributing work across
> tasks/nodes. It's not just distributing work but also fault-tolerance,
> data locality etc that come into play. It might be good to refer to the
> Hadoop Apache docs or Tom White's definitive guide.
>
> Sent from my iPhone
>
> On Apr 23, 2012, at 11:03 AM, Shailesh Samudrala <shailesh2...@gmail.com>
> wrote:
>
> > Hello,
> >
> > I am trying to design my own MapReduce Implementation and I want to know
> > how hadoop is able to distribute its workload across multiple computers.
> > Can anyone shed more light on this? thanks!
>
