
Re: Distributing MapReduce on a computer cluster

2012-04-25 Thread Merto Mertek

For the distribution of load, you could start by reading about the different
types of Hadoop scheduler. I have not yet studied MapReduce implementations
other than Hadoop; however, a very simplified version of the distribution
concept is the following:

a) The TaskTracker asks for work (its heartbeat carries the status of the
worker node, i.e. the number of free slots)
b) The JobTracker picks a job from a list sorted according to the configured
policy (fair scheduling, FIFO, LIFO, other SLAs)
c) The TaskTracker executes the job's map/reduce tasks
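The three steps above can be sketched as a toy pull-based loop. This is a
minimal sketch under stated assumptions, not Hadoop's code: the names Job,
Heartbeat, handle_heartbeat, fifo_order and fair_order are hypothetical, and
the "fair" policy is simplified to "job with the fewest running tasks first".

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical toy model of the heartbeat loop; these are NOT Hadoop classes.


@dataclass
class Job:
    job_id: str
    submit_time: int
    pending_tasks: List[str]
    running: int = 0  # tasks of this job currently running


@dataclass
class Heartbeat:
    tracker: str
    free_slots: int  # a) the worker reports how many slots are free


Policy = Callable[[List[Job]], List[Job]]


def fifo_order(jobs: List[Job]) -> List[Job]:
    # Oldest submitted job first.
    return sorted(jobs, key=lambda j: j.submit_time)


def fair_order(jobs: List[Job]) -> List[Job]:
    # Simplified fairness: job with the fewest running tasks first,
    # ties broken by submission time.
    return sorted(jobs, key=lambda j: (j.running, j.submit_time))


def handle_heartbeat(hb: Heartbeat, jobs: List[Job], policy: Policy) -> List[str]:
    """b) Fill the tracker's free slots, re-sorting jobs by policy for each slot."""
    assigned: List[str] = []
    for _ in range(hb.free_slots):
        candidates = [j for j in policy(jobs) if j.pending_tasks]
        if not candidates:
            break  # nothing left to run
        job = candidates[0]
        assigned.append(job.pending_tasks.pop(0))
        job.running += 1
    return assigned  # c) the tracker launches these tasks
```

With two jobs a (older) and b, a heartbeat reporting 3 free slots receives
a's first three tasks under fifo_order, but alternates between a and b under
fair_order. Re-sorting per slot is what lets the fair policy react to the
tasks it has just handed out.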

As mentioned before, there are many more details. In b), the fair scheduler
implements delay scheduling, which improves throughput by taking the location
of a picked job's input data into account. There is also a preemption
mechanism that enforces fairness between pools, etc.
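Delay scheduling can be sketched as follows, assuming the job list is already
sorted by the policy from b) and that each pending task knows which nodes hold
its input block. The names and the max_skips parameter are hypothetical; the
idea shown is that a job with no node-local task for the requesting node is
skipped a bounded number of times before it accepts a non-local slot.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sketch of delay scheduling; not Hadoop's fair scheduler code.


@dataclass
class Job:
    job_id: str
    pending: Dict[str, Set[str]]  # task id -> nodes holding its input block
    skip_count: int = 0           # times this job has passed up a slot


def assign_with_delay(
    node: str, jobs: List[Job], max_skips: int
) -> Optional[Tuple[str, str, bool]]:
    """Pick one task for a free slot on `node`.

    `jobs` is assumed to be sorted already by the scheduling policy.
    Returns (job_id, task_id, is_node_local), or None if every job waits.
    """
    for job in jobs:
        if not job.pending:
            continue
        local = [t for t, hosts in job.pending.items() if node in hosts]
        if local:
            task = local[0]
            del job.pending[task]
            job.skip_count = 0  # locality achieved, reset the wait
            return job.job_id, task, True
        if job.skip_count >= max_skips:
            # Waited long enough: accept a non-local slot.
            task = next(iter(job.pending))
            del job.pending[task]
            return job.job_id, task, False
        # Pass this slot to the next job and remember that we waited.
        job.skip_count += 1
    return None
```

Larger max_skips values trade a little scheduling latency for more node-local
reads; with max_skips=0 a job takes a non-local slot as soon as it has no
local task for that node.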

A good place to start is the book Prashant mentioned.

On 23 April 2012 23:49, Prashant Kommireddi  wrote:

> Shailesh, there's a lot that goes into distributing work across
> tasks/nodes. It's not just distributing work; fault tolerance, data
> locality, etc. also come into play. It might be good to refer to the
> Apache Hadoop docs or Tom White's "Hadoop: The Definitive Guide".
>
> On Apr 23, 2012, at 11:03 AM, Shailesh Samudrala wrote:
>
> > Hello,
> >
> > I am trying to design my own MapReduce implementation, and I want to know
> > how Hadoop is able to distribute its workload across multiple computers.
> > Can anyone shed more light on this? Thanks!
>

Re: Distributing MapReduce on a computer cluster

2012-04-23 Thread Prashant Kommireddi

Shailesh, there's a lot that goes into distributing work across
tasks/nodes. It's not just distributing work; fault tolerance, data
locality, etc. also come into play. It might be good to refer to the
Apache Hadoop docs or Tom White's "Hadoop: The Definitive Guide".
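Fault tolerance in a heartbeat-based design usually rests on timeouts: a
master that stops hearing from a worker declares it dead and re-queues its
tasks for other nodes. The sketch below assumes a single master and uses
hypothetical names (Master, check_failures, timeout). A real system such as
Hadoop also re-runs completed map tasks whose output lived on the lost node's
local disk; that is omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical single-master failure detector; not Hadoop's JobTracker code.


@dataclass
class Master:
    timeout: float  # seconds of silence before a worker is considered dead
    pending: List[str] = field(default_factory=list)
    last_seen: Dict[str, float] = field(default_factory=dict)
    running: Dict[str, List[str]] = field(default_factory=dict)  # worker -> tasks

    def heartbeat(self, worker: str, now: float) -> None:
        # Record liveness and make sure the worker has a task list.
        self.last_seen[worker] = now
        self.running.setdefault(worker, [])

    def assign(self, worker: str, task: str) -> None:
        self.pending.remove(task)
        self.running[worker].append(task)

    def check_failures(self, now: float) -> List[str]:
        """Declare silent workers dead and re-queue their running tasks."""
        dead = [w for w, seen in self.last_seen.items() if now - seen > self.timeout]
        for w in dead:
            self.pending.extend(self.running.pop(w, []))
            del self.last_seen[w]
        return dead
```

Choosing the timeout is a trade-off: too short and slow-but-alive workers have
their tasks re-executed needlessly, too long and a crashed node stalls the job.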

On Apr 23, 2012, at 11:03 AM, Shailesh Samudrala  wrote:

> Hello,
>
> I am trying to design my own MapReduce implementation, and I want to know
> how Hadoop is able to distribute its workload across multiple computers.
> Can anyone shed more light on this? Thanks!