Tenaali Ram wrote:
Hi,

I am new to Hadoop. What I have understood so far is that Hadoop is used to
process huge amounts of data using the map-reduce paradigm.

I am working on a problem where I need to perform a large number of
computations; most of them can be done independently of each other (so
I think each mapper can handle one or more such computations). However,
there is no input data involved. It's just a number-crunching job. Is it
suited for Hadoop?


Well, you can have the MR jobs stick their data out into the filesystem. So even though they don't start off located near any data, they end up running where the output needs to go.

Has anyone used Hadoop for pure number crunching? If yes, how should I
define the input for the job and ensure that the computations are distributed
to all nodes in the grid?
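
One common trick is a map-only job whose "input" is just a small task list: one line per independent computation. Here's a minimal sketch, assuming the 0.19-era org.apache.hadoop.mapred API -- NLineInputFormat hands each line of the file to its own map task, and with zero reduces the results land straight in the output directory on HDFS. The class names and the compute() body are placeholders for your real work:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class CrunchJob {

  // Each map() call gets one line of the task file; that line is the
  // parameter for one independent computation.
  public static class CrunchMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, DoubleWritable> {
    public void map(LongWritable offset, Text taskSpec,
                    OutputCollector<Text, DoubleWritable> out,
                    Reporter reporter) throws IOException {
      long param = Long.parseLong(taskSpec.toString().trim());
      double result = compute(param);    // the real number crunching goes here
      out.collect(taskSpec, new DoubleWritable(result));
    }

    private double compute(long param) {
      return Math.sqrt(param);           // placeholder computation
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(CrunchJob.class);
    conf.setJobName("number-crunch");
    conf.setMapperClass(CrunchMapper.class);
    conf.setInputFormat(NLineInputFormat.class);  // one map task per input line
    conf.setNumReduceTasks(0);                    // map-only: no shuffle, no reduce
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));  // tiny task-list file
    FileOutputFormat.setOutputPath(conf, new Path(args[1])); // results on HDFS
    JobClient.runJob(conf);
  }
}

The input file costs almost nothing to store; it exists only to fan the work out across the cluster, so the job is compute-bound rather than data-bound, and every task's results get written into part files under the output directory -- which is the "stick data out into the filesystem" point above. NLineInputFormat defaults to one line per split; if you want to batch several computations per mapper, look at the mapred.line.input.format.linespermap property.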

The current scheduler moves work close to where the data sources are, going for the same machine or same rack, looking for a task tracker with a spare "slot". There isn't yet any scheduler that worries more about pure computation, where you need to consider current CPU load, memory consumption and power budget -- whether your rack is running so hot it's at risk of being shut down, or at least throttled back. That's the kind of scheduling where the e-science and grid toolkit people have the edge.

But now that the version of Hadoop in SVN has support for plug-in scheduling, someone has the opportunity to write a new scheduler, one that focuses on pure computation...
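
For reference, wiring in an alternative scheduler is just a config switch once one exists -- something like this in hadoop-site.xml (the property name is what the 0.19-era code reads; FairScheduler here merely stands in for whatever scheduler class you plug in):

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>

The JobTracker instantiates whatever TaskScheduler subclass that property names, so a compute-centric scheduler would drop in the same way.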

