This is a very interesting thread.  God sounds cool.

We've been discussing a proposal to generalize the TT/JT servers to handle more generic tasks and to move job-specific work out of the JobTracker and into client code, so that the whole system is both more general and more coherently layered. The result would look more like Condor/PBS-style systems (or presumably Borg), with map-reduce as a user job.

Such a system would allow the current map-reduce code to coexist with other work-queuing libraries, or maybe even with persistent services, on the same Hadoop cluster, although the latter would be a stretch goal. We'll kick off a thread with some documents soon.

Our primary goal in going this way would be to get better utilization out of map-reduce clusters and support a richer scheduling model. The ability to support alternative job frameworks would just be gravy!
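To give a flavor of the layering we have in mind, here's a very rough, purely hypothetical Java sketch. None of these interfaces exist in Hadoop today; the names (GenericTask, TaskContext, MapReduceTask) are invented for illustration, and the real details will be in the documents we post.

```java
// Purely hypothetical sketch -- none of these interfaces exist in Hadoop today.
// The idea is that the TT/JT layer only knows how to launch and monitor opaque
// tasks; everything map/reduce-specific moves into a client-side library.

interface TaskContext {
  /** Lets a long-running task tell the framework it is still alive. */
  void reportProgress(String status);

  /** Resources (jars, files) that the client shipped with the job. */
  java.io.File getLocalResource(String name);
}

interface GenericTask {
  /** Invoked by the generic task runner on the worker node. */
  void run(TaskContext context) throws Exception;
}

// Map-reduce then becomes just one user-level framework among many.
class MapReduceTask implements GenericTask {
  public void run(TaskContext context) throws Exception {
    context.reportProgress("running map phase");
    // ... the existing map/reduce machinery, moved out of the JobTracker ...
  }
}
```

Other work-queuing libraries or persistent services would simply be other implementations scheduled by the same generic layer.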

Merry XMas all.

E14

PS: cross-posting to hadoop-dev since this is morphing into a dev discussion. I just created HADOOP-2491 to capture discussion on this new topic.

On Dec 22, 2007, at 2:39 PM, Chad Walters wrote:


I should further say that god functions only on a per-machine basis. We have then built a number of scripts that do auto-configuration of our various services, using configs pulled from LDAP and code pulled from our package repo. We use this to configure our various server processes and also to configure Hadoop clusters (HDFS and Map/Reduce). But god is a key part of the system, since it helps us provide a uniform interface for starting and stopping all our services.

Chad


On 12/22/07 1:30 PM, "Chad Walters" <[EMAIL PROTECTED]> wrote:

I am not really sure that Hadoop is right for what Jeff is describing.

I think there may be two separate problems:

1. Batch tasks that may take a long time but are expected to have a finite termination
2. Long-lived server processes that have an indefinite lifetime

For #1, we pretty much use Hadoop, although we have built a fairly extensive framework inside of these long map tasks to track progress and handle various failure conditions that can arise. If people are really interested, I'll poke around and see if any of it is general enough to warrant contributing back, but I think a lot of it is probably fairly specific to the kinds of failure cases we expect from the components involved in the long map task.
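If it helps, the generic part of keeping a long map task alive mostly comes down to progress reporting, so the TaskTracker doesn't kill the task after mapred.task.timeout (10 minutes by default) of silence. Roughly, with the current mapred API, it looks something like the sketch below; the isDone/doOneUnitOfWork bits are placeholders, and our real framework does a lot more around failure handling than this.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LongRunningMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    while (!isDone()) {            // placeholder for the real batch work
      doOneUnitOfWork();           // placeholder
      // Without these calls, the TaskTracker will kill the task after
      // mapred.task.timeout of silence (600000 ms by default).
      reporter.progress();
      reporter.setStatus("still working on " + value);
    }
    output.collect(value, new Text("done"));
  }

  private boolean isDone() { return true; }   // placeholder
  private void doOneUnitOfWork() { }          // placeholder
}
```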

For #2, we are using something called "god" (http://god.rubyforge.org/). One of our developers ended up starting this project because he didn't like monit. We liked the way it was going, and now we use it throughout our datacenter to start, stop, and health-check our server processes. It supports both polling and event-driven actions and is pretty extensible. Check it out to see if it might satisfy some of your needs.

Chad


On 12/22/07 11:40 AM, "Jeff Hammerbacher" <[EMAIL PROTECTED]> wrote:

yo,
from my understanding, the map/reduce codebase grew out of the codebase for "the borg", google's system for managing long-running processes. we could definitely use this sort of functionality, and the jobtracker/tasktracker paradigm goes part of the way there. sqs really helps when you want to run a set of recurring, dependent processes (a problem our group definitely needs to solve), but it doesn't really seem to address the issue of managing those processes when they're long-lived.

for instance, when we deploy our search servers, we have a script that basically says "daemonize this process on this many boxes, and if it enters a condition that doesn't look healthy, take this action (like restart, or rebuild the index, etc.)". given how hard-coded the task types are in map/reduce (er, "map" and "reduce"), it's hard to specify new types of error conditions and running conditions for your processes. also, the jobtracker doesn't have any high-availability guarantees, so you could run into a situation where your processes are fine but the jobtracker goes down. zookeeper could help here. it'd be sweet if hadoop could handle this long-lived process management scenario.
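for the zookeeper angle, the basic move would be for the jobtracker (or any long-lived process) to register an ephemeral znode; if the process or its session dies, the node vanishes and watchers get notified and can kick off whatever recovery action you've scripted. a rough sketch against the zookeeper java client api (the host, path, and timeout here are made up, and exact package/enum names may differ by release):

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LivenessWatch {
  public static void main(String[] args) throws Exception {
    final CountDownLatch connected = new CountDownLatch(1);
    ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, new Watcher() {
      public void process(WatchedEvent event) {
        if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
          connected.countDown();
        }
      }
    });
    connected.await();

    // The supervised process registers itself with an ephemeral node;
    // it disappears automatically if the process (or its session) dies.
    zk.create("/services/jobtracker", "host:50030".getBytes(),
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

    // Elsewhere, a watcher would do an exists() with a watch on the same
    // path and react to the NodeDeleted event (restart, failover, rebuild
    // an index, etc.).
  }
}
```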

kirk, i'd be interested in hearing more about your processes and the requirements you have of your process manager. we're exploring other solutions to this problem and i'd be happy to connect you with the folks here who are thinking about the issue.

later,
jeff

On Dec 21, 2007 12:42 PM, John Heidemann <[EMAIL PROTECTED]> wrote:

On Fri, 21 Dec 2007 12:24:57 PST, John Heidemann wrote:
On Thu, 20 Dec 2007 18:46:58 PST, Kirk True wrote:
Hi all,

A lot of the ideas I have for incorporating Hadoop into internal projects revolve around distributing long-running tasks over multiple machines. I've been able to get a quick prototype up in Hadoop for one of those projects and it seems to work pretty well.
...
He's not saying "is Hadoop optimal" for things that aren't really map/reduce, but "is it reasonable" for those things?
(Kirk, is that right?)
...

Sorry to double reply, but I left out my comment to (my view of) Kirk's question.

In addition to what Ted said, I'm not sure how well Hadoop works with long-running jobs, particularly how well that interacts with its fault tolerance code.

And more generally, if you're not doing map/reduce then you'd probably have to build your own fault tolerance methods.

  -John Heidemann






