I would definitely check out Oozie for this use case.

-Joey
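[Editor's note: for reference, a minimal Oozie workflow chaining two map-reduce actions might look roughly like the sketch below. The action names, mapper/reducer classes, and paths are illustrative placeholders; `${jobTracker}` and `${nameNode}` would be supplied via a job.properties file.]

```xml
<workflow-app name="mr-sequence" xmlns="uri:oozie:workflow:0.2">
  <start to="first-job"/>

  <action name="first-job">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.mapper.class</name>
          <value>com.example.FirstMapper</value> <!-- hypothetical class -->
        </property>
        <property>
          <name>mapred.input.dir</name>
          <value>/data/in</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/data/intermediate</value>
        </property>
      </configuration>
    </map-reduce>
    <!-- Oozie, not the client, decides what runs next -->
    <ok to="second-job"/>
    <error to="fail"/>
  </action>

  <action name="second-job">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>/data/intermediate</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/data/out</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Because the Oozie server tracks the transitions, the submitting client can disconnect entirely and check back later, which matches the fire-and-forget behavior asked about below.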
On Thu, Sep 29, 2011 at 12:51 PM, Aaron Baff <aaron.b...@telescope.tv> wrote:
> I saw this, but wasn't sure if it was something that ran on the client and
> just submitted the Jobs in sequence, or if it handed them all to the
> JobTracker, which then took care of submitting the Jobs in sequence
> appropriately.
>
> Basically, I'm looking for a completely stateless client that doesn't need
> to ping the JobTracker periodically to see whether a Job has completed and
> then submit the next one. The ideal flow: the client gets a request to run
> the series of Jobs, preps and configures them all, and then passes them off
> to the JobTracker, which runs them all in order without the client
> application needing to do anything further.
>
> Sounds like that doesn't really exist as part of the Hadoop framework, and
> needs something like Oozie (or a home-built system) to do this.
>
> --Aaron
>
> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Wednesday, September 28, 2011 9:37 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Running multiple MR Jobs in sequence
>
> Within the Hadoop core project, there is JobControl, which you can utilize
> for this. You can view its API at
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/jobcontrol/package-summary.html
> and it is fairly simple to use (create jobs via the regular Java API, then
> build a dependency flow with JobControl atop these JobConf objects).
>
> Apache Oozie and similar tools offer higher-level abstractions for
> controlling a workflow, and are worth considering when your needs get a bit
> more complex than a simple series (handling failure scenarios between
> dependent jobs, performing minor FS operations in pre/post processing,
> etc.).
>
> On Thu, Sep 29, 2011 at 5:26 AM, Aaron Baff <aaron.b...@telescope.tv> wrote:
>> Is it possible to submit a series of MR Jobs to the JobTracker to run in
>> sequence (one finishes, take its output if successful and feed it into the
>> next, etc.), or does it need to run client-side using JobControl,
>> something like Oozie, or rolling our own? What I'm looking for is
>> fire-and-forget, with an occasional check back to see if it's done, so the
>> client side doesn't really need to know or keep track of anything. Does
>> something like that exist within the Hadoop framework?
>>
>> --Aaron
>
> --
> Harsh J

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
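[Editor's note: a minimal sketch of the JobControl approach Harsh describes, using the newer `org.apache.hadoop.mapreduce.lib.jobcontrol` classes (the link above points at the older `org.apache.hadoop.mapred.jobcontrol` equivalents, which work on JobConf objects instead). Job names and paths are illustrative.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // First job: /data/in -> /data/intermediate
        Job first = Job.getInstance(conf, "first-pass");
        // ...set mapper/reducer/output classes here as usual...
        FileInputFormat.addInputPath(first, new Path("/data/in"));
        FileOutputFormat.setOutputPath(first, new Path("/data/intermediate"));

        // Second job consumes the first job's output
        Job second = Job.getInstance(conf, "second-pass");
        FileInputFormat.addInputPath(second, new Path("/data/intermediate"));
        FileOutputFormat.setOutputPath(second, new Path("/data/out"));

        ControlledJob cFirst = new ControlledJob(first, null);
        ControlledJob cSecond = new ControlledJob(second, null);
        cSecond.addDependingJob(cFirst); // second runs only if first succeeds

        JobControl control = new JobControl("mr-sequence");
        control.addJob(cFirst);
        control.addJob(cSecond);

        // JobControl is a Runnable that polls job states from the client,
        // so the client process must stay up until allFinished() is true.
        Thread runner = new Thread(control);
        runner.start();
        while (!control.allFinished()) {
            Thread.sleep(1000);
        }
        control.stop();
    }
}
```

Note the polling loop: JobControl manages the dependency ordering, but it does so from the client side, so it does not give the stateless, fire-and-forget behavior Aaron asked for. That is exactly the gap a server-side workflow engine like Oozie fills.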