I would definitely check out Oozie for this use case.
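For the JobControl route Harsh mentions below, a minimal sketch using the old mapred API might look like the following (job configuration details are elided, and the class name is just a placeholder). Note that JobControl runs in the client JVM and has to stay up polling, so it isn't truly fire-and-forget -- which is why Oozie is the better fit here:

```java
// Hypothetical sketch: chain two MR jobs with JobControl (old mapred API).
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class JobChain {
  public static void main(String[] args) throws Exception {
    JobConf conf1 = new JobConf(); // configure mapper/reducer, input/output paths...
    JobConf conf2 = new JobConf(); // conf2's input path = conf1's output path

    Job step1 = new Job(conf1);
    Job step2 = new Job(conf2);
    step2.addDependingJob(step1); // step2 only starts after step1 succeeds

    JobControl control = new JobControl("two-step-chain");
    control.addJob(step1);
    control.addJob(step2);

    // JobControl is a Runnable: it runs in a client-side thread and
    // submits each job once its dependencies complete. The client
    // process must stay alive and poll until everything finishes.
    Thread runner = new Thread(control);
    runner.start();
    while (!control.allFinished()) {
      Thread.sleep(5000);
    }
    control.stop();
  }
}
```

Oozie, by contrast, keeps the workflow state server-side, which is much closer to the stateless client Aaron describes.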

-Joey

On Thu, Sep 29, 2011 at 12:51 PM, Aaron Baff <aaron.b...@telescope.tv> wrote:
> I saw this, but wasn't sure if it was something that ran on the client and 
> just submitted the Job's in sequence, or if that gave it all to the 
> JobTracker, and the JobTracker took care of submitting the Jobs in sequence 
> appropriately.
>
> Basically, I'm looking for a completely stateless client, that doesn't need 
> to ping the JobTracker every now and then to see if a Job has completed, and 
> then submit the next one. The ideal flow would be the client gets in a 
> request to run the series of Jobs, it preps them all, gets them all 
> configured, and then passes them off to the JobTracker which runs them all in 
> order without the client application needing to do anything further.
>
> Sounds like that doesn't really exist as part of the Hadoop framework, and needs 
> something like Oozie (or a home-built system) to do this.
>
> --Aaron
> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Wednesday, September 28, 2011 9:37 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Running multiple MR Job's in sequence
>
> Within the Hadoop core project, there is JobControl you can utilize
> for this. You can view its API at
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/jobcontrol/package-summary.html
> and it is fairly simple to use (Create jobs in regular java API, build
> a dependency flow using JobControl atop these jobconf objects).
>
> Apache Oozie and other such tools offer higher abstractions for
> controlling a workflow, and can be considered when your needs get a
> bit more complex than a simple series (e.g., handling failure
> scenarios between dependent jobs, performing minor FS operations in
> pre/post processing, etc.).
>
> On Thu, Sep 29, 2011 at 5:26 AM, Aaron Baff <aaron.b...@telescope.tv> wrote:
>> Is it possible to submit a series of MR Jobs to the JobTracker to run in 
>> sequence (when one finishes successfully, its output is fed into the next, 
>> etc.), or does it need to run client-side using JobControl, something like 
>> Oozie, or rolling our own? What I'm looking for is fire & forget, with the 
>> ability to occasionally check back to see if it's done, so the client side 
>> doesn't really need to know or keep track of anything. Does something like 
>> that exist within the Hadoop framework?
>>
>> --Aaron
>>
>
>
>
> --
> Harsh J
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
