Re: Can jobs be configured to be sequential
Hi Paco,

Thanks, this is exactly what I was looking for.

Regards,
Ravi

----- Original Message -----
From: Paco NATHAN [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Saturday, October 18, 2008 9:46 AM
Subject: Re: Can jobs be configured to be sequential
Can jobs be configured to be sequential
Dear all,

Our Data Warehouse system has about 600 ETL (Extract, Transform, Load) jobs that build an interim data model. Some jobs depend on the completion of others. Assume that I assign interdependent jobs to groups, each with a group ID. Say group G1 contains 100 jobs, G2 contains another 200 jobs that depend on the completion of G1, and so on.

Can we leverage Hadoop so that it executes G1 first and, if G1 fails, does not execute G2, but otherwise continues with G2, and so on? Or do I need to configure N Hadoop jobs independently (where N is the total number of groups) and handle the sequencing ourselves?

Please share your thoughts. Thanks.

Warmest regards,
Ravion
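As a rough illustration of the second option in the question (sequencing the groups from our own driver code), here is a minimal Python sketch. It assumes each ETL job can be wrapped as a zero-argument callable that returns True on success; `run_groups` and that wrapper convention are hypothetical and are not part of Hadoop or Cascading. Jobs within a group run concurrently, and the driver stops before the next group as soon as any job in the current group fails. In practice each callable would have to submit a Hadoop job and block until it finishes; that part is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

# Hypothetical convention: an ETL job is a zero-argument callable
# that returns True on success (e.g. a wrapper that submits a Hadoop
# job and waits for it to finish).
Job = Callable[[], bool]


def _safe_run(job: Job) -> bool:
    """Run one job, treating any raised exception as a failure."""
    try:
        return bool(job())
    except Exception:
        return False


def run_groups(groups: List[List[Job]],
               max_workers: int = 8) -> Tuple[List[int], Optional[int]]:
    """Run groups in order (G1, G2, ...).

    All jobs in a group run concurrently; the next group starts only
    if every job in the current group succeeded.
    Returns (indices of completed groups, index of failed group or None).
    """
    completed: List[int] = []
    for index, group in enumerate(groups):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(_safe_run, group))
        if not all(results):
            # Stop here: later groups depend on this one.
            return completed, index
        completed.append(index)
    return completed, None
```

For example, with groups [G1, G2, G3] where one job in G2 fails, the call returns `([0], 1)` and no job in G3 is started. Keeping the group boundary as a hard barrier mirrors the dependency described above; finer-grained per-job dependencies would need a proper dependency graph instead.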
Re: Can jobs be configured to be sequential
Hi Ravion,

The problem you are describing sounds like a workflow where you must be careful to verify certain conditions before proceeding to the next step. We have similar use cases for Hadoop apps at work, which are essentially ETL. I recommend that you look at http://cascading.org as an abstraction layer for managing these kinds of workflows. We've found it quite useful.

Best,
Paco

On Fri, Oct 17, 2008 at 8:29 PM, Ravion [EMAIL PROTECTED] wrote: