Hi Ravion,

The problem you are describing sounds like a workflow where you must
verify certain conditions before proceeding to the next step.

We have similar use cases for Hadoop apps at work, which are
essentially ETL. I recommend that you look at http://cascading.org as
an abstraction layer for managing these kinds of workflows. We've
found it quite useful.
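
To illustrate the idea, here is a minimal sketch (plain Python, not
Cascading's actual API; the function and job names are made up for the
example) of running job groups in order and skipping later groups once
one fails:

```python
# Minimal sketch of group-ordered execution (illustrative only, not
# Cascading's actual API): run each group's jobs in order, and stop
# scheduling later groups as soon as one group fails.

def run_groups(groups, run_job):
    """groups: list of (name, jobs); run_job(job) -> True on success.
    Returns (names of completed groups, name of failed group or None)."""
    completed = []
    for name, jobs in groups:
        if not all(run_job(job) for job in jobs):
            return completed, name  # abort: later groups never start
        completed.append(name)
    return completed, None

# Example: G2 fails, so G3 is never started.
groups = [
    ("G1", ["job_a", "job_b"]),
    ("G2", ["job_c"]),
    ("G3", ["job_d"]),
]
completed, failed = run_groups(groups, lambda job: job != "job_c")
print(completed, failed)  # ['G1'] G2
```

A workflow tool like Cascading handles this kind of dependency
ordering for you, along with the Hadoop job plumbing.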

Best,
Paco


On Fri, Oct 17, 2008 at 8:29 PM, Ravion <[EMAIL PROTECTED]> wrote:
> Dear all,
>
> We have, in our Data Warehouse system, about 600 ETL (Extract, Transform,
> Load) jobs that create an interim data model. Some jobs are dependent on the
> completion of others.
>
> Assume that I group interdependent jobs. Say a group G1 contains
> 100 jobs, G2 contains another 200 jobs which are dependent on completion of
> group G1, and so on.
>
> Can we leverage Hadoop so that Hadoop executes G1 first; on failure it
> won't execute G2, otherwise it will continue with G2, and so on?
>
> Or do I need to configure "N" (where N = total number of groups) Hadoop
> jobs independently and handle the dependencies ourselves?
>
> Please share your thoughts, thanks
>
> Warmest regards,
> Ravion
