If we can get rid of Oozie entirely, then yes, we can explore other
possibilities. But if we are still going to use Oozie for DAG execution, we
are going to add another bottleneck to the whole execution (currently,
Falcon is not in the workflow execution path), and I don't think it's worth
it.

The features outlined above are all available in basic form in Oozie, and
it should be easy to enhance them or expose them as extension points.



-Shwetha

On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan <[email protected]>
wrote:

> Here are a few more gaps that we ought to solve while we are on the
> subject:
>
> 1. Ability to attach to start & finish events of workflow execution.
> Currently we have a post-processing hook to listen to finish events, but we
> do run into scenarios where there are occasional failures with
> post-processing and there is potential phase lag in learning about the
> events.
> 2. Strict enforcement of concurrency control possibly spanning process
> boundaries.
> 3. Ability to tune how backlogs have to be caught up (old instances to be
> given higher priority, newer instances to be given higher priority, or some
> sort of weights to allow both to make progress at varying rates). There
> have been asks for routing current vs older instances to different queues
> by users as an alternative.
> 4. Ability to have a notion of non-time based feed instances and related
> coordination.
> 5. Currently keeping track of and managing SLAs is also a challenge, but
> with #1 addressed, this might be a lesser concern.
>
> Regards
> Srikanth Sundarrajan
>
> > Subject: Re: [DISCUSS] Orchestration in Falcon
> > From: [email protected]
> > Date: Tue, 23 Dec 2014 06:30:30 +0530
> > To: [email protected]
> >
> > @venkatesh, the question really is how we enable these gating
> > preconditions. It seems hard enough to add them to Oozie, but I am not
> > familiar enough with Oozie to comment on how hard or easy it is. Like I
> > responded to @ajay on the same thread, if we are to do away with
> > coordination through Oozie, we can follow up this discussion with
> > approaches and design. Though I had Quartz in mind, I wanted to leave
> > that out of the discussion to see if there is consensus for moving away
> > from Oozie coords and implementing them through other means.
> >
> > Sent from my iPhone
> >
> > > On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh" <[email protected]> wrote:
> > >
> > > What is the purpose of this decoupling? Why build this into Falcon?
> > > Scheduling is so common that schedulers are a dime a dozen today, and
> > > they are all extensible with custom triggers. Making it part of Falcon
> > > will suffer the same issues that Oozie has today.
> > >
> > > I'm sorry but I'm a HUGE -1 to this being built into Falcon codebase.
> > >
> > > However, I'm +1 to reusing the Quartz scheduler that already exists -
> > > stand it up outside or embed it like we do for ActiveMQ.
> > >
> > > Phase 2 - I'd like to see us write a simple DAG execution layer on YARN
> > > as an app master, without a DB, that keeps its state on HDFS, as an
> > > alternative to Oozie.
> > >
> > > Then we will have a nimble falcon which can kick ass.
> > >
> > >
> > > On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <[email protected]>
> > > wrote:
> > >
> > >> Hello Team,
> > >>
> > >> Since its inception, Falcon has used Oozie for process orchestration as
> > >> well as feed life cycle phase executions. While this has worked
> > >> reasonably well and allowed us to make higher-level capabilities
> > >> available through Falcon, we are increasingly seeing scenarios where
> > >> this is proving to be a limiting factor. In its current form, Falcon
> > >> relies on Oozie for both scheduling and workflow execution, due to which
> > >> scheduling is limited to time-based/cron-based scheduling with
> > >> additional gating conditions on data availability. This also imposes
> > >> the restriction that datasets be periodic/cyclic in nature.
> > >>
> > >> From an orchestration standpoint, it would help if we can support
> > >> standard gating / scheduling primitives via Falcon:
> > >>
> > >> 1. Simple periodic scheduling with no gating conditions
> > >> 2. Cron-based scheduling (day of week, day of the month, specific
> > >> hours and non-periodic) with no gating conditions
> > >> 3. Availability of new data (assuming monotonically increasing data
> > >> version, availability of new versions)
> > >> 4. Changes to existing data (reinstatement - similar to late data
> > >> handling)
> > >> 5. External trigger/notifications
> > >> 6. Availability of specific instances of data declared as a mandatory
> > >> dependency
> > >> 7. Availability of a minimum subset of instances of data declared as a
> > >> mandatory dependency (for example, at least 10 of the 24 hourly
> > >> instances of a day)
> > >> 8. Valid combinations of the above.
> > >>
> > >> In this context, I would like to propose that we move away from Oozie
> > >> for the orchestration requirements and have them implemented natively
> > >> within Falcon. It will no doubt make the Falcon server bulkier and
> > >> heavier in both code and deployment, but without it, orchestration
> > >> within Falcon will be limited by the capabilities available within
> > >> Oozie.
> > >>
> > >> Please do note that this suggestion is restricted to the scheduling
> > >> and not to the workflow execution.
> > >>
> > >> Would like to hear from fellow developers and users on what your
> > >> thoughts are. Please do chime in with your views.
> > >>
> > >> Regards
> > >> Srikanth Sundarrajan
> > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Venkatesh
> > >
> > > “Perfection (in design) is achieved not when there is nothing more to
> > > add, but rather when there is nothing more to take away.”
> > > - Antoine de Saint-Exupéry
>
>
