Here are a few more gaps that we ought to address while we are on the
subject:
1. Ability to attach to start and finish events of workflow execution.
Currently we have a post-processing hook to listen to finish events,
but we do run into occasional post-processing failures, and there is a
potential lag in learning about the events.
2. Strict enforcement of concurrency control, possibly spanning process
boundaries.
3. Ability to tune how backlogs are caught up (older instances given
higher priority, newer instances given higher priority, or some sort of
weights to allow both to make progress at varying rates). As an
alternative, users have asked for routing current vs. older instances
to different queues.
4. Ability to have a notion of non-time-based feed instances and
related coordination.
5. Keeping track of and managing SLAs is currently also a challenge,
but with #1 addressed, this might be a lesser concern.
Regards
Srikanth Sundarrajan
> Subject: Re: [DISCUSS] Orchestration in Falcon
> From: [email protected]
> Date: Tue, 23 Dec 2014 06:30:30 +0530
> To: [email protected]
>
> @venkatesh, the question really is how we enable these gating
> preconditions. It seems hard enough to add them to Oozie, but I am not
> familiar enough with Oozie to comment on how hard or easy it is. As I
> responded to @ajay on the same thread, if we are to do away with
> coordination through Oozie, we can follow up this discussion with
> approaches and design. Though I had Quartz in mind, I wanted to leave
> that out of the discussion to see if there is consensus for moving
> away from Oozie coords and implementing them through other means.
>
> > On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh" <[email protected]> wrote:
> >
> > What is the purpose of this decoupling? Why build this into Falcon?
> > Scheduling is so common that schedulers are a dime a dozen today, and
> > they are all extensible with custom triggers. Making it part of Falcon
> > will suffer from the same issues that Oozie has today.
> >
> > I'm sorry, but I'm a HUGE -1 to this being built into the Falcon codebase.
> >
> > However, I'm +1 to reusing the Quartz scheduler that already exists:
> > stand it up outside or embed it like we do for ActiveMQ.
> >
> > Phase 2 - I'd like to see us write a simple DAG execution layer in
> > YARN as an app master, without a DB and keeping state on HDFS, as an
> > alternative to Oozie.
> >
> > Then we will have a nimble falcon which can kick ass.
> >
> >
> > On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <[email protected]>
> > wrote:
> >
> >> Hello Team,
> >>
> >> Since its inception, Falcon has used Oozie for process orchestration
> >> as well as feed life cycle phase executions. While this has worked
> >> reasonably well and allowed us to make higher level capabilities
> >> available through Falcon, we are increasingly seeing scenarios where
> >> this is proving to be a limiting factor. In its current form, Falcon
> >> relies on Oozie for both scheduling and workflow execution, due to
> >> which the scheduling is limited to time based/cron based scheduling
> >> with additional gating conditions on data availability. This also
> >> imposes the restriction that datasets be periodic/cyclic in nature.
> >>
> >> From an orchestration standpoint, it would help if we can support
> >> standard gating / scheduling primitives via Falcon:
> >>
> >> 1. Simple periodic scheduling with no gating conditions
> >> 2. Cron based scheduling (day of week, day of the month, specific
> >> hours and non-periodic) with no gating conditions
> >> 3. Availability of new data (assuming monotonically increasing data
> >> version, availability of new versions)
> >> 4. Changes to existing data (reinstatement - similar to late data
> >> handling)
> >> 5. External trigger/notifications
> >> 6. Availability of specific instances of data declared as mandatory
> >> dependencies
> >> 7. Availability of a minimum subset of instances of data declared as
> >> mandatory dependencies (for example, at least 10 hourly instances of
> >> a day with 24 instances)
> >> 8. Valid combinations of the above.
> >>
> >> In this context, I would like to propose that we move away from Oozie
> >> for the orchestration requirements and have them implemented natively
> >> within Falcon. It will no doubt make the Falcon server bulkier and
> >> heavier in both code and deployment, but it seems that without it,
> >> the orchestration within Falcon will be limited by the capabilities
> >> available within Oozie.
> >>
> >> Please do note that this suggestion is restricted to the scheduling
> >> and not to the workflow execution.
> >>
> >> I would like to hear from fellow developers and users on what your
> >> thoughts are. Please do chime in with your views.
> >>
> >> Regards
> >> Srikanth Sundarrajan
> >
> >
> >
> >
> > --
> > Regards,
> > Venkatesh
> >
> > “Perfection (in design) is achieved not when there is nothing more to
> > add, but rather when there is nothing more to take away.”
> > - Antoine de Saint-Exupéry