Hi all,

I second Shwetha there. I think we can achieve such features in Oozie (with some adaptations).

Regards
JB

On 2014-12-23 10:53, Shwetha G S wrote:
If we can get rid of oozie entirely, yes, we can explore other
possibilities. But if we are still going to use oozie for DAG execution, we
are going to add another bottleneck to the whole execution (currently, falcon
is not in the workflow execution path), and I don't think it's worth it.

The features that are outlined above are all available in basic forms in oozie and it should be easy to enhance them/make them as extension points.



-Shwetha

On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan <[email protected]>
wrote:

Here are a few more gaps that we ought to solve while we are on the
subject:

1. Ability to attach to start & finish events of workflow execution.
Currently we have a post-processing hook to listen to finish events, but we
do run into scenarios where post-processing occasionally fails, and there is
a potential lag in learning about the events.
2. Strict enforcement of concurrency control possibly spanning process
boundaries.
3. Ability to tune how backlogs are caught up (older instances given higher
priority, newer instances given higher priority, or some sort of weights to
allow both to make progress at varying rates). As an alternative, users have
asked for routing current vs. older instances to different queues.
4. Ability to have a notion of non-time based feed instances and related
coordination.
5. Currently keeping track of and managing SLAs is also a challenge, but
with #1 addressed, this might be a lesser concern.
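
To make item 3 concrete, here is a minimal sketch of what a weighted catch-up policy could look like. It is only an illustration: the function name, the cutoff between "old" and "new" instances, and the weights are all hypothetical and do not correspond to anything in Falcon or Oozie today.

```python
from collections import deque
from datetime import datetime, timedelta


def order_backlog(instances, cutoff, old_weight=1, new_weight=1):
    """Return pending instance times in the order they should be run.

    instances:  iterable of datetime instance times (hypothetical model).
    cutoff:     instances before this time are treated as old backlog.
    old_weight: old instances released per round.
    new_weight: new instances released per round.
    A weight of 0 holds that group back until the other group is drained.
    """
    if old_weight <= 0 and new_weight <= 0:
        raise ValueError("at least one weight must be positive")

    ordered = sorted(instances)
    old = deque(t for t in ordered if t < cutoff)
    new = deque(t for t in ordered if t >= cutoff)

    result = []
    while old or new:
        before = len(result)
        for _ in range(old_weight):
            if old:
                result.append(old.popleft())
        for _ in range(new_weight):
            if new:
                result.append(new.popleft())
        if len(result) == before:
            # The only group with a positive weight is drained,
            # so release whatever is left in the other group.
            result.extend(old)
            result.extend(new)
            old.clear()
            new.clear()
    return result
```

With old_weight=1 and new_weight=0 this degenerates to "oldest first", with old_weight=0 and new_weight=1 to "newest group first", and any other pair lets both groups make progress at different rates.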

Regards
Srikanth Sundarrajan

> Subject: Re: [DISCUSS] Orchestration in Falcon
> From: [email protected]
> Date: Tue, 23 Dec 2014 06:30:30 +0530
> To: [email protected]
>
> @venkatesh, the question really is how do we enable these gating
> preconditions. It seems hard enough to add them to oozie, but I am not
> familiar enough with oozie to comment on how hard or easy it is. Like I
> responded to @ajay on the same thread, if we are to do away with
> coordination through oozie, we can follow up this discussion with approaches
> and design. Though I had quartz in mind, I wanted to leave that out of the
> discussion to see if there is consensus for moving away from oozie coords
> and implementing them through other means.
>
> Sent from my iPhone
>
> > On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh" <[email protected]> wrote:
> >
> > What is the purpose of this decoupling? Why build this into Falcon?
> > Scheduling is so common that schedulers are a dime a dozen today, and
> > they are all extensible with custom triggers. Making it part of Falcon
> > will suffer from the same issues that Oozie has today.
> >
> > I'm sorry but I'm a HUGE -1 to this being built into Falcon codebase.
> >
> > However, I'm +1 to reusing the Quartz scheduler that already exists -
> > stand it up outside or embed it like we do for ActiveMQ.
> >
> > Phase 2 - I'd like to see us write a simple DAG execution layer in YARN
> > as an app master, without a DB, that keeps state on HDFS, as an
> > alternative to Oozie.
> >
> > Then we will have a nimble falcon which can kick ass.
> >
> >
> > On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <[email protected]>
> > wrote:
> >
> >> Hello Team,
> >>
> > >> Since its inception Falcon has used Oozie for process orchestration as
> > >> well as feed life cycle phase executions. While this has worked
> > >> reasonably well and allowed us to make higher-level capabilities
> > >> available through Falcon, we are increasingly seeing scenarios where
> > >> this is proving to be a limiting factor. In its current form, Falcon
> > >> relies on Oozie for both scheduling and workflow execution, due to
> > >> which scheduling is limited to time-based/cron-based scheduling with
> > >> additional gating conditions on data availability. This also restricts
> > >> datasets to being periodic/cyclic in nature.
> >>
> > >> From an orchestration standpoint, it would help if we can support
> > >> standard gating / scheduling primitives via Falcon:
> >>
> > >> 1. Simple periodic scheduling with no gating conditions
> > >> 2. Cron based scheduling (day of week, day of the month, specific
> > >> hours and non-periodic) with no gating conditions
> > >> 3. Availability of new data (assuming monotonically increasing data
> > >> version, availability of new versions)
> > >> 4. Changes to existing data (reinstatement - similar to late data
> > >> handling)
> > >> 5. External trigger/notifications
> > >> 6. Availability of specific instances of data declared as a mandatory
> > >> dependency
> > >> 7. Availability of a minimum subset of instances of data declared as a
> > >> mandatory dependency (for example, at least 10 hourly instances of a
> > >> day with 24 instances)
> > >> 8. Valid combinations of the above.
> >>
> > >> In this context, I would like to propose that we move away from Oozie
> > >> for the orchestration requirements and have them implemented natively
> > >> within Falcon. It will no doubt make the Falcon server bulkier and
> > >> heavier in both code and deployment, but it seems that without it, the
> > >> orchestration within Falcon will be limited by the capabilities
> > >> available within Oozie.
> >>
> > >> Please do note that this suggestion is restricted to the scheduling
> > >> and not to the workflow execution.
> >>
> > >> I would like to hear from fellow developers and users on what your
> > >> thoughts are. Please do chime in with your views.
> >>
> >> Regards
> >> Srikanth Sundarrajan
> >
> >
> >
> >
> > --
> > Regards,
> > Venkatesh
> >
> > “Perfection (in design) is achieved not when there is nothing more to
> > add, but rather when there is nothing more to take away.”
> > - Antoine de Saint-Exupéry

