+1

Regards
JB

On 12/31/2014 03:53 PM, Srikanth Sundarrajan wrote:
Can we pick up this thread in the new year when folks are back from break? I am
in total agreement with Venkatesh here. We ought to have a long-term
sustainable approach. Also, I feel that getting the capabilities we would like
to enable on Falcon done through Oozie in the near term seems to be a tall ask
anyway.

Regards
Srikanth Sundarrajan

Date: Tue, 23 Dec 2014 16:44:06 -0800
Subject: Re: [DISCUSS] Orchestration in Falcon
From: [email protected]
To: [email protected]

Chugging along with Oozie is bad for Falcon in the long run, for users and
developers. It's horribly complex to work through the many architectural
rough edges in Oozie. Look at all the security patches that I had to fix
around Oozie. It's unnecessarily complex and non-uniform, and it is NOT meant
to be used by another tool like Falcon; it was built around the end user.

This is a good discussion to have - maybe explore Oozie for the short term but
look at alternative solutions for the long term.

On Tue, Dec 23, 2014 at 7:28 AM, Srikanth Sundarrajan <[email protected]>
wrote:

@jb, There is no doubt merit in mapping them to Oozie if possible and if the
extensions are simple and straightforward enough.

I also had a quick chat offline with Shwetha, and she mentioned some work
happening in Oozie in this regard. On further digging, I found
https://issues.apache.org/jira/browse/OOZIE-1976. This is possibly what
Shwetha was referring to. From the looks of it, this tries to address item
#7 in the original thread. Maybe there are more JIRAs where additional
work, such as aperiodic datasets, is being tracked. Perhaps @Shwetha can
throw some light on what is being considered and/or how these
gating/orchestration use cases can be managed.

Regards
Srikanth Sundarrajan

Date: Tue, 23 Dec 2014 11:06:24 +0100
From: [email protected]
To: [email protected]
Subject: Re: [DISCUSS] Orchestration in Falcon

Hi all,

I second Shwetha there. I think we can achieve such features in Oozie
(with some adaptations).

Regards
JB

On 2014-12-23 10:53, Shwetha G S wrote:
If we can get rid of Oozie entirely, yes, we can explore other
possibilities. But if we are still going to use Oozie for DAG execution, we
are going to add another bottleneck to the whole execution (currently,
Falcon is not in the workflow execution path), and I don't think it's worth
it.

The features outlined above are all available in basic forms in Oozie, and
it should be easy to enhance them or make them extension points.



-Shwetha

On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan
<[email protected]>
wrote:

Here are a few more gaps that we ought to solve for while we are on the
subject:

1. Ability to attach to start & finish events of workflow execution.
Currently we have a post-processing hook to listen to finish events, but we
do run into scenarios where there are occasional failures with
post-processing, and there is a potential lag in learning about the events.
2. Strict enforcement of concurrency control, possibly spanning process
boundaries.
3. Ability to tune how backlogs are caught up (older instances given higher
priority, newer instances given higher priority, or some sort of weights to
allow both to make progress at varying rates). As an alternative, users have
asked for routing current vs. older instances to different queues.
4. Ability to have a notion of non-time-based feed instances and related
coordination.
5. Keeping track of and managing SLAs is also a challenge currently, but
with #1 addressed, this might be a lesser concern.
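
The backlog policies in item 3 can be illustrated with a minimal Python
sketch. It assumes instances are comparable values such as timestamps; the
function name, policy names, and weight parameters are hypothetical and are
not part of Falcon or Oozie.

```python
from collections import deque


def order_backlog(instances, policy="oldest_first", old_weight=1, new_weight=1):
    """Return backlog instances in the order they would be scheduled.

    Hypothetical helper: the policy names and weights are illustrative only.
    """
    ordered = sorted(instances)
    if policy == "oldest_first":
        return ordered
    if policy == "newest_first":
        return ordered[::-1]
    if policy == "weighted":
        if old_weight < 0 or new_weight < 0 or old_weight + new_weight == 0:
            raise ValueError("weights must be non-negative and not both zero")
        pending = deque(ordered)
        result = []
        # Alternate: take `old_weight` oldest, then `new_weight` newest,
        # so both ends of the backlog make progress at different rates.
        while pending:
            for _ in range(old_weight):
                if pending:
                    result.append(pending.popleft())
            for _ in range(new_weight):
                if pending:
                    result.append(pending.pop())
        return result
    raise ValueError("unknown policy: " + policy)


print(order_backlog([3, 1, 2, 5, 4], "weighted", old_weight=2, new_weight=1))
# prints [1, 2, 5, 3, 4]
```

Routing older and newer instances to different queues, as users have asked
for, would be a separate mechanism and is not covered by this sketch.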

Regards
Srikanth Sundarrajan

Subject: Re: [DISCUSS] Orchestration in Falcon
From: [email protected]
Date: Tue, 23 Dec 2014 06:30:30 +0530
To: [email protected]

@venkatesh, the question really is how we enable these gating
preconditions. It seems hard enough to add them to Oozie, but I am not
familiar enough with Oozie to comment on how hard or easy it is. Like I
responded to @ajay on the same thread, if we are to do away with
coordination through Oozie, we can follow up this discussion with approaches
and design. Though I had Quartz in mind, I wanted to leave that out of the
discussion to see if there is consensus for moving away from Oozie
coordinators and implementing them through other means.

Sent from my iPhone

On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh" <
[email protected]> wrote:

What is the purpose of this decoupling? Why build this into Falcon?
Scheduling is so common that schedulers are a dime a dozen today, and they
are all extensible with custom triggers. Making it part of Falcon will
suffer the same issues that Oozie has today.

I'm sorry, but I'm a HUGE -1 to this being built into the Falcon codebase.

However, I'm +1 to reusing the Quartz scheduler that already exists - stand
it up outside or embed it like we do for ActiveMQ.

Phase 2 - I'd like to see us write a simple DAG execution layer in YARN, as
an app master without a DB that keeps state on HDFS, as an alternative to
Oozie.

Then we will have a nimble Falcon which can kick ass.


On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <
[email protected]>
wrote:

Hello Team,

Since its inception, Falcon has used Oozie for process orchestration as
well as feed life cycle phase executions. While this has worked reasonably
well and allowed us to make higher-level capabilities available through
Falcon, we are increasingly seeing scenarios where this is proving to be a
limiting factor. In its current form, Falcon relies on Oozie for both
scheduling and workflow execution, due to which scheduling is limited to
time-based/cron-based scheduling with additional gating conditions on data
availability. This also imposes the restriction that datasets be
periodic/cyclic in nature.

From an orchestration standpoint, it would help if we could support
standard gating / scheduling primitives via Falcon:

1. Simple periodic scheduling with no gating conditions
2. Cron-based scheduling (day of week, day of the month, specific hours
and non-periodic) with no gating conditions
3. Availability of new data (assuming monotonically increasing data
version, availability of new versions)
4. Changes to existing data (reinstatement - similar to late data
handling)
5. External triggers/notifications
6. Availability of specific instances of data declared as a mandatory
dependency
7. Availability of a minimum subset of instances of data declared as a
mandatory dependency (for example, at least 10 hourly instances of a day
with 24 instances)
8. Valid combinations of the above.
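
As an illustration of item 7, here is a minimal Python sketch of a
"minimum subset" gate. It assumes hourly instances identified by their
timestamps; the function names are hypothetical and do not correspond to
any Falcon or Oozie API.

```python
from datetime import datetime, timedelta


def hourly_instances(day):
    """Return the 24 hourly instance times expected for one day."""
    return [day + timedelta(hours=h) for h in range(24)]


def gate_open(expected, available, minimum):
    """True once at least `minimum` of the expected instances are available."""
    present = sum(1 for instance in expected if instance in available)
    return present >= minimum


day = datetime(2014, 12, 21)
expected = hourly_instances(day)
available = set(expected[:10])  # pretend the first 10 hours have landed
print(gate_open(expected, available, minimum=10))  # prints True
```

Setting `minimum` to the full count of expected instances reduces this to
the all-mandatory case described in item 6.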

In this context, I would like to propose that we move away from Oozie for
the orchestration requirements and have them implemented natively within
Falcon. It will no doubt make the Falcon server bulkier and heavier in both
code and deployment, but without it, orchestration within Falcon will be
limited by the capabilities available within Oozie.

Please do note that this suggestion is restricted to scheduling and does
not extend to workflow execution.

I would like to hear from fellow developers and users on what your thoughts
are. Please do chime in with your views.

Regards
Srikanth Sundarrajan




--
Regards,
Venkatesh

“Perfection (in design) is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.”
- Antoine de Saint-Exupéry









--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
