+1. A few more relevant asks:

1. Support for a "Last Only" option for process scheduling (in addition to LIFO/FIFO); currently Oozie has some issues here.
2. Support for a singleton process (lock based), where the behaviour of all instances of the process is the same.

Thanks,
-Idris

On Thu, Jan 1, 2015 at 7:51 PM, Jean-Baptiste Onofré <[email protected]> wrote:

> +1
>
> Regards
> JB
>
> On 12/31/2014 03:53 PM, Srikanth Sundarrajan wrote:
>
>> Can we pick up this thread in the new year when folks are back from
>> break? I am in total agreement with Venkatesh here. We ought to have a
>> long-term sustainable approach. Also, I feel that getting the
>> capabilities we would like to enable in Falcon done through Oozie in
>> the near term seems to be a tall ask anyway.
>>
>> Regards
>> Srikanth Sundarrajan
>>
>>> Date: Tue, 23 Dec 2014 16:44:06 -0800
>>> Subject: Re: [DISCUSS] Orchestration in Falcon
>>> From: [email protected]
>>> To: [email protected]
>>>
>>> Chugging along with Oozie is bad for Falcon in the long run, for users
>>> and developers. It's horribly complex to work through the many rough
>>> edges architecturally in Oozie. Look at all the patches for security
>>> that I had to fix around Oozie. It's unnecessarily complex,
>>> non-uniform and is NOT meant to be used by another tool like Falcon
>>> but was built around the end user.
>>>
>>> This is a good discussion to have - maybe explore Oozie for the
>>> short term but look at alternative solutions for the long term.
>>>
>>> On Tue, Dec 23, 2014 at 7:28 AM, Srikanth Sundarrajan
>>> <[email protected]> wrote:
>>>
>>>> @jb, There is no doubt merit in mapping them to Oozie if possible and
>>>> if extensions are simple and straightforward enough.
>>>>
>>>> Also had a quick chat offline with Shwetha and she mentioned some
>>>> work happening in Oozie in this regard. On further digging, found
>>>> https://issues.apache.org/jira/browse/OOZIE-1976. This is possibly
>>>> what Shwetha was referring to. From the looks of it, this tries to
>>>> address item #7 in the original thread. Maybe there are more JIRAs
>>>> where additional work such as a-periodic datasets is being worked on.
>>>> Perhaps @Shwetha can throw some light on what is being considered
>>>> and/or how these gating/orchestration use cases can be managed.
>>>>
>>>> Regards
>>>> Srikanth Sundarrajan
>>>>
>>>>> Date: Tue, 23 Dec 2014 11:06:24 +0100
>>>>> From: [email protected]
>>>>> To: [email protected]
>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I second Shwetha there. I think we can achieve such features in
>>>>> Oozie (with some adaptations).
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 2014-12-23 10:53, Shwetha G S wrote:
>>>>>
>>>>>> If we can get rid of Oozie entirely, yes we can explore other
>>>>>> possibilities. But if we are still going to use Oozie for DAG
>>>>>> execution, we are going to add another bottleneck in the whole
>>>>>> execution (currently, Falcon is not in the workflow execution path)
>>>>>> and I don't think it's worth it.
>>>>>>
>>>>>> The features that are outlined above are all available in basic
>>>>>> forms in Oozie and it should be easy to enhance them/make them
>>>>>> extension points.
>>>>>>
>>>>>> -Shwetha
>>>>>>
>>>>>> On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Here are a few more gaps that we ought to solve for while we are
>>>>>>> on the subject:
>>>>>>>
>>>>>>> 1. Ability to attach to start & finish events of workflow
>>>>>>> execution. Currently we have a post-processing hook to listen to
>>>>>>> finish events, but we do run into scenarios where there are
>>>>>>> occasional failures with post-processing and there is potential
>>>>>>> phase lag in learning about the events.
>>>>>>> 2. Strict enforcement of concurrency control, possibly spanning
>>>>>>> process boundaries.
>>>>>>> 3. Ability to tune how backlogs have to be caught up (old
>>>>>>> instances to be given higher priority, newer instances to be given
>>>>>>> higher priority, or some sort of weights to allow both to make
>>>>>>> progress at varying rates). There have been asks from users for
>>>>>>> routing current vs older instances to different queues as an
>>>>>>> alternative.
>>>>>>> 4. Ability to have a notion of non-time-based feed instances and
>>>>>>> related coordination.
>>>>>>> 5. Currently keeping track of and managing SLAs is also a
>>>>>>> challenge, but with #1 addressed, this might be a lesser concern.
>>>>>>>
>>>>>>> Regards
>>>>>>> Srikanth Sundarrajan
>>>>>>>
>>>>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
>>>>>>>> From: [email protected]
>>>>>>>> Date: Tue, 23 Dec 2014 06:30:30 +0530
>>>>>>>> To: [email protected]
>>>>>>>>
>>>>>>>> @venkatesh, the question really is how do we enable these gating
>>>>>>>> preconditions. Seems hard enough to add them to Oozie, but I am
>>>>>>>> not intimately familiar enough with Oozie to comment on how hard
>>>>>>>> or easy it is. Like I responded to @ajay on the same thread, if
>>>>>>>> we are to do away with coordination through Oozie, we can follow
>>>>>>>> up this discussion with approaches and design. Though I had
>>>>>>>> Quartz in my mind, I wanted to leave that out of the discussion
>>>>>>>> to see if there is consensus for moving away from Oozie coords
>>>>>>>> and implementing them through other means.
>>>>>>>>
>>>>>>>> Sent from my iPhone
>>>>>>>>
>>>>>>>> On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh"
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> What is the purpose of this decoupling? Why build this into
>>>>>>>>> Falcon? Scheduling is so common that there are a dime a dozen
>>>>>>>>> schedulers today and they are all extensible with custom
>>>>>>>>> triggers. Making it part of Falcon will suffer the same issues
>>>>>>>>> that Oozie has today.
>>>>>>>>>
>>>>>>>>> I'm sorry but I'm a HUGE -1 to this being built into the Falcon
>>>>>>>>> codebase. However, I'm +1 to reusing the Quartz scheduler that
>>>>>>>>> already exists - stand it up outside or embed it like we do for
>>>>>>>>> ActiveMQ.
>>>>>>>>>
>>>>>>>>> Phase 2 - I'd like to see us write a simple DAG execution layer
>>>>>>>>> in YARN as an app master without a DB that keeps state on HDFS,
>>>>>>>>> as an alternative to Oozie.
>>>>>>>>>
>>>>>>>>> Then we will have a nimble Falcon which can kick ass.
>>>>>>>>>
>>>>>>>>> On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Team,
>>>>>>>>>>
>>>>>>>>>> Since its inception Falcon has used Oozie for process
>>>>>>>>>> orchestration as well as feed life cycle phase executions.
>>>>>>>>>> While this has worked reasonably well and allowed us to make
>>>>>>>>>> higher-level capabilities available through Falcon, we are
>>>>>>>>>> increasingly seeing scenarios where this is proving to be a
>>>>>>>>>> limiting factor. In its current form, Falcon relies on Oozie
>>>>>>>>>> for both scheduling and workflow execution, due to which the
>>>>>>>>>> scheduling is limited to time-based/cron-based scheduling with
>>>>>>>>>> additional gating conditions on data availability. This also
>>>>>>>>>> imposes restrictions on datasets being periodic/cyclic in
>>>>>>>>>> nature.
>>>>>>>>>>
>>>>>>>>>> From an orchestration standpoint, it would help if we can
>>>>>>>>>> support standard gating / scheduling primitives via Falcon:
>>>>>>>>>>
>>>>>>>>>> 1. Simple periodic scheduling with no gating conditions
>>>>>>>>>> 2. Cron-based scheduling (day of week, day of the month,
>>>>>>>>>> specific hours and non-periodic) with no gating conditions
>>>>>>>>>> 3. Availability of new data (assuming monotonically increasing
>>>>>>>>>> data version, availability of new versions)
>>>>>>>>>> 4. Changes to existing data (reinstatement - similar to late
>>>>>>>>>> data handling)
>>>>>>>>>> 5. External trigger/notifications
>>>>>>>>>> 6. Availability of specific instances of data declared as
>>>>>>>>>> mandatory dependency
>>>>>>>>>> 7. Availability of a minimum subset of instances of data
>>>>>>>>>> declared as mandatory dependency (for example, at least 10
>>>>>>>>>> hourly instances of a day with 24 instances)
>>>>>>>>>> 8. Valid combinations of the above.
>>>>>>>>>>
>>>>>>>>>> In this context, I would like to propose that we move away from
>>>>>>>>>> Oozie for the orchestration requirements and have them
>>>>>>>>>> implemented natively within Falcon. It will no doubt make the
>>>>>>>>>> Falcon server bulkier and heavier in both code and deployment,
>>>>>>>>>> but it seems that without it, the orchestration within Falcon
>>>>>>>>>> will be limited by the capabilities available within Oozie.
>>>>>>>>>>
>>>>>>>>>> Please do note that this suggestion is restricted to the
>>>>>>>>>> scheduling and not to the workflow execution.
>>>>>>>>>>
>>>>>>>>>> Would like to hear from fellow developers and users on what
>>>>>>>>>> your thoughts are. Please do chime in with your views.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Srikanth Sundarrajan
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Venkatesh
>>>>>>>>>
>>>>>>>>> “Perfection (in design) is achieved not when there is nothing
>>>>>>>>> more to add, but rather when there is nothing more to take
>>>>>>>>> away.”
>>>>>>>>> - Antoine de Saint-Exupéry
>>>
>>> --
>>> Regards,
>>> Venkatesh
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
