What is the purpose of this decoupling? Why build this into Falcon? Scheduling is so common that schedulers are a dime a dozen today, and they are all extensible with custom triggers. Making it part of Falcon will suffer from the same issues that Oozie has today.
I'm sorry, but I'm a HUGE -1 to this being built into the Falcon codebase. However, I'm +1 to reusing the Quartz scheduler that already exists - stand it up outside or embed it like we do for ActiveMQ.

Phase 2 - I'd like to see us write a simple DAG execution layer in YARN as an app master, without a DB and keeping state on HDFS, as an alternative to Oozie. Then we will have a nimble Falcon which can kick ass.

On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <[email protected]> wrote:

> Hello Team,
>
> Since its inception Falcon has used Oozie for process orchestration as
> well as feed life cycle phase executions. While this has worked reasonably
> and allowed us to make higher level capabilities available through Falcon, we
> are increasingly seeing scenarios where this is proving to be a limiting
> factor. In its current form, Falcon relies on Oozie for both scheduling and
> for workflow execution, due to which the scheduling is limited to time
> based/cron based scheduling with additional gating conditions on data
> availability. Also this imposes restrictions on datasets being
> periodic/cyclic in nature.
>
> From an orchestration standpoint, it would help if we can support
> standard gating / scheduling primitives via Falcon:
>
> 1. Simple periodic scheduling with no gating conditions
> 2. Cron based scheduling (day of week, day of the month, specific hours
> and non-periodic) with no gating conditions
> 3. Availability of new data (assuming monotonically increasing data
> version, availability of new versions)
> 4. Changes to existing data (reinstatement - similar to late data handling)
> 5. External trigger/notifications
> 6. Availability of specific instances of data declared as mandatory
> dependency
> 7. Availability of a minimum subset of instances of data declared as
> mandatory dependency (at least 10 hourly instances of a day with 24
> instances, for example)
> 8. Valid combinations of the above.
>
> In this context, I would like to propose that we move away from Oozie for
> the orchestration requirements and have them implemented natively within
> Falcon. It will no doubt make Falcon server bulkier and heavier in both
> code and deployment, but it seems that without it, the orchestration within
> Falcon will be limited by capabilities available within Oozie.
>
> Please do note that this suggestion is restricted to the scheduling and
> not to the workflow execution.
>
> Would like to hear from fellow developers and users on what your thoughts
> are. Please do chime in with your views.
>
> Regards
> Srikanth Sundarrajan
>

--
Regards,
Venkatesh

“Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away.”
- Antoine de Saint-Exupéry
