Multiple crons also don't solve all use cases. I came across a question on Stack Overflow about scheduling a daily DAG that skips holidays (Christmas, Labor Day, Independence Day, etc.).
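
A rough sketch of how the holiday case could look with a class-based schedule. `BaseTimetable` and `get_next_execution_time` follow the interface James Timmins describes further down in this thread; `DailyExceptHolidays`, `us_labor_day` and `is_holiday` are hypothetical names made up for illustration. None of this is an existing Airflow API, the holiday list is illustrative only, and timezones are ignored.

```python
from abc import ABC, abstractmethod
from datetime import date, datetime, time, timedelta


class BaseTimetable(ABC):
    """Hypothetical interface from this thread; not an existing Airflow API."""

    @abstractmethod
    def get_next_execution_time(self, after: datetime) -> datetime:
        """Return the first scheduled time strictly after `after`."""


def us_labor_day(year: int) -> date:
    """First Monday of September (US Labor Day)."""
    first = date(year, 9, 1)
    # weekday(): Monday == 0, so this is the number of days to the first Monday.
    return first + timedelta(days=(7 - first.weekday()) % 7)


def is_holiday(day: date) -> bool:
    """Illustrative holiday set: Christmas, Independence Day, Labor Day."""
    fixed_dates = {(12, 25), (7, 4)}
    return (day.month, day.day) in fixed_dates or day == us_labor_day(day.year)


class DailyExceptHolidays(BaseTimetable):
    """Run once a day at `run_at`, skipping any day that is a holiday."""

    def __init__(self, run_at: time = time(0, 0)):
        self.run_at = run_at

    def get_next_execution_time(self, after: datetime) -> datetime:
        candidate = datetime.combine(after.date(), self.run_at)
        if candidate <= after:
            candidate += timedelta(days=1)
        # Move forward one day at a time until we land on a non-holiday.
        while is_holiday(candidate.date()):
            candidate += timedelta(days=1)
        return candidate
```

Calling `get_next_execution_time` repeatedly, feeding each result back in, gives the kind of "dry run" listing mentioned in the thread. A rule like "first Monday of September" is awkward to express as a list of cron strings, which is the argument for doing it in code.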

On Sun, Jan 24, 2021 at 9:04 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> Yep. I agree with Daniel - adding multiple crons is very difficult to
> reason about. you can create arbitrary complex declarative way of defining
> complex schedule that you will have hard time understanding. We are
> already entering the realm of programming the schedule, which IMHO is
> better to do in a "programming" language rather than cron declarations.
>
> J.
>
> On Sun, Jan 24, 2021 at 7:48 AM Daniel Imberman <daniel.imber...@gmail.com>
> wrote:
>
>> I worry that multiple crons would become difficult to read for stranger
>> use-cases (for example "run on the first trading day after the 15th of the
>> month"). If we create a python function or class we can easily create a
>> "CronTimeTable" that does exactly what Dmitry is suggesting while still
>> leaving open the possibility of creating other custom schedules.
>>
>> On Sat, Jan 23, 2021, 2:32 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>
>>> I think whatever approach we decide on we should display
>>> *next_execution_date* in the webserver for each DAG. This would help
>>> most of the users.
>>>
>>> Regards,
>>> Kaxil
>>>
>>> On Sat, Jan 23, 2021 at 10:25 PM Dmitri Khokhlov <dkhokh...@gmail.com>
>>> wrote:
>>>
>>>> Root problem:
>>>> - existing Airflow schedule syntax defines only one interval pattern
>>>> per DAG
>>>> - there are use-cases that need multiple interval patterns per DAG
>>>> (during a day etc)
>>>>
>>>> I vote for "crontab list" solution from Deng Xiaodong. Example:
>>>>
>>>> schedule_interval = ["* 0,22,23 * * *", "30 1-21 * * *"]
>>>>
>>>> Reasoning:
>>>> - it is additive change - does not remove or break existing usage
>>>> patterns (very important)
>>>> - it is generic and it has compact definition - easy to
>>>> read/print/present in UI (a string). that is why it is better than
>>>> "function" approach.
>>>> - it is complete solution as it allows to define interval based
>>>> schedules of any complexity.
>>>> - it is relatively easy to implement by OR-ing crontabs times and
>>>> choosing next earliest run time and following these instructions from Ash
>>>> Berlin-Taylor <a...@apache.org>:
>>>> "
>>>> The way the scheduler works now it just looks at two columns on the dag
>>>> (model) table called I think "next_dagrun_after" (which is the earliest
>>>> date that the dag run can be created), and "next execution date" (which is
>>>> the value to put in the execution date of the dag run when it's created).
>>>>
>>>> Both these values are set by the dag parser process, which has full
>>>> access to run code. Whatever interface for defining new schedule
>>>> expression should run in the existing process, much like how James C did in
>>>> a subclass.
>>>> "
>>>> --
>>>> Dmitri
>>>>
>>>>
>>>> On 2021/01/21 19:12:06, Daniel Imberman <daniel.imber...@gmail.com>
>>>> wrote:
>>>> > My only concern with tying this to the dag_parsing process is that
>>>> > that process might miss SLAs because it takes too long to loop around.
>>>> > I could imagine a separate thread or component that can read either
>>>> > TimeTable objects or SmartSensor objects and run them might make sense.
>>>> > Ultimately I don’t see anything about SmartSensors that specifically
>>>> > need to run in a DAG. It could just as easily be a while loop or
>>>> > something embarrassingly parallel (as sensors/timetables shouldn’t
>>>> > depend on each other).
>>>> >
>>>> > On Thu, Jan 21, 2021 at 11:07 AM, Vikram Koka <vik...@astronomer.io>
>>>> > wrote:
>>>> > Great discussion.
>>>> > I generally agree with the "Custom scheduling class" / subclass
>>>> > approach which would run as part of the "scheduler" set of processes,
>>>> > rather than an internal DAG approach.
>>>> > I do think it would be good to have boundaries on what information
>>>> > this class would operate on and at what frequency. This is primarily
>>>> > from a performance standpoint, though it could be argued that there are
>>>> > security concerns with that as well.
>>>> > Specifically from the "what information would this have access to"
>>>> > perspective, I think that interface would be helpful in clarifying some
>>>> > of the use cases and making sure that those are covered. One example I
>>>> > was thinking about in the "sunset" example is location. I was
>>>> > originally thinking of a timezone, but this is more specific than that.
>>>> >
>>>> >
>>>> > On Thu, Jan 21, 2021 at 10:35 AM Ash Berlin-Taylor <a...@apache.org>
>>>> > wrote:
>>>> > It shouldn't need something that complex (or to my mind hacky) as an
>>>> > internal DAG.
>>>> >
>>>> > The way the scheduler works now it just looks at two columns on the
>>>> > dag (model) table called I think "next_dagrun_after" (which is the
>>>> > earliest date that the dag run can be created), and "next execution
>>>> > date" (which is the value to put in the execution date of the dag run
>>>> > when it's created).
>>>> >
>>>> > Both these values are set by the dag parser process, which has full
>>>> > access to run code. Whatever interface for defining new schedule
>>>> > expression should run in the existing process, much like how James C
>>>> > did in a subclass.
>>>> >
>>>> > Ash
>>>> >
>>>> > On 21 January 2021 18:21:58 GMT, Daniel Imberman
>>>> > <daniel.imber...@gmail.com> wrote:
>>>> > I think James's idea sounds like a pretty good idea. What would you all
>>>> > think of us doing something similar to how we handle smart sensors for
>>>> > how we implement this? Have an internal DAG that reads all custom
>>>> > timetables and triggers a DAG if the function returns True? Seems like
>>>> > a pretty simple/customizable solution.
>>>> > On Wed, Jan 20, 2021 at 5:52 PM, James Timmins <ja...@astronomer.io>
>>>> > wrote:
>>>> > Django provides a really good model for allowing users to customize
>>>> > the behavior of Class Based Views. It's in line w/ what Daniel/Kaxil
>>>> > and co are saying about a consistent backend class.
>>>> > It uses a standard base class as well as a default concrete
>>>> > implementation. Customization then only requires setting an explicit
>>>> > class if you're overriding the default.
>>>> > Seems that the interface is more important than the backend mechanism
>>>> > to make this work. There are multiple ways to make this work
>>>> > internally, but the interface should be in line with future plans for
>>>> > hooks/extensible areas.
>>>> > Just to make things concrete, here's my understanding of what that
>>>> > would look like / what they're suggesting.
>>>> > BaseTimetable abstract class
>>>> > - Defines a `get_next_execution_time` method. This method accepts one
>>>> >   argument, an arbitrary datetime value. Based on that datetime, this
>>>> >   method returns the next time the DAG should start. This makes it easy
>>>> >   to schedule past events, and also makes it easy to print out a "dry
>>>> >   run" of execution times for testing purposes.
>>>> > - Defines a `_check_timetable_arguments` method that looks for any
>>>> >   existing timetable args in the DAG and makes sure they're used by
>>>> >   whatever Timetable class is selected. Error checking.
>>>> > CronTimetable
>>>> > - Default TimetableClass. Built on BaseTimetable.
>>>> > If they want a different timetable, they can just extend
>>>> > BaseTimetable and define a custom `get_next_execution_time` method.
>>>> > Then pass the class into the DAG constructor under the
>>>> > `timetable_class` argument. So for `sunset` or `sunrise`, they could
>>>> > easily create a `SolarTimetable` class and pass that in.
>>>> > `get_next_execution_time` can then be called whenever DAGs are parsed
>>>> > or whenever tasks run.
>>>> > On Wed, Jan 20, 2021 at 3:53 PM James Coder <jcode...@gmail.com>
>>>> > wrote:
>>>> > Kaxil you beat me to it. I actually have a dag where I achieve an
>>>> > irregular schedule by overriding DAG.next_dagrun_info().
>>>> > If that method were swapped out for an object it may be a
>>>> > semi-easy way to make the schedule “pluggable”.
>>>> >
>>>> > James Coder
>>>> > On Jan 20, 2021, at 6:37 PM, Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>> >
>>>> > "CronBackend" / "ScheduleIntervalBackend" :D similar to Xcom and
>>>> > Secrets Backend
>>>> > Would be definitely good to have Custom Schedule intervals using
>>>> > functions/class that is Serializable too.
>>>> >
>>>> > On Wed, Jan 20, 2021 at 11:02 PM QP Hou <q...@scribd.com.invalid>
>>>> > wrote:
>>>> > On Wed, Jan 20, 2021 at 10:22 AM Daniel Imberman
>>>> > <daniel.imber...@gmail.com> wrote:
>>>> > >
>>>> > > I love the idea of allowing users to create their own scheduling
>>>> > > objects/scheduling python functions. They could either live in the
>>>> > > scheduler or as a separate process that trips some value in the DB
>>>> > > when it is “true”. Would be great from a “marketplace” standpoint as
>>>> > > well as users could post their custom scheduling objects for others
>>>> > > to use.
>>>> > >
>>>> >
>>>> > I like this idea as well, a quick escape hatch for custom and complex
>>>> > scheduling behaviors without having to wait for upstream support.
>>>>
>>>
>
> --
> +48 660 796 129
>