Hi Malthe,

I’m all for simplifying the timetable interface. I think Timetables currently require a deep understanding of Airflow’s scheduling semantics, which is far too complex to expect of an Airflow user. On top of that, each Timetable must implement decisions around e.g. catchup, which I feel should be generic to all Timetables and not left up to the developer.
With that said, I’m not sure what you’re suggesting here. Is this email a conversation starter? Is it a proposal?

Note: I cannot see the attachment; I believe the mailing list doesn’t allow those.

Bas

> On 23 Feb 2022, at 13:44, Malthe <mbo...@gmail.com> wrote:
>
> Hi all,
>
> I was going to take a stab at adding some custom timetable
> functionality to address two requirements:
>
> 1. The ability to temporarily switch to an alternative timetable for
> an interim period.
> 2. The ability to introduce relatively custom holiday scheduling which
> is well outside the functionality of cron expressions.
>
> I could add that while (1) could be done using Python at the
> DAG-level, I would like to use the timetable interface to allow
> accurate predictions into the future. That's for another post, but to
> give some context, I have floated a proposal on Slack to show
> tentative scheduled days in the calendar view using a "grey dot"
> indication to denote that we expect at least one scheduled run (see
> attachment).
>
> Now, I looked at the "afterwork" example to see what's required in
> order to implement a custom timetable. And I must admit that I find it
> rather daunting given that it's so easy to express what that timetable
> is about:
>
>     MON-FRI, daily, run after midnight
>
> Intuitively, that should be a couple of lines of Python code. It ends
> up being quite a lot more than that and that's due to the interface
> that must be implemented.
>
> I think the correct timetable interface is:
>
> 1. Return the next execution time that's strictly (">") after a particular
> time.
> 2. Return the earliest runtime for a particular execution time,
> accounting for any grace period.
> 3. Return context metadata for this execution.
>
> The scheduler provides the most recent execution time as input and
> creates a dagrun if the returned earliest runtime for the next
> execution time is at or before the current time.
>
> Considering again the "afterwork" example, with a grace period of 5
> minutes, we'd expect a dagrun shortly after 5 minutes past midnight
> (of Monday, Tuesday, and so forth up until midnight after Friday). The
> execution time _is_ the time where a given task runs (minus the grace
> period).
>
> The reasoning behind (3) is that I consider the notion of "data
> interval" to be metadata since this is only a concern for the task
> implementation. For example, the scheduler does not need to worry
> about this at all.
>
> Other concerns:
>
> - Backfilling is out of scope for the timetable interface.
> - Time restrictions (i.e. start and end date) are likewise out of
> scope. The scheduler knows when the DAG starts and ends and doesn't
> need help from the timetable here.
> - Manual runs are trivial because there is no (2) or (3). In fact, for
> most DAGs (which care about a data interval), there should probably
> not be a play button at all.
>
> I didn't complete the exercise, but it stands to reason that the
> "afterwork" example would be short and simple given the interface
> outlined above.
>
> Thanks
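
To make the discussion concrete, the three-method interface proposed in the quoted message could be sketched roughly as below. This is only an illustration and not Airflow's actual Timetable API: the names SimpleTimetable, AfterWorkdayTimetable, next_execution_time, earliest_runtime, context and due_run are all hypothetical. It also assumes that "run after midnight" means the midnight that ends each workday (Tuesday 00:00 through Saturday 00:00), that all times are in one fixed-offset timezone such as UTC, and that the data interval is the preceding 24 hours.

```python
from abc import ABC, abstractmethod
from datetime import datetime, time, timedelta, timezone
from typing import Any, Dict, Optional


class SimpleTimetable(ABC):
    """Hypothetical interface from the proposal; not part of Airflow."""

    @abstractmethod
    def next_execution_time(self, after: datetime) -> datetime:
        """(1) Next execution time strictly after `after`."""

    @abstractmethod
    def earliest_runtime(self, execution_time: datetime) -> datetime:
        """(2) Earliest moment a run for `execution_time` may start."""

    @abstractmethod
    def context(self, execution_time: datetime) -> Dict[str, Any]:
        """(3) Metadata for the run, e.g. a data interval."""


class AfterWorkdayTimetable(SimpleTimetable):
    """Assumed reading of 'MON-FRI, daily, run after midnight'."""

    def __init__(self, grace: timedelta = timedelta(minutes=5)) -> None:
        self.grace = grace

    def next_execution_time(self, after: datetime) -> datetime:
        # The first midnight strictly after `after` is always 00:00 of the next day.
        candidate = datetime.combine(
            after.date() + timedelta(days=1), time.min, tzinfo=after.tzinfo
        )
        # Midnights that end a workday fall on Tuesday..Saturday (weekday 1..5).
        while candidate.weekday() not in range(1, 6):
            candidate += timedelta(days=1)
        return candidate

    def earliest_runtime(self, execution_time: datetime) -> datetime:
        return execution_time + self.grace

    def context(self, execution_time: datetime) -> Dict[str, Any]:
        # Assumption: the data interval is the workday that just ended.
        return {
            "data_interval_start": execution_time - timedelta(days=1),
            "data_interval_end": execution_time,
        }


def due_run(
    timetable: SimpleTimetable, last_execution: datetime, now: datetime
) -> Optional[datetime]:
    """Scheduler side: return the next execution time if it is due, else None."""
    nxt = timetable.next_execution_time(last_execution)
    return nxt if timetable.earliest_runtime(nxt) <= now else None
```

With this reading, a scheduler whose last execution was Saturday 00:00 would get nothing from due_run until Tuesday 00:05, at which point it would receive Tuesday 00:00 as the execution time. Backfilling, start/end dates and catchup are deliberately absent, matching the "out of scope" bullets in the quoted message.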