Hi Malthe,

I’m all for simplifying the timetable interface. Currently, I think Timetables 
require a deep understanding of Airflow’s scheduling semantics, which is far too 
complex for the typical Airflow user. On top of that, decisions such as catchup 
must be implemented in each Timetable, whereas I feel they should be generic to 
all Timetables and not left up to the developer.

With that said, I’m not sure what you’re suggesting here. Is this email a 
conversation starter, or is it a proposal?

Note: I cannot see the attachment; I believe the mailing list doesn’t allow 
attachments.

Bas


> On 23 Feb 2022, at 13:44, Malthe <mbo...@gmail.com> wrote:
> 
> Hi all,
> 
> I was going to take a stab at adding some custom timetable
> functionality to address two requirements:
> 
> 1. The ability to temporarily switch to an alternative timetable for
> an interim period.
> 2. The ability to introduce relatively custom holiday scheduling which
> is well outside the functionality of cron expressions.
> 
> I should add that while (1) could be done using Python at the
> DAG level, I would like to use the timetable interface to allow
> accurate predictions into the future. That's for another post, but to
> give some context, I have floated a proposal on Slack to show
> tentative scheduled days in the calendar view using a "grey dot"
> indicator to denote that we expect at least one scheduled run (see
> attachment).
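A minimal sketch of how a timetable that can answer "what is the next execution time strictly after t?" would feed the proposed calendar preview. `next_after` is a hypothetical stand-in (weekdays at 00:00 UTC, an assumption for illustration) and `predicted_days` is a hypothetical helper that walks it forward to collect the days that would get a "grey dot".

```python
from datetime import date, datetime, timedelta, timezone
from typing import Callable, Set


def next_after(t: datetime) -> datetime:
    """Hypothetical timetable: 00:00 UTC on Monday-Friday, strictly after t."""
    # t is assumed to be an aware datetime in UTC.
    # Start from the first midnight strictly after t.
    candidate = datetime(t.year, t.month, t.day, tzinfo=timezone.utc) + timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


def predicted_days(
    next_fn: Callable[[datetime], datetime], start: datetime, end: datetime
) -> Set[date]:
    """Days in (start, end] on which at least one run is expected."""
    days: Set[date] = set()
    t = next_fn(start)
    while t <= end:
        days.add(t.date())
        t = next_fn(t)
    return days
```

Because the walk only ever calls `next_after`, any timetable exposing that one method could drive the preview, whatever rules it encodes internally.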
> 
> Now, I looked at the "afterwork" example to see what's required to
> implement a custom timetable, and I must admit that I find it rather
> daunting, given how easy it is to express what that timetable is
> about:
> 
>     MON-FRI, daily, run after midnight
> 
> Intuitively, that should be a couple of lines of Python code. It ends
> up being a lot more than that, due to the interface that must be
> implemented.
> 
> I think the correct timetable interface is:
> 
> 1. Return the next execution time strictly after (">") a given time.
> 2. Return the earliest runtime for a particular execution time,
> accounting for any grace period.
> 3. Return context metadata for this execution.
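The three operations above could be written down as a small abstract base class. This is a sketch only: `SimpleTimetable`, its method names, and the `EveryHour` toy implementation are hypothetical and are not Airflow's `Timetable` API.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta
from typing import Any, Dict


class SimpleTimetable(ABC):
    """Hypothetical three-method timetable interface (illustrative names)."""

    @abstractmethod
    def next_execution_after(self, t: datetime) -> datetime:
        """(1) The next execution time strictly after t."""

    @abstractmethod
    def earliest_runtime(self, execution_time: datetime) -> datetime:
        """(2) The earliest time a run may start, including any grace period."""

    def context(self, execution_time: datetime) -> Dict[str, Any]:
        """(3) Extra metadata for the run; empty unless a subclass adds some."""
        return {}


class EveryHour(SimpleTimetable):
    """Toy implementation: on the hour, no grace period."""

    def next_execution_after(self, t: datetime) -> datetime:
        # Truncate to the hour, then step forward, so the result is always > t.
        return t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)

    def earliest_runtime(self, execution_time: datetime) -> datetime:
        return execution_time
```

Here `context` defaults to an empty dict rather than being abstract, on the assumption that many timetables have no extra metadata to report.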
> 
> The scheduler provides the most recent execution time as input and
> creates a dagrun once the current time is at or after the earliest
> runtime returned for the next execution time.
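A minimal sketch of the scheduler step described above, assuming a run is due once the current time has reached the earliest runtime of the next execution time. `next_due_run`, `hourly`, and `with_five_minute_grace` are hypothetical names used for illustration.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def next_due_run(
    last_execution: datetime,
    now: datetime,
    next_after: Callable[[datetime], datetime],
    earliest_runtime: Callable[[datetime], datetime],
) -> Optional[datetime]:
    """Return the execution time to create a dagrun for, or None if nothing is due."""
    candidate = next_after(last_execution)
    # The run is due once the clock has reached the candidate's earliest runtime.
    if now >= earliest_runtime(candidate):
        return candidate
    return None


# Example timetable callables (hypothetical): hourly, with a 5-minute grace period.
def hourly(t: datetime) -> datetime:
    return t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def with_five_minute_grace(execution_time: datetime) -> datetime:
    return execution_time + timedelta(minutes=5)
```

With `last_execution` at 13:00, a `now` of 14:03 yields no run, while a `now` of 14:05 yields the 14:00 execution.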
> 
> Considering again the "afterwork" example with a grace period of 5
> minutes, we'd expect a dagrun shortly after 5 minutes past midnight
> (of Monday, Tuesday, and so forth up until midnight after Friday). The
> execution time _is_ the time at which a given task runs (minus the
> grace period).
> 
> The reasoning behind (3) is that I consider the notion of a "data
> interval" to be metadata, since it is only a concern for the task
> implementation. The scheduler, for example, does not need to worry
> about it at all.
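To illustrate treating the data interval as metadata, here is a sketch in which the interval is computed only for the task's benefit. `interval_context` and `select_rows` are hypothetical helpers, and the half-open interval `[execution_time - period, execution_time)` is an assumption, not something the proposal specifies.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List


def interval_context(
    execution_time: datetime, period: timedelta = timedelta(days=1)
) -> Dict[str, Any]:
    """(3) Metadata only: the period that ends at the execution time."""
    return {"data_interval": (execution_time - period, execution_time)}


def select_rows(rows: List[Dict[str, Any]], context: Dict[str, Any]) -> List[Dict[str, Any]]:
    """A task that filters its input by the data interval; the scheduler never calls this."""
    start, end = context["data_interval"]
    # Half-open [start, end) so adjacent runs never process the same row twice.
    return [row for row in rows if start <= row["ts"] < end]
```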
> 
> Other concerns:
> 
> - Backfilling is out of scope for the timetable interface.
> - Time restrictions (i.e. start and end dates) are likewise out of
> scope. The scheduler knows when the DAG starts and ends and doesn't
> need help from the timetable here.
> - Manual runs are trivial because neither (2) nor (3) applies. In fact,
> for most DAGs (those that care about a data interval), there should
> probably not be a play button at all.
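A sketch of the scheduler applying start and end dates on its own, so the timetable only ever answers "next execution strictly after t". `next_within_bounds` and `hourly` are hypothetical, and stepping back one microsecond so that an execution exactly at `start_date` is included is an assumed convention.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def next_within_bounds(
    last_execution: Optional[datetime],
    next_after: Callable[[datetime], datetime],
    start_date: datetime,
    end_date: Optional[datetime] = None,
) -> Optional[datetime]:
    """Scheduler-side bounds check; the timetable knows nothing about start/end dates."""
    if last_execution is None or last_execution < start_date:
        # Look from just before start_date so an execution exactly at start_date counts.
        anchor = start_date - timedelta(microseconds=1)
    else:
        anchor = last_execution
    candidate = next_after(anchor)
    if end_date is not None and candidate > end_date:
        return None  # The DAG has ended; nothing more to schedule.
    return candidate


# Example timetable callable (hypothetical): on the hour.
def hourly(t: datetime) -> datetime:
    return t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
```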
> 
> I didn't complete the exercise, but it stands to reason that with the
> interface outlined above, the "afterwork" example would be short and
> simple.
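For comparison, the "afterwork" example under the proposed interface might look roughly like this. It is a sketch under stated assumptions: the `AfterWorkday` class is hypothetical and does not subclass Airflow's `Timetable`; each execution time is taken to be the UTC midnight that ends a workday (Tuesday through Saturday at 00:00), following the Airflow "afterwork" example; and the 5-minute grace period comes from the discussion above.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict


class AfterWorkday:
    """Hypothetical: one execution at each UTC midnight that ends Mon-Fri."""

    def __init__(self, grace: timedelta = timedelta(minutes=5)) -> None:
        self.grace = grace

    def next_execution_after(self, t: datetime) -> datetime:
        # t is assumed to be an aware datetime in UTC; take the next midnight strictly after it.
        candidate = datetime(t.year, t.month, t.day, tzinfo=timezone.utc) + timedelta(days=1)
        # Skip midnights that end a Saturday or Sunday (Sunday 00:00 and Monday 00:00).
        while (candidate - timedelta(days=1)).weekday() >= 5:
            candidate += timedelta(days=1)
        return candidate

    def earliest_runtime(self, execution_time: datetime) -> datetime:
        return execution_time + self.grace

    def context(self, execution_time: datetime) -> Dict[str, Any]:
        # The workday that just ended is the data interval.
        return {"data_interval": (execution_time - timedelta(days=1), execution_time)}
```

For a Friday, the execution time is Saturday 00:00, the earliest runtime is 00:05, and the data interval covers Friday itself; the next execution after that is the following Tuesday at 00:00.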
> 
> Thanks
