Hi all,

I was going to take a stab at adding some custom timetable
functionality to address two requirements:

1. The ability to temporarily switch to an alternative timetable for
an interim period.
2. The ability to introduce relatively custom holiday scheduling which
is well outside the functionality of cron expressions.

I could add that while (1) could be done using Python at the
DAG-level, I would like to use the timetable interface to allow
accurate predictions into the future. That's for another post, but to
give some context, I have floated a proposal on Slack to show
tentative scheduled days in the calendar view using a "grey dot"
to indicate that we expect at least one scheduled run (see
attachment).

Now, I looked at the "afterwork" example to see what's required in
order to implement a custom timetable. And I must admit that I find it
rather daunting given that it's so easy to express what that timetable
is about:

     MON-FRI, daily, run after midnight

Intuitively, that should be a couple of lines of Python code. It ends
up being quite a lot more than that and that's due to the interface
that must be implemented.

I think the correct timetable interface is:

1. Return the next execution time that's strictly (">") after a particular time.
2. Return the earliest runtime for a particular execution time,
accounting for any grace period.
3. Return context metadata for this execution.

The scheduler provides the most recent execution time as input and
creates a dagrun once the returned earliest runtime for the next
execution time is at or before the current time.
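
As a rough sketch of how small this interface could be, the three
methods and the scheduler check might look like the following in
Python. Every name here (`SimpleTimetable`, `next_execution_time`,
`earliest_runtime`, `context`, `next_dagrun`, `HourlyTimetable`) is
hypothetical and not part of Airflow's existing `Timetable` API. The
check is written on the assumption that a dagrun is created once the
earliest runtime has been reached.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


class SimpleTimetable(ABC):
    """Hypothetical interface; NOT Airflow's actual Timetable API."""

    @abstractmethod
    def next_execution_time(self, after: datetime) -> datetime:
        """(1) Next execution time strictly after `after`."""

    @abstractmethod
    def earliest_runtime(self, execution_time: datetime) -> datetime:
        """(2) Earliest time the run may start, including any grace period."""

    @abstractmethod
    def context(self, execution_time: datetime) -> Dict[str, Any]:
        """(3) Metadata for the run (for example a data interval)."""


class HourlyTimetable(SimpleTimetable):
    """Toy implementation: every full hour, with a 5 minute grace period."""

    grace = timedelta(minutes=5)

    def next_execution_time(self, after: datetime) -> datetime:
        top_of_hour = after.replace(minute=0, second=0, microsecond=0)
        return top_of_hour + timedelta(hours=1)

    def earliest_runtime(self, execution_time: datetime) -> datetime:
        return execution_time + self.grace

    def context(self, execution_time: datetime) -> Dict[str, Any]:
        return {"execution_time": execution_time}


def next_dagrun(
    timetable: SimpleTimetable, last_execution_time: datetime, now: datetime
) -> Optional[datetime]:
    """Scheduler side: return the execution time to create a dagrun for,
    or None if its earliest runtime has not been reached yet."""
    candidate = timetable.next_execution_time(last_execution_time)
    if timetable.earliest_runtime(candidate) <= now:
        return candidate
    return None
```

The scheduler only ever calls (1) and (2); (3) is passed through to
the tasks untouched.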

Considering again the "afterwork" example, with a grace period of 5
minutes, we'd expect a dagrun shortly after 5 minutes past midnight
(of Monday, Tuesday, and so forth up until midnight after Friday). The
execution time _is_ the time at which a given task runs (minus the
grace period).
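
To give an idea of the size I have in mind, here is an untested-in-
Airflow sketch of "afterwork" as three plain functions. The function
names are hypothetical, I am assuming that a run happens at the
midnight that closes each workday (Tuesday 00:00 for Monday, and so on
up to Saturday 00:00 for Friday), and I am ignoring time zones and DST
by working in UTC.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(minutes=5)  # grace period used in the example above


def next_execution_time(after: datetime) -> datetime:
    """(1) First midnight strictly after `after` that closes a Mon-Fri day."""
    candidate = after.replace(hour=0, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=1)
    # The day that just ended is `candidate - 1 day`; weekday() is 0 for
    # Monday and 4 for Friday, so skip midnights that close a Sat or Sun.
    while (candidate - timedelta(days=1)).weekday() > 4:
        candidate += timedelta(days=1)
    return candidate


def earliest_runtime(execution_time: datetime) -> datetime:
    """(2) The run may start once the grace period has passed."""
    return execution_time + GRACE


def context(execution_time: datetime) -> dict:
    """(3) Metadata only: the workday that this run covers."""
    return {
        "data_interval_start": execution_time - timedelta(days=1),
        "data_interval_end": execution_time,
    }
```

Asking for the next execution time after Friday noon gives Saturday
00:00, and asking again with that value skips the weekend and gives
Tuesday 00:00, which is the run covering Monday.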

The reasoning behind (3) is that I consider the notion of "data
interval" to be metadata, since it is only a concern for the task
implementation. The scheduler, for example, does not need to worry
about it at all.

Other concerns:

- Backfilling is out of scope for the timetable interface.
- Time restrictions (i.e. start and end date) are likewise out of
scope. The scheduler knows when the DAG starts and ends and doesn't
need help from the timetable here.
- Manual runs are trivial because there is no (2) or (3). In fact, for
most DAGs (which care about a data interval), there should probably
not be a play button at all.
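
On the second point (time restrictions), a sketch of how the scheduler
could apply the start and end date itself, on top of any function that
returns the next execution time strictly after a given time. The names
`next_in_bounds` and `hourly` are made up for illustration, and
treating the end date as inclusive is my assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def next_in_bounds(
    next_execution_time: Callable[[datetime], datetime],
    last: Optional[datetime],
    start: datetime,
    end: Optional[datetime] = None,
) -> Optional[datetime]:
    """Clip a timetable to [start, end] without the timetable knowing."""
    # Step back one microsecond so that `start` itself still qualifies
    # under the strict ">" semantics of the timetable function.
    floor = start - timedelta(microseconds=1)
    after = floor if last is None else max(last, floor)
    candidate = next_execution_time(after)
    if end is not None and candidate > end:
        return None  # the DAG has ended; nothing left to schedule
    return candidate


def hourly(after: datetime) -> datetime:
    """Toy timetable: the next full hour strictly after `after`."""
    return after.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
```

The timetable function stays unaware of the DAG's lifetime, which is
the point: the same `hourly` could be reused with any start and end.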

I didn't complete the exercise, but it stands to reason that the
"afterwork" example would be short and simple given the interface
outlined above.

Thanks
