Re: Simplifying the timetable interface

Malthe Wed, 23 Feb 2022 07:18:16 -0800

> One comment: Please don't use the phrase "execution time" as it's not clear 
> which of the possible meanings it could be (is it the old execution_date? Is 
> it the time the dagrun actually starts?)


Agreed. I guess it's the dagrun's logical_date, its time identity.

> Backfilling is not out of scope for a timetable at all. If I run `airflow 
> dags backfill mydagid --start-date 2020-01-01 --end-date 2021-06-30` how many 
> DagRuns are created and what are logical dates/intervals of them?

If the timetable has a daily frequency, then one dagrun per day in
that interval.
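
As a rough sketch of that answer: the helper below is hypothetical (not
part of Airflow) and assumes a daily timetable where both ends of the
backfill range are included, which this thread does not settle.

```python
from datetime import date, timedelta


def daily_logical_dates(start: date, end: date) -> list[date]:
    """Return one logical date per day from start to end, both inclusive.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(days + 1)]


# The range from the backfill command quoted above.
runs = daily_logical_dates(date(2020, 1, 1), date(2021, 6, 30))
print(len(runs))  # 547 under the inclusive-range assumption
```

If the end date were treated as exclusive instead, the count would be
one lower.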

> And in case of the very first time a Dag is enabled? I guess it could pass 
> the dag start_date here instead?

Yes, I think the scheduler will pass in start_date, and out comes the
first logical_date, which is always strictly after start_date. In the
workday example, you would set start_date to Monday (i.e., the morning,
at the start of the day), and that would give Monday midnight (i.e., the
evening, at the end of the day) as the first logical date.
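
A minimal sketch of that "strictly after" rule for the workday example.
The function name is hypothetical, the datetimes are naive (no time
zone), and a workday is assumed to mean Monday to Friday; a real
timetable would have to handle time zones and holidays.

```python
from datetime import datetime, timedelta


def next_logical_date(after: datetime) -> datetime:
    """Return the first workday-ending midnight strictly after `after`.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    # The midnight that ends the calendar day containing `after`.
    # It is always strictly later than `after`.
    start_of_day = datetime.combine(after.date(), datetime.min.time())
    candidate = start_of_day + timedelta(days=1)
    # The day that ends at `candidate` is the day before it.
    # Skip candidates that would close a Saturday (5) or Sunday (6).
    while (candidate - timedelta(days=1)).weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate


monday_morning = datetime(2022, 2, 21)  # a Monday at 00:00
first = next_logical_date(monday_morning)
print(first)  # 2022-02-22 00:00:00, the midnight that ends Monday
```

Feeding each result back in as `after` gives the following logical
date, so the same function covers both the first run and every later
one.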

> (Implementation detail: It stores the info in 4 columns in the DagModel 
> table, next_dagrun, next_dagrun_interval_start and _end, and 
> next_dagrun_create_after so that the creation of the DagRun can just be done 
> as a DB lookup. Doesn't materially change your statement)

I suppose that's an optimization to avoid querying for the latest
dagrun over and over, although I would think some caching mechanism
ought to work just as well. But I don't know the specifics about why
it was decided to materialize those on the DagModel.

> If manually triggered dag runs are out of scope, what are the 
> data_interval_end and data_interval_start values (in the 
> context/templates) for a manually triggered run?

That would be undefined – as in, those variables would not be in the
context at all for a manually triggered run.
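
Under that proposal, code reading the context would have to tolerate
the missing keys. A small sketch, where a plain mapping stands in for
the task context and the function name is hypothetical:

```python
from typing import Any, Mapping, Optional


def describe_interval(context: Mapping[str, Any]) -> Optional[str]:
    """Describe the data interval, or return None if the context has none.

    Hypothetical helper for illustration only; `context` is a plain
    mapping standing in for the task context.
    """
    start = context.get("data_interval_start")
    end = context.get("data_interval_end")
    if start is None or end is None:
        # Manually triggered run under the proposal: no interval at all.
        return None
    return f"{start} to {end}"


scheduled = {"data_interval_start": "2022-02-21", "data_interval_end": "2022-02-22"}
manual = {}  # the interval keys are absent, not set to None
print(describe_interval(scheduled))
print(describe_interval(manual))
```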

> It's not possible from the UI currently, but `airflow dags trigger` can be 
> provided with a specific execution date -- Maybe this should be extended to 
> take the data interval too -- but in terms of User-friendly CLI inferring it 
> if not provided makes it easier to use. (A timetable could choose to return 
> an error for the infer method)

That ought to work fine because that is just manually specifying the
result of (1) – and steps (2) and (3) can still run as normal. It
might be nice to validate that the suggested logical_date is
compatible with the timetable but perhaps that is up to the timetable
to decide.
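
If the timetable did take on that validation, it could look roughly
like this for the workday example. The function name is hypothetical,
and the rule (midnight at the end of a Monday-to-Friday day, naive
datetimes) is an assumption made for illustration.

```python
from datetime import datetime, time, timedelta


def is_compatible(logical_date: datetime) -> bool:
    """Check whether a suggested logical_date fits the workday schedule.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    # Logical dates are assumed to fall exactly on midnight.
    at_midnight = logical_date.time() == time(0, 0)
    # The day that this midnight closes must be Monday (0) to Friday (4).
    ended_day = logical_date - timedelta(days=1)
    return at_midnight and ended_day.weekday() < 5


print(is_compatible(datetime(2022, 2, 22)))         # midnight ending a Monday
print(is_compatible(datetime(2022, 2, 22, 9, 30)))  # not at midnight
print(is_compatible(datetime(2022, 2, 27)))         # midnight ending a Saturday
```

A timetable that does not care could simply accept every value.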

> Next question is how to go about implementing and releasing this. Now that 
> it's been in a release we can't just break backcompat, so either we need to 
> make this a "Base" template that handles most of this logic, or we need to 
> introspect and tell old Timetable from new.

That's a tricky part :-)

Malthe
