I like the idea of supporting start_date=None, but that absolutely should not mean that we interpret start_date as “now”. start_date=now is one of the most common ways to shoot yourself in the foot when writing DAGs. I think interpreting start_date=None as “don’t do any sort of catchup and run the next time you’re able” makes some amount of sense, but I like Philippe’s idea a little more. Specifically, it seems like bool is simply not the correct type for catchup, as we can describe at least 3 behaviors that make sense. What if we change the default type to string, and keep supporting bool as a legacy option at least until 3.0?
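To make the legacy handling concrete, here is a rough sketch of how a bool could be mapped onto the string values listed just below. normalize_catchup and CATCHUP_MODES are hypothetical names I made up for illustration, not existing Airflow API, and the strict matching of strings is my own assumption.

```python
from typing import Union

# Hypothetical names for illustration only; not existing Airflow API.
CATCHUP_MODES = ("all", "none", "last")

# Legacy bool values and the string modes they would correspond to.
_LEGACY_BOOL_TO_MODE = {True: "all", False: "last"}


def normalize_catchup(value: Union[bool, str] = "all") -> str:
    """Return one of "all", "none" or "last" for a bool or string input."""
    # Check bool first so that legacy True/False keep working.
    if isinstance(value, bool):
        return _LEGACY_BOOL_TO_MODE[value]
    if isinstance(value, str) and value in CATCHUP_MODES:
        return value
    raise ValueError(
        f"catchup must be a bool or one of {CATCHUP_MODES}, got {value!r}"
    )
```

With this, existing DAGs that pass catchup=False would behave like catchup="last", and "none" becomes the only way to ask for no past intervals at all.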

Catchup="all" (or True): run all intervals. Make "all" the default.
Catchup="none": do not run any past interval.
Catchup="last" (or False): run only the most recent interval.

On Tue, Mar 22, 2022 at 1:15 PM Daniel Standish <[email protected]> wrote:

> There's some wiggliness here because of Airflow's behavior of actually
> *running* the dag at the end of the interval rather than the start. So
> if we have start_date=None, then we default the start date to *now,* then
> maybe to be consistent, the first run needs to be not 00:00 tomorrow but
> 00:00 the next day. The oddness is amplified when you consider a monthly
> dag, where if you deploy now, start date is now, first schedulable run is
> next month, therefore first run _more_ than a month away. To fix this I
> think we need to add support in our timetables for running at the start of
> the interval instead of the end -- and I think this is something that
> timetables were introduced to support anyway.
