I would not consider it a bug to have the latest data interval run when you enable a DAG that is set to catchup=False.
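To be concrete about which run that is: the "latest data interval" is the most recent interval that has fully ended at the moment the DAG is enabled, and its start becomes the run's logical/execution date. Here is a rough sketch in plain Python. The helper names are hypothetical, time zones are ignored, only a fixed-time daily schedule and @monthly are covered, and this is not Airflow's own timetable code:

```python
from datetime import datetime, timedelta


def latest_complete_daily_interval(now, hour=0, minute=0):
    """Hypothetical helper (not Airflow's implementation): the most recent daily
    interval, with boundaries at hour:minute, that has fully ended by `now`.
    Returns (start, end); `start` is the logical/execution date. Naive datetimes."""
    # Latest boundary at or before `now`; that boundary ends the interval.
    end = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if end > now:
        end -= timedelta(days=1)
    return end - timedelta(days=1), end


def latest_complete_monthly_interval(now):
    """Hypothetical helper for @monthly (midnight on the 1st): the most recent
    month-long interval that has fully ended by `now`."""
    end = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Step back one day into the previous month, then snap to its 1st.
    start = (end - timedelta(days=1)).replace(day=1)
    return start, end


if __name__ == "__main__":
    now = datetime(2022, 3, 4, 21, 0)
    print(latest_complete_daily_interval(now)[0])           # 2022-03-03 00:00:00 (@daily)
    print(latest_complete_daily_interval(now, hour=22)[0])  # 2022-03-02 22:00:00 ('00 22 * * *')
    print(latest_complete_monthly_interval(now)[0])         # 2022-02-01 00:00:00 (@monthly)
```

With the clock at March 4, 21:00, this yields the same logical dates Philippe reports further down the thread, which is why enabling a catchup=False DAG produces one immediate run.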
I have a legitimate use for that behavior: my production environment uses catchup_by_default=True, but my lower environments use catchup_by_default=False, so if I want to test a DAG's behavior *as scheduled* in a lower environment, I can just enable the DAG. For example, if I need to test the functionality of a DAG scheduled @monthly in a staging environment and there were no way to run the most recent data interval, it could be days or even weeks before a true data interval of the DAG occurs. Triggering a DAG won't run the latest data interval; it will use the current time as the logical_date, right? So that won't let me test a single *as scheduled* data interval, and in that @monthly scenario it would be impossible for me to test a single data interval unless I wait multiple weeks.

I can see there could be a desire to not run the latest data interval and just start with whatever full interval follows the DAG being turned on. However, I think that should be configurable, not fixed permanently.

Alternatively, it would be ideal to have a way to trigger a specific run for a catchup=False DAG that just got enabled, by adding a third option to the trigger button drop-down that triggers a past scheduled run. That dialog could default to the most recent full data interval but also let you specify a specific past interval based on the DAG's schedule. I have often had to debug a DAG in production and wanted to trigger a specific past data interval, not just the most recent one.

Alex Begg

On Thu, Mar 17, 2022 at 4:58 PM Larry Komenda <[email protected]> wrote:

> I agree with this. I'd much rather have to trigger a single manual run the
> first time I enable a DAG than to either wait to enable until after I want
> it to run or by editing the start_date of the DAG itself.
>
> I'd be in favor of adjusting this behavior either permanently or by a
> configuration.
>
> On Fri, Mar 4, 2022 at 3:00 PM Philippe Lanoe <[email protected]>
> wrote:
>
>> Hello Daniel,
>>
>> Thank you for your answer. In your example, as I experienced, the first
>> run would not be 2010-01-01 but 2022-03-03, 00:00:00 (it is currently March
>> 4 - 21:00 here), which is the execution date corresponding to the start of
>> the previous data interval, but the result is the same: an undesired dag
>> run. (For instance, in case of cron schedule '00 22 * * *', one dagrun
>> would be started immediately with execution date of 2022-03-02, 22:00:00)
>>
>> I also agree with you that it could be categorized as a bug and I would
>> also vote for a fix.
>>
>> Would be great to have the feedback of others on this.
>>
>> On Fri, Mar 4, 2022 at 6:17 PM Daniel Standish
>> <[email protected]> wrote:
>>
>>> You are saying, when you turn on for the first time a dag with
>>> e.g. @daily schedule, and catchup = False, if start date is 2010-01-01,
>>> then it would run first the 2010-01-01 run, then the current run (whatever
>>> yesterday is)? That sounds familiar.
>>>
>>> Yeah I don't like that behavior. I agree that, as you say, it's not
>>> the intuitive behavior. Seems it could reasonably be categorized as a
>>> bug. I'd prefer we just "fix" it rather than making it configurable. But
>>> some might have concerns re backcompat.
>>>
>>> What do others think?
