Once again - why is it bad to set a start_date in the future when,
well, you **actually** want the first interval to run in the future?
What prevents you from setting start_date to a fixed time in the
future that falls within the interval you want to run first? Is it
just "I want to specify whatever past date is convenient and easy to
type"?
If this is the only reason, then it has a big drawback - because
"start_date" is **actually** supposed to be the piece of DAG metadata
that tells you when the DAG writer intended the DAG to start. And it
is precisely the one that allows you to start things in the future.
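
To make this concrete, here is a rough plain-Python sketch (this is
not Airflow's scheduler code; `first_interval` is a made-up helper).
It assumes a fixed daily 22:00 schedule like the '00 22 * * *' cron
mentioned further down the thread, catchup=False, and naive datetimes.
It only models which data interval would be scheduled first for a
given start_date:

```python
from datetime import datetime, timedelta
from typing import Tuple

DAY = timedelta(days=1)


def first_interval(start_date: datetime, now: datetime,
                   hour: int = 22) -> Tuple[datetime, datetime]:
    """Hypothetical helper: first data interval for a daily schedule
    firing at `hour`:00, modelling catchup=False (an assumption, not
    Airflow's real implementation)."""
    # First schedule tick at or after start_date.
    first_tick = start_date.replace(hour=hour, minute=0, second=0, microsecond=0)
    if first_tick < start_date:
        first_tick += DAY

    # Most recent schedule tick at or before "now".
    last_tick = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if last_tick > now:
        last_tick -= DAY

    # The latest interval that has already ended starts one period earlier.
    latest_complete_start = last_tick - DAY

    # Without catchup, skip ahead to the latest complete interval,
    # but never start before start_date.
    begin = max(first_tick, latest_complete_start)
    return begin, begin + DAY


now = datetime(2022, 3, 4, 21, 0)

# start_date far in the past: the latest complete interval runs right away.
past = first_interval(datetime(2010, 1, 1), now)

# start_date in the future: nothing is due until that interval has ended.
future = first_interval(datetime(2022, 4, 1), now)

print(past)
print(future)
```

With a past start_date the first interval is the one that has just
ended, so a run is due immediately. With a future start_date the first
interval begins at the first tick on or after start_date, and nothing
is due before that interval ends.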

Am I missing something?

On Sun, Mar 20, 2022 at 7:42 PM Larry Komenda
<[email protected]> wrote:
>
> Alex, that's a good point regarding the need to run a DAG for the most recent 
> schedule interval right away. I hadn't thought of that scenario as I haven't 
> needed to build a DAG with that large of a scheduling gap. In that case I 
> agree with you - it seems like it would make more sense to make this 
> configurable.
>
> Perhaps there could be an additional DAG-level parameter that could be set 
> alongside "catchup" to control this behavior. Or there could be a new 
> parameter that could eventually replace "catchup" that supported 3 options - 
> "catchup", "run most recent interval only", and "run next interval only".
>
> On Sat, Mar 19, 2022 at 1:02 PM Alex Begg <[email protected]> wrote:
>>
>> I would not consider it a bug to have the latest data interval run when you 
>> enable a DAG that is set to catchup=False.
>>
>> I have legitimate use for that feature by having my production environment 
>> have catchup_by_default=True but my lower environments are using 
>> catchup_by_default=False, meaning if I want to test the DAG behavior as 
>> scheduled in a lower environment I can just enable the DAG.
>>
>> For example, in a staging environment, if I need to test the 
>> functionality of a DAG scheduled for @monthly and there was no way to run 
>> the most recent data interval, then it could be many days, even weeks, 
>> before a true data interval of the DAG occurs.
>>
>> Triggering a DAG won't run the latest data interval, it will use the current 
>> time as the logical_date, right? So that won't let me test a single 
>> as-scheduled data interval. So in that @monthly scenario it will be 
>> impossible for me to test the functionality of a single data interval 
>> unless I wait multiple weeks.
>>
>> I see there could be a desire to not run the latest data interval and just 
>> start with whatever full interval follows the DAG being turned on. However I 
>> think that should be configurable, not fixed permanently.
>>
>> Alternatively it could be ideal to have a way to trigger a specific run for 
>> a catchup=False DAG that just got enabled by adding a 3rd option to the 
>> trigger button drop down to trigger a past scheduled run. Then in that 
>> dialog the form can default to the most recent full data interval but then 
>> let you also specify a specific past interval based on the DAG's schedule. I 
>> often had to debug a DAG in production and I wanted to trigger a specific 
>> past data interval, not just the most recent.
>>
>> Alex Begg
>>
>> On Thu, Mar 17, 2022 at 4:58 PM Larry Komenda 
>> <[email protected]> wrote:
>>>
>>> I agree with this. I'd much rather have to trigger a single manual run the 
>>> first time I enable a DAG than either wait to enable it until after I want 
>>> it to run or edit the start_date of the DAG itself.
>>>
>>> I'd be in favor of adjusting this behavior either permanently or by a 
>>> configuration.
>>>
>>> On Fri, Mar 4, 2022 at 3:00 PM Philippe Lanoe <[email protected]> 
>>> wrote:
>>>>
>>>> Hello Daniel,
>>>>
>>>> Thank you for your answer. In your example, as I experienced, the first 
>>>> run would not be 2010-01-01 but 2022-03-03, 00:00:00 (it is currently 
>>>> March 4 - 21:00 here), which is the execution date corresponding to the 
>>>> start of the previous data interval, but the result is the same: an 
>>>> undesired dag run. (For instance, in case of cron schedule '00 22 * * *', 
>>>> one dagrun would be started immediately with execution date of 2022-03-02, 
>>>> 22:00:00)
>>>>
>>>> I also agree with you that it could be categorized as a bug and I would 
>>>> also vote for a fix.
>>>>
>>>> Would be great to have the feedback of others on this.
>>>>
>>>> On Fri, Mar 4, 2022 at 6:17 PM Daniel Standish 
>>>> <[email protected]> wrote:
>>>>>
>>>>> You are saying, when you turn on for the first time a dag with e.g. 
>>>>> @daily schedule, and catchup = False, if start date is 2010-01-01, then 
>>>>> it would run first the 2010-01-01 run, then the current run (whatever 
>>>>> yesterday is)?  That sounds familiar.
>>>>>
>>>>> Yeah I don't like that behavior.  I agree that, as you say, it's not the 
>>>>> intuitive behavior.  Seems it could reasonably be categorized as a bug.  
>>>>> I'd prefer we just "fix" it rather than making it configurable.  But some 
>>>>> might have concerns re backcompat.
>>>>>
>>>>> What do others think?
>>>>>
>>>>>
