Hi Daniel,

Thank you for following up with the assessment. That’s an incredibly valuable data point.

I know we may have a chance to talk about this topic more at the summit this week, but to offer a reference for the other perspective, I would like to share this blog post, where another user describes the SLA feature with the following words:

‘In Airflow’s context, SLA can be seen as “for how long your DAG can run before you need to do something about it”’
https://poatek.com/2022/10/19/how-to-fix-airflow-sla/

This highlights the desire to use the SLA feature for delay detection: as a soft timeout that does not kill the task and simply executes the defined callback.

If what you say is true, and there are indeed folks hoping to use the SLA defined in Airflow to keep an accurate count of SLA misses because they don’t want to do that bookkeeping outside of it, then I think it will be important for us to discuss at length and decide which of these two motivations we are prioritizing when finalizing the design of the next ‘SLA Feature’.

Again, I feel that it is a much simpler endeavor if we drop the sense of urgency when we are designing for accuracy, and vice versa. Or maybe we are better off having two separate implementations: one that prioritizes urgency, and one that prioritizes accuracy in hindsight. We can also continue to discuss what the difficulty is in trying to achieve both within a single feature.

> On Sep 19, 2023, at 1:19 AM, Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
>
> I was able to chat with a couple folks about this. Small sample, but the
> sentiment was, "this is just a timeout". In other words, if we're going to
> call this SLA, we really ought to evaluate against the "this thing should
> have run by" time and not the actual start time. And, ideally, we should
> also have a way to enforce "this should have run by X time daily" (for
> example) even when it's a dataset-triggered or API-triggered dag with *no*
> schedule.
>
> Like I said, it's a small number of folks I've talked to, so I don't have
> overwhelming confidence about this assessment. But I do think it's more
> likely than not that this would be the prevailing assessment were somehow
> able to get better data on this.
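
For reference, the two readings discussed in this thread can be contrasted with a small sketch. This is only an illustration under assumptions: the function names `missed_as_timeout` and `missed_as_deadline` and the example times are hypothetical and are not part of Airflow's API or of any proposed design.

```python
from datetime import datetime, timedelta


def missed_as_timeout(actual_start: datetime, sla: timedelta, now: datetime) -> bool:
    """Soft-timeout reading (hypothetical helper): the clock starts when the
    run actually starts, so a late start pushes the deadline back too."""
    return now > actual_start + sla


def missed_as_deadline(should_have_run_by: datetime, now: datetime) -> bool:
    """'Should have run by' reading (hypothetical helper): the deadline is a
    fixed wall-clock time, independent of when the run actually started."""
    return now > should_have_run_by


# Example: a dataset-triggered run that was expected to be done by 06:00
# but only started at 07:30, checked at 08:00 with a one-hour SLA.
should_have_run_by = datetime(2023, 9, 19, 6, 0)
actual_start = datetime(2023, 9, 19, 7, 30)
now = datetime(2023, 9, 19, 8, 0)
sla = timedelta(hours=1)

timeout_miss = missed_as_timeout(actual_start, sla, now)
deadline_miss = missed_as_deadline(should_have_run_by, now)
print(timeout_miss, deadline_miss)
```

In this example the soft-timeout reading reports no miss, because the run has only been going for 30 minutes, while the "should have run by" reading reports a miss, because 06:00 has already passed. That gap is the disagreement between the two motivations.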