Thanks for bringing this up. I agree SLAs have been broken forever, and I try to stay away from them.
However, I do see people trying to use them (not knowing they're broken). While I appreciate the effort others have made to build a system around Airflow for correct SLA alerting, I think Airflow is the right place to check for SLAs. Airflow starts and manages other processes after all, which to me makes it the natural place for SLA decisions to be made.

I have some implementation ideas but will save those for later; I'm curious to hear thoughts first.

Bas

> On 12 Jul 2022, at 09:42, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> Hey everyone,
>
> I keep on being involved in discussions where people are complaining
> about how bad and useless the SLA feature of Airflow is. And yeah, I
> pretty much agree with it.
>
> Without getting into details of why it is bad - should we possibly
> just, well, deprecate it? I think that would give a much stronger
> signal to our users if they keep on getting warnings that the feature
> is deprecated and when we officially deprecate it in the docs that
> they should not rely on it.
>
> I also think that possibly we do not have to replace it with an
> equivalent/better SLA feature.
>
> I personally think Airflow on its own should not provide such
> SLA/monitoring features, but it should become more of the platform
> that provides useful metrics that will enable other - more dedicated
> systems - to do the job of monitoring and alerting - and the native
> Airflow UI should be more of a "management" than "monitoring".
>
> With (already approved) Open Telemetry support
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-49+OpenTelemetry+Support+for+Apache+Airflow
> integrating such monitoring solution should become much easier. Also
> with possible newer and more sophisticated metrics (such as those
> proposed by Ping in
> https://lists.apache.org/thread/g52vk2p7l4nf6on436mbdzwrqstld7jl )
> this opens up to more sophisticated usages, that Airflow will never be
> able to match with built-in SLA/monitoring features.
>
> Also even today there are better ways to achieve SLA functionality -
> a good and successful story about it has been told by Eden from Fyber at
> the Summit:
> https://airflowsummit.org/sessions/2022/the-slayer-your-data-pipeline-needs/
>
> Making SLA deprecated would give a signal to the users that this is the
> long-term, recommended approach.
>
> J.