I agree that SLAs as they work now are basically non-functional, but I strongly 
disagree with removing them and think a high-level concept such as SLA must 
belong in Airflow.

Ultimately what the vast majority of users want to do with Airflow is process 
data _and make it available_ in a timely fashion. My view on this is reinforced 
by one of the sections of "future work" I mentioned in AIP-48 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-48+Data+Dependency+Management+and+Data+Driven+Scheduling#AIP48DataDependencyManagementandDataDrivenScheduling-Newconceptinproviders:DatasetProvider

"""
Dataset based SLAs

The core Dataset concept is in place as part of this AIP, but there isn't yet 
any support for enhanced SLAs based on Datasets. This is a foundational AIP and 
the support for SLAs specifically around data timeliness will be the focus of a 
follow-on AIP. 
"""

I didn't do a great job there explaining why I think SLA is required (as I took 
it as a given at that point) so let me expand on it here.

Picture this scenario: your dag uses an input dataset that you expect to be 
updated once a day. Since your dag has no schedule anymore (just the new 
schedule_on attribute instead), there's no way for Airflow, or any monitoring 
system hooked up to it, to know how often your dataset should be updated. 
(Remember: a dag can depend on multiple datasets with different "schedules".) 
So we're left with the dag author having to say what the acceptable SLA is. 
And if we're doing that, we should handle it ourselves.
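
A rough sketch of what that could look like, in plain Python rather than any 
existing Airflow API. DatasetSla, stale_datasets and the example URIs are all 
made up for illustration; the only idea taken from the scenario above is that 
the dag author states a maximum acceptable age per dataset:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DatasetSla:
    """Hypothetical: maximum acceptable age of one dataset, set by the dag author."""

    uri: str
    max_age: timedelta


def stale_datasets(slas, last_updated, now):
    """Return the URIs whose last update is older than their declared SLA.

    A dataset with no recorded update at all is treated as stale.
    """
    missed = []
    for sla in slas:
        updated_at = last_updated.get(sla.uri)
        if updated_at is None or now - updated_at > sla.max_age:
            missed.append(sla.uri)
    return missed


# Two input datasets with different "schedules", as in the scenario above.
now = datetime(2022, 7, 12, 9, 0, tzinfo=timezone.utc)
slas = [
    DatasetSla("s3://bucket/orders", timedelta(days=1)),
    DatasetSla("s3://bucket/customers", timedelta(hours=6)),
]
last_updated = {
    "s3://bucket/orders": now - timedelta(hours=20),
    "s3://bucket/customers": now - timedelta(hours=7),
}

print(stale_datasets(slas, last_updated, now))  # ['s3://bucket/customers']
```

The point is only that the expected update interval has to come from the dag 
author, per dataset, because nothing else in the system knows it.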

In short: Airflow's job is to run data pipelines and make the resulting data 
available to the rest of the business, and to give the operational visibility 
to debug when things go wrong. SLA must be a key part of that toolset and 
shouldn't require running an extra system.

-a

On 12 July 2022 08:42:18 BST, Jarek Potiuk <ja...@potiuk.com> wrote:
>Hey everyone,
>
>I keep on being involved in discussions where people are complaining
>about how bad and useless the SLA feature of Airflow is. And yeah, I
>pretty much agree with it.
>
>Without getting into details of why it is bad - should we possibly
>just, well, deprecate it? I think that would give a much stronger
>signal to our users if they keep on getting warnings that the feature
>is deprecated and when we officially deprecate it in the docs that
>they should not rely on it.
>
>I also think that possibly we do not have to replace it with an
>equivalent/better SLA feature.
>
>I personally think Airflow on its own should not provide such
>SLA/monitoring features, but it should become more of the platform
>that provides useful metrics that will enable other - more dedicated
>systems - to do the job of monitoring and alerting - and the native
>Airflow UI should be more of a "management" than "monitoring".
>
>With (already approved) Open Telemetry support
>https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-49+OpenTelemetry+Support+for+Apache+Airflow
>integrating such monitoring solution should become much easier. Also
>with possible newer and more sophisticated metrics (such as those
>proposed by Ping in
>https://lists.apache.org/thread/g52vk2p7l4nf6on436mbdzwrqstld7jl )
>this opens up to more sophisticated usages, that Airflow will never be
>able to match with built-in SLA/monitoring features.
>
>Also even today there are better ways to achieve SLA functionality -
>Good and successful story about it has been told by Eden from Fyber at
>the Summit:  
>https://airflowsummit.org/sessions/2022/the-slayer-your-data-pipeline-needs/
>
>Making SLA deprecate would give a signal to the users that this is the
>long-term, recommended approach.
>
>J.
