Considering the current framing of the problem/solution - as a means to address 
poorly written DAGs - I agree with Elad's perspective. Nevertheless, I believe 
that caching for the secrets backend has the potential to enhance Airflow 
performance even for DAGs adhering to best practices, particularly when applied 
to connections and configurations in addition to variables. IMO the proposed 
solution can be easily extended to include connections and configurations - 
@Vandon, Raphael, please feel free to correct me if I am mistaken. Many Airflow 
users aim to manage hundreds or even thousands of tasks concurrently. Dynamic 
Task Mapping greatly simplifies the process of creating scalable tasks, and if 
each task requires a connection (or a variable), a secrets backend cache could 
prove beneficial. This is just one example. I am eager to hear opinions on why 
we should not implement this - are there any drawbacks to this approach? 
Ultimately, this solution provides users the option to utilize caching if they 
prefer. 

> I would also propose an alternative fix: let's come up with a way to tell 
> Airflow to not continuously reparse a file!
I wholeheartedly agree with this. @Vandon, Raphael, this could be the next 
optimization you consider.  

Shubham

On 2023-03-23, 4:34 PM, "Ash Berlin-Taylor" <[email protected] 
<mailto:[email protected]>> wrote:


CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.






I second Elad's view here.


I would also propose an alternative fix: let's come up with a way to tell 
Airflow to not continuously reparse a file!


A strawman example:


```
from airflow import ReparseMode, DAG


AIRFLOW_REPARSE = ReparseMode.ON_FILE_CHANGED


with DAG(...):
....
```


We could change things so that if that top level value is seen Airflow stops 
parsing that file unless a change is detected.


We could also add a configuration option to set a default mode for all dag 
files that don't have an explicit mode set.


Not only cos this put less load/cost on the secrets backend, it also puts less 
load on the Dag parser itself! Win win.


-ash


On 23 March 2023 22:56:11 GMT, Elad Kalif <[email protected] 
<mailto:[email protected]>> wrote:
>To me Airflow is not "open and play". It is not a closed system that guides
>you how to develop. Workflow as code requires more skills.
>There are stuff to learn before authoring dags.
>This discussion is about Variables but I might ask similar question about
>users who query API as part of top level code. This is also very bad and
>not covered by current discussion.
>
>In my prespective users who hit this issue are likely also to hit many
>other issues. Its a good sign that they should try to learn Airflow better
>before continue development. I don't think a solution involves in hinding
>the issue is a good one.
>
>For now I'm -1 for making complecated changes to mitigate this. However I
>am infavor of finding a way to alert users that we detected a possible bad
>practice and we advise to recheck the code (that also requires some
>thought.. I don't know if bumping many warnings in the main UI is the right
>way...)
>
>בתאריך יום ו׳, 24 במרץ 2023, 00:25, מאת Vandon, Raphael
>‏<[email protected] <mailto:[email protected]>lid>:
>
>> Hello,
>> I’d like to submit to discussion the idea of having a cache on Airflow
>> Variables.
>> The aim is to reduce DAG parsing time and Secret Manager API bill for
>> users who make a liberal use of Variable.get in their DAG files.
>> The drawback, of course, is that caches introduce a delay in how fast
>> changes are propagated.
>>
>> For improved readability, I kindly invite you to read and comment in the
>> github discussion : https://github.com/apache/airflow/discussions/30265 
>> <https://github.com/apache/airflow/discussions/30265>
>>
>> Cheers
>>
>>



Reply via email to