Considering the current framing of the problem/solution - as a means to address poorly written DAGs - I agree with Elad's perspective. Nevertheless, I believe that caching for the secrets backend has the potential to enhance Airflow performance even for DAGs adhering to best practices, particularly when applied to connections and configurations in addition to variables. IMO the proposed solution can be easily extended to include connections and configurations - @Vandon, Raphael, please feel free to correct me if I am mistaken. Many Airflow users aim to manage hundreds or even thousands of tasks concurrently. Dynamic Task Mapping greatly simplifies the process of creating scalable tasks, and if each task requires a connection (or a variable), a secrets backend cache could prove beneficial. This is just one example. I am eager to hear opinions on why we should not implement this - are there any drawbacks to this approach? Ultimately, this solution provides users the option to utilize caching if they prefer.
> I would also propose an alternative fix: let's come up with a way to tell > Airflow to not continuously reparse a file! I wholeheartedly agree with this. @Vandon, Raphael, this could be the next optimization you consider. Shubham On 2023-03-23, 4:34 PM, "Ash Berlin-Taylor" <[email protected] <mailto:[email protected]>> wrote: CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. I second Elad's view here. I would also propose an alternative fix: let's come up with a way to tell Airflow to not continuously reparse a file! A strawman example: ``` from airflow import ReparseMode, DAG AIRFLOW_REPARSE = ReparseMode.ON_FILE_CHANGED with DAG(...): .... ``` We could change things so that if that top level value is seen Airflow stops parsing that file unless a change is detected. We could also add a configuration option to set a default mode for all dag files that don't have an explicit mode set. Not only cos this put less load/cost on the secrets backend, it also puts less load on the Dag parser itself! Win win. -ash On 23 March 2023 22:56:11 GMT, Elad Kalif <[email protected] <mailto:[email protected]>> wrote: >To me Airflow is not "open and play". It is not a closed system that guides >you how to develop. Workflow as code requires more skills. >There are stuff to learn before authoring dags. >This discussion is about Variables but I might ask similar question about >users who query API as part of top level code. This is also very bad and >not covered by current discussion. > >In my prespective users who hit this issue are likely also to hit many >other issues. Its a good sign that they should try to learn Airflow better >before continue development. I don't think a solution involves in hinding >the issue is a good one. > >For now I'm -1 for making complecated changes to mitigate this. However I >am infavor of finding a way to alert users that we detected a possible bad >practice and we advise to recheck the code (that also requires some >thought.. I don't know if bumping many warnings in the main UI is the right >way...) > >בתאריך יום ו׳, 24 במרץ 2023, 00:25, מאת Vandon, Raphael ><[email protected] <mailto:[email protected]>lid>: > >> Hello, >> I’d like to submit to discussion the idea of having a cache on Airflow >> Variables. >> The aim is to reduce DAG parsing time and Secret Manager API bill for >> users who make a liberal use of Variable.get in their DAG files. >> The drawback, of course, is that caches introduce a delay in how fast >> changes are propagated. >> >> For improved readability, I kindly invite you to read and comment in the >> github discussion : https://github.com/apache/airflow/discussions/30265 >> <https://github.com/apache/airflow/discussions/30265> >> >> Cheers >> >>
