sjyangkevin commented on PR #51511:
URL: https://github.com/apache/airflow/pull/51511#issuecomment-3166423105

   > If we are concerned about performance issues, should we conduct some 
experiments to see how it affects performance? If it doesn't affect much, I 
think we're good to merge
   
   I created 100 DAGs (as physical DAG files), each with its access control 
configured with a non-existent role. Then I enabled the statsd integration 
when running Airflow in breeze and collected the following metrics.
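
   For anyone who wants to reproduce the setup, the DAG files can be generated 
with a small script. This is only a minimal sketch, not the exact script used 
for the numbers in this comment: the function name `generate_dag_files`, the 
`perf_test_dag_NNN` IDs, and the role name `role_that_does_not_exist` are all 
hypothetical, and the template assumes the Airflow 3 import path 
(`airflow.sdk`).

```python
from pathlib import Path
import tempfile

# Template for one generated DAG file. Assumption: Airflow 3 import path;
# on Airflow 2 the import would be `from airflow import DAG` instead.
DAG_TEMPLATE = """\
from datetime import datetime

from airflow.sdk import DAG

with DAG(
    dag_id="{dag_id}",
    schedule=None,
    start_date=datetime(2025, 1, 1),
    # The role below is deliberately one that does not exist.
    access_control={{"{role}": {{"can_read"}}}},
):
    pass
"""


def generate_dag_files(dags_folder, count=100, role="role_that_does_not_exist"):
    """Write `count` DAG files into `dags_folder` and return their paths."""
    folder = Path(dags_folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        dag_id = f"perf_test_dag_{i:03d}"
        path = folder / f"{dag_id}.py"
        path.write_text(DAG_TEMPLATE.format(dag_id=dag_id, role=role))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Writes to a throwaway directory; point this at the real DAGs folder instead.
    target = tempfile.mkdtemp(prefix="perf_dags_")
    created = generate_dag_files(target, count=100)
    print(f"wrote {len(created)} DAG files to {target}")
```

   Changing `count` to 1,000 would cover the larger experiment mentioned at the 
end of this comment.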
   
   I think the data isn't cleaned up between `breeze down` and 
`breeze start-airflow`, so I used a red line to distinguish the two periods.
   
   <img width="1284" height="468" alt="image" 
src="https://github.com/user-attachments/assets/c65a4167-4c8c-4796-b892-27ce97e9d4b3"
 />
   
   ### The time period before the red line
   The metrics for this period were collected when `_sync_dag_perms` was 
executed, regardless of whether a DAG was updated (after applying the change 
introduced in this PR). The number of times we scanned the filesystem and 
queued all existing DAGs increased by roughly 1,000. However, there was no 
significant change in either the “Number of DAG files to be considered for the 
next scan” (after the first scan) or the “Total Parse Time.”
   
   I'm not sure whether these metrics accurately capture the performance 
considerations. I can continue to learn more about it, but I hope this helps 
us understand the performance impact (or perhaps I should create more DAGs, 
like 1,000?).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
