sjyangkevin commented on PR #51511: URL: https://github.com/apache/airflow/pull/51511#issuecomment-3166423105
> If we are concerned about performance issues, should we conduct some experiments to see how it affects performance? If it doesn't affect much, I think we're good to merge

I tried creating 100 DAGs (as physical DAG files), each with its access control configured with a non-existent role. I then enabled the StatsD integration while running Airflow in breeze and collected the following metrics. I think the data is not cleaned up between `breeze down` and `breeze start-airflow`, so I use a red line to distinguish the two periods.

<img width="1284" height="468" alt="image" src="https://github.com/user-attachments/assets/c65a4167-4c8c-4796-b892-27ce97e9d4b3" />

### The time period before the red line

The metrics for this period were collected when `_sync_dag_perms` was executed regardless of whether a DAG was updated (that is, after applying the change introduced in this PR). The number of times we scanned the filesystem and queued all existing DAGs increased by roughly 1,000. However, there was no significant change in either the "Number of DAG files to be considered for the next scan" (after the first scan) or the "Total Parse Time".

I'm not sure whether these metrics accurately describe the performance considerations. I can continue to learn more about this, but I hope it helps us understand the performance impact (or perhaps I should create more DAGs, say 1,000?).
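
The setup described above (100 physical DAG files whose access control names a role that does not exist) could be reproduced with something like the following sketch. The names `write_test_dags`, `perm_sync_test_NNNN`, and `NonExistentRole` are assumptions for illustration and are not taken from the PR. The `from airflow import DAG` import and the shape of `access_control` in the generated files may need adjusting for the Airflow version in use.

```python
from pathlib import Path
from string import Template

# Hypothetical role name: the only requirement is that no such role exists
# in the Airflow deployment under test.
MISSING_ROLE = "NonExistentRole"

# Text of each generated DAG file. The generator itself does not import
# Airflow; only the generated files do. The import path and the
# access_control format are assumptions and may differ between versions.
DAG_TEMPLATE = Template('''\
from datetime import datetime

from airflow import DAG

with DAG(
    dag_id="$dag_id",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    access_control={"$role": {"can_read"}},
):
    pass
''')


def write_test_dags(target_dir, count=100, role=MISSING_ROLE):
    """Write `count` DAG files into `target_dir` and return their paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        dag_id = f"perm_sync_test_{i:04d}"
        path = target / f"{dag_id}.py"
        path.write_text(DAG_TEMPLATE.substitute(dag_id=dag_id, role=role))
        paths.append(path)
    return paths
```

Calling `write_test_dags("files/dags")` (the directory is also an assumption) would produce the 100 files; passing `count=1000` would cover the larger experiment mentioned above.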
