dirrao opened a new issue, #35599:
URL: https://github.com/apache/airflow/issues/35599
### Apache Airflow version

main (development)

### What happened

The `_list_pods` function uses the Kubernetes client functions `list_namespaced_pod` and `list_pod_for_all_namespaces`. Right now, these functions fetch the entire pod spec, even though we are only interested in the pod metadata. `_list_pods` is called from `clear_not_launched_queued_tasks`, `try_adopt_task_instances`, and `_adopt_completed_pods`.

When we run Airflow at large scale (more than 500 worker pods), `_list_pods` takes a significant amount of time (up to 15-30 seconds with 500 worker pods) because of unnecessary data transfer (a `V1PodList` response can reach a few tens of MB) and JSON deserialization overhead. This is blocking us from scaling Airflow further.

### What you think should happen instead

Request only the pod metadata instead of the entire pod payload. This would significantly reduce network data transfer and JSON deserialization overhead.

### How to reproduce

I reproduced the performance issue while running 500 concurrent jobs. Monitor the `kubernetes_executor.clear_not_launched_queued_tasks.duration` and `kubernetes_executor.adopt_task_instances.duration` metrics.

### Operating System

CentOS 6

### Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes

### Deployment

Other Docker-based deployment

### Deployment details

Terraform-based Airflow deployment

### Anything else

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
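To make the "request only the pod metadata" proposal concrete, here is a minimal sketch at the raw REST level, assuming direct HTTP access to the Kubernetes API server. The API server can return a `PartialObjectMetadataList` (items carrying only `metadata`) instead of a full `PodList` when the client sends the `Accept` header shown below. The helper names `build_pod_metadata_request` and `parse_pod_metadata_list`, the server URL, and the label selector are hypothetical illustrations. This sketch does not show how Airflow's `_list_pods` or the `kubernetes` Python client would actually be changed.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional, Tuple

# Accept header asking the API server for metadata-only list items
# (PartialObjectMetadataList) rather than full Pod objects.
METADATA_LIST_ACCEPT = "application/json;as=PartialObjectMetadataList;g=meta.k8s.io;v=v1"


def build_pod_metadata_request(
    api_server: str,
    namespace: Optional[str] = None,
    label_selector: Optional[str] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists pod metadata only.

    namespace=None targets /api/v1/pods (all namespaces, like
    list_pod_for_all_namespaces); otherwise it targets the namespaced
    endpoint (like list_namespaced_pod). Hypothetical helper.
    """
    if namespace is None:
        path = "/api/v1/pods"
    else:
        path = "/api/v1/namespaces/" + urllib.parse.quote(namespace, safe="") + "/pods"
    url = api_server.rstrip("/") + path
    if label_selector:
        # labelSelector is the standard Kubernetes list query parameter.
        url += "?" + urllib.parse.urlencode({"labelSelector": label_selector})
    headers = {"Accept": METADATA_LIST_ACCEPT}
    if token:
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(url, headers=headers, method="GET")


def parse_pod_metadata_list(body: bytes) -> List[Tuple[str, str, Dict[str, str]]]:
    """Extract (namespace, name, labels) from a PartialObjectMetadataList body."""
    doc = json.loads(body)
    if doc.get("kind") != "PartialObjectMetadataList":
        raise ValueError("unexpected kind: %r" % doc.get("kind"))
    pods = []
    for item in doc.get("items", []):
        meta = item.get("metadata", {})
        pods.append((meta.get("namespace"), meta.get("name"), meta.get("labels", {})))
    return pods
```

Because each returned item contains only `metadata`, the per-pod `spec` and `status` are never transferred or deserialized. Callers that need fields outside `metadata` (for example the pod phase) would still need the full object.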
