ROVAN1220 commented on issue #34877: URL: https://github.com/apache/airflow/issues/34877#issuecomment-1759924519
It seems like you've identified a performance bottleneck in your Airflow setup when running on a large Kubernetes cluster with a high number of queued tasks. Your proposed solution is to make one batch call to get all the Airflow worker pods instead of an individual call for each queued task. That is a reasonable way to address the issue. Here are some steps you can take to optimize the situation:

1. **Batch API Calls:** Modify the `clear_not_launched_queued_tasks` function so it fetches all the Airflow worker pods from the Kubernetes API in one call. It can then check each queued task against that result in memory. This replaces one API request per queued task with a single request per check, which removes most of the overhead.
2. **Optimize Query Filters:** When querying for pods, filter on the server side so the API returns only the relevant worker pods instead of every pod in the namespace. A label selector is one example.
3. **Caching:** Consider caching the worker-pod information so you don't need to query the Kubernetes API every time the function runs. Use an expiration strategy that periodically refreshes it.
4. **Throttling:** If the Kubernetes API calls still cause performance issues, limit the frequency and number of calls to reduce the load on the API server. The `worker_pods_queued_check_interval` option in the `[kubernetes_executor]` section controls how often this particular check runs. In older Airflow releases that section is named `[kubernetes]`.
5. **Scaling Resources:** If the Kubernetes cluster keeps growing or is running into resource constraints, consider scaling it so it can handle the increased workload.
6. **Tune Airflow Scheduler Settings:** Review the scheduler configuration to optimize its performance. For example, you can adjust options such as `parsing_processes` (named `max_threads` before Airflow 2.0) to better match your cluster's capacity.
7. **Asynchronous Processing:** If possible, explore asynchronous processing for certain tasks that don't need immediate scheduling. This can help reduce the load on the scheduler.
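The batch-call suggestion can be sketched as follows. This is a minimal illustration, not Airflow's actual code. The `TaskKey` class, the `list_pod_labels` callable, the label names `dag_id`/`task_id`/`run_id`/`try_number`, and the selector `kubernetes_executor=True` are all assumptions made for the example. Airflow also sanitizes some label values, such as `run_id`, so a real comparison would need to apply the same transformation. In a real executor, the single listing call would typically be `CoreV1Api().list_namespaced_pod(namespace, label_selector=...)` from the official `kubernetes` Python client. Here it is injected as a plain function so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


# Hypothetical key for one queued task attempt; Airflow uses its own
# TaskInstanceKey, so this is only an illustrative stand-in.
@dataclass(frozen=True)
class TaskKey:
    dag_id: str
    task_id: str
    run_id: str
    try_number: int


def pod_key(labels: dict[str, str]) -> TaskKey:
    """Build a TaskKey from a pod's labels (label names are assumed)."""
    return TaskKey(
        dag_id=labels["dag_id"],
        task_id=labels["task_id"],
        run_id=labels["run_id"],
        try_number=int(labels["try_number"]),
    )


def find_tasks_without_pods(
    queued: Iterable[TaskKey],
    list_pod_labels: Callable[[str], list[dict[str, str]]],
    label_selector: str = "kubernetes_executor=True",
) -> list[TaskKey]:
    """Return the queued tasks that have no matching worker pod.

    Issues ONE listing call for all worker pods (assumed to carry all four
    labels above) instead of one call per queued task, then checks each
    task with an in-memory set lookup.
    """
    existing = {pod_key(labels) for labels in list_pod_labels(label_selector)}
    return [key for key in queued if key not in existing]
```

The per-task check is now a set lookup, so one pass costs a single API request no matter how many tasks are queued.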
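For the caching suggestion, a small time-to-live (TTL) cache around the pod listing might look like the sketch below. `TTLCache` is a hypothetical helper, not an Airflow or Kubernetes class. The injectable `clock` only exists to make the expiry behaviour easy to test. There is one caveat: a stale snapshot may not yet contain a pod that was just created, so that task could wrongly look as if it was never launched. The sketch assumes you keep the TTL short and call `invalidate()` whenever new pods are launched.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Cache the result of an expensive zero-argument fetch for `ttl` seconds."""

    def __init__(
        self,
        fetch: Callable[[], T],
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[T] = None
        # Start already expired so the first get() triggers a fetch.
        self._expires_at = float("-inf")

    def get(self) -> T:
        """Return the cached value, refetching it once the TTL has elapsed."""
        now = self._clock()
        if now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self) -> None:
        """Force the next get() to refetch, e.g. right after launching pods."""
        self._expires_at = float("-inf")
```

For example, `TTLCache(lambda: list_all_worker_pods(), ttl=30)` would share one listing across callers for up to 30 seconds. Here `list_all_worker_pods` is a hypothetical function.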
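For throttling, one common technique is a token bucket. Calls are allowed while tokens remain, and tokens refill at a fixed rate up to a burst capacity, which bounds both how often and how many calls are made. The `TokenBucket` class below is a generic, hypothetical sketch, not an Airflow setting or API. It could gate how often the pod listing is sent to the API server.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow calls at up to `rate` per second, with bursts of up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity >= 1")
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the call should be skipped."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

For example, `TokenBucket(rate=1 / 60, capacity=1)` would allow roughly one listing per minute. A caller would skip the API request whenever `try_acquire()` returns `False`.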