james-seymour-cubiko opened a new issue, #29416: URL: https://github.com/apache/airflow/issues/29416
### Description

Optionally allow a task pool to count tasks in the 'deferred' state as occupying slots in that pool. I'm not sure what the best way of implementing this is, but currently my very hacky solution is to patch the `airflow.models.pool.Pool.slots_stats` method to include deferred tasks as running in each pool.

### Use case/motivation

The prototypical use case here is using Airflow to limit the number of concurrent queries executing against a database while keeping the benefit of waiting for those queries to complete on a triggerer (where a proxy is used to execute queries instead of a direct connection to the db).

In our case, we use Airflow to orchestrate an Azure Data Factory (ADF) that executes queries against a database and moves the resulting data. We have an Airflow task trigger a single pipeline run in that data factory; the task then defers and waits in the triggerer for that pipeline run to complete (for efficiency) before the DAG run continues. However, we have ~100 tasks that all execute a pipeline run on the same factory. Ideally we would execute all of these pipelines concurrently, but the database is quickly overwhelmed by that many queries at the same time, resulting in timeouts. Therefore, the next best option is to limit the concurrency of those queries with a task pool in Airflow.

This _can_ currently be achieved with Airflow's task pools, but only if we keep each of those tasks in the running state while waiting for each query to complete (as deferred tasks do not occupy slots in the task pool). Otherwise, if we defer the tasks while waiting, we lose the concurrency limits of the pool, as all ~100 tasks are free to defer at the same time, so it's currently an either/or solution.
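To make the either/or concrete, here is a minimal toy model of the slot accounting described above. It is a sketch under stated assumptions, not Airflow's code: `ToyPool`, `open_slots`, and `include_deferred` are hypothetical names, and the rule that only running and queued tasks occupy slots by default is a simplification of the behaviour described in this issue.

```python
from collections import Counter
from dataclasses import dataclass

# Toy model only: hypothetical names, not Airflow's API.
# Assumption: by default only these states occupy a pool slot.
DEFAULT_OCCUPYING_STATES = frozenset({"running", "queued"})


@dataclass(frozen=True)
class ToyPool:
    name: str
    slots: int
    include_deferred: bool = False  # the opt-in behaviour proposed here

    def open_slots(self, task_states):
        """Return the number of free slots given an iterable of task states."""
        occupying = set(DEFAULT_OCCUPYING_STATES)
        if self.include_deferred:
            occupying.add("deferred")
        counts = Counter(task_states)
        used = sum(counts[state] for state in occupying)
        return max(self.slots - used, 0)


# ~100 tasks have each started a query and are now waiting on the triggerer.
states = ["deferred"] * 100

# Without the option the pool looks empty, so nothing limits the queries.
print(ToyPool("db_queries", slots=5).open_slots(states))
# With the option the deferred tasks keep holding their slots.
print(ToyPool("db_queries", slots=5, include_deferred=True).open_slots(states))
```

In this model the first pool still reports all 5 slots as open even though 100 queries are in flight, while the second reports none, which is the limit the pool was meant to enforce.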
I am aware that in this specific case ADF does support a maximum pipeline run concurrency setting, which is a much quicker way to solve this problem, but we have other extraction tools that we can't rely on to limit concurrency in this way, and I thought I would just throw this idea out here anyway in case others might find it helpful :)

### Related issues

Somewhat related - https://github.com/apache/airflow/issues/15082

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
