xBis7 commented on PR #54103:
URL: https://github.com/apache/airflow/pull/54103#issuecomment-3162487166

   > But what you are doing is, limiting how many can be active, essentially, 
it seems.
   
   @dstandish It might look like it but it's not. The number of tasks to be 
queued is still the same as before but with these changes we are doing the 
calculations up front. Let me explain.
   
   Based on the existing code
   1. We run a query to fetch SCHEDULED tasks
   2. We iterate over the tasks from the query result
   3. During each iteration, we check concurrency limits (like the value of 
`max_active_tasks`) and queue the task when possible
   
   With this change, the query will consider the concurrency limits when 
bringing tasks for a dag from the db. As a result, all the tasks returned by 
the query will be valid for becoming active.
   
   Let me provide an example. Let's assume that 
   * we can examine 30 tasks per query
   * the order of SCHEDULED tasks based on priorities and earlier checks, is 80 
from `dag1`, 20 from `dag2` and 30 from `dag3`
   * each dag can have up to 10 active tasks.
   
   The current code will
   * pick 30 tasks from `dag1`
   * start iterating over them
   * queue the 10 first out of the 30 and then continue iterating
       * it will fail to queue any more because of the 10 tasks per dag limit.
   
   In the next query it will bring another 30 from `dag1` but it won't be able 
to queue any of them and then it will move on to check `dag2`.
   
   With the new code
   * the query will consider the 10 active tasks limit
   * instead of getting 30 tasks from `dag1` it will get 10 tasks
   * we have to reach the 30 limit per query
   * it will get another 10 from `dag2`
   * it will get another 10 from `dag3`
   
   The tasks for examining will still be 30 but they will all be valid for 
queueing and they will become active.
   
   The above example is a simplification because we are also considering how 
many tasks are currently active for a dag. 
    
   I would use `max_active_tasks` but it has been changed to point to the per 
`dag_run` limit and I was aiming for a per dag limit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to