Nataneljpwd opened a new pull request, #55797: URL: https://github.com/apache/airflow/pull/55797
This PR fixes an issue we noticed where our metrics reported negative open slots for our executors. As can be seen here, the metric is calculated as the parallelism minus the length of `self.running`. <img width="2258" height="246" alt="image" src="https://github.com/user-attachments/assets/5384bd42-faf4-4cbc-9b58-12fbb299e78c" />

In the K8S Executor, `self.running` is updated in only a few places where tasks are created: `self.adopt_launched_tasks` and `self._adopt_completed_tasks`. The first adds tasks to `running` when they are started, since they were just set to running. The latter adopts tasks that are already completed, so that the K8S watchers in `AirflowKubernetesScheduler` can delete their pods. Because completed tasks also end up in `running`, the K8S Executor can report a negative number of open slots, and we might also drop tasks that could have been set to running just because completed tasks occupied their slots (this is the check done by the `SchedulerJobRunner`; see the images below).

Here is where the `SchedulerJobRunner` uses the `open_slots` property of the executor. <img width="2434" height="520" alt="image" src="https://github.com/user-attachments/assets/40aff00b-b7dc-478b-8e59-6818373b5588" /> <img width="2312" height="880" alt="image" src="https://github.com/user-attachments/assets/15ca0d29-8aaa-4af5-91c4-7ef3e1759b8c" />

This PR resolves the issue by adopting completed tasks into a separate set, called `completed`, which fixes both the negative open-slots metric and the cases where we run fewer tasks than we actually can.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
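To make the mechanism concrete, here is a minimal toy sketch (hypothetical `ToyExecutor` class, not the actual Airflow code) of how adopting completed pods into the same `running` set used by the slot calculation drives `open_slots` negative, and how tracking them in a separate `completed` set avoids it:

```python
class ToyExecutor:
    """Toy model of the slot accounting; names and structure are illustrative only."""

    def __init__(self, parallelism: int):
        self.parallelism = parallelism
        self.running: set[str] = set()    # tasks counted against open slots
        self.completed: set[str] = set()  # fix: completed pods tracked separately

    @property
    def open_slots(self) -> int:
        # Mirrors the metric described above: parallelism minus len(running)
        return self.parallelism - len(self.running)

    def adopt_completed(self, keys: set[str], use_fix: bool) -> None:
        # Before the fix, completed pods were adopted into `running`,
        # so they consumed slots until the watcher cleaned them up.
        (self.completed if use_fix else self.running).update(keys)


# Executor already at capacity with two running tasks
before = ToyExecutor(parallelism=2)
before.running.update({"t1", "t2"})
before.adopt_completed({"c1", "c2"}, use_fix=False)
print(before.open_slots)  # -2: completed pods pushed the metric negative

after = ToyExecutor(parallelism=2)
after.running.update({"t1", "t2"})
after.adopt_completed({"c1", "c2"}, use_fix=True)
print(after.open_slots)  # 0: completed pods no longer consume slots
```

With the fix, the scheduler's `open_slots` check sees the true capacity, so runnable tasks are no longer starved by already-finished pods awaiting cleanup.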
