RNHTTR opened a new pull request, #27020: URL: https://github.com/apache/airflow/pull/27020
[BackfillJob](https://github.com/apache/airflow/blob/main/airflow/jobs/backfill_job.py#L836) sets `executor.job_id = "backfill"`, which results in a bug in the Celery executor's `stalled_task_timeout` feature. The query to select stalled tasks [is as follows](https://github.com/apache/airflow/blob/main/airflow/executors/celery_executor.py#L395-L398): ``` session.query(TaskInstance).filter( TaskInstance.filter_for_tis(keys), TaskInstance.state == State.QUEUED, TaskInstance.queued_by_job_id == self.job_id, ``` `TaskInstance.queued_by_job_id` [is an int](https://github.com/apache/airflow/blob/main/airflow/models/taskinstance.py#L452), but for backfill jobs, it is comparing against a string (i.e. `"backfill"`). This prevents stalled tasks from backfill jobs to remain stalled and not be retried. Setting `executor.job_id = self.id` (where `self.id` is `BaseJob`'s `id` field should resolve this issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
