florian-meyrueis-al opened a new issue, #60527:
URL: https://github.com/apache/airflow/issues/60527
### Apache Airflow Provider(s)
cncf-kubernetes
### Versions of Apache Airflow Providers
10.5.0 (presumably all later versions are affected as well)
### Apache Airflow version
2.11.0. Airflow 3.x has not been tested, but since this is not an Airflow core issue, it is most likely affected too.
### Operating System
Debian
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
_No response_
### What happened
In the scheduler logs, we see a series of errors like the following:
```
2026-01-11 19:11:33.092 | [2026-01-11T19:11:33.091+0000] {kubernetes_executor_utils.py:98} ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 91, in run
    self.resource_version = self._run(
                            ^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 171, in _run
    self.process_status(
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 249, in process_status
    container_status_state["waiting"]["reason"]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
KeyError: 'reason'
2026-01-11 19:11:33.093 | Process KubernetesJobWatcher-3:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 91, in run
    self.resource_version = self._run(
                            ^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 171, in _run
    self.process_status(
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 249, in process_status
    container_status_state["waiting"]["reason"]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
KeyError: 'reason'
```
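A minimal way to see this failure mode outside Airflow, assuming (as the `container_status_state["waiting"]["reason"]` line in the traceback suggests) that the watcher reads the container state as a plain dict from the raw watch event: a `waiting` state in which Kubernetes omitted the optional `reason` field makes direct indexing raise `KeyError`. The sample dict and the `read_reason_unsafely` helper below are hypothetical, not captured from our cluster or copied from the provider.

```python
# Hypothetical container state: "waiting" is present, but the optional
# "reason" and "message" fields of ContainerStateWaiting were omitted.
container_status_state = {"waiting": {}}


def read_reason_unsafely(state: dict) -> str:
    # Mirrors the indexing pattern from the traceback: raises KeyError
    # whenever "reason" is absent from the waiting state.
    return state["waiting"]["reason"]


try:
    read_reason_unsafely(container_status_state)
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints "KeyError: 'reason'"
```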
At that time, most of our DAGs have started, and they run through the whole night.
In the morning, our monitoring of available slots showed the following:
<img width="803" height="260" alt="Image"
src="https://github.com/user-attachments/assets/541877af-f0bc-4166-a63e-60e0a39bc72a"
/>
In this chart, the blue line shows open execution slots and the green line shows running execution slots.
The problem is that at that time (08:00 and later), no DAGs were running in Airflow anymore, yet the slots had not been freed. Our only workaround was to restart the scheduler to make all our slots available again.
### What you think should happen instead
The `KubernetesJobWatcher` should not crash because of a missing key in the Kubernetes API response, and all occupied slots should be released properly when the DAGs finish.
### How to reproduce
I don't know.
### Anything else
The Kubernetes provider code should correctly handle optional keys in responses from the Kubernetes API.
In this case, the Kubernetes API does not mark the `reason` and `message` fields as required in the specification of the `ContainerStateWaiting` object.
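A sketch of the kind of defensive access this would need, assuming the container statuses are plain dicts taken from the raw watch event. The `fatal_waiting_reason` helper and the list of fatal reasons passed to it are hypothetical, not the provider's actual code or configuration; the point is only that every optional field is read with `.get()`, so a waiting state without a `reason` is skipped instead of crashing the watcher.

```python
from typing import Iterable, Optional


def fatal_waiting_reason(
    container_statuses: Iterable[dict],
    fatal_reasons: Iterable[str],
) -> Optional[str]:
    """Return the first waiting reason considered fatal, or None.

    Hypothetical helper: "state", "waiting" and "reason" are all read with
    .get(), so a missing key means "no information" rather than a KeyError.
    """
    fatal = set(fatal_reasons)
    for status in container_statuses:
        # "state" may be absent or null; fall back to an empty dict.
        waiting = (status.get("state") or {}).get("waiting")
        if not waiting:
            continue
        reason = waiting.get("reason")  # optional in ContainerStateWaiting
        if reason is not None and reason in fatal:
            return reason
    return None


# A container waiting with no reason is skipped; one with a fatal reason is reported.
statuses = [
    {"state": {"waiting": {}}},
    {"state": {"waiting": {"reason": "ErrImagePull"}}},
]
print(fatal_waiting_reason(statuses, ["ErrImagePull", "InvalidImageName"]))  # ErrImagePull
```

Treating a missing `reason` as "not fatal" keeps the pod in its normal pending handling; whether that is the right policy for the provider is a design decision for the maintainers.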
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)