ketozhang opened a new issue, #54964:
URL: https://github.com/apache/airflow/issues/54964

   ### Apache Airflow Provider(s)
   
   cncf-kubernetes
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-cncf-kubernetes==10.5.0
   
   ### Apache Airflow version
   
   v2.11.0
   
   ### Operating System
   
   Amazon Linux 2
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   The K8s provider logs `FailedScheduling` pod events at ERROR level, even 
though K8s itself records them as Warning events. This confuses users: an 
ERROR-level `FailedScheduling` line implies the task failed because of it, when 
K8s will happily keep retrying to schedule the pod until the pod TTL.
   
   ```
   [2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling 
- 0/14 nodes are available: waiting for ephemeral volume controller to create 
the persistentvolumeclaim "test-part1-vyt3ovcx-bigstorage". preemption: 0/14 
nodes are available: 14 Preemption is not helpful for scheduling.
   [2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling 
- Failed to schedule pod, incompatible with nodepool "high-
   [2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling 
- 0/14 nodes are available: 5 node(s) had untolerated taint {rfoo/component: 
bar}, 9 node(s) didn't match Pod's node affinity/selector. preemption: 0/14 
nodes are available: 14 Preemption is not helpful for scheduling.
   [2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling 
- 0/14 nodes are available: waiting for ephemeral volume controller to create 
the persistentvolumeclaim "test-part1-vyt3ovcx-bigstorage". preemption: 0/14 
nodes are available: 14 Preemption is not helpful for scheduling.
   [2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling 
- Failed to schedule pod, incompatible with nodepool "high-availability", 
daemonset overhead={"cpu":"180m","memory":"120Mi","pods":"5"}, did not tolerate 
roman.ipac.caltech.edu/component=cm:NoSchedule; incompatible with nodepool 
"default", daemonset overhead={"cpu":"180m","memory":"120Mi","pods":"5"}, no 
instance type satisfied resources {"cpu":"8180m","memory":"65656Mi","pods":"6"} 
and requirements karpenter.k8s.aws/instance-category In [m], 
karpenter.k8s.aws/instance-generation In [6], karpenter.sh/capacity-type In 
[on-demand], karpenter.sh/nodepool In [default], kubernetes.io/arch In [amd64], 
kubernetes.io/os In [linux], topology.kubernetes.io/zone In [us-east-1a 
us-east-1b] (no instance type has enough resources); incompatible with nodepool 
"al2023", daemonset overhead={"cpu":"180m","memory":"120Mi","pods":"5"}, did 
not tolerate roman.ipac.caltech.edu/os=al2023:NoSchedule
   [2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling 
- 0/14 nodes are available: 5 node(s) had untolerated taint {rfoo/component: 
bar}, 9 node(s) didn't match Pod's node affinity/selector. preemption: 0/14 
nodes are available: 14 Preemption is not helpful for scheduling.
   ```
   
   ### What you think should happen instead
   
   `FailedScheduling` events should be logged at WARNING level.
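
A minimal sketch of the proposed behavior, not the provider's actual code: it assumes the log level can be chosen from the Kubernetes event's `type` field, which Kubernetes sets to `Normal` or `Warning`. The `PodEvent` class and the `level_for_event` and `log_pod_event` helpers are hypothetical names used only for illustration.

```python
import logging
from dataclasses import dataclass


@dataclass
class PodEvent:
    """Hypothetical stand-in for a Kubernetes event; only the fields used here."""

    type: str  # Kubernetes sets this to "Normal" or "Warning"
    reason: str  # e.g. "FailedScheduling"
    message: str


def level_for_event(event: PodEvent) -> int:
    """Map a Kubernetes event type to a Python logging level."""
    if event.type == "Normal":
        return logging.INFO
    if event.type == "Warning":
        return logging.WARNING
    # Unknown event types stay at ERROR so they remain visible.
    return logging.ERROR


def log_pod_event(log: logging.Logger, event: PodEvent) -> None:
    """Log one pod event at the level derived from its type."""
    log.log(level_for_event(event), "Pod Event: %s - %s", event.reason, event.message)
```

Under this assumption, a `FailedScheduling` event with type `Warning` would be logged at WARNING rather than ERROR, while the message format stays the same.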
   
   ### How to reproduce
   
   Set up a k8s cluster with a node that has a taint. Create a KPO task without 
a toleration for that taint and with `log_events_on_failure=True`:
   
   ```py
   from airflow import DAG
   from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
   
   with DAG(...) as dag:
       # No toleration is set, so the pod cannot be scheduled on the tainted node.
       k = KubernetesPodOperator(
           task_id="dry_run_demo",
           image="debian",
           cmds=["bash", "-cx"],
           arguments=["echo", "10"],
           log_events_on_failure=True,
       )
   ```
   
   ### Anything else
   
   This was partially addressed in 
https://github.com/apache/airflow/issues/36077, but that change did not cover 
`FailedScheduling` events.
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

