mhaure-touze opened a new issue, #55368:
URL: https://github.com/apache/airflow/issues/55368
### Apache Airflow Provider(s)
cncf-kubernetes, amazon
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon==9.12.0
apache-airflow-providers-cncf-kubernetes==10.7.0
### Apache Airflow version
2.10.3
### Operating System
amazon linux
### Deployment
Amazon (AWS) MWAA
### Deployment details
An EksPodOperator that launches a pod on an EKS cluster (v1.32).
### What happened
1. The operator launches the pod.
2. The task is paused as "DEFERRED" and handed to the triggerer.
3. The triggerer sends a "running" event.
4. The task resumes through the `trigger_reentry` method.
5. Somehow the task then waits for pod completion instead of deferring again.
6. The task stays alive until a heartbeat timeout kills it.
```
ip-172-29-129-41.eu-west-1.compute.internal
*** Reading remote log from Cloudwatch log_group:
airflow-data-eng-mwaa-env-Task log_stream:
dag_id=waititng/run_id=manual__2025-09-05T09_23_19.615897+00_00/task_id=waititng/attempt=10.log
2025-09-05T16:22:04.599378194Z
2025-09-05T16:22:04.653984267Z
2025-09-05T16:22:04.654174133Z
...
2025-09-05T23:50:51.065801031Z
2025-09-05T23:50:51.078800975Z
2025-09-05T23:50:51.125942120Z
[Invalid date] {local_task_job_runner.py:123} ▶ Pre task execution logs
[Invalid date] {base.py:84} INFO - Retrieving connection 'aws_eks_role'
[Invalid date] {baseoperator.py:416} WARNING - EksPodOperator.execute cannot
be called outside TaskInstance!
[Invalid date] {pod.py:1280} INFO - Building pod
waititng-9c5b6348-8893-405e-b769-5f0ffe3ee776-n48xxaw6 with labels: {'dag_id':
'waititng', 'task_id': 'waititng', 'run_id':
'manual__2025-09-05T092319.6158970000-3503ec696', 'kubernetes_pod_operator':
'True', 'try_number': '10'}
[Invalid date] {pod.py:572} INFO - Found matching pod
waititng-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5 with labels
{'airflow_kpo_in_cluster': 'False', 'airflow_version': '2.10.3', 'component':
'singleuser-server', 'dag_id': 'waititng', 'kubernetes_pod_operator': 'True',
'run_id': 'manual__2025-09-05T092319.6158970000-3503ec696', 'task_id':
'waititng', 'try_number': '7'}
[Invalid date] {pod.py:573} INFO - `try_number` of task_instance: 10
[Invalid date] {pod.py:574} INFO - `try_number` of pod: 7
[Invalid date] {pod.py:584} INFO - Reusing existing pod
'waititng-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5' (phase=Running,
reason=) since it is not terminated or evicted.
[Invalid date] {taskinstance.py:288} INFO - Pausing task as DEFERRED.
dag_id=waititng, task_id=waititng,
run_id=manual__2025-09-05T09:23:19.615897+00:00,
execution_date=20250905T092319, start_date=20250906T001827
[Invalid date] {taskinstance.py:340} ▶ Post task execution logs
[Invalid date] {pod.py:146} INFO - Checking pod
'waititng-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5' in namespace
'namespace'.
[Invalid date] {triggerer_job_runner.py:631} INFO - Trigger
waititng/manual__2025-09-05T09:23:19.615897+00:00/waititng/-1/10 (ID 20) fired:
TriggerEvent<{'status': 'running', 'last_log_time': None, 'namespace':
'namespace', 'name': 'waititng-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5',
'eks_cluster_name': 'cluster'}>
[Invalid date] {local_task_job_runner.py:123} ▶ Pre task execution logs
[Invalid date] {base.py:84} INFO - Retrieving connection 'aws_eks_role'
[Invalid date] {pod_manager.py:713} INFO - Pod
waiting-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5 has phase Running
[Invalid date] {pod_manager.py:713} INFO - Pod
waiting-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5 has phase Running
[Invalid date] {job.py:229} INFO - Heartbeat recovered after 71.80 seconds
[Invalid date] {local_task_job_runner.py:266} INFO - Task exited with return
code -9. For more information, see
https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#LocalTaskJob-killed
[Invalid date] {local_task_job_runner.py:245} ▲▲▲ Log group end
```
### What you think should happen instead
I expect the task to alternate between the running and deferred states
until the pod completes or fails.
- The operator runs with deferrable=True
- logging_interval is set to 600 seconds
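
To make the expected behavior concrete, here is a minimal toy model of the state trace I have in mind. It does not import Airflow or reproduce the provider's code: `ToyPod` and `simulate_expected_cycle` are hypothetical names made up for this illustration, and the pod is given a finite runtime (unlike the endless loop in the reproduction below) so that the trace terminates.

```python
from dataclasses import dataclass


@dataclass
class ToyPod:
    """Hypothetical stand-in for a pod that finishes after a fixed time."""

    runtime_s: int

    def phase_at(self, elapsed_s: int) -> str:
        return "Succeeded" if elapsed_s >= self.runtime_s else "Running"


def simulate_expected_cycle(pod: ToyPod, logging_interval_s: int) -> list[str]:
    """Return the task-state trace expected for a deferrable task.

    Each loop iteration models one deferral: the trigger waits up to
    `logging_interval_s`, fires an event, and the task resumes briefly.
    """
    trace = ["running", "deferred"]  # initial execute(), then first deferral
    elapsed_s = 0
    while True:
        # The trigger fires at the interval or at pod completion,
        # whichever comes first.
        elapsed_s = min(elapsed_s + logging_interval_s, pod.runtime_s)
        trace.append("running")  # re-entry: the worker resumes the task
        if pod.phase_at(elapsed_s) != "Running":
            trace.append("success")
            return trace
        # Pod still running: defer again instead of blocking the worker.
        trace.append("deferred")


print(simulate_expected_cycle(ToyPod(runtime_s=1500), logging_interval_s=600))
```

In this sketch the task only occupies a worker slot for the short "running" steps. What the log above shows instead is a single long "running" step after the first re-entry, which lasts until the heartbeat timeout.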
### How to reproduce
```
import datetime

from airflow.decorators import dag
from airflow.providers.amazon.aws.operators.eks import EksPodOperator


@dag(
    dag_id="wait",
    start_date=datetime.datetime(2025, 8, 4),
    schedule=None,
    catchup=False,
)
def wait() -> None:
    EksPodOperator(
        task_id="wait",
        aws_conn_id="aws_eks_role",
        cluster_name="cluster",
        deferrable=True,  # hand the wait over to the triggerer
        namespace="namespace",
        region="eu-west-1",
        # pipeline_config is not defined in this snippet; it comes from
        # the reporter's own code.
        pod_name=f"chromium-{pipeline_config.pipeline_id}",
        cmds=["/bin/sh", "-c"],
        # Loop forever so the pod never completes on its own.
        arguments=["while true; do echo 'sleeping...'; sleep 2; done"],
        image="alpine:3.22.1",
        on_finish_action="delete_pod",
        poll_interval=60,
        logging_interval=600,
    )


wait()
```
### Anything else
_No response_
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)