mjsqu opened a new issue, #33711: URL: https://github.com/apache/airflow/issues/33711
### Apache Airflow version

Other Airflow 2 version (please specify below)

### What happened

Running on MWAA v2.5.1 with `EcsRunTaskOperator` from apache-airflow-providers-amazon v8.3.0, all `EcsRunTaskOperator` tasks appear to 'detach' from the underlying ECS task after 10 minutes. Running a command:

```
sleep 800
```

results in:

```
[2023-08-25, 10:15:12 NZST] {{ecs.py:533}} INFO - EcsOperator overrides: {'containerOverrides': [{'name': 'meltano', 'command': ['sleep', '800']}]}
...
[2023-08-25, 10:15:13 NZST] {{ecs.py:651}} INFO - ECS task ID is: b2681954f66148e8909d5e74c4b94c1a
[2023-08-25, 10:15:13 NZST] {{ecs.py:565}} INFO - Starting ECS Task Log Fetcher
[2023-08-25, 10:15:43 NZST] {{base_aws.py:554}} WARNING - Unable to find AWS Connection ID 'aws_ecs', switching to empty.
[2023-08-25, 10:15:43 NZST] {{base_aws.py:160}} INFO - No connection ID provided. Fallback on boto3 credential strategy (region_name='ap-southeast-2'). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2023-08-25, 10:25:13 NZST] {{taskinstance.py:1768}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/utils/session.py", line 75, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/operators/ecs.py", line 570, in execute
    self._wait_for_task_ended()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/operators/ecs.py", line 684, in _wait_for_task_ended
    waiter.wait(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/botocore/waiter.py", line 55, in wait
    Waiter.wait(self, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/botocore/waiter.py", line 388, in wait
    raise WaiterError(
botocore.exceptions.WaiterError: Waiter TasksStopped failed: Max attempts exceeded
```

It appears to be caused by the `waiter.wait` call now being passed an explicit `WaiterConfig`: `waiter_max_attempts` defaults to 100 instead of `sys.maxsize` (effectively unlimited), and 100 attempts at the operator's default polling delay works out to roughly the 10 minutes observed above:

```python
waiter.config.max_attempts = sys.maxsize  # timeout is managed by airflow
waiter.wait(
    cluster=self.cluster,
    tasks=[self.arn],
    WaiterConfig={
        "Delay": self.waiter_delay,
        "MaxAttempts": self.waiter_max_attempts,
    },
)
```

### What you think should happen instead

Set the default `waiter_max_attempts` in `EcsRunTaskOperator` to `sys.maxsize` to revert to the previous behaviour.

### How to reproduce

1. Set up ECS with a task definition, cluster, etc.
2. Assuming ECS is all set up, build a DAG with an `EcsRunTaskOperator` task (a minimal sketch is included at the end of this issue).
3. Run a task that should take more than 10 minutes, e.g. in `overrides` set `command` to `["sleep", "800"]`.
4. The Airflow task fails, while the ECS task runs for 800 seconds and completes successfully.

### Operating System

MWAA v2.5.1, Python 3.10 (Linux)

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.3.0

### Deployment

Amazon (AWS) MWAA

### Deployment details

n/a

### Anything else

n/a

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
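For reference, here is a minimal sketch of the reproduction DAG described above. The DAG id, cluster, task definition, container name, and network settings are placeholders (not taken from the report), and the commented-out `waiter_delay`/`waiter_max_attempts` arguments show a possible interim workaround (raising the polling ceiling so the wait is no longer capped at ~10 minutes), not the proposed default change itself.

```python
# Hypothetical reproduction sketch: cluster, task definition, container name,
# subnet, and DAG id below are placeholders for whatever exists in your account.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

with DAG(
    dag_id="ecs_run_task_detach_repro",
    start_date=datetime(2023, 8, 1),
    schedule=None,
    catchup=False,
) as dag:
    sleep_800 = EcsRunTaskOperator(
        task_id="sleep_800",
        cluster="my-ecs-cluster",          # placeholder
        task_definition="meltano",         # placeholder
        launch_type="FARGATE",
        overrides={
            "containerOverrides": [
                {"name": "meltano", "command": ["sleep", "800"]},
            ],
        },
        network_configuration={            # placeholder networking
            "awsvpcConfiguration": {"subnets": ["subnet-xxxxxxxx"]},
        },
        # Possible interim workaround: raise the waiter ceiling explicitly so
        # the ECS task is polled for longer than the default ~10-minute window.
        # waiter_delay=6,
        # waiter_max_attempts=1000,
    )
```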