ROOBALJINDAL opened a new issue, #67178:
URL: https://github.com/apache/airflow/issues/67178

   ### Under which category would you file this issue?
   
   Providers
   
   ### Apache Airflow version
   
   3.0.6
   
   ### What happened and how to reproduce it?
   
   We upgraded AWS MWAA from Airflow 2.7.2 to 3.0.6 and noticed an intermittent 
issue. When our DAGs submit jobs to EMR Serverless via 
EmrServerlessStartJobOperator, the jobs are submitted to EMR Serverless and 
finish in EMR, but the corresponding Airflow task is marked as failed. Out of 
100 tasks, 98-99 proceed fine and 1 or 2 fail. The failures follow a pattern: 
the task fails after 20-21 seconds. They are completely random and not tied to 
any particular task.
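   
   To make the 20-21 second pattern easier to check across runs, here is a 
minimal sketch that filters task attempts by state and duration. It assumes 
the attempts have already been exported into plain records; `TaskAttempt`, 
`failures_in_window`, and the sample values are hypothetical names and data 
for illustration, not part of Airflow or of our DAGs.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TaskAttempt:
    """Hypothetical record for one task attempt (not an Airflow class)."""
    task_id: str
    state: str               # e.g. "success" or "failed"
    duration_seconds: float


def failures_in_window(
    attempts: Iterable[TaskAttempt],
    low: float = 20.0,
    high: float = 22.0,
) -> List[TaskAttempt]:
    """Return failed attempts whose duration lies in [low, high) seconds."""
    return [
        a for a in attempts
        if a.state == "failed" and low <= a.duration_seconds < high
    ]


# Illustrative data only; real values would come from the task instance list.
sample = [
    TaskAttempt("job_a", "success", 312.4),
    TaskAttempt("job_b", "failed", 20.7),
    TaskAttempt("job_c", "failed", 95.0),
]
suspects = failures_in_window(sample)
print([a.task_id for a in suspects])
```

   If nearly all failures land in the same narrow window, that would suggest 
a fixed timeout somewhere rather than a job-specific error, but that is only 
a guess at this point.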
   
   Either something is wrong with the new version of Airflow, or some 
configuration is missing on our end.
   
   requirements.txt for both Airflow versions:
   **Airflow 3.0.6**
   ```
   --constraint "/usr/local/airflow/dags/constraints-3.11_spark_trino.txt"
   
   apache-airflow-providers-apache-spark==5.3.2
   apache-airflow-providers-amazon==9.12.0
   apache-airflow-providers-ssh==4.1.3
   types-paramiko==3.5.0.20250801
   sshtunnel==0.4.0
   requests==2.32.5
   orjson==3.11.2
   cachetools==5.5.2
   Authlib==1.6.2
   apache-airflow-providers-apache-livy==4.4.2
   apache-airflow-providers-http==5.3.3
   confluent-kafka==2.11.1
   apache-airflow-providers-apache-kafka==1.10.2
   fastavro==1.12.0
   
   ```
    
   **Airflow 2.7.2**
   ```
   --constraint "/usr/local/airflow/dags/constraints-3.7_spark_trino.txt"
   
   apache-airflow-providers-apache-spark==3.0.0
   apache-airflow-providers-amazon==6.0.0
   apache-airflow-providers-ssh==3.2.0
   types-paramiko==2.11.6
   sshtunnel==0.4.0
   requests==2.28.1
   apache-airflow-providers-apache-livy==3.1.0
   apache-airflow-providers-http==4.0.0
   ```
   
   These are the logs of a task that failed randomly:
   ```
   Reading remote log from Cloudwatch log_group: 
arn:aws:logs:xxxxx:log-group:airflow-abc-MwaaEnvironment-Task log_stream: 
dag_id=xxx/run_id=manual__2026-05-19T10_35_27.159729+00_00/task_id=mytaskid/attempt=1.log
   An error occurred (ResourceNotFoundException) when calling the GetLogEvents 
operation: The specified log stream does not exist.
   ```
   If the missing log stream were the cause, I would expect this error for 
other tasks as well, so I don't think the task is failing because of the 
missing log stream in CloudWatch. The failing task also never logged that the 
job was submitted to EMR successfully, which the other tasks do.
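   
   To confirm the mismatch between EMR and Airflow for a given run, a small 
check like the following could be used. This is a sketch under assumptions: 
`classify_mismatch` is a hypothetical helper, the application ID and job run 
ID have to be looked up separately (the failing task did not log them), and 
the client is passed in so the function can be exercised without AWS access. 
It relies only on the EMR Serverless `GetJobRun` call, whose response carries 
the state under `jobRun.state`.

```python
from typing import Any

# Terminal job run states reported by EMR Serverless GetJobRun.
EMR_SUCCESS_STATES = {"SUCCESS"}
EMR_FAILURE_STATES = {"FAILED", "CANCELLED"}


def classify_mismatch(
    emr_client: Any,
    application_id: str,
    job_run_id: str,
    airflow_task_state: str,
) -> str:
    """Compare the EMR Serverless job run state with the Airflow task state."""
    response = emr_client.get_job_run(
        applicationId=application_id,
        jobRunId=job_run_id,
    )
    emr_state = response["jobRun"]["state"]

    if airflow_task_state == "failed" and emr_state in EMR_SUCCESS_STATES:
        return f"mismatch: EMR state {emr_state}, Airflow task failed"
    if airflow_task_state == "success" and emr_state in EMR_FAILURE_STATES:
        return f"mismatch: EMR state {emr_state}, Airflow task succeeded"
    if emr_state not in EMR_SUCCESS_STATES | EMR_FAILURE_STATES:
        return f"undetermined: EMR job run still in state {emr_state}"
    return f"consistent: EMR state {emr_state}, Airflow task {airflow_task_state}"
```

   With real credentials the client would be `boto3.client("emr-serverless")`. 
A "mismatch" result for the failed attempts would show that the job itself is 
fine and the failure happens on the Airflow side.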
   
   Is this a known issue?
   
   ### What you think should happen instead?
   
   If the job was submitted to EMR successfully, the task should reflect that 
and proceed without failing.
   
   ### Operating System
   
   _No response_
   
   ### Deployment
   
   Amazon (AWS) MWAA
   
   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==9.12.0
   
   ### Official Helm Chart version
   
   Not Applicable
   
   ### Kubernetes Version
   
   _No response_
   
   ### Helm Chart configuration
   
   _No response_
   
   ### Docker Image customizations
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
