psuslykov-godaddy opened a new issue, #54254:
URL: https://github.com/apache/airflow/issues/54254

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   8.16.0
   
   ### Apache Airflow version
   
   2.8.1
   
   ### Operating System
   
   Amazon Linux 2023
   
   ### Deployment
   
   Amazon (AWS) MWAA
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   The `BatchOperator` task (from `airflow.providers.amazon.aws.operators.batch`)
fails after the job completes successfully, while it fetches the log links from
CloudWatch. This happens when running a multi-node AWS Batch job.
   
   Two issues:
   
   1. [Batch.Client.describe_jobs](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/batch/client/describe_jobs.html)
returns a different response schema for single-node and multi-node jobs. The [current
code](https://github.com/apache/airflow/blob/367d8680af355b492f256ab86aa738f9ee292f2f/providers/amazon/src/airflow/providers/amazon/aws/hooks/batch_client.py#L474-L482)
reads the log stream name only from
`attempts.[].container.logStreamName`, but there is also
`attempts.[].taskProperties.[].containers.[].logStreamName`, which is not
handled.
   2. In that case `stream_names` is `[None]`. A list whose single element is
`None` has length 1 and is therefore truthy, so the `if not stream_names:` check
[here](https://github.com/apache/airflow/blob/367d8680af355b492f256ab86aa738f9ee292f2f/providers/amazon/src/airflow/providers/amazon/aws/hooks/batch_client.py#L499C9-L499C28)
does not catch it.
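
A minimal sketch of how both response shapes could be handled, assuming the two paths named in the list are the only places a stream name appears. The helper `collect_log_stream_names` and the sample payloads are hypothetical illustrations, not the provider's code and not real `describe_jobs` output.

```python
from typing import Any


def collect_log_stream_names(job_desc: dict[str, Any]) -> list[str]:
    """Hypothetical helper: gather non-empty log stream names from every attempt."""
    names: list[str] = []
    for attempt in job_desc.get("attempts", []):
        # Shape read by the current code: attempts[].container.logStreamName
        name = (attempt.get("container") or {}).get("logStreamName")
        if name:
            names.append(name)
        # Shape reported for multi-node jobs:
        # attempts[].taskProperties[].containers[].logStreamName
        for task in attempt.get("taskProperties") or []:
            for container in task.get("containers") or []:
                name = container.get("logStreamName")
                if name:
                    names.append(name)
    return names


# Made-up payloads that only mimic the relevant keys.
single_node = {"attempts": [{"container": {"logStreamName": "stream-a"}}]}
multi_node = {
    "attempts": [
        {
            "container": {},
            "taskProperties": [{"containers": [{"logStreamName": "stream-b"}]}],
        }
    ]
}
no_logs = {"attempts": [{"container": {}}]}

print(collect_log_stream_names(single_node))
print(collect_log_stream_names(multi_node))
print(collect_log_stream_names(no_logs))

# Why the existing guard is bypassed: a one-element list is truthy even if
# that element is None, so `not [None]` evaluates to False.
print(not [None])
```

Dropping `None` entries while collecting means an empty result is a genuinely empty list, so an `if not stream_names:` style guard would behave as intended.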
   
   Combined, these two issues produce log link entries with `None` in
`awslogs_stream_name`, which causes the error when the link to the logs is generated:
   `[{'awslogs_stream_name': None, 'awslogs_group': '/aws/batch/job', 
'awslogs_region': 'us-west-2'}]`
   
   ### What you think should happen instead
   
   The error log:
   
   ```
   
   [2025-08-07, 18:12:07 MST] {{batch_client.py:293}} INFO - AWS Batch job 
(1c767a5f-8690-43cb-a793-5cbb7c5e133a) has completed
   [2025-08-07, 18:12:07 MST] {{batch.py:376}} INFO - AWS Batch job 
(1c767a5f-8690-43cb-a793-5cbb7c5e133a) CloudWatch Events details found. Links 
to logs:
   [2025-08-07, 18:12:07 MST] {{taskinstance.py:2698}} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
 line 433, in _execute_task
       result = execute_callable(context=context, **execute_callable_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/operators/batch.py",
 line 254, in execute
       self.monitor_job(context)
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/operators/batch.py",
 line 379, in monitor_job
       self.log.info(link_builder.format_link(**log))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/links/logs.py",
 line 38, in format_link
       kwargs[field] = quote_plus(kwargs[field])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/urllib/parse.py", line 909, in quote_plus
       string = quote(string, safe + space, encoding, errors)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/urllib/parse.py", line 893, in quote
       return quote_from_bytes(string, safe)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/urllib/parse.py", line 923, in 
quote_from_bytes
       raise TypeError("quote_from_bytes() expected bytes")
   TypeError: quote_from_bytes() expected bytes
   [2025-08-07, 18:12:07 MST] {{taskinstance.py:1138}} INFO - Marking task as 
FAILED. dag_id=llm_batch_inference, task_id=llm_batch, 
execution_date=20250808T003906, start_date=20250808T010832, 
end_date=20250808T011207
   [2025-08-07, 18:12:07 MST] {{standard_task_runner.py:107}} ERROR - Failed to 
execute job 37068 for task llm_batch (quote_from_bytes() expected bytes; 61978)
   [2025-08-07, 18:12:07 MST] {{local_task_job_runner.py:234}} INFO - Task 
exited with return code 1
   [2025-08-07, 18:12:07 MST] {{taskinstance.py:3280}} INFO - 0 downstream 
tasks scheduled from follow-on schedule check
   
   ```
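
The traceback ends in `urllib.parse.quote_plus`, which can be reproduced in isolation with the standard library. This is a small sketch, assuming the `None` stream name reaches `quote_plus` unchanged, as the traceback suggests. The wrapper `try_quote` is a hypothetical helper for illustration.

```python
from urllib.parse import quote_plus

# The log entry reported in this issue.
log = {
    "awslogs_stream_name": None,
    "awslogs_group": "/aws/batch/job",
    "awslogs_region": "us-west-2",
}


def try_quote(value):
    """Hypothetical wrapper: return the quoted value or the error type name."""
    try:
        return quote_plus(value)
    except TypeError as exc:
        # quote_plus accepts str or bytes; None ends up in the bytes branch.
        return type(exc).__name__


result_group = try_quote(log["awslogs_group"])
result_stream = try_quote(log["awslogs_stream_name"])

print(result_group)
print(result_stream)
```

The string value is quoted normally, while the `None` value raises `TypeError`, which matches the failure in the log.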
   
   ### How to reproduce
   
   Create an AWS Batch job as follows:
   1. Create job definition
   2. Select "Amazon Elastic Compute Cloud (Amazon EC2)" in "Orchestration type"
   3. Select "Enable multi-node parallel" in "Job type"
   4. Next.
   5. In "Node ranges", the log settings can be defined or left unset. The two
issues above cause this error whether or not logs are configured.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

