zhaorui2022 opened a new issue, #68585: URL: https://github.com/apache/airflow/issues/68585
### Under which category would you file this issue? Providers ### Apache Airflow version 3.2.2, also available in older 3.x versions ### What happened and how to reproduce it? 1. Set up an environment using s3 as the remote log backend 2. Run any task (EmptyOperator would also work) 3. Inspect the open lineage event, and s3 log file URIs are added as task outputs ### What you think should happen instead? S3 logs are not task outputs, and they should not be added as task outputs in openlineage events. The main cause is probably: 1. S3Hook is used to upload logs to s3 https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/log/s3_task_handler.py#L136 2. Inside S3Hook, load_string always emits openlineage events, in this case, task logs https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/hooks/s3.py#L1248-L1250 ### Operating System _No response_ ### Deployment None ### Apache Airflow Provider(s) amazon ### Versions of Apache Airflow Providers 9.29.0, available in earlier versions as well ### Official Helm Chart version Not Applicable ### Kubernetes Version _No response_ ### Helm Chart configuration _No response_ ### Docker Image customizations Using the official 3.2.2 docker image ### Anything else? _No response_ ### Are you willing to submit PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
