jason810496 commented on issue #45079: URL: https://github.com/apache/airflow/issues/45079#issuecomment-2556220144
> > Yes. That's exactly how I envisioned solving this problem. @dstandish ?
>
> IIRC this should be fine when task done but may present challenges when task is in flight because at any moment the location of the logs may shift eg from worker to remote storage etc

Taking `S3TaskHandler` as an example, it requires additional refactoring and might need a `read_stream` method added to `S3Hook` that returns a generator-based result:

https://github.com/apache/airflow/blob/main/providers/src/airflow/providers/amazon/aws/log/s3_task_handler.py#L136-L192

From my perspective, for the `s3_write` case, I would merge the old log stream with the new log stream, flush the result into a temporary file, and use the [`upload_file`](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html) method to upload the file. This approach would help prevent memory starvation.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
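The merge-and-flush idea above could be sketched roughly as follows. This is only an illustration, not the actual Airflow implementation: `read_stream` and `merge_and_flush` are hypothetical names (no such methods exist on `S3Hook` today), and plain generators stand in for the S3-backed stream so the memory behavior is visible without boto3.

```python
import tempfile


def read_stream(chunks):
    # Stand-in for the proposed S3Hook.read_stream: yield log chunks
    # lazily instead of materializing the whole object in memory.
    for chunk in chunks:
        yield chunk


def merge_and_flush(old_stream, new_stream):
    # Write the old stream followed by the new stream into a temporary
    # file, one chunk at a time, and return its path. At no point is
    # more than one chunk held in memory, which is the point of the
    # proposal: avoid building the full merged log as a single string.
    tmp = tempfile.NamedTemporaryFile("wb", delete=False, suffix=".log")
    with tmp:
        for chunk in old_stream:
            tmp.write(chunk)
        for chunk in new_stream:
            tmp.write(chunk)
    return tmp.name
```

In a real `s3_write`, the returned path would then be handed to boto3's `upload_file` (or `S3Hook.load_file`), which streams the file to S3 in multipart chunks rather than reading it back into memory.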
