potiuk edited a comment on pull request #17609: URL: https://github.com/apache/airflow/pull/17609#issuecomment-899055233
> Is there any advantage on saving the file locally in a temporary manner? I am wondering if it makes sense to just change the way it uploads the file to S3 without giving the option to store the temporary file in local system

One reason is an implementation detail of `upload_fileobj`: it's not really obvious how the data is buffered while `upload_fileobj` runs, so there might be significant memory usage during the operation. But the main reason is that, from what I see in the description of `upload_fileobj`, it will use multiple threads whenever possible and upload the S3 object in parallel, which (I know for a fact) can speed up the S3 upload immensely — this is how S3 multipart upload is designed.

However (my guess, but quite likely), this parallelism cannot be used if the "fileobj" does not provide `seek()` functionality. Looking at how SFTP `get` is implemented, its fileobj does not allow seeking; it can only read the file sequentially (this is how the SFTP protocol works, I believe). It could only provide `seek()` if it loaded the file entirely into memory first, which would not be good for huge files.

So if you have a fast (local-network) SFTP connection, downloading the file first and then uploading the local file might significantly speed up the transfer, as `upload_fileobj` will be able to utilise multiple threads to upload. That's mostly an educated guess, but I think it's very likely.
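The seek distinction can be illustrated with the standard library alone: a sequential stream (standing in for an SFTP file object) reports itself as non-seekable, while a local temporary file does not, which is what would allow an uploader to read independent byte ranges in parallel. A minimal sketch — `SequentialStream` is a hypothetical stand-in, not the actual paramiko class:

```python
import io
import tempfile


class SequentialStream(io.RawIOBase):
    """Hypothetical stand-in for an SFTP file object: readable, but not seekable."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        # Sequential protocols cannot jump to arbitrary offsets, so a
        # multipart uploader cannot split the source into parallel ranges.
        return False

    def read(self, size=-1):
        return self._buf.read(size)


stream = SequentialStream(b"payload" * 1024)
print(stream.seekable())  # False: only sequential reads are possible

# Downloading to a local temporary file first restores seekability,
# so the subsequent S3 upload can read chunks from multiple threads.
with tempfile.TemporaryFile() as local_copy:
    local_copy.write(stream.read())
    print(local_copy.seekable())  # True
```

In the two-step variant the SFTP download is still sequential, but only the (cheap, local-network) half of the transfer pays that cost; the S3 half gets a seekable local file.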