orf opened a new issue, #40557: URL: https://github.com/apache/arrow/issues/40557
### Describe the bug, including details regarding any error messages, version, and platform.

Running the following snippet shows that `open_output_stream()` initiates a multipart upload immediately, before anything is written. This is quite unexpected: I would expect the `buffer_size` argument to ensure that a multipart upload is not initiated until at least 1,000 bytes have been written.

The problem with the current behaviour is that writing a single byte results in three requests to S3: one to create the multipart upload, one to upload the 1-byte part, and one to complete the multipart upload. This is very inefficient when writing a small file to S3, where a simple PutObject (without multipart uploading) would suffice. Using `background_writes=False` and `fs.copy_files(...)` with a local, "known-sized" small file also results in a multipart upload.

While this behaviour keeps the implementation simple, it is surprising, and I couldn't find [it described in the documentation anywhere](https://arrow.apache.org/docs/python/filesystems.html).

```python
import time

from pyarrow import fs

fs.initialize_s3(fs.S3LogLevel.Debug)

sfs = fs.S3FileSystem()

with sfs.open_output_stream("a_bucket/test", buffer_size=1000):
    time.sleep(10)
```

### Component(s)

Python
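To illustrate the behaviour I'd expect, here is a minimal sketch (not Arrow's actual implementation, and the request names simply mirror the S3 API operations) of a writer that defers the PutObject-vs-multipart decision until the buffer actually overflows, so a small file costs one request instead of three:

```python
class DeferredS3Writer:
    """Hypothetical sketch: buffer writes and only commit to a multipart
    upload once more than `buffer_size` bytes have been written."""

    def __init__(self, buffer_size=1000):
        self.buffer_size = buffer_size
        self.buffer = bytearray()
        self.requests = []           # records the S3 calls we would issue
        self.multipart_started = False

    def write(self, data: bytes):
        self.buffer.extend(data)
        # Only when the buffer overflows do we pay for a multipart upload.
        while len(self.buffer) >= self.buffer_size:
            if not self.multipart_started:
                self.requests.append("CreateMultipartUpload")
                self.multipart_started = True
            part = self.buffer[: self.buffer_size]
            self.buffer = self.buffer[self.buffer_size :]
            self.requests.append(f"UploadPart({len(part)} bytes)")

    def close(self):
        if self.multipart_started:
            if self.buffer:
                self.requests.append(f"UploadPart({len(self.buffer)} bytes)")
            self.requests.append("CompleteMultipartUpload")
        else:
            # Small file: a single PutObject suffices -- one request, not three.
            self.requests.append(f"PutObject({len(self.buffer)} bytes)")
        self.buffer = bytearray()


small = DeferredS3Writer(buffer_size=1000)
small.write(b"x")                # one byte, never exceeds the buffer
small.close()
print(small.requests)            # ['PutObject(1 bytes)']

large = DeferredS3Writer(buffer_size=1000)
large.write(b"y" * 2500)         # overflows: multipart is justified here
large.close()
print(large.requests)
```

With today's behaviour the one-byte case produces `CreateMultipartUpload`, `UploadPart`, and `CompleteMultipartUpload`; the sketch above collapses it to a single `PutObject`.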