Multi-part upload with finalization seems like a good approach for this problem.
Bhavin Thaker.

On Wed, Mar 7, 2018 at 7:45 AM Naveen Swamy <mnnav...@gmail.com> wrote:

> Rahul,
> IMO It is not Ok to write to a local file before streaming, you have to
> consider security implications such as:
> 1) will your local file be encrypted(encryption at rest)
> 2) what happens if the process crashes, you will have to make sure the
> local file is deleted in failure and process exit scenarios.
>
> My understanding is for multi part uploads it uses chunked transfer
> encoding and for that you do not need to know the total size and only know
> the chunked data size.
> https://en.wikipedia.org/wiki/Chunked_transfer_encoding
>
> See this SO answer:
> https://stackoverflow.com/questions/8653146/can-i-stream-a-file-upload-to-s3-without-a-content-length-header
>
> Can you point to the literature that asks to know the total size.
>
> -Naveen
>
> On Tue, Mar 6, 2018 at 10:34 PM, Rahul Huilgol <rahulhuil...@gmail.com>
> wrote:
>
> > Hi Chris,
> >
> > S3 doesn't support append calls. They promote the use of multipart uploads
> > to upload large files in parallel, or when network reliability is an issue.
> > Writing like a stream does not seem to be the purpose of multipart uploads.
> >
> > I looked into what the AWS SDK does (in Java). It buffers in memory however
> > large the file might be, and then uploads. I imagine this involves
> > reallocating and copying the buffer to the larger buffer. There are few
> > issues raised regarding this on the sdk repos like this
> > <https://github.com/aws/aws-sdk-java/issues/474>. But this doesn't seem to
> > be something the SDKs can do anything about. People seem to be writing to
> > temporary files and then uploading.
> >
> > Regards,
> > Rahul
> >
> > On Tue, Mar 6, 2018 at 9:04 PM, Chris Olivier <cjolivie...@gmail.com>
> > wrote:
> >
> > > it seems strange that s3 would make such a major restriction. there’s
> > > literally no way to incrementally write a file without knowing the size
> > > beforehand? some sort of separate append calls, maybe?
> > >
> > > On Tue, Mar 6, 2018 at 8:53 PM Rahul Huilgol <rahulhuil...@gmail.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I have been looking at updating the authentication used by S3FileSystem in
> > > > dmlc-core. Current code uses Signature version 2, which works only in the
> > > > region us-east-1 now. We need to update the authentication scheme to use
> > > > Signature version 4 (SIG4).
> > > >
> > > > I've submitted a PR <https://github.com/dmlc/dmlc-core/pull/378> to change
> > > > this for Reads. But I wanted to seek out thoughts on what to do for Writes,
> > > > as there is a potential problem.
> > > >
> > > > *How writes to S3 work currently:*
> > > > Whenever s3filesystem's stream.write() is called, data is buffered. When
> > > > the buffer is full, a request is made to S3. Since this can happen multiple
> > > > times, multipart upload feature is used. An upload id is created when
> > > > stream is initialized. This upload id is used till the stream is closed.
> > > > Default buffer size is 64MB.
> > > >
> > > > *Problem:*
> > > > The new SIG4 authentication scheme changes how multipart uploads work. Such
> > > > an upload now requires that we know the total size of data to be sent (sum
> > > > of sizes of all parts) when we create the first request itself. We need to
> > > > pass the total size of payload as part of header. This is not possible
> > > > given that we don't know all the write calls beforehand. For example, a
> > > > call to save model's parameters makes 145 calls to the stream's write.
> > > >
> > > > *Approach?*
> > > > Is it okay to buffer it to a local file, and then upload this file to S3 at
> > > > the end?
> > > > What use case do we have for writes to S3 generally? I believe we would
> > > > want to write params after training or logs. These wouldn't be too large or
> > > > frequent I imagine. What would you suggest?
> > > >
> > > > Appreciate your thoughts and suggestions.
> > > >
> > > > Thanks,
> > > > Rahul Huilgol
> >
> > --
> > Rahul Huilgol
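
For reference, the write path discussed in this thread (buffer each write() call, send a part whenever the buffer fills, finalize when the stream is closed) might be sketched roughly as below. This is only an illustration under stated assumptions: FakeMultipartStore, BufferedMultipartWriter and all of their methods are hypothetical names invented here, the store is an in-memory stand-in rather than S3, and none of this is the dmlc-core code or an AWS SDK API. The point it tries to show is that each part carries only its own length, so the writer never needs the total size up front.

```python
class FakeMultipartStore:
    """Hypothetical in-memory stand-in for an object store with multipart uploads."""

    def __init__(self):
        self.pending = {}   # upload_id -> {part_number: bytes}
        self.objects = {}   # key -> finalized bytes
        self._next_id = 0

    def create_upload(self, key):
        self._next_id += 1
        upload_id = f"upload-{self._next_id}"
        self.pending[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        # Only this part's bytes are needed here, not the total object size.
        self.pending[upload_id][part_number] = bytes(data)

    def complete_upload(self, key, upload_id):
        # Finalization: stitch the parts together in part-number order.
        parts = self.pending.pop(upload_id)
        self.objects[key] = b"".join(parts[n] for n in sorted(parts))

    def abort_upload(self, upload_id):
        # Discard all parts so nothing is left behind after a failure.
        self.pending.pop(upload_id, None)


class BufferedMultipartWriter:
    """Hypothetical stream: buffers writes, uploads full parts, finalizes on close."""

    def __init__(self, store, key, part_size=64 * 1024 * 1024):
        self.store = store
        self.key = key
        self.part_size = part_size          # 64MB default, as in the thread
        self.buffer = bytearray()
        self.part_number = 0
        self.closed = False
        self.upload_id = store.create_upload(key)  # one upload id per stream

    def write(self, data):
        self.buffer.extend(data)
        # Send as many full parts as the buffer currently holds.
        while len(self.buffer) >= self.part_size:
            self._send_part(self.part_size)

    def _send_part(self, size):
        self.part_number += 1
        self.store.upload_part(self.upload_id, self.part_number, self.buffer[:size])
        del self.buffer[:size]

    def close(self):
        if self.closed:
            return
        # Send whatever is left (or one empty part if nothing was written).
        if self.buffer or self.part_number == 0:
            self._send_part(len(self.buffer))
        self.store.complete_upload(self.key, self.upload_id)
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # On failure, abort instead of finalizing a partial object.
            self.store.abort_upload(self.upload_id)
            self.closed = True
        else:
            self.close()
        return False


# Usage: many small writes of unknown total size, tiny part size for illustration.
store = FakeMultipartStore()
with BufferedMultipartWriter(store, "model.params", part_size=8) as w:
    for chunk in (b"abc", b"defghij", b"klmnopqrstu", b"v"):
        w.write(chunk)
```

The tiny part_size in the usage lines is only there to make the part boundaries easy to follow. Keeping the buffer in memory and aborting the upload on failure also avoids the local temporary file, and with it the encryption-at-rest and cleanup concerns raised above.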