I don't think it would work with multipart upload either. The file is not visible until the multipart upload is explicitly completed. So even if each write uploads a part, none of the parts are visible until the multipart upload is closed.
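To illustrate the visibility semantics being discussed: below is a toy in-memory model of S3 multipart upload, not the real API (against real S3 you would use e.g. boto3's create_multipart_upload / upload_part / complete_multipart_upload). The bucket, key, and record contents are made up for the example; the point is that staged parts never become readable until the upload is completed.

```python
class ToyS3Bucket:
    """Toy model of S3 multipart-upload visibility (illustration only)."""

    def __init__(self):
        self.objects = {}    # key -> bytes: what readers can actually see
        self._pending = {}   # upload_id -> (key, list of staged parts)
        self._next_id = 0

    def create_multipart_upload(self, key):
        self._next_id += 1
        upload_id = str(self._next_id)
        self._pending[upload_id] = (key, [])
        return upload_id

    def upload_part(self, upload_id, data):
        # Parts are staged server-side, but the object stays invisible.
        key, parts = self._pending[upload_id]
        parts.append(data)

    def complete_multipart_upload(self, upload_id):
        # Only now does the object become visible to readers.
        key, parts = self._pending.pop(upload_id)
        self.objects[key] = b"".join(parts)


bucket = ToyS3Bucket()
uid = bucket.create_multipart_upload("wal/log-0001")  # hypothetical WAL key
bucket.upload_part(uid, b"record-1;")
bucket.upload_part(uid, b"record-2;")
assert "wal/log-0001" not in bucket.objects  # nothing visible mid-upload
bucket.complete_multipart_upload(uid)
assert bucket.objects["wal/log-0001"] == b"record-1;record-2;"
```

This is why a WAL that expects write + flush to make records durable and readable cannot rely on multipart upload: a reader recovering after a crash would see nothing until the upload was cleanly completed.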
TD

On Fri, Sep 18, 2015 at 1:55 AM, Steve Loughran <ste...@hortonworks.com> wrote:

> On 17 Sep 2015, at 21:40, Tathagata Das <t...@databricks.com> wrote:
> >
> > Actually, the current WAL implementation (as of Spark 1.5) does not work
> > with S3 because S3 does not support flushing. Basically, the current
> > implementation assumes that after write + flush, the data is immediately
> > durable, and readable if the system crashes without closing the WAL file.
> > This does not work with S3, as data is durable if and only if the S3 file
> > output stream is cleanly closed.
>
> More precisely, unless you turn multipart uploads on, the s3n/s3a
> clients Spark uses *don't even upload anything to S3*.
>
> It's not a filesystem, and you have to bear that in mind.
>
> Amazon's own S3 client used in EMR behaves differently; it may be usable
> as a destination (I haven't tested).