I don't think it would work with multipart upload either. The file is not visible until the multipart upload is explicitly completed. So even if each write uploads a part, none of the parts are visible until the multipart upload is closed.
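To illustrate the visibility semantics being discussed: below is a toy in-memory model of S3 multipart upload, not the real API (against real S3 you would use e.g. boto3's create_multipart_upload / upload_part / complete_multipart_upload). The bucket, key, and record contents are made up for the example; the point is that staged parts never become readable until the upload is completed.

```python
class ToyS3Bucket:
    """Toy model of S3 multipart-upload visibility (illustration only)."""

    def __init__(self):
        self.objects = {}    # key -> bytes: what readers can actually see
        self._pending = {}   # upload_id -> (key, list of staged parts)
        self._next_id = 0

    def create_multipart_upload(self, key):
        self._next_id += 1
        upload_id = str(self._next_id)
        self._pending[upload_id] = (key, [])
        return upload_id

    def upload_part(self, upload_id, data):
        # Parts are staged server-side, but the object stays invisible.
        key, parts = self._pending[upload_id]
        parts.append(data)

    def complete_multipart_upload(self, upload_id):
        # Only now does the object become visible to readers.
        key, parts = self._pending.pop(upload_id)
        self.objects[key] = b"".join(parts)


bucket = ToyS3Bucket()
uid = bucket.create_multipart_upload("wal/log-0001")  # hypothetical WAL key
bucket.upload_part(uid, b"record-1;")
bucket.upload_part(uid, b"record-2;")
assert "wal/log-0001" not in bucket.objects  # nothing visible mid-upload
bucket.complete_multipart_upload(uid)
assert bucket.objects["wal/log-0001"] == b"record-1;record-2;"
```

This is why a WAL that expects write + flush to make records durable and readable cannot rely on multipart upload: a reader recovering after a crash would see nothing until the upload was cleanly completed.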
TD

On Fri, Sep 18, 2015 at 1:55 AM, Steve Loughran <ste...@hortonworks.com> wrote:

> On 17 Sep 2015, at 21:40, Tathagata Das <t...@databricks.com> wrote:
> >
> > Actually, the current WAL implementation (as of Spark 1.5) does not work
> > with S3 because S3 does not support flushing. Basically, the current
> > implementation assumes that after write + flush, the data is immediately
> > durable, and readable if the system crashes without closing the WAL file.
> > This does not work with S3, as data is durable if and only if the S3 file
> > output stream is cleanly closed.
>
> More precisely, unless you turn multipart uploads on, the s3n/s3a
> clients Spark uses *don't even upload anything to S3*.
>
> It's not a filesystem, and you have to bear that in mind.
>
> Amazon's own S3 client used in EMR behaves differently; it may be usable
> as a destination (I haven't tested).