On Wed, 23 Sep 2020 at 20:07, Igor Dvorzhak <i...@google.com.invalid> wrote:

> What will be the solution for object stores to have fast and correct
> commit algorithms?
>

https://github.com/steveloughran/zero-rename-committer/releases/tag/tag_draft_006

There's a plugin point for you to add an explicit committer for GCS.

A key thing is: what atomic operations does your store have?


   1. HDFS has rename and create-no-overwrite
   2. S3 has only PUT/complete multipart upload, and no fail-if-exists
   checks
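
To make that question concrete, here is a minimal sketch of how those two primitives can combine to pick exactly one winning task attempt. It is Python against a local POSIX filesystem, used only as a stand-in: it is not the HDFS API, not any object store API, and not the algorithm in the linked draft. The names create_no_overwrite, commit_attempt, demo and the ".claim" file are hypothetical, made up for this illustration.

```python
import os
import tempfile


def create_no_overwrite(path, data):
    """Create path only if it does not already exist.

    O_CREAT | O_EXCL makes the existence check and the creation a single
    atomic step, so at most one caller can succeed for a given path.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True


def commit_attempt(attempt_dir, final_dir, claim_path):
    """Commit one task attempt; return False if another attempt already won.

    Assumption: final_dir does not exist before the first commit.
    """
    attempt_id = os.path.basename(attempt_dir).encode()
    # Step 1: claim the task. Only one attempt can create the claim file.
    if not create_no_overwrite(claim_path, attempt_id):
        return False
    # Step 2: publish all output with one directory rename.
    os.rename(attempt_dir, final_dir)
    return True


def demo():
    """Run two competing attempts for the same task in a temp directory."""
    root = tempfile.mkdtemp()
    for name in ("attempt_1", "attempt_2"):
        os.mkdir(os.path.join(root, name))
        with open(os.path.join(root, name, "part-0000"), "w") as f:
            f.write(name)
    final_dir = os.path.join(root, "task_0000")
    claim = os.path.join(root, "task_0000.claim")
    first = commit_attempt(os.path.join(root, "attempt_1"), final_dir, claim)
    second = commit_attempt(os.path.join(root, "attempt_2"), final_dir, claim)
    with open(os.path.join(final_dir, "part-0000")) as f:
        content = f.read()
    return first, second, content
```

Calling demo() lets attempt_1 commit and turns attempt_2 away. A store that offers neither primitive, as with S3 above, cannot use this pattern directly and needs a different commit protocol. The sketch also ignores what happens if the winner dies between the claim and the rename; a real committer would need a recovery step for that.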




> On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> <ste...@cloudera.com.invalid> wrote:
>
>> I've got a PR up to completely remove the v2 commit algorithm
>>
>> https://github.com/apache/hadoop/pull/2320
>>
>> That may seem overkill, but while *we* know there's a small window of risk
>> (task attempt 1 failing partway through a nonatomic commit), that's not
>> known/appreciated by others.
>>
>> The patch removes the v2 codepath from FileOutputCommitter, making it a
>> lot less complicated, and when v2 is requested, a warning is printed and
>> the option ignored.
>>
>> Overkill? Maybe. But it guarantees correctness.
>>
>
