[jira] [Updated] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints

Steve Loughran (JIRA) Mon, 27 Mar 2017 12:23:14 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Loughran updated HADOOP-13786:
------------------------------------
    Attachment: HADOOP-13786-HADOOP-13345-021.patch

HADOOP-13786 patch 021: unifying data structures and s3 client use

This iteration is starting to move towards using the S3AFS writeOperations, 
with the first step being unified JSON persistence across all committers.
* New serializable type, MultiplePendingCommits, containing a list of 
SinglePendingCommit instances; the latter being the things the magic committer 
saves. 
* SinglePendingCommit adds fields/operations needed by staging committer. It's 
a bit ungainly right now; view as an interim step.
* Staging committers and tests all moved over to this datatype instead of 
a serialized Java stream. Good: debugging, security, validation logic. Bad: JSON 
serialization overhead.
* In the move, switched the various lists that the thread-pooled staging code 
builds up to being synchronized lists. I think there may have been a risk of race 
conditions there. 
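
To illustrate the last point, here is a minimal sketch of a thread pool filling a shared list through a synchronized wrapper. It is not code from the patch: PendingUpload, collect() and the class name are hypothetical stand-ins, and the real staging committer may structure this differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedCommitListSketch {

  /** Hypothetical stand-in for one pending upload; not the patch's SinglePendingCommit. */
  static final class PendingUpload {
    final String destKey;

    PendingUpload(String destKey) {
      this.destKey = destKey;
    }
  }

  /** Runs `tasks` jobs on a pool of `threads` workers, each adding one entry to a shared list. */
  static List<PendingUpload> collect(int tasks, int threads) throws InterruptedException {
    // A bare ArrayList is not safe for concurrent add() calls, so wrap it.
    List<PendingUpload> pending = Collections.synchronizedList(new ArrayList<>());
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < tasks; i++) {
      final int id = i;
      pool.execute(() -> pending.add(new PendingUpload(String.format("part-%04d", id))));
    }
    pool.shutdown();
    if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
      throw new IllegalStateException("uploads did not finish in time");
    }
    return pending;
  }

  public static void main(String[] args) throws InterruptedException {
    List<PendingUpload> pending = collect(1000, 8);
    System.out.println("collected " + pending.size() + " pending uploads");
  }
}
```

With an unsynchronized ArrayList the same loop can lose entries or throw under contention. Note that iterating over a synchronizedList still needs an explicit synchronized block on the list.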

Other changes
* default unique name option == false
* tests can handle option of unique vs non-unique filenames
* and the partition committer skips the Mapper test, which doesn't make sense for it.

Essentially: the unique name algorithm doesn't work with map tasks, as they 
expect a part-m-0000/ dir with children explicitly named "index" and "data". 
Adding unique names under index and data breaks this.
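
A small sketch of why this breaks, assuming the unique name is formed by appending an ID to the final path element (before any extension). The helper names and the naming scheme are assumptions for illustration; the patch's actual algorithm may differ.

```java
public class UniqueNameSketch {

  /** Hypothetical helper: appends "-<uuid>" to the last path element, before any extension. */
  static String withUniqueName(String path, String uuid) {
    int slash = path.lastIndexOf('/');
    String dir = path.substring(0, slash + 1);
    String name = path.substring(slash + 1);
    int dot = name.lastIndexOf('.');
    return dot < 0
        ? dir + name + "-" + uuid
        : dir + name.substring(0, dot) + "-" + uuid + name.substring(dot);
  }

  /** Models a reader that looks up a child by its exact, fixed name. */
  static boolean hasExpectedChildName(String path, String expected) {
    return path.substring(path.lastIndexOf('/') + 1).equals(expected);
  }

  public static void main(String[] args) {
    String original = "part-m-0000/index";
    String renamed = withUniqueName(original, "abc123");
    System.out.println(renamed);
    System.out.println(hasExpectedChildName(original, "index"));
    System.out.println(hasExpectedChildName(renamed, "index"));
  }
}
```

A flat output file keeps working under this scheme because nothing looks it up by exact name; a directory whose children must be called "index" and "data" does not.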

I'm still undecided about what the best default value is here; more insight and 
experimentation are needed.


> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch, 
> HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch, 
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch, 
> HADOOP-13786-HADOOP-13345-007.patch, HADOOP-13786-HADOOP-13345-009.patch, 
> HADOOP-13786-HADOOP-13345-010.patch, HADOOP-13786-HADOOP-13345-011.patch, 
> HADOOP-13786-HADOOP-13345-012.patch, HADOOP-13786-HADOOP-13345-013.patch, 
> HADOOP-13786-HADOOP-13345-015.patch, HADOOP-13786-HADOOP-13345-016.patch, 
> HADOOP-13786-HADOOP-13345-017.patch, HADOOP-13786-HADOOP-13345-018.patch, 
> HADOOP-13786-HADOOP-13345-019.patch, HADOOP-13786-HADOOP-13345-020.patch, 
> HADOOP-13786-HADOOP-13345-021.patch, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

