[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905498#comment-15905498
 ] 

Ryan Blue commented on HADOOP-13786:
------------------------------------

For the staging committer drawbacks, I think there's a clear path to avoid them.

The committer is not intended to instantiate its own S3Client. It does so for 
testing, but once it is integrated with S3A it should either be passed a 
configured client when it is instantiated or use package-local access to get 
one from the S3A FS object. In other words, the default {{findClient}} method 
shouldn't be used; we don't use it other than for testing. My intent was for 
S3A to have a {{FileSystem#newOutputCommitter(Path, JobContext)}} factory 
method. That way, the FS can pass its internal S3 client to the committer 
instead of a second client being instantiated.
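
A rough sketch of that factory idea, using stand-in types rather than the real 
Hadoop or AWS SDK classes. {{StoreClient}}, {{FileSystemSketch}} and 
{{StagingCommitterSketch}} are hypothetical names, and the job context argument 
is left out to keep it short:

```java
import java.util.Objects;

// Hypothetical stand-in for the S3 client; not the AWS SDK type.
interface StoreClient {
    String endpoint();
}

// Hypothetical committer that must be given a client and never builds its own.
final class StagingCommitterSketch {
    private final StoreClient client;
    private final String outputPath;

    StagingCommitterSketch(StoreClient client, String outputPath) {
        this.client = Objects.requireNonNull(client, "client");
        this.outputPath = Objects.requireNonNull(outputPath, "outputPath");
    }

    StoreClient client() {
        return client;
    }

    String outputPath() {
        return outputPath;
    }
}

// Hypothetical filesystem that owns a single client and shares it with
// every committer it creates.
final class FileSystemSketch {
    private final StoreClient client;

    FileSystemSketch(StoreClient client) {
        this.client = Objects.requireNonNull(client, "client");
    }

    // Plays the role of the proposed newOutputCommitter factory method.
    StagingCommitterSketch newOutputCommitter(String outputPath) {
        return new StagingCommitterSketch(client, outputPath);
    }
}
```

With this shape the committer has no code path that creates a client, so the 
FS and the committer always share one instance.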

The storage on local disk isn't a requirement. We can replace that with an 
output stream that buffers in memory and sends parts to S3 when they are ready 
(we're planning on doing this eventually). This is just waiting on a stable API 
that lets us close a stream without committing its data. Since the committer 
API currently expects tasks to create files underneath the work path, we'll 
have to figure out how tasks can get a multi-part stream that is committed 
later without using a different method.
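
To illustrate the kind of stream meant here, a minimal sketch under stated 
assumptions: {{PartUploader}} and {{PendingUploadStream}} are hypothetical 
names, not an existing S3A API. The stream buffers in memory, hands off a part 
whenever the buffer reaches the part size, and on {{close()}} uploads the final 
part but leaves completing the upload to a later commit step:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook that uploads one part and returns an identifier for it.
interface PartUploader {
    String uploadPart(int partNumber, byte[] data) throws IOException;
}

// Buffers writes in memory and uploads a part each time the buffer fills.
// close() uploads the remainder but never completes the upload.
final class PendingUploadStream extends OutputStream {
    private final PartUploader uploader;
    private final int partSize;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final List<String> partIds = new ArrayList<>();
    private boolean closed;

    PendingUploadStream(PartUploader uploader, int partSize) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        this.uploader = uploader;
        this.partSize = partSize;
    }

    @Override
    public void write(int b) throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
        buffer.write(b);
        if (buffer.size() >= partSize) {
            flushPart();
        }
    }

    private void flushPart() throws IOException {
        // Part numbers start at 1.
        partIds.add(uploader.uploadPart(partIds.size() + 1, buffer.toByteArray()));
        buffer.reset();
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        if (buffer.size() > 0) {
            flushPart();
        }
        closed = true;
    }

    // Part identifiers a later commit step would need to complete the upload.
    List<String> pendingParts() {
        return new ArrayList<>(partIds);
    }
}
```

The list returned by {{pendingParts()}} is what a commit step would persist and 
later use to complete or abort the upload. How a task obtains such a stream 
through the existing file-creation path is the open question above and is not 
addressed by this sketch.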

We can also pass in a thread pool if there is a better one to use. I think this 
is separate enough that it should be easy.
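
As a sketch of what passing one in could look like (the class name 
{{ParallelUploader}} is hypothetical), the committer-side code would take an 
{{ExecutorService}} from its caller rather than creating one, and would leave 
its lifecycle to the caller:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper that runs upload tasks on a caller-supplied pool.
final class ParallelUploader {
    private final ExecutorService pool;

    ParallelUploader(ExecutorService pool) {
        this.pool = pool;
    }

    // Runs all tasks on the shared pool and returns results in task order.
    // The pool is not shut down here because the caller owns it.
    <T> List<T> runAll(List<Callable<T>> tasks) throws Exception {
        List<Future<T>> futures = pool.invokeAll(tasks);
        List<T> results = new ArrayList<>();
        for (Future<T> future : futures) {
            results.add(future.get());
        }
        return results;
    }
}
```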

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch, 
> HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch, 
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch, 
> HADOOP-13786-HADOOP-13345-007.patch, HADOOP-13786-HADOOP-13345-009.patch, 
> HADOOP-13786-HADOOP-13345-010.patch, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the 
> presence of failures". Implement it, including whatever is needed to 
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard 
> provides a consistent view of the presence/absence of blobs, show that we can 
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output 
> streams (ie. not visible until the close()), if we need to use that to allow 
> us to abort commit operations.


