[ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905498#comment-15905498 ]
Ryan Blue commented on HADOOP-13786:
------------------------------------

For the staging committer drawbacks, I think there's a clear path to avoid them.

The committer is not intended to instantiate its own S3Client. It does for testing, but when it is integrated with S3A it should be passed a configured client when it is instantiated, or should use package-local access to get one from the S3A FS object. In other words, the default {{findClient}} method shouldn't be used; we don't use it other than for testing. My intent was for S3A to have a {{FileSystem#newOutputCommitter(Path, JobContext)}} factory method. That way, the FS can pass its internal S3 client instead of instantiating two.

The storage on local disk isn't a requirement. We can replace that with an output stream that buffers in memory and sends parts to S3 when they are ready (we're planning on doing this eventually). This is just waiting on a stable API to rely on that can close a stream, but not commit data. Since the committer API right now expects tasks to create files underneath the work path, we'll have to figure out how tasks can get a multi-part stream that is committed later without using a different method.

We can also pass in a thread-pool if there is a better one to use. I think this is separate enough that it should be easy.
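To make the "close a stream, but not commit data" idea concrete, here is a minimal sketch of a stream that buffers in memory, hands off each part as soon as it is full, and only makes the result visible on a separate commit call. All names here ({{MultipartTarget}}, {{DeferredCommitStream}}, {{InMemoryTarget}}) are hypothetical stand-ins invented for illustration; they are not S3A or AWS SDK types, and the tiny part size is only for demonstration.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a multipart upload; not an S3A or AWS SDK type. */
interface MultipartTarget {
    void uploadPart(int partNumber, byte[] data);
    void complete(int partCount); // makes the object visible
    void abort();                 // discards any uploaded parts
}

/** Test double that records parts in memory instead of talking to S3. */
class InMemoryTarget implements MultipartTarget {
    final List<byte[]> parts = new ArrayList<>();
    boolean visible = false;

    public void uploadPart(int partNumber, byte[] data) { parts.add(data); }
    public void complete(int partCount) { visible = true; }
    public void abort() { parts.clear(); }
}

/** Buffers in memory, uploads each full part, and defers completion past close(). */
class DeferredCommitStream extends OutputStream {
    private final MultipartTarget target;
    private final int partSize;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int partsUploaded = 0;
    private boolean closed = false;

    DeferredCommitStream(MultipartTarget target, int partSize) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        this.target = target;
        this.partSize = partSize;
    }

    @Override
    public void write(int b) {
        ensureOpen();
        buffer.write(b);
        if (buffer.size() == partSize) {
            flushPart();
        }
    }

    @Override
    public void write(byte[] b, int off, int len) {
        ensureOpen();
        while (len > 0) {
            // Copy only as much as fits in the current part.
            int n = Math.min(len, partSize - buffer.size());
            buffer.write(b, off, n);
            off += n;
            len -= n;
            if (buffer.size() == partSize) {
                flushPart();
            }
        }
    }

    /** Uploads the final partial part, but does NOT complete the upload. */
    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        if (buffer.size() > 0) {
            flushPart();
        }
    }

    /** Called later (for example at job commit) to make the data visible. */
    void commit() {
        if (!closed) {
            throw new IllegalStateException("close() must be called before commit()");
        }
        target.complete(partsUploaded);
    }

    /** Called instead of commit() to discard the pending upload. */
    void abort() {
        closed = true;
        target.abort();
    }

    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("stream is closed");
        }
    }

    private void flushPart() {
        partsUploaded++;
        target.uploadPart(partsUploaded, buffer.toByteArray());
        buffer.reset();
    }
}
```

In this sketch a task would write and close the stream as usual, and whoever owns the commit decision would later call {{commit()}} or {{abort()}}. How a task obtains such a stream through the existing committer API is exactly the open question raised above.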
> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-13786
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13786
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch,
> HADOOP-13786-HADOOP-13345-002.patch, HADOOP-13786-HADOOP-13345-003.patch,
> HADOOP-13786-HADOOP-13345-004.patch, HADOOP-13786-HADOOP-13345-005.patch,
> HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-006.patch,
> HADOOP-13786-HADOOP-13345-007.patch, HADOOP-13786-HADOOP-13345-009.patch,
> HADOOP-13786-HADOOP-13345-010.patch, s3committer-master.zip
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the
> presence of failures". Implement it, including whatever is needed to
> demonstrate the correctness of the algorithm. (that is, assuming that s3guard
> provides a consistent view of the presence/absence of blobs, show that we can
> commit directly).
> I consider ourselves free to expose the blobstore-ness of the s3 output
> streams (ie. not visible until the close()), if we need to use that to allow
> us to abort commit operations.