[jira] [Updated] (HADOOP-11183) Memory-based S3AOutputstream

Thomas Demoor (JIRA) Fri, 30 Jan 2015 08:25:58 -0800

     [ https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Thomas Demoor updated HADOOP-11183:
-----------------------------------
    Attachment: HADOOP-11183.002.patch

002.patch: see 001.patch for the basic functionality. This patch differs in that it 
collects statistics the way S3AOutputStream does, which is different from what HDFS does. 
It counts bytes as they are sent to the server, so retried bytes are counted twice, and 
it counts each partUpload of a MultiPartUpload as a separate write operation. 

(A related note: the Swift filesystem code does not seem to count write 
operations at all.) 
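To make the two counting conventions concrete, here is a minimal, hypothetical Java sketch; it is not code from either patch. It models a multipart upload as a list of parts, each with a size and a number of attempts, and compares caller-side counting (bytes handed to `write()`, one write operation for the file, roughly what HDFS reports) with transfer-level counting, where a retried part's bytes are counted again and every part upload counts as a write operation. `PartAttempt`, `countAtClient`, `countPerTransfer` and `sampleParts` are illustrative names, and counting one write operation per part regardless of retries is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of two byte/write-op counting conventions.
public class UploadStatsSketch {

    /** One part of a multipart upload: its size and how many attempts it took. */
    static final class PartAttempt {
        final long bytes;
        final int attempts;

        PartAttempt(long bytes, int attempts) {
            this.bytes = bytes;
            this.attempts = attempts;
        }
    }

    /** Caller-side counting: bytes handed to write(), one write op for the file. */
    static long[] countAtClient(List<PartAttempt> parts) {
        long bytes = 0;
        for (PartAttempt p : parts) {
            bytes += p.bytes;
        }
        return new long[] {bytes, 1};
    }

    /** Transfer-level counting: every attempt's bytes count, and every part is a write op. */
    static long[] countPerTransfer(List<PartAttempt> parts) {
        long bytes = 0;
        long ops = 0;
        for (PartAttempt p : parts) {
            bytes += p.bytes * p.attempts; // a retried part's bytes are counted again
            ops++;                         // assumption: one write op per part, retries or not
        }
        return new long[] {bytes, ops};
    }

    /** Sample upload: two 5 MiB parts (the second retried once) and a 1 MiB final part. */
    static List<PartAttempt> sampleParts() {
        final long mib = 1024L * 1024;
        List<PartAttempt> parts = new ArrayList<>();
        parts.add(new PartAttempt(5 * mib, 1));
        parts.add(new PartAttempt(5 * mib, 2));
        parts.add(new PartAttempt(1 * mib, 1));
        return parts;
    }

    public static void main(String[] args) {
        long[] client = countAtClient(sampleParts());
        long[] transfer = countPerTransfer(sampleParts());
        System.out.println("caller-side:    bytes=" + client[0] + " writeOps=" + client[1]);
        System.out.println("transfer-level: bytes=" + transfer[0] + " writeOps=" + transfer[1]);
    }
}
```

With the sample parts, caller-side counting reports 11 MiB and one write operation, while transfer-level counting reports 16 MiB and three write operations for the same 11 MiB file.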


> Memory-based S3AOutputstream
> ----------------------------
>
>                 Key: HADOOP-11183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11183
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Thomas Demoor
>            Assignee: Thomas Demoor
>         Attachments: HADOOP-11183.001.patch, HADOOP-11183.002.patch, info-S3AFastOutputStream-sync.md
>
>
> Currently, s3a buffers files on local disk(s) before uploading them. This JIRA 
> investigates adding a memory-based upload implementation.
> The motivation is evidently performance: this would benefit users 
> with high network bandwidth to S3 (EC2?) and users who run Hadoop directly on 
> an S3-compatible object store (FYI: my contributions are made in the name of 
> Amplidata). 
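As a rough illustration of the memory-based approach, the hypothetical Java sketch below buffers writes on the heap and hands a part to an uploader each time `partSize` bytes accumulate, then uploads the final, possibly short, part and completes the upload on `close()`. `PartUploader`, `RecordingUploader` and `runDemo` are assumed names: `PartUploader` stands in for the S3 multipart-upload calls and is not an AWS SDK or Hadoop API. The sketch omits retries, concurrent part uploads, and the single-PUT path a real stream would likely use for small files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of a memory-buffered upload stream; not the S3A implementation.
public class MemoryBufferedUploadSketch {

    /** Assumed stand-in for the S3 multipart-upload calls (not a real SDK interface). */
    interface PartUploader {
        void uploadPart(int partNumber, byte[] data) throws IOException;
        void complete(int partCount) throws IOException;
    }

    /** Buffers writes on the heap and uploads a part whenever partSize bytes accumulate. */
    static final class MemoryBufferedOutputStream extends OutputStream {
        private final PartUploader uploader;
        private final int partSize;
        private final ByteArrayOutputStream buffer;
        private int partCount = 0;
        private boolean closed = false;

        MemoryBufferedOutputStream(PartUploader uploader, int partSize) {
            if (partSize <= 0) throw new IllegalArgumentException("partSize must be positive");
            this.uploader = uploader;
            this.partSize = partSize;
            this.buffer = new ByteArrayOutputStream(partSize);
        }

        @Override
        public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            Objects.checkFromIndexSize(off, len, b.length);
            if (closed) throw new IOException("stream closed");
            while (len > 0) {
                // Copy only as much as still fits in the current part.
                int n = Math.min(len, partSize - buffer.size());
                buffer.write(b, off, n);
                off += n;
                len -= n;
                if (buffer.size() == partSize) uploadBufferedPart();
            }
        }

        private void uploadBufferedPart() throws IOException {
            partCount++;
            uploader.uploadPart(partCount, buffer.toByteArray());
            buffer.reset(); // reuse the same in-memory buffer for the next part
        }

        @Override
        public void close() throws IOException {
            if (closed) return;
            closed = true;
            if (buffer.size() > 0) uploadBufferedPart(); // last, possibly short, part
            uploader.complete(partCount);
        }
    }

    /** Test double that records the size of each uploaded part. */
    static final class RecordingUploader implements PartUploader {
        final List<Integer> partSizes = new ArrayList<>();
        int completedWith = -1;

        @Override
        public void uploadPart(int partNumber, byte[] data) { partSizes.add(data.length); }

        @Override
        public void complete(int partCount) { completedWith = partCount; }
    }

    /** Writes totalBytes zero bytes through the stream and returns what was "uploaded". */
    static RecordingUploader runDemo(int totalBytes, int partSize) {
        RecordingUploader rec = new RecordingUploader();
        try (MemoryBufferedOutputStream out = new MemoryBufferedOutputStream(rec, partSize)) {
            out.write(new byte[totalBytes], 0, totalBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rec;
    }

    public static void main(String[] args) {
        RecordingUploader rec = runDemo(10, 4);
        System.out.println("part sizes " + rec.partSizes + ", completed with " + rec.completedWith);
    }
}
```

Writing 10 bytes with a `partSize` of 4 yields parts of 4, 4 and 2 bytes. Since each part lives only on the heap until it is uploaded, the part size (times the number of parts in flight, in a concurrent variant) bounds the memory each stream needs, which is the trade-off against disk buffering.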



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
