[ 
https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Demoor updated HADOOP-11183:
-----------------------------------
    Attachment: HADOOP-11183-005.patch

Marked as unstable. 

The underlying httpclient retries retriable errors, and the number of attempts 
can be controlled through fs.s3a.attempts.maximum
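
For reference, this is set in core-site.xml; the value below is illustrative, 
not the shipped default:

```xml
<!-- core-site.xml: caps how many times the S3A client retries a
     retriable request. The value 10 is illustrative only. -->
<property>
  <name>fs.s3a.attempts.maximum</name>
  <value>10</value>
</property>
```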

Did some more investigation: the dominant cost on failure is DNS resolution 
timing out. After that, the subsequent parts fail fast, in line with what is 
set in fs.s3a.connection.establish.timeout. So the fail-fast I had in mind 
(and have implemented) seems like premature optimization. The current code has 
been under test for some time, so I think we shouldn't take the risk of 
putting fail-fast in this close to 2.7; I'll open a separate jira for 
fail-fast.

Added site and core-default documentation. While passing by, I corrected the 
description of the connection timeouts: they are defined in milliseconds, not 
seconds.
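
Concretely, the two timeouts read like this in core-site.xml (values shown 
are illustrative, not necessarily the defaults):

```xml
<!-- core-site.xml: both S3A connection timeouts are in milliseconds. -->
<property>
  <name>fs.s3a.connection.establish.timeout</name>
  <value>5000</value>   <!-- 5 seconds to establish a connection -->
</property>
<property>
  <name>fs.s3a.connection.timeout</name>
  <value>50000</value>  <!-- 50 seconds socket timeout -->
</property>
```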

> Memory-based S3AOutputstream
> ----------------------------
>
>                 Key: HADOOP-11183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11183
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Thomas Demoor
>            Assignee: Thomas Demoor
>         Attachments: HADOOP-11183-004.patch, HADOOP-11183-005.patch, 
> HADOOP-11183.001.patch, HADOOP-11183.002.patch, HADOOP-11183.003.patch, 
> design-comments.pdf
>
>
> Currently s3a buffers files on disk(s) before uploading. This JIRA 
> investigates adding a memory-based upload implementation.
> The motivation is evidently performance: this would be beneficial for users 
> with high network bandwidth to S3 (EC2?) or users that run Hadoop directly on 
> an S3-compatible object store (FYI: my contributions are made on behalf of 
> Amplidata). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)