[ https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Demoor updated HADOOP-11183:
-----------------------------------
    Attachment: HADOOP-11183-005.patch

Marked as unstable. The underlying httpclient retries retriable errors, and you can control this through fs.s3a.attempts.maximum.

Did some more investigation: the dominant part of the time a failure takes is DNS failing to resolve. After that, the subsequent parts fail fast, in line with what is set in fs.s3a.establish.timeout. So the fail-fast I had in mind (and have implemented) seems like premature optimization. The current code has been under test for some time, so I think we shouldn't take the risk of putting fail-fast in this close to 2.7; I'll open a separate JIRA for fail-fast.

Added site and core-default documentation. While passing by, I corrected the description of the connection timeouts: they are defined in milliseconds, not seconds.

> Memory-based S3AOutputstream
> ----------------------------
>
>                 Key: HADOOP-11183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11183
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Thomas Demoor
>            Assignee: Thomas Demoor
>         Attachments: HADOOP-11183-004.patch, HADOOP-11183-005.patch, HADOOP-11183.001.patch, HADOOP-11183.002.patch, HADOOP-11183.003.patch, design-comments.pdf
>
> Currently s3a buffers files on disk(s) before uploading. This JIRA investigates adding a memory-based upload implementation.
> The motivation is evidently performance: this would be beneficial for users with high network bandwidth to S3 (EC2?) or users that run Hadoop directly on an S3-compatible object store (FYI: my contributions are made in name of Amplidata).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
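For readers wanting to tune the retry and timeout behaviour discussed in the comment above, the relevant S3A properties are set in core-site.xml. A minimal sketch follows; the values shown are illustrative defaults, not recommendations, and the connection timeout property names are as documented in core-default.xml (note the timeouts are in milliseconds, as the comment's documentation fix points out):

```xml
<!-- core-site.xml fragment: illustrative values only -->
<configuration>
  <property>
    <name>fs.s3a.attempts.maximum</name>
    <value>10</value>
    <description>How many times the S3A client retries a request
      before giving up (passed through to the underlying httpclient).</description>
  </property>
  <property>
    <name>fs.s3a.connection.establish.timeout</name>
    <value>5000</value>
    <description>Socket connection setup timeout, in milliseconds
      (not seconds).</description>
  </property>
  <property>
    <name>fs.s3a.connection.timeout</name>
    <value>50000</value>
    <description>Socket connection timeout for established
      connections, in milliseconds.</description>
  </property>
</configuration>
```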