[ 
https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840542#comment-15840542
 ] 

ASF GitHub Bot commented on HADOOP-13560:
-----------------------------------------

Github user steveloughran commented on the issue:

    https://github.com/apache/hadoop/pull/130
  
    Can you comment on that in a JIRA, not a PR? Thanks


> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>
>                 Key: HADOOP-13560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13560
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>             Fix For: 2.8.0, 3.0.0-alpha2
>
>         Attachments: HADOOP-13560-branch-2-001.patch, 
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch, 
> HADOOP-13560-branch-2-004.patch
>
>
> There are two output stream mechanisms in Hadoop 2.7.x, neither of which 
> handles massive multi-GB files that well.
> "classic": buffer everything to HDD until the close() operation; time to 
> close becomes O(data), as does the disk space needed. It fails to exploit 
> idle bandwidth, and on EC2 VMs with little HDD capacity (especially when 
> competing with HDFS for storage), it can fill up the disk.
> {{S3AFastOutputStream}} uploads data in partition-sized blocks, buffering via 
> byte arrays. It avoids the disk problems, and as it starts writing as soon as 
> the first partition is ready, close() time is O(outstanding-data). However, it 
> needs tuning to limit the amount of data buffered. Get it wrong, and the first 
> clue you get may be that the process goes OOM or is killed by YARN. Which is a 
> shame, as get it right and operations which generate lots of data, including 
> distcp, complete much faster.
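To make the tuning risk concrete, here is a minimal sketch (plain Python, purely illustrative; the names and figures are assumptions, not actual S3A settings) of the worst-case heap that in-memory block buffering can consume:

```python
# Worst-case heap consumed by in-memory block buffering:
# every queued or in-flight block holds a full partition in RAM.
# Names and example figures are illustrative, not S3A configuration keys.

MB = 1024 * 1024

def worst_case_buffered_bytes(partition_size_bytes: int,
                              max_active_blocks: int) -> int:
    """Upper bound on bytes buffered: one full partition per
    block that is queued or currently uploading."""
    return partition_size_bytes * max_active_blocks

# e.g. 64 MB partitions with 8 blocks active or queued can need
# up to 512 MB of heap for a single output stream
print(worst_case_buffered_bytes(64 * MB, 8) // MB)
```

Multiply that again by the number of streams open in one JVM and it is easy to see how an untuned process exceeds its YARN container limit.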
> This patch proposes a new output stream, {{S3ABlockOutputStream}}, as a 
> successor to both:
> # Uses the block upload model of {{S3AFastOutputStream}}.
> # Supports buffering via HDD, heap and (recycled) byte buffers, offering a 
> choice between memory and HDD use. With HDD buffering there are no OOM 
> problems on small JVMs and no need to tune.
> # Uses the fast output stream's mechanism of limiting the queue size of data 
> awaiting upload. Even when buffering via HDD, you may need to limit that use.
> # Lots of instrumentation to see what's being written.
> # Good defaults out of the box (e.g. buffer to HDD, partition size chosen to 
> strike a good balance between early upload and scalability).
> # Robust against transient failures. The AWS SDK retries a PUT on failure; 
> the entire block may need to be replayed, so HDD input cannot be buffered via 
> {{java.io.BufferedInputStream}}. Testing has also surfaced that if the final 
> commit of a multipart upload fails, it isn't retried, at least in the SDK 
> currently in use; we do that retry ourselves.
> # Uses round-robin directory allocation for the most effective disk use.
> # Takes an AWS SDK {{com.amazonaws.event.ProgressListener}} for progress 
> callbacks, giving more detail on the operation. (It actually takes an 
> {{org.apache.hadoop.util.Progressable}}, but if that also implements the AWS 
> interface, that is used instead.)
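As a sketch of how the buffering and queue-limiting choices above might surface to users, a core-site.xml fragment could look like the following. The property names follow the `fs.s3a.fast.upload.*` scheme this work introduces, but treat the exact names and values as assumptions to be checked against the shipped documentation:

```xml
<!-- Sketch only: select HDD buffering (no OOM risk) and bound the
     number of blocks queued or uploading per stream. -->
<property>
  <name>fs.s3a.fast.upload.buffer</name>
  <value>disk</value>  <!-- alternatives: array (heap), bytebuffer -->
</property>
<property>
  <name>fs.s3a.fast.upload.active.blocks</name>
  <value>4</value>     <!-- limit blocks queued or in flight per stream -->
</property>
<property>
  <name>fs.s3a.multipart.size</name>
  <value>67108864</value>  <!-- 64 MB partitions: early upload vs. part count -->
</property>
```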
> All of this comes with scale tests:
> * Generate large files using all buffer mechanisms.
> * Do a large copy/rename and verify that the copy really works, including 
> metadata.
> * Be configurable with sizes up to multi-GB, which also means that the test 
> timeouts need to be configurable to match the time the tests can take.
> * As they are slow, make them optional, using the {{-Dscale}} switch to 
> enable them.
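The configurability described above might look like this in the test configuration. The `fs.s3a.scale.test.*` property names here are assumptions based on the S3A test suite's conventions, not a confirmed part of this patch:

```xml
<!-- Hypothetical scale-test tuning: the size of the huge file to
     generate, and a timeout sized to match. Only exercised when the
     tests are run with -Dscale. -->
<property>
  <name>fs.s3a.scale.test.huge.filesize</name>
  <value>2G</value>
</property>
<property>
  <name>fs.s3a.scale.test.timeout</name>
  <value>3600</value>  <!-- seconds; multi-GB uploads need generous timeouts -->
</property>
```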
> Verifying large file rename is important on its own, as it is needed for very 
> large commit operations by committers that use rename.
> The goal here is to implement a single output stream which can be used for 
> all outputs, tuneable as to whether to use disk or memory and as to queue 
> sizes, but otherwise be all that's needed. We can do future development on 
> this, remove its predecessor {{S3AFastOutputStream}} (so keeping docs and 
> testing down), and leave the original {{S3AOutputStream}} alone for 
> regression testing/fallback.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
