[ https://issues.apache.org/jira/browse/BEAM-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166735#comment-16166735 ]
Luke Cwik commented on BEAM-2500:
---------------------------------

Multipart upload/download will become important, since a 5 GiB cap is quite limiting, but start off by implementing the simpler thing; multipart support can come later.

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html
Amazon supports an efficient copy operation if you specify "x-amz-copy-source" as a header: you don't need to upload the bytes, and it just adds some metadata that points to the same set of bytes. Depending on which Amazon S3 Java library you use, it may or may not expose this flexibility.

> Add support for S3 as a Apache Beam FileSystem
> ----------------------------------------------
>
>                 Key: BEAM-2500
>                 URL: https://issues.apache.org/jira/browse/BEAM-2500
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Luke Cwik
>            Priority: Minor
>         Attachments: hadoop_fs_patch.patch
>
>
> Note that this is for providing direct integration with S3 as an Apache Beam
> FileSystem.
> There is already support for using the Hadoop S3 connector by depending on
> the Hadoop File System module[1], configuring HadoopFileSystemOptions[2] with
> an S3 configuration[3].
> 1: https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system
> 2: https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L53
> 3: https://wiki.apache.org/hadoop/AmazonS3

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
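
For reference, here is a rough sketch of how the later multipart copy mentioned in the comment could split a large object into parts. It is only an illustration: the class `S3CopyPlanner` and its methods are hypothetical names, not part of Beam or any AWS library, and the 5 GiB threshold is simply the figure from the comment. The "bytes=first-last" strings follow the format S3 documents for the "x-amz-copy-source-range" header on part copies; actually sending the requests is left to whichever S3 client is chosen.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Beam or the AWS SDK. */
final class S3CopyPlanner {
  // Assumed single-request copy limit, taken from the 5 GiB figure in the comment.
  static final long SINGLE_COPY_LIMIT_BYTES = 5L * 1024 * 1024 * 1024;

  /** Returns true when the object is too large for one "x-amz-copy-source" request. */
  static boolean needsMultipartCopy(long objectSizeBytes) {
    return objectSizeBytes > SINGLE_COPY_LIMIT_BYTES;
  }

  /** Splits [0, objectSizeBytes) into inclusive ranges formatted as "bytes=first-last". */
  static List<String> copyRanges(long objectSizeBytes, long partSizeBytes) {
    if (objectSizeBytes < 0 || partSizeBytes <= 0) {
      throw new IllegalArgumentException(
          "objectSizeBytes must be >= 0 and partSizeBytes must be > 0");
    }
    List<String> ranges = new ArrayList<>();
    for (long start = 0; start < objectSizeBytes; start += partSizeBytes) {
      // The last part may be shorter than partSizeBytes.
      long end = Math.min(start + partSizeBytes, objectSizeBytes) - 1;
      ranges.add("bytes=" + start + "-" + end);
    }
    return ranges;
  }

  public static void main(String[] args) {
    long size = 12L * 1024 * 1024 * 1024; // a 12 GiB object
    System.out.println("multipart needed: " + needsMultipartCopy(size));
    for (String range : copyRanges(size, SINGLE_COPY_LIMIT_BYTES)) {
      System.out.println(range);
    }
  }
}
```

An object at or below the threshold would take the simple single-request copy path, which is the "simpler thing" to implement first; the range planning only matters once multipart support is added.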