[ 
https://issues.apache.org/jira/browse/BEAM-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166735#comment-16166735
 ] 

Luke Cwik commented on BEAM-2500:
---------------------------------

Multipart download/upload will become important, since a 5 GiB cap on object 
size has limited use. Start by implementing the simpler single-request path, 
though; multipart upload/download can come later.
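
For when multipart support is picked up, here is a rough sketch of how an 
object could be split into byte ranges. It assumes the ranges are sent as 
"bytes=first-last" values, the form used by the HTTP Range header for 
downloads and by S3's x-amz-copy-source-range header for part copies. The 
class and method names are hypothetical and not part of Beam or any AWS 
library; the part size is left to the caller, who has to respect S3's limits 
(parts of 5 MiB to 5 GiB except the last, at most 10,000 parts).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Beam or the AWS SDK.
final class MultipartRangeSketch {

    // Splits [0, objectSize) into consecutive inclusive byte ranges of at
    // most partSize bytes each, formatted as "bytes=first-last".
    static List<String> byteRanges(long objectSize, long partSize) {
        if (objectSize < 0 || partSize <= 0) {
            throw new IllegalArgumentException("invalid object or part size");
        }
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < objectSize; start += partSize) {
            // The last part may be shorter than partSize.
            long end = Math.min(start + partSize, objectSize) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: a 12 GiB object split into 5 GiB parts.
        long gib = 1024L * 1024L * 1024L;
        for (String range : byteRanges(12 * gib, 5 * gib)) {
            System.out.println(range);
        }
    }
}
```

Each range would then go into its own ranged GET or part-copy request; 
issuing those requests and completing the multipart upload is omitted here.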

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html
Amazon supports an efficient server-side copy if you specify the 
"x-amz-copy-source" header: you don't need to upload the bytes again, and S3 
just adds some metadata that points to the same set of bytes. Depending on 
which Amazon S3 Java library you use, this operation may or may not be exposed.
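
If the chosen library does not expose the copy, the header could be built by 
hand. Below is a minimal sketch of constructing the "x-amz-copy-source" value 
for a PUT Object - Copy request, assuming the value has the form 
/source-bucket/source-key with the key URL-encoded. The class and method 
names are hypothetical, java.net.URLEncoder is only an approximation of the 
encoding S3 expects, and request signing is left out entirely.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of Beam or the AWS SDK.
final class S3CopyHeaderSketch {

    // Builds "/bucket/key", percent-encoding each key segment but keeping
    // the "/" separators. URLEncoder emits "+" for spaces, so those are
    // rewritten to "%20"; a literal "+" in the key is already "%2B".
    static String copySourceValue(String bucket, String key) {
        StringBuilder value = new StringBuilder("/").append(bucket);
        for (String segment : key.split("/", -1)) {
            value.append('/').append(encode(segment).replace("+", "%20"));
        }
        return value.toString();
    }

    // Headers that would be added to the PUT request for the destination
    // object; authentication headers are deliberately omitted.
    static Map<String, String> copyHeaders(String srcBucket, String srcKey) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("x-amz-copy-source", copySourceValue(srcBucket, srcKey));
        return headers;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyHeaders("src-bucket", "dir/my file.txt"));
    }
}
```

The request body stays empty in this case; the source object is named only 
by the header.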

> Add support for S3 as an Apache Beam FileSystem
> -----------------------------------------------
>
>                 Key: BEAM-2500
>                 URL: https://issues.apache.org/jira/browse/BEAM-2500
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Luke Cwik
>            Priority: Minor
>         Attachments: hadoop_fs_patch.patch
>
>
> Note that this is for providing direct integration with S3 as an Apache Beam 
> FileSystem.
> There is already support for using the Hadoop S3 connector by depending on 
> the Hadoop File System module[1] and configuring HadoopFileSystemOptions[2] 
> with an S3 configuration[3].
> 1: https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system
> 2: https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L53
> 3: https://wiki.apache.org/hadoop/AmazonS3



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
