[ 
https://issues.apache.org/jira/browse/BEAM-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166741#comment-16166741
 ] 

Jacob Marble commented on BEAM-2500:
------------------------------------

Chamikara, thanks for your comment. I'll switch my implementation to multipart
after I have something working; so far I have only written the simple 5GB
version. I'll also give closer consideration to the credentials question once
the harder parts are complete. For now, I'm just using flags via
PipelineOptions.

I have completed enough of this to test it out, except for one problem. S3
requires the content length before any data is written; otherwise the client
buffers the entire content in memory before writing. I have added
contentLength to my S3CreateOptions, but how can I set that value before
S3FileSystem.create() is called?
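
To illustrate why the multipart route might sidestep this question, here is a
minimal sketch, not the actual implementation. It assumes a multipart-style
upload in which each part is sent with its own length, so the total content
length never has to be known up front. PartUploader, PartBufferingChannel and
PartBufferingDemo are hypothetical names made up for this example; they are
not Beam or AWS SDK types, and starting/completing the upload is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an S3 "upload part" request (assumption, not a real API). */
interface PartUploader {
  /** Must consume data[0..length) before returning; the array is reused afterwards. */
  void uploadPart(int partNumber, byte[] data, int length) throws IOException;
}

/** Buffers writes into fixed-size parts so each part's length is known before it is sent. */
class PartBufferingChannel implements WritableByteChannel {
  private final PartUploader uploader;
  private final ByteBuffer part;
  private int nextPartNumber = 1;
  private boolean open = true;

  PartBufferingChannel(PartUploader uploader, int partSizeBytes) {
    this.uploader = uploader;
    this.part = ByteBuffer.allocate(partSizeBytes);
  }

  @Override
  public int write(ByteBuffer src) throws IOException {
    if (!open) {
      throw new ClosedChannelException();
    }
    int written = 0;
    while (src.hasRemaining()) {
      // Copy as much as fits into the current part.
      int n = Math.min(src.remaining(), part.remaining());
      ByteBuffer slice = src.duplicate();
      slice.limit(slice.position() + n);
      part.put(slice);
      src.position(src.position() + n);
      written += n;
      if (!part.hasRemaining()) {
        flushPart(); // part is full: its length is known, so send it now
      }
    }
    return written;
  }

  /** Sends the buffered bytes as one part; does nothing if the buffer is empty. */
  private void flushPart() throws IOException {
    if (part.position() == 0) {
      return;
    }
    uploader.uploadPart(nextPartNumber++, part.array(), part.position());
    part.clear();
  }

  @Override
  public boolean isOpen() {
    return open;
  }

  @Override
  public void close() throws IOException {
    if (!open) {
      return;
    }
    flushPart(); // the final part may be shorter than the part size
    open = false;
  }
}

class PartBufferingDemo {
  /** Writes totalBytes zero bytes and returns the length of each uploaded part. */
  static List<Integer> partSizes(int totalBytes, int partSizeBytes) {
    List<Integer> sizes = new ArrayList<>();
    try (PartBufferingChannel ch =
        new PartBufferingChannel((num, data, len) -> sizes.add(len), partSizeBytes)) {
      ch.write(ByteBuffer.wrap(new byte[totalBytes]));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sizes;
  }

  public static void main(String[] args) {
    // Tiny part size for demonstration only; S3 requires parts of at
    // least 5 MB, except for the last one.
    System.out.println(partSizes(10, 4)); // prints [4, 4, 2]
  }
}
```

With this shape, memory use is bounded by one part rather than the whole
object, and create() would not need contentLength at all. Whether that fits
the S3CreateOptions design is a separate question.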

> Add support for S3 as an Apache Beam FileSystem
> -----------------------------------------------
>
>                 Key: BEAM-2500
>                 URL: https://issues.apache.org/jira/browse/BEAM-2500
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Luke Cwik
>            Priority: Minor
>         Attachments: hadoop_fs_patch.patch
>
>
> Note that this is for providing direct integration with S3 as an Apache Beam 
> FileSystem.
> There is already support for using the Hadoop S3 connector by depending on
> the Hadoop File System module[1] and configuring HadoopFileSystemOptions[2]
> with an S3 configuration[3].
> 1: https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system
> 2: https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L53
> 3: https://wiki.apache.org/hadoop/AmazonS3



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
