[ 
https://issues.apache.org/jira/browse/BEAM-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173403#comment-16173403
 ] 

Jacob Marble commented on BEAM-2500:
------------------------------------

I see this warning thousands of times when reading from S3:

Sep 20, 2017 8:45:04 AM 
com.amazonaws.services.s3.internal.S3AbortableInputStream close
WARNING: Not all bytes were read from the S3ObjectInputStream, aborting HTTP 
connection. This is likely an error and may result in sub-optimal behavior. 
Request only the bytes you need via a ranged GET or drain the input stream 
after use.

It looks like TextIO requests bytes n through m, but only consumes less than 
m-n bytes, then closes the channel (channel wraps a stream). Am I wrong? Is m-n 
predictably small enough that I should drain the stream at close?

> Add support for S3 as a Apache Beam FileSystem
> ----------------------------------------------
>
>                 Key: BEAM-2500
>                 URL: https://issues.apache.org/jira/browse/BEAM-2500
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Luke Cwik
>            Priority: Minor
>         Attachments: hadoop_fs_patch.patch
>
>
> Note that this is for providing direct integration with S3 as an Apache Beam 
> FileSystem.
> There is already support for using the Hadoop S3 connector by depending on 
> the Hadoop File System module[1], configuring HadoopFileSystemOptions[2] with 
> a S3 configuration[3].
> 1: https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system
> 2: 
> https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L53
> 3: https://wiki.apache.org/hadoop/AmazonS3



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to