[ 
https://issues.apache.org/jira/browse/BEAM-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976952#comment-15976952
 ] 

Stephen Sisk commented on BEAM-2005:
------------------------------------

I don't want to derail this conversation, but I did have a couple of other 
concerns. Beam's FileSystem has a copy() method, but I can't find a good 
analog in Hadoop's FileSystem. 
https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html 
shows many methods for copying to and from local files, but nothing for 
copying between two arbitrary paths. 
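
If no direct analog turns up, one fallback would be to build copy() from the 
open and create calls: open the source, create the destination, and stream 
the bytes across. Below is a minimal sketch of that idea, not a proposed 
implementation. StreamFs, InMemoryFs and CopySketch are hypothetical names 
made up for illustration; StreamFs only stands in for the open/create pair on 
Hadoop's FileSystem so that the example runs without Hadoop on the classpath. 
Hadoop also appears to have static copy helpers in 
org.apache.hadoop.fs.FileUtil that take a source and a destination 
FileSystem, which is worth checking before hand-rolling anything.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the open/create pair on Hadoop's FileSystem.
interface StreamFs {
  InputStream open(String path) throws IOException;

  OutputStream create(String path) throws IOException;
}

// In-memory implementation, present only so the sketch runs without Hadoop.
class InMemoryFs implements StreamFs {
  final Map<String, byte[]> files = new HashMap<>();

  @Override
  public InputStream open(String path) throws IOException {
    byte[] data = files.get(path);
    if (data == null) {
      throw new IOException("No such file: " + path);
    }
    return new ByteArrayInputStream(data);
  }

  @Override
  public OutputStream create(String path) {
    // The file becomes visible once the stream is closed.
    return new ByteArrayOutputStream() {
      @Override
      public void close() throws IOException {
        super.close();
        files.put(path, toByteArray());
      }
    };
  }
}

class CopySketch {
  // Copies src on srcFs to dst on dstFs by streaming bytes through a buffer.
  static void copy(StreamFs srcFs, String src, StreamFs dstFs, String dst)
      throws IOException {
    try (InputStream in = srcFs.open(src);
        OutputStream out = dstFs.create(dst)) {
      byte[] buffer = new byte[8192];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    InMemoryFs source = new InMemoryFs();
    InMemoryFs target = new InMemoryFs();
    source.files.put("/a.txt", "hello".getBytes(StandardCharsets.UTF_8));

    copy(source, "/a.txt", target, "/b.txt");

    // Prints "hello"
    System.out.println(
        new String(target.files.get("/b.txt"), StandardCharsets.UTF_8));
  }
}
```

A stream-based copy like this always moves the bytes through the client, so 
it would be slower than a native server-side copy on file systems that 
support one.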

I also believe that since Beam FileSystem objects are configured via 
PipelineOptions, we need to pass a Hadoop Configuration through 
PipelineOptions. I think that's very solvable, but it does seem 
semi-complicated.

I'm going to open subtasks so we can discuss each of these in a separate thread.

> Add a Hadoop FileSystem implementation of Beam's FileSystem
> -----------------------------------------------------------
>
>                 Key: BEAM-2005
>                 URL: https://issues.apache.org/jira/browse/BEAM-2005
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Stephen Sisk
>            Assignee: Stephen Sisk
>             Fix For: First stable release
>
>
> Beam's FileSystem provides an abstraction for reading files from many 
> different places. 
> We should add a Hadoop FileSystem implementation 
> (https://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileSystem.html).
> That would enable us to read from any file system that implements Hadoop's 
> FileSystem (including HDFS, Azure, S3, etc.).
> I'm investigating this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
