[ 
https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767937#comment-15767937
 ] 

Daniel Halperin commented on BEAM-1190:
---------------------------------------

I do not think this is generally safe -- it may mask underlying bugs. For 
example, we should never invoke this code unless the filesystem is known to be 
only eventually consistent for list operations but consistent for stat.

This change does not obviate the need for [BEAM-60] -- because users may want 
to go the other way, and expand the inconsistent list they get. I propose you 
package this logic up in whatever the new name for IOChannelUtils is as one of 
the things users can do in the code they run at expand-time.
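
A minimal sketch of what such a helper might look like, not a definitive 
implementation. The class name GlobResultFilter, the method keepExisting, and 
the "exists" predicate are hypothetical; the predicate stands in for whatever 
stat call the filesystem provides, and the caller is assumed to have checked 
that stat is consistent on that filesystem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: drops glob matches that fail a stat-like check. */
public class GlobResultFilter {

  /**
   * Returns the matched paths for which {@code exists} returns true,
   * preserving the original order. The predicate is an assumption: it
   * stands in for a per-file stat on the underlying filesystem.
   */
  public static List<String> keepExisting(
      List<String> matched, Predicate<String> exists) {
    List<String> result = new ArrayList<>();
    for (String path : matched) {
      // Only safe if stat is consistent on this filesystem; otherwise
      // this may silently hide files that really do exist.
      if (exists.test(path)) {
        result.add(path);
      }
    }
    return result;
  }
}
```

For local files, a caller could pass p -> Files.exists(Paths.get(p)) from 
java.nio.file as the predicate. Because the user makes this call explicitly at 
expand-time, the decision to drop files stays visible in their code.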

Bringing the user into the loop is also nice because it makes them deal with 
eventual consistency up front. We get burned a lot by users who don't realize 
what their globs really mean.

> FileBasedSource should ignore files that matched the glob but don't exist
> -------------------------------------------------------------------------
>
>                 Key: BEAM-1190
>                 URL: https://issues.apache.org/jira/browse/BEAM-1190
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat 
> every file and remove those that don't exist, to account for the possibility 
> that glob yielded non-existing files due to eventual consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
