[ https://issues.apache.org/jira/browse/BEAM-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767937#comment-15767937 ]
Daniel Halperin commented on BEAM-1190:
---------------------------------------

I do not think this is generally safe -- it may mask underlying bugs. For example, we should never invoke this code unless the filesystem is known to be only eventually consistent for list operations but consistent for stat.

This change does not obviate the need for [BEAM-60] -- because users may want to go the other way, and expand the inconsistent list they get.

I propose you package this logic up in whatever the new name for IOChannelUtils is, as one of the things users can do in the code they run at expand-time. Bringing the user into the loop is also nice because it makes them deal with eventual consistency up front. We get burned a lot by users who don't realize what their globs really mean.

> FileBasedSource should ignore files that matched the glob but don't exist
> -------------------------------------------------------------------------
>
>                 Key: BEAM-1190
>                 URL: https://issues.apache.org/jira/browse/BEAM-1190
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>
> See user issue:
> http://stackoverflow.com/questions/41251741/coping-with-eventual-consistency-of-gcs-bucket-listing
> We should, after globbing the files in FileBasedSource, individually stat
> every file and remove those that don't exist, to account for the possibility
> that glob yielded non-existing files due to eventual consistency.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
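
For illustration only, the stat-and-filter step from the issue description, packaged as an explicit helper that a user opts into at expand-time (as the comment suggests), might look roughly like the sketch below. It is not Beam code: the class name ExistingFileFilter and the method keepExisting are hypothetical, and it uses java.nio.file on a local filesystem as a stand-in for a store whose listing is eventually consistent but whose stat is consistent.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: a user would call it explicitly on
// the result of a glob, and only for filesystems where a per-file stat is
// known to be consistent even when listing is not.
final class ExistingFileFilter {
    private ExistingFileFilter() {}

    /** Returns the entries of {@code globbed} that a per-file existence check confirms. */
    static List<Path> keepExisting(List<Path> globbed) {
        List<Path> existing = new ArrayList<>();
        for (Path candidate : globbed) {
            // Files.exists plays the role of the "stat" call here.
            if (Files.exists(candidate)) {
                existing.add(candidate);
            }
        }
        return existing;
    }
}
```

Note that this only handles the case where the listing returns files that no longer exist. It does nothing for files that exist but are missing from the listing, which is the direction [BEAM-60] is concerned with.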