[ https://issues.apache.org/jira/browse/BEAM-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197478#comment-16197478 ]
ASF GitHub Bot commented on BEAM-3030:
--------------------------------------

GitHub user asfgit closed the pull request at:

    https://github.com/apache/beam/pull/3957

> watchForNewFiles() can emit a file multiple times if it's growing
> -----------------------------------------------------------------
>
>                 Key: BEAM-3030
>                 URL: https://issues.apache.org/jira/browse/BEAM-3030
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>             Fix For: 2.3.0
>
>
> TextIO and AvroIO watchForNewFiles(), as well as
> FileIO.match().continuously(), use the Watch transform under the hood and
> watch the set of Metadata objects matching a filepattern.
> Two Metadata objects with the same filename but different sizes are not
> considered equal, so if these transforms observe the same file multiple times
> with different sizes, they will read the file multiple times.
> This is likely not yet a problem for production users: these features
> require SDF, which is supported only in the Dataflow runner, and users of the
> Dataflow runner are likely to use only files on GCS, which doesn't support
> appends. However, this still needs to be fixed.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
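
The duplicate emission described in the issue can be illustrated with a small, self-contained sketch. It does not use Beam: `FileMeta`, `emitByMetadata`, and `emitByName` are hypothetical names made up for this example, and `FileMeta` only mimics the relevant property of Metadata (equality that includes the size). Keying the "already seen" set on the filename alone is one plausible remedy, shown here as an assumption and not necessarily what the linked pull request does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class GrowingFileDedup {
    /** Hypothetical stand-in for file metadata; NOT Beam's Metadata class. */
    static final class FileMeta {
        final String name;
        final long sizeBytes;

        FileMeta(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }

        // Equality covers both name and size, like the behavior the issue describes.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FileMeta)) {
                return false;
            }
            FileMeta other = (FileMeta) o;
            return name.equals(other.name) && sizeBytes == other.sizeBytes;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, sizeBytes);
        }
    }

    /** Emits a file whenever its (name, size) pair has not been seen before. */
    static List<String> emitByMetadata(List<FileMeta> observations) {
        Set<FileMeta> seen = new HashSet<>();
        List<String> emitted = new ArrayList<>();
        for (FileMeta m : observations) {
            if (seen.add(m)) {
                emitted.add(m.name);
            }
        }
        return emitted;
    }

    /** Emits a file only the first time its name is seen, ignoring size changes. */
    static List<String> emitByName(List<FileMeta> observations) {
        Set<String> seen = new HashSet<>();
        List<String> emitted = new ArrayList<>();
        for (FileMeta m : observations) {
            if (seen.add(m.name)) {
                emitted.add(m.name);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Successive polls of a filepattern; log.txt grows between polls.
        List<FileMeta> polls = Arrays.asList(
            new FileMeta("log.txt", 100),
            new FileMeta("log.txt", 250),
            new FileMeta("other.txt", 10),
            new FileMeta("log.txt", 250));

        System.out.println(emitByMetadata(polls)); // [log.txt, log.txt, other.txt]
        System.out.println(emitByName(polls));     // [log.txt, other.txt]
    }
}
```

In this sketch the growing file is emitted twice under whole-metadata comparison and once under name-only comparison. Name-only deduplication also means later appends to an already emitted file are never picked up, which is a trade-off any real fix would have to weigh.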