GitHub user jkff opened a pull request: https://github.com/apache/beam/pull/3607
[BEAM-2512] Introduces TextIO.read/readAll().watchForNewFiles()

https://issues.apache.org/jira/browse/BEAM-2512

Part of http://s.apache.org/textio-sdf, based on http://s.apache.org/beam-watch-transform.

This PR includes https://github.com/apache/beam/pull/3565; the reviewer should look only at the other commit. It also requires https://github.com/apache/beam/pull/3598 to properly support read().from(ValueProvider).watchForNewFiles(), so this PR should be submitted only after both of the PRs above.

R: @reuvenlax

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam textio-read-watch-new-files

Alternatively, you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3607.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3607

----
commit 3103f9438a9fc392dfa1c37ceac990fc43c2ab98
Author: Eugene Kirpichov <kirpic...@google.com>
Date:   2017-07-20T02:50:03Z

    [BEAM-2623] Introduces Watch transform

    The transform watches for new elements in a family of growing sets.
    See design at http://s.apache.org/beam-watch-transform

    As part of the implementation, I found and fixed a bug in tracking the
    watermark in OutputAndTimeBoundedSplittableProcessElementInvoker. The
    watermark must be captured at the moment the checkpoint is taken,
    because it describes the timestamps of elements output from the
    checkpoint.

    I also made the direct runner checkpoint SDFs every 100 elements by
    default rather than every 10000, to make it more aggressive; that is
    what uncovered the bug above.

commit c977d606f14a59ed73acf22f32a6b250d89c0ccd
Author: Eugene Kirpichov <kirpic...@google.com>
Date:   2017-07-20T23:58:42Z

    [BEAM-2512] Introduces TextIO.read/readAll().watchForNewFiles()

    Part of http://s.apache.org/textio-sdf, based on
    http://s.apache.org/beam-watch-transform.
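The Watch transform in the first commit "watches for new elements in a family of growing sets". As a rough illustration of that polling idea for a single growing set (for example, a file pattern whose matches grow over time), here is a minimal stdlib-only Java sketch. The class GrowingSetWatcher, its watch method, and the "stop after N idle polls" rule are hypothetical stand-ins, not Beam's API. The real transform is built on splittable DoFn machinery with checkpoints and watermarks, as the commit message above indicates, and is not modeled here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical, simplified model of watching one growing set; not Beam's Watch implementation. */
final class GrowingSetWatcher {

  /**
   * Calls {@code poll} repeatedly and emits each element the first time it appears.
   * Stops after {@code maxPollsWithoutNewOutput} consecutive polls produce nothing new
   * (a stand-in for a real termination condition).
   */
  static <T> List<T> watch(Supplier<List<T>> poll, int maxPollsWithoutNewOutput) {
    Set<T> seen = new HashSet<>();
    List<T> emitted = new ArrayList<>();
    int pollsWithoutNewOutput = 0;
    while (pollsWithoutNewOutput < maxPollsWithoutNewOutput) {
      boolean sawNew = false;
      for (T element : poll.get()) {
        if (seen.add(element)) { // add() returns true only for elements not seen before
          emitted.add(element);
          sawNew = true;
        }
      }
      // Reset the idle counter whenever a poll yields new output.
      pollsWithoutNewOutput = sawNew ? 0 : pollsWithoutNewOutput + 1;
    }
    return emitted;
  }

  public static void main(String[] args) {
    // Simulated file listings: the "directory" grows for three polls, then stops growing.
    List<List<String>> listings =
        List.of(List.of("a.txt"), List.of("a.txt", "b.txt"), List.of("a.txt", "b.txt", "c.txt"));
    int[] polls = {0};
    Supplier<List<String>> poll =
        () -> listings.get(Math.min(polls[0]++, listings.size() - 1));
    System.out.println(watch(poll, 2)); // prints [a.txt, b.txt, c.txt]
  }
}
```

The key design point the sketch shares with the described transform is deduplication: each element is emitted once even though every poll returns the whole current set. A production version would also have to persist the "seen" state across checkpoints, which this in-memory sketch does not attempt.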