Thanks Eugene, I will pick up some.
Regards
JB

On Jun 24, 2017, at 20:00, Eugene Kirpichov <kirpic...@google.com.INVALID> wrote:
>Filed JIRAs for the proposed features and linked with the doc:
>https://issues.apache.org/jira/browse/BEAM-2511 TextIO should support reading a PCollection of filenames
>https://issues.apache.org/jira/browse/BEAM-2512 TextIO should support watching for new files
>https://issues.apache.org/jira/browse/BEAM-2513 TextIO should support watching files for new entries
>
>On Fri, Jun 23, 2017 at 4:32 PM Eugene Kirpichov <kirpic...@google.com> wrote:
>
>> Hi all,
>>
>> I've written up a proposal for incrementally delivering a bunch of useful
>> new features in TextIO based on Splittable DoFn. It's applicable to other
>> file-based connectors; TextIO is just one good example. Let me know what
>> you think!
>>
>> https://s.apache.org/textio-sdf
>>
>> Copy of abstract:
>>
>> Users have often expressed interest in several new features for reading
>> files - in particular, incremental reading of log files (streaming of new
>> files matching a pattern and new entries in each file) and reading a
>> PCollection of filenames (in particular, an unbounded collection arriving
>> from a stream such as PubSub or Kafka).
>>
>> Splittable DoFn <http://s.apache.org/splittable-do-fn> (SDF) enables
>> these features. This document proposes an API for them, using the example
>> of TextIO, and proposes a plan for delivering them subject to
>> availability of SDF in different runners. Some availability constraints are
>> circumvented by Running Splittable DoFn via Source API
>> <http://s.apache.org/sdf-via-source>.
>>
>> TL;DR Read a collection of filepatterns arriving on PubSub via
>> files.apply(TextIO.readEach()). Tail a filepattern via
>> TextIO.read().watchForNewFiles().watchFilesForNewEntries(). Coming to a
>> Beam SDK near you in small pieces.
>>
>> I think I'm gonna start working on the first steps of the proposed plan,
>> in parallel with this discussion, because I'm excited :)
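
For anyone skimming the thread, here is a rough sketch of the semantics the TL;DR describes: watch a filepattern for new files and tail each matched file for new entries. It is plain Python with the standard library only. It is not Beam code, not the proposed TextIO API, and not how an SDF-based implementation would work; `poll_once`, `offsets`, and `demo` are hypothetical names made up for illustration.

```python
import glob
import os
import tempfile


def poll_once(pattern, offsets):
    """Return (filename, line) pairs appended since the previous poll.

    `offsets` maps path -> number of bytes already consumed and is
    updated in place. Both names are hypothetical, for illustration only.
    """
    new_lines = []
    for path in sorted(glob.glob(pattern)):
        start = offsets.get(path, 0)  # a newly matched file starts at offset 0
        with open(path, "rb") as f:
            f.seek(start)
            data = f.read()
        # Consume only complete lines; a partial trailing line waits for the next poll.
        end = data.rfind(b"\n") + 1
        for line in data[:end].splitlines():
            new_lines.append((os.path.basename(path), line.decode("utf-8")))
        offsets[path] = start + end
    return new_lines


def demo():
    """Two polls over a temp directory: new entries, then a new file."""
    with tempfile.TemporaryDirectory() as d:
        pattern = os.path.join(d, "*.log")
        offsets = {}
        with open(os.path.join(d, "a.log"), "w") as f:
            f.write("one\ntwo\n")
        first = poll_once(pattern, offsets)
        # Append to the existing file and create a second file matching the pattern.
        with open(os.path.join(d, "a.log"), "a") as f:
            f.write("three\n")
        with open(os.path.join(d, "b.log"), "w") as f:
            f.write("four\npartial")
        second = poll_once(pattern, offsets)
        return first, second
```

Calling `demo()` performs two polls: the second one picks up both the entry appended to the existing file and the new file, while the unterminated last line is held back until it is complete. A real connector would also have to handle truncation, renames, and a termination condition, none of which this sketch attempts.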