Re: Proposal and plan: new TextIO features based on SDF

Jean-Baptiste Onofré Sat, 24 Jun 2017 11:52:06 -0700

Thanks, Eugene.

I will pick up some of them.


Regards
JB

On Jun 24, 2017, at 20:00, Eugene Kirpichov
<kirpic...@google.com.INVALID> wrote:
>Filed JIRAs for the proposed features and linked them with the doc:
>https://issues.apache.org/jira/browse/BEAM-2511 TextIO should support reading a PCollection of filenames
>https://issues.apache.org/jira/browse/BEAM-2512 TextIO should support watching for new files
>https://issues.apache.org/jira/browse/BEAM-2513 TextIO should support watching files for new entries
>
>On Fri, Jun 23, 2017 at 4:32 PM Eugene Kirpichov <kirpic...@google.com> wrote:
>
>> Hi all,
>>
>> I've written up a proposal for incrementally delivering a bunch of useful
>> new features in TextIO based on Splittable DoFn. It's applicable to other
>> file-based connectors; TextIO is just one good example. Let me know what
>> you think!
>>
>> https://s.apache.org/textio-sdf
>>
>> Copy of abstract:
>>
>> Users have often expressed interest in several new features for reading
>> files: in particular, incremental reading of log files (streaming of new
>> files matching a pattern and of new entries in each file) and reading a
>> PCollection of filenames (in particular, an unbounded collection arriving
>> from a stream such as PubSub or Kafka).
>>
>> Splittable DoFn <http://s.apache.org/splittable-do-fn> (SDF) enables
>> these features. This document proposes an API for them, using TextIO as
>> the example, and a plan for delivering them subject to the availability
>> of SDF in different runners. Some availability constraints are
>> circumvented by Running Splittable DoFn via Source API
>> <http://s.apache.org/sdf-via-source>.
>>
>> TL;DR: Read a collection of filepatterns arriving on PubSub via
>> files.apply(TextIO.readEach()). Tail a filepattern via
>> TextIO.read().watchForNewFiles().watchFilesForNewEntries(). Coming to a
>> Beam SDK near you in small pieces.
>>
>> I think I'm gonna start working on the first steps of the proposed plan,
>> in parallel with this discussion, because I'm excited :)
>>
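The "tail a filepattern" behavior in the TL;DR (pick up new files matching a pattern, then new lines appended to each of them) can be illustrated without Beam. Below is a minimal plain-Java sketch under stated assumptions: a single local directory, a glob pattern, and a caller that invokes poll() periodically. FilePatternTailer and poll() are hypothetical names; this is not the proposed TextIO API, and it does not use Splittable DoFn.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical single-machine sketch of tailing a filepattern (not Beam code):
 * each poll() discovers files whose names match a glob ("watch for new files")
 * and returns complete lines appended since the last poll ("watch files for new entries").
 */
final class FilePatternTailer {
  private final Path dir;
  private final PathMatcher matcher;
  // Bytes already consumed per file; a file absent from the map is new and starts at 0.
  private final Map<Path, Long> offsets = new HashMap<>();

  FilePatternTailer(Path dir, String glob) {
    this.dir = dir;
    this.matcher = dir.getFileSystem().getPathMatcher("glob:" + glob);
  }

  /** Returns lines completed since the previous call, grouped in file-name order. */
  List<String> poll() {
    List<String> newLines = new ArrayList<>();
    List<Path> matching;
    try (Stream<Path> entries = Files.list(dir)) {
      matching = entries
          .filter(Files::isRegularFile)
          .filter(p -> matcher.matches(p.getFileName()))
          .sorted()
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    for (Path file : matching) {
      long offset = offsets.getOrDefault(file, 0L);
      try (SeekableByteChannel ch = Files.newByteChannel(file)) {
        long size = ch.size();
        if (size <= offset) {
          continue; // nothing appended since the last poll
        }
        // Assumes the newly appended region fits in memory (under 2 GB).
        ByteBuffer buf = ByteBuffer.allocate((int) (size - offset));
        ch.position(offset);
        while (buf.hasRemaining() && ch.read(buf) > 0) {
          // keep reading until the buffer is full or EOF is reached
        }
        // Stop at the last '\n' so a partially written line is re-read next time.
        // Cutting there is UTF-8 safe: byte 0x0A never occurs inside a multi-byte sequence.
        int lastNewline = -1;
        for (int i = buf.position() - 1; i >= 0; i--) {
          if (buf.get(i) == '\n') {
            lastNewline = i;
            break;
          }
        }
        if (lastNewline < 0) {
          continue; // no complete line yet
        }
        String chunk = new String(buf.array(), 0, lastNewline + 1, StandardCharsets.UTF_8);
        chunk.lines().forEach(newLines::add); // lines() drops the terminators
        offsets.put(file, offset + lastNewline + 1);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    return newLines;
  }
}
```

A caller would invoke poll() on a fixed interval and process the returned lines. Offsets live only in memory, and truncation or rotation of a file is not handled; a real connector would need to persist each file's read position, which is the kind of per-element checkpointing that SDF is designed to support.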
