+1

This is a really nice doc and plan.

On Tue, Jun 27, 2017 at 1:49 AM, Aljoscha Krettek <[email protected]>
wrote:

> +1
>
> This sounds very good and there is a clear implementation path!
>
> > On 24. Jun 2017, at 20:55, Jean-Baptiste Onofré <[email protected]> wrote:
> >
> > Fair enough ;)
> >
> > Let me review the different Jira and provide some feedback.
> >
> > Regards
> > JB
> >
> > On Jun 24, 2017, 20:54, at 20:54, Eugene Kirpichov
> <[email protected]> wrote:
> >> Hi JB,
> >> I haven't yet thought about how this work can be parallelized. For now
> >> I'd
> >> like to just get feedback on the approach :)
> >> But glad that you're willing to help out - let's discuss this too a bit
> >> later!
> >>
> >> On Sat, Jun 24, 2017 at 11:51 AM Jean-Baptiste Onofré <[email protected]>
> >> wrote:
> >>
> >>> Thanks Eugene
> >>>
> >>> I will pick up some.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On Jun 24, 2017, 20:00, at 20:00, Eugene Kirpichov
> >>> <[email protected]> wrote:
> >>>> Filed JIRAs for the proposed features and linked with the doc:
> >>>> https://issues.apache.org/jira/browse/BEAM-2511 TextIO should
> >> support
> >>>> reading a PCollection of filenames
> >>>> https://issues.apache.org/jira/browse/BEAM-2512 TextIO should
> >> support
> >>>> watching for new files
> >>>> https://issues.apache.org/jira/browse/BEAM-2513 TextIO should
> >> support
> >>>> watching files for new entries
> >>>>
> >>>> On Fri, Jun 23, 2017 at 4:32 PM Eugene Kirpichov
> >> <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I've written up a proposal for incrementally delivering a bunch of
> >>>> useful
> >>>>> new features in TextIO based on Splittable DoFn. It's applicable
> >> to
> >>>> other
> >>>>> file-based connectors, TextIO is just one good example. Let me
> >> know
> >>>> what
> >>>>> you think!
> >>>>>
> >>>>> https://s.apache.org/textio-sdf
> >>>>>
> >>>>> Copy of abstract:
> >>>>>
> >>>>> Users have often expressed interest in several new features for
> >>>> reading
> >>>>> files - in particular, incremental reading of log files (streaming
> >> of
> >>>> new
> >>>>> files matching a pattern and new entries in each file) and reading
> >> a
> >>>>> PCollection of filenames (in particular, an unbounded collection
> >>>> arriving
> >>>>> from a stream such as PubSub or Kafka).
> >>>>>
> >>>>> Splittable DoFn <http://s.apache.org/splittable-do-fn> (SDF)
> >> enables
> >>>>> these features. This document proposes an API for them, using the
> >>>> example
> >>>>> of TextIO, and proposes and a plan for delivering them subject to
> >>>>> availability of SDF in different runners. Some availability
> >>>> constraints are
> >>>>> circumvented by Running Splittable DoFn via Source API
> >>>>> <http://s.apache.org/sdf-via-source>.
> >>>>>
> >>>>> TL;DR Read a collection of filepatterns arriving on PubSub via
> >>>>> files.apply(TextIO.readEach()). Tail a filepattern via
> >>>>> TextIO.read().watchForNewFiles().watchFilesForNewEntries(). Coming
> >> to
> >>>> a
> >>>>> Beam SDK near you in small pieces.
> >>>>>
> >>>>> I think I'm gonna start working on the first steps of the proposed
> >>>> plan,
> >>>>> in parallel with this discussion, because I'm excited :)
> >>>>>
> >>>
>
>

Reply via email to