Re: [DISCUSS] Current ongoing work on runners

Manu Zhang Tue, 25 Oct 2016 18:25:34 -0700

We usually keep docs together with the source code so that each
release has its own versioned docs. If the capability matrix is treated like
other code, we can update it as we add new features. The same applies to other
docs, such as those for new IOs. We could make it a requirement in the PR template.


Thanks,
Manu


On Wed, Oct 26, 2016 at 7:24 AM Thomas Weise <t...@apache.org> wrote:

> I'm planning to take up the discussion about the Apex runner's current state
> and proposed next steps in a separate thread.
>
> Thanks,
> Thomas
>
>
> On Tue, Oct 25, 2016 at 10:32 AM, Amit Sela <amitsel...@gmail.com> wrote:
>
> > SparkRunner status:
> >
> > V1 (Spark 1.6.x - DStream/RDD API):
> > *Batch*: Full model support for batch. A continuous ROS testing setup is in
> > progress so that CI will validate it constantly.
> > *Streaming*: Support for UnboundedSource is in review
> > <https://github.com/apache/incubator-beam/pull/1143>; starting to work on
> > triggers and accumulation modes now.
> >
> > V2 (Spark 2.x - Dataset API):
> > This is on hold for now, as the Spark 2.0 Dataset API for streaming (AKA
> > "Structured Streaming") is marked Alpha.
> > In addition, some basic properties that are required to properly support
> > Beam are still missing from the Dataset API:
> >
> >    - Stateful operators.
> >    - Encoders (Spark's new schema-based coders) optimization support for
> >    classes that are a bit more sophisticated than POJOs (generics, inner
> >    classes, etc.).
> >
> > Also waiting to see whether watermarks and purging of late/stale data will
> > be introduced in 2.1 (currently the Dataset grows indefinitely, which is not
> > acceptable for production applications).
> > Once this becomes clearer (2.1 release?), I will get back to working on
> > this, because in general the Dataset API is preferred: it is actually a
> > real unified API for batch and streaming (and the schema-based
> > optimizations are also interesting).
> >
> > I hope this gives a clear view of the SparkRunner status. Feel free to ping
> > me for more details on the user/dev list or Slack.
> >
> > Thanks,
> > Amit
> >
> > On Tue, Oct 25, 2016 at 6:57 PM Aljoscha Krettek <aljos...@apache.org>
> > wrote:
> >
> > > I think we might need to update the capability matrix with some of the new
> > > features that have popped up. Immediate things that come to mind are:
> > >  * Timer/State API for user DoFns (coupled with new-style DoFn) (not yet
> > > completely in master)
> > >  * SplittableDoFn
> > >
> > > This would allow tracking the progress on each of these for each runner and
> > > would not require hunting for that information in email threads.
> > >
> > > On Tue, 25 Oct 2016 at 08:12 Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> > >
> > > > +1. For me it's one of the most important points for the new website. We
> > > > should give a clear and exhaustive list of what we have, both for runners
> > > > and IOs (with supported features).
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > ⁣
> > > >
> > > > On Oct 24, 2016, 21:52, at 21:52, "Ismaël Mejía" <ieme...@gmail.com>
> > > > wrote:
> > > > >Hello,
> > > > >
> > > > >I am really happy to see new runners being contributed to our community
> > > > >(e.g. GearPump and Apex recently). We have not discussed much about the
> > > > >current capabilities of both runners.
> > > > >
> > > > >Following the recent discussion about making ongoing work more explicit
> > > > >on the mailing list, I would like to ask the people involved about the
> > > > >current status of these runners. I think it is important to discuss this
> > > > >(apart from creating the relevant JIRAs and updating the capability matrix
> > > > >docs) because more people can then jump in and give a hand on open issues.
> > > > >
> > > > >I remember there was a Google doc for the capabilities of each runner. Is
> > > > >this doc still available? (Sorry, I lost the link.) I suppose that once
> > > > >these ongoing runners mature we can also add this doc to the website.
> > > > >https://beam.apache.org/learn/runners/capability-matrix/
> > > > >
> > > > >Regards,
> > > > >Ismaël
> > > > >
> > > > >ps. @Amit, given that the Spark 2 (Dataset-based) runner also has a
> > > > >feature branch, if you consider it worthwhile, can you please share a bit
> > > > >about that work too?
> > > > >
> > > > >ps2. Can anyone please share the link to the Google doc I was talking
> > > > >about? I can't find it after the recent changes to the website.
> > > > >
> > > >
> > >
> >
>
