I plan to discuss the Apex runner's current state and proposed next steps
in a separate thread.

Thanks,
Thomas


On Tue, Oct 25, 2016 at 10:32 AM, Amit Sela <amitsel...@gmail.com> wrote:

> SparkRunner status:
>
> V1 (Spark 1.6.x - DStream/RDD API):
> *Batch*: Full model support for batch; a continuous ROS testing setup is in
> progress now so that CI will validate it constantly.
> *Streaming*: Support for UnboundedSource is in review
> <https://github.com/apache/incubator-beam/pull/1143>; I am starting to work on
> triggers and accumulation modes now.
>
> V2 (Spark 2.x - Dataset API):
> This is on hold for now, as the Spark 2.0 Dataset API for streaming (AKA
> "Structured Streaming") is marked Alpha.
> In addition, there are still some basic properties in the Dataset API that
> are missing and will be required to properly support Beam:
>
>    - Stateful operators.
>    - Encoders (Spark's new schema-based coders) optimization support for
>    classes that are a bit more sophisticated than POJOs (generics, inner
>    classes, etc.).
>
> I am also waiting to see whether Watermarks and purging of late/stale data
> will be introduced in 2.1 (currently the Dataset grows indefinitely, which is
> not acceptable for production applications).
> Once this becomes clearer (2.1 release?) I will get back to working on
> this, because in general the Dataset API is preferred as it is actually a
> real unified API for batch and streaming (and the schema-based
> optimizations are also interesting).
>
> I hope this gives a clear view of the SparkRunner status. Feel free to ping
> me for more details on the user/dev list or Slack.
>
> Thanks,
> Amit
>
> On Tue, Oct 25, 2016 at 6:57 PM Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > I think we might need to update the capability matrix with some of the new
> > features that have popped up. Immediate things that come to mind are:
> >  * Timer/State API for user DoFns (coupled with new-style DoFn) (not yet
> > completely in master)
> >  * SplittableDoFn
> >
> > This would allow tracking the progress of each of these for each runner and
> > would not require hunting for that information in email threads.
> >
> > On Tue, 25 Oct 2016 at 08:12 Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> > > +1. For me it's one of the most important points for the new website. We
> > > should give a clear and exhaustive list of what we have, both for runners
> > > and IOs (with supported features).
> > >
> > > Regards
> > > JB
> > >
> > >
> > > > On Oct 24, 2016, at 21:52, "Ismaël Mejía" <ieme...@gmail.com>
> > > > wrote:
> > > >Hello,
> > > >
> > > >I am really happy to see new runners being contributed to our community
> > > >(e.g. GearPump and Apex recently). We have not discussed much about the
> > > >current capabilities of both runners.
> > > >
> > > >Following the recent discussion about making ongoing work more explicit
> > > >on the mailing list, I would like to ask the people involved about their
> > > >current status. I think it is important to discuss this (apart from
> > > >creating the given JIRAs + updating the capability matrix docs) because
> > > >more people can eventually jump in and give a hand on open issues.
> > > >
> > > >I remember there was a Google doc for the capabilities of each runner.
> > > >Is this doc still available? (Sorry, I lost the link.) I suppose that once
> > > >these ongoing runners mature we can also add this doc to the website.
> > > >https://beam.apache.org/learn/runners/capability-matrix/
> > > >
> > > >Regards,
> > > >Ismaël
> > > >
> > > >ps. @Amit, given that the Spark 2 (Dataset-based) runner also has a
> > > >feature branch, if you consider it worthwhile, can you please share a bit
> > > >about that work too?
> > > >
> > > >ps2. Can anyone please share the link to the Google doc I was talking
> > > >about? I can't find it after the recent changes to the website.
> > >
> >
>
