Re: [DISCUSS] State of the project: Feature roadmap for 2018

Kenneth Knowles Tue, 30 Jan 2018 11:25:39 -0800

I've got some thoughts :-)

Here is how I see the direction(s):


 - Requirements to be relevant: known scale, SQL, retractions (required for
correct answers)
 - Core value-add: portability! I don't know that there is any other
project ambitiously trying to run Python and Go on "every" data processing
engine.
 - Experiments: SDF and dynamic work rebalancing. Just like event time
processing, when it matters to users these will become widespread and then
Beam's runner can easily make the features portable.

So let's do portability really well on all our most active runners. I have
a radical proposal for how we should think about it:

    A portable Beam runner should be defined to be a _service_ hosting the
Beam job management APIs.

In that sense, we have zero runners today. Even Dataflow is just a service
hosting its own API with a client-side library for converting a Beam
pipeline into a Dataflow pipeline. Re-orienting our thinking this way is
not actually a huge change in code, but emphasizes:

 - our "runners/core" etc should focus on making these services easy
(Thomas G is doing great work here right now)
 - a user selecting a runner should be thought of more as just pointing at
a different endpoint
 - our testing infrastructure should become much more service-oriented,
standing these up even for local testing
 - ditto Luke's point about making a crisp line of SDK/runner responsibility
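
To make the "runner as an endpoint" idea a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: `JobService`, `InMemoryJobService`, the `ENDPOINTS` table, and the addresses are made-up names, not Beam's actual job management API. An in-memory object stands in for what would really be a network service.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Protocol


class JobService(Protocol):
    """Hypothetical job management API that every runner service would host."""

    def submit(self, pipeline: str) -> str: ...

    def state(self, job_id: str) -> str: ...


@dataclass
class InMemoryJobService:
    """Stand-in for a runner service; a real one would sit behind a network endpoint."""

    name: str
    _jobs: Dict[str, str] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def submit(self, pipeline: str) -> str:
        job_id = f"{self.name}-{next(self._ids)}"
        # Pretend the job ran to completion immediately.
        self._jobs[job_id] = "DONE"
        return job_id

    def state(self, job_id: str) -> str:
        return self._jobs[job_id]


# Hypothetical endpoint table: choosing a runner is just choosing an address.
ENDPOINTS: Dict[str, JobService] = {
    "localhost:8099": InMemoryJobService("local"),
    "flink.example.com:8099": InMemoryJobService("flink"),
}


def run(pipeline: str, endpoint: str) -> str:
    """Submit the same pipeline to whichever service the endpoint names."""
    service = ENDPOINTS[endpoint]
    job_id = service.submit(pipeline)
    return f"{job_id}: {service.state(job_id)}"
```

The point of the sketch is that `run` contains no runner-specific code: swapping `"localhost:8099"` for `"flink.example.com:8099"` is the whole "runner selection", and a local test can stand up the same kind of service.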

Kenn


On Fri, Jan 26, 2018 at 12:58 PM, Lukasz Cwik <[email protected]> wrote:

> 1) Rather than making it easier to write features, I think more users
> would care about being able to move their pipelines between different
> runners, and one of the key missing features is dynamic work rebalancing in
> all runners (except Dataflow).
> Also, portability is meant to help make a crisp line between what are the
> responsibilities of the Runner and the SDK which would help make it easier
> to write features in an SDK and to support features in Runners.
>
> 2) To realize portability there are a lot of JIRAs being tracked under the
> portability label[1] that need addressing to be able to run an existing
> pipeline in a portable manner before we even get to more advanced features.
>
> [1]: https://issues.apache.org/jira/browse/BEAM-3515?jql=project%20%3D%20BEAM%20AND%20labels%20%3D%20portability
>
> 3) Ben, do you want to design and run a couple of polls (similar to the
> Java 8 poll) to get feedback from our users based upon the list of major
> features being developed?
>
> 4) Yes, plenty. It would be worthwhile to have someone walk through the
> open JIRAs and mark them with a label and also summarize what groups they
> fall under as there are plenty of good ideas there.
>
> On Tue, Jan 23, 2018 at 5:25 PM, Robert Bradshaw <[email protected]>
> wrote:
>
>> In terms of features, I think a key thing we should focus on is making
>> simple things simple. Beam is very powerful, but it doesn't always
>> make easy things easy. Features like schema'd PCollections could go a
>> long way here. Also fully fleshing out/smoothing our runner
>> portability story is part of this too.
>>
>> For Beam 3.x we could also consider whether there's any complexity that
>> doesn't pull its weight (e.g. side inputs on CombineFns).
>>
>> On Mon, Jan 22, 2018 at 9:20 PM, Jean-Baptiste Onofré <[email protected]>
>> wrote:
>> > Hi Ben,
>> >
>> > about the "technical roadmap", we have a thread about "Beam 3.x
>> > roadmap".
>> >
>> > It already provides ideas for points 3 & 4.
>> >
>> > Regards
>> > JB
>> >
>> > On 01/22/2018 09:15 PM, Ben Chambers wrote:
>> >> Thanks Davor for starting the state of the project discussions [1].
>> >>
>> >>
>> >> In this fork of the state of the project discussion, I’d like to start
>> >> the discussion of the feature roadmap for 2018 (and beyond).
>> >>
>> >>
>> >> To kick off the discussion, I think the features could be divided into
>> >> several areas, as follows:
>> >>
>> >>  1. Enabling Contributions: How do we make it easier to add new features
>> >>     to the supported runners? Can we provide a common intermediate layer
>> >>     below the existing functionality that features are translated to so
>> >>     that runners only need to support the intermediate layer and new
>> >>     features only need to target it? What other ways can we make it
>> >>     easier to contribute to the development of Beam?
>> >>
>> >>  2. Realizing Portability: What gaps are there in the promise of
>> >>     portability? For example in [1] we discussed the fact that users must
>> >>     write per-runner code to push system metrics from runners to their
>> >>     monitoring platform. This limits their ability to actually change
>> >>     runners. Credential management for different environments also falls
>> >>     into this category.
>> >>
>> >>  3. Large Features: What major features (like Beam SQL, Beam Python,
>> >>     etc.) would increase the Beam user base in 2018?
>> >>
>> >>  4. Improvements: What small changes could make Beam more appealing to
>> >>     users? Are there API improvements we could make or common mistakes we
>> >>     could detect and/or prevent?
>> >>
>> >>
>> >> Thanks in advance for participating in the discussion. I believe that
>> >> 2018 could be a great year for Beam, providing easier, more complete
>> >> runner portability and features that make Beam easier to use for
>> >> everyone.
>> >>
>> >>
>> >> Ben
>> >>
>> >>
>> >> [1]
>> >> https://lists.apache.org/thread.html/f750f288af8dab3f468b869bf5a3f473094f4764db419567f33805d0@%3Cdev.beam.apache.org%3E
>> >>
>> >> [2]
>> >> https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E
>> >
>> > --
>> > Jean-Baptiste Onofré
>> > [email protected]
>> > http://blog.nanthrax.net
>> > Talend - http://www.talend.com
>>
>
>
