On Tue, Jan 30, 2018 at 11:25 AM Kenneth Knowles <k...@google.com> wrote:
> I've got some thoughts :-)
>
> Here is how I see the direction(s):
>
> - Requirements to be relevant: known scale, SQL, retractions (required
>   for correct answers)
> - Core value-add: portability! I don't know that there is any other
>   project ambitiously trying to run Python and Go on "every" data
>   processing engine.
> - Experiments: SDF and dynamic work rebalancing. Just like event time
>   processing, when it matters to users these will become widespread and
>   then Beam's runner can easily make the features portable.
>
> So let's do portability really well on all our most active runners. I have
> a radical proposal for how we should think about it:
>
> A portable Beam runner should be defined to be a _service_ hosting the
> Beam job management APIs.
>
> In that sense, we have zero runners today. Even Dataflow is just a service
> hosting its own API with a client-side library for converting a Beam
> pipeline into a Dataflow pipeline. Re-orienting our thinking this way is
> not actually a huge change in code, but emphasizes:
>
> - our "runners/core" etc should focus on making these services easy
>   (Thomas G is doing great work here right now)
> - a user selecting a runner should be thought of more as just pointing at
>   a different endpoint
> - our testing infrastructure should become much more service-oriented,
>   standing these up even for local testing
> - ditto Luke's point about making a crisp line of SDK/runner
>   responsibility
>

+1, I like this perspective -- I think this would be really useful. If this
encompasses more than just running pipelines (e.g., getting results, metrics,
logs, etc. out of them), then it enables users to treat Beam as a true
abstraction layer on top of the data processing service, and to build their
own infrastructure around Beam rather than specializing for one engine.
>
> On Fri, Jan 26, 2018 at 12:58 PM, Lukasz Cwik <lc...@google.com> wrote:
>
>> 1) Instead of making it easier to write features, I think more users
>> would care about being able to move their pipeline between different
>> runners, and one of the key missing features is dynamic work rebalancing
>> in all runners (except Dataflow).
>> Also, portability is meant to help make a crisp line between what are the
>> responsibilities of the Runner and the SDK, which would help make it easier
>> to write features in an SDK and to support features in Runners.
>>
>> 2) To realize portability there are a lot of JIRAs being tracked under
>> the portability label[1] that need addressing to be able to run an existing
>> pipeline in a portable manner before we even get to more advanced features.
>>
>> 1:
>> https://issues.apache.org/jira/browse/BEAM-3515?jql=project%20%3D%20BEAM%20AND%20labels%20%3D%20portability
>>
>> 3) Ben, do you want to design and run a couple of polls (similar to the
>> Java 8 poll) to get feedback from our users based upon the list of major
>> features being developed?
>>
>> 4) Yes, plenty. It would be worthwhile to have someone walk through the
>> open JIRAs and mark them with a label and also summarize what groups they
>> fall under, as there are plenty of good ideas there.
>>
>> On Tue, Jan 23, 2018 at 5:25 PM, Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> In terms of features, I think a key thing we should focus on is making
>>> simple things simple. Beam is very powerful, but it doesn't always
>>> make easy things easy. Features like schema'd PCollections could go a
>>> long way here. Also fully fleshing out/smoothing our runner
>>> portability story is part of this too.
>>>
>>> For Beam 3.x we could also reason about if there's any complexity that
>>> doesn't hold its weight (e.g. side inputs on CombineFns).
>>>
>>> On Mon, Jan 22, 2018 at 9:20 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>> > Hi Ben,
>>> >
>>> > about the "technical roadmap", we have a thread about "Beam 3.x roadmap".
>>> >
>>> > It already provides ideas for points 3 & 4.
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On 01/22/2018 09:15 PM, Ben Chambers wrote:
>>> >> Thanks Davor for starting the state of the project discussions [1].
>>> >>
>>> >> In this fork of the state of the project discussion, I’d like to start
>>> >> the discussion of the feature roadmap for 2018 (and beyond).
>>> >>
>>> >> To kick off the discussion, I think the features could be divided into
>>> >> several areas, as follows:
>>> >>
>>> >> 1. Enabling Contributions: How do we make it easier to add new features
>>> >>    to the supported runners? Can we provide a common intermediate layer
>>> >>    below the existing functionality that features are translated to so
>>> >>    that runners only need to support the intermediate layer and new
>>> >>    features only need to target it? What other ways can we make it
>>> >>    easier to contribute to the development of Beam?
>>> >>
>>> >> 2. Realizing Portability: What gaps are there in the promise of
>>> >>    portability? For example in [1] we discussed the fact that users
>>> >>    must write per-runner code to push system metrics from runners to
>>> >>    their monitoring platform. This limits their ability to actually
>>> >>    change runners. Credential management for different environments
>>> >>    also falls into this category.
>>> >>
>>> >> 3. Large Features: What major features (like Beam SQL, Beam Python,
>>> >>    etc.) would increase the Beam user base in 2018?
>>> >>
>>> >> 4. Improvements: What small changes could make Beam more appealing to
>>> >>    users? Are there API improvements we could make or common mistakes
>>> >>    we could detect and/or prevent?
>>> >>
>>> >> Thanks in advance for participating in the discussion. I believe that
>>> >> 2018 could be a great year for Beam, providing easier, more complete
>>> >> runner portability and features that make Beam easier to use for
>>> >> everyone.
>>> >>
>>> >> Ben
>>> >>
>>> >> [1]
>>> >> https://lists.apache.org/thread.html/f750f288af8dab3f468b869bf5a3f473094f4764db419567f33805d0@%3Cdev.beam.apache.org%3E
>>> >>
>>> >> [2]
>>> >> https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E
>>> >
>>> > --
>>> > Jean-Baptiste Onofré
>>> > jbono...@apache.org
>>> > http://blog.nanthrax.net
>>> > Talend - http://www.talend.com
>>>
>>
>