On Tue, Jan 30, 2018 at 11:25 AM Kenneth Knowles <k...@google.com> wrote:

> I've got some thoughts :-)
>
> Here is how I see the direction(s):
>
>  - Requirements to be relevant: known scale, SQL, retractions (required
> for correct answers)
>  - Core value-add: portability! I don't know that there is any other
> project ambitiously trying to run Python and Go on "every" data processing
> engine.
>  - Experiments: SDF and dynamic work rebalancing. Just like event time
> processing, when it matters to users these will become widespread and then
> Beam's runner can easily make the features portable.
>
> So let's do portability really well on all our most active runners. I have
> a radical proposal for how we should think about it:
>
>     A portable Beam runner should be defined to be a _service_ hosting the
> Beam job management APIs.
>
> In that sense, we have zero runners today. Even Dataflow is just a service
> hosting its own API with a client-side library for converting a Beam
> pipeline into a Dataflow pipeline. Re-orienting our thinking this way is
> not actually a huge change in code, but emphasizes:
>
>  - our "runners/core" etc should focus on making these services easy
> (Thomas G is doing great work here right now)
>  - a user selecting a runner should be thought of more as just pointing at
> a different endpoint
>  - our testing infrastructure should become much more service-oriented,
> standing these up even for local testing
>  - ditto Luke's point about making a crisp line of SDK/runner
> responsibility
>

+1, I like this perspective -- I think this would be really useful. If this
encompasses more than just running pipelines (e.g., getting results, metrics,
logs, etc. out of them), then it enables users to treat Beam as a true
abstraction layer on top of the data processing service, and to build their own
infrastructure around Beam rather than specializing for a particular service.
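
To make the "runner as an endpoint" idea a bit more concrete, here is a rough
sketch of how it could look from the user's side. It is purely illustrative:
JobService, InMemoryJobService, ENDPOINTS, run() and the endpoint addresses are
all hypothetical names made up for this example, not existing Beam APIs, and
the in-memory class only stands in for a real service behind a network address.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class JobService(Protocol):
    """Hypothetical job management API that every runner service would host."""

    def submit(self, pipeline: str) -> str: ...
    def state(self, job_id: str) -> str: ...
    def metrics(self, job_id: str) -> Dict[str, int]: ...


@dataclass
class InMemoryJobService:
    """Stand-in for a runner service (assumption: a real one is remote)."""

    name: str
    jobs: Dict[str, str] = field(default_factory=dict)

    def submit(self, pipeline: str) -> str:
        # Record the pipeline and hand back an id the caller can poll with.
        job_id = f"{self.name}-{len(self.jobs) + 1}"
        self.jobs[job_id] = pipeline
        return job_id

    def state(self, job_id: str) -> str:
        # This toy service "finishes" every job it knows about immediately.
        return "DONE" if job_id in self.jobs else "UNKNOWN"

    def metrics(self, job_id: str) -> Dict[str, int]:
        # A made-up metric, just to show results coming back through the API.
        return {"pipeline_length": len(self.jobs[job_id])}


# Hypothetical registry: "selecting a runner" is just picking an address.
ENDPOINTS: Dict[str, JobService] = {
    "localhost:8099": InMemoryJobService("local"),
    "flink.example.com:8099": InMemoryJobService("flink"),
}


def run(pipeline: str, endpoint: str) -> Dict[str, object]:
    """Submit a pipeline and read state and metrics through the same API."""
    service = ENDPOINTS[endpoint]
    job_id = service.submit(pipeline)
    return {
        "job_id": job_id,
        "state": service.state(job_id),
        "metrics": service.metrics(job_id),
    }
```

The point of the sketch is that the calling code does not change between
runners: only the endpoint string passed to run() does, and state and metrics
come back through the same interface regardless of which service is behind it.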


>
> On Fri, Jan 26, 2018 at 12:58 PM, Lukasz Cwik <lc...@google.com> wrote:
>
>> 1) Rather than making it easier to write features, I think more users
>> would care about being able to move their pipelines between different
>> runners, and one of the key missing features there is dynamic work
>> rebalancing in all runners (except Dataflow).
>> Also, portability is meant to help make a crisp line between what are the
>> responsibilities of the Runner and the SDK which would help make it easier
>> to write features in an SDK and to support features in Runners.
>>
>> 2) To realize portability there are a lot of JIRAs being tracked under
>> the portability label[1] that need addressing to be able to run an existing
>> pipeline in a portable manner before we even get to more advanced features.
>>
>> 1:
>> https://issues.apache.org/jira/browse/BEAM-3515?jql=project%20%3D%20BEAM%20AND%20labels%20%3D%20portability
>>
>> 3) Ben, do you want to design and run a couple of polls (similar to the
>> Java 8 poll) to get feedback from our users based upon the list of major
>> features being developed?
>>
>> 4) Yes, plenty. It would be worthwhile to have someone walk through the
>> open JIRAs and mark them with a label and also summarize what groups they
>> fall under as there are plenty of good ideas there.
>>
>> On Tue, Jan 23, 2018 at 5:25 PM, Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> In terms of features, I think a key thing we should focus on is making
>>> simple things simple. Beam is very powerful, but it doesn't always
>>> make easy things easy. Features like schema'd PCollections could go a
>>> long way here. Also fully fleshing out/smoothing our runner
>>> portability story is part of this too.
>>>
>>> For Beam 3.x we could also consider whether there's any complexity that
>>> doesn't pull its weight (e.g. side inputs on CombineFns).
>>>
>>> On Mon, Jan 22, 2018 at 9:20 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>> > Hi Ben,
>>> >
>>> > about the "technical roadmap", we have a thread about "Beam 3.x
>>> > roadmap".
>>> >
>>> > It already provides ideas for points 3 & 4.
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On 01/22/2018 09:15 PM, Ben Chambers wrote:
>>> >> Thanks Davor for starting the state of the project discussions [1].
>>> >>
>>> >>
>>> >> In this fork of the state of the project discussion, I’d like to start
>>> >> the discussion of the feature roadmap for 2018 (and beyond).
>>> >>
>>> >>
>>> >> To kick off the discussion, I think the features could be divided into
>>> >> several areas, as follows:
>>> >>
>>> >>  1. Enabling Contributions: How do we make it easier to add new features
>>> >>     to the supported runners? Can we provide a common intermediate layer
>>> >>     below the existing functionality that features are translated to so
>>> >>     that runners only need to support the intermediate layer and new
>>> >>     features only need to target it? What other ways can we make it
>>> >>     easier to contribute to the development of Beam?
>>> >>
>>> >>  2. Realizing Portability: What gaps are there in the promise of
>>> >>     portability? For example in [1] we discussed the fact that users must
>>> >>     write per-runner code to push system metrics from runners to their
>>> >>     monitoring platform. This limits their ability to actually change
>>> >>     runners. Credential management for different environments also falls
>>> >>     into this category.
>>> >>
>>> >>  3. Large Features: What major features (like Beam SQL, Beam Python,
>>> >>     etc.) would increase the Beam user base in 2018?
>>> >>
>>> >>  4. Improvements: What small changes could make Beam more appealing to
>>> >>     users? Are there API improvements we could make or common mistakes we
>>> >>     could detect and/or prevent?
>>> >>
>>> >>
>>> >> Thanks in advance for participating in the discussion. I believe that
>>> >> 2018 could be a great year for Beam, providing easier, more complete
>>> >> runner portability and features that make Beam easier to use for
>>> >> everyone.
>>> >>
>>> >>
>>> >> Ben
>>> >>
>>> >>
>>> >> [1] https://lists.apache.org/thread.html/f750f288af8dab3f468b869bf5a3f473094f4764db419567f33805d0@%3Cdev.beam.apache.org%3E
>>> >>
>>> >> [2] https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E
>>> >
>>> > --
>>> > Jean-Baptiste Onofré
>>> > jbono...@apache.org
>>> > http://blog.nanthrax.net
>>> > Talend - http://www.talend.com
>>>
>>
>>
>
