Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

Ismaël Mejía Wed, 29 Nov 2017 05:19:58 -0800

It is good to see so much enthusiasm about the future of Beam,
regardless of whether we call it Beam 3 or not.


I have some doubts about the idea of a release per month. Apache
releases are designed to be slow-paced (via the 3-day voting process).
All it takes is a holiday period in the same month plus some issues
during the release that require two RCs, and it will easily take two
weeks (of course I understand the will to improve this, considering
our not-so-good status quo of 6 weeks for the last two votes). My
point is that a monthly release can bring a ton of extra work to
validate every release; remember, validating a release is not just
running the unit tests.

I want to add one idea to the wishlist for Beam in the future:

- We need to improve Beam’s monitorability in a unified way, even if
this goes beyond the initial goals of the project, because this is a
big pain point for Beam adopters. We need things like system metrics
and utilities to monitor what is going on with Beam pipelines in a
runner-agnostic way.

It would be nice to create JIRAs for the issues discussed in this
thread (those that don’t exist yet) so that we can follow them and
organize them into some sort of roadmap.


On Wed, Nov 29, 2017 at 7:05 AM, Romain Manni-Bucau
<[email protected]> wrote:
> PS: I forgot another wish: make Beam SQL usable. Today you need to add a fn
> before and after because of the type breakage, which is not consistent with
> the pipeline API. It would be nice to support POJOs (extracted from the
> select fields or created from "views" like in Jackson), but not having to
> wrap the SQL usage in multiple UDFs would make it powerful and ready to use.
>
> On 29 Nov 2017 07:01, "Romain Manni-Bucau" <[email protected]> wrote:
>>
>> My user wishes - whatever version, it is just a number after all ;):
>>
>> - make coder usage simpler and consistent (PCollection TypeDescriptor and
>> Coder are duplicated in terms of API)
>> - have a beam api (split from the sdk and internals and impl)
>> - have SDF supported by runners
>> - have a SDFRunner allowing to simulate the SDF lifecycle manually (same
>> for DoFn short term - see next point for the current issue)
>> - ensure classloader usage is consistent, ie any proxy is created into the
>> final artifact classloader (transform if custom, dofn/source/sdf otherwise)
>> - have a test compatibility kit (TCK) for runner. It would be a jar any
>> runner impl can import to run with surefire
>> - make IO configuration reflection friendly (get rid of the autovalue
>> pattern which is not industrializable and allow pojo like classes or
>> alternatively support reading the conf from properties)
>> - support pipeline implicit option based on transform names to override
>> some attributes
>> - change runner implementations to let the bundle size have a pipeline
>> option defining an upper bound and not hardcode them arbitrarily - defaults
>> can stay the current ones
>> - better multi input/output support (just PCollection based and fully
>> wireable)
>> - a smoother pipeline API would be nice. I like hazelcast jet one for
>> instance
>>
>> On 29 Nov 2017 03:29, "Robert Bradshaw" <[email protected]> wrote:
>>>
>>> On Tue, Nov 28, 2017 at 9:48 AM, Reuven Lax <[email protected]> wrote:
>>> >
>>> > On Tue, Nov 28, 2017 at 9:14 AM, Jean-Baptiste Onofré <[email protected]>
>>> > wrote:
>>> >>
>>> >> Hi Reuven,
>>> >>
>>> >> Yes, I remember that we agreed on a release per month. However, we
>>> >> didn't do it before. I think the most important thing is not the
>>> >> period but a stable pace. I think it's more interesting for our
>>> >> community to "always" have a release every two months than an attempt
>>> >> at a monthly release that ends up later than that. Of course, if we
>>> >> can do both, it's perfect ;)
>>> >
>>> > Agree. A stable pace is the most important thing.
>>>
>>> +1, and I think everyone who's done a release is in favor of making it
>>> easier and more frequent. Someone should put together a proposal of
>>> easy things we can do to automate, etc.
>>>
>>> >> For Beam 3.x, I wasn't talking about breaking changes, but more about
>>> >> a "marketing" announcement. I think that, even if we don't break the
>>> >> API, some features are "strong enough" to be "qualified" for a major
>>> >> version.
>>> >
>>> > Ah, good point. This doesn't stop us from checking these new features
>>> > into 2.x, possibly tagged with an @Experimental flag. We can then use
>>> > 3.0 to announce all these features more broadly, and remove the
>>> > @Experimental tags.
>>> >
>>> > I would also like to see enterprise-ready BeamSQL and Java 7
>>> > deprecation on the list for Beam 3.0.
>>> >
>>> >>
>>> >> I think that any major ideas & features (whether or not they break
>>> >> the API) are valuable for Beam 3.x (and it's a good sign for our
>>> >> community again ;)).
>>>
>>> I'm generally not a fan of bumping the major version number just
>>> because enough time has passed, or enough new features have gone in
>>> (and am mostly opposed to holding features back just because we want
>>> to announce them (simultaneously?) in a big release)--instead I find
>>> that the need for a new major version arises out of a realization that
>>> the model has sufficiently changed and we need to cut ties with the
>>> old way of doing things (that's perhaps holding us back). That being
>>> said, it could be that some of these features are large enough to
>>> merit this.
>>>
>>> Regardless of the naming, I think it's a great time to have a
>>> discussion of where we want to go in 2018.
>>>
>>> Top of my list is first class support for Schema'd PCollections (and
>>> with it SQL support, etc.) and full support of the portability
>>> framework realizing the possibility of every runner running every SDK
>>> (and, ideally, even cross-SDK/language pipelines). I would also like
>>> to see explorations into interactive/incremental (for Python at least,
>>> but probably Java as well).
>>>
>>> - Robert
>>>
>>>
>>> >> On 11/28/2017 06:09 PM, Reuven Lax wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Nov 28, 2017 at 8:55 AM, Jean-Baptiste Onofré
>>> >>> <[email protected]> wrote:
>>> >>>
>>> >>>     Hi guys,
>>> >>>
>>> >>>     Even if there's no rush, I think it would be great for the
>>> >>>     community to have a better view of our roadmap and where we are
>>> >>>     going in terms of schedule.
>>> >>>
>>> >>>     I would like to discuss the following:
>>> >>>     - a best effort to maintain a good release pace or at least
>>> >>>     provide a rough schedule. For instance, in Apache Karaf, I have
>>> >>>     a release schedule
>>> >>>     (http://karaf.apache.org/download.html#container-schedule). I
>>> >>>     think a release ~ every quarter would be great.
>>> >>>
>>> >>>
>>> >>> Originally we had stated that we wanted monthly releases of Beam. So
>>> >>> far the releases have been painful enough that monthly hasn't
>>> >>> happened. I think we should address these issues and go to monthly
>>> >>> releases as originally stated.
>>> >>>
>>> >>>     - if I see new Beam 2.x releases for sure (according to the
>>> >>>     previous point), it would be great to have a discussion about
>>> >>>     Beam 3.x. I think that one of the interesting new features that
>>> >>>     Beam 3.x can provide is around PCollection with Schemas. It's
>>> >>>     something that we started to discuss with Reuven and Eugene.
>>> >>>     In terms of schedule,
>>> >>>
>>> >>>
>>> >>> I don't think schemas require Beam 3.0 - I think we can introduce
>>> >>> them without making breaking changes. However there are many other
>>> >>> features that would be very interesting for Beam 3.x, and we should
>>> >>> start putting together a list of them. I
>>> >>>
>>> >>>
>>> >>>     I would love to see your thoughts & ideas about the release
>>> >>>     schedule and Beam 3.x.
>>> >>>
>>> >>>     Regards
>>> >>>     JB
>>> >>>     --
>>> >>>     Jean-Baptiste Onofré
>>> >>>     [email protected]
>>> >>>     http://blog.nanthrax.net
>>> >>>     Talend - http://www.talend.com
>>> >>>
>>> >>>
>>> >>
>>> >> --
>>> >> Jean-Baptiste Onofré
>>> >> [email protected]
>>> >> http://blog.nanthrax.net
>>> >> Talend - http://www.talend.com
>>> >
>>> >
