Re: beam 3?

2018-03-21 Thread Romain Manni-Bucau
This was probably done too quickly, but it can be used as a start. Here is a first list:
https://gist.github.com/rmannibucau/ab7543c23b6f57af921d98639fbcd436


Romain Manni-Bucau
@rmannibucau | Blog | Old Blog | Github | LinkedIn | Book


2018-03-21 20:49 GMT+01:00 Lukasz Cwik :

> Note that for the 2.0 release, we tracked this list of changes in JIRA
> under "backward-incompatible" labels.
>
> https://issues.apache.org/jira/browse/BEAM-2427?jql=project%20%3D%20BEAM%20AND%20labels%20in%20(backwards-incompatible%2C%20backwards-compatibility%2C%20backward-incompatible)
>
> We could do the same leading up to an eventual sprint towards producing 3.0
>
>
> On Wed, Mar 21, 2018 at 11:45 AM Robert Bradshaw 
> wrote:
>
>> I, too, think it's way too early to move master to 3.0.0, especially if
>> this involves reworking everything from the API to runners, possibly from
>> scratch. Right now, I think the most important and urgent work is the many
>> efforts underway to fully realize the portability story. I'm also concerned
>> about fragmenting development effort (and the confusion for end users) if
>> we have a separate 2.x line.
>>
>> That being said, there's much we could clean up and fix in Beam, and I
>> think it's fine to have a discussion to see what we would want to do at any
>> time. There have been a couple of discussions on this list, but it may be
>> worth summarizing these in a doc (rather than having them only be scattered
>> on the list). For each concrete item, it'd be worth clearly documenting
>> what the problem is, proposal(s) for fixing it, and why this can't be done
>> incrementally/in a backwards-compatible way (if it can, I'd rather get
>> improvements in sooner and as part of a single mainline). Such a doc could
>> also be illuminating as to why things are the way they are even when specific
>> changes are not pursued (e.g. are things so due to historical accident or
>> actual, if not obvious, constraints).
>>
>> - Robert
>>
>>
>>
>> On Wed, Mar 21, 2018 at 10:57 AM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>>
>>>
>>> On 21 March 2018 at 18:25, "Lukasz Cwik" wrote:
>>>
>>> I think it's premature to start a new major version at this point in time.
>>> * The Apache Beam 2.x series is less than a year old.
>>> * Many features that users want can be built on top of the existing APIs.
>>>
>>>
>>> OK, how do you see:
>>>
>>> 1. Stripping down the classpath to Beam-only jars up to the SDK core
>>> (please no Jackson, no Snappy, no Avro, no Joda, to cite only the
>>> bothersome deps) and avoiding fat jars that are really fat
>>> 2. Cleaning up the promoted API and hiding the stable but deprecated/
>>> old-fashioned ones (SDF as the main extension point and hiding sources,
>>> for instance)
>>> 3. Starting to define a clean lifecycle for any managed bean (linked to 2,
>>> which gets rid of issues with sources and sinks, plus serialization hooks
>>> to ensure SDF can be replaced for serialization/environment purposes
>>> without rewriting the bytecode)
>>>
>>> (These are just the issues I hit today, to say where this is coming from.)
>>>
>>> 3 can be worked around by supporting a coder on DoFn and sources/sinks
>>> (assuming runners support it), but 1 and 2 require a design rethink and
>>> some erasing to be doable.
>>>
>>>
>>> I think the issue is that you may see Beam as part of a runtime, whereas it
>>> is part of a library that is embedded by nature, so it needs to be light,
>>> extremely extensible, and not define any non-mandatory concept such as views.
>>> Beam doesn't respect any of these rules today and fails at its goal of being
>>> a portable and non-intrusive API for this reason, IMHO. This is why I think
>>> it is crucial to realize this now and restart correctly.
>>>
>>>
>>>
>>>
>>> On Wed, Mar 21, 2018 at 1:31 AM Jean-Baptiste Onofré 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Starting from scratch is an option, but don't you think it's a huge
>>>> effort ?
>>>> Anyway, we will reuse part of the existing codebase.
>>>>
>>>> Let's see what the team is thinking.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 03/21/2018 09:26 AM, Romain Manni-Bucau wrote:
>>>> >
>>>> >
>>>> > 2018-03-21 9:13 GMT+01:00 Jean-Baptiste Onofré :
>>>> >
>>>> > Hi Romain,
>>>> >
>>>> > We didn't define a date yet.
>>>> >
>>>> > However, I think it makes lot of sense to think about that.
>>>> >
>>>> > What about creating a beam-2.x branch and move master version to
>>>> > 3.0.0-SNAPSHOT ?
>>>> >
>>>> >
>>>> > Do we want to "move" master or start fresh to avoid to pay again the
>>>> legacy
>>>> > which prevents us to move correctly forward ATM?
>>>> > I really wonder if starting from scratch and only

Re: beam 3?

2018-03-21 Thread Lukasz Cwik
Note that for the 2.0 release, we tracked this list of changes in JIRA
under "backward-incompatible" labels.

https://issues.apache.org/jira/browse/BEAM-2427?jql=project%20%3D%20BEAM%20AND%20labels%20in%20(backwards-incompatible%2C%20backwards-compatibility%2C%20backward-incompatible)

We could do the same leading up to an eventual sprint towards producing 3.0
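
For readers who find the percent-encoded link above hard to read, here is a minimal sketch that extracts and decodes its `jql` parameter using only the Python standard library. The URL is the one quoted in this message; the helper name `extract_jql` is purely illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# The JIRA link from the message, split across lines only for readability.
URL = (
    "https://issues.apache.org/jira/browse/BEAM-2427?jql="
    "project%20%3D%20BEAM%20AND%20labels%20in%20("
    "backwards-incompatible%2C%20backwards-compatibility%2C%20"
    "backward-incompatible)"
)


def extract_jql(url: str) -> str:
    """Return the percent-decoded value of the `jql` query parameter."""
    query = urlsplit(url).query
    return parse_qs(query)["jql"][0]


print(extract_jql(URL))
# project = BEAM AND labels in (backwards-incompatible, backwards-compatibility, backward-incompatible)
```

The decoded query shows that three label spellings were in use, which is worth keeping in mind if the same tracking approach is reused for 3.0.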


On Wed, Mar 21, 2018 at 11:45 AM Robert Bradshaw 
wrote:

> I, too, think it's way too early to move master to 3.0.0, especially if
> this involves reworking everything from the API to runners, possibly from
> scratch. Right now, I think the most important and urgent work is the many
> efforts underway to fully realize the portability story. I'm also concerned
> about fragmenting development effort (and the confusion for end users) if
> we have a separate 2.x line.
>
> That being said, there's much we could clean up and fix in Beam, and I
> think it's fine to have a discussion to see what we would want to do at any
> time. There have been a couple of discussions on this list, but it may be
> worth summarizing these in a doc (rather than having them only be scattered
> on the list). For each concrete item, it'd be worth clearly documenting
> what the problem is, proposal(s) for fixing it, and why this can't be done
> incrementally/in a backwards-compatible way (if it can, I'd rather get
> improvements in sooner and as part of a single mainline). Such a doc could
> also be illuminating as to why things are the way they are even when specific
> changes are not pursued (e.g. are things so due to historical accident or
> actual, if not obvious, constraints).
>
> - Robert
>
>
>
> On Wed, Mar 21, 2018 at 10:57 AM Romain Manni-Bucau 
> wrote:
>
>>
>>
>> On 21 March 2018 at 18:25, "Lukasz Cwik" wrote:
>>
>> I think it's premature to start a new major version at this point in time.
>> * The Apache Beam 2.x series is less than a year old.
>> * Many features that users want can be built on top of the existing APIs.
>>
>>
>> OK, how do you see:
>>
>> 1. Stripping down the classpath to Beam-only jars up to the SDK core
>> (please no Jackson, no Snappy, no Avro, no Joda, to cite only the
>> bothersome deps) and avoiding fat jars that are really fat
>> 2. Cleaning up the promoted API and hiding the stable but deprecated/
>> old-fashioned ones (SDF as the main extension point and hiding sources,
>> for instance)
>> 3. Starting to define a clean lifecycle for any managed bean (linked to 2,
>> which gets rid of issues with sources and sinks, plus serialization hooks
>> to ensure SDF can be replaced for serialization/environment purposes
>> without rewriting the bytecode)
>>
>> (These are just the issues I hit today, to say where this is coming from.)
>>
>> 3 can be worked around by supporting a coder on DoFn and sources/sinks
>> (assuming runners support it), but 1 and 2 require a design rethink and
>> some erasing to be doable.
>>
>>
>> I think the issue is that you may see Beam as part of a runtime, whereas it
>> is part of a library that is embedded by nature, so it needs to be light,
>> extremely extensible, and not define any non-mandatory concept such as views.
>> Beam doesn't respect any of these rules today and fails at its goal of being
>> a portable and non-intrusive API for this reason, IMHO. This is why I think
>> it is crucial to realize this now and restart correctly.
>>
>>
>>
>>
>> On Wed, Mar 21, 2018 at 1:31 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi,
>>>
>>> Starting from scratch is an option, but don't you think it's a huge
>>> effort ?
>>> Anyway, we will reuse part of the existing codebase.
>>>
>>> Let's see what the team is thinking.
>>>
>>> Regards
>>> JB
>>>
>>> On 03/21/2018 09:26 AM, Romain Manni-Bucau wrote:
>>> >
>>> >
>>> > 2018-03-21 9:13 GMT+01:00 Jean-Baptiste Onofré >> > >:
>>> >
>>> > Hi Romain,
>>> >
>>> > We didn't define a date yet.
>>> >
>>> > However, I think it makes lot of sense to think about that.
>>> >
>>> > What about creating a beam-2.x branch and move master version to
>>> > 3.0.0-SNAPSHOT ?
>>> >
>>> >
>>> > Do we want to "move" master or start fresh to avoid to pay again the
>>> legacy
>>> > which prevents us to move correctly forward ATM?
>>> > I really wonder if starting from scratch and only bringing stable and
>>> well
>>> > defined API wouldn't be saner after 10 months working with beam.
>>> >
>>> > Most of the actual logic will be importable easily when needed (I'm
>>> thinking to
>>> > the runner) but just bumping the major will keep the same pitfall in
>>> the
>>> > codebase which are very basic design issues IMHO.
>>> >
>>> >
>>> >
>>> > I almost agree with your point even if I would suggest you to use
>>> a more
>>> > positive tone: being sharp never encourage the community,
>>> contribution and don't
>>> > motivate people. You can say things but with some friendly form.
>>> >
>>> >
>>> > Read it as it is "I'm tired to always 

Re: beam 3?

2018-03-21 Thread Robert Bradshaw
I, too, think it's way too early to move master to 3.0.0, especially if this
involves reworking everything from the API to runners, possibly from
scratch. Right now, I think the most important and urgent work is the many
efforts underway to fully realize the portability story. I'm also concerned
about fragmenting development effort (and the confusion for end users) if
we have a separate 2.x line.

That being said, there's much we could clean up and fix in Beam, and I think
it's fine to have a discussion to see what we would want to do at any time.
There have been a couple of discussions on this list, but it may be worth
summarizing these in a doc (rather than having them only be scattered on
the list). For each concrete item, it'd be worth clearly documenting what
the problem is, proposal(s) for fixing it, and why this can't be done
incrementally/in a backwards-compatible way (if it can, I'd rather get
improvements in sooner and as part of a single mainline). Such a doc could
also be illuminating as to why things are the way they are even when specific
changes are not pursued (e.g. are things so due to historical accident or
actual, if not obvious, constraints).

- Robert



On Wed, Mar 21, 2018 at 10:57 AM Romain Manni-Bucau 
wrote:

>
>
> On 21 March 2018 at 18:25, "Lukasz Cwik" wrote:
>
> I think it's premature to start a new major version at this point in time.
> * The Apache Beam 2.x series is less than a year old.
> * Many features that users want can be built on top of the existing APIs.
>
>
> OK, how do you see:
>
> 1. Stripping down the classpath to Beam-only jars up to the SDK core
> (please no Jackson, no Snappy, no Avro, no Joda, to cite only the
> bothersome deps) and avoiding fat jars that are really fat
> 2. Cleaning up the promoted API and hiding the stable but deprecated/
> old-fashioned ones (SDF as the main extension point and hiding sources,
> for instance)
> 3. Starting to define a clean lifecycle for any managed bean (linked to 2,
> which gets rid of issues with sources and sinks, plus serialization hooks
> to ensure SDF can be replaced for serialization/environment purposes
> without rewriting the bytecode)
>
> (These are just the issues I hit today, to say where this is coming from.)
>
> 3 can be worked around by supporting a coder on DoFn and sources/sinks
> (assuming runners support it), but 1 and 2 require a design rethink and
> some erasing to be doable.
>
>
> I think the issue is that you may see Beam as part of a runtime, whereas it
> is part of a library that is embedded by nature, so it needs to be light,
> extremely extensible, and not define any non-mandatory concept such as views.
> Beam doesn't respect any of these rules today and fails at its goal of being
> a portable and non-intrusive API for this reason, IMHO. This is why I think
> it is crucial to realize this now and restart correctly.
>
>
>
>
> On Wed, Mar 21, 2018 at 1:31 AM Jean-Baptiste Onofré 
> wrote:
>
>> Hi,
>>
>> Starting from scratch is an option, but don't you think it's a huge
>> effort ?
>> Anyway, we will reuse part of the existing codebase.
>>
>> Let's see what the team is thinking.
>>
>> Regards
>> JB
>>
>> On 03/21/2018 09:26 AM, Romain Manni-Bucau wrote:
>> >
>> >
>> > 2018-03-21 9:13 GMT+01:00 Jean-Baptiste Onofré > > >:
>> >
>> > Hi Romain,
>> >
>> > We didn't define a date yet.
>> >
>> > However, I think it makes lot of sense to think about that.
>> >
>> > What about creating a beam-2.x branch and move master version to
>> > 3.0.0-SNAPSHOT ?
>> >
>> >
>> > Do we want to "move" master or start fresh to avoid to pay again the
>> legacy
>> > which prevents us to move correctly forward ATM?
>> > I really wonder if starting from scratch and only bringing stable and
>> well
>> > defined API wouldn't be saner after 10 months working with beam.
>> >
>> > Most of the actual logic will be importable easily when needed (I'm
>> thinking to
>> > the runner) but just bumping the major will keep the same pitfall in the
>> > codebase which are very basic design issues IMHO.
>> >
>> >
>> >
>> > I almost agree with your point even if I would suggest you to use a
>> more
>> > positive tone: being sharp never encourage the community,
>> contribution and don't
>> > motivate people. You can say things but with some friendly form.
>> >
>> >
>> > Read it as it is "I'm tired to always workaround the API" ;). Generally
>> I send a
>> > mail after some hard fight and disappointment so mea culpa for this one.
>> >
>> >
>> >
>> > I would add:
>> >
>> > - Schema or PCollection: it's already started but I think we could
>> do some
>> > improvements (potentially introducing some API change)
>> > - Hints/Annotations on PCollection: it's something we discussed
>> during Beam
>> > Summit with Tyler and others. The idea is to mimic the Message
>> Headers in Apache
>> > Camel. It would allow us to have more dynamic IOs and transforms,
>> and give 

Re: beam 3?

2018-03-21 Thread Romain Manni-Bucau
On 21 March 2018 at 18:25, "Lukasz Cwik" wrote:

I think it's premature to start a new major version at this point in time.
* The Apache Beam 2.x series is less than a year old.
* Many features that users want can be built on top of the existing APIs.


OK, how do you see:

1. Stripping down the classpath to Beam-only jars up to the SDK core
(please no Jackson, no Snappy, no Avro, no Joda, to cite only the
bothersome deps) and avoiding fat jars that are really fat
2. Cleaning up the promoted API and hiding the stable but deprecated/
old-fashioned ones (SDF as the main extension point and hiding sources,
for instance)
3. Starting to define a clean lifecycle for any managed bean (linked to 2,
which gets rid of issues with sources and sinks, plus serialization hooks
to ensure SDF can be replaced for serialization/environment purposes
without rewriting the bytecode)

(These are just the issues I hit today, to say where this is coming from.)

3 can be worked around by supporting a coder on DoFn and sources/sinks
(assuming runners support it), but 1 and 2 require a design rethink and
some erasing to be doable.


I think the issue is that you may see Beam as part of a runtime, whereas it
is part of a library that is embedded by nature, so it needs to be light,
extremely extensible, and not define any non-mandatory concept such as views.
Beam doesn't respect any of these rules today and fails at its goal of being
a portable and non-intrusive API for this reason, IMHO. This is why I think
it is crucial to realize this now and restart correctly.




On Wed, Mar 21, 2018 at 1:31 AM Jean-Baptiste Onofré 
wrote:

> Hi,
>
> Starting from scratch is an option, but don't you think it's a huge effort
> ?
> Anyway, we will reuse part of the existing codebase.
>
> Let's see what the team is thinking.
>
> Regards
> JB
>
> On 03/21/2018 09:26 AM, Romain Manni-Bucau wrote:
> >
> >
> > 2018-03-21 9:13 GMT+01:00 Jean-Baptiste Onofré  > >:
> >
> > Hi Romain,
> >
> > We didn't define a date yet.
> >
> > However, I think it makes lot of sense to think about that.
> >
> > What about creating a beam-2.x branch and move master version to
> > 3.0.0-SNAPSHOT ?
> >
> >
> > Do we want to "move" master or start fresh to avoid to pay again the
> legacy
> > which prevents us to move correctly forward ATM?
> > I really wonder if starting from scratch and only bringing stable and
> well
> > defined API wouldn't be saner after 10 months working with beam.
> >
> > Most of the actual logic will be importable easily when needed (I'm
> thinking to
> > the runner) but just bumping the major will keep the same pitfall in the
> > codebase which are very basic design issues IMHO.
> >
> >
> >
> > I almost agree with your point even if I would suggest you to use a
> more
> > positive tone: being sharp never encourage the community,
> contribution and don't
> > motivate people. You can say things but with some friendly form.
> >
> >
> > Read it as it is "I'm tired to always workaround the API" ;). Generally
> I send a
> > mail after some hard fight and disappointment so mea culpa for this one.
> >
> >
> >
> > I would add:
> >
> > - Schema or PCollection: it's already started but I think we could
> do some
> > improvements (potentially introducing some API change)
> > - Hints/Annotations on PCollection: it's something we discussed
> during Beam
> > Summit with Tyler and others. The idea is to mimic the Message
> Headers in Apache
> > Camel. It would allow us to have more dynamic IOs and transforms,
> and give some
> > additional statements to the runners.
> >
> >
> > +1
> >
> >
> >
> > I'm proposing to start a vote to create the 2.x branch and move
> master to Beam
> > 3.0.0-SNAPSHOT as join effort.
> >
> > Regards
> > JB
> >
> > On 03/21/2018 08:36 AM, Romain Manni-Bucau wrote:
> > > Hi guys,
> > >
> > > it got mentionned but without any concrete dates: when beam 3 work
> will be started?
> > >
> > > I'm very interested in:
> > >
> > > 1. reworking the whole DAG API to ensure it is instrumentable
> (today the dag
> > > uses a tons of static utilities and internals which makes it not
> > > industrializable at all as soon as you are just on top of beam)
> > > 2. reworking the API definition in its own module not coupled to
> any
> > > implementation details (api/provider design) and 100% based on the
> sdf
> > > 3. rework the overall serialization (coders + transform
> serialization which is
> > > hardcoded today and not portable or industrializable at all)
> > > 4. make runners decorable properly and not just forked each time
> you need to
> > > modify some behavior for a particular case
> > > (+ indeed all the issues we hit and saw on the list)
> > >
> > > Romain Manni-Bucau
> > > @rmannibucau  > > |  Blog
> > > 

Re: beam 3?

2018-03-21 Thread Lukasz Cwik
I think it's premature to start a new major version at this point in time.
* The Apache Beam 2.x series is less than a year old.
* Many features that users want can be built on top of the existing APIs.


On Wed, Mar 21, 2018 at 1:31 AM Jean-Baptiste Onofré 
wrote:

> Hi,
>
> Starting from scratch is an option, but don't you think it's a huge effort
> ?
> Anyway, we will reuse part of the existing codebase.
>
> Let's see what the team is thinking.
>
> Regards
> JB
>
> On 03/21/2018 09:26 AM, Romain Manni-Bucau wrote:
> >
> >
> > 2018-03-21 9:13 GMT+01:00 Jean-Baptiste Onofré  > >:
> >
> > Hi Romain,
> >
> > We didn't define a date yet.
> >
> > However, I think it makes lot of sense to think about that.
> >
> > What about creating a beam-2.x branch and move master version to
> > 3.0.0-SNAPSHOT ?
> >
> >
> > Do we want to "move" master or start fresh to avoid to pay again the
> legacy
> > which prevents us to move correctly forward ATM?
> > I really wonder if starting from scratch and only bringing stable and
> well
> > defined API wouldn't be saner after 10 months working with beam.
> >
> > Most of the actual logic will be importable easily when needed (I'm
> thinking to
> > the runner) but just bumping the major will keep the same pitfall in the
> > codebase which are very basic design issues IMHO.
> >
> >
> >
> > I almost agree with your point even if I would suggest you to use a
> more
> > positive tone: being sharp never encourage the community,
> contribution and don't
> > motivate people. You can say things but with some friendly form.
> >
> >
> > Read it as it is "I'm tired to always workaround the API" ;). Generally
> I send a
> > mail after some hard fight and disappointment so mea culpa for this one.
> >
> >
> >
> > I would add:
> >
> > - Schema or PCollection: it's already started but I think we could
> do some
> > improvements (potentially introducing some API change)
> > - Hints/Annotations on PCollection: it's something we discussed
> during Beam
> > Summit with Tyler and others. The idea is to mimic the Message
> Headers in Apache
> > Camel. It would allow us to have more dynamic IOs and transforms,
> and give some
> > additional statements to the runners.
> >
> >
> > +1
> >
> >
> >
> > I'm proposing to start a vote to create the 2.x branch and move
> master to Beam
> > 3.0.0-SNAPSHOT as join effort.
> >
> > Regards
> > JB
> >
> > On 03/21/2018 08:36 AM, Romain Manni-Bucau wrote:
> > > Hi guys,
> > >
> > > it got mentionned but without any concrete dates: when beam 3 work
> will be started?
> > >
> > > I'm very interested in:
> > >
> > > 1. reworking the whole DAG API to ensure it is instrumentable
> (today the dag
> > > uses a tons of static utilities and internals which makes it not
> > > industrializable at all as soon as you are just on top of beam)
> > > 2. reworking the API definition in its own module not coupled to
> any
> > > implementation details (api/provider design) and 100% based on the
> sdf
> > > 3. rework the overall serialization (coders + transform
> serialization which is
> > > hardcoded today and not portable or industrializable at all)
> > > 4. make runners decorable properly and not just forked each time
> you need to
> > > modify some behavior for a particular case
> > > (+ indeed all the issues we hit and saw on the list)
> > >
> > > Romain Manni-Bucau
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org 
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: beam 3?

2018-03-21 Thread Jean-Baptiste Onofré
Hi,

Starting from scratch is an option, but don't you think it's a huge effort?
Anyway, we will reuse part of the existing codebase.

Let's see what the team is thinking.

Regards
JB

On 03/21/2018 09:26 AM, Romain Manni-Bucau wrote:
> 
> 
> 2018-03-21 9:13 GMT+01:00 Jean-Baptiste Onofré  >:
> 
> Hi Romain,
> 
> We didn't define a date yet.
> 
> However, I think it makes lot of sense to think about that.
> 
> What about creating a beam-2.x branch and move master version to
> 3.0.0-SNAPSHOT ?
> 
> 
> Do we want to "move" master or start fresh to avoid to pay again the legacy
> which prevents us to move correctly forward ATM?
> I really wonder if starting from scratch and only bringing stable and well
> defined API wouldn't be saner after 10 months working with beam.
> 
> Most of the actual logic will be importable easily when needed (I'm thinking 
> to
> the runner) but just bumping the major will keep the same pitfall in the
> codebase which are very basic design issues IMHO.
>  
> 
> 
> I almost agree with your point even if I would suggest you to use a more
> positive tone: being sharp never encourage the community, contribution 
> and don't
> motivate people. You can say things but with some friendly form.
> 
> 
> Read it as it is "I'm tired to always workaround the API" ;). Generally I 
> send a
> mail after some hard fight and disappointment so mea culpa for this one.
>  
> 
> 
> I would add:
> 
> - Schema or PCollection: it's already started but I think we could do some
> improvements (potentially introducing some API change)
> - Hints/Annotations on PCollection: it's something we discussed during 
> Beam
> Summit with Tyler and others. The idea is to mimic the Message Headers in 
> Apache
> Camel. It would allow us to have more dynamic IOs and transforms, and 
> give some
> additional statements to the runners.
> 
> 
> +1
>  
> 
> 
> I'm proposing to start a vote to create the 2.x branch and move master to 
> Beam
> 3.0.0-SNAPSHOT as join effort.
> 
> Regards
> JB
> 
> On 03/21/2018 08:36 AM, Romain Manni-Bucau wrote:
> > Hi guys,
> >
> > it got mentionned but without any concrete dates: when beam 3 work will 
> be started?
> >
> > I'm very interested in:
> >
> > 1. reworking the whole DAG API to ensure it is instrumentable (today 
> the dag
> > uses a tons of static utilities and internals which makes it not
> > industrializable at all as soon as you are just on top of beam)
> > 2. reworking the API definition in its own module not coupled to any
> > implementation details (api/provider design) and 100% based on the sdf
> > 3. rework the overall serialization (coders + transform serialization 
> which is
> > hardcoded today and not portable or industrializable at all)
> > 4. make runners decorable properly and not just forked each time you 
> need to
> > modify some behavior for a particular case
> > (+ indeed all the issues we hit and saw on the list)
> >
> > Romain Manni-Bucau
> 
> --
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: beam 3?

2018-03-21 Thread Romain Manni-Bucau
2018-03-21 9:13 GMT+01:00 Jean-Baptiste Onofré :

> Hi Romain,
>
> We didn't define a date yet.
>
> However, I think it makes lot of sense to think about that.
>
> What about creating a beam-2.x branch and move master version to
> 3.0.0-SNAPSHOT ?
>

Do we want to "move" master, or start fresh to avoid paying again for the
legacy that prevents us from moving forward correctly at the moment?
After 10 months working with Beam, I really wonder whether starting from
scratch and only bringing over stable, well-defined APIs wouldn't be saner.

Most of the actual logic will be easy to import when needed (I'm
thinking of the runner), but just bumping the major version will keep the
same pitfalls in the codebase, which are very basic design issues IMHO.


>
> I almost agree with your point even if I would suggest you to use a more
> positive tone: being sharp never encourage the community, contribution and
> don't
> motivate people. You can say things but with some friendly form.
>

Read it for what it is: "I'm tired of always working around the API" ;).
I generally send a mail after some hard fight and disappointment, so mea
culpa for this one.


>
> I would add:
>
> - Schema or PCollection: it's already started but I think we could do some
> improvements (potentially introducing some API change)
> - Hints/Annotations on PCollection: it's something we discussed during Beam
> Summit with Tyler and others. The idea is to mimic the Message Headers in
> Apache
> Camel. It would allow us to have more dynamic IOs and transforms, and give
> some
> additional statements to the runners.
>

+1


>
> I'm proposing to start a vote to create the 2.x branch and move master to
> Beam
> 3.0.0-SNAPSHOT as join effort.
>
> Regards
> JB
>
> On 03/21/2018 08:36 AM, Romain Manni-Bucau wrote:
> > Hi guys,
> >
> > it got mentionned but without any concrete dates: when beam 3 work will
> be started?
> >
> > I'm very interested in:
> >
> > 1. reworking the whole DAG API to ensure it is instrumentable (today the
> dag
> > uses a tons of static utilities and internals which makes it not
> > industrializable at all as soon as you are just on top of beam)
> > 2. reworking the API definition in its own module not coupled to any
> > implementation details (api/provider design) and 100% based on the sdf
> > 3. rework the overall serialization (coders + transform serialization
> which is
> > hardcoded today and not portable or industrializable at all)
> > 4. make runners decorable properly and not just forked each time you
> need to
> > modify some behavior for a particular case
> > (+ indeed all the issues we hit and saw on the list)
> >
> > Romain Manni-Bucau
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: beam 3?

2018-03-21 Thread Jean-Baptiste Onofré
Hi Romain,

We didn't define a date yet.

However, I think it makes a lot of sense to think about that.

What about creating a beam-2.x branch and moving the master version to
3.0.0-SNAPSHOT?

I almost agree with your point, even if I would suggest you use a more
positive tone: being sharp never encourages the community or contributions,
and doesn't motivate people. You can say things, but in a friendly form.

I would add:

- Schema or PCollection: it's already started, but I think we could make some
improvements (potentially introducing some API changes)
- Hints/Annotations on PCollection: it's something we discussed during Beam
Summit with Tyler and others. The idea is to mimic the Message Headers in Apache
Camel. It would allow us to have more dynamic IOs and transforms, and give some
additional statements to the runners.
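
To make the hints/headers idea above more concrete, here is a minimal sketch in plain Python. It is not a Beam or Camel API: `HintedElement`, `route_by_header`, and the `destination` header name are all hypothetical, chosen only to illustrate how per-element headers could drive a more dynamic IO.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable


@dataclass(frozen=True)
class HintedElement:
    """Hypothetical wrapper: a payload plus free-form headers (hints)."""
    value: Any
    headers: Dict[str, str] = field(default_factory=dict)


def route_by_header(
    elements: Iterable[HintedElement],
    header: str,
    sinks: Dict[str, Callable[[Any], None]],
) -> None:
    """Send each payload to the sink named by its header; skip unknown targets."""
    for element in elements:
        sink = sinks.get(element.headers.get(header))
        if sink is not None:
            sink(element.value)


# Usage: two in-memory "sinks" chosen per element at run time.
events, errors = [], []
route_by_header(
    [
        HintedElement("a", {"destination": "events"}),
        HintedElement("b", {"destination": "errors"}),
        HintedElement("c"),  # no header: not routed anywhere
    ],
    header="destination",
    sinks={"events": events.append, "errors": errors.append},
)
print(events, errors)
```

In this sketch the headers travel with each element rather than with the transform, which is the property that would let an IO pick its target dynamically.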

I'm proposing to start a vote to create the 2.x branch and move master to Beam
3.0.0-SNAPSHOT as a joint effort.

Regards
JB

On 03/21/2018 08:36 AM, Romain Manni-Bucau wrote:
> Hi guys,
> 
> it got mentioned but without any concrete dates: when will the Beam 3 work
> be started?
> 
> I'm very interested in:
> 
> 1. reworking the whole DAG API to ensure it is instrumentable (today the DAG
> uses tons of static utilities and internals, which makes it not
> industrializable at all as soon as you are just on top of Beam)
> 2. reworking the API definition in its own module, not coupled to any
> implementation details (api/provider design) and 100% based on the SDF
> 3. reworking the overall serialization (coders + transform serialization,
> which is hardcoded today and not portable or industrializable at all)
> 4. making runners properly decorable and not just forked each time you need
> to modify some behavior for a particular case
> (+ indeed all the issues we hit and saw on the list)
> 
> Romain Manni-Bucau
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com