Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Jean-Baptiste Onofré

Just a small thing.

If it's not already done, don't forget to sign an ICLA and let us know
your Apache ID.


Thanks,
Regards
JB

On 10/22/2016 12:18 AM, Davor Bonaci wrote:

Hi everyone,
Please join me and the rest of Beam PPMC in welcoming the following
contributors as our newest committers. They have significantly contributed
to the project in different ways, and we look forward to many more
contributions in the future.

* Thomas Weise
Thomas authored the Apache Apex runner for Beam [1]. This is an exciting
new runner that opens up a new user base. It is a large contribution, which
starts a whole new component with great potential.

* Jesse Anderson
Jesse has contributed significantly by promoting Beam. He has co-developed
a Beam tutorial and delivered it at a top big data conference. He published
several blog posts positioning Beam, a Q&A with the Apache Beam team, and a
demo video on how to run Beam on multiple runners [2]. On the side, he has
authored 7 pull requests and reported 6 JIRA issues.

* Thomas Groh
Since starting incubation, Thomas has contributed the most commits to the
project [3], a total of 226. He has contributed broadly to the project,
most significantly by developing from scratch the DirectRunner, which
supports the full model semantics. Additionally, he has contributed a new
set of APIs for testing unbounded pipelines. He published a blog post
highlighting this work.

Congratulations to all three! Welcome!

Davor

[1] https://github.com/apache/incubator-beam/tree/apex-runner
[2] http://www.smokinghand.com/
[3] https://github.com/apache/incubator-beam/graphs/contributors?from=2016-02-01&to=2016-10-14&type=c



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Deferring (pre) combine for merging windows.

2016-10-21 Thread Robert Bradshaw
Combine.perKey() is defined as GroupByKey() | Combine.values().

A runner is free, in fact encouraged, to take advantage of the
associative properties of CombineFn to compute the result of
GroupByKey() | Combine.values() as cheaply as possible, but it is
incorrect to produce something that could not have been produced by
this composite implementation. (In the case of deterministic trigger
firing (e.g. the default trigger), and assuming of course an
associative, deterministic CombineFn, there is exactly one correct
output for every input no matter the WindowFns.)

A corollary to this is that we cannot apply combining operations that
inspect the main input window (including side inputs whose window mapping
is anything but a constant map, such as the mapping to GlobalWindow) until
the main input window is known.
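
As a rough illustration of the point above, the following sketch (plain Python, not the Beam API) compares the reference semantics, group by key and then combine the values, with a "lifted" version that pre-combines inside each bundle first. The helper names and the sample bundles are made up for this example; it only assumes an associative combine function.

```python
from collections import defaultdict
from functools import reduce


def group_by_key(pairs):
    """Reference grouping: collect every value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def combine_values(groups, combine):
    """Fold each key's values with a binary, associative `combine`."""
    return {key: reduce(combine, values) for key, values in groups.items()}


def lifted_combine_per_key(bundles, combine):
    """Pre-combine inside each bundle, then merge the partial results."""
    partials = []
    for bundle in bundles:
        partials.extend(combine_values(group_by_key(bundle), combine).items())
    return combine_values(group_by_key(partials), combine)


def add(a, b):
    return a + b


# Hypothetical input, split into two bundles by the runner.
bundles = [[("k1", 1), ("k2", 10)], [("k1", 2), ("k1", 3), ("k2", 20)]]
all_pairs = [pair for bundle in bundles for pair in bundle]

reference = combine_values(group_by_key(all_pairs), add)
lifted = lifted_combine_per_key(bundles, add)
print(reference == lifted)
```

The lifted version is only a valid optimization because it yields a result the composite could also have produced; anything that depends on the final window breaks that equivalence if it runs before the window is known.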




Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Jesse Anderson
Thanks for the welcomes everyone!



Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Mark Liu
Congrats to all of you!

Mark



Re: [DISCUSS] Deferring (pre) combine for merging windows.

2016-10-21 Thread Amit Sela
Please excuse my typos and apply "s/differ/defer/g" ;-).
Amit.



Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Frances Perry
Wonderful to see your contributions recognized ;-)


Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Ahmet Altay
Congratulations to all of you!

Ahmet

On Fri, Oct 21, 2016 at 3:35 PM, Ben Chambers 
wrote:

> Congrats. +3!
>


Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Neelesh Salian
Congratulations folks!



-- 
Neelesh Srinivas Salian
Customer Operations Engineer


Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Kenneth Knowles
Huzzah!

I've personally enjoyed working together, and I am glad to extend this
acknowledgement and welcome this addition to the Beam community.

Kenn



[ANNOUNCEMENT] New committers!

2016-10-21 Thread Davor Bonaci
Hi everyone,
Please join me and the rest of Beam PPMC in welcoming the following
contributors as our newest committers. They have significantly contributed
to the project in different ways, and we look forward to many more
contributions in the future.

* Thomas Weise
Thomas authored the Apache Apex runner for Beam [1]. This is an exciting
new runner that opens up a new user base. It is a large contribution, which
starts a whole new component with great potential.

* Jesse Anderson
Jesse has contributed significantly by promoting Beam. He has co-developed
a Beam tutorial and delivered it at a top big data conference. He published
several blog posts positioning Beam, a Q&A with the Apache Beam team, and a
demo video on how to run Beam on multiple runners [2]. On the side, he has
authored 7 pull requests and reported 6 JIRA issues.

* Thomas Groh
Since starting incubation, Thomas has contributed the most commits to the
project [3], a total of 226. He has contributed broadly to the project,
most significantly by developing from scratch the DirectRunner, which
supports the full model semantics. Additionally, he has contributed a new
set of APIs for testing unbounded pipelines. He published a blog post
highlighting this work.

Congratulations to all three! Welcome!

Davor

[1] https://github.com/apache/incubator-beam/tree/apex-runner
[2] http://www.smokinghand.com/
[3] https://github.com/apache/incubator-beam/graphs/contributors?from=2016-02-01&to=2016-10-14&type=c


Re: Jenkins build is unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #1388

2016-10-21 Thread Jason Kuster
Filed https://issues.apache.org/jira/browse/BEAM-795 to track

On Fri, Oct 21, 2016 at 6:08 AM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See  GoogleCloudDataflow/1388/>
>
>


-- 
---
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow


[DISCUSS] Deferring (pre) combine for merging windows.

2016-10-21 Thread Amit Sela
I'd like to raise an issue that was discussed in BEAM-696.
I won't recap here because it would be extensive (and probably exhaustive),
and I'd also like to restart the discussion here rather than summarize it.

*The problem*
In the case of a (main) input in a merging window (e.g. Sessions) with
sideInputs, pre-combining might lead to non-deterministic behaviour. For
example:
Main input: e1 (time: 3), e2 (time: 5)
Session: gap duration of 3 -> e1 alone belongs to [3, 6), e2 alone to [5, 8);
merging their windows yields [3, 8).
Matching sideInputs with FixedWindows of size 2 should yield: e1 matching
sideInput window [4, 6), e2 [6, 8), merged [6, 8).
Now, if the sideInput is used in a merging step of the combine, and both
elements are part of the same bundle, the sideInput accessed will
correspond to [6, 8), which is the expected behaviour. But if e1 is
pre-combined in a separate bundle, it will access the sideInput for [4, 6),
which is wrong.
** this tends to be a bit confusing, so any clarifications/corrections
are most welcome.*
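
The window arithmetic in this example can be sketched in a few lines of plain Python (not the Beam API). This is only an illustration under stated assumptions: timestamps are integers, windows are half-open (start, end) tuples, and the side input window is taken to be the fixed window containing the main window's last instant, which is the mapping that reproduces the numbers above. All function names are made up for the sketch.

```python
def session_window(timestamp, gap):
    """Proto-session for a single element: [timestamp, timestamp + gap)."""
    return (timestamp, timestamp + gap)


def merge_sessions(windows):
    """Merge overlapping half-open windows, as a session WindowFn would."""
    merged = []
    for start, end in sorted(windows):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def side_input_window(main_window, size):
    """Assumed mapping: fixed window containing the main window's last instant."""
    last_instant = main_window[1] - 1  # integer timestamps, half-open window
    start = last_instant - last_instant % size
    return (start, start + size)


e1 = session_window(3, gap=3)        # (3, 6)
e2 = session_window(5, gap=3)        # (5, 8)
merged = merge_sessions([e1, e2])    # [(3, 8)]

for label, window in [("e1 alone", e1), ("e2 alone", e2), ("merged", merged[0])]:
    print(label, window, "->", side_input_window(window, size=2))
```

Pre-combining e1 on its own reads the side input for (4, 6), while the merged session reads (6, 8), which is the mismatch described above.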

*Solutions*
The optimal solution would be to defer until trigger in the case of merging
windows with sideInputs that are not "agnostic" to such behaviour, but this
is clearly not feasible since the nature and use of sideInputs in
CombineFns are opaque.
Second best would be to defer until trigger *only* if sideInputs are used
for merging windows - pretty sure this is how the Flink and Dataflow (soon
Spark) runners do that.

*Tradeoffs*
This seems like a very user-friendly way to apply authored pipelines
correctly, but this also means that users who called for a Combine
transformation will get a Grouping transformation instead (sort of the
opposite of combiner lifting? a combiner unwrapping?).
For the SDK, Combine is simply a composite transform, but keep in mind that
this affects runner optimization.
The price to pay here is that (1) all elements are shuffled into a single
bundle (the cost varies according to a runner's typical bundle size) and
(2) state can grow, as processing is deferred and not compacted until
triggered.

IMHO, the execution should remain faithful to what the pipeline states, and
if this results in errors, well... it happens.
There are many legitimate use cases where an actual GroupByKey should be
used (regardless of sideInputs), such as sequencing of events in a window,
and I don't see the difference here.

As stated above, I'm (almost) not recapping anyone's notes as they are
persisted in BEAM-696, so if you had something to say please provide your
input here.
I will note that Ben Chambers and Pei He mentioned that even with
deferring, this could still run into some non-determinism if there are
triggers controlling when we extract output because non-merging windows'
trigger firing is non-deterministic.

Thanks,
Amit


Re: Start of release 0.3.0-incubating

2016-10-21 Thread Maximilian Michels
+1 for the release. We have plenty of fixes in and users have already
asked for a new release.

-Max


On Fri, Oct 21, 2016 at 10:22 AM, Jean-Baptiste Onofré  
wrote:
> Hi Aljoscha,
>
> OK for me, you can go ahead ;)
>
> Thanks again for tackling this release!
>
> Regards
> JB
>
>
> On 10/21/2016 08:51 AM, Aljoscha Krettek wrote:
>>
>> +1 @JB
>>
>> We should definitely keep that in mind for the next releases. I think this
>> one is now sufficiently announced so I'll get started on the process.
>> (Which will take me a while since I have to do all the initial setup.)
>>
>>
>>
>> On Fri, 21 Oct 2016 at 06:32 Jean-Baptiste Onofré  wrote:
>>
>>> Hi Dan,
>>>
>>> No problem, MQTT and other IOs will be in the next release.
>>>
>>> IMHO, it would be great to have:
>>> 1. A release reminder a couple of days before a release, just to ask
>>> everyone if there's any objection (something like this:
>>> https://lists.apache.org/thread.html/80de75df0115940ca402132338b221e5dd5f669fd1bf915cd95e15c3@%3Cdev.karaf.apache.org%3E
>>> )
>>> 2. A rough release schedule on the website (something like this:
>>> http://karaf.apache.org/download.html#container-schedule for instance).
>>>
>>> Just my $0.01 ;)
>>>
>>> Regards
>>> JB
>>>
>>> On 10/20/2016 06:30 PM, Dan Halperin wrote:

>>>> Hi JB,
>>>>
>>>> This is a great discussion to have! IMO, there's no special functionality
>>>> requirements for these pre-TLP releases. It's more important to make sure
>>>> we keep the process going. (I think we should start the release as soon as
>>>> possible, because it's been 2 months since the last one.)
>>>>
>>>> If we hold a release a week for MQTT, we'll hold it another week for some
>>>> other new feature, and then hold it again for some other new feature.
>>>>
>>>> Can you make a strong argument for why MQTT in particular should be release
>>>> blocking?
>>>>
>>>> Dan
>>>>
>>>> On Thu, Oct 20, 2016 at 9:26 AM, Jean-Baptiste Onofré
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Thanks Aljosha !!
>>>>>
>>>>> Do you mind to wait the week end or Monday to start the release ? I would
>>>>> like to include MqttIO if possible.
>>>>>
>>>>> Thanks !
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On Oct 20, 2016, 18:07, at 18:07, Dan Halperin wrote:
>>>>>>
>>>>>> On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> thanks for taking the time and writing this extensive doc!
>>>>>>>
>>>>>>> If no-one is against this I would like to be the release manager for the
>>>>>>> next (0.3.0-incubating) release. I would work with the guide and update it
>>>>>>> with anything that I learn along the way. Should I open a new thread for
>>>>>>> this or is it ok of nobody objects here?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Aljoscha
>>>>>>>
>>>>>>
>>>>>> Spinning this out as a separate thread.
>>>>>>
>>>>>> +1 -- Sounds great to me!
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> thanks for taking the time and writing this extensive doc!
>>>>>>>
>>>>>>> If no-one is against this I would like to be the release manager for the
>>>>>>> next (0.3.0-incubating) release. I would work with the guide and update it
>>>>>>> with anything that I learn along the way. Should I open a new thread for
>>>>>>> this or is it ok of nobody objects here?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Aljoscha
>>>>>>>
>>>>>>> On Thu, 20 Oct 2016 at 07:10 Jean-Baptiste Onofré wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> well done.
>>>>>>>>
>>>>>>>> As already discussed, it looks good to me ;)
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On 10/20/2016 01:24 AM, Davor Bonaci wrote:
>>>>>>>>>
>>>>>>>>> Hi everybody,
>>>>>>>>> As a project, I think we should have a Release Guide to document the
>>>>>>>>> process, have consistent releases, on-board additional release managers,
>>>>>>>>> and generally share knowledge. It is also one of the project graduation
>>>>>>>>> guidelines.
>>>>>>>>>
>>>>>>>>> Dan and I wrote a draft version, documenting the process we did for the
>>>>>>>>> first two releases. It is currently in a pull request [1]. I'd invite
>>>>>>>>> everyone interested to take a peek and comment, either on the pull request
>>>>>>>>> itself or here on mailing list, as appropriate.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Davor
>>>>>>>>>
>>>>>>>>> [1] https://github.com/apache/incubator-beam-site/pull/49
>

 

Re: Tracking backward-incompatible changes for Beam

2016-10-21 Thread Jean-Baptiste Onofré

Hi Dan,

+1, good idea.

Regards
JB

On 10/21/2016 02:21 AM, Dan Halperin wrote:

Hey everyone,

In the Beam codebase, we’ve improved, rewritten, or deleted many APIs.
While this has improved the model and gives us great freedom to experiment,
we are also causing churn on users authoring Beam libraries and pipelines.

To really kick off Beam as something users can depend on, we need to
stabilize the Beam API. Stabilizing means a commitment to not making
breaking changes -- except between major versions as per standard semantic
versioning.

To get there, I’ve started a process for tracking these changes by applying
the `backward-incompatible` label [1] to the corresponding JIRA issues.
Naturally, open `backward-incompatible` changes are “blocking issues” for
the first stable release. (Or we’ll have to put them off for the next major
version!)

So here are some requests for help:
* Please review and appropriately label the components I skipped:
runner-{apex, flink, gearpump, spark}, sdk-py.
* Please proactively file JIRA issues for breaking API changes you still
want to make, and label them.

Thanks everyone!
Dan


[1]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20labels%20%3D%20backward-incompatible
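
The search link in [1] is a percent-encoded JQL query. As a minimal sketch of how such a link can be built for any project and label, the snippet below uses only the Python standard library. The function name `label_search_url` and the constant `JIRA_SEARCH_BASE` are hypothetical names chosen for illustration; the base URL and the query text are taken from the link above.

```python
from urllib.parse import quote

# Base of the issue-search page, copied from the link in [1].
JIRA_SEARCH_BASE = "https://issues.apache.org/jira/issues/"


def label_search_url(project: str, label: str) -> str:
    """Return a JIRA search URL for issues in `project` carrying `label`.

    Hypothetical helper for illustration; it only formats a URL and does
    not contact JIRA.
    """
    jql = f"project = {project} AND labels = {label}"
    # quote() percent-encodes spaces as %20 and '=' as %3D;
    # letters, digits and '-' are left unchanged.
    return f"{JIRA_SEARCH_BASE}?jql={quote(jql, safe='')}"


if __name__ == "__main__":
    print(label_search_url("BEAM", "backward-incompatible"))
```

Calling it with `"BEAM"` and `"backward-incompatible"` reproduces the link in [1]; other labels or projects can be substituted in the same way.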



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Start of release 0.3.0-incubating

2016-10-21 Thread Jean-Baptiste Onofré

Hi Aljoscha,

OK for me, you can go ahead ;)

Thanks again for tackling this release!

Regards
JB

On 10/21/2016 08:51 AM, Aljoscha Krettek wrote:

+1 @JB

We should definitely keep that in mind for the next releases. I think this
one is now sufficiently announced so I'll get started on the process.
(Which will take me a while since I have to do all the initial setup.)



On Fri, 21 Oct 2016 at 06:32 Jean-Baptiste Onofré  wrote:


Hi Dan,

No problem, MQTT and other IOs will be in the next release.

IMHO, it would be great to have:
1. A release reminder a couple of days before a release, just to ask
everyone whether there are any objections (something like this:

https://lists.apache.org/thread.html/80de75df0115940ca402132338b221e5dd5f669fd1bf915cd95e15c3@%3Cdev.karaf.apache.org%3E
)
2. A rough release schedule on the website (something like this:
http://karaf.apache.org/download.html#container-schedule for instance).

Just my $0.01 ;)

Regards
JB

On 10/20/2016 06:30 PM, Dan Halperin wrote:

Hi JB,

This is a great discussion to have! IMO, there are no special functionality
requirements for these pre-TLP releases. It's more important to make sure
we keep the process going. (I think we should start the release as soon as
possible, because it's been 2 months since the last one.)

If we hold a release a week for MQTT, we'll hold it another week for some
other new feature, and then hold it again for some other new feature.

Can you make a strong argument for why MQTT in particular should be release
blocking?

Dan

On Thu, Oct 20, 2016 at 9:26 AM, Jean-Baptiste Onofré wrote:


+1

Thanks Aljoscha!!

Do you mind waiting until the weekend or Monday to start the release? I
would like to include MqttIO if possible.

Thanks!
Regards
JB

On Oct 20, 2016, 18:07, at 18:07, Dan Halperin wrote:

On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek wrote:


Hi,
thanks for taking the time and writing this extensive doc!

If no-one is against this I would like to be the release manager for the
next (0.3.0-incubating) release. I would work with the guide and update it
with anything that I learn along the way. Should I open a new thread for
this or is it ok if nobody objects here?

Cheers,
Aljoscha



Spinning this out as a separate thread.

+1 -- Sounds great to me!

Dan

On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek wrote:


Hi,
thanks for taking the time and writing this extensive doc!

If no-one is against this I would like to be the release manager for the
next (0.3.0-incubating) release. I would work with the guide and update it
with anything that I learn along the way. Should I open a new thread for
this or is it ok if nobody objects here?

Cheers,
Aljoscha

On Thu, 20 Oct 2016 at 07:10 Jean-Baptiste Onofré wrote:



Hi,

well done.

As already discussed, it looks good to me ;)

Regards
JB

On 10/20/2016 01:24 AM, Davor Bonaci wrote:

Hi everybody,
As a project, I think we should have a Release Guide to document the
process, have consistent releases, on-board additional release managers,
and generally share knowledge. It is also one of the project graduation
guidelines.

Dan and I wrote a draft version, documenting the process we did for the
first two releases. It is currently in a pull request [1]. I'd invite
everyone interested to take a peek and comment, either on the pull request
itself or here on mailing list, as appropriate.

Thanks,
Davor

[1] https://github.com/apache/incubator-beam-site/pull/49



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com









--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Start of release 0.3.0-incubating

2016-10-21 Thread Aljoscha Krettek
+1 @JB

We should definitely keep that in mind for the next releases. I think this
one is now sufficiently announced so I'll get started on the process.
(Which will take me a while since I have to do all the initial setup.)



On Fri, 21 Oct 2016 at 06:32 Jean-Baptiste Onofré  wrote:

> Hi Dan,
>
> No problem, MQTT and other IOs will be in the next release.
>
> IMHO, it would be great to have:
> 1. A release reminder a couple of days before a release, just to ask
> everyone whether there are any objections (something like this:
>
> https://lists.apache.org/thread.html/80de75df0115940ca402132338b221e5dd5f669fd1bf915cd95e15c3@%3Cdev.karaf.apache.org%3E
> )
> 2. A rough release schedule on the website (something like this:
> http://karaf.apache.org/download.html#container-schedule for instance).
>
> Just my $0.01 ;)
>
> Regards
> JB
>
> On 10/20/2016 06:30 PM, Dan Halperin wrote:
> > Hi JB,
> >
> > This is a great discussion to have! IMO, there are no special functionality
> > requirements for these pre-TLP releases. It's more important to make sure
> > we keep the process going. (I think we should start the release as soon as
> > possible, because it's been 2 months since the last one.)
> >
> > If we hold a release a week for MQTT, we'll hold it another week for some
> > other new feature, and then hold it again for some other new feature.
> >
> > Can you make a strong argument for why MQTT in particular should be release
> > blocking?
> >
> > Dan
> >
> > On Thu, Oct 20, 2016 at 9:26 AM, Jean-Baptiste Onofré wrote:
> >
> >> +1
> >>
> >> Thanks Aljoscha!!
> >>
> >> Do you mind waiting until the weekend or Monday to start the release? I
> >> would like to include MqttIO if possible.
> >>
> >> Thanks!
> >> Regards
> >> JB
> >>
> >> On Oct 20, 2016, 18:07, at 18:07, Dan Halperin wrote:
> >>> On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek wrote:
> >>>
> >>>> Hi,
> >>>> thanks for taking the time and writing this extensive doc!
> >>>>
> >>>> If no-one is against this I would like to be the release manager for the
> >>>> next (0.3.0-incubating) release. I would work with the guide and update it
> >>>> with anything that I learn along the way. Should I open a new thread for
> >>>> this or is it ok if nobody objects here?
> >>>>
> >>>> Cheers,
> >>>> Aljoscha
> >>>>
> >>>
> >>> Spinning this out as a separate thread.
> >>>
> >>> +1 -- Sounds great to me!
> >>>
> >>> Dan
> >>>
> >>> On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek wrote:
> >>>
> >>>> Hi,
> >>>> thanks for taking the time and writing this extensive doc!
> >>>>
> >>>> If no-one is against this I would like to be the release manager for the
> >>>> next (0.3.0-incubating) release. I would work with the guide and update it
> >>>> with anything that I learn along the way. Should I open a new thread for
> >>>> this or is it ok if nobody objects here?
> >>>>
> >>>> Cheers,
> >>>> Aljoscha
> >>>>
> >>>> On Thu, 20 Oct 2016 at 07:10 Jean-Baptiste Onofré wrote:
> 
> > Hi,
> >
> > well done.
> >
> > As already discussed, it looks good to me ;)
> >
> > Regards
> > JB
> >
> > On 10/20/2016 01:24 AM, Davor Bonaci wrote:
> >> Hi everybody,
> >> As a project, I think we should have a Release Guide to document the
> >> process, have consistent releases, on-board additional release managers,
> >> and generally share knowledge. It is also one of the project graduation
> >> guidelines.
> >>
> >> Dan and I wrote a draft version, documenting the process we did for the
> >> first two releases. It is currently in a pull request [1]. I'd invite
> >> everyone interested to take a peek and comment, either on the pull request
> >> itself or here on mailing list, as appropriate.
> >>
> >> Thanks,
> >> Davor
> >>
> >> [1] https://github.com/apache/incubator-beam-site/pull/49
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
> 
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>