Re: 2.7.1 (LTS) release?

2019-01-31 Thread Thomas Weise
Either looks fine to me. Same content, different label :)


On Thu, Jan 31, 2019 at 6:32 PM Michael Luckey  wrote:

> Thx Thomas for that clarification. I tried to express that I'd slightly prefer
> to have branches
>
> 2.7.x
> 2.8.x
> 2.9.x
>
> and tags:
> 2.7.0
> 2.7.1
> ...
>
> So only difference would be to be more explicit on the branch name, i.e.
> that it embraces all the patch versions. (I do not know how to better
> express, that '2.7.x' is a literal string and should not be confused as
> some placeholder.)
>
> Regarding the versioning, I always prefer the explicit version including
> patch version. It might make it easier to help and resolve issues if it is
> known which patch level a user is running. I have spent a lot of time
> assuming some version and realising later it was 'just another snapshot'
> version...
>
> Just my 2 ct... Also fine with the previous suggestion.
>
>
>
> On Fri, Feb 1, 2019 at 3:18 AM Thomas Weise  wrote:
>
>> Hi,
>>
>> As Kenn had already exemplified, the suggestion was to have branches:
>>
>> 2.7
>> 2.8
>> 2.9
>> ...
>>
>> and tags:
>>
>> 2.7.0
>> 2.7.1
>> ...
>> 2.8.0
>> ...
>>
>> Changes would go to the 2.7 branch, at some point release 2.7.1 is
>> created. Then more changes may accrue on the same branch, maybe at some
>> point 2.7.2 is released and so on.
>>
>> We could also consider changing the snapshot version to 2.7-SNAPSHOT,
>> instead of 2.7.{0,1,...}-SNAPSHOT.
>>
>> With that it wouldn't even be necessary to change the version number on
>> the branch.
>>
>> Thanks,
>> Thomas
>>
>>
>>
>> On Thu, Jan 31, 2019 at 5:59 PM Michael Luckey 
>> wrote:
>>
>>> Ah, sorry, I misread that.
>>>
>>> I slightly prefer the branch to have that '.x' suffix, as it is slightly
>>> more explicit. But technically there will be no difference.
>>>
>>> On Fri, Feb 1, 2019 at 2:55 AM Chamikara Jayalath 
>>> wrote:
>>>
 Sorry, what I meant was branches+tags for each minor version release
 and adding updates and tags to the same branch for patch releases. Name of
 the branch can be release-2.X for minor version release 2.X.0 as Thomas
 mentioned.

 - Cham

 On Thu, Jan 31, 2019 at 5:46 PM Michael Luckey 
 wrote:

> Maybe we should not go so far to name branches 2.x. This will probably
> make it difficult to support more than 1 LTS. Don't know, whether we ever
> intent to do so, but supporting 2.7 and 2.13 on a 2.x branch seems
> difficult?
>
> A more explicit 2.7.x with tags 2.7.1, 2.7.2 etc might be better? If
> we are going to support a second LTS later on, we could just add that
> 2.??.x branch.
>
> michel
>
> On Fri, Feb 1, 2019 at 2:37 AM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>> +1 for 2.x branches and tags for 2.x.y releases.
>>
>> Also, I think we should integrate the dependency upgrade
>> https://issues.apache.org/jira/browse/BEAM-6552 to 2.7.1 which fixes
>> a rare but critical bug.
>>
>> Thanks,
>> Cham
>>
>> On Thu, Jan 31, 2019 at 12:17 PM Kenneth Knowles 
>> wrote:
>>
>>> It makes sense to me that 2.7 is a branch and just tags for 2.7.0,
>>> 2.7.1, etc.
>>>
>>> On Thu, Jan 31, 2019 at 11:43 AM Thomas Weise 
>>> wrote:
>>>
 How about naming the branches release-X.Y and use them as base for
 all the X.Y.Z releases? We already have the X.Y.Z tags to refer to the
 actual release.

 On Thu, Jan 31, 2019 at 11:23 AM Charles Chen 
 wrote:

> I would be in favor of keeping the old 2.7.0 release branch / tag
> static so that referring to it will always get the right 2.7.0 code.
>
> On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles 
> wrote:
>
>> I have waffled on whether to have release-2.7 and only branch
>> release-2.7.1 when starting that release. I think that whenever we 
>> release
>> 2.7.n the branch for 2.7.(n+1) should start from exactly that point, 
>> no? Or
>> perhaps on release-2.7 branch the hardcoded version strings could be
>> 2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new 
>> release
>> branch? I guess I think either one is fine. I think starting the 
>> branch now
>> is smart, so that you can accumulate cherrypicks of backports.
>>
>> Kenn
>>
>> On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels <
>> m...@apache.org> wrote:
>>
>>> 2.10.0 will be done when its done. Same goes for 2.7.1, which is
>>> likely going to
>>> be done later since we are focusing on 2.10.0 at the moment.
>>>
>>> I've created the release-2.7.1 branch because there is no other
>>> place for fixes
>>> of future versions. It would be helpful to have a minor version
>>> branch (e.g.
>>> 

Re: Another new contributor!

2019-01-31 Thread Ankur Goenka
Welcome Brian!

On Fri, Feb 1, 2019 at 6:49 AM Mikhail Gryzykhin <
gryzykhin.mikh...@gmail.com> wrote:

> Welcome to the community!
>
> On Thu, Jan 31, 2019, 16:50 Alex Amato  wrote:
>
>> Great to start working with you Brian, welcome.
>>
>> On Thu, Jan 31, 2019 at 4:23 PM Brian Hulette 
>> wrote:
>>
>>> Can I get bhulette@ added to the BEAM project in Jira?
>>>
>>> Gleb - I'd definitely be interested in seeing that discussion, is it
>>> available somewhere in the archives? I can't find it.
>>>
>>> On Thu, Jan 31, 2019 at 5:13 AM Gleb Kanterov  wrote:
>>>
 Welcome! Would be interesting to hear your thoughts on Arrow, Arrow
 Flight, and Beam Portability relation, this topic was recently discussed in
 dev@.

 On Thu, Jan 31, 2019 at 2:00 PM Ismaël Mejía  wrote:

> Welcome Brian!
> Great to have someone with Apache experience already and also with
> Arrow knowledge.
>
> On Thu, Jan 31, 2019 at 1:32 PM Maximilian Michels 
> wrote:
> >
> > Welcome! Arrow and Beam together would open lots of possibilities.
> Portability
> > documentation improvements would be much appreciated :)
> >
> > On 31.01.19 11:25, Łukasz Gajowy wrote:
> > > Welcome!
> > >
> > > Thu, 31 Jan 2019 at 02:40 Kenneth Knowles  > > > wrote:
> > >
> > > Welcome!
> > >
> > > On Wed, Jan 30, 2019, 17:30 Connell O'Callaghan <
> conne...@google.com
> > >  wrote:
> > >
> > > Welcome on board Brian!
> > >
> > > On Wed, Jan 30, 2019 at 5:29 PM Ahmet Altay <
> al...@google.com
> > > > wrote:
> > >
> > > Welcome Brian!
> > >
> > > On Wed, Jan 30, 2019 at 5:26 PM Brian Hulette <
> bhule...@google.com
> > > > wrote:
> > >
> > > Hi everyone,
> > > I'm Brian Hulette, I just switched roles at Google
> and I'll be
> > > contributing to Beam Portability as part of my new
> position. For
> > > now I'm just going through documentation and
> getting familiar
> > > with Beam from the user perspective, so if
> anything I'll just be
> > > suggesting minor edits to documentation, but I
> hope to be
> > > putting up PRs soon enough.
> > >
> > > I am also an Apache committer (bhulette is my ASF
> id and Jira
> > > username). I worked on the Arrow project's
> Javascript
> > > implementation in a previous job, and I'm really
> excited to look
> > > for ways to use Arrow and Beam together once I've
> ramped up.
> > >
> > > Brian
> > >
>


 --
 Cheers,
 Gleb

>>>


Re: 2.7.1 (LTS) release?

2019-01-31 Thread Michael Luckey
Thx Thomas for that clarification. I tried to express that I'd slightly prefer
to have branches

2.7.x
2.8.x
2.9.x

and tags:
2.7.0
2.7.1
...

So the only difference would be to be more explicit in the branch name, i.e.
that it covers all the patch versions. (I do not know how to express it
better: '2.7.x' is a literal string and should not be confused with a
placeholder.)

Regarding the versioning, I always prefer the explicit version including the
patch version. It might make it easier to help resolve issues if it is
known which patch level a user is running. I have spent a lot of time
assuming some version and realising later it was 'just another snapshot'
version...

Just my 2 cents... I'm also fine with the previous suggestion.



On Fri, Feb 1, 2019 at 3:18 AM Thomas Weise  wrote:

> Hi,
>
> As Kenn had already exemplified, the suggestion was to have branches:
>
> 2.7
> 2.8
> 2.9
> ...
>
> and tags:
>
> 2.7.0
> 2.7.1
> ...
> 2.8.0
> ...
>
> Changes would go to the 2.7 branch, at some point release 2.7.1 is
> created. Then more changes may accrue on the same branch, maybe at some
> point 2.7.2 is released and so on.
>
> We could also consider changing the snapshot version to 2.7-SNAPSHOT,
> instead of 2.7.{0,1,...}-SNAPSHOT.
>
> With that it wouldn't even be necessary to change the version number on
> the branch.
>
> Thanks,
> Thomas
>
>
>
> On Thu, Jan 31, 2019 at 5:59 PM Michael Luckey 
> wrote:
>
>> Ah, sorry, I misread that.
>>
>> I slightly prefer the branch to have that '.x' suffix, as it is slightly
>> more explicit. But technically there will be no difference.
>>
>> On Fri, Feb 1, 2019 at 2:55 AM Chamikara Jayalath 
>> wrote:
>>
>>> Sorry, what I meant was branches+tags for each minor version release and
>>> adding updates and tags to the same branch for patch releases. Name of the
>>> branch can be release-2.X for minor version release 2.X.0 as Thomas
>>> mentioned.
>>>
>>> - Cham
>>>
>>> On Thu, Jan 31, 2019 at 5:46 PM Michael Luckey 
>>> wrote:
>>>
 Maybe we should not go so far to name branches 2.x. This will probably
 make it difficult to support more than 1 LTS. Don't know, whether we ever
 intent to do so, but supporting 2.7 and 2.13 on a 2.x branch seems
 difficult?

 A more explicit 2.7.x with tags 2.7.1, 2.7.2 etc might be better? If we
 are going to support a second LTS later on, we could just add that 2.??.x
 branch.

 michel

 On Fri, Feb 1, 2019 at 2:37 AM Chamikara Jayalath 
 wrote:

> +1 for 2.x branches and tags for 2.x.y releases.
>
> Also, I think we should integrate the dependency upgrade
> https://issues.apache.org/jira/browse/BEAM-6552 to 2.7.1 which fixes
> a rare but critical bug.
>
> Thanks,
> Cham
>
> On Thu, Jan 31, 2019 at 12:17 PM Kenneth Knowles 
> wrote:
>
>> It makes sense to me that 2.7 is a branch and just tags for 2.7.0,
>> 2.7.1, etc.
>>
>> On Thu, Jan 31, 2019 at 11:43 AM Thomas Weise  wrote:
>>
>>> How about naming the branches release-X.Y and use them as base for
>>> all the X.Y.Z releases? We already have the X.Y.Z tags to refer to the
>>> actual release.
>>>
>>> On Thu, Jan 31, 2019 at 11:23 AM Charles Chen 
>>> wrote:
>>>
 I would be in favor of keeping the old 2.7.0 release branch / tag
 static so that referring to it will always get the right 2.7.0 code.

 On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles 
 wrote:

> I have waffled on whether to have release-2.7 and only branch
> release-2.7.1 when starting that release. I think that whenever we 
> release
> 2.7.n the branch for 2.7.(n+1) should start from exactly that point, 
> no? Or
> perhaps on release-2.7 branch the hardcoded version strings could be
> 2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new 
> release
> branch? I guess I think either one is fine. I think starting the 
> branch now
> is smart, so that you can accumulate cherrypicks of backports.
>
> Kenn
>
> On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels 
> wrote:
>
>> 2.10.0 will be done when its done. Same goes for 2.7.1, which is
>> likely going to
>> be done later since we are focusing on 2.10.0 at the moment.
>>
>> I've created the release-2.7.1 branch because there is no other
>> place for fixes
>> of future versions. It would be helpful to have a minor version
>> branch (e.g.
>> release-2.7) which can be continuously updated.
>>
>> More generally speaking, we should dedicate time for LTS
>> releases. What is the
>> point otherwise of having an LTS version?
>>
>> -Max
>>
>> On 31.01.19 16:28, Thomas Weise wrote:
>> > Since you were originally 

Re: 2.7.1 (LTS) release?

2019-01-31 Thread Kenneth Knowles
By that last bit of logic, wouldn't it also work for master to publish
2-SNAPSHOT? It feels a bit odd, though I don't have a concrete objection. I
expect it is easier for tools and our own scripts if we stick to 3-part
versions even when we don't have to.

Kenn
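
(A small illustrative sketch of that point, not from the thread: with a fixed
MAJOR.MINOR.PATCH scheme, release tooling can rely on a single pattern, whereas a
two-part 2.7-SNAPSHOT would need special-casing. The class and pattern below are
hypothetical, not existing Beam tooling.)

    import java.util.Arrays;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper, assuming every version string is MAJOR.MINOR.PATCH[-SUFFIX].
    final class VersionParser {
      private static final Pattern THREE_PART =
          Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-(.+))?");

      static int[] parse(String version) {
        Matcher m = THREE_PART.matcher(version);
        if (!m.matches()) {
          // A two-part "2.7-SNAPSHOT" would land here and need its own pattern.
          throw new IllegalArgumentException("Not a MAJOR.MINOR.PATCH version: " + version);
        }
        return new int[] {
          Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))
        };
      }

      public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("2.7.1")));           // [2, 7, 1]
        System.out.println(Arrays.toString(parse("2.10.0-SNAPSHOT"))); // [2, 10, 0]
      }
    }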

On Thu, Jan 31, 2019 at 6:18 PM Thomas Weise  wrote:

> Hi,
>
> As Kenn had already exemplified, the suggestion was to have branches:
>
> 2.7
> 2.8
> 2.9
> ...
>
> and tags:
>
> 2.7.0
> 2.7.1
> ...
> 2.8.0
> ...
>
> Changes would go to the 2.7 branch, at some point release 2.7.1 is
> created. Then more changes may accrue on the same branch, maybe at some
> point 2.7.2 is released and so on.
>
> We could also consider changing the snapshot version to 2.7-SNAPSHOT,
> instead of 2.7.{0,1,...}-SNAPSHOT.
>
> With that it wouldn't even be necessary to change the version number on
> the branch.
>
> Thanks,
> Thomas
>
>
>
> On Thu, Jan 31, 2019 at 5:59 PM Michael Luckey 
> wrote:
>
>> Ah, sorry, I misread that.
>>
>> I slightly prefer the branch to have that '.x' suffix, as it is slightly
>> more explicit. But technically there will be no difference.
>>
>> On Fri, Feb 1, 2019 at 2:55 AM Chamikara Jayalath 
>> wrote:
>>
>>> Sorry, what I meant was branches+tags for each minor version release and
>>> adding updates and tags to the same branch for patch releases. Name of the
>>> branch can be release-2.X for minor version release 2.X.0 as Thomas
>>> mentioned.
>>>
>>> - Cham
>>>
>>> On Thu, Jan 31, 2019 at 5:46 PM Michael Luckey 
>>> wrote:
>>>
 Maybe we should not go so far to name branches 2.x. This will probably
 make it difficult to support more than 1 LTS. Don't know, whether we ever
 intent to do so, but supporting 2.7 and 2.13 on a 2.x branch seems
 difficult?

 A more explicit 2.7.x with tags 2.7.1, 2.7.2 etc might be better? If we
 are going to support a second LTS later on, we could just add that 2.??.x
 branch.

 michel

 On Fri, Feb 1, 2019 at 2:37 AM Chamikara Jayalath 
 wrote:

> +1 for 2.x branches and tags for 2.x.y releases.
>
> Also, I think we should integrate the dependency upgrade
> https://issues.apache.org/jira/browse/BEAM-6552 to 2.7.1 which fixes
> a rare but critical bug.
>
> Thanks,
> Cham
>
> On Thu, Jan 31, 2019 at 12:17 PM Kenneth Knowles 
> wrote:
>
>> It makes sense to me that 2.7 is a branch and just tags for 2.7.0,
>> 2.7.1, etc.
>>
>> On Thu, Jan 31, 2019 at 11:43 AM Thomas Weise  wrote:
>>
>>> How about naming the branches release-X.Y and use them as base for
>>> all the X.Y.Z releases? We already have the X.Y.Z tags to refer to the
>>> actual release.
>>>
>>> On Thu, Jan 31, 2019 at 11:23 AM Charles Chen 
>>> wrote:
>>>
 I would be in favor of keeping the old 2.7.0 release branch / tag
 static so that referring to it will always get the right 2.7.0 code.

 On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles 
 wrote:

> I have waffled on whether to have release-2.7 and only branch
> release-2.7.1 when starting that release. I think that whenever we 
> release
> 2.7.n the branch for 2.7.(n+1) should start from exactly that point, 
> no? Or
> perhaps on release-2.7 branch the hardcoded version strings could be
> 2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new 
> release
> branch? I guess I think either one is fine. I think starting the 
> branch now
> is smart, so that you can accumulate cherrypicks of backports.
>
> Kenn
>
> On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels 
> wrote:
>
>> 2.10.0 will be done when its done. Same goes for 2.7.1, which is
>> likely going to
>> be done later since we are focusing on 2.10.0 at the moment.
>>
>> I've created the release-2.7.1 branch because there is no other
>> place for fixes
>> of future versions. It would be helpful to have a minor version
>> branch (e.g.
>> release-2.7) which can be continuously updated.
>>
>> More generally speaking, we should dedicate time for LTS
>> releases. What is the
>> point otherwise of having an LTS version?
>>
>> -Max
>>
>> On 31.01.19 16:28, Thomas Weise wrote:
>> > Since you were originally thinking of 2.9.x as target, 2.10.0
>> seems closer both
>> > in time and upgrade path.
>> >
>> > I see no reason why a 2.7.1 release would materialize any
>> sooner than 2.10.0.
>> >
>> > Or is the intention is to just stack up fixes in the 2.7.x
>> branch for a
>> > potential future release?
>> >
>> > Thomas
>> >
>> >
>> > On Thu, Jan 31, 2019 at 5:03 AM 

Re: 2.7.1 (LTS) release?

2019-01-31 Thread Thomas Weise
Hi,

As Kenn had already exemplified, the suggestion was to have branches:

2.7
2.8
2.9
...

and tags:

2.7.0
2.7.1
...
2.8.0
...

Changes would go to the 2.7 branch; at some point release 2.7.1 is created.
Then more changes may accrue on the same branch, and maybe at some point 2.7.2
is released, and so on.

We could also consider changing the snapshot version to 2.7-SNAPSHOT,
instead of 2.7.{0,1,...}-SNAPSHOT.

With that it wouldn't even be necessary to change the version number on the
branch.

Thanks,
Thomas



On Thu, Jan 31, 2019 at 5:59 PM Michael Luckey  wrote:

> Ah, sorry, I misread that.
>
> I slightly prefer the branch to have that '.x' suffix, as it is slightly
> more explicit. But technically there will be no difference.
>
> On Fri, Feb 1, 2019 at 2:55 AM Chamikara Jayalath 
> wrote:
>
>> Sorry, what I meant was branches+tags for each minor version release and
>> adding updates and tags to the same branch for patch releases. Name of the
>> branch can be release-2.X for minor version release 2.X.0 as Thomas
>> mentioned.
>>
>> - Cham
>>
>> On Thu, Jan 31, 2019 at 5:46 PM Michael Luckey 
>> wrote:
>>
>>> Maybe we should not go so far to name branches 2.x. This will probably
>>> make it difficult to support more than 1 LTS. Don't know, whether we ever
>>> intent to do so, but supporting 2.7 and 2.13 on a 2.x branch seems
>>> difficult?
>>>
>>> A more explicit 2.7.x with tags 2.7.1, 2.7.2 etc might be better? If we
>>> are going to support a second LTS later on, we could just add that 2.??.x
>>> branch.
>>>
>>> michel
>>>
>>> On Fri, Feb 1, 2019 at 2:37 AM Chamikara Jayalath 
>>> wrote:
>>>
 +1 for 2.x branches and tags for 2.x.y releases.

 Also, I think we should integrate the dependency upgrade
 https://issues.apache.org/jira/browse/BEAM-6552 to 2.7.1 which fixes a
 rare but critical bug.

 Thanks,
 Cham

 On Thu, Jan 31, 2019 at 12:17 PM Kenneth Knowles 
 wrote:

> It makes sense to me that 2.7 is a branch and just tags for 2.7.0,
> 2.7.1, etc.
>
> On Thu, Jan 31, 2019 at 11:43 AM Thomas Weise  wrote:
>
>> How about naming the branches release-X.Y and use them as base for
>> all the X.Y.Z releases? We already have the X.Y.Z tags to refer to the
>> actual release.
>>
>> On Thu, Jan 31, 2019 at 11:23 AM Charles Chen  wrote:
>>
>>> I would be in favor of keeping the old 2.7.0 release branch / tag
>>> static so that referring to it will always get the right 2.7.0 code.
>>>
>>> On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles 
>>> wrote:
>>>
 I have waffled on whether to have release-2.7 and only branch
 release-2.7.1 when starting that release. I think that whenever we 
 release
 2.7.n the branch for 2.7.(n+1) should start from exactly that point, 
 no? Or
 perhaps on release-2.7 branch the hardcoded version strings could be
 2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new 
 release
 branch? I guess I think either one is fine. I think starting the 
 branch now
 is smart, so that you can accumulate cherrypicks of backports.

 Kenn

 On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels 
 wrote:

> 2.10.0 will be done when its done. Same goes for 2.7.1, which is
> likely going to
> be done later since we are focusing on 2.10.0 at the moment.
>
> I've created the release-2.7.1 branch because there is no other
> place for fixes
> of future versions. It would be helpful to have a minor version
> branch (e.g.
> release-2.7) which can be continuously updated.
>
> More generally speaking, we should dedicate time for LTS releases.
> What is the
> point otherwise of having an LTS version?
>
> -Max
>
> On 31.01.19 16:28, Thomas Weise wrote:
> > Since you were originally thinking of 2.9.x as target, 2.10.0
> seems closer both
> > in time and upgrade path.
> >
> > I see no reason why a 2.7.1 release would materialize any sooner
> than 2.10.0.
> >
> > Or is the intention is to just stack up fixes in the 2.7.x
> branch for a
> > potential future release?
> >
> > Thomas
> >
> >
> > On Thu, Jan 31, 2019 at 5:03 AM Maximilian Michels <
> m...@apache.org
> > > wrote:
> >
> > I agree it's better to take some extra time to ensure the
> quality of 2.10.0.
> >
> > I've created a 2.7.1 branch and cherry-picked the relevant
> commits[1]. We could
> > start collecting other fixes in case there are any.
> >
> > -Max
> >
> > [1] https://github.com/apache/beam/pull/7687

Re: 2.7.1 (LTS) release?

2019-01-31 Thread Michael Luckey
Ah, sorry, I misread that.

I slightly prefer the branch to have the '.x' suffix, as it is a bit
more explicit. But technically there will be no difference.

On Fri, Feb 1, 2019 at 2:55 AM Chamikara Jayalath 
wrote:

> Sorry, what I meant was branches+tags for each minor version release and
> adding updates and tags to the same branch for patch releases. Name of the
> branch can be release-2.X for minor version release 2.X.0 as Thomas
> mentioned.
>
> - Cham
>
> On Thu, Jan 31, 2019 at 5:46 PM Michael Luckey 
> wrote:
>
>> Maybe we should not go so far to name branches 2.x. This will probably
>> make it difficult to support more than 1 LTS. Don't know, whether we ever
>> intent to do so, but supporting 2.7 and 2.13 on a 2.x branch seems
>> difficult?
>>
>> A more explicit 2.7.x with tags 2.7.1, 2.7.2 etc might be better? If we
>> are going to support a second LTS later on, we could just add that 2.??.x
>> branch.
>>
>> michel
>>
>> On Fri, Feb 1, 2019 at 2:37 AM Chamikara Jayalath 
>> wrote:
>>
>>> +1 for 2.x branches and tags for 2.x.y releases.
>>>
>>> Also, I think we should integrate the dependency upgrade
>>> https://issues.apache.org/jira/browse/BEAM-6552 to 2.7.1 which fixes a
>>> rare but critical bug.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Thu, Jan 31, 2019 at 12:17 PM Kenneth Knowles  wrote:
>>>
 It makes sense to me that 2.7 is a branch and just tags for 2.7.0,
 2.7.1, etc.

 On Thu, Jan 31, 2019 at 11:43 AM Thomas Weise  wrote:

> How about naming the branches release-X.Y and use them as base for all
> the X.Y.Z releases? We already have the X.Y.Z tags to refer to the actual
> release.
>
> On Thu, Jan 31, 2019 at 11:23 AM Charles Chen  wrote:
>
>> I would be in favor of keeping the old 2.7.0 release branch / tag
>> static so that referring to it will always get the right 2.7.0 code.
>>
>> On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles 
>> wrote:
>>
>>> I have waffled on whether to have release-2.7 and only branch
>>> release-2.7.1 when starting that release. I think that whenever we 
>>> release
>>> 2.7.n the branch for 2.7.(n+1) should start from exactly that point, 
>>> no? Or
>>> perhaps on release-2.7 branch the hardcoded version strings could be
>>> 2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new 
>>> release
>>> branch? I guess I think either one is fine. I think starting the branch 
>>> now
>>> is smart, so that you can accumulate cherrypicks of backports.
>>>
>>> Kenn
>>>
>>> On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels 
>>> wrote:
>>>
 2.10.0 will be done when its done. Same goes for 2.7.1, which is
 likely going to
 be done later since we are focusing on 2.10.0 at the moment.

 I've created the release-2.7.1 branch because there is no other
 place for fixes
 of future versions. It would be helpful to have a minor version
 branch (e.g.
 release-2.7) which can be continuously updated.

 More generally speaking, we should dedicate time for LTS releases.
 What is the
 point otherwise of having an LTS version?

 -Max

 On 31.01.19 16:28, Thomas Weise wrote:
 > Since you were originally thinking of 2.9.x as target, 2.10.0
 seems closer both
 > in time and upgrade path.
 >
 > I see no reason why a 2.7.1 release would materialize any sooner
 than 2.10.0.
 >
 > Or is the intention is to just stack up fixes in the 2.7.x branch
 for a
 > potential future release?
 >
 > Thomas
 >
 >
 > On Thu, Jan 31, 2019 at 5:03 AM Maximilian Michels <
 m...@apache.org
 > > wrote:
 >
 > I agree it's better to take some extra time to ensure the
 quality of 2.10.0.
 >
 > I've created a 2.7.1 branch and cherry-picked the relevant
 commits[1]. We could
 > start collecting other fixes in case there are any.
 >
 > -Max
 >
 > [1] https://github.com/apache/beam/pull/7687
 >
 > On 30.01.19 20:57, Kenneth Knowles wrote:
 >  > Sounds good to me to target 2.7.1 and 2.10.0. I will have
 to re-roll RC2
 > after
 >  > confirming fixes for the latest blockers that were found.
 These are not
 >  > regressions from 2.9.0. But they seem severe enough that
 they are worth
 > taking
 >  > an extra day or two, because 2.9.0 had enough problems
 that I would like
 > to make
 >  > 2.10.0 a more attractive upgrade target for users still on
 very old versions.
 >  >
 >  > Kenn
 >  >

Re: 2.7.1 (LTS) release?

2019-01-31 Thread Chamikara Jayalath
Sorry, what I meant was branches+tags for each minor version release, and
adding updates and tags to the same branch for patch releases. The name of the
branch could be release-2.X for the minor version release 2.X.0, as Thomas
mentioned.

- Cham

On Thu, Jan 31, 2019 at 5:46 PM Michael Luckey  wrote:

> Maybe we should not go so far to name branches 2.x. This will probably
> make it difficult to support more than 1 LTS. Don't know, whether we ever
> intent to do so, but supporting 2.7 and 2.13 on a 2.x branch seems
> difficult?
>
> A more explicit 2.7.x with tags 2.7.1, 2.7.2 etc might be better? If we
> are going to support a second LTS later on, we could just add that 2.??.x
> branch.
>
> michel
>
> On Fri, Feb 1, 2019 at 2:37 AM Chamikara Jayalath 
> wrote:
>
>> +1 for 2.x branches and tags for 2.x.y releases.
>>
>> Also, I think we should integrate the dependency upgrade
>> https://issues.apache.org/jira/browse/BEAM-6552 to 2.7.1 which fixes a
>> rare but critical bug.
>>
>> Thanks,
>> Cham
>>
>> On Thu, Jan 31, 2019 at 12:17 PM Kenneth Knowles  wrote:
>>
>>> It makes sense to me that 2.7 is a branch and just tags for 2.7.0,
>>> 2.7.1, etc.
>>>
>>> On Thu, Jan 31, 2019 at 11:43 AM Thomas Weise  wrote:
>>>
 How about naming the branches release-X.Y and use them as base for all
 the X.Y.Z releases? We already have the X.Y.Z tags to refer to the actual
 release.

 On Thu, Jan 31, 2019 at 11:23 AM Charles Chen  wrote:

> I would be in favor of keeping the old 2.7.0 release branch / tag
> static so that referring to it will always get the right 2.7.0 code.
>
> On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles 
> wrote:
>
>> I have waffled on whether to have release-2.7 and only branch
>> release-2.7.1 when starting that release. I think that whenever we 
>> release
>> 2.7.n the branch for 2.7.(n+1) should start from exactly that point, no? 
>> Or
>> perhaps on release-2.7 branch the hardcoded version strings could be
>> 2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new 
>> release
>> branch? I guess I think either one is fine. I think starting the branch 
>> now
>> is smart, so that you can accumulate cherrypicks of backports.
>>
>> Kenn
>>
>> On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels 
>> wrote:
>>
>>> 2.10.0 will be done when its done. Same goes for 2.7.1, which is
>>> likely going to
>>> be done later since we are focusing on 2.10.0 at the moment.
>>>
>>> I've created the release-2.7.1 branch because there is no other
>>> place for fixes
>>> of future versions. It would be helpful to have a minor version
>>> branch (e.g.
>>> release-2.7) which can be continuously updated.
>>>
>>> More generally speaking, we should dedicate time for LTS releases.
>>> What is the
>>> point otherwise of having an LTS version?
>>>
>>> -Max
>>>
>>> On 31.01.19 16:28, Thomas Weise wrote:
>>> > Since you were originally thinking of 2.9.x as target, 2.10.0
>>> seems closer both
>>> > in time and upgrade path.
>>> >
>>> > I see no reason why a 2.7.1 release would materialize any sooner
>>> than 2.10.0.
>>> >
>>> > Or is the intention is to just stack up fixes in the 2.7.x branch
>>> for a
>>> > potential future release?
>>> >
>>> > Thomas
>>> >
>>> >
>>> > On Thu, Jan 31, 2019 at 5:03 AM Maximilian Michels >> > > wrote:
>>> >
>>> > I agree it's better to take some extra time to ensure the
>>> quality of 2.10.0.
>>> >
>>> > I've created a 2.7.1 branch and cherry-picked the relevant
>>> commits[1]. We could
>>> > start collecting other fixes in case there are any.
>>> >
>>> > -Max
>>> >
>>> > [1] https://github.com/apache/beam/pull/7687
>>> >
>>> > On 30.01.19 20:57, Kenneth Knowles wrote:
>>> >  > Sounds good to me to target 2.7.1 and 2.10.0. I will have
>>> to re-roll RC2
>>> > after
>>> >  > confirming fixes for the latest blockers that were found.
>>> These are not
>>> >  > regressions from 2.9.0. But they seem severe enough that
>>> they are worth
>>> > taking
>>> >  > an extra day or two, because 2.9.0 had enough problems that
>>> I would like
>>> > to make
>>> >  > 2.10.0 a more attractive upgrade target for users still on
>>> very old versions.
>>> >  >
>>> >  > Kenn
>>> >  >
>>> >  > On Wed, Jan 30, 2019 at 5:22 AM Maximilian Michels <
>>> m...@apache.org
>>> > 
>>> >  > >> wrote:
>>> >  >
>>> >  > Hi everyone,
>>> >  >
>>> >  > I know we are in the midst of releasing 2.10.0, but
>>> with the release
>>> > 

Re: 2.7.1 (LTS) release?

2019-01-31 Thread Michael Luckey
Maybe we should not go so far as to name branches 2.x. This will probably make
it difficult to support more than one LTS. I don't know whether we ever intend
to do so, but supporting 2.7 and 2.13 on a single 2.x branch seems difficult?

A more explicit 2.7.x with tags 2.7.1, 2.7.2, etc. might be better? If we are
going to support a second LTS later on, we could just add that 2.??.x
branch.

michel

On Fri, Feb 1, 2019 at 2:37 AM Chamikara Jayalath 
wrote:

> +1 for 2.x branches and tags for 2.x.y releases.
>
> Also, I think we should integrate the dependency upgrade
> https://issues.apache.org/jira/browse/BEAM-6552 to 2.7.1 which fixes a
> rare but critical bug.
>
> Thanks,
> Cham
>
> On Thu, Jan 31, 2019 at 12:17 PM Kenneth Knowles  wrote:
>
>> It makes sense to me that 2.7 is a branch and just tags for 2.7.0, 2.7.1,
>> etc.
>>
>> On Thu, Jan 31, 2019 at 11:43 AM Thomas Weise  wrote:
>>
>>> How about naming the branches release-X.Y and use them as base for all
>>> the X.Y.Z releases? We already have the X.Y.Z tags to refer to the actual
>>> release.
>>>
>>> On Thu, Jan 31, 2019 at 11:23 AM Charles Chen  wrote:
>>>
 I would be in favor of keeping the old 2.7.0 release branch / tag
 static so that referring to it will always get the right 2.7.0 code.

 On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles 
 wrote:

> I have waffled on whether to have release-2.7 and only branch
> release-2.7.1 when starting that release. I think that whenever we release
> 2.7.n the branch for 2.7.(n+1) should start from exactly that point, no? 
> Or
> perhaps on release-2.7 branch the hardcoded version strings could be
> 2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new 
> release
> branch? I guess I think either one is fine. I think starting the branch 
> now
> is smart, so that you can accumulate cherrypicks of backports.
>
> Kenn
>
> On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels 
> wrote:
>
>> 2.10.0 will be done when its done. Same goes for 2.7.1, which is
>> likely going to
>> be done later since we are focusing on 2.10.0 at the moment.
>>
>> I've created the release-2.7.1 branch because there is no other place
>> for fixes
>> of future versions. It would be helpful to have a minor version
>> branch (e.g.
>> release-2.7) which can be continuously updated.
>>
>> More generally speaking, we should dedicate time for LTS releases.
>> What is the
>> point otherwise of having an LTS version?
>>
>> -Max
>>
>> On 31.01.19 16:28, Thomas Weise wrote:
>> > Since you were originally thinking of 2.9.x as target, 2.10.0 seems
>> closer both
>> > in time and upgrade path.
>> >
>> > I see no reason why a 2.7.1 release would materialize any sooner
>> than 2.10.0.
>> >
>> > Or is the intention is to just stack up fixes in the 2.7.x branch
>> for a
>> > potential future release?
>> >
>> > Thomas
>> >
>> >
>> > On Thu, Jan 31, 2019 at 5:03 AM Maximilian Michels > > > wrote:
>> >
>> > I agree it's better to take some extra time to ensure the
>> quality of 2.10.0.
>> >
>> > I've created a 2.7.1 branch and cherry-picked the relevant
>> commits[1]. We could
>> > start collecting other fixes in case there are any.
>> >
>> > -Max
>> >
>> > [1] https://github.com/apache/beam/pull/7687
>> >
>> > On 30.01.19 20:57, Kenneth Knowles wrote:
>> >  > Sounds good to me to target 2.7.1 and 2.10.0. I will have to
>> re-roll RC2
>> > after
>> >  > confirming fixes for the latest blockers that were found.
>> These are not
>> >  > regressions from 2.9.0. But they seem severe enough that
>> they are worth
>> > taking
>> >  > an extra day or two, because 2.9.0 had enough problems that
>> I would like
>> > to make
>> >  > 2.10.0 a more attractive upgrade target for users still on
>> very old versions.
>> >  >
>> >  > Kenn
>> >  >
>> >  > On Wed, Jan 30, 2019 at 5:22 AM Maximilian Michels <
>> m...@apache.org
>> > 
>> >  > >> wrote:
>> >  >
>> >  > Hi everyone,
>> >  >
>> >  > I know we are in the midst of releasing 2.10.0, but with
>> the release
>> > process
>> >  > taking its time I consider creating a patch release for
>> this issue in the
>> >  > FlinkRunner:
>> https://jira.apache.org/jira/browse/BEAM-5386
>> >  >
>> >  > Initially I thought it would be good to do a 2.9.1
>> release, but since we
>> >  > have an
>> >  > LTS version, we should probably do a 2.7.1 (LTS) release
>> instead.
>> >  

Re: 2.7.1 (LTS) release?

2019-01-31 Thread Chamikara Jayalath
+1 for 2.x branches and tags for 2.x.y releases.

Also, I think we should integrate the dependency upgrade
https://issues.apache.org/jira/browse/BEAM-6552 into 2.7.1, which fixes a rare
but critical bug.

Thanks,
Cham

On Thu, Jan 31, 2019 at 12:17 PM Kenneth Knowles  wrote:

> It makes sense to me that 2.7 is a branch and just tags for 2.7.0, 2.7.1,
> etc.
>
> On Thu, Jan 31, 2019 at 11:43 AM Thomas Weise  wrote:
>
>> How about naming the branches release-X.Y and use them as base for all
>> the X.Y.Z releases? We already have the X.Y.Z tags to refer to the actual
>> release.
>>
>> On Thu, Jan 31, 2019 at 11:23 AM Charles Chen  wrote:
>>
>>> I would be in favor of keeping the old 2.7.0 release branch / tag static
>>> so that referring to it will always get the right 2.7.0 code.
>>>
>>> On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles 
>>> wrote:
>>>
 I have waffled on whether to have release-2.7 and only branch
 release-2.7.1 when starting that release. I think that whenever we release
 2.7.n the branch for 2.7.(n+1) should start from exactly that point, no? Or
 perhaps on release-2.7 branch the hardcoded version strings could be
 2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new release
 branch? I guess I think either one is fine. I think starting the branch now
 is smart, so that you can accumulate cherrypicks of backports.

 Kenn

 On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels 
 wrote:

> 2.10.0 will be done when its done. Same goes for 2.7.1, which is
> likely going to
> be done later since we are focusing on 2.10.0 at the moment.
>
> I've created the release-2.7.1 branch because there is no other place
> for fixes
> of future versions. It would be helpful to have a minor version branch
> (e.g.
> release-2.7) which can be continuously updated.
>
> More generally speaking, we should dedicate time for LTS releases.
> What is the
> point otherwise of having an LTS version?
>
> -Max
>
> On 31.01.19 16:28, Thomas Weise wrote:
> > Since you were originally thinking of 2.9.x as target, 2.10.0 seems
> closer both
> > in time and upgrade path.
> >
> > I see no reason why a 2.7.1 release would materialize any sooner
> than 2.10.0.
> >
> > Or is the intention is to just stack up fixes in the 2.7.x branch
> for a
> > potential future release?
> >
> > Thomas
> >
> >
> > On Thu, Jan 31, 2019 at 5:03 AM Maximilian Michels  > > wrote:
> >
> > I agree it's better to take some extra time to ensure the
> quality of 2.10.0.
> >
> > I've created a 2.7.1 branch and cherry-picked the relevant
> commits[1]. We could
> > start collecting other fixes in case there are any.
> >
> > -Max
> >
> > [1] https://github.com/apache/beam/pull/7687
> >
> > On 30.01.19 20:57, Kenneth Knowles wrote:
> >  > Sounds good to me to target 2.7.1 and 2.10.0. I will have to
> re-roll RC2
> > after
> >  > confirming fixes for the latest blockers that were found.
> These are not
> >  > regressions from 2.9.0. But they seem severe enough that they
> are worth
> > taking
> >  > an extra day or two, because 2.9.0 had enough problems that I
> would like
> > to make
> >  > 2.10.0 a more attractive upgrade target for users still on
> very old versions.
> >  >
> >  > Kenn
> >  >
> >  > On Wed, Jan 30, 2019 at 5:22 AM Maximilian Michels <
> m...@apache.org
> > 
> >  > >> wrote:
> >  >
> >  > Hi everyone,
> >  >
> >  > I know we are in the midst of releasing 2.10.0, but with
> the release
> > process
> >  > taking its time I consider creating a patch release for
> this issue in the
> >  > FlinkRunner:
> https://jira.apache.org/jira/browse/BEAM-5386
> >  >
> >  > Initially I thought it would be good to do a 2.9.1
> release, but since we
> >  > have an
> >  > LTS version, we should probably do a 2.7.1 (LTS) release
> instead.
> >  >
> >  > What do you think? I could only find one Fix Version
> 2.7.1 issue in JIRA:
> >  >
> >
> https://jira.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.7.1
> >  >
> >  > Best,
> >  > Max
> >  >
> >
>



Re: Another new contributor!

2019-01-31 Thread Mikhail Gryzykhin
Welcome to the community!

On Thu, Jan 31, 2019, 16:50 Alex Amato  wrote:

> Great to start working with you Brian, welcome.
>
> On Thu, Jan 31, 2019 at 4:23 PM Brian Hulette  wrote:
>
>> Can I get bhulette@ added to the BEAM project in Jira?
>>
>> Gleb - I'd definitely be interested in seeing that discussion, is it
>> available somewhere in the archives? I can't find it.
>>
>> On Thu, Jan 31, 2019 at 5:13 AM Gleb Kanterov  wrote:
>>
>>> Welcome! Would be interesting to hear your thoughts on Arrow, Arrow
>>> Flight, and Beam Portability relation, this topic was recently discussed in
>>> dev@.
>>>
>>> On Thu, Jan 31, 2019 at 2:00 PM Ismaël Mejía  wrote:
>>>
 Welcome Brian!
 Great to have someone with Apache experience already and also with
 Arrow knowledge.

 On Thu, Jan 31, 2019 at 1:32 PM Maximilian Michels 
 wrote:
 >
 > Welcome! Arrow and Beam together would open lots of possibilities.
 Portability
 > documentation improvements would be much appreciated :)
 >
 > On 31.01.19 11:25, Łukasz Gajowy wrote:
 > > Welcome!
 > >
 > > Thu, 31 Jan 2019 at 02:40 Kenneth Knowles >>> > > > wrote:
 > >
 > > Welcome!
 > >
 > > On Wed, Jan 30, 2019, 17:30 Connell O'Callaghan <
 conne...@google.com
 > >  wrote:
 > >
 > > Welcome on board Brian!
 > >
 > > On Wed, Jan 30, 2019 at 5:29 PM Ahmet Altay <
 al...@google.com
 > > > wrote:
 > >
 > > Welcome Brian!
 > >
 > > On Wed, Jan 30, 2019 at 5:26 PM Brian Hulette <
 bhule...@google.com
 > > > wrote:
 > >
 > > Hi everyone,
 > > I'm Brian Hulette, I just switched roles at Google
 and I'll be
 > > contributing to Beam Portability as part of my new
 position. For
 > > now I'm just going through documentation and
 getting familiar
 > > with Beam from the user perspective, so if anything
 I'll just be
 > > suggesting minor edits to documentation, but I hope
 to be
 > > putting up PRs soon enough.
 > >
 > > I am also an Apache committer (bhulette is my ASF
 id and Jira
 > > username). I worked on the Arrow project's
 Javascript
 > > implementation in a previous job, and I'm really
 excited to look
 > > for ways to use Arrow and Beam together once I've
 ramped up.
 > >
 > > Brian
 > >

>>>
>>>
>>> --
>>> Cheers,
>>> Gleb
>>>
>>


Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Ahmet Altay
+1 to Thomas's idea as a way to enable Python users on Flink. On the other
hand, this will be throwaway work once SDF is supported. How far are we
from SDF support?
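
(A purely illustrative, self-contained toy sketch of the wrapper idea discussed in
the quoted thread below: a source that knows how to split itself is driven through a
generic "one restriction per sub-source" wrapper. The ToySource/ToyReader types are
invented for this sketch and are not Beam's UnboundedSource or SDF API.)

    import java.util.ArrayList;
    import java.util.List;

    // Toy, self-contained illustration only -- NOT Beam's actual API. A source that
    // can split itself is exposed through a generic wrapper that treats each
    // sub-source as one "restriction", mirroring the UnboundedSource-as-SDF idea.
    public class ToySdfWrapperDemo {

      interface ToySource<T> {
        List<ToySource<T>> split(int desiredSplits); // analogous to UnboundedSource.split()
        ToyReader<T> createReader();                 // analogous to createReader()
      }

      interface ToyReader<T> {
        boolean advance();                           // analogous to UnboundedReader.advance()
        T getCurrent();
      }

      /** A toy source emitting the integers in [from, to); splitting partitions the range. */
      static final class RangeSource implements ToySource<Integer> {
        final int from, to;
        RangeSource(int from, int to) { this.from = from; this.to = to; }

        @Override public List<ToySource<Integer>> split(int desiredSplits) {
          List<ToySource<Integer>> parts = new ArrayList<>();
          int step = Math.max(1, (to - from) / desiredSplits);
          for (int start = from; start < to; start += step) {
            parts.add(new RangeSource(start, Math.min(start + step, to)));
          }
          return parts;
        }

        @Override public ToyReader<Integer> createReader() {
          return new ToyReader<Integer>() {
            int next = from - 1;
            @Override public boolean advance() { return ++next < to; }
            @Override public Integer getCurrent() { return next; }
          };
        }
      }

      // Analogous to @GetInitialRestriction: the source's own splits become the restrictions.
      static <T> List<ToySource<T>> initialRestrictions(ToySource<T> source) {
        return source.split(4);
      }

      // Analogous to @ProcessElement over one claimed restriction: drain its reader.
      static <T> void processRestriction(ToySource<T> restriction, List<T> output) {
        ToyReader<T> reader = restriction.createReader();
        while (reader.advance()) {
          output.add(reader.getCurrent());
        }
      }

      public static void main(String[] args) {
        List<Integer> output = new ArrayList<>();
        for (ToySource<Integer> restriction : initialRestrictions(new RangeSource(0, 10))) {
          processRestriction(restriction, output);
        }
        System.out.println(output); // [0, 1, ..., 9], read via several independent sub-sources
      }
    }

As Ismaël notes further down, the genuinely hard part is mapping checkpointing,
watermarks and RestrictionTrackers generically, which this toy deliberately ignores.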

On Thu, Jan 31, 2019 at 9:18 AM Maximilian Michels  wrote:

> Ah, I thought you meant native Flink transforms.
>
> Exactly! The translation code is already there. The main challenge is how
> to
> programmatically configure the BeamIO from Python. I suppose that is also
> an
> unsolved problem for cross-language transforms in general.
>
> For Matthias' pipeline with PubSubIO we can build something specific, but
> for
> the general case there should be a way to initialize a Beam IO via a
> configuration
> map provided by an external environment.
>
> On 31.01.19 17:36, Thomas Weise wrote:
> > Exactly, that's what I had in mind.
> >
> > A Flink runner native transform would make the existing unbounded
> sources
> > available, similar to:
> >
> >
> https://github.com/apache/beam/blob/2e89c1e4d35e7b5f95a622259d23d921c3d6ad1f/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java#L167
> >
> >
> >
> >
> > On Thu, Jan 31, 2019 at 8:18 AM Maximilian Michels  > > wrote:
> >
> > Wouldn't it be even more useful for the transition period if we
> enabled Beam IO
> > to be used via Flink (like in the legacy Flink Runner)? In this
> particular
> > example, Matthias wants to use PubSubIO, which is not even available
> as a
> > native
> > Flink transform.
> >
> > On 31.01.19 16:21, Thomas Weise wrote:
> >  > Until SDF is supported, we could also add Flink runner native
> transforms for
> >  > selected unbounded sources [1].
> >  >
> >  > That might be a reasonable option to unblock users that want to
> try Python
> >  > streaming on Flink.
> >  >
> >  > Thomas
> >  >
> >  > [1]
> >  >
> >
> https://github.com/lyft/beam/blob/release-2.10.0-lyft/runners/flink/src/main/java/org/apache/beam/runners/flink/LyftFlinkStreamingPortableTranslations.java
> >  >
> >  >
> >  > On Thu, Jan 31, 2019 at 6:51 AM Maximilian Michels <
> m...@apache.org
> > 
> >  > >> wrote:
> >  >
> >  >  > I have a hard time to imagine how can we map in a generic
> way
> >  > RestrictionTrackers into the existing
> Bounded/UnboundedSource, so I would
> >  > love to hear more about the details.
> >  >
> >  > Isn't it the other way around? The SDF is a generalization of
> > UnboundedSource.
> >  > So we would wrap UnboundedSource using SDF. I'm not saying it
> is
> > trivial, but
> >  > SDF offers all the functionality that UnboundedSource needs.
> >  >
> >  > For example, the @GetInitialRestriction method would call
> split on the
> >  > UnboundedSource and the restriction trackers would then be
> used to
> > process the
> >  > splits.
> >  >
> >  > On 31.01.19 15:16, Ismaël Mejía wrote:
> >  >  >> Not necessarily. This would be one way. Another way is
> build an SDF
> >  > wrapper for UnboundedSource. Probably the easier path for
> migration.
> >  >  >
> >  >  > That would be fantastic, I have heard about such wrapper
> multiple
> >  >  > times but so far there is not any realistic proposal. I
> have a hard
> >  >  > time to imagine how can we map in a generic way
> RestrictionTrackers
> >  >  > into the existing Bounded/UnboundedSource, so I would love
> to hear
> >  >  > more about the details.
> >  >  >
> >  >  > On Thu, Jan 31, 2019 at 3:07 PM Maximilian Michels <
> m...@apache.org
> > 
> >  > >> wrote:
> >  >  >>
> >  >  >>   > In addition to have support in the runners, this will
> require a
> >  >  >>   > rewrite of PubsubIO to use the new SDF API.
> >  >  >>
> >  >  >> Not necessarily. This would be one way. Another way is
> build an SDF
> >  > wrapper for
> >  >  >> UnboundedSource. Probably the easier path for migration.
> >  >  >>
> >  >  >> On 31.01.19 14:03, Ismaël Mejía wrote:
> >  >   Fortunately, there is already a pending PR for
> cross-language
> >  > pipelines which
> >  >   will allow us to use Java IO like PubSub in Python jobs.
> >  >  >>>
> >  >  >>> In addition to have support in the runners, this will
> require a
> >  >  >>> rewrite of PubsubIO to use the new SDF API.
> >  >  >>>
> >  >  >>> On Thu, Jan 31, 2019 at 12:23 PM Maximilian Michels
> > mailto:m...@apache.org>
> >  > >> wrote:
> >  >  
> >  >   Hi Matthias,
> >  >  
> >  >   This 

Re: Another new contributor!

2019-01-31 Thread Alex Amato
Great to start working with you Brian, welcome.

On Thu, Jan 31, 2019 at 4:23 PM Brian Hulette  wrote:

> Can I get bhulette@ added to the BEAM project in Jira?
>
> Gleb - I'd definitely be interested in seeing that discussion, is it
> available somewhere in the archives? I can't find it.
>
> On Thu, Jan 31, 2019 at 5:13 AM Gleb Kanterov  wrote:
>
>> Welcome! Would be interesting to hear your thoughts on Arrow, Arrow
>> Flight, and Beam Portability relation, this topic was recently discussed in
>> dev@.
>>
>> On Thu, Jan 31, 2019 at 2:00 PM Ismaël Mejía  wrote:
>>
>>> Welcome Brian!
>>> Great to have someone with Apache experience already and also with
>>> Arrow knowledge.
>>>
>>> On Thu, Jan 31, 2019 at 1:32 PM Maximilian Michels 
>>> wrote:
>>> >
>>> > Welcome! Arrow and Beam together would open lots of possibilities.
>>> Portability
>>> > documentation improvements would be much appreciated :)
>>> >
>>> > On 31.01.19 11:25, Łukasz Gajowy wrote:
>>> > > Welcome!
>>> > >
>>> > > Thu, 31 Jan 2019 at 02:40 Kenneth Knowles >> > > > wrote:
>>> > >
>>> > > Welcome!
>>> > >
>>> > > On Wed, Jan 30, 2019, 17:30 Connell O'Callaghan <
>>> conne...@google.com
>>> > >  wrote:
>>> > >
>>> > > Welcome on board Brian!
>>> > >
>>> > > On Wed, Jan 30, 2019 at 5:29 PM Ahmet Altay <
>>> al...@google.com
>>> > > > wrote:
>>> > >
>>> > > Welcome Brian!
>>> > >
>>> > > On Wed, Jan 30, 2019 at 5:26 PM Brian Hulette <
>>> bhule...@google.com
>>> > > > wrote:
>>> > >
>>> > > Hi everyone,
>>> > > I'm Brian Hulette, I just switched roles at Google
>>> and I'll be
>>> > > contributing to Beam Portability as part of my new
>>> position. For
>>> > > now I'm just going through documentation and getting
>>> familiar
>>> > > with Beam from the user perspective, so if anything
>>> I'll just be
>>> > > suggesting minor edits to documentation, but I hope
>>> to be
>>> > > putting up PRs soon enough.
>>> > >
>>> > > I am also an Apache committer (bhulette is my ASF id
>>> and Jira
>>> > > username). I worked on the Arrow project's Javascript
>>> > > implementation in a previous job, and I'm really
>>> excited to look
>>> > > for ways to use Arrow and Beam together once I've
>>> ramped up.
>>> > >
>>> > > Brian
>>> > >
>>>
>>
>>
>> --
>> Cheers,
>> Gleb
>>
>


Re: Another new contributor!

2019-01-31 Thread Brian Hulette
Can I get bhulette@ added to the BEAM project in Jira?

Gleb - I'd definitely be interested in seeing that discussion, is it
available somewhere in the archives? I can't find it.

On Thu, Jan 31, 2019 at 5:13 AM Gleb Kanterov  wrote:

> Welcome! It would be interesting to hear your thoughts on how Arrow, Arrow
> Flight, and Beam Portability relate; this topic was recently discussed on
> dev@.
>
> On Thu, Jan 31, 2019 at 2:00 PM Ismaël Mejía  wrote:
>
>> Welcome Brian!
>> Great to have someone with Apache experience already and also with
>> Arrow knowledge.
>>
>> On Thu, Jan 31, 2019 at 1:32 PM Maximilian Michels 
>> wrote:
>> >
>> > Welcome! Arrow and Beam together would open lots of possibilities.
>> Portability
>> > documentation improvements would be much appreciated :)
>> >
>> > On 31.01.19 11:25, Łukasz Gajowy wrote:
>> > > Welcome!
>> > >
>> > > Thu, 31 Jan 2019 at 02:40 Kenneth Knowles > > > > wrote:
>> > >
>> > > Welcome!
>> > >
>> > > On Wed, Jan 30, 2019, 17:30 Connell O'Callaghan <
>> conne...@google.com
>> > >  wrote:
>> > >
>> > > Welcome on board Brian!
>> > >
>> > > On Wed, Jan 30, 2019 at 5:29 PM Ahmet Altay > > > > wrote:
>> > >
>> > > Welcome Brian!
>> > >
>> > > On Wed, Jan 30, 2019 at 5:26 PM Brian Hulette <
>> bhule...@google.com
>> > > > wrote:
>> > >
>> > > Hi everyone,
>> > > I'm Brian Hulette, I just switched roles at Google
>> and I'll be
>> > > contributing to Beam Portability as part of my new
>> position. For
>> > > now I'm just going through documentation and getting
>> familiar
>> > > with Beam from the user perspective, so if anything
>> I'll just be
>> > > suggesting minor edits to documentation, but I hope
>> to be
>> > > putting up PRs soon enough.
>> > >
>> > > I am also an Apache committer (bhulette is my ASF id
>> and Jira
>> > > username). I worked on the Arrow project's Javascript
>> > > implementation in a previous job, and I'm really
>> excited to look
>> > > for ways to use Arrow and Beam together once I've
>> ramped up.
>> > >
>> > > Brian
>> > >
>>
>
>
> --
> Cheers,
> Gleb
>


Re: [Proposal] Get Metrics API: Metric Extraction via proto RPC API.

2019-01-31 Thread Ismaël Mejía
Please don't forget to add this document to the design documents webpage.

On Thu, Jan 31, 2019 at 8:46 PM Alex Amato  wrote:
>
> Hello Beam,
>
> Robert Ryan and I have been designing a metric extraction API for Beam. 
> Please take a look at this design, I would love to get more feedback on this 
> to improve the design.
>
> https://s.apache.org/get-metrics-api
>
> The primary goal of this proposal is to offer a simple way to obtain all the 
> metrics for a job. The following issues are addressed:
>
> The current design requires implementing metric querying for every 
> runner+language combination.
>
> Duplication of MetricResult related classes in each language.
>
> The existing MetricResult format only allows querying metrics defined by a 
> namespace, name and step, and does not allow generalized labelling as used by 
> MonitoringInfos.
>
> Enhance Beam’s ability to integration test new metrics
>
>
> Thanks for taking a look,
> Alex


Re: 2.7.1 (LTS) release?

2019-01-31 Thread Kenneth Knowles
It makes sense to me that 2.7 is a branch, with just tags for 2.7.0, 2.7.1,
etc.

On Thu, Jan 31, 2019 at 11:43 AM Thomas Weise  wrote:

> How about naming the branches release-X.Y and use them as base for all the
> X.Y.Z releases? We already have the X.Y.Z tags to refer to the actual
> release.
>
> On Thu, Jan 31, 2019 at 11:23 AM Charles Chen  wrote:
>
>> I would be in favor of keeping the old 2.7.0 release branch / tag static
>> so that referring to it will always get the right 2.7.0 code.
>>
>> On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles  wrote:
>>
>>> I have waffled on whether to have release-2.7 and only branch
>>> release-2.7.1 when starting that release. I think that whenever we release
>>> 2.7.n the branch for 2.7.(n+1) should start from exactly that point, no? Or
>>> perhaps on release-2.7 branch the hardcoded version strings could be
>>> 2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new release
>>> branch? I guess I think either one is fine. I think starting the branch now
>>> is smart, so that you can accumulate cherrypicks of backports.
>>>
>>> Kenn
>>>
>>> On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels 
>>> wrote:
>>>
 2.10.0 will be done when its done. Same goes for 2.7.1, which is likely
 going to
 be done later since we are focusing on 2.10.0 at the moment.

 I've created the release-2.7.1 branch because there is no other place
 for fixes
 of future versions. It would be helpful to have a minor version branch
 (e.g.
 release-2.7) which can be continuously updated.

 More generally speaking, we should dedicate time for LTS releases. What
 is the
 point otherwise of having an LTS version?

 -Max

 On 31.01.19 16:28, Thomas Weise wrote:
 > Since you were originally thinking of 2.9.x as target, 2.10.0 seems
 closer both
 > in time and upgrade path.
 >
 > I see no reason why a 2.7.1 release would materialize any sooner than
 2.10.0.
 >
 > Or is the intention is to just stack up fixes in the 2.7.x branch for
 a
 > potential future release?
 >
 > Thomas
 >
 >
 > On Thu, Jan 31, 2019 at 5:03 AM Maximilian Michels >>> > > wrote:
 >
 > I agree it's better to take some extra time to ensure the quality
 of 2.10.0.
 >
 > I've created a 2.7.1 branch and cherry-picked the relevant
 commits[1]. We could
 > start collecting other fixes in case there are any.
 >
 > -Max
 >
 > [1] https://github.com/apache/beam/pull/7687
 >
 > On 30.01.19 20:57, Kenneth Knowles wrote:
 >  > Sounds good to me to target 2.7.1 and 2.10.0. I will have to
 re-roll RC2
 > after
 >  > confirming fixes for the latest blockers that were found.
 These are not
 >  > regressions from 2.9.0. But they seem severe enough that they
 are worth
 > taking
 >  > an extra day or two, because 2.9.0 had enough problems that I
 would like
 > to make
 >  > 2.10.0 a more attractive upgrade target for users still on
 very old versions.
 >  >
 >  > Kenn
 >  >
 >  > On Wed, Jan 30, 2019 at 5:22 AM Maximilian Michels <
 m...@apache.org
 > 
 >  > >> wrote:
 >  >
 >  > Hi everyone,
 >  >
 >  > I know we are in the midst of releasing 2.10.0, but with
 the release
 > process
 >  > taking its time I consider creating a patch release for
 this issue in the
 >  > FlinkRunner: https://jira.apache.org/jira/browse/BEAM-5386
 >  >
 >  > Initially I thought it would be good to do a 2.9.1
 release, but since we
 >  > have an
 >  > LTS version, we should probably do a 2.7.1 (LTS) release
 instead.
 >  >
 >  > What do you think? I could only find one Fix Version 2.7.1
 issue in JIRA:
 >  >
 >
 https://jira.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.7.1
 >  >
 >  > Best,
 >  > Max
 >  >
 >

>>>


[Proposal] Get Metrics API: Metric Extraction via proto RPC API.

2019-01-31 Thread Alex Amato
Hello Beam,

Robert Ryan and I have been designing a metric extraction API for Beam.
Please take a look at this design, I would love to get more feedback on
this to improve the design.

https://s.apache.org/get-metrics-api

The primary goal of this proposal is to offer a simple way to obtain all
the metrics for a job. The following issues are addressed:

   - The current design requires implementing metric querying for every
     runner+language combination.
   - Duplication of MetricResult related classes in each language.
   - The existing MetricResult format only allows querying metrics defined by
     a namespace, name and step, and does not allow generalized labelling as
     used by MonitoringInfos.
   - Enhance Beam’s ability to integration test new metrics


Thanks for taking a look,
Alex
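
(A toy sketch only, to contrast the two query styles mentioned above: fixed
namespace/name/step addressing versus MonitoringInfo-style labels. The types below are
invented for illustration, in plain modern Java, and are not the classes or RPCs
defined in the linked proposal.)

    import java.util.Map;

    // Hypothetical toy types: an old-style fixed key vs. a labeled metric that
    // can be matched against any subset of labels.
    public class MetricsQuerySketch {

      record MetricKey(String namespace, String name, String step) {}            // fixed addressing

      record LabeledMetric(String urn, Map<String, String> labels, long value) {} // label-based

      static boolean matches(LabeledMetric metric, Map<String, String> wantedLabels) {
        // Generalized matching: e.g. {"PTRANSFORM": "Read"} or {"PCOLLECTION": "pc0"}.
        return metric.labels().entrySet().containsAll(wantedLabels.entrySet());
      }

      public static void main(String[] args) {
        LabeledMetric m = new LabeledMetric(
            "beam:metric:element_count:v1",
            Map.of("PTRANSFORM", "Read", "PCOLLECTION", "pc0"),
            42L);
        System.out.println(matches(m, Map.of("PTRANSFORM", "Read")));   // true
        System.out.println(matches(m, Map.of("PTRANSFORM", "Write")));  // false
      }
    }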


Re: 2.7.1 (LTS) release?

2019-01-31 Thread Thomas Weise
How about naming the branches release-X.Y and using them as the base for all the
X.Y.Z releases? We already have the X.Y.Z tags to refer to the actual
release.

On Thu, Jan 31, 2019 at 11:23 AM Charles Chen  wrote:

> I would be in favor of keeping the old 2.7.0 release branch / tag static
> so that referring to it will always get the right 2.7.0 code.
>
> On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles  wrote:
>
>> I have waffled on whether to have release-2.7 and only branch
>> release-2.7.1 when starting that release. I think that whenever we release
>> 2.7.n the branch for 2.7.(n+1) should start from exactly that point, no? Or
>> perhaps on release-2.7 branch the hardcoded version strings could be
>> 2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new release
>> branch? I guess I think either one is fine. I think starting the branch now
>> is smart, so that you can accumulate cherrypicks of backports.
>>
>> Kenn
>>
>> On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels 
>> wrote:
>>
>>> 2.10.0 will be done when its done. Same goes for 2.7.1, which is likely
>>> going to
>>> be done later since we are focusing on 2.10.0 at the moment.
>>>
>>> I've created the release-2.7.1 branch because there is no other place
>>> for fixes
>>> of future versions. It would be helpful to have a minor version branch
>>> (e.g.
>>> release-2.7) which can be continuously updated.
>>>
>>> More generally speaking, we should dedicate time for LTS releases. What
>>> is the
>>> point otherwise of having an LTS version?
>>>
>>> -Max
>>>
>>> On 31.01.19 16:28, Thomas Weise wrote:
>>> > Since you were originally thinking of 2.9.x as target, 2.10.0 seems
>>> closer both
>>> > in time and upgrade path.
>>> >
>>> > I see no reason why a 2.7.1 release would materialize any sooner than
>>> 2.10.0.
>>> >
>>> > Or is the intention is to just stack up fixes in the 2.7.x branch for
>>> a
>>> > potential future release?
>>> >
>>> > Thomas
>>> >
>>> >
>>> > On Thu, Jan 31, 2019 at 5:03 AM Maximilian Michels >> > > wrote:
>>> >
>>> > I agree it's better to take some extra time to ensure the quality
>>> of 2.10.0.
>>> >
>>> > I've created a 2.7.1 branch and cherry-picked the relevant
>>> commits[1]. We could
>>> > start collecting other fixes in case there are any.
>>> >
>>> > -Max
>>> >
>>> > [1] https://github.com/apache/beam/pull/7687
>>> >
>>> > On 30.01.19 20:57, Kenneth Knowles wrote:
>>> >  > Sounds good to me to target 2.7.1 and 2.10.0. I will have to
>>> re-roll RC2
>>> > after
>>> >  > confirming fixes for the latest blockers that were found. These
>>> are not
>>> >  > regressions from 2.9.0. But they seem severe enough that they
>>> are worth
>>> > taking
>>> >  > an extra day or two, because 2.9.0 had enough problems that I
>>> would like
>>> > to make
>>> >  > 2.10.0 a more attractive upgrade target for users still on very
>>> old versions.
>>> >  >
>>> >  > Kenn
>>> >  >
>>> >  > On Wed, Jan 30, 2019 at 5:22 AM Maximilian Michels <
>>> m...@apache.org
>>> > 
>>> >  > >> wrote:
>>> >  >
>>> >  > Hi everyone,
>>> >  >
>>> >  > I know we are in the midst of releasing 2.10.0, but with
>>> the release
>>> > process
>>> >  > taking its time I consider creating a patch release for
>>> this issue in the
>>> >  > FlinkRunner: https://jira.apache.org/jira/browse/BEAM-5386
>>> >  >
>>> >  > Initially I thought it would be good to do a 2.9.1 release,
>>> but since we
>>> >  > have an
>>> >  > LTS version, we should probably do a 2.7.1 (LTS) release
>>> instead.
>>> >  >
>>> >  > What do you think? I could only find one Fix Version 2.7.1
>>> issue in JIRA:
>>> >  >
>>> >
>>> https://jira.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.7.1
>>> >  >
>>> >  > Best,
>>> >  > Max
>>> >  >
>>> >
>>>
>>


Re: Findbugs -> Spotbugs ?

2019-01-31 Thread Mikhail Gryzykhin
+1 for spotbugs

On Thu, Jan 31, 2019, 10:38 Udi Meiri  wrote:

> +1 for spotbugs
>
> On Thu, Jan 31, 2019 at 5:09 AM Gleb Kanterov  wrote:
>
>> Agree, spotbugs brings static checks that aren't covered in error-prone,
>> it's a good addition. There are few conflicts between error-prone and
>> spotbugs, for instance, the approach to enum switch exhaustiveness, but it
>> can be configured.
>>
>> On Thu, Jan 31, 2019 at 10:53 AM Ismaël Mejía  wrote:
>>
>>> Not a blocker but there is not a spotbugs plugin for IntelliJ.
>>>
>>> On Thu, Jan 31, 2019 at 10:45 AM Ismaël Mejía  wrote:
>>> >
>>> > YES PLEASE let's move to spotbugs !
>>> > Findbugs has not had a new release in ages, and does not support Java
>>> > 11 either, so this will address another possible issue.
>>> >
>>> > On Thu, Jan 31, 2019 at 8:28 AM Kenneth Knowles 
>>> wrote:
>>> > >
>>> > > Over the last few hours I activated findbugs on the Dataflow Java
>>> worker and fixed or suppressed the errors. They started around 60 but
>>> fixing some uncovered others, etc. You can see the result at
>>> https://github.com/apache/beam/pull/7684.
>>> > >
>>> > > It has convinced me that findbugs still adds value, beyond
>>> errorprone and nullaway/checker/infer. Quite a few of the issues were not
>>> nullability related, though nullability remains the most obvious
>>> low-hanging fruit where a different tool would do even better than
>>> findbugs. I have not yet enable "non null by default" which exposes 100+
>>> new bugs in the worker, at minimum.
>>> > >
>>> > > Are there known blockers for upgrading to spotbugs so we are
>>> depending on an active project?
>>> > >
>>> > > Kenn
>>>
>>
>>
>> --
>> Cheers,
>> Gleb
>>
>


Re: 2.7.1 (LTS) release?

2019-01-31 Thread Charles Chen
I would be in favor of keeping the old 2.7.0 release branch / tag static so
that referring to it will always get the right 2.7.0 code.

On Thu, Jan 31, 2019 at 10:24 AM Kenneth Knowles  wrote:

> I have waffled on whether to have release-2.7 and only branch
> release-2.7.1 when starting that release. I think that whenever we release
> 2.7.n the branch for 2.7.(n+1) should start from exactly that point, no? Or
> perhaps on release-2.7 branch the hardcoded version strings could be
> 2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new release
> branch? I guess I think either one is fine. I think starting the branch now
> is smart, so that you can accumulate cherrypicks of backports.
>
> Kenn
>
> On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels  wrote:
>
>> 2.10.0 will be done when its done. Same goes for 2.7.1, which is likely
>> going to
>> be done later since we are focusing on 2.10.0 at the moment.
>>
>> I've created the release-2.7.1 branch because there is no other place for
>> fixes
>> of future versions. It would be helpful to have a minor version branch
>> (e.g.
>> release-2.7) which can be continuously updated.
>>
>> More generally speaking, we should dedicate time for LTS releases. What
>> is the
>> point otherwise of having an LTS version?
>>
>> -Max
>>
>> On 31.01.19 16:28, Thomas Weise wrote:
>> > Since you were originally thinking of 2.9.x as target, 2.10.0 seems
>> closer both
>> > in time and upgrade path.
>> >
>> > I see no reason why a 2.7.1 release would materialize any sooner than
>> 2.10.0.
>> >
>> > Or is the intention is to just stack up fixes in the 2.7.x branch for a
>> > potential future release?
>> >
>> > Thomas
>> >
>> >
>> > On Thu, Jan 31, 2019 at 5:03 AM Maximilian Michels > > > wrote:
>> >
>> > I agree it's better to take some extra time to ensure the quality
>> of 2.10.0.
>> >
>> > I've created a 2.7.1 branch and cherry-picked the relevant
>> commits[1]. We could
>> > start collecting other fixes in case there are any.
>> >
>> > -Max
>> >
>> > [1] https://github.com/apache/beam/pull/7687
>> >
>> > On 30.01.19 20:57, Kenneth Knowles wrote:
>> >  > Sounds good to me to target 2.7.1 and 2.10.0. I will have to
>> re-roll RC2
>> > after
>> >  > confirming fixes for the latest blockers that were found. These
>> are not
>> >  > regressions from 2.9.0. But they seem severe enough that they
>> are worth
>> > taking
>> >  > an extra day or two, because 2.9.0 had enough problems that I
>> would like
>> > to make
>> >  > 2.10.0 a more attractive upgrade target for users still on very
>> old versions.
>> >  >
>> >  > Kenn
>> >  >
>> >  > On Wed, Jan 30, 2019 at 5:22 AM Maximilian Michels <
>> m...@apache.org
>> > 
>> >  > >> wrote:
>> >  >
>> >  > Hi everyone,
>> >  >
>> >  > I know we are in the midst of releasing 2.10.0, but with the
>> release
>> > process
>> >  > taking its time I consider creating a patch release for this
>> issue in the
>> >  > FlinkRunner: https://jira.apache.org/jira/browse/BEAM-5386
>> >  >
>> >  > Initially I thought it would be good to do a 2.9.1 release,
>> but since we
>> >  > have an
>> >  > LTS version, we should probably do a 2.7.1 (LTS) release
>> instead.
>> >  >
>> >  > What do you think? I could only find one Fix Version 2.7.1
>> issue in JIRA:
>> >  >
>> >
>> https://jira.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.7.1
>> >  >
>> >  > Best,
>> >  > Max
>> >  >
>> >
>>
>


Re: Findbugs -> Spotbugs ?

2019-01-31 Thread Anton Kedin
It would be nice. How fast is it on the Beam codebase?

Regards,
Anton

On Thu, Jan 31, 2019 at 10:38 AM Udi Meiri  wrote:

> +1 for spotbugs
>
> On Thu, Jan 31, 2019 at 5:09 AM Gleb Kanterov  wrote:
>
>> Agree, spotbugs brings static checks that aren't covered in error-prone,
>> it's a good addition. There are few conflicts between error-prone and
>> spotbugs, for instance, the approach to enum switch exhaustiveness, but it
>> can be configured.
>>
>> On Thu, Jan 31, 2019 at 10:53 AM Ismaël Mejía  wrote:
>>
>>> Not a blocker but there is not a spotbugs plugin for IntelliJ.
>>>
>>> On Thu, Jan 31, 2019 at 10:45 AM Ismaël Mejía  wrote:
>>> >
>>> > YES PLEASE let's move to spotbugs !
>>> > Findbugs has not had a new release in ages, and does not support Java
>>> > 11 either, so this will address another possible issue.
>>> >
>>> > On Thu, Jan 31, 2019 at 8:28 AM Kenneth Knowles 
>>> wrote:
>>> > >
>>> > > Over the last few hours I activated findbugs on the Dataflow Java
>>> worker and fixed or suppressed the errors. They started around 60 but
>>> fixing some uncovered others, etc. You can see the result at
>>> https://github.com/apache/beam/pull/7684.
>>> > >
>>> > > It has convinced me that findbugs still adds value, beyond
>>> errorprone and nullaway/checker/infer. Quite a few of the issues were not
>>> nullability related, though nullability remains the most obvious
>>> low-hanging fruit where a different tool would do even better than
>>> findbugs. I have not yet enable "non null by default" which exposes 100+
>>> new bugs in the worker, at minimum.
>>> > >
>>> > > Are there known blockers for upgrading to spotbugs so we are
>>> depending on an active project?
>>> > >
>>> > > Kenn
>>>
>>
>>
>> --
>> Cheers,
>> Gleb
>>
>


Re: Findbugs -> Spotbugs ?

2019-01-31 Thread Udi Meiri
+1 for spotbugs

On Thu, Jan 31, 2019 at 5:09 AM Gleb Kanterov  wrote:

> Agree, spotbugs brings static checks that aren't covered in error-prone,
> it's a good addition. There are few conflicts between error-prone and
> spotbugs, for instance, the approach to enum switch exhaustiveness, but it
> can be configured.
>
> On Thu, Jan 31, 2019 at 10:53 AM Ismaël Mejía  wrote:
>
>> Not a blocker but there is not a spotbugs plugin for IntelliJ.
>>
>> On Thu, Jan 31, 2019 at 10:45 AM Ismaël Mejía  wrote:
>> >
>> > YES PLEASE let's move to spotbugs !
>> > Findbugs has not had a new release in ages, and does not support Java
>> > 11 either, so this will address another possible issue.
>> >
>> > On Thu, Jan 31, 2019 at 8:28 AM Kenneth Knowles 
>> wrote:
>> > >
>> > > Over the last few hours I activated findbugs on the Dataflow Java
>> worker and fixed or suppressed the errors. They started around 60 but
>> fixing some uncovered others, etc. You can see the result at
>> https://github.com/apache/beam/pull/7684.
>> > >
>> > > It has convinced me that findbugs still adds value, beyond errorprone
>> and nullaway/checker/infer. Quite a few of the issues were not nullability
>> related, though nullability remains the most obvious low-hanging fruit
>> where a different tool would do even better than findbugs. I have not yet
>> enable "non null by default" which exposes 100+ new bugs in the worker, at
>> minimum.
>> > >
>> > > Are there known blockers for upgrading to spotbugs so we are
>> depending on an active project?
>> > >
>> > > Kenn
>>
>
>
> --
> Cheers,
> Gleb
>




Re: 2.7.1 (LTS) release?

2019-01-31 Thread Kenneth Knowles
I have waffled on whether to have release-2.7 and only branch release-2.7.1
when starting that release. I think that whenever we release 2.7.n the
branch for 2.7.(n+1) should start from exactly that point, no? Or perhaps
on release-2.7 branch the hardcoded version strings could be
2.7.1-SNAPSHOT/dev and remove the SNAPSHOT/dev when cutting the new release
branch? I guess I think either one is fine. I think starting the branch now
is smart, so that you can accumulate cherrypicks of backports.

Kenn
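
For illustration only, the version-string idea above could be expressed in a
Gradle build roughly like this; the 'isRelease' property name is an
assumption, not Beam's actual release switch:

{code}
// Sketch: keep the -SNAPSHOT suffix on the release-2.7 branch until the next
// patch release is cut, then flip it by passing -PisRelease at release time.
version = project.hasProperty('isRelease') ? '2.7.1' : '2.7.1-SNAPSHOT'
{code}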

On Thu, Jan 31, 2019 at 7:55 AM Maximilian Michels  wrote:

> 2.10.0 will be done when its done. Same goes for 2.7.1, which is likely
> going to
> be done later since we are focusing on 2.10.0 at the moment.
>
> I've created the release-2.7.1 branch because there is no other place for
> fixes
> of future versions. It would be helpful to have a minor version branch
> (e.g.
> release-2.7) which can be continuously updated.
>
> More generally speaking, we should dedicate time for LTS releases. What is
> the
> point otherwise of having an LTS version?
>
> -Max
>
> On 31.01.19 16:28, Thomas Weise wrote:
> > Since you were originally thinking of 2.9.x as target, 2.10.0 seems
> closer both
> > in time and upgrade path.
> >
> > I see no reason why a 2.7.1 release would materialize any sooner than
> 2.10.0.
> >
> > Or is the intention is to just stack up fixes in the 2.7.x branch for a
> > potential future release?
> >
> > Thomas
> >
> >
> > On Thu, Jan 31, 2019 at 5:03 AM Maximilian Michels  > > wrote:
> >
> > I agree it's better to take some extra time to ensure the quality of
> 2.10.0.
> >
> > I've created a 2.7.1 branch and cherry-picked the relevant
> commits[1]. We could
> > start collecting other fixes in case there are any.
> >
> > -Max
> >
> > [1] https://github.com/apache/beam/pull/7687
> >
> > On 30.01.19 20:57, Kenneth Knowles wrote:
> >  > Sounds good to me to target 2.7.1 and 2.10.0. I will have to
> re-roll RC2
> > after
> >  > confirming fixes for the latest blockers that were found. These
> are not
> >  > regressions from 2.9.0. But they seem severe enough that they are
> worth
> > taking
> >  > an extra day or two, because 2.9.0 had enough problems that I
> would like
> > to make
> >  > 2.10.0 a more attractive upgrade target for users still on very
> old versions.
> >  >
> >  > Kenn
> >  >
> >  > On Wed, Jan 30, 2019 at 5:22 AM Maximilian Michels <
> m...@apache.org
> > 
> >  > >> wrote:
> >  >
> >  > Hi everyone,
> >  >
> >  > I know we are in the midst of releasing 2.10.0, but with the
> release
> > process
> >  > taking its time I consider creating a patch release for this
> issue in the
> >  > FlinkRunner: https://jira.apache.org/jira/browse/BEAM-5386
> >  >
> >  > Initially I thought it would be good to do a 2.9.1 release,
> but since we
> >  > have an
> >  > LTS version, we should probably do a 2.7.1 (LTS) release
> instead.
> >  >
> >  > What do you think? I could only find one Fix Version 2.7.1
> issue in JIRA:
> >  >
> >
> https://jira.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.7.1
> >  >
> >  > Best,
> >  > Max
> >  >
> >
>


Re: Example project configuration (maven or gradle) for projects depending on BeamSQL sdk extensions

2019-01-31 Thread Yi Pan
Hi, Kenn,

Thanks for the quick response! Just FYI, I downgraded to 2.8.0 and the same
project runs fine now. Will update the ticket accordingly.

-Yi

On Wed, Jan 30, 2019 at 9:11 PM Kenneth Knowles  wrote:

> Wow, thanks for the great report. Your configuration looks good to me. I
> filed https://issues.apache.org/jira/browse/BEAM-6558 to figure this out.
>
> Kenn
>
> On Wed, Jan 30, 2019 at 7:01 PM Yi Pan  wrote:
>
>> Hi, all,
>>
>> Newbie here trying to figure out how to use published
>> beam-sdks-java-extensions-sql-2.9.0 in my own project.
>>
>> I tried to create a gradle project to use BeamSQL sdk libraries. Here is
>> the build.gradle I have:
>> {code}
>> plugins {
>> id 'java'
>> }
>>
>> group 'com.mycompany.myproject'
>> version '1.0-SNAPSHOT'
>>
>> sourceCompatibility = 1.8
>>
>> repositories {
>> mavenCentral()
>> }
>>
>> apply plugin: 'java'
>>
>> sourceSets.main.java.srcDirs = [
>> 'src/main/java'
>> ]
>>
>> dependencies {
>> compile 'org.apache.beam:beam-sdks-java-core:2.9.0'
>> compile 'org.apache.beam:beam-sdks-java-extensions-sql:2.9.0'
>> compile 'com.google.code.findbugs:jsr305:3.0.2'
>> runtime 'org.apache.beam:beam-runners-direct-java:2.9.0'
>> testCompile group: 'junit', name: 'junit', version: '4.12'
>> }
>>
>> // Run basic SQL example
>> task runBasicExample(type: JavaExec) {
>>   description = "Run basic SQL example"
>>   main = "com.mycompany.myproject.streamsql.examples.BeamSQLExample"
>>   classpath = sourceSets.main.runtimeClasspath
>>   args = ["--runner=DirectRunner"]
>>   println classpath.getAsPath()
>>   println args
>> }
>> {code}
>>
>> The example BeamSQLExample is just copied from
>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlExample.java
>> .
>>
>> I was able to compile the example in JDK8. However, when I tried to run
>> it, I hit the following exception:
>> {code}
>> Exception in thread "main" java.util.ServiceConfigurationError:
>> org.apache.beam.sdk.extensions.sql.impl.udf.BeamBuiltinFunctionProvider:
>> Provider org.apache.beam.sdk.extensions.sql.impl.udf.BuiltinStringFunctions
>> could not be instantiated
>> at java.util.ServiceLoader.fail(ServiceLoader.java:232)
>> at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
>> at
>> java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
>> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
>> at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
>> at
>> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.loadBeamBuiltinFunctions(BeamSqlEnv.java:128)
>> at
>> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:94)
>> at
>> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:76)
>> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
>> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:471)
>> at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:357)
>> at
>> com.mycompany.myprject.streamsql.examples.BeamSQLExample.main(BeamSQLExample.java:72)
>> Caused by: java.lang.NoClassDefFoundError:
>> org/apache/commons/codec/DecoderException
>> at java.lang.Class.getDeclaredConstructors0(Native Method)
>> at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
>> at java.lang.Class.getConstructor0(Class.java:3075)
>> at java.lang.Class.newInstance(Class.java:412)
>> at
>> java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
>> ... 9 more
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.commons.codec.DecoderException
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> ... 14 more
>> {code}
>>
>> When I traced into the code, it turns out that the failure occurs when
>> the classloader tries to get the default constructor w/o any parameters for
>> BuiltinStringFunctions.class. I double checked my local gradle cache and
>> confirmed that the jar is there:
>> {code}
>> SJCMAC91THJHD4:beamsql-demo ypan$ ls -l
>> ~/.gradle/caches/modules-2/files-2.1/org.apache.beam/beam-sdks-java-extensions-sql/2.9.0/67e7675519859ff332619c4c6ea5d26a505dbd50/beam-sdks-java-extensions-sql-2.9.0.jar
>> -rw-r--r--  1 ypan  192360288  12761025 Jan 29 18:02
>> /Users/ypan/.gradle/caches/modules-2/files-2.1/org.apache.beam/beam-sdks-java-extensions-sql/2.9.0/67e7675519859ff332619c4c6ea5d26a505dbd50/beam-sdks-java-extensions-sql-2.9.0.jar
>> {code}
>>
>> I have also tried to compile the SQL SDK libraries in Beam's source repo
>> and just copy the generated class files over to my runtime classpath.
>> Apparently, that did not work well, since there are many shadowed libraries
>> that now need explicitly declared dependencies. I tried to search for
>> an example of maven 
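
The root cause in the stack trace above is that
org.apache.commons.codec.DecoderException is missing from the runtime
classpath. Assuming the 2.9.0 SQL extension expects commons-codec to be
provided at runtime rather than bundling it (an assumption, not a confirmed
diagnosis), a minimal sketch of a workaround in the build.gradle above would
be to declare the dependency explicitly:

{code}
dependencies {
    // Hypothetical workaround: put Apache Commons Codec on the runtime
    // classpath so org.apache.commons.codec.DecoderException can be loaded.
    // The version is an assumption; align it with what Beam 2.9.0 pulls in.
    runtime 'commons-codec:commons-codec:1.10'
}
{code}

Downgrading to 2.8.0, as reported elsewhere in the thread, also avoids the
error.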

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Maximilian Michels

Ah, I thought you meant native Flink transforms.

Exactly! The translation code is already there. The main challenge is how to 
programmatically configure the BeamIO from Python. I suppose that is also an 
unsolved problem for cross-language transforms in general.


For Matthias' pipeline with PubSubIO we can build something specific, but for 
the general case there should be a way to initialize a Beam IO via a configuration 
map provided by an external environment.


On 31.01.19 17:36, Thomas Weise wrote:

Exactly, that's what I had in mind.

A Flink runner native transform would make the existing unbounded sources 
available, similar to:


https://github.com/apache/beam/blob/2e89c1e4d35e7b5f95a622259d23d921c3d6ad1f/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java#L167




On Thu, Jan 31, 2019 at 8:18 AM Maximilian Michels > wrote:


Wouldn't it be even more useful for the transition period if we enabled 
Beam IO
to be used via Flink (like in the legacy Flink Runner)? In this particular
example, Matthias wants to use PubSubIO, which is not even available as a
native
Flink transform.

On 31.01.19 16:21, Thomas Weise wrote:
 > Until SDF is supported, we could also add Flink runner native transforms 
for
 > selected unbounded sources [1].
 >
 > That might be a reasonable option to unblock users that want to try 
Python
 > streaming on Flink.
 >
 > Thomas
 >
 > [1]
 >

https://github.com/lyft/beam/blob/release-2.10.0-lyft/runners/flink/src/main/java/org/apache/beam/runners/flink/LyftFlinkStreamingPortableTranslations.java
 >
 >
 > On Thu, Jan 31, 2019 at 6:51 AM Maximilian Michels mailto:m...@apache.org>
 > >> wrote:
 >
 >      > I have a hard time to imagine how can we map in a generic way
 >     RestrictionTrackers into the existing Bounded/UnboundedSource, so I 
would
 >     love to hear more about the details.
 >
 >     Isn't it the other way around? The SDF is a generalization of
UnboundedSource.
 >     So we would wrap UnboundedSource using SDF. I'm not saying it is
trivial, but
 >     SDF offers all the functionality that UnboundedSource needs.
 >
 >     For example, the @GetInitialRestriction method would call split on 
the
 >     UnboundedSource and the restriction trackers would then be used to
process the
 >     splits.
 >
 >     On 31.01.19 15:16, Ismaël Mejía wrote:
 >      >> Not necessarily. This would be one way. Another way is build an 
SDF
 >     wrapper for UnboundedSource. Probably the easier path for migration.
 >      >
 >      > That would be fantastic, I have heard about such wrapper multiple
 >      > times but so far there is not any realistic proposal. I have a 
hard
 >      > time to imagine how can we map in a generic way 
RestrictionTrackers
 >      > into the existing Bounded/UnboundedSource, so I would love to hear
 >      > more about the details.
 >      >
 >      > On Thu, Jan 31, 2019 at 3:07 PM Maximilian Michels 
mailto:m...@apache.org>
 >     >> wrote:
 >      >>
 >      >>   > In addition to have support in the runners, this will 
require a
 >      >>   > rewrite of PubsubIO to use the new SDF API.
 >      >>
 >      >> Not necessarily. This would be one way. Another way is build an 
SDF
 >     wrapper for
 >      >> UnboundedSource. Probably the easier path for migration.
 >      >>
 >      >> On 31.01.19 14:03, Ismaël Mejía wrote:
 >       Fortunately, there is already a pending PR for cross-language
 >     pipelines which
 >       will allow us to use Java IO like PubSub in Python jobs.
 >      >>>
 >      >>> In addition to have support in the runners, this will require a
 >      >>> rewrite of PubsubIO to use the new SDF API.
 >      >>>
 >      >>> On Thu, Jan 31, 2019 at 12:23 PM Maximilian Michels
mailto:m...@apache.org>
 >     >> wrote:
 >      
 >       Hi Matthias,
 >      
 >       This is already reflected in the compatibility matrix, if you 
look
 >     under SDF.
 >       There is no UnboundedSource interface for portable pipelines.
That's a
 >     legacy
 >       abstraction that will be replaced with SDF.
 >      
 >       Fortunately, there is already a pending PR for cross-language
 >     pipelines which
 >       will allow us to use Java IO like PubSub in Python jobs.
 >      
 >       Thanks,
 >       Max
 >      
 >       On 31.01.19 12:06, Matthias Baetens wrote:
 >      > Hey Ankur,
 >      >
 >      > Thanks 

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Thomas Weise
Exactly, that's what I had in mind.

A Flink runner native transform would make the existing unbounded sources
available, similar to:

https://github.com/apache/beam/blob/2e89c1e4d35e7b5f95a622259d23d921c3d6ad1f/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslators.java#L167




On Thu, Jan 31, 2019 at 8:18 AM Maximilian Michels  wrote:

> Wouldn't it be even more useful for the transition period if we enabled
> Beam IO
> to be used via Flink (like in the legacy Flink Runner)? In this particular
> example, Matthias wants to use PubSubIO, which is not even available as a
> native
> Flink transform.
>
> On 31.01.19 16:21, Thomas Weise wrote:
> > Until SDF is supported, we could also add Flink runner native transforms
> for
> > selected unbounded sources [1].
> >
> > That might be a reasonable option to unblock users that want to try
> Python
> > streaming on Flink.
> >
> > Thomas
> >
> > [1]
> >
> https://github.com/lyft/beam/blob/release-2.10.0-lyft/runners/flink/src/main/java/org/apache/beam/runners/flink/LyftFlinkStreamingPortableTranslations.java
> >
> >
> > On Thu, Jan 31, 2019 at 6:51 AM Maximilian Michels  > > wrote:
> >
> >  > I have a hard time to imagine how can we map in a generic way
> > RestrictionTrackers into the existing Bounded/UnboundedSource, so I
> would
> > love to hear more about the details.
> >
> > Isn't it the other way around? The SDF is a generalization of
> UnboundedSource.
> > So we would wrap UnboundedSource using SDF. I'm not saying it is
> trivial, but
> > SDF offers all the functionality that UnboundedSource needs.
> >
> > For example, the @GetInitialRestriction method would call split on
> the
> > UnboundedSource and the restriction trackers would then be used to
> process the
> > splits.
> >
> > On 31.01.19 15:16, Ismaël Mejía wrote:
> >  >> Not necessarily. This would be one way. Another way is build an
> SDF
> > wrapper for UnboundedSource. Probably the easier path for migration.
> >  >
> >  > That would be fantastic, I have heard about such wrapper multiple
> >  > times but so far there is not any realistic proposal. I have a
> hard
> >  > time to imagine how can we map in a generic way
> RestrictionTrackers
> >  > into the existing Bounded/UnboundedSource, so I would love to hear
> >  > more about the details.
> >  >
> >  > On Thu, Jan 31, 2019 at 3:07 PM Maximilian Michels <
> m...@apache.org
> > > wrote:
> >  >>
> >  >>   > In addition to have support in the runners, this will
> require a
> >  >>   > rewrite of PubsubIO to use the new SDF API.
> >  >>
> >  >> Not necessarily. This would be one way. Another way is build an
> SDF
> > wrapper for
> >  >> UnboundedSource. Probably the easier path for migration.
> >  >>
> >  >> On 31.01.19 14:03, Ismaël Mejía wrote:
> >   Fortunately, there is already a pending PR for cross-language
> > pipelines which
> >   will allow us to use Java IO like PubSub in Python jobs.
> >  >>>
> >  >>> In addition to have support in the runners, this will require a
> >  >>> rewrite of PubsubIO to use the new SDF API.
> >  >>>
> >  >>> On Thu, Jan 31, 2019 at 12:23 PM Maximilian Michels <
> m...@apache.org
> > > wrote:
> >  
> >   Hi Matthias,
> >  
> >   This is already reflected in the compatibility matrix, if you
> look
> > under SDF.
> >   There is no UnboundedSource interface for portable pipelines.
> That's a
> > legacy
> >   abstraction that will be replaced with SDF.
> >  
> >   Fortunately, there is already a pending PR for cross-language
> > pipelines which
> >   will allow us to use Java IO like PubSub in Python jobs.
> >  
> >   Thanks,
> >   Max
> >  
> >   On 31.01.19 12:06, Matthias Baetens wrote:
> >  > Hey Ankur,
> >  >
> >  > Thanks for the swift reply. Should I change this in the
> capability matrix
> >  > 
> then?
> >  >
> >  > Many thanks.
> >  > Best,
> >  > Matthias
> >  >
> >  > On Thu, 31 Jan 2019 at 09:31, Ankur Goenka  > 
> >  > >> wrote:
> >  >
> >  >   Hi Matthias,
> >  >
> >  >   Unfortunately, unbounded reads including pubsub are not
> yet
> > supported for
> >  >   portable runners.
> >  >
> >  >   Thanks,
> >  >   Ankur
> >  >
> >  >   On Thu, Jan 31, 2019 at 2:44 PM Matthias Baetens
> > mailto:baetensmatth...@gmail.com>
> >  >    > 

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Maximilian Michels
Wouldn't it be even more useful for the transition period if we enabled Beam IO 
to be used via Flink (like in the legacy Flink Runner)? In this particular 
example, Matthias wants to use PubSubIO, which is not even available as a native 
Flink transform.


On 31.01.19 16:21, Thomas Weise wrote:
Until SDF is supported, we could also add Flink runner native transforms for 
selected unbounded sources [1].


That might be a reasonable option to unblock users that want to try Python 
streaming on Flink.


Thomas

[1] 
https://github.com/lyft/beam/blob/release-2.10.0-lyft/runners/flink/src/main/java/org/apache/beam/runners/flink/LyftFlinkStreamingPortableTranslations.java



On Thu, Jan 31, 2019 at 6:51 AM Maximilian Michels > wrote:


 > I have a hard time to imagine how can we map in a generic way
RestrictionTrackers into the existing Bounded/UnboundedSource, so I would
love to hear more about the details.

Isn't it the other way around? The SDF is a generalization of 
UnboundedSource.
So we would wrap UnboundedSource using SDF. I'm not saying it is trivial, 
but
SDF offers all the functionality that UnboundedSource needs.

For example, the @GetInitialRestriction method would call split on the
UnboundedSource and the restriction trackers would then be used to process 
the
splits.

On 31.01.19 15:16, Ismaël Mejía wrote:
 >> Not necessarily. This would be one way. Another way is build an SDF
wrapper for UnboundedSource. Probably the easier path for migration.
 >
 > That would be fantastic, I have heard about such wrapper multiple
 > times but so far there is not any realistic proposal. I have a hard
 > time to imagine how can we map in a generic way RestrictionTrackers
 > into the existing Bounded/UnboundedSource, so I would love to hear
 > more about the details.
 >
 > On Thu, Jan 31, 2019 at 3:07 PM Maximilian Michels mailto:m...@apache.org>> wrote:
 >>
 >>   > In addition to have support in the runners, this will require a
 >>   > rewrite of PubsubIO to use the new SDF API.
 >>
 >> Not necessarily. This would be one way. Another way is build an SDF
wrapper for
 >> UnboundedSource. Probably the easier path for migration.
 >>
 >> On 31.01.19 14:03, Ismaël Mejía wrote:
  Fortunately, there is already a pending PR for cross-language
pipelines which
  will allow us to use Java IO like PubSub in Python jobs.
 >>>
 >>> In addition to have support in the runners, this will require a
 >>> rewrite of PubsubIO to use the new SDF API.
 >>>
 >>> On Thu, Jan 31, 2019 at 12:23 PM Maximilian Michels mailto:m...@apache.org>> wrote:
 
  Hi Matthias,
 
  This is already reflected in the compatibility matrix, if you look
under SDF.
  There is no UnboundedSource interface for portable pipelines. That's a
legacy
  abstraction that will be replaced with SDF.
 
  Fortunately, there is already a pending PR for cross-language
pipelines which
  will allow us to use Java IO like PubSub in Python jobs.
 
  Thanks,
  Max
 
  On 31.01.19 12:06, Matthias Baetens wrote:
 > Hey Ankur,
 >
 > Thanks for the swift reply. Should I change this in the capability 
matrix
 >  then?
 >
 > Many thanks.
 > Best,
 > Matthias
 >
 > On Thu, 31 Jan 2019 at 09:31, Ankur Goenka mailto:goe...@google.com>
 > >> wrote:
 >
 >       Hi Matthias,
 >
 >       Unfortunately, unbounded reads including pubsub are not yet
supported for
 >       portable runners.
 >
 >       Thanks,
 >       Ankur
 >
 >       On Thu, Jan 31, 2019 at 2:44 PM Matthias Baetens
mailto:baetensmatth...@gmail.com>
 >       >> wrote:
 >
 >           Hi everyone,
 >
 >           Last few days I have been trying to run a streaming
pipeline (code on
 >           Github ) on a
Flink Runner.
 >
 >           I am running a Flink cluster locally (v1.5.6
 >           )
 >           I have built the SDK Harness Container: /./gradlew
 >           :beam-sdks-python-container:docker/
 >           and started the JobServer: /./gradlew
 >           :beam-runners-flink_2.11-job-server:runShadow
 >           -PflinkMasterUrl=localhost:8081./
 >
 >           I run my pipeline with: /env/bin/python 
streaming_pipeline.py
 >           

Re: 2.7.1 (LTS) release?

2019-01-31 Thread Maximilian Michels
2.10.0 will be done when it's done. Same goes for 2.7.1, which is likely going to 
be done later since we are focusing on 2.10.0 at the moment.


I've created the release-2.7.1 branch because there is no other place for fixes 
of future versions. It would be helpful to have a minor version branch (e.g. 
release-2.7) which can be continuously updated.


More generally speaking, we should dedicate time for LTS releases. What is the 
point otherwise of having an LTS version?


-Max

On 31.01.19 16:28, Thomas Weise wrote:
Since you were originally thinking of 2.9.x as target, 2.10.0 seems closer both 
in time and upgrade path.


I see no reason why a 2.7.1 release would materialize any sooner than 2.10.0.

Or is the intention is to just stack up fixes in the 2.7.x branch for a 
potential future release?


Thomas


On Thu, Jan 31, 2019 at 5:03 AM Maximilian Michels > wrote:


I agree it's better to take some extra time to ensure the quality of 2.10.0.

I've created a 2.7.1 branch and cherry-picked the relevant commits[1]. We 
could
start collecting other fixes in case there are any.

-Max

[1] https://github.com/apache/beam/pull/7687

On 30.01.19 20:57, Kenneth Knowles wrote:
 > Sounds good to me to target 2.7.1 and 2.10.0. I will have to re-roll RC2
after
 > confirming fixes for the latest blockers that were found. These are not
 > regressions from 2.9.0. But they seem severe enough that they are worth
taking
 > an extra day or two, because 2.9.0 had enough problems that I would like
to make
 > 2.10.0 a more attractive upgrade target for users still on very old 
versions.
 >
 > Kenn
 >
 > On Wed, Jan 30, 2019 at 5:22 AM Maximilian Michels mailto:m...@apache.org>
 > >> wrote:
 >
 >     Hi everyone,
 >
 >     I know we are in the midst of releasing 2.10.0, but with the release
process
 >     taking its time I consider creating a patch release for this issue 
in the
 >     FlinkRunner: https://jira.apache.org/jira/browse/BEAM-5386
 >
 >     Initially I thought it would be good to do a 2.9.1 release, but 
since we
 >     have an
 >     LTS version, we should probably do a 2.7.1 (LTS) release instead.
 >
 >     What do you think? I could only find one Fix Version 2.7.1 issue in 
JIRA:
 >

https://jira.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.7.1
 >
 >     Best,
 >     Max
 >



Re: 2.7.1 (LTS) release?

2019-01-31 Thread Thomas Weise
Since you were originally thinking of 2.9.x as target, 2.10.0 seems closer
both in time and upgrade path.

I see no reason why a 2.7.1 release would materialize any sooner than
2.10.0.

Or is the intention just to stack up fixes in the 2.7.x branch for a
potential future release?

Thomas


On Thu, Jan 31, 2019 at 5:03 AM Maximilian Michels  wrote:

> I agree it's better to take some extra time to ensure the quality of
> 2.10.0.
>
> I've created a 2.7.1 branch and cherry-picked the relevant commits[1]. We
> could
> start collecting other fixes in case there are any.
>
> -Max
>
> [1] https://github.com/apache/beam/pull/7687
>
> On 30.01.19 20:57, Kenneth Knowles wrote:
> > Sounds good to me to target 2.7.1 and 2.10.0. I will have to re-roll RC2
> after
> > confirming fixes for the latest blockers that were found. These are not
> > regressions from 2.9.0. But they seem severe enough that they are worth
> taking
> > an extra day or two, because 2.9.0 had enough problems that I would like
> to make
> > 2.10.0 a more attractive upgrade target for users still on very old
> versions.
> >
> > Kenn
> >
> > On Wed, Jan 30, 2019 at 5:22 AM Maximilian Michels  > > wrote:
> >
> > Hi everyone,
> >
> > I know we are in the midst of releasing 2.10.0, but with the release
> process
> > taking its time I consider creating a patch release for this issue
> in the
> > FlinkRunner: https://jira.apache.org/jira/browse/BEAM-5386
> >
> > Initially I thought it would be good to do a 2.9.1 release, but
> since we
> > have an
> > LTS version, we should probably do a 2.7.1 (LTS) release instead.
> >
> > What do you think? I could only find one Fix Version 2.7.1 issue in
> JIRA:
> >
> https://jira.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.7.1
> >
> > Best,
> > Max
> >
>


Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Thomas Weise
Until SDF is supported, we could also add Flink runner native transforms
for selected unbounded sources [1].

That might be a reasonable option to unblock users that want to try Python
streaming on Flink.

Thomas

[1]
https://github.com/lyft/beam/blob/release-2.10.0-lyft/runners/flink/src/main/java/org/apache/beam/runners/flink/LyftFlinkStreamingPortableTranslations.java


On Thu, Jan 31, 2019 at 6:51 AM Maximilian Michels  wrote:

> > I have a hard time to imagine how can we map in a generic way
> RestrictionTrackers into the existing Bounded/UnboundedSource, so I would
> love to hear more about the details.
>
> Isn't it the other way around? The SDF is a generalization of
> UnboundedSource.
> So we would wrap UnboundedSource using SDF. I'm not saying it is trivial,
> but
> SDF offers all the functionality that UnboundedSource needs.
>
> For example, the @GetInitialRestriction method would call split on the
> UnboundedSource and the restriction trackers would then be used to process
> the
> splits.
>
> On 31.01.19 15:16, Ismaël Mejía wrote:
> >> Not necessarily. This would be one way. Another way is build an SDF
> wrapper for UnboundedSource. Probably the easier path for migration.
> >
> > That would be fantastic, I have heard about such wrapper multiple
> > times but so far there is not any realistic proposal. I have a hard
> > time to imagine how can we map in a generic way RestrictionTrackers
> > into the existing Bounded/UnboundedSource, so I would love to hear
> > more about the details.
> >
> > On Thu, Jan 31, 2019 at 3:07 PM Maximilian Michels 
> wrote:
> >>
> >>   > In addition to have support in the runners, this will require a
> >>   > rewrite of PubsubIO to use the new SDF API.
> >>
> >> Not necessarily. This would be one way. Another way is build an SDF
> wrapper for
> >> UnboundedSource. Probably the easier path for migration.
> >>
> >> On 31.01.19 14:03, Ismaël Mejía wrote:
>  Fortunately, there is already a pending PR for cross-language
> pipelines which
>  will allow us to use Java IO like PubSub in Python jobs.
> >>>
> >>> In addition to have support in the runners, this will require a
> >>> rewrite of PubsubIO to use the new SDF API.
> >>>
> >>> On Thu, Jan 31, 2019 at 12:23 PM Maximilian Michels 
> wrote:
> 
>  Hi Matthias,
> 
>  This is already reflected in the compatibility matrix, if you look
> under SDF.
>  There is no UnboundedSource interface for portable pipelines. That's
> a legacy
>  abstraction that will be replaced with SDF.
> 
>  Fortunately, there is already a pending PR for cross-language
> pipelines which
>  will allow us to use Java IO like PubSub in Python jobs.
> 
>  Thanks,
>  Max
> 
>  On 31.01.19 12:06, Matthias Baetens wrote:
> > Hey Ankur,
> >
> > Thanks for the swift reply. Should I change this in the capability
> matrix
> >  then?
> >
> > Many thanks.
> > Best,
> > Matthias
> >
> > On Thu, 31 Jan 2019 at 09:31, Ankur Goenka  > > wrote:
> >
> >   Hi Matthias,
> >
> >   Unfortunately, unbounded reads including pubsub are not yet
> supported for
> >   portable runners.
> >
> >   Thanks,
> >   Ankur
> >
> >   On Thu, Jan 31, 2019 at 2:44 PM Matthias Baetens <
> baetensmatth...@gmail.com
> >   > wrote:
> >
> >   Hi everyone,
> >
> >   Last few days I have been trying to run a streaming
> pipeline (code on
> >   Github ) on a
> Flink Runner.
> >
> >   I am running a Flink cluster locally (v1.5.6
> >   )
> >   I have built the SDK Harness Container: /./gradlew
> >   :beam-sdks-python-container:docker/
> >   and started the JobServer: /./gradlew
> >   :beam-runners-flink_2.11-job-server:runShadow
> >   -PflinkMasterUrl=localhost:8081./
> >
> >   I run my pipeline with: /env/bin/python
> streaming_pipeline.py
> >   --runner=PortableRunner --job_endpoint=localhost:8099
> --output xxx
> >   --input_subscription xxx --output_subscription xxx/
> >   /
> >   /
> >   All this is running inside a Ubuntu (Bionic) in a
> Virtualbox.
> >
> >   The job submits fine, but unfortunately fails after a few
> seconds with
> >   the error attached.
> >
> >   Anything I am missing or doing wrong?
> >
> >   Many thanks.
> >   Best,
> >   Matthias
> >
> >
>


Re: ContainerLaunchException in precommit [BEAM-6497]

2019-01-31 Thread Gleb Kanterov
There are two tests using testcontainers. I've noticed that in one of the
failed builds [1], only one of them failed to pull the Docker image. I
suspect that adding retries to the container pull step can mitigate this
issue. I've submitted a pull request, apache/beam#7689 [2].

[1] https://builds.apache.org/job/beam_PreCommit_Java_Commit/3869/
[2] https://github.com/apache/beam/pull/7689

On Wed, Jan 30, 2019 at 12:15 AM Kenneth Knowles  wrote:

> I retract my statement. I failed at web browsing.
>
> On Tue, Jan 29, 2019 at 3:14 PM Kenneth Knowles  wrote:
>
>> Version 18.10.3 no longer appears on the linked page.
>>
>> On Tue, Jan 29, 2019 at 3:08 PM David Rieber  wrote:
>>
>>> I am consistently hitting that error on this PR:
>>> https://github.com/apache/beam/pull/7631
>>>
>>>
>>> On Thu, Jan 24, 2019 at 9:14 AM Alex Amato  wrote:
>>>
 I have just seen it randomly occur on presubmits. So I don't have a
 reliable repro, unfortunately.
 It may be a specific environmental issue to the beam1 machine the tests
 ran on?
 https://builds.apache.org/job/beam_PreCommit_Java_Commit/3722/


 On Thu, Jan 24, 2019 at 8:16 AM Gleb Kanterov  wrote:

> I'm wondering if anybody can reproduce this issue. The build has
> failed once because testcontainers didn't pull docker image. If we use
> caching proxy for docker, it could be a reason for that. I didn't find any
> similar known issue in testcontainers fixed recently, but just in case, I
> bumped testcontainers to use a newer docker-java.
>
> https://github.com/apache/beam/pull/7610
>
> On Thu, Jan 24, 2019 at 12:27 AM Alex Amato 
> wrote:
>
>> Thank you Gleb, appreciate it.
>>
>> On Wed, Jan 23, 2019 at 2:40 PM Gleb Kanterov 
>> wrote:
>>
>>> I'm looking into it. This image exists in docker hub [1], but for
>>> some reason, it wasn't picked up.
>>>
>>> [1] https://hub.docker.com/r/yandex/clickhouse-server/tags
>>>
>>> On Wed, Jan 23, 2019 at 10:01 PM Alex Amato 
>>> wrote:
>>>

 1. See: BEAM-6497
    1. This is also causing issues blocking precommits.
    2. Seems to be caused by this failure to locate the image. Are these
       stored somewhere or built by the build process? Any idea why these
       are failing?

       Caused by: com.github.dockerjava.api.exception.NotFoundException:
       {"message":"No such image: yandex/clickhouse-server:18.10.3"}




>>>
>>> --
>>> Cheers,
>>> Gleb
>>>
>>
>
> --
> Cheers,
> Gleb
>


-- 
Cheers,
Gleb


Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Maximilian Michels

I have a hard time to imagine how can we map in a generic way 
RestrictionTrackers into the existing Bounded/UnboundedSource, so I would love 
to hear more about the details.


Isn't it the other way around? The SDF is a generalization of UnboundedSource. 
So we would wrap UnboundedSource using SDF. I'm not saying it is trivial, but 
SDF offers all the functionality that UnboundedSource needs.


For example, the @GetInitialRestriction method would call split on the 
UnboundedSource and the restriction trackers would then be used to process the 
splits.


On 31.01.19 15:16, Ismaël Mejía wrote:

Not necessarily. This would be one way. Another way is build an SDF wrapper for 
UnboundedSource. Probably the easier path for migration.


That would be fantastic, I have heard about such wrapper multiple
times but so far there is not any realistic proposal. I have a hard
time to imagine how can we map in a generic way RestrictionTrackers
into the existing Bounded/UnboundedSource, so I would love to hear
more about the details.

On Thu, Jan 31, 2019 at 3:07 PM Maximilian Michels  wrote:


  > In addition to have support in the runners, this will require a
  > rewrite of PubsubIO to use the new SDF API.

Not necessarily. This would be one way. Another way is build an SDF wrapper for
UnboundedSource. Probably the easier path for migration.

On 31.01.19 14:03, Ismaël Mejía wrote:

Fortunately, there is already a pending PR for cross-language pipelines which
will allow us to use Java IO like PubSub in Python jobs.


In addition to have support in the runners, this will require a
rewrite of PubsubIO to use the new SDF API.

On Thu, Jan 31, 2019 at 12:23 PM Maximilian Michels  wrote:


Hi Matthias,

This is already reflected in the compatibility matrix, if you look under SDF.
There is no UnboundedSource interface for portable pipelines. That's a legacy
abstraction that will be replaced with SDF.

Fortunately, there is already a pending PR for cross-language pipelines which
will allow us to use Java IO like PubSub in Python jobs.

Thanks,
Max

On 31.01.19 12:06, Matthias Baetens wrote:

Hey Ankur,

Thanks for the swift reply. Should I change this in the capability matrix
 then?

Many thanks.
Best,
Matthias

On Thu, 31 Jan 2019 at 09:31, Ankur Goenka mailto:goe...@google.com>> wrote:

  Hi Matthias,

  Unfortunately, unbounded reads including pubsub are not yet supported for
  portable runners.

  Thanks,
  Ankur

  On Thu, Jan 31, 2019 at 2:44 PM Matthias Baetens 
mailto:baetensmatth...@gmail.com>> wrote:

  Hi everyone,

  Last few days I have been trying to run a streaming pipeline (code on
  Github ) on a Flink Runner.

  I am running a Flink cluster locally (v1.5.6
  )
  I have built the SDK Harness Container: /./gradlew
  :beam-sdks-python-container:docker/
  and started the JobServer: /./gradlew
  :beam-runners-flink_2.11-job-server:runShadow
  -PflinkMasterUrl=localhost:8081./

  I run my pipeline with: /env/bin/python streaming_pipeline.py
  --runner=PortableRunner --job_endpoint=localhost:8099 --output xxx
  --input_subscription xxx --output_subscription xxx/
  /
  /
  All this is running inside a Ubuntu (Bionic) in a Virtualbox.

  The job submits fine, but unfortunately fails after a few seconds with
  the error attached.

  Anything I am missing or doing wrong?

  Many thanks.
  Best,
  Matthias




Re: BEAM-6324 / #7340: "I've pretty much given up on the PR being merged. I use my own fork for my projects"

2019-01-31 Thread Etienne Chauchot
I also missed the sentence Kenn mentioned. I think it is worth highlighting it.
Thx for your PR around that, Lukasz!
Etienne
Le mercredi 30 janvier 2019 à 11:03 +0100, Łukasz Gajowy a écrit :
> Wow. I missed the sentence. Judging from the fact that others also proposed 
> adding it, I think it might need some
> care. I proposed a PR here: https://github.com/apache/beam/pull/7670
> 
> Łukasz
> śr., 30 sty 2019 o 00:39 Kenneth Knowles  napisał(a):
> > On Mon, Jan 28, 2019 at 5:25 AM Łukasz Gajowy  wrote:
> > > IMHO, I don't think committers spend time watching new PRs coming up, but 
> > > they more likely act when pinged. So, we
> > > may need some automation in case a contributor do not use github reviewed 
> > > proposal. Auto reviewer assignment seem
> > > too much but modifying the PR template to add a sentence such as "please 
> > > pickup a reviewer in the proposed list"
> > > could be enough. 
> > > WDYT ?
> > > 
> > > and
> > > 
> > > (1) A sentence in the PR template suggesting adding a reviewer. (easy)
> > > 
> > > 
> > > +100! Let's improve the message in the PR template. It costs nothing and 
> > > can help a lot. I'd say it should be in
> > > bold letters as this is super important.
> > > 
> > 
> > There is already a message. Is it unclear? Can you rephrase it to something 
> > better?
> > Kenn 
> > > Maybe this is also worth reconsidering if auto reviewer assignment (or at 
> > > least some form of it) is a bad idea. We
> > > can assign committers (the most "hardcore" option, maybe too much) or 
> > > ping them in emails/github comments if
> > > there's inactivity in pull requests (the soft one but requires a bot to 
> > > be implemented). The way I see this is
> > > that such auto assigned reviewer could always say "I have lots on my 
> > > plate" but suggest someone else to take care
> > > of the PR. This way the problem that nobody is mentioned by the PR author 
> > > is completely gone. Other than that,
> > > such an approach feels efficient to me because it's more "in person" 
> > > (similar to what Robert said). 
> > > 
> > > It's certainly disheartening as a
> > > reviewer to put time into reviewing a PR and then the author doesn't
> > > bother to even respond, or (as has happened to me) be told "hey, this
> > > wasn't ready for review yet."
> > > 
> > > As for "this wasn't ready for review" - there are sometimes situations 
> > > that require a PR to be opened before they
> > > are actually completed (especially when working with Jenkins jobs). Given 
> > > that there might be misunderstandings
> > > authors of such commits should give a clear message saying "do not merge 
> > > yet" or "not ready for review" in title
> > > or comments or even close such PR and reopen until the change is ready. 
> > > It's all about giving a clear signal to
> > > others. 
> > > 
> > > Maybe we should mention it in guidelines/PR message too to avoid 
> > > situations like this?
> > > 
> > > Łukasz
> > > 
> > > 
> > > 
> > > pon., 28 sty 2019 o 11:30 Robert Bradshaw  
> > > napisał(a):
> > > > On Mon, Jan 28, 2019 at 10:37 AM Etienne Chauchot 
> > > >  wrote:
> > > > >
> > > > > Sure it's a pity than this PR got unnoticed and I think it is a 
> > > > > combination of factors (PR date around
> > > > Christmas, the fact that the author forgot - AFAIK - to ping a reviewer 
> > > > in either the PR or the ML).
> > > > >
> > > > > I agree with Rui's proposal to enhance visibility of the "how to get 
> > > > > a reviewed" process.
> > > > >
> > > > > IMHO, I don't think committers spend time watching new PRs coming up, 
> > > > > but they more likely act when pinged.
> > > > So, we may need some automation in case a contributor do not use github 
> > > > reviewed proposal. Auto reviewer
> > > > assignment seem too much but modifying the PR template to add a 
> > > > sentence such as "please pickup a reviewer in
> > > > the proposed list" could be enough.
> > > > > WDYT ?
> > > > 
> > > > 
> > > > +1
> > > > 
> > > > 
> > > > I see two somewhat separable areas of improvement:
> > > > 
> > > > 
> > > > (1) Getting a reviewer assigned to a PR, and
> > > > (2) Expectations of feedback/timeliness from a reviewer once it has
> > > > been assigned.
> > > > 
> > > > 
> > > > The first is the motivation for this thread, but I think we're
> > > > suffering on the second as well.
> > > > 
> > > > 
> > > > Given the reactions here, it sounds like most of us are just as
> > > > unhappy this happened as the author, and would be happy to pitch in
> > > > and improve things.
> > > > 
> > > > 
> > > > I agree with Kenn that just adding more to the contributor guide doesn't
> > > > always help, because a contributor guide with everything one might need
> > > > to know is the least likely to actually be read in its entirety.
> > > > Rather it's useful to provide such guidance at the point that it's
> > > > needed. Specifically, I would like to see
> > > > 
> > > > 
> > > > (1) A sentence in the PR template suggesting adding a reviewer. 

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Maximilian Michels

> In addition to have support in the runners, this will require a
> rewrite of PubsubIO to use the new SDF API.

Not necessarily. This would be one way. Another way is to build an SDF wrapper for 
UnboundedSource. Probably the easier path for migration.


On 31.01.19 14:03, Ismaël Mejía wrote:

Fortunately, there is already a pending PR for cross-language pipelines which
will allow us to use Java IO like PubSub in Python jobs.


In addition to have support in the runners, this will require a
rewrite of PubsubIO to use the new SDF API.

On Thu, Jan 31, 2019 at 12:23 PM Maximilian Michels  wrote:


Hi Matthias,

This is already reflected in the compatibility matrix, if you look under SDF.
There is no UnboundedSource interface for portable pipelines. That's a legacy
abstraction that will be replaced with SDF.

Fortunately, there is already a pending PR for cross-language pipelines which
will allow us to use Java IO like PubSub in Python jobs.

Thanks,
Max

On 31.01.19 12:06, Matthias Baetens wrote:

Hey Ankur,

Thanks for the swift reply. Should I change this in the capability matrix
 then?

Many thanks.
Best,
Matthias

On Thu, 31 Jan 2019 at 09:31, Ankur Goenka mailto:goe...@google.com>> wrote:

 Hi Matthias,

 Unfortunately, unbounded reads including pubsub are not yet supported for
 portable runners.

 Thanks,
 Ankur

 On Thu, Jan 31, 2019 at 2:44 PM Matthias Baetens mailto:baetensmatth...@gmail.com>> wrote:

 Hi everyone,

 Last few days I have been trying to run a streaming pipeline (code on
 Github ) on a Flink Runner.

 I am running a Flink cluster locally (v1.5.6
 )
 I have built the SDK Harness Container: /./gradlew
 :beam-sdks-python-container:docker/
 and started the JobServer: /./gradlew
 :beam-runners-flink_2.11-job-server:runShadow
 -PflinkMasterUrl=localhost:8081./

 I run my pipeline with: /env/bin/python streaming_pipeline.py
 --runner=PortableRunner --job_endpoint=localhost:8099 --output xxx
 --input_subscription xxx --output_subscription xxx/
 /
 /
 All this is running inside a Ubuntu (Bionic) in a Virtualbox.

 The job submits fine, but unfortunately fails after a few seconds with
 the error attached.

 Anything I am missing or doing wrong?

 Many thanks.
 Best,
 Matthias




Re: Another new contributor!

2019-01-31 Thread Gleb Kanterov
Welcome! It would be interesting to hear your thoughts on how Arrow, Arrow
Flight, and Beam Portability relate; this topic was recently discussed on dev@.

On Thu, Jan 31, 2019 at 2:00 PM Ismaël Mejía  wrote:

> Welcome Brian!
> Great to have someone with Apache experience already and also with
> Arrow knowledge.
>
> On Thu, Jan 31, 2019 at 1:32 PM Maximilian Michels  wrote:
> >
> > Welcome! Arrow and Beam together would open lots of possibilities.
> Portability
> > documentation improvements would be much appreciated :)
> >
> > On 31.01.19 11:25, Łukasz Gajowy wrote:
> > > Welcome!
> > >
> > > czw., 31 sty 2019 o 02:40 Kenneth Knowles  > > > napisał(a):
> > >
> > > Welcome!
> > >
> > > On Wed, Jan 30, 2019, 17:30 Connell O'Callaghan <
> conne...@google.com
> > >  wrote:
> > >
> > > Welcome on board Brian!
> > >
> > > On Wed, Jan 30, 2019 at 5:29 PM Ahmet Altay  > > > wrote:
> > >
> > > Welcome Brian!
> > >
> > > On Wed, Jan 30, 2019 at 5:26 PM Brian Hulette <
> bhule...@google.com
> > > > wrote:
> > >
> > > Hi everyone,
> > > I'm Brian Hulette, I just switched roles at Google and
> I'll be
> > > contributing to Beam Portability as part of my new
> position. For
> > > now I'm just going through documentation and getting
> familiar
> > > with Beam from the user perspective, so if anything
> I'll just be
> > > suggesting minor edits to documentation, but I hope to
> be
> > > putting up PRs soon enough.
> > >
> > > I am also an Apache committer (bhulette is my ASF id
> and Jira
> > > username). I worked on the Arrow project's Javascript
> > > implementation in a previous job, and I'm really
> excited to look
> > > for ways to use Arrow and Beam together once I've
> ramped up.
> > >
> > > Brian
> > >
>


-- 
Cheers,
Gleb


Re: Findbugs -> Spotbugs ?

2019-01-31 Thread Gleb Kanterov
Agree, spotbugs brings static checks that aren't covered in error-prone,
it's a good addition. There are a few conflicts between error-prone and
spotbugs, for instance, the approach to enum switch exhaustiveness, but it
can be configured.
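
For reference, a minimal sketch of what enabling SpotBugs could look like in a
standalone Gradle build (the plugin id is real, but the versions below are
assumptions, and Beam's actual wiring would live in its build plugin rather
than a plain build.gradle):

{code}
plugins {
    id 'java'
    // SpotBugs Gradle plugin; the version here is an assumption.
    id 'com.github.spotbugs' version '1.6.9'
}

repositories {
    mavenCentral()
}

spotbugs {
    toolVersion = '3.1.11'   // assumed SpotBugs analyzer version
    ignoreFailures = false   // fail the build on findings, like findbugs today
}

// Emit HTML reports instead of the default XML for easier local reading.
tasks.withType(com.github.spotbugs.SpotBugsTask) {
    reports {
        xml.enabled = false
        html.enabled = true
    }
}
{code}

Existing findbugs exclusion filters should carry over, since SpotBugs reads
the same filter file format.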

On Thu, Jan 31, 2019 at 10:53 AM Ismaël Mejía  wrote:

> Not a blocker but there is not a spotbugs plugin for IntelliJ.
>
> On Thu, Jan 31, 2019 at 10:45 AM Ismaël Mejía  wrote:
> >
> > YES PLEASE let's move to spotbugs !
> > Findbugs has not had a new release in ages, and does not support Java
> > 11 either, so this will address another possible issue.
> >
> > On Thu, Jan 31, 2019 at 8:28 AM Kenneth Knowles  wrote:
> > >
> > > Over the last few hours I activated findbugs on the Dataflow Java
> worker and fixed or suppressed the errors. They started around 60 but
> fixing some uncovered others, etc. You can see the result at
> https://github.com/apache/beam/pull/7684.
> > >
> > > It has convinced me that findbugs still adds value, beyond errorprone
> and nullaway/checker/infer. Quite a few of the issues were not nullability
> related, though nullability remains the most obvious low-hanging fruit
> where a different tool would do even better than findbugs. I have not yet
> enable "non null by default" which exposes 100+ new bugs in the worker, at
> minimum.
> > >
> > > Are there known blockers for upgrading to spotbugs so we are depending
> on an active project?
> > >
> > > Kenn
>


-- 
Cheers,
Gleb


Re: Another new contributor!

2019-01-31 Thread Ismaël Mejía
Welcome Brian!
Great to have someone with Apache experience already and also with
Arrow knowledge.

On Thu, Jan 31, 2019 at 1:32 PM Maximilian Michels  wrote:
>
> Welcome! Arrow and Beam together would open lots of possibilities. Portability
> documentation improvements would be much appreciated :)
>
> On 31.01.19 11:25, Łukasz Gajowy wrote:
> > Welcome!
> >
> > On Thu, Jan 31, 2019 at 02:40 Kenneth Knowles  wrote:
> >
> > Welcome!
> >
> > On Wed, Jan 30, 2019, 17:30 Connell O'Callaghan  wrote:
> >
> > Welcome on board Brian!
> >
> > On Wed, Jan 30, 2019 at 5:29 PM Ahmet Altay  wrote:
> >
> > Welcome Brian!
> >
> > On Wed, Jan 30, 2019 at 5:26 PM Brian Hulette  wrote:
> >
> > Hi everyone,
> > I'm Brian Hulette, I just switched roles at Google and I'll 
> > be
> > contributing to Beam Portability as part of my new 
> > position. For
> > now I'm just going through documentation and getting 
> > familiar
> > with Beam from the user perspective, so if anything I'll 
> > just be
> > suggesting minor edits to documentation, but I hope to be
> > putting up PRs soon enough.
> >
> > I am also an Apache committer (bhulette is my ASF id and 
> > Jira
> > username). I worked on the Arrow project's Javascript
> > implementation in a previous job, and I'm really excited to 
> > look
> > for ways to use Arrow and Beam together once I've ramped up.
> >
> > Brian
> >


Re: 2.7.1 (LTS) release?

2019-01-31 Thread Maximilian Michels

I agree it's better to take some extra time to ensure the quality of 2.10.0.

I've created a 2.7.1 branch and cherry-picked the relevant commits[1]. We could 
start collecting other fixes in case there are any.


-Max

[1] https://github.com/apache/beam/pull/7687

On 30.01.19 20:57, Kenneth Knowles wrote:
Sounds good to me to target 2.7.1 and 2.10.0. I will have to re-roll RC2 after 
confirming fixes for the latest blockers that were found. These are not 
regressions from 2.9.0. But they seem severe enough that they are worth taking 
an extra day or two, because 2.9.0 had enough problems that I would like to make 
2.10.0 a more attractive upgrade target for users still on very old versions.


Kenn

On Wed, Jan 30, 2019 at 5:22 AM Maximilian Michels  wrote:


Hi everyone,

I know we are in the midst of releasing 2.10.0, but with the release process
taking its time I'm considering creating a patch release for this issue in the
FlinkRunner: https://jira.apache.org/jira/browse/BEAM-5386

Initially I thought it would be good to do a 2.9.1 release, but since we
have an
LTS version, we should probably do a 2.7.1 (LTS) release instead.

What do you think? I could only find one Fix Version 2.7.1 issue in JIRA:

https://jira.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.7.1

Best,
Max



Re: Another new contributor!

2019-01-31 Thread Maximilian Michels
Welcome! Arrow and Beam together would open lots of possibilities. Portability 
documentation improvements would be much appreciated :)


On 31.01.19 11:25, Łukasz Gajowy wrote:

Welcome!

On Thu, Jan 31, 2019 at 02:40 Kenneth Knowles  wrote:


Welcome!

On Wed, Jan 30, 2019, 17:30 Connell O'Callaghan <conne...@google.com> wrote:

Welcome on board Brian!

On Wed, Jan 30, 2019 at 5:29 PM Ahmet Altay <al...@google.com> wrote:

Welcome Brian!

On Wed, Jan 30, 2019 at 5:26 PM Brian Hulette <bhule...@google.com> wrote:

Hi everyone,
I'm Brian Hulette, I just switched roles at Google and I'll be
contributing to Beam Portability as part of my new position. For
now I'm just going through documentation and getting familiar
with Beam from the user perspective, so if anything I'll just be
suggesting minor edits to documentation, but I hope to be
putting up PRs soon enough.

I am also an Apache committer (bhulette is my ASF id and Jira
username). I worked on the Arrow project's Javascript
implementation in a previous job, and I'm really excited to look
for ways to use Arrow and Beam together once I've ramped up.

Brian



Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Maximilian Michels

Hi Matthias,

This is already reflected in the compatibility matrix, if you look under SDF. 
There is no UnboundedSource interface for portable pipelines. That's a legacy 
abstraction that will be replaced with SDF.


Fortunately, there is already a pending PR for cross-language pipelines which 
will allow us to use Java IO like PubSub in Python jobs.


Thanks,
Max

On 31.01.19 12:06, Matthias Baetens wrote:

Hey Ankur,

Thanks for the swift reply. Should I change this in the capability matrix then?


Many thanks.
Best,
Matthias

On Thu, 31 Jan 2019 at 09:31, Ankur Goenka  wrote:


Hi Matthias,

Unfortunately, unbounded reads including pubsub are not yet supported for
portable runners.

Thanks,
Ankur

On Thu, Jan 31, 2019 at 2:44 PM Matthias Baetens <baetensmatth...@gmail.com> wrote:

Hi everyone,

For the last few days I have been trying to run a streaming pipeline (code on
Github) on a Flink Runner.

I am running a Flink cluster locally (v1.5.6).
I have built the SDK Harness Container:
./gradlew :beam-sdks-python-container:docker
and started the JobServer:
./gradlew :beam-runners-flink_2.11-job-server:runShadow -PflinkMasterUrl=localhost:8081

I run my pipeline with:
env/bin/python streaming_pipeline.py --runner=PortableRunner --job_endpoint=localhost:8099 --output xxx --input_subscription xxx --output_subscription xxx

All this is running inside an Ubuntu (Bionic) VM in VirtualBox.

The job submits fine, but unfortunately fails after a few seconds with
the error attached.

Anything I am missing or doing wrong?

Many thanks.
Best,
Matthias




Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Matthias Baetens
Hey Ankur,

Thanks for the swift reply. Should I change this in the capability matrix then?

Many thanks.
Best,
Matthias

On Thu, 31 Jan 2019 at 09:31, Ankur Goenka  wrote:

> Hi Matthias,
>
> Unfortunately, unbounded reads including pubsub are not yet supported for
> portable runners.
>
> Thanks,
> Ankur
>
> On Thu, Jan 31, 2019 at 2:44 PM Matthias Baetens <
> baetensmatth...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> For the last few days I have been trying to run a streaming pipeline (code on
>> Github) on a Flink Runner.
>>
>> I am running a Flink cluster locally (v1.5.6).
>> I have built the SDK Harness Container:
>> ./gradlew :beam-sdks-python-container:docker
>> and started the JobServer:
>> ./gradlew :beam-runners-flink_2.11-job-server:runShadow -PflinkMasterUrl=localhost:8081
>>
>> I run my pipeline with:
>> env/bin/python streaming_pipeline.py --runner=PortableRunner --job_endpoint=localhost:8099 --output xxx --input_subscription xxx --output_subscription xxx
>>
>> All this is running inside an Ubuntu (Bionic) VM in VirtualBox.
>>
>> The job submits fine, but unfortunately fails after a few seconds with
>> the error attached.
>>
>> Anything I am missing or doing wrong?
>>
>> Many thanks.
>> Best,
>> Matthias
>>
>>
>>


Re: Another new contributor!

2019-01-31 Thread Łukasz Gajowy
Welcome!

On Thu, Jan 31, 2019 at 02:40 Kenneth Knowles  wrote:

> Welcome!
>
> On Wed, Jan 30, 2019, 17:30 Connell O'Callaghan  wrote:
>
>> Welcome on board Brian!
>>
>> On Wed, Jan 30, 2019 at 5:29 PM Ahmet Altay  wrote:
>>
>>> Welcome Brian!
>>>
>>> On Wed, Jan 30, 2019 at 5:26 PM Brian Hulette 
>>> wrote:
>>>
 Hi everyone,
 I'm Brian Hulette, I just switched roles at Google and I'll be
 contributing to Beam Portability as part of my new position. For now I'm
 just going through documentation and getting familiar with Beam from the
 user perspective, so if anything I'll just be suggesting minor edits to
 documentation, but I hope to be putting up PRs soon enough.

 I am also an Apache committer (bhulette is my ASF id and Jira
 username). I worked on the Arrow project's Javascript implementation in a
 previous job, and I'm really excited to look for ways to use Arrow and Beam
 together once I've ramped up.

 Brian

>>>


Re: Findbugs -> Spotbugs ?

2019-01-31 Thread Ismaël Mejía
YES PLEASE let's move to spotbugs!
Findbugs has not had a new release in ages, and does not support Java
11 either, so this will address another possible issue.

On Thu, Jan 31, 2019 at 8:28 AM Kenneth Knowles  wrote:
>
> Over the last few hours I activated findbugs on the Dataflow Java worker and 
> fixed or suppressed the errors. They started around 60 but fixing some 
> uncovered others, etc. You can see the result at 
> https://github.com/apache/beam/pull/7684.
>
> It has convinced me that findbugs still adds value, beyond errorprone and 
> nullaway/checker/infer. Quite a few of the issues were not nullability 
> related, though nullability remains the most obvious low-hanging fruit where 
> a different tool would do even better than findbugs. I have not yet enabled
> "non null by default" which exposes 100+ new bugs in the worker, at minimum.
>
> Are there known blockers for upgrading to spotbugs so we are depending on an 
> active project?
>
> Kenn
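
For the "non null by default" part mentioned above, a minimal sketch of what
that could look like (the package name is made up, and which annotation set the
codebase would actually standardize on is an assumption; JSR-305's
javax.annotation annotations are one option that findbugs/spotbugs understand):

// package-info.java
@javax.annotation.ParametersAreNonnullByDefault
package org.example.worker;

// Registry.java
package org.example.worker;

import javax.annotation.Nullable;

class Registry {

  // With the package-level default above, the parameter is treated as
  // non-null unless annotated otherwise, and callers that dereference the
  // @Nullable return value without a check are something the analyzer can
  // now report.
  @Nullable
  String lookup(String key) {
    return key.isEmpty() ? null : key.toUpperCase();
  }
}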


Re: Findbugs -> Spotbugs ?

2019-01-31 Thread Ismaël Mejía
Not a blocker but there is not a spotbugs plugin for IntelliJ.

On Thu, Jan 31, 2019 at 10:45 AM Ismaël Mejía  wrote:
>
> YES PLEASE let's move to spotbugs!
> Findbugs has not had a new release in ages, and does not support Java
> 11 either, so this will address another possible issue.
>
> On Thu, Jan 31, 2019 at 8:28 AM Kenneth Knowles  wrote:
> >
> > Over the last few hours I activated findbugs on the Dataflow Java worker 
> > and fixed or suppressed the errors. They started around 60 but fixing some 
> > uncovered others, etc. You can see the result at 
> > https://github.com/apache/beam/pull/7684.
> >
> > It has convinced me that findbugs still adds value, beyond errorprone and 
> > nullaway/checker/infer. Quite a few of the issues were not nullability 
> > related, though nullability remains the most obvious low-hanging fruit 
> > where a different tool would do even better than findbugs. I have not yet 
> > enabled
> > minimum.
> >
> > Are there known blockers for upgrading to spotbugs so we are depending on 
> > an active project?
> >
> > Kenn


Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Ankur Goenka
Hi Matthias,

Unfortunately, unbounded reads including pubsub are not yet supported for
portable runners.

Thanks,
Ankur

On Thu, Jan 31, 2019 at 2:44 PM Matthias Baetens 
wrote:

> Hi everyone,
>
> For the last few days I have been trying to run a streaming pipeline (code on
> Github) on a Flink Runner.
>
> I am running a Flink cluster locally (v1.5.6).
> I have built the SDK Harness Container:
> ./gradlew :beam-sdks-python-container:docker
> and started the JobServer:
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow -PflinkMasterUrl=localhost:8081
>
> I run my pipeline with:
> env/bin/python streaming_pipeline.py --runner=PortableRunner --job_endpoint=localhost:8099 --output xxx --input_subscription xxx --output_subscription xxx
>
> All this is running inside an Ubuntu (Bionic) VM in VirtualBox.
>
> The job submits fine, but unfortunately fails after a few seconds with the
> error attached.
>
> Anything I am missing or doing wrong?
>
> Many thanks.
> Best,
> Matthias
>
>
>


Re: New contributor: Michał Walenia

2019-01-31 Thread Alexey Romanenko
Welcome on board, Michał!

> On 31 Jan 2019, at 10:17, Reza Ardeshir Rokni  wrote:
> 
> Welcome! 
> 
> On Thu, 31 Jan 2019 at 15:48, Michał Walenia  wrote:
> HI all,
> thanks for a warm welcome :)
> 
> Michał
> 
>> On 30.01.2019, at 21:32, Ahmet Altay  wrote:
>> 
>> Welcome Michał!
>> 
>> On Wed, Jan 30, 2019 at 11:38 AM Kenneth Knowles  wrote:
>> Welcome Michał!
>> 
>> Kenn
>> 
>> *And maybe your system uses a compose key. Ubuntu:
>> https://help.ubuntu.com/community/ComposeKey. It is a composition of L and /
>> just like it looks. (unless I can't see it clearly)
>> 
>> 
>> On Wed, Jan 30, 2019 at 10:20 AM Rui Wang  wrote:
>> Welcome! Welcome!
>> 
>> -Rui
>> 
>> On Wed, Jan 30, 2019 at 9:22 AM Łukasz Gajowy  wrote:
>> Impressive, so many ways! I didn't know the mac trick though, thanks Ankur.
>> :D
>> 
>> On Wed, Jan 30, 2019 at 17:24 Ismaël Mejía  wrote:
>> Welcome Michał!
>> 
>> For more foreign-language copy/pasteable characters:
>> http://polish.typeit.org/
>> 
>> Yay for more people with crazy accents, (yes I know I can be biased :P)
>> 
>> Ismaël
>> 
>> On Wed, Jan 30, 2019 at 3:30 PM Ankur Goenka  wrote:
>> >
>> > Welcome Michał!
>> >
>> > long press "l" on mac to type "ł" :)
>> >
>> > On Wed, Jan 30, 2019 at 7:57 PM Maximilian Michels  wrote:
>> >>
>> >> Welcome Michał!
>> >>
>> >> I do have to find out how to type ł without copy/pasting it every time ;)
>> >>
>> >> On 30.01.19 15:22, Łukasz Gajowy wrote:
>> >> > Hi all,
>> >> >
>> >> > a new fellow joined Kasia Kucharczyk and me to work on integration and 
>> >> > load
>> >> > testing areas. Welcome, Michał!
>> >> >
>> >> > Łukasz
>> >> >
> 



Re: New contributor: Michał Walenia

2019-01-31 Thread Reza Ardeshir Rokni
Welcome!

On Thu, 31 Jan 2019 at 15:48, Michał Walenia 
wrote:

> HI all,
> thanks for a warm welcome :)
>
> Michał
>
> On 30.01.2019, at 21:32, Ahmet Altay  wrote:
>
> Welcome Michał!
>
> On Wed, Jan 30, 2019 at 11:38 AM Kenneth Knowles  wrote:
>
>> Welcome Michał!
>>
>> Kenn
>>
>> *And maybe your system uses a compose key. Ubuntu:
>> https://help.ubuntu.com/community/ComposeKey. It is a composition of L and
>> / just like it looks. (unless I can't see it clearly)
>>
>>
>> On Wed, Jan 30, 2019 at 10:20 AM Rui Wang  wrote:
>>
>>> Welcome! Welcome!
>>>
>>> -Rui
>>>
>>> On Wed, Jan 30, 2019 at 9:22 AM Łukasz Gajowy 
>>> wrote:
>>>
 Impressive, so many ways! I didn't know the mac trick though, thanks
 Ankur. :D

 On Wed, Jan 30, 2019 at 17:24 Ismaël Mejía  wrote:

> Welcome Michał!
>
> For more foreign-language copy/pasteable characters:
> http://polish.typeit.org/
>
> Yay for more people with crazy accents, (yes I know I can be biased :P)
>
> Ismaël
>
> On Wed, Jan 30, 2019 at 3:30 PM Ankur Goenka 
> wrote:
> >
> > Welcome Michał!
> >
> > long press "l" on mac to type "ł" :)
> >
> > On Wed, Jan 30, 2019 at 7:57 PM Maximilian Michels 
> wrote:
> >>
> >> Welcome Michał!
> >>
> >> I do have to find out how to type ł without copy/pasting it every
> time ;)
> >>
> >> On 30.01.19 15:22, Łukasz Gajowy wrote:
> >> > Hi all,
> >> >
> >> > a new fellow joined Kasia Kucharczyk and me to work on
> integration and load
> >> > testing areas. Welcome, Michał!
> >> >
> >> > Łukasz
> >> >
>

>


Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Matthias Baetens
Hi everyone,

For the last few days I have been trying to run a streaming pipeline (code on
Github) on a Flink Runner.

I am running a Flink cluster locally (v1.5.6).
I have built the SDK Harness Container:
./gradlew :beam-sdks-python-container:docker
and started the JobServer:
./gradlew :beam-runners-flink_2.11-job-server:runShadow -PflinkMasterUrl=localhost:8081

I run my pipeline with:
env/bin/python streaming_pipeline.py --runner=PortableRunner --job_endpoint=localhost:8099 --output xxx --input_subscription xxx --output_subscription xxx

All this is running inside an Ubuntu (Bionic) VM in VirtualBox.

The job submits fine, but unfortunately fails after a few seconds with the
error attached.

Anything I am missing or doing wrong?

Many thanks.
Best,
Matthias
TimerException{java.lang.RuntimeException: Failed to finish remote bundle}
at 
org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$RepeatedTriggerTask.run(SystemProcessingTimeService.java:335)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to finish remote bundle
at 
org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$SdkHarnessDoFnRunner.finishBundle(ExecutableStageDoFnOperator.java:624)
at 
org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.finishBundle(DoFnRunnerWithMetricsUpdate.java:87)
at 
org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner.finishBundle(SimplePushbackSideInputDoFnRunner.java:118)
at 
org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.invokeFinishBundle(DoFnOperator.java:679)
at 
org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.checkInvokeFinishBundleByTime(DoFnOperator.java:673)
at 
org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.lambda$open$1(DoFnOperator.java:378)
at 
org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$RepeatedTriggerTask.run(SystemProcessingTimeService.java:330)
... 7 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
Error received from SDK harness for instruction 5: Traceback (most recent call 
last):
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 148, in _execute
response = task()
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 183, in <lambda>
self._execute(lambda: worker.do_instruction(work), work)
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 256, in do_instruction
request.instruction_id)
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 272, in process_bundle
bundle_processor.process_bundle(instruction_id)
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 489, in process_bundle
].process_encoded(data.data)
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 126, in process_encoded
self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 182, in 
apache_beam.runners.worker.operations.Operation.output
def output(self, windowed_value, output_index=0):
  File "apache_beam/runners/worker/operations.py", line 183, in 
apache_beam.runners.worker.operations.Operation.output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 89, in 
apache_beam.runners.worker.operations.ConsumerSet.receive
cython.cast(Operation, consumer).process(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 497, in 
apache_beam.runners.worker.operations.DoOperation.process
with self.scoped_process_state:
  File "apache_beam/runners/worker/operations.py", line 498, in 
apache_beam.runners.worker.operations.DoOperation.process
self.dofn_receiver.receive(o)
  File "apache_beam/runners/common.py", line 678, in 
apache_beam.runners.common.DoFnRunner.receive
self.process(windowed_value)
  File "apache_beam/runners/common.py", line 684, in 
apache_beam.runners.common.DoFnRunner.process