Re: @TearDown guarantees

2018-02-20 Thread Romain Manni-Bucau
On 21 Feb 2018 07:26, "Reuven Lax" wrote:

To close the loop here:

Romain, I think your actual concern was that the Javadoc made it sound like
a runner could simply decide not to call Teardown. If so, then I agree with
you - the Javadoc was misleading (and appears it was confusing to Ismael as
well). If a runner destroys a DoFn, it _must_ call TearDown before it calls
Setup on a new DoFn.


95% yes. The remaining 5%: the runner and the setup must do their best to call
it whatever happens, and the setup must allow for that (for instance by not
using blind kills as the normal stop procedure).


If so, then most of the back and forth on this thread had little to do with
your actual concern. However it did take almost three days of discussion
before Eugene understood what your real concern was, leading to the side
discussions.


The underlying issue that popped up is that Beam doesn't yet give much
importance to the lifecycle of the instances it manages in a lot of places, so
I'm quite careful whenever a potential regression appears at the API level.
This is key if Beam is to enable building DSLs and other APIs on top of itself.

Now I'm not sure what was unclear in the first mail; happy to get feedback
on it - feel free to ping me offline so we don't bother everyone ;).

And thanks for the fix again.



Reuven

On Mon, Feb 19, 2018 at 6:08 PM, Reuven Lax  wrote:

> +1 This PR clarifies the semantics quite a bit.
>
> On Mon, Feb 19, 2018 at 3:24 PM, Eugene Kirpichov 
> wrote:
>
>> I've sent out a PR editing the Javadoc: https://github.com/apache/beam/pull/4711
>> Hopefully, that should be sufficient.
>>
>> On Mon, Feb 19, 2018 at 3:20 PM Reuven Lax  wrote:
>>
>>> Ismael, your understanding is appropriate for FinishBundle.
>>>
>>> One basic issue with this understanding is that the lifecycle of a DoFn
>>> is much longer than a single bundle (which I think you expressed by adding
>>> the *s). How long the DoFn lives is not defined. In fact a runner is
>>> completely free to decide that it will _never_ destroy the DoFn, in which
>>> case TearDown is never called simply because the DoFn was never torn down.
>>>
>>> Also, as mentioned before, the runner can only call TearDown in cases
>>> where the shutdown is in its control. If the JVM is shut down externally,
>>> the runner has no chance to call TearDown. This means that while TearDown
>>> is appropriate for cleaning up in-process resources (open connections,
>>> etc.), it's not the right answer for cleaning up persistent resources. If
>>> you rely on TearDown to delete VMs or delete files, there will be cases in
>>> which those files or VMs are not deleted.
>>>
>>> What we are _not_ saying is that the runner is free to just ignore
>>> TearDown. If the runner is explicitly destroying a DoFn object, it should
>>> call TearDown.
>>>
>>> Reuven
>>>
>>>
>>> On Mon, Feb 19, 2018 at 2:35 PM, Ismaël Mejía  wrote:
>>>
 I also had a different understanding of the lifecycle of a DoFn.

 My understanding of the use case for every method in the DoFn was clear and
 perfectly aligned with Thomas' explanation, but what I understood was that, in
 general terms, ‘@Setup is where I get resources/prepare connections and
 @Teardown is where I free them’, so calling Teardown seemed essential to have a
 complete lifecycle:
 Setup → StartBundle* → ProcessElement* → FinishBundle* → Teardown
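
To make that lifecycle concrete, here is a minimal sketch in the Beam Java SDK
of a DoFn that acquires an in-process resource in @Setup and releases it in
@Teardown. The Client class is a hypothetical stand-in for any external
connection (database, Kafka consumer, HTTP pool, ...), not an actual Beam or IO
type:

  import org.apache.beam.sdk.transforms.DoFn;

  class LifecycleDoFn extends DoFn<String, String> {

    // Hypothetical stand-in for a real connection type; illustration only.
    static class Client implements AutoCloseable {
      static Client connect() { return new Client(); }
      String lookup(String key) { return key; }
      @Override public void close() { /* release sockets, threads, ... */ }
    }

    private transient Client client;

    @Setup
    public void setup() {
      // Acquire reusable, in-process resources once per DoFn instance.
      client = Client.connect();
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
      c.output(client.lookup(c.element()));
    }

    @FinishBundle
    public void finishBundle() {
      // Per-bundle flush; unlike @Teardown this is guaranteed once per bundle.
    }

    @Teardown
    public void teardown() {
      // Best-effort cleanup of in-process resources. Not called if the JVM is
      // killed externally, so do not rely on it for persistent resources
      // (temporary files, VMs), as discussed elsewhere in this thread.
      if (client != null) {
        client.close();
      }
    }
  }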

 The fact that @Teardown may not be called is a new detail for me too, and I
 also find it weird to have a method that may or may not be called as part of an
 API. Why would users implement teardown if it will not be called? In that case
 a cleaner approach would probably be to get rid of that method altogether, no?

 But well, maybe that’s not so easy either; there was another point: a user
 reported an issue with leaking resources using KafkaIO in the Spark runner, for
 ref.
 https://apachebeam.slack.com/archives/C1AAFJYMP/p1510596938000622

 At that moment my understanding was that there was something fishy, because we
 should be calling Teardown to correctly close the connections and free the
 resources in case of exceptions in start/process/finish, so I filed a JIRA and
 fixed this by enforcing the call of teardown for the Spark runner and the Flink
 runner:
 https://issues.apache.org/jira/browse/BEAM-3187
 https://issues.apache.org/jira/browse/BEAM-3244
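
As a rough illustration of the pattern those fixes enforce (a self-contained
sketch only; LifecycleFn and TeardownEnforcingExecutor are hypothetical names,
not the actual Spark/Flink runner or DoFnInvoker code touched by BEAM-3187 and
BEAM-3244):

  // Whatever happens while processing, teardown runs before the instance is dropped.
  interface LifecycleFn {
    void setup();
    void processBundle(Iterable<String> elements); // startBundle / processElement* / finishBundle
    void teardown();
  }

  final class TeardownEnforcingExecutor {
    void execute(LifecycleFn fn, Iterable<String> elements) {
      fn.setup();
      try {
        fn.processBundle(elements);
      } finally {
        fn.teardown(); // invoked even when the bundle throws
      }
    }
  }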

 As you can see, not calling this method does have consequences, at least for
 non-containerized runners. Of course a runner that uses containers could afford
 not to care about cleaning the resources this way, but a long-living JVM in a
 Hadoop environment probably won’t have the same luck. So I am not sure that
 having loose semantics there is the right option; I mean, runners could simply
 

Re: @TearDown guarantees

2018-02-20 Thread Reuven Lax
To close the loop here:

Romain, I think your actual concern was that the Javadoc made it sound like
a runner could simply decide not to call Teardown. If so, then I agree with
you - the Javadoc was misleading (and appears it was confusing to Ismael as
well). If a runner destroys a DoFn, it _must_ call TearDown before it calls
Setup on a new DoFn.

If so, then most of the back and forth on this thread had little to do with
your actual concern. However it did take almost three days of discussion
before Eugene understood what your real concern was, leading to the side
discussions.

Reuven

On Mon, Feb 19, 2018 at 6:08 PM, Reuven Lax  wrote:

> +1 This PR clarifies the semantics quite a bit.
>
> On Mon, Feb 19, 2018 at 3:24 PM, Eugene Kirpichov 
> wrote:
>
>> I've sent out a PR editing the Javadoc: https://github.com/apache/beam/pull/4711
>> Hopefully, that should be sufficient.
>>
>> On Mon, Feb 19, 2018 at 3:20 PM Reuven Lax  wrote:
>>
>>> Ismael, your understanding is appropriate for FinishBundle.
>>>
>>> One basic issue with this understanding is that the lifecycle of a DoFn
>>> is much longer than a single bundle (which I think you expressed by adding
>>> the *s). How long the DoFn lives is not defined. In fact a runner is
>>> completely free to decide that it will _never_ destroy the DoFn, in which
>>> case TearDown is never called simply because the DoFn was never torn down.
>>>
>>> Also, as mentioned before, the runner can only call TearDown in cases
>>> where the shutdown is in its control. If the JVM is shut down externally,
>>> the runner has no chance to call TearDown. This means that while TearDown
>>> is appropriate for cleaning up in-process resources (open connections,
>>> etc.), it's not the right answer for cleaning up persistent resources. If
>>> you rely on TearDown to delete VMs or delete files, there will be cases in
>>> which those files or VMs are not deleted.
>>>
>>> What we are _not_ saying is that the runner is free to just ignore
>>> TearDown. If the runner is explicitly destroying a DoFn object, it should
>>> call TearDown.
>>>
>>> Reuven
>>>
>>>
>>> On Mon, Feb 19, 2018 at 2:35 PM, Ismaël Mejía  wrote:
>>>
 I also had a different understanding of the lifecycle of a DoFn.

 My understanding of the use case for every method in the DoFn was clear and
 perfectly aligned with Thomas' explanation, but what I understood was that, in
 general terms, ‘@Setup is where I get resources/prepare connections and
 @Teardown is where I free them’, so calling Teardown seemed essential to have a
 complete lifecycle:
 Setup → StartBundle* → ProcessElement* → FinishBundle* → Teardown

 The fact that @Teardown may not be called is a new detail for me too, and I
 also find it weird to have a method that may or may not be called as part of an
 API. Why would users implement teardown if it will not be called? In that case
 a cleaner approach would probably be to get rid of that method altogether, no?

 But well, maybe that’s not so easy either; there was another point: a user
 reported an issue with leaking resources using KafkaIO in the Spark runner, for
 ref.
 https://apachebeam.slack.com/archives/C1AAFJYMP/p1510596938000622

 At that moment my understanding was that there was something fishy, because we
 should be calling Teardown to correctly close the connections and free the
 resources in case of exceptions in start/process/finish, so I filed a JIRA and
 fixed this by enforcing the call of teardown for the Spark runner and the Flink
 runner:
 https://issues.apache.org/jira/browse/BEAM-3187
 https://issues.apache.org/jira/browse/BEAM-3244

 As you can see, not calling this method does have consequences, at least for
 non-containerized runners. Of course a runner that uses containers could afford
 not to care about cleaning the resources this way, but a long-living JVM in a
 Hadoop environment probably won’t have the same luck. So I am not sure that
 having loose semantics there is the right option; I mean, runners could simply
 guarantee that they call teardown, and if teardown takes too long they can
 decide to send a signal or kill the process/container/etc. and go ahead. That
 way at least users would have a motivation to implement the teardown method;
 otherwise it doesn’t make any sense to have it (API-wise).

 On Mon, Feb 19, 2018 at 11:30 PM, Eugene Kirpichov <kirpic...@google.com> wrote:
 > Romain, would it be fair to say that currently the goal of your
 > participation in this discussion is to identify situations where @Teardown
 > in principle could have been called, but some of the current runners don't
 > make a good enough effort to call it? If 

Re: Beam 2.4.0

2018-02-20 Thread Reuven Lax
I think it's fair to request that the reviewers of these PRs help with your
effort to get them merged before the 2.4.0 cut. Existing comments on the PR
imply that reviewers think the approaches are reasonable. Assuming that
there's not too much work left to be done to address comments, there's a
good chance of getting them in.

Reuven

On Tue, Feb 20, 2018 at 10:10 PM, Romain Manni-Bucau 
wrote:

> Ok
>
> In terms of what I'd like included, here is the list:
>
> 1. https://github.com/apache/beam/pull/4412 (important to prevent
> regressions)
> 2. https://github.com/apache/beam/pull/4674 (may need some more work, but
> doing so can break some API, so the current state is a functional trade-off).
> On a more personal note, I'm blocked by this one for some features.
> 3. https://github.com/apache/beam/pull/4372 (important because the execution
> is currently not deterministic, depending on your Surefire config, IDE, or
> main usage)
>
>
>
> On 21 Feb 2018 01:29, "Reuven Lax" wrote:
>
>> +1, this is in keeping with an every-six-weeks cadence.
>>
>> Romain, you can always target Jiras to this release, and then the release
>> manager can decide on a case-by-case basis whether to make sure the fix is
>> included.
>>
>> On Tue, Feb 20, 2018 at 2:30 PM, Robert Bradshaw 
>> wrote:
>>
>>> Yep. I am starting the "Let's do a 2.4.0 release" thread almost
>>> exactly 6 weeks after JB first started the 2.3.0 release thread.
>>>
>>> On Tue, Feb 20, 2018 at 2:20 PM, Charles Chen  wrote:
>>> > I would like to +1 the faster release cycle process JB and Robert have
>>> been
>>> > advocating and implementing, and thank JB for releasing 2.3.0 smoothly.
>>> > When we block for specific features and increase the time between
>>> releases,
>>> > we increase the urgency for PR authors to push for their change to go
>>> into
>>> > an upcoming release, which is a feedback loop that results in our
>>> releases
>>> > taking months instead of weeks.  We should however try to get pending
>>> PRs
>>> > wrapped up.
>>> >
>>> > On Tue, Feb 20, 2018 at 2:15 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Kind of agree, but the rhythm was supposed to be 6 weeks IIRC; 2.3 is
>>> >> just out, so 1 week is a bit fast IMHO.
>>> >>
>>> >> On 20 Feb 2018 23:13, "Robert Bradshaw" wrote:
>>> >>>
>>> >>> One of the main shifts that I think helped this release was
>>> explicitly
>>> >>> not being feature driven, rather releasing what's already in the
>>> >>> branch. That doesn't mean it's not a good call to action to try and
>>> >>> get long-pending PRs or similar wrapped up.
>>> >>>
>>> >>> On Tue, Feb 20, 2018 at 2:10 PM, Romain Manni-Bucau
>>> >>>  wrote:
>>> >>> > There are a lot of long pending PR, would be good to merge them
>>> before
>>> >>> > 2.4.
>>> >>> > Some are bringing tests for the 2.3 release which can be critical
>>> to
>>> >>> > include.
>>> >>> >
>>> >>> > Maybe we should list the pr and jira we want it before picking a
>>> date?
>>> >>> >
>>> >>> > Le 20 févr. 2018 22:02, "Konstantinos Katsiapis" <
>>> katsia...@google.com>
>>> >>> > a
>>> >>> > écrit :
>>> >>> >>
>>> >>> >> +1 since tf.transform 0.6 depends on Beam 2.4 and Tensorflow 1.6
>>> (and
>>> >>> >> the
>>> >>> >> latter already has an RC out, so we will likely be blocked on
>>> Beam).
>>> >>> >>
>>> >>> >> On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw
>>> >>> >> 
>>> >>> >> wrote:
>>> >>> >>>
>>> >>> >>> Now that Beam 2.3.0 went out (and in record time, kudos to all
>>> that
>>> >>> >>> made this happen!) It'd be great to keep the ball rolling for a
>>> >>> >>> similarly well-executed 2.4. A lot has gone in [1] since we made
>>> the
>>> >>> >>> 2.3 cut, and to keep our cadence up I would propose a time-based
>>> cut
>>> >>> >>> date early next week (say the 28th).
>>> >>> >>>
>>> >>> >>> I'll volunteer to do this release.
>>> >>> >>>
>>> >>> >>> [1] https://github.com/apache/beam/compare/release-2.3.0...master
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> --
>>> >>> >> Gus Katsiapis | Software Engineer | katsia...@google.com |
>>> >>> >> 650-918-7487
>>>
>>
>>


Re: Beam 2.4.0

2018-02-20 Thread Romain Manni-Bucau
Ok

In terms of what I'd like included, here is the list:

1. https://github.com/apache/beam/pull/4412 (important to prevent
regressions)
2. https://github.com/apache/beam/pull/4674 (may need some more work, but
doing so can break some API, so the current state is a functional trade-off).
On a more personal note, I'm blocked by this one for some features.
3. https://github.com/apache/beam/pull/4372 (important because the execution
is currently not deterministic, depending on your Surefire config, IDE, or
main usage)



On 21 Feb 2018 01:29, "Reuven Lax" wrote:

> +1, this is in keeping with an every-six-weeks cadence.
>
> Romain, you can always target Jiras to this release, and then the release
> manager can decide on a case-by-case basis whether to make sure the fix is
> included.
>
> On Tue, Feb 20, 2018 at 2:30 PM, Robert Bradshaw 
> wrote:
>
>> Yep. I am starting the "Let's do a 2.4.0 release" thread almost
>> exactly 6 weeks after JB first started the 2.3.0 release thread.
>>
>> On Tue, Feb 20, 2018 at 2:20 PM, Charles Chen  wrote:
>> > I would like to +1 the faster release cycle process JB and Robert have
>> been
>> > advocating and implementing, and thank JB for releasing 2.3.0 smoothly.
>> > When we block for specific features and increase the time between
>> releases,
>> > we increase the urgency for PR authors to push for their change to go
>> into
>> > an upcoming release, which is a feedback loop that results in our
>> releases
>> > taking months instead of weeks.  We should however try to get pending
>> PRs
>> > wrapped up.
>> >
>> > On Tue, Feb 20, 2018 at 2:15 PM Romain Manni-Bucau <
>> rmannibu...@gmail.com>
>> > wrote:
>> >>
>> >> Kind of agree, but the rhythm was supposed to be 6 weeks IIRC; 2.3 is
>> >> just out, so 1 week is a bit fast IMHO.
>> >>
>> >> On 20 Feb 2018 23:13, "Robert Bradshaw" wrote:
>> >>>
>> >>> One of the main shifts that I think helped this release was explicitly
>> >>> not being feature driven, rather releasing what's already in the
>> >>> branch. That doesn't mean it's not a good call to action to try and
>> >>> get long-pending PRs or similar wrapped up.
>> >>>
>> >>> On Tue, Feb 20, 2018 at 2:10 PM, Romain Manni-Bucau
>> >>>  wrote:
>> >>> > There are a lot of long pending PR, would be good to merge them
>> before
>> >>> > 2.4.
>> >>> > Some are bringing tests for the 2.3 release which can be critical to
>> >>> > include.
>> >>> >
>> >>> > Maybe we should list the pr and jira we want it before picking a
>> date?
>> >>> >
>> >>> > Le 20 févr. 2018 22:02, "Konstantinos Katsiapis" <
>> katsia...@google.com>
>> >>> > a
>> >>> > écrit :
>> >>> >>
>> >>> >> +1 since tf.transform 0.6 depends on Beam 2.4 and Tensorflow 1.6
>> (and
>> >>> >> the
>> >>> >> latter already has an RC out, so we will likely be blocked on
>> Beam).
>> >>> >>
>> >>> >> On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw
>> >>> >> 
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> Now that Beam 2.3.0 went out (and in record time, kudos to all
>> that
>> >>> >>> made this happen!) It'd be great to keep the ball rolling for a
>> >>> >>> similarly well-executed 2.4. A lot has gone in [1] since we made
>> the
>> >>> >>> 2.3 cut, and to keep our cadence up I would propose a time-based
>> cut
>> >>> >>> date early next week (say the 28th).
>> >>> >>>
>> >>> >>> I'll volunteer to do this release.
>> >>> >>>
>> >>> >>> [1] https://github.com/apache/beam/compare/release-2.3.0...master
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Gus Katsiapis | Software Engineer | katsia...@google.com |
>> >>> >> 650-918-7487
>>
>
>


Re: [YouTube channel] Add video: Apache Beam meetup London 2: use case in finance + IO in Beam and Splittable DoFns

2018-02-20 Thread Gaurav Thakur
How can people get access to the channel?

Thanks, Gaurav

On Wed, Feb 21, 2018 at 1:34 PM, Matthias Baetens <
matthias.baet...@datatonic.com> wrote:

> Hi all,
>
> This is a proposal to launch the Apache Beam YouTube channel and at the
> same time add the first video to the channel.
>
> We would like to use the channel to centralize all videos / recordings
> related to Apache Beam and make it a community driven channel, so people
> have a one stop shop for learnings about Beam.
>
> The first video would be the recording of the second Beam meetup in London:
> Apache Beam meetup London 2: use case in finance + IO in Beam and
> Splittable DoFns. The video can be seen on the channel if you have login
> details (it is currently set to private).
>
> Please let me know if there are any questions or comments!
>
> Best regards,
> Matthias
>


[YouTube channel] Add video: Apache Beam meetup London 2: use case in finance + IO in Beam and Splittable DoFns

2018-02-20 Thread Matthias Baetens
Hi all,

This is a proposal to launch the Apache Beam YouTube channel and at the
same time add the first video to the channel.

We would like to use the channel to centralize all videos / recordings
related to Apache Beam and make it a community-driven channel, so people
have a one-stop shop for learning about Beam.

The first video would be the recording of the second Beam meetup in London:
Apache Beam meetup London 2: use case in finance + IO in Beam and
Splittable DoFns. The video can be seen on the channel if you have login
details (it is currently set to private).

Please let me know if there are any questions or comments!

Best regards,
Matthias


Re: Beam 2.4.0

2018-02-20 Thread Reuven Lax
+1, this is in keeping with an every-six-weeks cadence.

Romain, you can always target Jiras to this release, and then the release
manager can decide on a case-by-case basis whether to make sure the fix is
included.

On Tue, Feb 20, 2018 at 2:30 PM, Robert Bradshaw 
wrote:

> Yep. I am starting the "Let's do a 2.4.0 release" thread almost
> exactly 6 weeks after JB first started the 2.3.0 release thread.
>
> On Tue, Feb 20, 2018 at 2:20 PM, Charles Chen  wrote:
> > I would like to +1 the faster release cycle process JB and Robert have
> been
> > advocating and implementing, and thank JB for releasing 2.3.0 smoothly.
> > When we block for specific features and increase the time between
> releases,
> > we increase the urgency for PR authors to push for their change to go
> into
> > an upcoming release, which is a feedback loop that results in our
> releases
> > taking months instead of weeks.  We should however try to get pending PRs
> > wrapped up.
> >
> > On Tue, Feb 20, 2018 at 2:15 PM Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >>
> >> Kind of agree, but the rhythm was supposed to be 6 weeks IIRC; 2.3 is just
> >> out, so 1 week is a bit fast IMHO.
> >>
> >> On 20 Feb 2018 23:13, "Robert Bradshaw" wrote:
> >>>
> >>> One of the main shifts that I think helped this release was explicitly
> >>> not being feature driven, rather releasing what's already in the
> >>> branch. That doesn't mean it's not a good call to action to try and
> >>> get long-pending PRs or similar wrapped up.
> >>>
> >>> On Tue, Feb 20, 2018 at 2:10 PM, Romain Manni-Bucau
> >>>  wrote:
> >>> > There are a lot of long pending PR, would be good to merge them
> before
> >>> > 2.4.
> >>> > Some are bringing tests for the 2.3 release which can be critical to
> >>> > include.
> >>> >
> >>> > Maybe we should list the pr and jira we want it before picking a
> date?
> >>> >
> >>> > Le 20 févr. 2018 22:02, "Konstantinos Katsiapis" <
> katsia...@google.com>
> >>> > a
> >>> > écrit :
> >>> >>
> >>> >> +1 since tf.transform 0.6 depends on Beam 2.4 and Tensorflow 1.6
> (and
> >>> >> the
> >>> >> latter already has an RC out, so we will likely be blocked on Beam).
> >>> >>
> >>> >> On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw
> >>> >> 
> >>> >> wrote:
> >>> >>>
> >>> >>> Now that Beam 2.3.0 went out (and in record time, kudos to all that
> >>> >>> made this happen!) It'd be great to keep the ball rolling for a
> >>> >>> similarly well-executed 2.4. A lot has gone in [1] since we made
> the
> >>> >>> 2.3 cut, and to keep our cadence up I would propose a time-based
> cut
> >>> >>> date early next week (say the 28th).
> >>> >>>
> >>> >>> I'll volunteer to do this release.
> >>> >>>
> >>> >>> [1] https://github.com/apache/beam/compare/release-2.3.0...master
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Gus Katsiapis | Software Engineer | katsia...@google.com |
> >>> >> 650-918-7487
>


Re: Beam 2.4.0

2018-02-20 Thread Rafael Fernandez
+1 on having release trains scheduled.

Romain: Do you have a list of PRs that could benefit from increased focus
if they want to make it on the upcoming train?


On Tue, Feb 20, 2018 at 3:30 PM Ahmet Altay  wrote:

> +1 for having regular release cycles. Finalizing a release takes on the
> order of a few weeks, and starting a new release soon after the previous
> one is a reliable way to have releases every 6 weeks.
>
> On Tue, Feb 20, 2018 at 2:30 PM, Robert Bradshaw 
> wrote:
>
>> Yep. I am starting the "Let's do a 2.4.0 release" thread almost
>> exactly 6 weeks after JB first started the 2.3.0 release thread.
>>
>> On Tue, Feb 20, 2018 at 2:20 PM, Charles Chen  wrote:
>> > I would like to +1 the faster release cycle process JB and Robert have
>> been
>> > advocating and implementing, and thank JB for releasing 2.3.0 smoothly.
>> > When we block for specific features and increase the time between
>> releases,
>> > we increase the urgency for PR authors to push for their change to go
>> into
>> > an upcoming release, which is a feedback loop that results in our
>> releases
>> > taking months instead of weeks.  We should however try to get pending
>> PRs
>> > wrapped up.
>> >
>> > On Tue, Feb 20, 2018 at 2:15 PM Romain Manni-Bucau <
>> rmannibu...@gmail.com>
>> > wrote:
>> >>
>> >> Kind of agree, but the rhythm was supposed to be 6 weeks IIRC; 2.3 is
>> >> just out, so 1 week is a bit fast IMHO.
>> >>
>> >> On 20 Feb 2018 23:13, "Robert Bradshaw" wrote:
>> >>>
>> >>> One of the main shifts that I think helped this release was explicitly
>> >>> not being feature driven, rather releasing what's already in the
>> >>> branch. That doesn't mean it's not a good call to action to try and
>> >>> get long-pending PRs or similar wrapped up.
>> >>>
>> >>> On Tue, Feb 20, 2018 at 2:10 PM, Romain Manni-Bucau
>> >>>  wrote:
>> >>> > There are a lot of long pending PR, would be good to merge them
>> before
>> >>> > 2.4.
>> >>> > Some are bringing tests for the 2.3 release which can be critical to
>> >>> > include.
>> >>> >
>> >>> > Maybe we should list the pr and jira we want it before picking a
>> date?
>> >>> >
>> >>> > Le 20 févr. 2018 22:02, "Konstantinos Katsiapis" <
>> katsia...@google.com>
>> >>> > a
>> >>> > écrit :
>> >>> >>
>> >>> >> +1 since tf.transform 0.6 depends on Beam 2.4 and Tensorflow 1.6
>> (and
>> >>> >> the
>> >>> >> latter already has an RC out, so we will likely be blocked on
>> Beam).
>> >>> >>
>> >>> >> On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw
>> >>> >> 
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> Now that Beam 2.3.0 went out (and in record time, kudos to all
>> that
>> >>> >>> made this happen!) It'd be great to keep the ball rolling for a
>> >>> >>> similarly well-executed 2.4. A lot has gone in [1] since we made
>> the
>> >>> >>> 2.3 cut, and to keep our cadence up I would propose a time-based
>> cut
>> >>> >>> date early next week (say the 28th).
>> >>> >>>
>> >>> >>> I'll volunteer to do this release.
>> >>> >>>
>> >>> >>> [1] https://github.com/apache/beam/compare/release-2.3.0...master
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Gus Katsiapis | Software Engineer | katsia...@google.com |
>> >>> >> 650-918-7487
>>
>
>




Re: Beam 2.4.0

2018-02-20 Thread Ahmet Altay
+1 for having regular release cycles. Finalizing a release takes on the
order of a few weeks, and starting a new release soon after the previous
one is a reliable way to have releases every 6 weeks.

On Tue, Feb 20, 2018 at 2:30 PM, Robert Bradshaw 
wrote:

> Yep. I am starting the "Let's do a 2.4.0 release" thread almost
> exactly 6 weeks after JB first started the 2.3.0 release thread.
>
> On Tue, Feb 20, 2018 at 2:20 PM, Charles Chen  wrote:
> > I would like to +1 the faster release cycle process JB and Robert have
> been
> > advocating and implementing, and thank JB for releasing 2.3.0 smoothly.
> > When we block for specific features and increase the time between
> releases,
> > we increase the urgency for PR authors to push for their change to go
> into
> > an upcoming release, which is a feedback loop that results in our
> releases
> > taking months instead of weeks.  We should however try to get pending PRs
> > wrapped up.
> >
> > On Tue, Feb 20, 2018 at 2:15 PM Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >>
> >> Kind of agree, but the rhythm was supposed to be 6 weeks IIRC; 2.3 is just
> >> out, so 1 week is a bit fast IMHO.
> >>
> >> On 20 Feb 2018 23:13, "Robert Bradshaw" wrote:
> >>>
> >>> One of the main shifts that I think helped this release was explicitly
> >>> not being feature driven, rather releasing what's already in the
> >>> branch. That doesn't mean it's not a good call to action to try and
> >>> get long-pending PRs or similar wrapped up.
> >>>
> >>> On Tue, Feb 20, 2018 at 2:10 PM, Romain Manni-Bucau
> >>>  wrote:
> >>> > There are a lot of long pending PR, would be good to merge them
> before
> >>> > 2.4.
> >>> > Some are bringing tests for the 2.3 release which can be critical to
> >>> > include.
> >>> >
> >>> > Maybe we should list the pr and jira we want it before picking a
> date?
> >>> >
> >>> > Le 20 févr. 2018 22:02, "Konstantinos Katsiapis" <
> katsia...@google.com>
> >>> > a
> >>> > écrit :
> >>> >>
> >>> >> +1 since tf.transform 0.6 depends on Beam 2.4 and Tensorflow 1.6
> (and
> >>> >> the
> >>> >> latter already has an RC out, so we will likely be blocked on Beam).
> >>> >>
> >>> >> On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw
> >>> >> 
> >>> >> wrote:
> >>> >>>
> >>> >>> Now that Beam 2.3.0 went out (and in record time, kudos to all that
> >>> >>> made this happen!) It'd be great to keep the ball rolling for a
> >>> >>> similarly well-executed 2.4. A lot has gone in [1] since we made
> the
> >>> >>> 2.3 cut, and to keep our cadence up I would propose a time-based
> cut
> >>> >>> date early next week (say the 28th).
> >>> >>>
> >>> >>> I'll volunteer to do this release.
> >>> >>>
> >>> >>> [1] https://github.com/apache/beam/compare/release-2.3.0...master
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Gus Katsiapis | Software Engineer | katsia...@google.com |
> >>> >> 650-918-7487
>


Re: Beam 2.4.0

2018-02-20 Thread Robert Bradshaw
Yep. I am starting the "Let's do a 2.4.0 release" thread almost
exactly 6 weeks after JB first started the 2.3.0 release thread.

On Tue, Feb 20, 2018 at 2:20 PM, Charles Chen  wrote:
> I would like to +1 the faster release cycle process JB and Robert have been
> advocating and implementing, and thank JB for releasing 2.3.0 smoothly.
> When we block for specific features and increase the time between releases,
> we increase the urgency for PR authors to push for their change to go into
> an upcoming release, which is a feedback loop that results in our releases
> taking months instead of weeks.  We should however try to get pending PRs
> wrapped up.
>
> On Tue, Feb 20, 2018 at 2:15 PM Romain Manni-Bucau 
> wrote:
>>
>> Kind of agree, but the rhythm was supposed to be 6 weeks IIRC; 2.3 is just
>> out, so 1 week is a bit fast IMHO.
>>
>> On 20 Feb 2018 23:13, "Robert Bradshaw" wrote:
>>>
>>> One of the main shifts that I think helped this release was explicitly
>>> not being feature driven, rather releasing what's already in the
>>> branch. That doesn't mean it's not a good call to action to try and
>>> get long-pending PRs or similar wrapped up.
>>>
>>> On Tue, Feb 20, 2018 at 2:10 PM, Romain Manni-Bucau
>>>  wrote:
>>> > There are a lot of long pending PR, would be good to merge them before
>>> > 2.4.
>>> > Some are bringing tests for the 2.3 release which can be critical to
>>> > include.
>>> >
>>> > Maybe we should list the pr and jira we want it before picking a date?
>>> >
>>> > Le 20 févr. 2018 22:02, "Konstantinos Katsiapis" 
>>> > a
>>> > écrit :
>>> >>
>>> >> +1 since tf.transform 0.6 depends on Beam 2.4 and Tensorflow 1.6 (and
>>> >> the
>>> >> latter already has an RC out, so we will likely be blocked on Beam).
>>> >>
>>> >> On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw
>>> >> 
>>> >> wrote:
>>> >>>
>>> >>> Now that Beam 2.3.0 went out (and in record time, kudos to all that
>>> >>> made this happen!) It'd be great to keep the ball rolling for a
>>> >>> similarly well-executed 2.4. A lot has gone in [1] since we made the
>>> >>> 2.3 cut, and to keep our cadence up I would propose a time-based cut
>>> >>> date early next week (say the 28th).
>>> >>>
>>> >>> I'll volunteer to do this release.
>>> >>>
>>> >>> [1] https://github.com/apache/beam/compare/release-2.3.0...master
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Gus Katsiapis | Software Engineer | katsia...@google.com |
>>> >> 650-918-7487


Re: Beam 2.4.0

2018-02-20 Thread Charles Chen
I would like to +1 the faster release cycle process JB and Robert have been
advocating and implementing, and thank JB for releasing 2.3.0 smoothly.
When we block for specific features and increase the time between releases,
we increase the urgency for PR authors to push for their change to go into
an upcoming release, which is a feedback loop that results in our releases
taking months instead of weeks.  We should however try to get pending PRs
wrapped up.

On Tue, Feb 20, 2018 at 2:15 PM Romain Manni-Bucau 
wrote:

> Kind of agree, but the rhythm was supposed to be 6 weeks IIRC; 2.3 is just
> out, so 1 week is a bit fast IMHO.
>
> On 20 Feb 2018 23:13, "Robert Bradshaw" wrote:
>
>> One of the main shifts that I think helped this release was explicitly
>> not being feature driven, rather releasing what's already in the
>> branch. That doesn't mean it's not a good call to action to try and
>> get long-pending PRs or similar wrapped up.
>>
>> On Tue, Feb 20, 2018 at 2:10 PM, Romain Manni-Bucau
>>  wrote:
>> > There are a lot of long pending PR, would be good to merge them before
>> 2.4.
>> > Some are bringing tests for the 2.3 release which can be critical to
>> > include.
>> >
>> > Maybe we should list the pr and jira we want it before picking a date?
>> >
>> > Le 20 févr. 2018 22:02, "Konstantinos Katsiapis" 
>> a
>> > écrit :
>> >>
>> >> +1 since tf.transform 0.6 depends on Beam 2.4 and Tensorflow 1.6 (and
>> the
>> >> latter already has an RC out, so we will likely be blocked on Beam).
>> >>
>> >> On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw > >
>> >> wrote:
>> >>>
>> >>> Now that Beam 2.3.0 went out (and in record time, kudos to all that
>> >>> made this happen!) It'd be great to keep the ball rolling for a
>> >>> similarly well-executed 2.4. A lot has gone in [1] since we made the
>> >>> 2.3 cut, and to keep our cadence up I would propose a time-based cut
>> >>> date early next week (say the 28th).
>> >>>
>> >>> I'll volunteer to do this release.
>> >>>
>> >>> [1] https://github.com/apache/beam/compare/release-2.3.0...master
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Gus Katsiapis | Software Engineer | katsia...@google.com |
>> 650-918-7487
>>
>


Re: Beam 2.4.0

2018-02-20 Thread Romain Manni-Bucau
Kind of agree, but the rhythm was supposed to be 6 weeks IIRC; 2.3 is just out,
so 1 week is a bit fast IMHO.

On 20 Feb 2018 23:13, "Robert Bradshaw" wrote:

> One of the main shifts that I think helped this release was explicitly
> not being feature driven, rather releasing what's already in the
> branch. That doesn't mean it's not a good call to action to try and
> get long-pending PRs or similar wrapped up.
>
> On Tue, Feb 20, 2018 at 2:10 PM, Romain Manni-Bucau
>  wrote:
> > There are a lot of long pending PR, would be good to merge them before
> 2.4.
> > Some are bringing tests for the 2.3 release which can be critical to
> > include.
> >
> > Maybe we should list the pr and jira we want it before picking a date?
> >
> > Le 20 févr. 2018 22:02, "Konstantinos Katsiapis" 
> a
> > écrit :
> >>
> >> +1 since tf.transform 0.6 depends on Beam 2.4 and Tensorflow 1.6 (and
> the
> >> latter already has an RC out, so we will likely be blocked on Beam).
> >>
> >> On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw 
> >> wrote:
> >>>
> >>> Now that Beam 2.3.0 went out (and in record time, kudos to all that
> >>> made this happen!) It'd be great to keep the ball rolling for a
> >>> similarly well-executed 2.4. A lot has gone in [1] since we made the
> >>> 2.3 cut, and to keep our cadence up I would propose a time-based cut
> >>> date early next week (say the 28th).
> >>>
> >>> I'll volunteer to do this release.
> >>>
> >>> [1] https://github.com/apache/beam/compare/release-2.3.0...master
> >>
> >>
> >>
> >>
> >> --
> >> Gus Katsiapis | Software Engineer | katsia...@google.com | 650-918-7487
>


Re: Beam 2.4.0

2018-02-20 Thread Robert Bradshaw
One of the main shifts that I think helped this release was explicitly
not being feature driven, rather releasing what's already in the
branch. That doesn't mean it's not a good call to action to try and
get long-pending PRs or similar wrapped up.

On Tue, Feb 20, 2018 at 2:10 PM, Romain Manni-Bucau
 wrote:
> There are a lot of long pending PR, would be good to merge them before 2.4.
> Some are bringing tests for the 2.3 release which can be critical to
> include.
>
> Maybe we should list the pr and jira we want it before picking a date?
>
> On 20 Feb 2018 22:02, "Konstantinos Katsiapis" wrote:
>>
>> +1 since tf.transform 0.6 depends on Beam 2.4 and Tensorflow 1.6 (and the
>> latter already has an RC out, so we will likely be blocked on Beam).
>>
>> On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw 
>> wrote:
>>>
>>> Now that Beam 2.3.0 went out (and in record time, kudos to all that
>>> made this happen!) It'd be great to keep the ball rolling for a
>>> similarly well-executed 2.4. A lot has gone in [1] since we made the
>>> 2.3 cut, and to keep our cadence up I would propose a time-based cut
>>> date early next week (say the 28th).
>>>
>>> I'll volunteer to do this release.
>>>
>>> [1] https://github.com/apache/beam/compare/release-2.3.0...master
>>
>>
>>
>>
>> --
>> Gus Katsiapis | Software Engineer | katsia...@google.com | 650-918-7487


Re: Beam 2.4.0

2018-02-20 Thread Romain Manni-Bucau
There are a lot of long-pending PRs; it would be good to merge them before 2.4.
Some bring tests for the 2.3 release, which can be critical to include.

Maybe we should list the PRs and JIRAs we want in before picking a date?

On 20 Feb 2018 22:02, "Konstantinos Katsiapis" wrote:

> +1 since tf.transform  0.6
> depends on Beam 2.4 and Tensorflow 1.6 (and the latter already has an RC
> out, so we will likely be blocked on Beam).
>
> On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw 
> wrote:
>
>> Now that Beam 2.3.0 went out (and in record time, kudos to all that
>> made this happen!) It'd be great to keep the ball rolling for a
>> similarly well-executed 2.4. A lot has gone in [1] since we made the
>> 2.3 cut, and to keep our cadence up I would propose a time-based cut
>> date early next week (say the 28th).
>>
>> I'll volunteer to do this release.
>>
>> [1] https://github.com/apache/beam/compare/release-2.3.0...master
>>
>
>
>
> --
> Gus Katsiapis | Software Engineer | katsia...@google.com | 650-918-7487
>


Re: Beam 2.4.0

2018-02-20 Thread Konstantinos Katsiapis
+1 since tf.transform  0.6 depends
on Beam 2.4 and Tensorflow 1.6 (and the latter already has an RC out, so we
will likely be blocked on Beam).

On Tue, Feb 20, 2018 at 12:50 PM, Robert Bradshaw 
wrote:

> Now that Beam 2.3.0 went out (and in record time, kudos to all that
> made this happen!) It'd be great to keep the ball rolling for a
> similarly well-executed 2.4. A lot has gone in [1] since we made the
> 2.3 cut, and to keep our cadence up I would propose a time-based cut
> date early next week (say the 28th).
>
> I'll volunteer to do this release.
>
> [1] https://github.com/apache/beam/compare/release-2.3.0...master
>



-- 
Gus Katsiapis | Software Engineer | katsia...@google.com | 650-918-7487


Re: Scala SDK for Apache Beam

2018-02-20 Thread Ankit Jhalaria
Thanks Eugene. I will look at it.

On Tue, Feb 20, 2018 at 12:54 PM, Eugene Kirpichov 
wrote:

> Have you looked at Scio https://github.com/spotify/scio ?
>
> On Tue, Feb 20, 2018 at 12:50 PM Ankit Jhalaria 
> wrote:
>
>> Hey guys,
>>
>> I am interested in writing an SDK in Scala for Apache Beam. I was
>> wondering if someone has already started working on it? If yes, I would
>> love to know how to contribute?
>> If work hasn't been started on the Scala SDK, what would be some good
>> pointers/tips to start an implementation?
>>
>> Thanks,
>> Ankit
>>
>


Re: Scala SDK for Apache Beam

2018-02-20 Thread Eugene Kirpichov
Have you looked at Scio https://github.com/spotify/scio ?

On Tue, Feb 20, 2018 at 12:50 PM Ankit Jhalaria 
wrote:

> Hey guys,
>
> I am interested in writing an SDK in Scala for Apache Beam. I was
> wondering if someone has already started working on it? If yes, I would
> love to know how to contribute?
> If work hasn't been started on the Scala SDK, what would be some good
> pointers/tips to start an implementation?
>
> Thanks,
> Ankit
>


Beam 2.4.0

2018-02-20 Thread Robert Bradshaw
Now that Beam 2.3.0 went out (and in record time, kudos to all that
made this happen!) It'd be great to keep the ball rolling for a
similarly well-executed 2.4. A lot has gone in [1] since we made the
2.3 cut, and to keep our cadence up I would propose a time-based cut
date early next week (say the 28th).

I'll volunteer to do this release.

[1] https://github.com/apache/beam/compare/release-2.3.0...master


Scala SDK for Apache Beam

2018-02-20 Thread Ankit Jhalaria
Hey guys,

I am interested in writing an SDK in Scala for Apache Beam. I was wondering
whether someone has already started working on it. If yes, I would love to
know how to contribute.
If work hasn't been started on the Scala SDK, what would be some good
pointers/tips to start an implementation?

Thanks,
Ankit


Re: Code reviews in Beam

2018-02-20 Thread Reuven Lax
I do think we need something less generic than our current @Experimental. I
also like the idea of a separate package for unvetted contributions (though
Incubating might simply be confusing, given that Apache uses Incubating for
something else).

Good idea to have a standard way of marking such comments.


On Tue, Feb 20, 2018 at 11:56 AM, Robert Bradshaw 
wrote:

> Thanks, Reuven, for bringing this up. This is an area well worth
> investing time in. Specific comments below.
>
> On Mon, Feb 19, 2018 at 10:32 AM, Reuven Lax  wrote:
>
> > Pedantic
> >
> > Overly-pedantic comments (change variable names, etc.) can be
> frustrating.
> > The PR author can feel like they are being forced to make meaningless
> > changes just so the reviewer will allow merging. Note that this is
> sometimes
> > in the eye of the beholder - the reviewer may not think all these
> comments
> > are pedantic.
>
> I think it would be good to have a convention for comments that are
> suggestive, but not required. I usually prefix these with "Nit: ..."
> Maybe "Optional: ..." would be better. This makes it clear that we're
> not trying to put up unnecessary burdens for getting code in, but on
> the other hand helps give guidance (which is often appreciated,
> especially for newcomers (to the project, or coding in general)). For
> any comment, one should be able to answer the question "why" and a
> reviewer should feel free to ask this.
>
> > Don't Do This
> >
> > Sometimes a reviewer rejects an entire PR, saying that this should not be
> > done. There are various reasons given: this won't scale, this will break
> > backwards compatibility, this will break a specific runner, etc. The PR
> > author may not always understand or agree with these reasons, and this
> can
> > leave hurt feelings.
>
> As mentioned, being able to point to a doc when issues like this arise
> is a much better experience for the recipient.
>
> I like the PR dashboard idea as well.
>
> A separate package for less-well-vetted code is better than an
> @Experimental attribute. Incubating? Is there an expectation of
> graduation at some point?
>


Re: Code reviews in Beam

2018-02-20 Thread Robert Bradshaw
Thanks, Reuven, for bringing this up. This is an area well worth
investing time in. Specific comments below.

On Mon, Feb 19, 2018 at 10:32 AM, Reuven Lax  wrote:

> Pedantic
>
> Overly-pedantic comments (change variable names, etc.) can be frustrating.
> The PR author can feel like they are being forced to make meaningless
> changes just so the reviewer will allow merging. Note that this is sometimes
> in the eye of the beholder - the reviewer may not think all these comments
> are pedantic.

I think it would be good to have a convention for comments that are
suggestive, but not required. I usually prefix these with "Nit: ..."
Maybe "Optional: ..." would be better. This makes it clear that we're
not trying to put up unnecessary burdens for getting code in, but on
the other hand helps give guidance (which is often appreciated,
especially for newcomers (to the project, or coding in general)). For
any comment, one should be able to answer the question "why" and a
reviewer should feel free to ask this.

> Don't Do This
>
> Sometimes a reviewer rejects an entire PR, saying that this should not be
> done. There are various reasons given: this won't scale, this will break
> backwards compatibility, this will break a specific runner, etc. The PR
> author may not always understand or agree with these reasons, and this can
> leave hurt feelings.

As mentioned, being able to point to a doc when issues like this arise
is a much better experience for the recipient.

I like the PR dashboard idea as well.

A separate package for less-well-vetted code is better than an
@Experimental attribute. Incubating? Is there an expectation of
graduation at some point?


Re: Code reviews in Beam

2018-02-20 Thread Eugene Kirpichov
I'm ambivalent about the placement of less-stable contributions, but I'd be
very much in favor of being clear about how stable a given API is.

There's several axes of stability here, too:
- Is the API going to stay compatible
- Is the implementation going to stay compatible via pipeline updates
- Is the implementation considered well-tested enough for production use

The current @Experimental annotation is a blanket "no" to all of these, and
lack of the annotation is a blanket "yes" - we can probably do better. We
also sometimes overload the annotation to mean "uses a feature that doesn't
work in all runners" (e.g. splittable or stateful ParDo). We also sometimes
use an @Experimental API inside an API that's not marked this way, and
there's no good way to enforce that. Sometimes instability applies only to
a particular aspect of an API. And so on.

I'm not really sure what to do about this (introducing more granularity would
also require some process for "graduating" along a given axis of
stability), but thought it's worth pointing out anyway.
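
To picture the kind of granularity this would take, here is a purely
hypothetical sketch of an annotation with one level per axis; it is not an
existing Beam API and the thread does not settle on anything like it:

  import java.lang.annotation.Documented;
  import java.lang.annotation.Retention;
  import java.lang.annotation.RetentionPolicy;

  @Documented
  @Retention(RetentionPolicy.CLASS)
  @interface Stability {
    enum Level { EXPERIMENTAL, INCUBATING, STABLE }

    Level apiCompatibility() default Level.EXPERIMENTAL;    // will the API stay compatible
    Level updateCompatibility() default Level.EXPERIMENTAL; // pipeline-update compatibility
    Level productionReadiness() default Level.EXPERIMENTAL; // well-tested enough for production
  }

Something like @Stability(productionReadiness = Stability.Level.INCUBATING) on a
transform would then let each axis graduate independently, at the cost of the
extra graduation process mentioned above.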

On Tue, Feb 20, 2018 at 9:28 AM Kenneth Knowles  wrote:

> There was a suggestion of such a thing - a collection of less well-tested
> and vetted contributions. The one that has the spirit I think of is the
> Piggybank from Apache Pig. Most important to make sure all committers and
> users both understand the meaning of it. The biggest problem is if users
> really rely on it and then have problems. Related to that is contributors
> focusing their effort here and never doing the work to reach the higher
> bar. But overall I am in favor of it, too.
>
> Kenn
>
>
> On Tue, Feb 20, 2018 at 7:54 AM, Reuven Lax  wrote:
>
>> On further thought I like the separate package even more. It allows us to
>> easily isolate all those tests, and not block commits or releases on them.
>>
>> On Tue, Feb 20, 2018 at 7:45 AM, Reuven Lax  wrote:
>>
>>> Another idea: we could have a special package for these "unrefined"
>>> contributions. Once the contribution has had time to mature some, it can be
>>> moved to the regular package structure.
>>>
>>> On Tue, Feb 20, 2018 at 7:41 AM, Jean-Baptiste Onofré 
>>> wrote:
>>>
 I would add some @ToBeRefined or so 
 On 20 Feb 2018, at 16:35, Reuven Lax wrote:
>
> Hi JB,
>
> You're right, I was thinking more about changes to core when talking
> about the technical-excellence bar.
>
> I think there still needs to be some bar for new features and
> extension, but I also think it can be much lower (as nobody is breaking
> anything by merging this). An example of where we still need a bar here is
> tests. If a new IO has a test that the reviewer thinks will be flaky, that
> flaky test will cause problems for _every_ Beam committer, and it's fair 
> to
> ask for the test to be changed.
>
> Given that the bar is lower for new extensions, I think we need a good
> way of marking these things so that Beam users know they are not as mature
> as other parts of Beam. Traditionally we've used @Experimental, but
> @Experimental has been overloaded to mean other things as well. Maybe we
> need to introduce a new annotation?
>
> Reuven
>
> On Tue, Feb 20, 2018 at 5:48 AM, Jean-Baptiste Onofré  > wrote:
>
>> Hi Reuven
>>
>> I agree with all your points except maybe in term of bar level,
>> especially on new features (like extensions or IOs). If the PRs on the 
>> core
>> should be heavily reviewed, I'm more in favor of merging the PR pretty 
>> fast
>> even if not perfect. It's not a technical topic, it's really a 
>> contribution
>> and community topic.
>>
>> Thanks anyway, the dashboard is a good idea !
>>
>> Regards
>> JB
>> On 19 Feb 2018, at 19:33, Reuven Lax < re...@google.com> wrote:
>>>
>>> There have been a number of threads on code reviews (most recently
>>> on a "State of the project" email). These threads have died out without
>>> much resolution, but I'm not sure that the concerns have gone away.
>>>
>>> First of all, I'm of the opinion that a code-review bar for Beam
>>> commits is critical to success of the project. This is a system with 
>>> many
>>> subtle semantics, which might not be obvious at first glance. Beam
>>> pipelines process user data, and the consequence of certain bugs might 
>>> mean
>>> corrupting user data and aggregations - something to avoid at all cost 
>>> if
>>> we want Beam to be trusted. Finally Beam pipelines often run at 
>>> extremely
>>> high scale; while many of our committers have a strong intuition for 
>>> what
>>> can go wrong when running at high scale, not everybody who wants to
>>> contribute will  have this experience.
>>>
>>>
>>> However, we 

Re: beam shade itself?

2018-02-20 Thread Kenneth Knowles
This comes from keeping forbidden deps off the API surface. It is probably
overkill, but I cannot recall the details.

On Tue, Feb 20, 2018 at 8:42 AM, Romain Manni-Bucau 
wrote:

> Hi guys,
>
> is it intended that Beam shades itself?
>
> $ jar tf ~/.m2/repository/org/apache/beam/beam-runners-direct-java/
> 2.3.0/beam-runners-direct-java-2.3.0.jar  | grep '/DoFnRunner.class'
> org/apache/beam/runners/direct/repackaged/runners/core/DoFnRunner.class
>
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github
>  | LinkedIn
>  | Book
> 
>


Re: Code reviews in Beam

2018-02-20 Thread Reuven Lax
On further thought I like the separate package even more. It allows us to
easily isolate all those tests, and not block commits or releases on them.

On Tue, Feb 20, 2018 at 7:45 AM, Reuven Lax  wrote:

> Another idea: we could have a special package for these "unrefined"
> contributions. Once the contribution has had time to mature some, it can be
> moved to the regular package structure.
>
> On Tue, Feb 20, 2018 at 7:41 AM, Jean-Baptiste Onofré 
> wrote:
>
>> I would add some @ToBeRefined or so 
>> On 20 Feb 2018, at 16:35, Reuven Lax wrote:
>>>
>>> Hi JB,
>>>
>>> You're right, I was thinking more about changes to core when talking
>>> about the technical-excellence bar.
>>>
>>> I think there still needs to be some bar for new features and extension,
>>> but I also think it can be much lower (as nobody is breaking anything by
>>> merging this). An example of where we still need a bar here is tests. If a
>>> new IO has a test that the reviewer thinks will be flaky, that flaky test
>>> will cause problems for _every_ Beam committer, and it's fair to ask for
>>> the test to be changed.
>>>
>>> Given that the bar is lower for new extensions, I think we need a good
>>> way of marking these things so that Beam users know they are not as mature
>>> as other parts of Beam. Traditionally we've used @Experimental, but
>>> @Experimental has been overloaded to mean other things as well. Maybe we
>>> need to introduce a new annotation?
>>>
>>> Reuven
>>>
>>> On Tue, Feb 20, 2018 at 5:48 AM, Jean-Baptiste Onofré 
>>> wrote:
>>>
 Hi Reuven

 I agree with all your points except maybe in term of bar level,
 especially on new features (like extensions or IOs). If the PRs on the core
 should be heavily reviewed, I'm more in favor of merging the PR pretty fast
 even if not perfect. It's not a technical topic, it's really a contribution
 and community topic.

 Thanks anyway, the dashboard is a good idea !

 Regards
 JB
 On 19 Feb 2018, at 19:33, Reuven Lax < re...@google.com> wrote:
>
> There have been a number of threads on code reviews (most recently on
> a "State of the project" email). These threads have died out without much
> resolution, but I'm not sure that the concerns have gone away.
>
> First of all, I'm of the opinion that a code-review bar for Beam
> commits is critical to success of the project. This is a system with many
> subtle semantics, which might not be obvious at first glance. Beam
> pipelines process user data, and the consequence of certain bugs might 
> mean
> corrupting user data and aggregations - something to avoid at all cost if
> we want Beam to be trusted. Finally Beam pipelines often run at extremely
> high scale; while many of our committers have a strong intuition for what
> can go wrong when running at high scale, not everybody who wants to
> contribute will  have this experience.
>
>
> However, we also cannot afford to let our policy get in the way of
> building a community. We *must* remain a friendly place to develop
> and contribute.
>
>
> When I look at concerns people have had on on code reviews (and I've
> been browsing most PRs this past year), I see a few common threads:
>
>
> *Review Latency*
>
> Latency on code reviews can be too high. At various times folks (most
> recently, Ahmet and I) have tried to regularly look for stale PRs and ping
> them, but latency still remains high.
>
>
> *Pedantic*
>
> Overly-pedantic comments (change variable names, etc.) can be
> frustrating. The PR author can feel like they are being forced to make
> meaningless changes just so the reviewer will allow merging. Note that 
> this
> is sometimes in the eye of the beholder - the reviewer may not think all
> these comments are pedantic.
>
>
> *Don't Do This*
>
> Sometimes a reviewer rejects an entire PR, saying that this should not
> be done. There are various reasons given: this won't scale, this will 
> break
> backwards compatibility, this will break a specific runner, etc. The PR
> author may not always understand or agree with these reasons, and this can
> leave hurt feelings.
>
>
> I would like open discussion about ways of making our code-review
> policy more welcoming. I'll seed the discussion with a few ideas:
>
>
> *Code Review Dashboard and Automation*
>
> We should invest in adding a code-review dashboard to our site,
> tracking stale PRs by reviewer. Quick turnaround on code reviews is
> essential building community, so all Beam committers should consider
> reviewing code as important as their own coding.  Spark has built a
> PR dashboard (https://spark-prs.appspot.com/) which they’ve found
> better 

Re: Jenkins job “beam_PreCommit_Python_MavenInstall” still fails

2018-02-20 Thread Jean-Baptiste Onofré
I would wait for feedback from the original author of the commit.

Regards
JB

On 20 Feb 2018 at 16:42, Alexey Romanenko wrote:
>Yes, I switched back to BundleBasedDirectRunner as default direct
>runner on current master and now it passes.
>So, my question was rather whether we need to revert this, or whether someone
>who is aware of this change can take a look and fix it properly?
>
>WBR,
>Alexey
>
>> On 20 Feb 2018, at 16:20, Jean-Baptiste Onofré 
>wrote:
>>
>> Yes it seems to be related to the change on the Python direct runner.
>>
>> Did you try to revert the change to see if it fixes the build ?
>>
>> Thanks
>> Regards
>> JB
>> On 20 Feb 2018, at 16:13, Alexey Romanenko wrote:
>> Hi all,
>>
>> Jenkins job “beam_PreCommit_Python_MavenInstall” has been constantly
>failing for last 3 days.
>>
>> Last successful build (#3052) has been produced at Feb 16:
>>
>https://builds.apache.org/job/beam_PreCommit_Python_MavenInstall/lastSuccessfulBuild/
>
>>
>> “git bisect" says that the first commit, when it started to fail, was
>56081686bf7926b65a18dc7c7d2c4e4a9fd265e9
>
>>
>> Could someone take a look at this, or do I need to create a new Jira for
>> this?
>>
>> WBR,
>> Alexey


Re: Jenkins job “beam_PreCommit_Python_MavenInstall” still fails

2018-02-20 Thread Alexey Romanenko
Yes, I switched back to BundleBasedDirectRunner as default direct runner on 
current master and now it passes.
So, my question was rather whether we need to revert this, or whether someone
who is aware of this change can take a look and fix it properly?

WBR,
Alexey

> On 20 Feb 2018, at 16:20, Jean-Baptiste Onofré  wrote:
> 
> Yes it seems to be related to the change on the Python direct runner.
> 
> Did you try to revert the change to see if it fixes the build ?
> 
> Thanks
> Regards
> JB
> On 20 Feb 2018, at 16:13, Alexey Romanenko wrote:
> Hi all,
> 
> Jenkins job “beam_PreCommit_Python_MavenInstall” has been constantly failing 
> for last 3 days. 
> 
> Last successful build (#3052) has been produced at Feb 16:
> https://builds.apache.org/job/beam_PreCommit_Python_MavenInstall/lastSuccessfulBuild/
>  
> 
> 
> “git bisect" says that the first commit, when it started to fail, was 
> 56081686bf7926b65a18dc7c7d2c4e4a9fd265e9 
> 
> 
> Could someone take a look at this, or do I need to create a new Jira for this?
> 
> WBR,
> Alexey



Re: Code reviews in Beam

2018-02-20 Thread Jean-Baptiste Onofré
I would add some @ToBeRefined or so 

On 20 Feb 2018 at 16:35, Reuven Lax wrote:
>Hi JB,
>
>You're right, I was thinking more about changes to core when talking
>about
>the technical-excellence bar.
>
>I think there still needs to be some bar for new features and
>extension,
>but I also think it can be much lower (as nobody is breaking anything
>by
>merging this). An example of where we still need a bar here is tests.
>If a
>new IO has a test that the reviewer thinks will be flaky, that flaky
>test
>will cause problems for _every_ Beam committer, and it's fair to ask
>for
>the test to be changed.
>
>Given that the bar is lower for new extensions, I think we need a good
>way
>of marking these things so that Beam users know they are not as mature
>as
>other parts of Beam. Traditionally we've used @Experimental, but
>@Experimental has been overloaded to mean other things as well. Maybe
>we
>need to introduce a new annotation?
>
>Reuven
>
>On Tue, Feb 20, 2018 at 5:48 AM, Jean-Baptiste Onofré 
>wrote:
>
>> Hi Reuven
>>
>> I agree with all your points except maybe in term of bar level,
>especially
>> on new features (like extensions or IOs). If the PRs on the core
>should be
>> heavily reviewed, I'm more in favor of merging the PR pretty fast
>even if
>> not perfect. It's not a technical topic, it's really a contribution
>and
>> community topic.
>>
>> Thanks anyway, the dashboard is a good idea !
>>
>> Regards
>> JB
>> On 19 Feb 2018 at 19:33, Reuven Lax wrote:
>>>
>>> There have been a number of threads on code reviews (most recently
>on a
>>> "State of the project" email). These threads have died out without
>much
>>> resolution, but I'm not sure that the concerns have gone away.
>>>
>>> First of all, I'm of the opinion that a code-review bar for Beam
>commits
>>> is critical to success of the project. This is a system with many
>subtle
>>> semantics, which might not be obvious at first glance. Beam
>pipelines
>>> process user data, and the consequence of certain bugs might mean
>>> corrupting user data and aggregations - something to avoid at all
>cost if
>>> we want Beam to be trusted. Finally Beam pipelines often run at
>extremely
>>> high scale; while many of our committers have a strong intuition for
>what
>>> can go wrong when running at high scale, not everybody who wants to
>>> contribute will  have this experience.
>>>
>>>
>>> However, we also cannot afford to let our policy get in the way of
>>> building a community. We *must* remain a friendly place to develop
>and
>>> contribute.
>>>
>>>
>>> When I look at concerns people have had on code reviews (and I've
>been
>>> browsing most PRs this past year), I see a few common threads:
>>>
>>>
>>> *Review Latency*
>>>
>>> Latency on code reviews can be too high. At various times folks
>(most
>>> recently, Ahmet and I) have tried to regularly look for stale PRs
>and ping
>>> them, but latency still remains high.
>>>
>>>
>>> *Pedantic*
>>>
>>> Overly-pedantic comments (change variable names, etc.) can be
>>> frustrating. The PR author can feel like they are being forced to
>make
>>> meaningless changes just so the reviewer will allow merging. Note
>that this
>>> is sometimes in the eye of the beholder - the reviewer may not think
>all
>>> these comments are pedantic.
>>>
>>>
>>> *Don't Do This*
>>>
>>> Sometimes a reviewer rejects an entire PR, saying that this should
>not be
>>> done. There are various reasons given: this won't scale, this will
>break
>>> backwards compatibility, this will break a specific runner, etc. The
>PR
>>> author may not always understand or agree with these reasons, and
>this can
>>> leave hurt feelings.
>>>
>>>
>>> I would like open discussion about ways of making our code-review
>policy
>>> more welcoming. I'll seed the discussion with a few ideas:
>>>
>>>
>>> *Code Review Dashboard and Automation*
>>>
>>> We should invest in adding a code-review dashboard to our site,
>tracking
>>> stale PRs by reviewer. Quick turnaround on code reviews is essential to
>>> building community, so all Beam committers should consider reviewing
>code
>>> as important as their own coding.  Spark has built a PR dashboard (
>>> https://spark-prs.appspot.com/) which they’ve found better than
>Github’s
>>> dashboard; we could easily fork this dashboard. There are also tools
>that
>>> will automatically ping reviewers (mention-bot and hey there are two
>such
>>> tools). We can also make sure that new PRs are auto assigned a
>reviewer
>>> (e.g. https://github.com/imsky/pull-review)
>>>
>>>
>>> *Code Review Response SLA*
>>>
>>> It would be great if we could agree on a response-time SLA for Beam
>code
>>> reviews. The response might be “I am unable to do the review until
>next
>>> week,” however even that is better than getting no response.
>>>
>>>
>>> *Guideline Document*
>>>
>>> I think we should have a guideline document, explaining common
>reasons a
>>> reviewer might reject an 

Re: Code reviews in Beam

2018-02-20 Thread Jean-Baptiste Onofré
Fully agree !

Thanks Reuven
Regards
JB

On 20 Feb 2018 at 16:35, Reuven Lax wrote:
>Hi JB,
>
>You're right, I was thinking more about changes to core when talking
>about
>the technical-excellence bar.
>
>I think there still needs to be some bar for new features and
>extension,
>but I also think it can be much lower (as nobody is breaking anything
>by
>merging this). An example of where we still need a bar here is tests.
>If a
>new IO has a test that the reviewer thinks will be flaky, that flaky
>test
>will cause problems for _every_ Beam committer, and it's fair to ask
>for
>the test to be changed.
>
>Given that the bar is lower for new extensions, I think we need a good
>way
>of marking these things so that Beam users know they are not as mature
>as
>other parts of Beam. Traditionally we've used @Experimental, but
>@Experimental has been overloaded to mean other things as well. Maybe
>we
>need to introduce a new annotation?
>
>Reuven
>
>On Tue, Feb 20, 2018 at 5:48 AM, Jean-Baptiste Onofré 
>wrote:
>
>> Hi Reuven
>>
>> I agree with all your points except maybe in term of bar level,
>especially
>> on new features (like extensions or IOs). If the PRs on the core
>should be
>> heavily reviewed, I'm more in favor of merging the PR pretty fast
>even if
>> not perfect. It's not a technical topic, it's really a contribution
>and
>> community topic.
>>
>> Thanks anyway, the dashboard is a good idea !
>>
>> Regards
>> JB
>> On 19 Feb 2018 at 19:33, Reuven Lax wrote:
>>>
>>> There have been a number of threads on code reviews (most recently
>on a
>>> "State of the project" email). These threads have died out without
>much
>>> resolution, but I'm not sure that the concerns have gone away.
>>>
>>> First of all, I'm of the opinion that a code-review bar for Beam
>commits
>>> is critical to success of the project. This is a system with many
>subtle
>>> semantics, which might not be obvious at first glance. Beam
>pipelines
>>> process user data, and the consequence of certain bugs might mean
>>> corrupting user data and aggregations - something to avoid at all
>cost if
>>> we want Beam to be trusted. Finally Beam pipelines often run at
>extremely
>>> high scale; while many of our committers have a strong intuition for
>what
>>> can go wrong when running at high scale, not everybody who wants to
>>> contribute will  have this experience.
>>>
>>>
>>> However, we also cannot afford to let our policy get in the way of
>>> building a community. We *must* remain a friendly place to develop
>and
>>> contribute.
>>>
>>>
>>> When I look at concerns people have had on code reviews (and I've
>been
>>> browsing most PRs this past year), I see a few common threads:
>>>
>>>
>>> *Review Latency*
>>>
>>> Latency on code reviews can be too high. At various times folks
>(most
>>> recently, Ahmet and I) have tried to regularly look for stale PRs
>and ping
>>> them, but latency still remains high.
>>>
>>>
>>> *Pedantic*
>>>
>>> Overly-pedantic comments (change variable names, etc.) can be
>>> frustrating. The PR author can feel like they are being forced to
>make
>>> meaningless changes just so the reviewer will allow merging. Note
>that this
>>> is sometimes in the eye of the beholder - the reviewer may not think
>all
>>> these comments are pedantic.
>>>
>>>
>>> *Don't Do This*
>>>
>>> Sometimes a reviewer rejects an entire PR, saying that this should
>not be
>>> done. There are various reasons given: this won't scale, this will
>break
>>> backwards compatibility, this will break a specific runner, etc. The
>PR
>>> author may not always understand or agree with these reasons, and
>this can
>>> leave hurt feelings.
>>>
>>>
>>> I would like open discussion about ways of making our code-review
>policy
>>> more welcoming. I'll seed the discussion with a few ideas:
>>>
>>>
>>> *Code Review Dashboard and Automation*
>>>
>>> We should invest in adding a code-review dashboard to our site,
>tracking
>>> stale PRs by reviewer. Quick turnaround on code reviews is essential to
>>> building community, so all Beam committers should consider reviewing
>code
>>> as important as their own coding.  Spark has built a PR dashboard (
>>> https://spark-prs.appspot.com/) which they’ve found better than
>Github’s
>>> dashboard; we could easily fork this dashboard. There are also tools
>that
>>> will automatically ping reviewers (mention-bot and hey there are two
>such
>>> tools). We can also make sure that new PRs are auto assigned a
>reviewer
>>> (e.g. https://github.com/imsky/pull-review)
>>>
>>>
>>> *Code Review Response SLA*
>>>
>>> It would be great if we could agree on a response-time SLA for Beam
>code
>>> reviews. The response might be “I am unable to do the review until
>next
>>> week,” however even that is better than getting no response.
>>>
>>>
>>> *Guideline Document*
>>>
>>> I think we should have a guideline document, explaining common
>reasons a
>>> reviewer might reject 

Re: Code reviews in Beam

2018-02-20 Thread Reuven Lax
Hi JB,

You're right, I was thinking more about changes to core when talking about
the technical-excellence bar.

I think there still needs to be some bar for new features and extension,
but I also think it can be much lower (as nobody is breaking anything by
merging this). An example of where we still need a bar here is tests. If a
new IO has a test that the reviewer thinks will be flaky, that flaky test
will cause problems for _every_ Beam committer, and it's fair to ask for
the test to be changed.

Given that the bar is lower for new extensions, I think we need a good way
of marking these things so that Beam users know they are not as mature as
other parts of Beam. Traditionally we've used @Experimental, but
@Experimental has been overloaded to mean other things as well. Maybe we
need to introduce a new annotation?

Reuven

On Tue, Feb 20, 2018 at 5:48 AM, Jean-Baptiste Onofré 
wrote:

> Hi Reuven
>
> I agree with all your points except maybe in term of bar level, especially
> on new features (like extensions or IOs). If the PRs on the core should be
> heavily reviewed, I'm more in favor of merging the PR pretty fast even if
> not perfect. It's not a technical topic, it's really a contribution and
> community topic.
>
> Thanks anyway, the dashboard is a good idea !
>
> Regards
> JB
> On 19 Feb 2018 at 19:33, Reuven Lax wrote:
>>
>> There have been a number of threads on code reviews (most recently on a
>> "State of the project" email). These threads have died out without much
>> resolution, but I'm not sure that the concerns have gone away.
>>
>> First of all, I'm of the opinion that a code-review bar for Beam commits
>> is critical to success of the project. This is a system with many subtle
>> semantics, which might not be obvious at first glance. Beam pipelines
>> process user data, and the consequence of certain bugs might mean
>> corrupting user data and aggregations - something to avoid at all cost if
>> we want Beam to be trusted. Finally Beam pipelines often run at extremely
>> high scale; while many of our committers have a strong intuition for what
>> can go wrong when running at high scale, not everybody who wants to
>> contribute will  have this experience.
>>
>>
>> However, we also cannot afford to let our policy get in the way of
>> building a community. We *must* remain a friendly place to develop and
>> contribute.
>>
>>
>> When I look at concerns people have had on code reviews (and I've been
>> browsing most PRs this past year), I see a few common threads:
>>
>>
>> *Review Latency*
>>
>> Latency on code reviews can be too high. At various times folks (most
>> recently, Ahmet and I) have tried to regularly look for stale PRs and ping
>> them, but latency still remains high.
>>
>>
>> *Pedantic*
>>
>> Overly-pedantic comments (change variable names, etc.) can be
>> frustrating. The PR author can feel like they are being forced to make
>> meaningless changes just so the reviewer will allow merging. Note that this
>> is sometimes in the eye of the beholder - the reviewer may not think all
>> these comments are pedantic.
>>
>>
>> *Don't Do This*
>>
>> Sometimes a reviewer rejects an entire PR, saying that this should not be
>> done. There are various reasons given: this won't scale, this will break
>> backwards compatibility, this will break a specific runner, etc. The PR
>> author may not always understand or agree with these reasons, and this can
>> leave hurt feelings.
>>
>>
>> I would like open discussion about ways of making our code-review policy
>> more welcoming. I'll seed the discussion with a few ideas:
>>
>>
>> *Code Review Dashboard and Automation*
>>
>> We should invest in adding a code-review dashboard to our site, tracking
>> stale PRs by reviewer. Quick turnaround on code reviews is essential to
>> building community, so all Beam committers should consider reviewing code
>> as important as their own coding.  Spark has built a PR dashboard (
>> https://spark-prs.appspot.com/) which they’ve found better than Github’s
>> dashboard; we could easily fork this dashboard. There are also tools that
>> will automatically ping reviewers (mention-bot and hey there are two such
>> tools). We can also make sure that new PRs are auto assigned a reviewer
>> (e.g. https://github.com/imsky/pull-review)
>>
>>
>> *Code Review Response SLA*
>>
>> It would be great if we could agree on a response-time SLA for Beam code
>> reviews. The response might be “I am unable to do the review until next
>> week,” however even that is better than getting no response.
>>
>>
>> *Guideline Document*
>>
>> I think we should have a guideline document, explaining common reasons a
>> reviewer might reject an approach in a  PR. e.g. "This will cause scaling
>> problems," "This will cause problems for XXX runner," "This is backwards
>> incompatible."  Reviewers can point to this doc as part of their comments,
>> along with extra flavor. e.g. “as per the guideline doc, this 

Re: Jenkins job “beam_PreCommit_Python_MavenInstall” still fails

2018-02-20 Thread Jean-Baptiste Onofré
Yes it seems to be related to the change on the Python direct runner.

Did you try to revert the change to see if it fixes the build ?

Thanks
Regards
JB

On 20 Feb 2018 at 16:13, Alexey Romanenko wrote:
>Hi all,
>
>Jenkins job “beam_PreCommit_Python_MavenInstall” has been failing constantly
>for the last 3 days.
>
>The last successful build (#3052) was produced on Feb 16:
>https://builds.apache.org/job/beam_PreCommit_Python_MavenInstall/lastSuccessfulBuild/
>
>
>“git bisect” says that the first commit at which it started to fail was
>56081686bf7926b65a18dc7c7d2c4e4a9fd265e9
>
>
>Could someone take a look at this, or do I need to create a new Jira for
>this?
>
>WBR,
>Alexey


Jenkins job “beam_PreCommit_Python_MavenInstall” still fails

2018-02-20 Thread Alexey Romanenko
Hi all,

Jenkins job “beam_PreCommit_Python_MavenInstall” has been failing constantly 
for the last 3 days. 

The last successful build (#3052) was produced on Feb 16:
https://builds.apache.org/job/beam_PreCommit_Python_MavenInstall/lastSuccessfulBuild/
 


“git bisect” says that the first commit at which it started to fail was 
56081686bf7926b65a18dc7c7d2c4e4a9fd265e9 


Could someone take a look at this, or do I need to create a new Jira for this?

WBR,
Alexey

Re: force the coder for a pardo

2018-02-20 Thread Jean-Baptiste Onofré
Agree. It makes sense to me in order to be consistent between the PTransforms.

Regards
JB

On 20 Feb 2018 at 16:08, Eugene Kirpichov wrote:
>Something similar was discussed a while ago, and led to the suggestion
>in the
>PTransform Style Guide:
>https://beam.apache.org/contribute/ptransform-style-guide/#setting-coders-on-output-collections
>
>This suggestion is currently not followed by ParDo, but your plan moves
>in
>that direction, so +1 to that.
>Remembering that ParDo's may have multiple outputs, though, I'd suggest
>to
>organize it using builder methods:
>ParDo.of(new MyFn())
>  .withOutputTags(...)
>  .withCoder(...coder for main output...)
>  .withCoder(tag1, coder1)
>  .withCoder(tag2, coder2)
>
>This would bring ParDo to be similar to all other transforms that allow
>specifying a coder.
>
>On Tue, Feb 20, 2018 at 3:41 AM Jean-Baptiste Onofré 
>wrote:
>
>> Got the point.
>>
>> No problem for me. Could be a new core PTransform in core or
>extension.
>>
>> Regards
>> JB
>> On 20 Feb 2018 at 11:17, Romain Manni-Bucau wrote:
>>>
>>> Yep, the idea is to encapsulate the transform. Today if you develop a DoFn
>>> you are forced to write a PTransform just to force the coder, which is a
>>> bit of overkill in general. Being able to do it on the ParDo would increase
>>> composability on the user side and reduce the boilerplate needed for that.
>>>
>>> public class MyFn extends DoFn<I, O> {
>>>
>>>   // ...impl
>>>
>>>   public static PTransform<PCollection<I>, PCollection<O>> of(...) {
>>>     return ParDo.of(MyCoder.of(), new MyFn());
>>>   }
>>> }
>>>
>>> Instead of having to also implement, in every DoFn library, something like:
>>>
>>> @NoArgsConstructor(access = PROTECTED)
>>> @AllArgsConstructor
>>> class ParDoTransformCoderProvider<I, O>
>>>     extends PTransform<PCollection<I>, PCollection<O>> {
>>>
>>>   private Coder<O> coder;
>>>
>>>   private DoFn<I, O> fn;
>>>
>>>   @Override
>>>   public PCollection<O> expand(final PCollection<I> input) {
>>>     return input.apply(ParDo.of(fn)).setCoder(coder);
>>>   }
>>> }
>>>
>>>
>>>
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>>>
>
>>>
>>> 2018-02-20 11:03 GMT+01:00 Jean-Baptiste Onofré :
>>>
 Not on the PCollection ? Only ParDo ?
 On 20 Feb 2018 at 10:50, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
> Hi guys,
>
> Any objection to allowing a coder to be passed along with the ParDo? The
> idea is to avoid having to write your own transform just to configure the
> coder when you start from a DoFn, and instead do something like
>
> ParDo.of(new MyFn(), new MyCoder()), which plugs directly and properly
> into a pipeline.
>
> wdyt?
>
> Romain Manni-Bucau
> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>
>
>

>>>


Re: force the coder for a pardo

2018-02-20 Thread Eugene Kirpichov
Something similar was discussed a while ago, and led to the suggestion in the
PTransform Style Guide:
https://beam.apache.org/contribute/ptransform-style-guide/#setting-coders-on-output-collections

This suggestion is currently not followed by ParDo, but your plan moves in
that direction, so +1 to that.
Remembering that ParDos may have multiple outputs, though, I'd suggest
organizing this using builder methods:
ParDo.of(new MyFn())
  .withOutputTags(...)
  .withCoder(...coder for main output...)
  .withCoder(tag1, coder1)
  .withCoder(tag2, coder2)

This would bring ParDo to be similar to all other transforms that allow
specifying a coder.
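
For reference, a minimal sketch of what this looks like today, setting the coder
on the output collection(s) after applying the ParDo (MyFn, Foo/Bar, FooCoder,
BarCoder and the tags are placeholder names, and the usual org.apache.beam.sdk
imports are assumed):

// Single output: set the coder on the resulting PCollection.
PCollection<Foo> main = input.apply(ParDo.of(new MyFn()));
main.setCoder(FooCoder.of());

// Multiple outputs: set a coder per tagged output of the PCollectionTuple.
TupleTag<Foo> mainTag = new TupleTag<Foo>() {};
TupleTag<Bar> sideTag = new TupleTag<Bar>() {};
PCollectionTuple outputs =
    input.apply(ParDo.of(new MyFn())
        .withOutputTags(mainTag, TupleTagList.of(sideTag)));
outputs.get(mainTag).setCoder(FooCoder.of());
outputs.get(sideTag).setCoder(BarCoder.of());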

On Tue, Feb 20, 2018 at 3:41 AM Jean-Baptiste Onofré 
wrote:

> Got the point.
>
> No problem for me. Could be a new PTransform in core or in an extension.
>
> Regards
> JB
> On 20 Feb 2018 at 11:17, Romain Manni-Bucau wrote:
>>
>> Yep, the idea is to encapsulate the transform. Today if you develop a DoFn you
>> are forced to write a PTransform just to force the coder, which is a bit of
>> overkill in general. Being able to do it on the ParDo would increase the
>> composability on the user side and reduce the boilerplate needed for that.
>>
>> public class MyFn extends DoFn<I, O> {
>>
>>   // ...impl
>>
>>   public static PTransform<PCollection<I>, PCollection<O>> of(...) {
>>     return ParDo.of(MyCoder.of(), new MyFn());
>>   }
>> }
>>
>> Instead of having to also implement, in every DoFn library, something like:
>>
>> @NoArgsConstructor(access = PROTECTED)
>> @AllArgsConstructor
>> class ParDoTransformCoderProvider<I, O>
>>     extends PTransform<PCollection<I>, PCollection<O>> {
>>
>>   private Coder<O> coder;
>>
>>   private DoFn<I, O> fn;
>>
>>   @Override
>>   public PCollection<O> expand(final PCollection<I> input) {
>>     return input.apply(ParDo.of(fn)).setCoder(coder);
>>   }
>> }
>>
>>
>>
>>
>> Romain Manni-Bucau
>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>> 
>>
>> 2018-02-20 11:03 GMT+01:00 Jean-Baptiste Onofré :
>>
>>> Not on the PCollection ? Only ParDo ?
>>> On 20 Feb 2018 at 10:50, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

 Hi guys,

 any objection to allow to pass with the pardo a coder? Idea is to avoid
 to have to write your own transform to be able to configure the coder when
 you start from a dofn and just do something like

 ParDo.of(new MyFn(), new MyCoder()) which is directly integrable into a
 pipeline properly.

 wdyt?

 Romain Manni-Bucau
 @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
 

>>>
>>


Re: Code reviews in Beam

2018-02-20 Thread Jean-Baptiste Onofré
+1. It's a fair idea to have dedicated guides.

Regards
JB

On 20 Feb 2018 at 14:43, Alexey Romanenko wrote:
>Reuven, thank you for bringing this topic.
>
>As a new contributor to Beam codebase I raise two my hands for such
>guideline document and I'd propose to add it as a new guide into
>section “Other Guides” on web site documentation.
>
>For sure, there are already several very helpful and detailed guides,
>like “PTransform style guide” and “Runner authoring guide” that help a
>lot. However, IMO, it would make sense, perhaps, to have a new guide
>which is dedicated only to Code Review process and will be helpful as
>for new contributors so for reviewers too. Probably, it might look like
>a top list of common mistakes because of them some PRs were rejected
>and places where it is required to pay attention but, of course, format
>is open and need to be discussed.
>
>I believe that it should reduce the number of common mistakes for
>newcomers like me and keep common the guide lines for all participants
>of review process.
>
>WBR,
>Alexey
>
>> On 20 Feb 2018, at 14:01, Aljoscha Krettek 
>wrote:
>>
>> This is excellent!
>>
>> I can't really add anything right now but I think having a PR
>dashboard is one of the most important points because it also
>indirectly solves "Review Latency" and "Code Review Response SLA" by
>making things more visible.
>>
>> --
>> Aljoscha
>>
>>> On 19. Feb 2018, at 19:32, Reuven Lax > wrote:
>>>
>>> There have been a number of threads on code reviews (most recently
>on a "State of the project" email). These threads have died out without
>much resolution, but I'm not sure that the concerns have gone away. 
>>>
>>> First of all, I'm of the opinion that a code-review bar for Beam
>commits is critical to success of the project. This is a system with
>many subtle semantics, which might not be obvious at first glance. Beam
>pipelines process user data, and the consequence of certain bugs might
>mean corrupting user data and aggregations - something to avoid at all
>cost if we want Beam to be trusted. Finally Beam pipelines often run at
>extremely high scale; while many of our committers have a strong
>intuition for what can go wrong when running at high scale, not
>everybody who wants to contribute will  have this experience.
>>>
>>> However, we also cannot afford to let our policy get in the way of
>building a community. We must remain a friendly place to develop and
>contribute.
>>>
>>> When I look at concerns people have had on code reviews (and I've
>been browsing most PRs this past year), I see a few common threads:
>>>
>>> Review Latency
>>> Latency on code reviews can be too high. At various times folks
>(most recently, Ahmet and I) have tried to regularly look for stale PRs
>and ping them, but latency still remains high.
>>>
>>> Pedantic
>>> Overly-pedantic comments (change variable names, etc.) can be
>frustrating. The PR author can feel like they are being forced to make
>meaningless changes just so the reviewer will allow merging. Note that
>this is sometimes in the eye of the beholder - the reviewer may not
>think all these comments are pedantic.
>>>
>>> Don't Do This
>>> Sometimes a reviewer rejects an entire PR, saying that this should
>not be done. There are various reasons given: this won't scale, this
>will break backwards compatibility, this will break a specific runner,
>etc. The PR author may not always understand or agree with these
>reasons, and this can leave hurt feelings.
>>>
>>> I would like open discussion about ways of making our code-review
>policy more welcoming. I'll seed the discussion with a few ideas:
>>>
>>> Code Review Dashboard and Automation
>>> We should invest in adding a code-review dashboard to our site,
>tracking stale PRs by reviewer. Quick turnaround on code reviews is
>essential to building community, so all Beam committers should consider
>reviewing code as important as their own coding.  Spark has built a PR
>dashboard (https://spark-prs.appspot.com/
>) which they’ve found better than
>Github’s dashboard; we could easily fork this dashboard. There are also
>tools that will automatically ping reviewers (mention-bot and hey there
>are two such tools). We can also make sure that new PRs are auto
>assigned a reviewer (e.g. https://github.com/imsky/pull-review
>)
>>>
>>> Code Review Response SLA
>>> It would be great if we could agree on a response-time SLA for Beam
>code reviews. The response might be “I am unable to do the review until
>next week,” however even that is better than getting no response.
>>>
>>> Guideline Document
>>> I think we should have a guideline document, explaining common
>reasons a reviewer might reject an approach in a  PR. e.g. "This will
>cause scaling problems," "This will cause problems for XXX runner,"
>"This is backwards incompatible."  

Re: Code reviews in Beam

2018-02-20 Thread Jean-Baptiste Onofré
Hi Reuven

I agree with all your points except maybe in terms of the bar level, especially on 
new features (like extensions or IOs). While PRs on the core should be heavily 
reviewed, I'm more in favor of merging these PRs pretty fast even if not perfect. 
It's not a technical topic, it's really a contribution and community topic.

Thanks anyway, the dashboard is a good idea !

Regards
JB

On 19 Feb 2018 at 19:33, Reuven Lax wrote:
>There have been a number of threads on code reviews (most recently on a
>"State of the project" email). These threads have died out without much
>resolution, but I'm not sure that the concerns have gone away.
>
>First of all, I'm of the opinion that a code-review bar for Beam
>commits is
>critical to success of the project. This is a system with many subtle
>semantics, which might not be obvious at first glance. Beam pipelines
>process user data, and the consequence of certain bugs might mean
>corrupting user data and aggregations - something to avoid at all cost
>if
>we want Beam to be trusted. Finally Beam pipelines often run at
>extremely
>high scale; while many of our committers have a strong intuition for
>what
>can go wrong when running at high scale, not everybody who wants to
>contribute will  have this experience.
>
>
>However, we also cannot afford to let our policy get in the way of
>building
>a community. We *must* remain a friendly place to develop and
>contribute.
>
>
>When I look at concerns people have had on code reviews (and I've
>been
>browsing most PRs this past year), I see a few common threads:
>
>
>*Review Latency*
>
>Latency on code reviews can be too high. At various times folks (most
>recently, Ahmet and I) have tried to regularly look for stale PRs and
>ping
>them, but latency still remains high.
>
>
>*Pedantic*
>
>Overly-pedantic comments (change variable names, etc.) can be
>frustrating.
>The PR author can feel like they are being forced to make meaningless
>changes just so the reviewer will allow merging. Note that this is
>sometimes in the eye of the beholder - the reviewer may not think all
>these
>comments are pedantic.
>
>
>*Don't Do This*
>
>Sometimes a reviewer rejects an entire PR, saying that this should not
>be
>done. There are various reasons given: this won't scale, this will
>break
>backwards compatibility, this will break a specific runner, etc. The PR
>author may not always understand or agree with these reasons, and this
>can
>leave hurt feelings.
>
>
>I would like open discussion about ways of making our code-review
>policy
>more welcoming. I'll seed the discussion with a few ideas:
>
>
>*Code Review Dashboard and Automation*
>
>We should invest in adding a code-review dashboard to our site,
>tracking
>stale PRs by reviewer. Quick turnaround on code reviews is essential to
>building community, so all Beam committers should consider reviewing
>code
>as important as their own coding.  Spark has built a PR dashboard (
>https://spark-prs.appspot.com/) which they’ve found better than
>Github’s
>dashboard; we could easily fork this dashboard. There are also tools
>that
>will automatically ping reviewers (mention-bot and hey there are two
>such
>tools). We can also make sure that new PRs are auto assigned a reviewer
>(e.g. https://github.com/imsky/pull-review)
>
>
>*Code Review Response SLA*
>
>It would be great if we could agree on a response-time SLA for Beam
>code
>reviews. The response might be “I am unable to do the review until next
>week,” however even that is better than getting no response.
>
>
>*Guideline Document*
>
>I think we should have a guideline document, explaining common reasons
>a
>reviewer might reject an approach in a  PR. e.g. "This will cause
>scaling
>problems," "This will cause problems for XXX runner," "This is
>backwards
>incompatible."  Reviewers can point to this doc as part of their
>comments,
>along with extra flavor. e.g. “as per the guideline doc, this will
>cause
>scaling problems, and here’s why.”
>
>
>*Guidelines on Comments*
>
>Not everyone agrees on which comments are pedantic, which makes it hard
>to
>have specific guidelines here. One general guideline we might adopt: if
>it'll take less time for the reviewer to make the changes themselves,
>it's
>not an appropriate comment. The reviewer should fix those issues in a
>follow-on PR. This adds a bit more burden on reviewers, but IMO is
>worth it
>to keep Beam a friendly environment. This is especially important for
>first-time contributors, who might otherwise lost interest. If the
>author
>is a seasoned Beam contributor, we can expect more out of them.
>
>
>We should also make sure that these fixups serve as educational moments
>for
>the new contributor. “Thanks for the contribution! I’ll be making some
>changes during the merge so that the code stays consistent across the
>codebase - keep an eye on them.”
>
>
>Would love to hear more thoughts.
>
>
>Reuven


Re: Code reviews in Beam

2018-02-20 Thread Alexey Romanenko
Reuven, thank you for bringing this topic.

As a new contributor to the Beam codebase I raise both my hands for such a guideline 
document, and I'd propose to add it as a new guide in the “Other Guides” section 
of the web site documentation. 

For sure, there are already several very helpful and detailed guides, like the 
“PTransform style guide” and the “Runner authoring guide”, that help a lot. However, 
IMO, it would make sense to have a new guide dedicated only to the code review 
process, one that would be helpful both for new contributors and for reviewers. It 
might look like a list of the most common mistakes that have led to PRs being 
rejected, and of the places where particular attention is required; of course, the 
format is open and needs to be discussed.

I believe that it should reduce the number of common mistakes for newcomers 
like me and keep the guidelines consistent for all participants in the review process.

WBR,
Alexey

> On 20 Feb 2018, at 14:01, Aljoscha Krettek  wrote:
> 
> This is excellent!
> 
> I can't really add anything right now but I think having a PR dashboard is 
> one of the most important points because it also indirectly solves "Review 
> Latency" and "Code Review Response SLA" by making things more visible.
> 
> --
> Aljoscha
> 
>> On 19. Feb 2018, at 19:32, Reuven Lax > > wrote:
>> 
>> There have been a number of threads on code reviews (most recently on a 
>> "State of the project" email). These threads have died out without much 
>> resolution, but I'm not sure that the concerns have gone away. 
>> 
>> First of all, I'm of the opinion that a code-review bar for Beam commits is 
>> critical to success of the project. This is a system with many subtle 
>> semantics, which might not be obvious at first glance. Beam pipelines 
>> process user data, and the consequence of certain bugs might mean corrupting 
>> user data and aggregations - something to avoid at all cost if we want Beam 
>> to be trusted. Finally Beam pipelines often run at extremely high scale; 
>> while many of our committers have a strong intuition for what can go wrong 
>> when running at high scale, not everybody who wants to contribute will  have 
>> this experience.
>> 
>> However, we also cannot afford to let our policy get in the way of building 
>> a community. We must remain a friendly place to develop and contribute.
>> 
>> When I look at concerns people have had on code reviews (and I've been 
>> browsing most PRs this past year), I see a few common threads:
>> 
>> Review Latency
>> Latency on code reviews can be too high. At various times folks (most 
>> recently, Ahmet and I) have tried to regularly look for stale PRs and ping 
>> them, but latency still remains high. 
>> 
>> Pedantic
>> Overly-pedantic comments (change variable names, etc.) can be frustrating. 
>> The PR author can feel like they are being forced to make meaningless 
>> changes just so the reviewer will allow merging. Note that this is sometimes 
>> in the eye of the beholder - the reviewer may not think all these comments 
>> are pedantic.
>> 
>> Don't Do This
>> Sometimes a reviewer rejects an entire PR, saying that this should not be 
>> done. There are various reasons given: this won't scale, this will break 
>> backwards compatibility, this will break a specific runner, etc. The PR 
>> author may not always understand or agree with these reasons, and this can 
>> leave hurt feelings.
>> 
>> I would like open discussion about ways of making our code-review policy 
>> more welcoming. I'll seed the discussion with a few ideas:
>> 
>> Code Review Dashboard and Automation
>> We should invest in adding a code-review dashboard to our site, tracking 
>> stale PRs by reviewer. Quick turnaround on code reviews is essential to 
>> building community, so all Beam committers should consider reviewing code as 
>> important as their own coding.  Spark has built a PR dashboard 
>> (https://spark-prs.appspot.com/ ) which 
>> they’ve found better than Github’s dashboard; we could easily fork this 
>> dashboard. There are also tools that will automatically ping reviewers 
>> (mention-bot and hey there are two such tools). We can also make sure that 
>> new PRs are auto assigned a reviewer (e.g. 
>> https://github.com/imsky/pull-review )
>> 
>> Code Review Response SLA
>> It would be great if we could agree on a response-time SLA for Beam code 
>> reviews. The response might be “I am unable to do the review until next 
>> week,” however even that is better than getting no response.
>> 
>> Guideline Document
>> I think we should have a guideline document, explaining common reasons a 
>> reviewer might reject an approach in a  PR. e.g. "This will cause scaling 
>> problems," "This will cause problems for XXX runner," "This is backwards 
>> incompatible."  Reviewers can point to this doc as part of their comments, 
>> along 

Re: Code reviews in Beam

2018-02-20 Thread Aljoscha Krettek
This is excellent!

I can't really add anything right now but I think having a PR dashboard is one 
of the most important points because it also indirectly solves "Review Latency" 
and "Code Review Response SLA" by making things more visible.

--
Aljoscha

> On 19. Feb 2018, at 19:32, Reuven Lax  wrote:
> 
> There have been a number of threads on code reviews (most recently on a 
> "State of the project" email). These threads have died out without much 
> resolution, but I'm not sure that the concerns have gone away. 
> 
> First of all, I'm of the opinion that a code-review bar for Beam commits is 
> critical to success of the project. This is a system with many subtle 
> semantics, which might not be obvious at first glance. Beam pipelines process 
> user data, and the consequence of certain bugs might mean corrupting user 
> data and aggregations - something to avoid at all cost if we want Beam to be 
> trusted. Finally Beam pipelines often run at extremely high scale; while many 
> of our committers have a strong intuition for what can go wrong when running 
> at high scale, not everybody who wants to contribute will  have this 
> experience.
> 
> However, we also cannot afford to let our policy get in the way of building a 
> community. We must remain a friendly place to develop and contribute.
> 
> When I look at concerns people have had on code reviews (and I've been 
> browsing most PRs this past year), I see a few common threads:
> 
> Review Latency
> Latency on code reviews can be too high. At various times folks (most 
> recently, Ahmet and I) have tried to regularly look for stale PRs and ping 
> them, but latency still remains high. 
> 
> Pedantic
> Overly-pedantic comments (change variable names, etc.) can be frustrating. 
> The PR author can feel like they are being forced to make meaningless changes 
> just so the reviewer will allow merging. Note that this is sometimes in the 
> eye of the beholder - the reviewer may not think all these comments are 
> pedantic.
> 
> Don't Do This
> Sometimes a reviewer rejects an entire PR, saying that this should not be 
> done. There are various reasons given: this won't scale, this will break 
> backwards compatibility, this will break a specific runner, etc. The PR 
> author may not always understand or agree with these reasons, and this can 
> leave hurt feelings.
> 
> I would like open discussion about ways of making our code-review policy more 
> welcoming. I'll seed the discussion with a few ideas:
> 
> Code Review Dashboard and Automation
> We should invest in adding a code-review dashboard to our site, tracking 
> stale PRs by reviewer. Quick turnaround on code reviews is essential to building 
> community, so all Beam committers should consider reviewing code as important 
> as their own coding.  Spark has built a PR dashboard 
> (https://spark-prs.appspot.com/ ) which 
> they’ve found better than Github’s dashboard; we could easily fork this 
> dashboard. There are also tools that will automatically ping reviewers 
> (mention-bot and hey there are two such tools). We can also make sure that 
> new PRs are auto assigned a reviewer (e.g. 
> https://github.com/imsky/pull-review )
> 
> Code Review Response SLA
> It would be great if we could agree on a response-time SLA for Beam code 
> reviews. The response might be “I am unable to do the review until next 
> week,” however even that is better than getting no response.
> 
> Guideline Document
> I think we should have a guideline document, explaining common reasons a 
> reviewer might reject an approach in a  PR. e.g. "This will cause scaling 
> problems," "This will cause problems for XXX runner," "This is backwards 
> incompatible."  Reviewers can point to this doc as part of their comments, 
> along with extra flavor. e.g. “as per the guideline doc, this will cause 
> scaling problems, and here’s why.”
> 
> Guidelines on Comments
> Not everyone agrees on which comments are pedantic, which makes it hard to 
> have specific guidelines here. One general guideline we might adopt: if it'll 
> take less time for the reviewer to make the changes themselves, it's not an 
> appropriate comment. The reviewer should fix those issues in a follow-on PR. 
> This adds a bit more burden on reviewers, but IMO is worth it to keep Beam a 
> friendly environment. This is especially important for first-time 
> contributors, who might otherwise lose interest. If the author is a seasoned 
> Beam contributor, we can expect more out of them.  
> 
> We should also make sure that these fixups serve as educational moments for 
> the new contributor. “Thanks for the contribution! I’ll be making some 
> changes during the merge so that the code stays consistent across the 
> codebase - keep an eye on them.” 
> 
> Would love to hear more thoughts.
> 
> Reuven
> 



Re: force the coder for a pardo

2018-02-20 Thread Jean-Baptiste Onofré
Not on the PCollection? Only ParDo?

On 20 Feb 2018 at 10:50, Romain Manni-Bucau wrote:
>Hi guys,
>
>Any objection to allowing a coder to be passed along with the ParDo? The idea
>is to avoid having to write your own transform just to configure the coder
>when you start from a DoFn, and instead do something like
>
>ParDo.of(new MyFn(), new MyCoder()), which plugs directly and properly into a
>pipeline.
>
>wdyt?
>
>Romain Manni-Bucau
>@rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>


force the coder for a pardo

2018-02-20 Thread Romain Manni-Bucau
Hi guys,

Any objection to allowing a coder to be passed along with the ParDo? The idea is
to avoid having to write your own transform just to configure the coder when you
start from a DoFn, and instead do something like

ParDo.of(new MyFn(), new MyCoder()), which plugs directly and properly into a
pipeline.
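
A small sketch of the difference, assuming a hypothetical MyFn/MyCoder pair
(the second call is the proposed overload, which does not exist today):

// today: apply the ParDo, then force the coder on the output PCollection
input.apply(ParDo.of(new MyFn())).setCoder(new MyCoder());

// proposed (hypothetical overload discussed in this thread)
input.apply(ParDo.of(new MyFn(), new MyCoder()));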

wdyt?

Romain Manni-Bucau
@rmannibucau | Blog | Old Blog | Github | LinkedIn | Book



Build failed in Jenkins: beam_Release_NightlySnapshot #691

2018-02-20 Thread Apache Jenkins Server
See 


Changes:

[lcwik] [BEAM-3690] swapping to use mockito-core, hamcrest-core and

[github] Updates javadocs of Setup and Teardown

--
[...truncated 3.09 MB...]
2018-02-20T08:44:56.485 [INFO] Excluding org.json4s:json4s-ast_2.11:jar:3.2.11 
from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding org.scala-lang:scalap:jar:2.11.0 from 
the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.scala-lang:scala-compiler:jar:2.11.0 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.scala-lang.modules:scala-xml_2.11:jar:1.0.1 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.jersey.core:jersey-client:jar:2.22.2 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding javax.ws.rs:javax.ws.rs-api:jar:2.0.1 
from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.hk2:hk2-api:jar:2.4.0-b34 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.hk2:hk2-utils:jar:2.4.0-b34 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.hk2.external:aopalliance-repackaged:jar:2.4.0-b34 from the shaded 
jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.hk2:hk2-locator:jar:2.4.0-b34 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.jersey.core:jersey-common:jar:2.22.2 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
javax.annotation:javax.annotation-api:jar:1.2 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.jersey.bundles.repackaged:jersey-guava:jar:2.22.2 from the shaded 
jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.hk2:osgi-resource-locator:jar:1.0.1 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.jersey.core:jersey-server:jar:2.22.2 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.jersey.media:jersey-media-jaxb:jar:2.22.2 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.jersey.containers:jersey-container-servlet:jar:2.22.2 from the 
shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.22.2 from 
the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding io.netty:netty-all:jar:4.0.43.Final 
from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding io.netty:netty:jar:3.9.9.Final from 
the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
io.dropwizard.metrics:metrics-jvm:jar:3.1.2 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
io.dropwizard.metrics:metrics-json:jar:3.1.2 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
io.dropwizard.metrics:metrics-graphite:jar:3.1.2 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding org.apache.ivy:ivy:jar:2.4.0 from the 
shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding oro:oro:jar:2.0.8 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding net.razorvine:pyrolite:jar:4.13 from 
the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding net.sf.py4j:py4j:jar:0.10.4 from the 
shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.apache.spark:spark-tags_2.11:jar:2.2.1 from the shaded jar.
2018-02-20T08:44:56.485 [INFO] Excluding 
org.apache.commons:commons-crypto:jar:1.0.0 from the shaded jar.
2018-02-20T08:44:56.486 [INFO] Excluding 
org.spark-project.spark:unused:jar:1.0.0 from the shaded jar.
2018-02-20T08:44:56.486 [INFO] Excluding 
org.apache.spark:spark-streaming_2.11:jar:2.2.1 from the shaded jar.
2018-02-20T08:44:58.325 [INFO] Replacing original artifact with shaded artifact.
2018-02-20T08:44:58.432 [INFO] 
2018-02-20T08:44:58.432 [INFO] --- maven-assembly-plugin:3.1.0:single 
(source-release-assembly) @ beam-sdks-java-javadoc ---
2018-02-20T08:44:58.435 [INFO] Skipping the assembly in this project because 
it's not the Execution Root
2018-02-20T08:44:58.543 [INFO] 
2018-02-20T08:44:58.543 [INFO] --- maven-source-plugin:3.0.1:jar-no-fork 
(attach-sources) @ beam-sdks-java-javadoc ---
2018-02-20T08:44:58.652 [INFO] 
2018-02-20T08:44:58.652 [INFO] --- maven-source-plugin:3.0.1:test-jar-no-fork 
(attach-test-sources) @ beam-sdks-java-javadoc ---
2018-02-20T08:44:58.759 [INFO] 
2018-02-20T08:44:58.759 [INFO] --- maven-javadoc-plugin:3.0.0-M1:jar 
(attach-javadocs) @ beam-sdks-java-javadoc ---
2018-02-20T08:44:58.763 [INFO] Not executing Javadoc as the project is not a 
Java classpath-capable package
2018-02-20T08:44:58.871 [INFO] 
2018-02-20T08:44:58.871 [INFO] --- 
reproducible-build-maven-plugin:0.4:strip-jar (default) @ 
beam-sdks-java-javadoc ---
2018-02-20T08:44:58.872 [INFO] Stripping 

2018-02-20T08:44:59.047