Re: Phrase triggering jobs problem

2019-07-10 Thread Udi Meiri
Thanks Kenn.

On Wed, Jul 10, 2019 at 3:31 PM Kenneth Knowles  wrote:

> Just noticed this thread. Infra turned off one of the GitHub plugins - the
> one we use. I forwarded the announcement. I'll see if we can get it back on
> for a bit so we can migrate off. I'm not sure if they have identical job
> DSL or not.
>
> On Wed, Jul 10, 2019 at 12:32 PM Udi Meiri  wrote:
>
>> Still happening for me too.
>>
>> On Wed, Jul 10, 2019 at 10:40 AM Lukasz Cwik  wrote:
>>
>>> This has happened in the past. Usually there is some issue where Jenkins
>>> isn't notified of new PRs by GitHub or doesn't see the PR phrases, and hence
>>> Jenkins sits around idle. This is usually fixed after a few hours without
>>> any action on our part.
>>>
>>> On Wed, Jul 10, 2019 at 10:28 AM Katarzyna Kucharczyk <
>>> ka.kucharc...@gmail.com> wrote:
>>>
 Hi all,

 Hope it's not a duplicate, but I couldn't find whether any issue with phrase
 triggering in Jenkins was already reported here.
 Currently, I opened a third PR and no tests were triggered there. I tried
 to trigger some tests manually, but with no effect.

 Am I missing something?

 Here are links to my problematic PRs:
 https://github.com/apache/beam/pull/9033
 https://github.com/apache/beam/pull/9034
 https://github.com/apache/beam/pull/9035

 Thanks,
 Kasia

>>>




Re: Phrase triggering jobs problem

2019-07-10 Thread Kenneth Knowles
Just noticed this thread. Infra turned off one of the GitHub plugins - the
one we use. I forwarded the announcement. I'll see if we can get it back on
for a bit so we can migrate off. I'm not sure if they have identical job
DSL or not.

On Wed, Jul 10, 2019 at 12:32 PM Udi Meiri  wrote:

> Still happening for me too.
>
> On Wed, Jul 10, 2019 at 10:40 AM Lukasz Cwik  wrote:
>
>> This has happened in the past. Usually there is some issue where Jenkins
>> isn't notified of new PRs by GitHub or doesn't see the PR phrases, and hence
>> Jenkins sits around idle. This is usually fixed after a few hours without
>> any action on our part.
>>
>> On Wed, Jul 10, 2019 at 10:28 AM Katarzyna Kucharczyk <
>> ka.kucharc...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Hope it's not a duplicate, but I couldn't find whether any issue with phrase
>>> triggering in Jenkins was already reported here.
>>> Currently, I opened a third PR and no tests were triggered there. I tried
>>> to trigger some tests manually, but with no effect.
>>>
>>> Am I missing something?
>>>
>>> Here are links to my problematic PRs:
>>> https://github.com/apache/beam/pull/9033
>>> https://github.com/apache/beam/pull/9034
>>> https://github.com/apache/beam/pull/9035
>>>
>>> Thanks,
>>> Kasia
>>>
>>


Re: [VOTE] Vendored Dependencies Release

2019-07-10 Thread Lukasz Cwik
No, the classes weren't expected inside of Guava. Cancelling this release
candidate.

I don't believe the protos/certs/keys matter; they were part of our prior
1.13.1 release as well [1].

I found out that we stopped validating the contents of the vendored jar as
part of the release process and opened up pr/9036[2] to fix our validation.

Kai, I will try to add the vendored bytebuddy to the next release; it would
be useful if you could help review the PRs related to vendoring.

1:
https://repo1.maven.org/maven2/org/apache/beam/beam-vendor-grpc-1_13_1/0.2/beam-vendor-grpc-1_13_1-0.2.jar
2: https://github.com/apache/beam/pull/9036
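The restored validation can be sketched as a quick jar-content check. This is an illustrative sketch, not the actual release tooling: the allowed prefixes and the jar layout below are assumptions (a vendored jar should contain only relocated classes plus jar metadata).

```python
import io
import zipfile

# Illustrative allow-list: everything in a vendored jar should live under
# the relocated prefix, plus standard jar metadata. The real relocated
# prefix used by the release tooling may differ.
ALLOWED_PREFIXES = ("org/apache/beam/vendor/", "META-INF/")

def unexpected_entries(jar_bytes):
    """Return class/resource entries that fall outside the allowed prefixes."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return [name for name in jar.namelist()
                if not name.endswith("/")            # skip directory entries
                and not name.startswith(ALLOWED_PREFIXES)]
```

A check along these lines would flag leaked entries such as com/google/errorprone or org/checkerframework classes the moment the jar is built, rather than at release review time.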

On Wed, Jul 10, 2019 at 1:34 PM Kai Jiang  wrote:

> pull/8357  proposes to vendor the
> bytebuddy artifact.
> Is it possible to release "beam-vendor-bytebuddy-1_9_3" in the next release
> candidate?
>
> Best,
> Kai
>
> On Wed, Jul 10, 2019 at 11:31 AM Kenneth Knowles  wrote:
>
>> grpc: jar contains certs, keys, protos at the top level; intended?
>>
>> guava: jar contains classes not in vendored prefix, with prefixes such as
>> com/google/j2objc, org/codehaus/mojo, com/google/errorprone,
>> org/checkerframework, javax/annotation
>>
>> On Tue, Jul 9, 2019 at 3:34 PM Lukasz Cwik  wrote:
>>
>>> Please review the release of the following artifacts that we vendor:
>>>  * beam-vendor-grpc_1_21_0
>>>  * beam-vendor-guava-26_0-jre
>>>
>>> Hi everyone,
>>> Please review and vote on the release candidate #2 for the
>>> org.apache.beam:beam-vendor-grpc_1_21_0:0.1 and
>>> org.apache.beam:beam-vendor-guava-26_0-jre:0.1, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> The complete staging area is available for your review, which includes:
>>> * the official Apache source release to be deployed to dist.apache.org
>>> 
>>> [1], which is signed with the key with fingerprint
>>> EAD5DE293F4A03DD2E77565589E68A56E371CCA2 [2],
>>> * all artifacts to be deployed to the Maven Central Repository [3],
>>> * commit hash "b4efbb23cc5dec80b8bbd8745c62efecdadfa236" [4],
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>> Luke
>>>
>>> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
>>> 
>>> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> 
>>> [3]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1076/
>>> 
>>> [4]
>>> https://github.com/apache/beam/commit/b4efbb23cc5dec80b8bbd8745c62efecdadfa236
>>> 
>>>
>>


New Design Doc for Cost Based Optimization

2019-07-10 Thread Alireza Samadian
Dear Members of Beam Community,

Previously I had shared a document discussing row count estimation for the
source tables in a query.
https://docs.google.com/document/d/1vi1PBBu5IqSy-qZl1Gk-49CcANOpbNs1UAud6LnOaiY/edit

I wrote another document that discusses the Cost Model and Join Reordering,
and I am probably going to polish and merge the previous one as a section
of this document. The following is the link to the new document.
https://docs.google.com/document/d/1DM_bcfFbIoc_vEoqQxhC7AvHBUDVCAwToC8TYGukkII/edit?usp=sharing
I would appreciate your comments and suggestions.

Best,
Alireza Samadian


Re: Python Utilities

2019-07-10 Thread Reuven Lax
On Wed, Jul 10, 2019 at 9:56 AM Rui Wang  wrote:

> The second link points to the first join utility in Beam. The idea is
> similar: people can use the utility to do joins without writing their own.
> BeamSQL also uses it.
>
> The first link points to Schema API. I actually thought Schema API also
> uses the join utility, but it turns out it doesn't (I am not sure what the
> reason is, though).
>

The Schema one is more general as well, in that it supports joining N
inputs.


>
> Basically I think it's encouraged to reuse the same join utility if
> possible.
>
> -Rui
>
> On Wed, Jul 10, 2019 at 8:01 AM Shannon Duncan 
> wrote:
>
>> So it seems that the Java SDK has two different Join libraries?
>>
>> With Schema:
>> https://github.com/apache/beam/tree/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
>> And Another one:
>> https://github.com/apache/beam/blob/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java
>>
>> So how does it handle that?
>>
>> On Mon, Jul 8, 2019 at 12:39 PM Shannon Duncan <
>> joseph.dun...@liveramp.com> wrote:
>>
>>> Yeah these are for local testing right now. I was hoping to gain insight
>>> on better naming.
>>>
>>> I was thinking of creating an "extras" module.
>>>
>>> On Mon, Jul 8, 2019, 12:28 PM Robin Qiu  wrote:
>>>
 Hi Shannon,

 Thanks for sharing the repo! I took a quick look and I have a concern
 with the naming of the transforms.

 Currently, Beam Java already has "Select" and "Join" transforms.
 However, they work on schemas, a feature that is not yet implemented in
 Beam Python. (See
 https://github.com/apache/beam/tree/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
 )

 To maintain consistency between SDKs, I think it is good to avoid
 having two different transforms with the same name but different functions.
 So maybe you can consider renaming the transforms and/or putting them in an
 extension Python module, instead of the main ones?

 Best,
 Robin

 On Mon, Jul 8, 2019 at 9:19 AM Shannon Duncan <
 joseph.dun...@liveramp.com> wrote:

> As a follow up. Here is the repo that contains the utilities for now.
> https://github.com/shadowcodex/apache-beam-utilities. Will put
> together a proper PR as code gets closer to production quality.
>
> - Shannon
>
> On Mon, Jul 8, 2019 at 9:20 AM Shannon Duncan <
> joseph.dun...@liveramp.com> wrote:
>
>> Thanks Frederik,
>>
>> That's exactly where I was looking. I did get permission to open
>> source the utilities module. So I'm going to throw them up on my personal
>> GitHub soon and share with the email group for a look over.
>>
>> I'm going to work on the utilities there because it's a quick dev
>> environment, and once they are ready for a proper PR I'll begin working
>> them into the actual SDK.
>>
>> I also joined the Slack #beam and #beam-python channels; I was unsure
>> of where most collaborators discussed items.
>>
>> - Shannon
>>
>> On Mon, Jul 8, 2019 at 9:09 AM Frederik Bode 
>> wrote:
>>
>>> Hi Shannon,
>>>
>>> This is probably a good starting point:
>>> https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68
>>> .
>>>
>>> Frederik
>>>
>>>
>>>
>>> * Frederik Bode*
>>>
>>> ML6 Ghent
>>> 
>>> +32 4 92 78 96 18
>>>
>>>

Fwd: [NOTICE] Jenkins GHPRB deprecated, please switch :)

2019-07-10 Thread Kenneth Knowles
This applies to our jobs.

Kenn

-- Forwarded message -
From: Daniel Gruno 
Date: Wed, Jul 10, 2019 at 1:33 PM
Subject: [NOTICE] Jenkins GHPRB deprecated, please switch :)
To: 


Hi folks,
as part of some cleanup and consolidation (essentially we don't want to
maintain two different plugins that do the same thing), we have removed
support for the old GitHub PR Builder on Jenkins, and are focusing on the
modern variant. If your build previously made use of the old one (it's
called "GitHub Pull Request Builder" in the job configuration), we ask that
you please switch to the newer one (called "Build pull requests to the
repository" in the same config). There should be no other changes, but if
your builds start acting up, do let infra know :).

As an added bonus, you no longer need to contact infra about webhooks when
setting up PR builds for new repositories, it should just work(tm).

With regards,
Daniel on behalf of ASF Infra.


Re: [VOTE] Vendored Dependencies Release

2019-07-10 Thread Kai Jiang
pull/8357  proposes to vendor the
bytebuddy artifact.
Is it possible to release "beam-vendor-bytebuddy-1_9_3" in the next release
candidate?

Best,
Kai

On Wed, Jul 10, 2019 at 11:31 AM Kenneth Knowles  wrote:

> grpc: jar contains certs, keys, protos at the top level; intended?
>
> guava: jar contains classes not in vendored prefix, with prefixes such as
> com/google/j2objc, org/codehaus/mojo, com/google/errorprone,
> org/checkerframework, javax/annotation
>
> On Tue, Jul 9, 2019 at 3:34 PM Lukasz Cwik  wrote:
>
>> Please review the release of the following artifacts that we vendor:
>>  * beam-vendor-grpc_1_21_0
>>  * beam-vendor-guava-26_0-jre
>>
>> Hi everyone,
>> Please review and vote on the release candidate #2 for the
>> org.apache.beam:beam-vendor-grpc_1_21_0:0.1 and
>> org.apache.beam:beam-vendor-guava-26_0-jre:0.1, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> The complete staging area is available for your review, which includes:
>> * the official Apache source release to be deployed to dist.apache.org
>> 
>> [1], which is signed with the key with fingerprint
>> EAD5DE293F4A03DD2E77565589E68A56E371CCA2 [2],
>> * all artifacts to be deployed to the Maven Central Repository [3],
>> * commit hash "b4efbb23cc5dec80b8bbd8745c62efecdadfa236" [4],
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Luke
>>
>> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
>> 
>> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
>> 
>> [3]
>> https://repository.apache.org/content/repositories/orgapachebeam-1076/
>> 
>> [4]
>> https://github.com/apache/beam/commit/b4efbb23cc5dec80b8bbd8745c62efecdadfa236
>> 
>>
>


Re: Hazelcast Jet Runner

2019-07-10 Thread Ismaël Mejía
Yes please!

On Wed, Jul 10, 2019 at 8:38 PM Kenneth Knowles  wrote:
>
> Just to make sure we have closed on the Jet runner, my understanding is: I 
> was the main person asking for "runners-jet-experimental" but I am convinced 
> to go with plain "runners-jet". It seems everyone else is already fine with 
> this, so go ahead?
>
> On Tue, Jul 9, 2019 at 1:23 PM Maximilian Michels  wrote:
>>
>> We should fork the discussion around removing instances of @Experimental, 
>> but it was good to mention it here.
>>
>> As for the Jet runner, I can only second Ismael: The Jet runner is the first 
>> runner I can think of that came with ValidatesRunner and Nexmark out of the 
>> box. Of course that doesn't mean the runner is "battle-tested", but we do
>> not have other means to test its maturity.
>>
>> For the future, we could come up with other criteria, e.g. a "probation 
>> period", but enforcing this now seems arbitrary.
>>
>> If the authors of the Runners decide that it is experimental, so be it. 
>> Otherwise I would leave it to the user to decide (it might be helpful to 
>> list the inception date of each runner). That said, I value your concern 
>> Kenn. I can see that we establish a consistent onboarding of new runners 
>> which may involve marking them experimental for a while.
>>
>> -Max
>>
>> On 01.07.19 22:20, Kenneth Knowles wrote:
>> >
>> >
>> > On Wed, Jun 12, 2019 at 2:32 AM Ismaël Mejía > > > wrote:
>> >
>> > Seems the discussion moved a bit off my original intent, which was to
>> > make the Jet runner directory to be just called runners/jet in the
>> > directory and mark the 'experimental' part of it in documentation as
>> > we do for all other things in Beam.
>> >
>> >
>> > Thanks for returning to the one question at hand. We don't have to make
>> > an overall decision about all "experimental" things.
>> >
>> >
>> > Can we do this or is there still any considerable argument to not do 
>> > it?
>> >
>> >
>> > I think we actually have some competing goals:
>> >
>> > I agree 100% on the arguments, but let’s think in the reverse terms,
>> > highlighting lack of maturity can play against the intended goal of
>> > use and adoption even if for a noble reason. It is basic priming 101
>> > [1].
>> >
>> >
>> > _My_ goal is exactly to highlight lack of maturity so that users are not
>> > harmed by either (1) necessary breaking changes or (2) permanent low
>> > quality. Only users who are willing to follow along with the project and
>> > update their own code regularly should use experimental features.
>> >
>> > Evaluating the Jet runner I am convinced by your arguments, because
>> > looking at the two dangers:
>> > (1) necessary breaking changes -- runners don't really have their own
>> > APIs to break, except their own small set of APIs and pipeline options
>> > (2) permanent low quality -- because there is no API design possible,
>> > there's no risk of permanent low quality except by fundamental
>> > mismatches. Plus as you mention the testing is already quite good.
>> >
>> > So I am OK to not call it experimental. But I have a slight remaining
>> > concern that it did not really go through what other runners went
>> > through. I hope this just means it is more mature. I hope it does not
>> > indicate that we are reducing rigor.
>> >
>> > Kenn
>> >
>> >
>> > On Wed, May 29, 2019 at 3:02 PM Reza Rokni > > > wrote:
>> > >
>> > > Hi,
>> > >
>> > > Over 800 usages under java, might be worth doing a few PR...
>> > >
>> > > Also suggest we use a very light review process: First round go
>> > for low hanging fruit, if anyone does a -1 against a change then we
>> > leave that for round two.
>> > >
>> > > Thoughts?
>> > >
>> > > Cheers
>> > >
>> > > Reza
>> > >
>> > > On Wed, 29 May 2019 at 12:05, Kenneth Knowles > > > wrote:
>> > >>
>> > >>
>> > >>
>> > >> On Mon, May 27, 2019 at 4:05 PM Reza Rokni > > > wrote:
>> > >>>
>> > >>> "Many APIs that have been in place for years and are used by
>> > most Beam users are still marked Experimental."
>> > >>>
>> > >>> Should there be a formal process in place to start 'graduating'
>> > features out of @Experimental? Perhaps even target an upcoming
>> > release with a PR to remove the annotation from well-established APIs?
>> > >>
>> > >>
>> > >> Good idea. I think a PR like this would be an opportunity to
>> > discuss whether the feature is non-experimental. Probably many of
>> > them are ready. It would help to address Ismael's very good point
>> > that this new practice could make users think the old Experimental
>> > stuff is not experimental. Maybe it is true that it is not really
>> > still Experimental.
>> > >>
>> > >> Kenn
>> > >>
>> > >>
>> > >>>
>> > >>>

Re: Phrase triggering jobs problem

2019-07-10 Thread Udi Meiri
Still happening for me too.

On Wed, Jul 10, 2019 at 10:40 AM Lukasz Cwik  wrote:

> This has happened in the past. Usually there is some issue where Jenkins
> isn't notified of new PRs by GitHub or doesn't see the PR phrases, and hence
> Jenkins sits around idle. This is usually fixed after a few hours without
> any action on our part.
>
> On Wed, Jul 10, 2019 at 10:28 AM Katarzyna Kucharczyk <
> ka.kucharc...@gmail.com> wrote:
>
>> Hi all,
>>
>> Hope it's not a duplicate, but I couldn't find whether any issue with phrase
>> triggering in Jenkins was already reported here.
>> Currently, I opened a third PR and no tests were triggered there. I tried
>> to trigger some tests manually, but with no effect.
>>
>> Am I missing something?
>>
>> Here are links to my problematic PRs:
>> https://github.com/apache/beam/pull/9033
>> https://github.com/apache/beam/pull/9034
>> https://github.com/apache/beam/pull/9035
>>
>> Thanks,
>> Kasia
>>
>




Re: Hazelcast Jet Runner

2019-07-10 Thread Kenneth Knowles
Just to make sure we have closed on the Jet runner, my understanding is: I
was the main person asking for "runners-jet-experimental" but I am
convinced to go with plain "runners-jet". It seems everyone else is already
fine with this, so go ahead?

On Tue, Jul 9, 2019 at 1:23 PM Maximilian Michels  wrote:

> We should fork the discussion around removing instances of @Experimental,
> but it was good to mention it here.
>
> As for the Jet runner, I can only second Ismael: The Jet runner is the
> first runner I can think of that came with ValidatesRunner and Nexmark out
> of the box. Of course that doesn't mean the runner is "battle-tested", but
> we do not have other means to test its maturity.
>
> For the future, we could come up with other criteria, e.g. a "probation
> period", but enforcing this now seems arbitrary.
>
> If the authors of the Runners decide that it is experimental, so be it.
> Otherwise I would leave it to the user to decide (it might be helpful to
> list the inception date of each runner). That said, I value your concern
> Kenn. I can see that we establish a consistent onboarding of new runners
> which may involve marking them experimental for a while.
>
> -Max
>
> On 01.07.19 22:20, Kenneth Knowles wrote:
> >
> >
> > On Wed, Jun 12, 2019 at 2:32 AM Ismaël Mejía  > > wrote:
> >
> > Seems the discussion moved a bit off my original intent, which was to
> > make the Jet runner directory to be just called runners/jet in the
> > directory and mark the 'experimental' part of it in documentation as
> > we do for all other things in Beam.
> >
> >
> > Thanks for returning to the one question at hand. We don't have to make
> > an overall decision about all "experimental" things.
> >
> >
> > Can we do this or is there still any considerable argument to not do
> it?
> >
> >
> > I think we actually have some competing goals:
> >
> > I agree 100% on the arguments, but let’s think in the reverse terms,
> > highlighting lack of maturity can play against the intended goal of
> > use and adoption even if for a noble reason. It is basic priming 101
> > [1].
> >
> >
> > _My_ goal is exactly to highlight lack of maturity so that users are not
> > harmed by either (1) necessary breaking changes or (2) permanent low
> > quality. Only users who are willing to follow along with the project and
> > update their own code regularly should use experimental features.
> >
> > Evaluating the Jet runner I am convinced by your arguments, because
> > looking at the two dangers:
> > (1) necessary breaking changes -- runners don't really have their own
> > APIs to break, except their own small set of APIs and pipeline options
> > (2) permanent low quality -- because there is no API design possible,
> > there's no risk of permanent low quality except by fundamental
> > mismatches. Plus as you mention the testing is already quite good.
> >
> > So I am OK to not call it experimental. But I have a slight remaining
> > concern that it did not really go through what other runners went
> > through. I hope this just means it is more mature. I hope it does not
> > indicate that we are reducing rigor.
> >
> > Kenn
> >
> >
> > On Wed, May 29, 2019 at 3:02 PM Reza Rokni  > > wrote:
> > >
> > > Hi,
> > >
> > > Over 800 usages under java, might be worth doing a few PR...
> > >
> > > Also suggest we use a very light review process: First round go
> > for low hanging fruit, if anyone does a -1 against a change then we
> > leave that for round two.
> > >
> > > Thoughts?
> > >
> > > Cheers
> > >
> > > Reza
> > >
> > > On Wed, 29 May 2019 at 12:05, Kenneth Knowles  > > wrote:
> > >>
> > >>
> > >>
> > >> On Mon, May 27, 2019 at 4:05 PM Reza Rokni  > > wrote:
> > >>>
> > >>> "Many APIs that have been in place for years and are used by
> > most Beam users are still marked Experimental."
> > >>>
> > >>> Should there be a formal process in place to start 'graduating'
> > features out of @Experimental? Perhaps even target an upcoming
> > release with a PR to remove the annotation from well-established
> APIs?
> > >>
> > >>
> > >> Good idea. I think a PR like this would be an opportunity to
> > discuss whether the feature is non-experimental. Probably many of
> > them are ready. It would help to address Ismael's very good point
> > that this new practice could make users think the old Experimental
> > stuff is not experimental. Maybe it is true that it is not really
> > still Experimental.
> > >>
> > >> Kenn
> > >>
> > >>
> > >>>
> > >>> On Tue, 28 May 2019 at 06:44, Reuven Lax  > > wrote:
> > 
> >  We generally use Experimental for two different things, which
> > leads to confusion.
> > >>>

Re: [VOTE] Vendored Dependencies Release

2019-07-10 Thread Kenneth Knowles
grpc: jar contains certs, keys, protos at the top level; intended?

guava: jar contains classes not in vendored prefix, with prefixes such as
com/google/j2objc, org/codehaus/mojo, com/google/errorprone,
org/checkerframework, javax/annotation

On Tue, Jul 9, 2019 at 3:34 PM Lukasz Cwik  wrote:

> Please review the release of the following artifacts that we vendor:
>  * beam-vendor-grpc_1_21_0
>  * beam-vendor-guava-26_0-jre
>
> Hi everyone,
> Please review and vote on the release candidate #2 for the
> org.apache.beam:beam-vendor-grpc_1_21_0:0.1 and
> org.apache.beam:beam-vendor-guava-26_0-jre:0.1, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * the official Apache source release to be deployed to dist.apache.org
> [1], which is signed with the key with fingerprint
> EAD5DE293F4A03DD2E77565589E68A56E371CCA2 [2],
> * all artifacts to be deployed to the Maven Central Repository [3],
> * commit hash "b4efbb23cc5dec80b8bbd8745c62efecdadfa236" [4],
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Luke
>
> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
> [3] https://repository.apache.org/content/repositories/orgapachebeam-1076/
> [4]
> https://github.com/apache/beam/commit/b4efbb23cc5dec80b8bbd8745c62efecdadfa236
>


Re: Beam/Samza Ensuring At Least Once semantics

2019-07-10 Thread Lukasz Cwik
When you restart the application, are you resuming it from Samza's last
commit?

Since the exception is thrown after the GBK, all the data could be read
from Kafka and forwarded to the GBK operator inside of Samza and
checkpointed in Kafka before the exception is ever thrown.
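The failure mode described here, where offsets can be checkpointed once records reach the GBK, before downstream processing succeeds, can be illustrated with a toy single-partition consumer loop. This is purely illustrative Python, not Samza or Beam code; the commit-after-read flag stands in for a checkpoint that advances ahead of processing:

```python
def simulate(records, fail_on, commit_after_read=True):
    """Toy consumer loop showing why committing offsets at read time breaks
    at-least-once: a crash after the commit skips the failing record on
    restart, because a restart resumes from `committed`."""
    committed = 0      # last committed offset (restart point)
    processed = []     # records fully processed downstream
    offset = 0
    while offset < len(records):
        value = records[offset]
        if commit_after_read:
            committed = offset + 1      # checkpoint BEFORE processing
        try:
            if value == fail_on:
                raise RuntimeError("processing failed")
            processed.append(value)
        except RuntimeError:
            return committed, processed # crash: restart resumes at `committed`
        if not commit_after_read:
            committed = offset + 1      # checkpoint only AFTER success
        offset += 1
    return committed, processed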

On Tue, Jul 9, 2019 at 8:34 PM Benenson, Mikhail <
mikhail_benen...@intuit.com> wrote:

> Hi
>
>
>
> I have run a few experiments to verify if 'at least once' processing is
> guaranteed on Beam 2.13.0 with Samza Runner 1.1.0
>
>
>
> Beam application is a slightly modified Stream Word Count from Beam
> examples:
>
>- read strings from input Kafka topic, print (topic, partition,
>offset, value)
>- convert values to pairs (value, 1)
>- grouping in Fixed Windows with duration 30 sec
>- sum per key
>- throw exception, if key starts with 'm'
>- write (key, sum) to output Kafka topic
>
>
>
> Tried KafkaIO.read() with and without commitOffsetsInFinalize(); there is
> no difference in results.
>
>
>
> Please, see src code attached.
>
>
>
> Environment:
>
>- Run with local zk & kafka, pre-create input & output topics with 1
>partition.
>- samza.properties contains "task.commit.ms=2000". According to samza
>doc "this property determines how often a checkpoint is written. The value
>is the time between checkpoints, in milliseconds". See
>
> http://samza.apache.org/learn/documentation/latest/jobs/samza-configurations.html#checkpointing.
>Please, see samza config file and run script attached.
>
>
>
>
>
> *Scenario 1: Exception in transformation*
>
>
>
> Run
>
>- Write 'a', 'b', 'c', 'm', 'd', 'e' into input topic
>- start Beam app
>- verify that the app log contains "read from topic=XXX, part=0,
>offset=100, val: e". Because the input topic has only one partition, this
>means all data has been read from Kafka.
>- wait until the app terminates, because of the exception while
>processing 'm'
>
>
>
> Expectation
>
> The order of processing after grouping is not specified, so some data
> could be written to the output topic before the application terminates, but
> I expect that value=m at offset 98 and all later records must NOT be marked
> as processed, so if I restart the Beam app, I expect it to throw the
> exception again when processing value=m.
>
> Comment: throwing an exception in a transformation is not a good idea, but
> such an exception could be the result of an application error. So the
> expectation is that after fixing the error and restarting the Beam app, it
> should process the record that caused the error.
>
>
>
> Results
>
> After I restarted the app, it does NOT re-process value 'm' and does not
> throw an exception. If I add a new value 'f' into the input topic, I see "read
> from topic=XXX, part=0, offset=101, val: f", and after some time I see 'm'
> in the output topic. So, the record with value 'm' is NOT processed.
>
>
>
>
>
> *Scenario 2: App termination*
>
>
>
> Run
>
>- Write 'g', 'h', 'i', 'j' into input topic
>- start Beam app
>- verify that the app log contains "read from topic=XXX, part=0,
>offset=105, val: j". Because the input topic has only one partition, this
>means that all data has been read from Kafka.
>- wait about 10 sec, then terminate the Beam app. The idea is to terminate
>the app while 'g', 'h', 'i', 'j' are waiting in the 30 sec Fixed Windows, but
>after task.commit.ms=2000 has passed, so offsets are committed.
>
>
>
> Expectation
>
> As records 'g', 'h', 'i', 'j' are NOT processed, I expect that after the app
> is restarted, it again reads 'g', 'h', 'i', 'j' from the input topic and
> processes these records.
>
>
>
> Results
>
> After I restarted the app, it does NOT re-process the 'g', 'h', 'i', 'j'
> values. If I add a new value 'k' into the input topic, I see "read from
> topic=XXX, part=0, offset=106, val: k", and after some time I see 'k' in the
> output topic. So, the records with values 'g', 'h', 'i', 'j' are NOT processed.
>
>
>
>
>
> Based on these results I'm inclined to conclude that Beam with the Samza
> runner does NOT provide an 'at least once' processing guarantee.
>
>
>
> Have I missed something?
>
>
>
> --
>
> Michael Benenson
>
>
>
>
>
> *From: *"LeVeck, Matt" 
> *Date: *Monday, July 1, 2019 at 5:28 PM
> *To: *"Deshpande, Omkar" , "Benenson,
> Mikhail" , Xinyu Liu ,
> Xinyu Liu , Samarth Shetty ,
> "Audo, Nicholas" 
> *Subject: *Beam/Samza Ensuring At Least Once semantics
>
>
>
> We’re seeing some behavior when using Beam’s KafkaIO and Samza as the
> runner that suggests checkpoints are getting committed even when an error
> gets thrown in the Beam pipeline while processing a batch.  Do you all
> have a recommended set of settings/patterns for using Beam with Samza to
> ensure that checkpoints are only updated after successful processing (i.e.
> the transforms succeed and the message is sent to the Beam pipeline’s final
> output sink)?
>
>
>
> Our current settings for Samza are:
>
> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheck

[Discuss] Retractions in Beam

2019-07-10 Thread Rui Wang
Hi Community,

Retractions are part of the core Beam model [1]. I have come up with a doc to
discuss retraction use cases, the model, and the API (see the link below).
This is a very early discussion on retractions, but I do hope we can reach a
consensus and eventually get retractions implemented in a useful way.


doc link:
https://docs.google.com/document/d/14WRfxwk_iLUHGPty3C6ZenddPsp_d6jhmx0vuafXqmE/edit?usp=sharing


[1]: https://issues.apache.org/jira/browse/BEAM-91


-Rui


Re: Phrase triggering jobs problem

2019-07-10 Thread Lukasz Cwik
This has happened in the past. Usually there is some issue where Jenkins
isn't notified of new PRs by GitHub or doesn't see the PR phrases, and hence
Jenkins sits around idle. This is usually fixed after a few hours without
any action on our part.

On Wed, Jul 10, 2019 at 10:28 AM Katarzyna Kucharczyk <
ka.kucharc...@gmail.com> wrote:

> Hi all,
>
> Hope it's not a duplicate, but I couldn't find whether any issue with phrase
> triggering in Jenkins was already reported here.
> Currently, I opened a third PR and no tests were triggered there. I tried to
> trigger some tests manually, but with no effect.
>
> Am I missing something?
>
> Here are links to my problematic PRs:
> https://github.com/apache/beam/pull/9033
> https://github.com/apache/beam/pull/9034
> https://github.com/apache/beam/pull/9035
>
> Thanks,
> Kasia
>


Phrase triggering jobs problem

2019-07-10 Thread Katarzyna Kucharczyk
Hi all,

Hope it's not a duplicate, but I couldn't find whether any issue with phrase
triggering in Jenkins was already reported here.
Currently, I opened a third PR and no tests were triggered there. I tried to
trigger some tests manually, but with no effect.

Am I missing something?

Here are links to my problematic PRs:
https://github.com/apache/beam/pull/9033
https://github.com/apache/beam/pull/9034
https://github.com/apache/beam/pull/9035

Thanks,
Kasia


Re: Python Utilities

2019-07-10 Thread Rui Wang
The second link points to the first join utility in Beam. The idea is
similar: people can use the utility to do joins without writing their own.
BeamSQL also uses it.

The first link points to the Schema API. I had actually thought the Schema
API also used the join utility, but it turns out it doesn't (I'm not sure of
the reason, though).

Basically, I think reusing the same join utility is encouraged where
possible.

-Rui

On Wed, Jul 10, 2019 at 8:01 AM Shannon Duncan 
wrote:

> So it seems that the Java SDK has two different Join libraries?
>
> With Schema:
> https://github.com/apache/beam/tree/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
> And Another one:
> https://github.com/apache/beam/blob/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java
>
> So how does it handle that?
>
> On Mon, Jul 8, 2019 at 12:39 PM Shannon Duncan 
> wrote:
>
>> Yeah these are for local testing right now. I was hoping to gain insight
>> on better naming.
>>
>> I was thinking of creating an "extras" module.
>>
>> On Mon, Jul 8, 2019, 12:28 PM Robin Qiu  wrote:
>>
>>> Hi Shannon,
>>>
>>> Thanks for sharing the repo! I took a quick look and I have a concern
>>> with the naming of the transforms.
>>>
>>> Currently, Beam Java already has "Select" and "Join" transforms.
>>> However, they work on schemas, a feature that is not yet implemented in
>>> Beam Python. (See
>>> https://github.com/apache/beam/tree/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
>>> )
>>>
>>> To maintain consistency between SDKs, I think it is good to avoid having
>>> two different transforms with the same name but different functions. So
>>> maybe you can consider renaming the transforms and/or putting them in an
>>> extension Python module, instead of the main ones?
>>>
>>> Best,
>>> Robin
>>>
>>> On Mon, Jul 8, 2019 at 9:19 AM Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>>
 As a follow up. Here is the repo that contains the utilities for now.
 https://github.com/shadowcodex/apache-beam-utilities. Will put
 together a proper PR as code gets closer to production quality.

 - Shannon

 On Mon, Jul 8, 2019 at 9:20 AM Shannon Duncan <
 joseph.dun...@liveramp.com> wrote:

> Thanks Frederik,
>
> That's exactly where I was looking. I did get permission to open
> source the utilities module. So I'm going to throw them up on my personal
> github soon and share with the email group for a look over.
>
> I'm going to work on the utilities there because it's a quick dev
> environment and then once they are ready for proper PR I'll begin working
> them into the actual SDK for a PR.
>
> I also joined the slack #beam and #beam-python channels, I was unsure
> of where most collaborators discussed items.
>
> - Shannon
>
> On Mon, Jul 8, 2019 at 9:09 AM Frederik Bode 
> wrote:
>
>> Hi Shannon,
>>
>> This is probably a good starting point:
>> https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68
>> .
>>
>> Frederik
>>
>>
>>
>> * Frederik Bode*
>>
>> ML6 Ghent
>> 
>> +32 4 92 78 96 18
>>
>>

Re: Python Utilities

2019-07-10 Thread Shannon Duncan
So it seems that the Java SDK has two different Join libraries?

With Schema:
https://github.com/apache/beam/tree/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
And Another one:
https://github.com/apache/beam/blob/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java

So how does it handle that?

On Mon, Jul 8, 2019 at 12:39 PM Shannon Duncan 
wrote:

> Yeah these are for local testing right now. I was hoping to gain insight
> on better naming.
>
> I was thinking of creating an "extras" module.
>
> On Mon, Jul 8, 2019, 12:28 PM Robin Qiu  wrote:
>
>> Hi Shannon,
>>
>> Thanks for sharing the repo! I took a quick look and I have a concern
>> with the naming of the transforms.
>>
>> Currently, Beam Java already has "Select" and "Join" transforms.
>> However, they work on schemas, a feature that is not yet implemented in
>> Beam Python. (See
>> https://github.com/apache/beam/tree/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
>> )
>>
>> To maintain consistency between SDKs, I think it is good to avoid having
>> two different transforms with the same name but different functions. So
>> maybe you can consider renaming the transforms and/or putting them in an
>> extension Python module, instead of the main ones?
>>
>> Best,
>> Robin
>>
>> On Mon, Jul 8, 2019 at 9:19 AM Shannon Duncan 
>> wrote:
>>
>>> As a follow up. Here is the repo that contains the utilities for now.
>>> https://github.com/shadowcodex/apache-beam-utilities. Will put together
>>> a proper PR as code gets closer to production quality.
>>>
>>> - Shannon
>>>
>>> On Mon, Jul 8, 2019 at 9:20 AM Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>>
 Thanks Frederik,

 That's exactly where I was looking. I did get permission to open source
 the utilities module. So I'm going to throw them up on my personal github
 soon and share with the email group for a look over.

 I'm going to work on the utilities there because it's a quick dev
 environment and then once they are ready for proper PR I'll begin working
 them into the actual SDK for a PR.

 I also joined the slack #beam and #beam-python channels, I was unsure
 of where most collaborators discussed items.

 - Shannon

 On Mon, Jul 8, 2019 at 9:09 AM Frederik Bode 
 wrote:

> Hi Shannon,
>
> This is probably a good starting point:
> https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68
> .
>
> Frederik
>
>
>
> * Frederik Bode*
>
> ML6 Ghent
> 
> +32 4 92 78 96 18
>
>
>
> On Mon, 8 Jul 2019 at 15:40, Shannon Duncan <
> joseph.dun...@liveramp.com> wrote:
>
>> I'm sure I could use some of the existing aggregations as a guide on
>> how to make aggregations to fill the gap of missing ones. Such as 
>> creating
>> Sum/Max/Min.
>>
>> GroupBy is really already handled with GroupByKey and CoGroupByKey
>> unless you are thinking of a different type of GroupBy?
>>
>> - Shannon
>>
>> On Sun, Jul 7, 2019 at 10:47 PM Rui Wang  wrote:
>>
>>> Maybe also adding Aggregation/GroupBy as utilities?
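[Editor's note: the inner-join semantics discussed above — what a join utility built on GroupByKey/CoGroupByKey would produce — can be sketched in plain Python. The names below are illustrative only, not Beam's API: group both inputs by key, then emit the per-key cross product, the same shape CoGroupByKey yields.]

```python
# Illustrative sketch (not Beam API): inner-join semantics over two
# collections of (key, value) pairs, mirroring what a CoGroupByKey-based
# join utility would compute.
from collections import defaultdict


def inner_join(left, right):
    """left/right: iterables of (key, value); returns [(key, (l, r)), ...]
    for every key present in both inputs (per-key cross product)."""
    lefts, rights = defaultdict(list), defaultdict(list)
    for k, v in left:
        lefts[k].append(v)
    for k, v in right:
        rights[k].append(v)
    return [(k, (l, r))
            for k in lefts if k in rights
            for l in lefts[k] for r in rights[k]]


# Example: keys present only on one side ("b") are dropped.
# inner_join([("a", 1), ("b", 2)], [("a", "x")]) -> [("a", (1, "x"))]
```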

Re: [Python] Read Hadoop Sequence File?

2019-07-10 Thread Shannon Duncan
If I wanted to go ahead and include this in a new Java pipeline, what level
of work would the integration involve?

On Wed, Jul 3, 2019 at 3:54 AM Ismaël Mejía  wrote:

> That's great. I can help whenever you need. We just need to choose its
> destination. Both the `hadoop-format` and `hadoop-file-system` modules
> are good candidates, I would even feel inclined to put it in its own
> module `sdks/java/extensions/sequencefile` to make it easier for end
> users to discover.
>
> A thing to consider is the SeekableByteChannel adapters, we can move
> that into hadoop-common if needed and refactor the modules to share
> code. Worth to take a look at
>
> org.apache.beam.sdk.io.hdfs.HadoopFileSystem.HadoopSeekableByteChannel#HadoopSeekableByteChannel
> to see if some of it could be useful.
>
> On Tue, Jul 2, 2019 at 11:46 PM Igor Bernstein 
> wrote:
> >
> > Hi all,
> >
> > I wrote those classes with the intention of upstreaming them to Beam. I
> can try to make some time this quarter to clean them up. I would need a bit
> of guidance from a beam expert in how to make them coexist with
> HadoopFormatIO though.
> >
> >
> > On Tue, Jul 2, 2019 at 10:55 AM Solomon Duskis 
> wrote:
> >>
> >> +Igor Bernstein who wrote the Cloud Bigtable Sequence File classes.
> >>
> >> Solomon Duskis | Google Cloud clients | sdus...@google.com |
> 914-462-0531
> >>
> >>
> >> On Tue, Jul 2, 2019 at 4:57 AM Ismaël Mejía  wrote:
> >>>
> >>> (Adding dev@ and Solomon Duskis to the discussion)
> >>>
> >>> I was not aware of these thanks for sharing David. Definitely it would
> >>> be a great addition if we could have those donated as an extension in
> >>> the Beam side. We can even evolve them in the future to be more FileIO
> >>> like. Any chance this can happen? Maybe Solomon and his team?
> >>>
> >>>
> >>>
> >>> On Tue, Jul 2, 2019 at 9:39 AM David Morávek  wrote:
> >>> >
> >>> > Hi, you can use SequenceFileSink and Source, from a BigTable client.
> Those works nice with FileIO.
> >>> >
> >>> >
> https://github.com/googleapis/cloud-bigtable-client/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSink.java
> >>> >
> https://github.com/googleapis/cloud-bigtable-client/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSource.java
> >>> >
> >>> > It would be really cool to move these into Beam, but that's up to
> Googlers to decide, whether they want to donate this.
> >>> >
> >>> > D.
> >>> >
> >>> > On Tue, Jul 2, 2019 at 2:07 AM Shannon Duncan <
> joseph.dun...@liveramp.com> wrote:
> >>> >>
> >>> >> It's not outside the realm of possibilities. For now I've created
> an intermediary step of a hadoop job that converts from sequence to text
> file.
> >>> >>
> >>> >> Looking into better options.
> >>> >>
> >>> >> On Mon, Jul 1, 2019, 5:50 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
> >>> >>>
> >>> >>> Java SDK has a HadoopInputFormatIO using which you should be able
> to read Sequence files:
> https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
> >>> >>> I don't think there's a direct alternative for this for Python.
> >>> >>>
> >>> >>> Is it possible to write to a well-known format such as Avro
> instead of a Hadoop specific format which will allow you to read from both
> Dataproc/Hadoop and Beam Python SDK ?
> >>> >>>
> >>> >>> Thanks,
> >>> >>> Cham
> >>> >>>
> >>> >>> On Mon, Jul 1, 2019 at 3:37 PM Shannon Duncan <
> joseph.dun...@liveramp.com> wrote:
> >>> 
> >>>  That's a pretty big hole for a missing source/sink when looking
> at transitioning from Dataproc to Dataflow using GCS as storage buffer
> instead of a traditional hdfs.
> >>> 
> >>>  From what I've been able to tell from source code and
> documentation, Java is able to but not Python?
> >>> 
> >>>  Thanks,
> >>>  Shannon
> >>> 
> >>>  On Mon, Jul 1, 2019 at 5:29 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
> >>> >
> >>> > I don't think we have a source/sink for reading Hadoop sequence
> files. Your best bet currently will probably be to use FileSystem
> abstraction to create a file from a ParDo and read directly from there
> using a library that can read sequence files.
> >>> >
> >>> > Thanks,
> >>> > Cham
> >>> >
> >>> > On Mon, Jul 1, 2019 at 8:42 AM Shannon Duncan <
> joseph.dun...@liveramp.com> wrote:
> >>> >>
> >>> >> I'm wanting to read a Sequence/Map file from Hadoop stored on
> Google Cloud Storage via a " gs://bucket/link/SequenceFile-* " via the
> Python SDK.
> >>> >>
> >>> >> I cannot locate any good adapters for this, and the one Hadoop
> Filesystem reader seems to only read from a "hdfs://" url.
> >>> >>
> >>> >> I'm wanting to use Dataflow and GCS exclusively to start mixing
> in Beam pipelines

Re: pickling typing types in Python 3.5+

2019-07-10 Thread Robert Bradshaw
I looked into CloudPickle a while back, and would be supportive of the change.

On Mon, Jul 1, 2019 at 11:06 PM Valentyn Tymofieiev  wrote:
>
> I have checked that cloudpickle (an alternative to dill) is able to pickle 
> and unpickle typing types on Python 3.5, 3.6, which seems to be a recent 
> change, see: 
> https://github.com/cloudpipe/cloudpickle/issues/63#issuecomment-501624383.
>
> I am evaluating cloudpickle as a potential avenue to address several other 
> issues we found in Beam while working on Python 3 support, such as:
>
> https://issues.apache.org/jira/browse/BEAM-6522
> https://issues.apache.org/jira/browse/BEAM-7284
> https://issues.apache.org/jira/browse/BEAM-5878?focusedCommentId=16834554
> https://github.com/uqfoundation/dill/issues/300
> https://issues.apache.org/jira/browse/BEAM-7540
>
> Once I have more information on cloudpickle vs dill in Beam, I'll bring it to 
> the mailing list.
>
> On Wed, May 15, 2019 at 5:25 AM Robert Bradshaw  wrote:
>>
>> (2) seems reasonable.
>>
>> On Tue, May 14, 2019 at 3:15 AM Udi Meiri  wrote:
>> >
>> > It seems like pickling of typing types is broken in 3.5 and 3.6, fixed in 
>> > 3.7:
>> > https://github.com/python/typing/issues/511
>> >
>> > Here are my attempts:
>> > https://gist.github.com/udim/ec213305ca865390c391001e8778e91d
>> >
>> >
>> > My ideas:
>> > 1. I know that we override type object handling in pickler.py 
>> > (_nested_type_wrapper), and perhaps this mechanism can be used to pickle 
>> > typing classes correctly. The question is how.
>> >
>> > 2. Exclude/stub out these classes when pickling a pipeline - they are only 
>> > used for verification during pipeline construction anyway. This could be a 
>> > temporary solution for versions 3.5 and 3.6.
>> >
>> > Any ideas / opinions?
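[Editor's note: the breakage being probed in the gist above is easy to reproduce. On Python 3.5/3.6, pickling `typing` constructs raised; on 3.7+ (per python/typing#511) the stdlib `pickle` round-trips them, which is also the behavior cloudpickle backported. A minimal check:]

```python
# On Python 3.7+, typing generics implement __reduce__, so a plain
# pickle round-trip preserves the type hint. On 3.5/3.6 this raised,
# which is what motivated the pickler.py workarounds and the
# cloudpickle evaluation discussed in this thread.
import pickle
from typing import Dict, List

hint = Dict[str, List[int]]
restored = pickle.loads(pickle.dumps(hint))
assert restored == hint
```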


Re: [VOTE] Vendored Dependencies Release

2019-07-10 Thread Jens Nyman
+1

On 2019/07/09 22:33:48, Lukasz Cwik  wrote:
> Please review the release of the following artifacts that we vendor:
>  * beam-vendor-grpc_1_21_0
>  * beam-vendor-guava-26_0-jre
>
> Hi everyone,
> Please review and vote on the release candidate #2 for the
> org.apache.beam:beam-vendor-grpc_1_21_0:0.1 and
> org.apache.beam:beam-vendor-guava-26_0-jre:0.1, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * the official Apache source release to be deployed to dist.apache.org [1],
> which is signed with the key with fingerprint
> EAD5DE293F4A03DD2E77565589E68A56E371CCA2 [2],
> * all artifacts to be deployed to the Maven Central Repository [3],
> * commit hash "b4efbb23cc5dec80b8bbd8745c62efecdadfa236" [4],
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Luke
>
> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
> [3] https://repository.apache.org/content/repositories/orgapachebeam-1076/
> [4] https://github.com/apache/beam/commit/b4efbb23cc5dec80b8bbd8745c62efecdadfa236


Re: [DISCUSS] Contributor guidelines for iterating on PRs: when to squash commits.

2019-07-10 Thread Robert Bradshaw
On Wed, Jul 10, 2019 at 5:06 AM Kenneth Knowles  wrote:
>
> My opinion: what is important is that we have a policy for what goes into the 
> master commit history. This is very simple IMO: each commit should clearly do 
> something that it states, and a commit should do just one thing.

Exactly how I feel as well.

> Personally, I don't feel a need to set a rule for who does the squashing (or 
> non-squashing) or other actions necessary to maintain a clear history.

I think it is on the person who does the merge to make sure the
history is clean.

> In PRs I review the question of who should squash has never come up as an 
> issue. Most PRs are either a bunch of random commits obviously meant for 
> squash, or carefully managed commits with good messages using the 
> git-supported "fixup!" syntax or clear "fixup:" commit messages. It is a 
> polarizing issue, which is a good thing in this case as it makes it very 
> clear how to merge.

This is my experience too.

Unfortunately, GitHub only offers squash-everything vs. squash-nothing
via the UI.

> Your original concern was authors force pushing during review making it hard 
> to review. For your point "3. After a force-push, comments made by reviewers 
> on earlier commit are hard to find." I thought GitHub had fixed that. These 
> comments used to vanish entirely, but now they are still on the main PR page 
> IIRC. If it is not fixed, it would make sense to add this to the contribution 
> guide, and even to the PR template.

I find it harder to review when authors force-push as well. When
commits are separate, the reviewer can choose what subset of changes
(including all of them, or just since the last review) to inspect, but
when they're squashed this ability is lost (and comments, though now
not dropped, don't tie back to code). I would be in favor of a
recommendation not to force-push *reviewed* commits during review.

I think this also requires committers to make a conscious effort when
merging (which is not being done consistently now). Something like a
simple GitHub hook that advises (based on the commit messages) which
merge strategy would be best could go a long way here.


> On Tue, Jul 9, 2019 at 2:18 PM Valentyn Tymofieiev  
> wrote:
>>
>> Ok, I think if authors mark fixup commits with "fixup" prefix and committers 
>> routinely fixup commits before the merge without asking the contributors to 
>> do so, the authors should not have a particular reason to fixup/squash + 
>> force-push all changes into one commit after addressing review comments. 
>> This will make the review easier, however committers will have to take 
>> responsibility for merging fixup commits.
>>
>> Currently both committer guide[1] and contributor guide[2] assume that it is 
>> the author's responsibility to merge fixup commit.
>>
>>> The reviewer should give the LGTM and then request that the author of the 
>>> pull request rebase, squash, split, etc, the commit
>>
>>
>>> "After review is complete and the PR accepted, multiple commits should be 
>>> squashed (see Git workflow tips)".
>>
>>
>> Should we explicitly make squashing review-related commits a responsibility 
>> of committers?
>>
>> [1] https://beam.apache.org/contribute/committer-guide
>> [2] https://beam.apache.org/contribute/
>>
>>
>> On Tue, Jul 9, 2019 at 12:22 PM Rui Wang  wrote:
>>>
>>> "allow maintainers to edit" is enabled by default. Then the proposed
>>> workflow looks reasonable to me now.
>>>
>>>
>>> -Rui
>>>
>>> On Tue, Jul 9, 2019 at 11:26 AM Kenneth Knowles  wrote:

 If you "allow maintainers to edit" the PR, it is easy for any committer to 
 fix up the commits and merge. They should not have to ask you to do it, 
 unless it is not obvious what to do.

 Kenn

 On Tue, Jul 9, 2019 at 11:05 AM Rui Wang  wrote:
>
> At least for me, since I usually don't know when a PR review is done, I
> keep squashing commits every time so that committers can review and merge
> in one pass; otherwise a committer could approve the PR but then ask me to
> squash, which leads to another ping-and-wait round.
>
> Thus I prefer committers do squash and merge, which will reduce PR 
> authors' load during PR review process.
>
>
> -Rui
>
>
> On Mon, Jul 8, 2019 at 5:44 PM Valentyn Tymofieiev  
> wrote:
>>
>> Rui, committer guide[1] does say that all commits are standalone changes:
>>
>>> We prefer small independent, incremental PRs with descriptive, isolated 
>>> commits. Each commit is a single clear change.
>>
>>
>> However in my opinion, this recommendation applies to moments when a PR 
>> is first sent for review, and when a PR is being merged. Committer guide 
>> also mentions that during review iterations authors may add 
>> review-related commits.
>>
>>> the pull request may have a collection of review-related commits