Re: [VOTE] Vendored Dependencies Release

2019-09-04 Thread Rui Wang
Thanks, Pablo, for jumping in to help.

The sources have now been moved to [1]. Please let me know if that looks OK.

[1]: https://dist.apache.org/repos/dist/release/beam/vendor/calcite/1_20_0/

-Rui

On Wed, Sep 4, 2019 at 4:15 PM Pablo Estrada  wrote:

> I can help.
>
> On Wed, Sep 4, 2019 at 1:09 PM Rui Wang  wrote:
>
>> There is a step of releasing requires PMC permission:
>>
>> """
>>
>> Copy the source release from the dev repository to the release
>> repository at dist.apache.org using Subversion.
>> Move last release artifacts from dist.apache.org to archive.apache.org
>> using Subversion. """
>>
>> Is there a PMC member could help on this operation to move [1] to
>> "release" repo?
>>
>> [1]: [1]
>> https://dist.apache.org/repos/dist/dev/beam/vendor/calcite/1_20_0
>>
>> -Rui
>>
>> On Wed, Sep 4, 2019 at 10:16 AM Rui Wang  wrote:
>>
>>> I'm happy to announce that we have unanimously approved this release.
>>>
>>> There are 5 approving votes, 3 of which are binding:
>>>
>>> * Lukasz Cwik
>>>
>>> * Kenneth Knowles
>>>
>>> * Ahmet Altay
>>>
>>> There are no disapproving votes.
>>>
>>> Thanks everyone!
>>>
>>> On Tue, Sep 3, 2019 at 1:29 PM Lukasz Cwik  wrote:
>>>
 +1

 On Tue, Sep 3, 2019 at 1:22 PM Kenneth Knowles  wrote:

> +1
>
> On Tue, Sep 3, 2019 at 11:00 AM Ahmet Altay  wrote:
>
>> +1
>>
>> On Tue, Sep 3, 2019 at 10:52 AM Andrew Pilloud 
>> wrote:
>>
>>> +1
>>>
>>> Inspected the jar it looked reasonable.
>>>
>>> Andrew
>>>
>>> On Tue, Sep 3, 2019 at 9:06 AM Rui Wang  wrote:
>>>
 Friendly ping.


 -Rui

 On Thu, Aug 29, 2019 at 9:50 AM Rui Wang  wrote:

> Thanks Kai and Andrew. Now orgapachebeam-1083 is publicly exposed.
>
> I also found a useful link[1] to explain staging repos in Apache
> Nexus
>
>
> [1]:
> https://help.sonatype.com/repomanager2/staging-releases/managing-staging-repositories#ManagingStagingRepositories-ClosinganOpenRepository
>
> -Rui
>
> On Wed, Aug 28, 2019 at 9:19 PM Andrew Pilloud <
> apill...@google.com> wrote:
>
>> You need to close the release for it to be published to the
>> staging server. I can help if you still have questions.
>>
>> Andrew
>>
>> On Wed, Aug 28, 2019, 8:48 PM Rui Wang  wrote:
>>
>>> I can see orgapachebeam-1083 is in open status in the staging
>>> repository. I am not sure why it is not publicly exposed. I probably
>>> need some guidance on it.
>>>
>>>
>>> -Rui
>>>
>>> On Wed, Aug 28, 2019 at 3:50 PM Kai Jiang 
>>> wrote:
>>>
 Hi Rui,

 For accessing artifacts [1] in Maven Central Repository, is
 this intent to be not public exposed?

 Best,
 Kai

 [1]
 https://repository.apache.org/content/repositories/orgapachebeam-1083/

 On Wed, Aug 28, 2019 at 11:57 AM Kai Jiang 
 wrote:

> +1 (non-binding)Thanks Rui!
>
> On Tue, Aug 27, 2019 at 10:46 PM Rui Wang 
> wrote:
>
>> Please review the release of the following artifacts that we
>> vendor:
>>
>>  * beam-vendor-calcite-1_20_0
>>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #1 for the
>> org.apache.beam:beam-vendor-calcite-1_20_0:0.1, as follows:
>>
>> [ ] +1, Approve the release
>>
>> [ ] -1, Do not approve the release (please provide specific
>> comments)
>>
>>
>> The complete staging area is available for your review, which
>> includes:
>>
>> * the official Apache source release to be deployed to
>> dist.apache.org [1], which is signed with the key with
>> fingerprint 0D7BE1A252DBCEE89F6491BBDFA64862B703F5C8 [2],
>>
>> * all artifacts to be deployed to the Maven Central
>> Repository [3],
>>
>> * commit hash "664e25019fc1977e7041e4b834e8d9628b912473" [4],
>>
>> The vote will be open for at least 72 hours. It is adopted by
>> majority approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>>
>> Rui
>>
>> [1]
>> https://dist.apache.org/repos/dist/dev/beam/vendor/calcite/1_20_0
>>
>> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
>>
>> [3]
>> https://repository.apac

Re: [VOTE] Vendored Dependencies Release

2019-09-04 Thread Pablo Estrada
I can help.

On Wed, Sep 4, 2019 at 1:09 PM Rui Wang  wrote:

> There is a step of releasing requires PMC permission:
>
> """
>
> Copy the source release from the dev repository to the release repository
> at dist.apache.org using Subversion.
> Move last release artifacts from dist.apache.org to archive.apache.org
> using Subversion. """
>
> Is there a PMC member could help on this operation to move [1] to
> "release" repo?
>
> [1]: [1] https://dist.apache.org/repos/dist/dev/beam/vendor/calcite/1_20_0
>
> -Rui
>
> On Wed, Sep 4, 2019 at 10:16 AM Rui Wang  wrote:
>
>> I'm happy to announce that we have unanimously approved this release.
>>
>> There are 5 approving votes, 3 of which are binding:
>>
>> * Lukasz Cwik
>>
>> * Kenneth Knowles
>>
>> * Ahmet Altay
>>
>> There are no disapproving votes.
>>
>> Thanks everyone!
>>
>> On Tue, Sep 3, 2019 at 1:29 PM Lukasz Cwik  wrote:
>>
>>> +1
>>>
>>> On Tue, Sep 3, 2019 at 1:22 PM Kenneth Knowles  wrote:
>>>
 +1

 On Tue, Sep 3, 2019 at 11:00 AM Ahmet Altay  wrote:

> +1
>
> On Tue, Sep 3, 2019 at 10:52 AM Andrew Pilloud 
> wrote:
>
>> +1
>>
>> Inspected the jar it looked reasonable.
>>
>> Andrew
>>
>> On Tue, Sep 3, 2019 at 9:06 AM Rui Wang  wrote:
>>
>>> Friendly ping.
>>>
>>>
>>> -Rui
>>>
>>> On Thu, Aug 29, 2019 at 9:50 AM Rui Wang  wrote:
>>>
 Thanks Kai and Andrew. Now orgapachebeam-1083 is publicly exposed.

 I also found a useful link[1] to explain staging repos in Apache
 Nexus


 [1]:
 https://help.sonatype.com/repomanager2/staging-releases/managing-staging-repositories#ManagingStagingRepositories-ClosinganOpenRepository

 -Rui

 On Wed, Aug 28, 2019 at 9:19 PM Andrew Pilloud 
 wrote:

> You need to close the release for it to be published to the
> staging server. I can help if you still have questions.
>
> Andrew
>
> On Wed, Aug 28, 2019, 8:48 PM Rui Wang  wrote:
>
>> I can see orgapachebeam-1083 is in open status in the staging
>> repository. I am not sure why it is not publicly exposed. I probably
>> need some guidance on it.
>>
>>
>> -Rui
>>
>> On Wed, Aug 28, 2019 at 3:50 PM Kai Jiang 
>> wrote:
>>
>>> Hi Rui,
>>>
>>> For accessing artifacts [1] in Maven Central Repository, is this
>>> intent to be not public exposed?
>>>
>>> Best,
>>> Kai
>>>
>>> [1]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1083/
>>>
>>> On Wed, Aug 28, 2019 at 11:57 AM Kai Jiang 
>>> wrote:
>>>
 +1 (non-binding)Thanks Rui!

 On Tue, Aug 27, 2019 at 10:46 PM Rui Wang 
 wrote:

> Please review the release of the following artifacts that we
> vendor:
>
>  * beam-vendor-calcite-1_20_0
>
> Hi everyone,
>
> Please review and vote on the release candidate #1 for the
> org.apache.beam:beam-vendor-calcite-1_20_0:0.1, as follows:
>
> [ ] +1, Approve the release
>
> [ ] -1, Do not approve the release (please provide specific
> comments)
>
>
> The complete staging area is available for your review, which
> includes:
>
> * the official Apache source release to be deployed to
> dist.apache.org [1], which is signed with the key with
> fingerprint 0D7BE1A252DBCEE89F6491BBDFA64862B703F5C8 [2],
>
> * all artifacts to be deployed to the Maven Central Repository
> [3],
>
> * commit hash "664e25019fc1977e7041e4b834e8d9628b912473" [4],
>
> The vote will be open for at least 72 hours. It is adopted by
> majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
>
> Rui
>
> [1]
> https://dist.apache.org/repos/dist/dev/beam/vendor/calcite/1_20_0
>
> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
>
> [3]
> https://repository.apache.org/content/repositories/orgapachebeam-1083/
>
> [4]
> https://github.com/apache/beam/commit/664e25019fc1977e7041e4b834e8d9628b912473
>
>


Re: Improve container support

2019-09-04 Thread Thomas Weise
This will greatly simplify trying out portable runners:
https://beam.apache.org/documentation/runners/flink/#executing-a-beam-pipeline-on-a-flink-cluster

Can't wait for the following to disappear from the instructions page:
./gradlew :sdks:python:container:docker

On Wed, Sep 4, 2019 at 3:35 PM Thomas Weise  wrote:

> Awesome, thank you!
>
>
> On Wed, Sep 4, 2019 at 3:22 PM Hannah Jiang 
> wrote:
>
>> Hi Thomas
>>
>> I created snapshot images from head as of around 2PM today.
>> You can pull images from gcr.io/apache-beam-testing/beam/sdks/snapshot.
>>
>> Thanks,
>> Hannah
>>
>> On Wed, Sep 4, 2019 at 1:41 PM Thomas Weise  wrote:
>>
>>> Hi Hannah,
>>>
>>> Thank you, I know how to build the containers locally, but not how to
>>> publish them!
>>>
>>> The cwiki says "Publishing images to gcr.io/beam requires permissions
>>> in apache-beam-testing project."
>>>
>>> Can I get access to the testing project (at least temporarily) and what
>>> would I need to setup to run the publish target that is shown on cwiki?
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Wed, Sep 4, 2019 at 11:06 AM Hannah Jiang 
>>> wrote:
>>>
 Hi Thomas

 I haven't uploaded any snapshot images yet. Here is how you can create
 one from head.
 > cd [...]/beam/
 # For Python
 > ./gradlew :sdks:python:container:py{version}:docker *where version
 is {2,35,36,37}*
 # For Java
 > ./gradlew -p sdks/java/container docker
 # For Go
 > ./gradlew -p sdks/go/container docker

 The 2.15 one is just for testing, not a real 2.15.0, nor a snapshot
 from head.

 Please let me know if you have any questions.
 Hannah

 On Wed, Sep 4, 2019 at 10:57 AM Thomas Weise  wrote:

> I actually found something in [1], but it is 2.15 unfortunately.
>
> [1]
> https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release/python2.7?gcrImageListsize=30
>
> On Wed, Sep 4, 2019 at 10:35 AM Thomas Weise  wrote:
>
>> Thanks for working on this. Do you happen to have publicly accessible
>> snapshots published for your testing currently (even when the final
>> location isn't sorted out)?
>>
>> I would like to use a 2.16 based Python SDK image for working on my
>> downstream project, but could not find anything in
>> gcr.io/apache-beam-testing/beam/sdks/rc/snapshot
>>
>> Thanks,
>> Thomas
>>
>> On Fri, Aug 30, 2019 at 10:56 AM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>>
>>> On Tue, Aug 27, 2019 at 3:35 PM Hannah Jiang 
>>> wrote:
>>>
 Hi team

 I am working on improving docker container support for Beam. We
 would like to publish prebuilt containers for each release version and
 daily snapshot. Current work focuses on release images only and it 
 would be
 part of the release process.

 The release images will be pushed to GCR which is publicly
 accessible(pullable). We will use the following locations.
 *Repository*: gcr.io/beam
 *Project*: apache-beam-testing
 More details, including naming and tagging scheme, can be found at
 wiki
 
  which
 is written by several contributors.

 I would like to discuss these two questions.
 *1. How many tests do we need to run before pushing images to gcr*?
 Publishing artifacts is the last step of the release process, so at
 this moment, we already verified all codebase. In addition, many 
 Jenkins
 tests use containers, so it is already verified several times. Do we 
 need
 to run it again?

>>>
>>> In a docker repository, one container image can have multiple tags.
>>> One possibility is that  on the last step of the release process, after
>>> sufficient testing,  we place a production tag on an image that was 
>>> already
>>> pushed with a dev tag.
>>>
>>> For example a dev tag may look like:
>>> gcr.io/apache-beam/python37:2.16.0-RC4, and production tag may look
>>> like:
>>> gcr.io/apache-beam/python37:2.16.0 and both will refer to the same
>>> image at the end.
>>>
>>> We should also plan what the process of updating the container image
>>> will look like, if we need to release the image with additional changes,
>>> and how we will test these changes before the final push (or placing
>>> production tag).
>>>
>>>

 *2. How many tests do we need to run to validate pushed images?*
 When we push the images, we assume the images would work and pass
 all the tests. After pushing, we should confirm the images are 
 pullable and
 useable. I suggest we run several tests on dataflow with each pushed 

Re: Improve container support

2019-09-04 Thread Thomas Weise
Awesome, thank you!


On Wed, Sep 4, 2019 at 3:22 PM Hannah Jiang  wrote:

> Hi Thomas
>
> I created snapshot images from head as of around 2PM today.
> You can pull images from gcr.io/apache-beam-testing/beam/sdks/snapshot.
>
> Thanks,
> Hannah
>
> On Wed, Sep 4, 2019 at 1:41 PM Thomas Weise  wrote:
>
>> Hi Hannah,
>>
>> Thank you, I know how to build the containers locally, but not how to
>> publish them!
>>
>> The cwiki says "Publishing images to gcr.io/beam requires permissions in
>> apache-beam-testing project."
>>
>> Can I get access to the testing project (at least temporarily) and what
>> would I need to setup to run the publish target that is shown on cwiki?
>>
>> Thanks,
>> Thomas
>>
>>
>> On Wed, Sep 4, 2019 at 11:06 AM Hannah Jiang 
>> wrote:
>>
>>> Hi Thomas
>>>
>>> I haven't uploaded any snapshot images yet. Here is how you can create
>>> one from head.
>>> > cd [...]/beam/
>>> # For Python
>>> > ./gradlew :sdks:python:container:py{version}:docker *where version is
>>> {2,35,36,37}*
>>> # For Java
>>> > ./gradlew -p sdks/java/container docker
>>> # For Go
>>> > ./gradlew -p sdks/go/container docker
>>>
>>> The 2.15 one is just for testing, not a real 2.15.0, nor a snapshot from
>>> head.
>>>
>>> Please let me know if you have any questions.
>>> Hannah
>>>
>>> On Wed, Sep 4, 2019 at 10:57 AM Thomas Weise  wrote:
>>>
 I actually found something in [1], but it is 2.15 unfortunately.

 [1]
 https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release/python2.7?gcrImageListsize=30

 On Wed, Sep 4, 2019 at 10:35 AM Thomas Weise  wrote:

> Thanks for working on this. Do you happen to have publicly accessible
> snapshots published for your testing currently (even when the final
> location isn't sorted out)?
>
> I would like to use a 2.16 based Python SDK image for working on my
> downstream project, but could not find anything in
> gcr.io/apache-beam-testing/beam/sdks/rc/snapshot
>
> Thanks,
> Thomas
>
> On Fri, Aug 30, 2019 at 10:56 AM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> On Tue, Aug 27, 2019 at 3:35 PM Hannah Jiang 
>> wrote:
>>
>>> Hi team
>>>
>>> I am working on improving docker container support for Beam. We
>>> would like to publish prebuilt containers for each release version and
>>> daily snapshot. Current work focuses on release images only and it 
>>> would be
>>> part of the release process.
>>>
>>> The release images will be pushed to GCR which is publicly
>>> accessible(pullable). We will use the following locations.
>>> *Repository*: gcr.io/beam
>>> *Project*: apache-beam-testing
>>> More details, including naming and tagging scheme, can be found at
>>> wiki
>>> 
>>>  which
>>> is written by several contributors.
>>>
>>> I would like to discuss these two questions.
>>> *1. How many tests do we need to run before pushing images to gcr*?
>>> Publishing artifacts is the last step of the release process, so at
>>> this moment, we already verified all codebase. In addition, many Jenkins
>>> tests use containers, so it is already verified several times. Do we 
>>> need
>>> to run it again?
>>>
>>
>> In a docker repository, one container image can have multiple tags.
>> One possibility is that  on the last step of the release process, after
>> sufficient testing,  we place a production tag on an image that was 
>> already
>> pushed with a dev tag.
>>
>> For example a dev tag may look like:
>> gcr.io/apache-beam/python37:2.16.0-RC4, and production tag may look
>> like:
>> gcr.io/apache-beam/python37:2.16.0 and both will refer to the same
>> image at the end.
>>
>> We should also plan what the process of updating the container image
>> will look like, if we need to release the image with additional changes,
>> and how we will test these changes before the final push (or placing
>> production tag).
>>
>>
>>>
>>> *2. How many tests do we need to run to validate pushed images?*
>>> When we push the images, we assume the images would work and pass
>>> all the tests. After pushing, we should confirm the images are pullable 
>>> and
>>> useable. I suggest we run several tests on dataflow with each pushed 
>>> image.
>>> What do you think?
>>>
>>
>> I think it makes sense to do -  Beam runners that use SDK container
>> images should have some continuously running tests, which periodically
>> check that all supported images  are pullable and still compatible with 
>> the
>> runner.
>>
>> This work can be refined later as we explore more during our release
>>> process.
>>> Please comment or edit the wiki page o

Re: Improve container support

2019-09-04 Thread Hannah Jiang
Hi Thomas

I created snapshot images from head as of around 2PM today.
You can pull images from gcr.io/apache-beam-testing/beam/sdks/snapshot.
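
A minimal sketch of how one might look up and pull these snapshot images. The thread does not spell out the image names or tags under the snapshot path, so the sketch lists the repository first and takes the image name and tag as caller-supplied arguments (both hypothetical here). It assumes the gcloud and docker CLIs are installed and that the repository allows listing; it only prints the commands rather than running them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Snapshot repository named in the message above.
SNAPSHOT_REPO="gcr.io/apache-beam-testing/beam/sdks/snapshot"

# Print the command that lists the images under the snapshot repository.
list_images_cmd() {
  echo "gcloud container images list --repository=${SNAPSHOT_REPO}"
}

# Print a pull command for one image. The image name and tag are supplied by
# the caller because the thread does not state them.
pull_cmd() {
  local image="$1" tag="$2"
  echo "docker pull ${SNAPSHOT_REPO}/${image}:${tag}"
}

# Dry run: show the listing command instead of executing it.
list_images_cmd
```

To execute for real, the printed line can be piped into a shell, for example `list_images_cmd | sh`.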

Thanks,
Hannah

On Wed, Sep 4, 2019 at 1:41 PM Thomas Weise  wrote:

> Hi Hannah,
>
> Thank you, I know how to build the containers locally, but not how to
> publish them!
>
> The cwiki says "Publishing images to gcr.io/beam requires permissions in
> apache-beam-testing project."
>
> Can I get access to the testing project (at least temporarily) and what
> would I need to setup to run the publish target that is shown on cwiki?
>
> Thanks,
> Thomas
>
>
> On Wed, Sep 4, 2019 at 11:06 AM Hannah Jiang 
> wrote:
>
>> Hi Thomas
>>
>> I haven't uploaded any snapshot images yet. Here is how you can create
>> one from head.
>> > cd [...]/beam/
>> # For Python
>> > ./gradlew :sdks:python:container:py{version}:docker *where version is
>> {2,35,36,37}*
>> # For Java
>> > ./gradlew -p sdks/java/container docker
>> # For Go
>> > ./gradlew -p sdks/go/container docker
>>
>> The 2.15 one is just for testing, not a real 2.15.0, nor a snapshot from
>> head.
>>
>> Please let me know if you have any questions.
>> Hannah
>>
>> On Wed, Sep 4, 2019 at 10:57 AM Thomas Weise  wrote:
>>
>>> I actually found something in [1], but it is 2.15 unfortunately.
>>>
>>> [1]
>>> https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release/python2.7?gcrImageListsize=30
>>>
>>> On Wed, Sep 4, 2019 at 10:35 AM Thomas Weise  wrote:
>>>
 Thanks for working on this. Do you happen to have publicly accessible
 snapshots published for your testing currently (even when the final
 location isn't sorted out)?

 I would like to use a 2.16 based Python SDK image for working on my
 downstream project, but could not find anything in
 gcr.io/apache-beam-testing/beam/sdks/rc/snapshot

 Thanks,
 Thomas

 On Fri, Aug 30, 2019 at 10:56 AM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> On Tue, Aug 27, 2019 at 3:35 PM Hannah Jiang 
> wrote:
>
>> Hi team
>>
>> I am working on improving docker container support for Beam. We would
>> like to publish prebuilt containers for each release version and daily
>> snapshot. Current work focuses on release images only and it would be
>> part of the release process.
>>
>> The release images will be pushed to GCR which is publicly
>> accessible(pullable). We will use the following locations.
>> *Repository*: gcr.io/beam
>> *Project*: apache-beam-testing
>> More details, including naming and tagging scheme, can be found at the
>> wiki, which is written by several contributors.
>>
>> I would like to discuss these two questions.
>> *1. How many tests do we need to run before pushing images to gcr*?
>> Publishing artifacts is the last step of the release process, so at
>> this moment, we already verified all codebase. In addition, many Jenkins
>> tests use containers, so it is already verified several times. Do we need
>> to run it again?
>>
>
> In a docker repository, one container image can have multiple tags.
> One possibility is that on the last step of the release process, after
> sufficient testing, we place a production tag on an image that was
> already pushed with a dev tag.
>
> For example a dev tag may look like:
> gcr.io/apache-beam/python37:2.16.0-RC4, and production tag may look
> like:
> gcr.io/apache-beam/python37:2.16.0 and both will refer to the same
> image at the end.
>
> We should also plan what the process of updating the container image
> will look like, if we need to release the image with additional changes,
> and how we will test these changes before the final push (or placing
> production tag).
>
>
>>
>> *2. How many tests do we need to run to validate pushed images?*
>> When we push the images, we assume the images would work and pass all
>> the tests. After pushing, we should confirm the images are pullable and
>> useable. I suggest we run several tests on dataflow with each pushed
>> image. What do you think?
>>
>
> I think it makes sense to do - Beam runners that use SDK container
> images should have some continuously running tests, which periodically
> check that all supported images are pullable and still compatible with
> the runner.
>
>> This work can be refined later as we explore more during our release
>> process.
>> Please comment or edit the wiki page or reply to this email with your
>> opinions.
>>
>> Thanks,
>> Hannah
>>
>
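
The tagging approach Valentyn describes in the quoted reply above (put a production tag on an image that was already pushed with a dev tag) could be sketched roughly as follows. The two image names are the illustrative ones from his example, not confirmed published locations, and the sketch only prints the docker commands so nothing is pushed by accident.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Example names from the thread; illustrative only, not real published images.
DEV_IMAGE="gcr.io/apache-beam/python37:2.16.0-RC4"
PROD_IMAGE="gcr.io/apache-beam/python37:2.16.0"

# Print the commands that give the already-pushed dev image a production tag.
promote_commands() {
  echo "docker pull ${DEV_IMAGE}"               # fetch the tested RC image
  echo "docker tag ${DEV_IMAGE} ${PROD_IMAGE}"  # second tag, same image
  echo "docker push ${PROD_IMAGE}"              # publish the production tag
}

# Dry run: show the commands instead of executing them.
promote_commands
```

Because `docker tag` only adds a second name for the same image, the production tag refers to exactly the image that was tested under the dev tag.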


Re: Improve container support

2019-09-04 Thread Thomas Weise
Hi Hannah,

Thank you, I know how to build the containers locally, but not how to
publish them!

The cwiki says "Publishing images to gcr.io/beam requires permissions in
apache-beam-testing project."

Can I get access to the testing project (at least temporarily), and what
would I need to set up to run the publish target that is shown on the cwiki?

Thanks,
Thomas


On Wed, Sep 4, 2019 at 11:06 AM Hannah Jiang  wrote:

> Hi Thomas
>
> I haven't uploaded any snapshot images yet. Here is how you can create one
> from head.
> > cd [...]/beam/
> # For Python
> > ./gradlew :sdks:python:container:py{version}:docker *where version is
> {2,35,36,37}*
> # For Java
> > ./gradlew -p sdks/java/container docker
> # For Go
> > ./gradlew -p sdks/go/container docker
>
> The 2.15 one is just for testing, not a real 2.15.0, nor a snapshot from
> head.
>
> Please let me know if you have any questions.
> Hannah
>
> On Wed, Sep 4, 2019 at 10:57 AM Thomas Weise  wrote:
>
>> I actually found something in [1], but it is 2.15 unfortunately.
>>
>> [1]
>> https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release/python2.7?gcrImageListsize=30
>>
>> On Wed, Sep 4, 2019 at 10:35 AM Thomas Weise  wrote:
>>
>>> Thanks for working on this. Do you happen to have publicly accessible
>>> snapshots published for your testing currently (even when the final
>>> location isn't sorted out)?
>>>
>>> I would like to use a 2.16 based Python SDK image for working on my
>>> downstream project, but could not find anything in
>>> gcr.io/apache-beam-testing/beam/sdks/rc/snapshot
>>>
>>> Thanks,
>>> Thomas
>>>
>>> On Fri, Aug 30, 2019 at 10:56 AM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
 On Tue, Aug 27, 2019 at 3:35 PM Hannah Jiang 
 wrote:

> Hi team
>
> I am working on improving docker container support for Beam. We would
> like to publish prebuilt containers for each release version and daily
> snapshot. Current work focuses on release images only and it would be part
> of the release process.
>
> The release images will be pushed to GCR which is publicly
> accessible(pullable). We will use the following locations.
> *Repository*: gcr.io/beam
> *Project*: apache-beam-testing
> More details, including naming and tagging scheme, can be found at the
> wiki, which is written by several contributors.
>
> I would like to discuss these two questions.
> *1. How many tests do we need to run before pushing images to gcr*?
> Publishing artifacts is the last step of the release process, so at
> this moment, we already verified all codebase. In addition, many Jenkins
> tests use containers, so it is already verified several times. Do we need
> to run it again?
>

 In a docker repository, one container image can have multiple tags. One
 possibility is that  on the last step of the release process, after
 sufficient testing,  we place a production tag on an image that was already
 pushed with a dev tag.

 For example a dev tag may look like:
 gcr.io/apache-beam/python37:2.16.0-RC4, and production tag may look
 like:
 gcr.io/apache-beam/python37:2.16.0 and both will refer to the same
 image at the end.

 We should also plan what the process of updating the container image
 will look like, if we need to release the image with additional changes,
 and how we will test these changes before the final push (or placing
 production tag).


>
> *2. How many tests do we need to run to validate pushed images?*
> When we push the images, we assume the images would work and pass all
> the tests. After pushing, we should confirm the images are pullable and
> useable. I suggest we run several tests on dataflow with each pushed
> image. What do you think?
>

 I think it makes sense to do -  Beam runners that use SDK container
 images should have some continuously running tests, which periodically
 check that all supported images  are pullable and still compatible with the
 runner.

 This work can be refined later as we explore more during our release
> process.
> Please comment or edit the wiki page or reply to this email with your
> opinions.
>
> Thanks,
> Hannah
>



Re: [VOTE] Vendored Dependencies Release

2019-09-04 Thread Rui Wang
One step of the release process requires PMC permission:

"""
Copy the source release from the dev repository to the release repository
at dist.apache.org using Subversion.
Move last release artifacts from dist.apache.org to archive.apache.org
using Subversion.
"""

Is there a PMC member who could help with this operation and move [1] to the
"release" repo?

[1] https://dist.apache.org/repos/dist/dev/beam/vendor/calcite/1_20_0
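
For reference, a rough sketch of what that Subversion step might look like, assuming a server-side `svn mv` between the dev and release URLs and an account with PMC write access. The commit message is a made-up placeholder, and the script only prints the command rather than running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Base URL of the ASF distribution repository (from the thread).
DIST_BASE="https://dist.apache.org/repos/dist"
# Path of the vendored Calcite artifacts below dev/ and release/ (from the thread).
ARTIFACT_PATH="beam/vendor/calcite/1_20_0"

# Print the Subversion command that moves the artifacts from dev to release.
# The commit message is a placeholder, not taken from the thread.
promote_cmd() {
  printf 'svn mv %s/dev/%s %s/release/%s -m "%s"\n' \
    "$DIST_BASE" "$ARTIFACT_PATH" "$DIST_BASE" "$ARTIFACT_PATH" \
    "Release beam-vendor-calcite-1_20_0"
}

# Dry run: show the command instead of executing it.
promote_cmd
```

A PMC member could review the printed command and then run it, for example with `promote_cmd | sh`.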

-Rui

On Wed, Sep 4, 2019 at 10:16 AM Rui Wang  wrote:

> I'm happy to announce that we have unanimously approved this release.
>
> There are 5 approving votes, 3 of which are binding:
>
> * Lukasz Cwik
>
> * Kenneth Knowles
>
> * Ahmet Altay
>
> There are no disapproving votes.
>
> Thanks everyone!
>
> On Tue, Sep 3, 2019 at 1:29 PM Lukasz Cwik  wrote:
>
>> +1
>>
>> On Tue, Sep 3, 2019 at 1:22 PM Kenneth Knowles  wrote:
>>
>>> +1
>>>
>>> On Tue, Sep 3, 2019 at 11:00 AM Ahmet Altay  wrote:
>>>
 +1

 On Tue, Sep 3, 2019 at 10:52 AM Andrew Pilloud 
 wrote:

> +1
>
> Inspected the jar it looked reasonable.
>
> Andrew
>
> On Tue, Sep 3, 2019 at 9:06 AM Rui Wang  wrote:
>
>> Friendly ping.
>>
>>
>> -Rui
>>
>> On Thu, Aug 29, 2019 at 9:50 AM Rui Wang  wrote:
>>
>>> Thanks Kai and Andrew. Now orgapachebeam-1083 is publicly exposed.
>>>
>>> I also found a useful link[1] to explain staging repos in Apache
>>> Nexus
>>>
>>>
>>> [1]:
>>> https://help.sonatype.com/repomanager2/staging-releases/managing-staging-repositories#ManagingStagingRepositories-ClosinganOpenRepository
>>>
>>> -Rui
>>>
>>> On Wed, Aug 28, 2019 at 9:19 PM Andrew Pilloud 
>>> wrote:
>>>
 You need to close the release for it to be published to the staging
 server. I can help if you still have questions.

 Andrew

 On Wed, Aug 28, 2019, 8:48 PM Rui Wang  wrote:

> I can see orgapachebeam-1083 is in open status in the staging
> repository. I am not sure why it is not publicly exposed. I probably
> need some guidance on it.
>
>
> -Rui
>
> On Wed, Aug 28, 2019 at 3:50 PM Kai Jiang 
> wrote:
>
>> Hi Rui,
>>
>> For accessing artifacts [1] in Maven Central Repository, is this
>> intent to be not public exposed?
>>
>> Best,
>> Kai
>>
>> [1]
>> https://repository.apache.org/content/repositories/orgapachebeam-1083/
>>
>> On Wed, Aug 28, 2019 at 11:57 AM Kai Jiang 
>> wrote:
>>
>>> +1 (non-binding)Thanks Rui!
>>>
>>> On Tue, Aug 27, 2019 at 10:46 PM Rui Wang 
>>> wrote:
>>>
 Please review the release of the following artifacts that we
 vendor:

  * beam-vendor-calcite-1_20_0

 Hi everyone,

 Please review and vote on the release candidate #1 for the
 org.apache.beam:beam-vendor-calcite-1_20_0:0.1, as follows:

 [ ] +1, Approve the release

 [ ] -1, Do not approve the release (please provide specific
 comments)


 The complete staging area is available for your review, which
 includes:

 * the official Apache source release to be deployed to
 dist.apache.org [1], which is signed with the key with
 fingerprint 0D7BE1A252DBCEE89F6491BBDFA64862B703F5C8 [2],

 * all artifacts to be deployed to the Maven Central Repository
 [3],

 * commit hash "664e25019fc1977e7041e4b834e8d9628b912473" [4],

 The vote will be open for at least 72 hours. It is adopted by
 majority approval, with at least 3 PMC affirmative votes.

 Thanks,

 Rui

 [1]
 https://dist.apache.org/repos/dist/dev/beam/vendor/calcite/1_20_0

 [2] https://dist.apache.org/repos/dist/release/beam/KEYS

 [3]
 https://repository.apache.org/content/repositories/orgapachebeam-1083/

 [4]
 https://github.com/apache/beam/commit/664e25019fc1977e7041e4b834e8d9628b912473




Re: Beam at Google Summer of Code 2019

2019-09-04 Thread Aizhamal Nurmamat kyzy
Thank you Tanay! I hope you had a truly wonderful GSoC experience, and will
stay involved in the community for years to come :)

On Wed, Sep 4, 2019 at 10:42 AM Ahmet Altay  wrote:

> Thank you Tanay for all your contributions during summer and looking
> forward to more of it :)
>
> On Wed, Sep 4, 2019 at 10:38 AM Tanay Tummalapalli 
> wrote:
>
>> Hi everyone,
>>
>> I've completed Google Summer of Code '19[1].
>> I had fun working on Beam for the past 3 months and learning about Beam
>> internals.
>>
>> Thank you Pablo for everything! None of it would have been possible
>> without you.
>> I'd also like to thank the Beam community for the code reviews and being
>> supportive and encouraging.
>>
>> I'm moving to Bangalore this month. I'll be back to contributing to Beam
>> next month.
>>
>> Thank You
>>  - Tanay
>>
>> [1] https://gist.github.com/ttanay/80f84b7b852e0867d5a00d3b345e1dad
>>
>> On Fri, May 24, 2019 at 12:47 AM Tanay Tummalapalli 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I made a Kanban board[1] on Github, on my fork of apache/beam to keep
>>> track of progress for GSoC '19.
>>>
>>> Regards,
>>> Tanay Tummalapalli
>>>
>>> [1] https://github.com/ttanay/beam/projects/1
>>>
>>> On Tue, May 7, 2019 at 6:39 PM Tanay Tummalapalli 
>>> wrote:
>>>
 Thank You!

 I'm really excited to work on Beam!
 I'd like to thank Pablo, Chamikara Jayalath and Tim Robertson for
 helping out with my proposal[1].

 Looking forward to working with everyone and learning a great deal.

 Regards
 Tanay Tummalapalli
 LinkedIn  | Github
 

 [1]
 https://docs.google.com/document/d/15Peyd3Z_wu5rvGWw8lMLpZuTyyreM_JOAEFFWvF97YY/edit?usp=sharing

 On Tue, May 7, 2019 at 12:04 AM Pablo Estrada 
 wrote:

> Hello all,
> it is my pleasure to share with everyone that Tanay Tummalapalli has
> been accepted as a GSoC student with Beam, to implement support for File
> Loads into BigQuery for streaming pipelines[1].
>
> Tanay wrote a very strong proposal, and showed understanding of the
> tricky streaming considerations that will play out in this project.
>
> I speak on behalf of everyone welcoming you Tanay, and we'll be happy
> to see your contributions to Beam. : )
> Best
> -P.
>
> [1]
> https://summerofcode.withgoogle.com/projects/?sp-search=Tanay#4999837794172928
>



Re: [discuss] Auto-close issues with a PR associated?

2019-09-04 Thread Tanay Tummalapalli
Hey Pablo,

I think we should add the workflow to the contributor's guide[1] since most
new contributors start from there.

Best,
- Tanay

[1] https://beam.apache.org/contribute/

On Wed, Sep 4, 2019 at 11:03 PM Pablo Estrada  wrote:

> Hello all,
> this has been discussed before, and I believe we concluded that we did not
> want to auto-close JIRA issues referenced by a PR.
> I wanted us to revisit this decision, because I don't think everyone has
> adopted the correct workflow:
>
> Workflow A:
> 1. Send a PR to fix BEAM-X
> 2. Go to JIRA, and mark BEAM-X as resolved.
>
> Many of us do (1), but not (2) as much. The main argument was to avoid
> messing with this workflow:
>
> Workflow B:
> 1. Send a PR related to BEAM-X
> 2. Send a PR related to BEAM-X
> 
> N-1. Send a PR that fully fixes BEAM-X
> N. Go to JIRA, and mark BEAM-X as resolved.
>
> But, perhaps we should optimize for the very common case of Workflow A,
> and let community members manually manage the less common case of Workflow
> B.
>
> What do others think?
> -P.
>


Re: Improve container support

2019-09-04 Thread Hannah Jiang
Hi Thomas

I haven't uploaded any snapshot images yet. Here is how you can create one
from head.
$ cd [...]/beam/
# For Python, where {version} is one of 2, 35, 36, 37
$ ./gradlew :sdks:python:container:py{version}:docker
# For Java
$ ./gradlew -p sdks/java/container docker
# For Go
$ ./gradlew -p sdks/go/container docker

The 2.15 one is just for testing, not a real 2.15.0, nor a snapshot from
head.

Please let me know if you have any questions.
Hannah

On Wed, Sep 4, 2019 at 10:57 AM Thomas Weise  wrote:

> I actually found something in [1], but it is 2.15 unfortunately.
>
> [1]
> https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release/python2.7?gcrImageListsize=30
>
> On Wed, Sep 4, 2019 at 10:35 AM Thomas Weise  wrote:
>
>> Thanks for working on this. Do you happen to have publicly accessible
>> snapshots published for your testing currently (even when the final
>> location isn't sorted out)?
>>
>> I would like to use a 2.16 based Python SDK image for working on my
>> downstream project, but could not find anything in
>> gcr.io/apache-beam-testing/beam/sdks/rc/snapshot
>>
>> Thanks,
>> Thomas
>>
>> On Fri, Aug 30, 2019 at 10:56 AM Valentyn Tymofieiev 
>> wrote:
>>
>>> On Tue, Aug 27, 2019 at 3:35 PM Hannah Jiang 
>>> wrote:
>>>
 Hi team

 I am working on improving docker container support for Beam. We would
 like to publish prebuilt containers for each release version and daily
 snapshot. Current work focuses on release images only and it would be part
 of the release process.

 The release images will be pushed to GCR which is publicly
 accessible(pullable). We will use the following locations.
 *Repository*: gcr.io/beam
 *Project*: apache-beam-testing
 More details, including the naming and tagging scheme, can be found on the
 wiki, which is written by several contributors.

 I would like to discuss these two questions.
 *1. How many tests do we need to run before pushing images to gcr*?
 Publishing artifacts is the last step of the release process, so at
 this moment, we already verified all codebase. In addition, many Jenkins
 tests use containers, so it is already verified several times. Do we need
 to run it again?

>>>
>>> In a docker repository, one container image can have multiple tags. One
>>> possibility is that  on the last step of the release process, after
>>> sufficient testing,  we place a production tag on an image that was already
>>> pushed with a dev tag.
>>>
>>> For example a dev tag may look like:
>>> gcr.io/apache-beam/python37:2.16.0-RC4, and production tag may look
>>> like:
>>> gcr.io/apache-beam/python37:2.16.0 and both will refer to the same
>>> image at the end.
>>>
>>> We should also plan what the process of updating the container image
>>> will look like, if we need to release the image with additional changes,
>>> and how we will test these changes before the final push (or placing
>>> production tag).
>>>
>>>

 *2. How many tests do we need to run to validate pushed images?*
 When we push the images, we assume the images would work and pass all
 the tests. After pushing, we should confirm the images are pullable and
 useable. I suggest we run several tests on dataflow with each pushed image.
 What do you think?

>>>
>>> I think it makes sense to do -  Beam runners that use SDK container
>>> images should have some continuously running tests, which periodically
>>> check that all supported images  are pullable and still compatible with the
>>> runner.
>>>
>>> This work can be refined later as we explore more during our release
 process.
 Please comment or edit the wiki page or reply to this email with your
 opinions.

 Thanks,
 Hannah

>>>


Re: Improve container support

2019-09-04 Thread Thomas Weise
I actually found something in [1], but it is 2.15 unfortunately.

[1]
https://console.cloud.google.com/gcr/images/apache-beam-testing/GLOBAL/beam/sdks/release/python2.7?gcrImageListsize=30

On Wed, Sep 4, 2019 at 10:35 AM Thomas Weise  wrote:

> Thanks for working on this. Do you happen to have publicly accessible
> snapshots published for your testing currently (even when the final
> location isn't sorted out)?
>
> I would like to use a 2.16 based Python SDK image for working on my
> downstream project, but could not find anything in
> gcr.io/apache-beam-testing/beam/sdks/rc/snapshot
>
> Thanks,
> Thomas
>
> On Fri, Aug 30, 2019 at 10:56 AM Valentyn Tymofieiev 
> wrote:
>
>> On Tue, Aug 27, 2019 at 3:35 PM Hannah Jiang 
>> wrote:
>>
>>> Hi team
>>>
>>> I am working on improving docker container support for Beam. We would
>>> like to publish prebuilt containers for each release version and daily
>>> snapshot. Current work focuses on release images only and it would be part
>>> of the release process.
>>>
>>> The release images will be pushed to GCR which is publicly
>>> accessible(pullable). We will use the following locations.
>>> *Repository*: gcr.io/beam
>>> *Project*: apache-beam-testing
>>> More details, including the naming and tagging scheme, can be found on the
>>> wiki, which is written by several contributors.
>>>
>>> I would like to discuss these two questions.
>>> *1. How many tests do we need to run before pushing images to gcr*?
>>> Publishing artifacts is the last step of the release process, so at this
>>> moment, we already verified all codebase. In addition, many Jenkins tests
>>> use containers, so it is already verified several times. Do we need to run
>>> it again?
>>>
>>
>> In a docker repository, one container image can have multiple tags. One
>> possibility is that  on the last step of the release process, after
>> sufficient testing,  we place a production tag on an image that was already
>> pushed with a dev tag.
>>
>> For example a dev tag may look like:
>> gcr.io/apache-beam/python37:2.16.0-RC4, and production tag may look
>> like:
>> gcr.io/apache-beam/python37:2.16.0 and both will refer to the same image
>> at the end.
>>
>> We should also plan what the process of updating the container image will
>> look like, if we need to release the image with additional changes, and how
>> we will test these changes before the final push (or placing production
>> tag).
>>
>>
>>>
>>> *2. How many tests do we need to run to validate pushed images?*
>>> When we push the images, we assume the images would work and pass all
>>> the tests. After pushing, we should confirm the images are pullable and
>>> useable. I suggest we run several tests on dataflow with each pushed image.
>>> What do you think?
>>>
>>
>> I think it makes sense to do -  Beam runners that use SDK container
>> images should have some continuously running tests, which periodically
>> check that all supported images  are pullable and still compatible with the
>> runner.
>>
>> This work can be refined later as we explore more during our release
>>> process.
>>> Please comment or edit the wiki page or reply to this email with your
>>> opinions.
>>>
>>> Thanks,
>>> Hannah
>>>
>>


Re: Beam at Google Summer of Code 2019

2019-09-04 Thread Ahmet Altay
Thank you Tanay for all your contributions during summer and looking
forward to more of it :)

On Wed, Sep 4, 2019 at 10:38 AM Tanay Tummalapalli 
wrote:

> Hi everyone,
>
> I've completed Google Summer of Code '19[1].
> I had fun working on Beam for the past 3 months and learning about Beam
> internals.
>
> Thank you Pablo for everything! None of it would have been possible
> without you.
> I'd also like to thank the Beam community for the code reviews and being
> supportive and encouraging.
>
> I'm moving to Bangalore this month. I'll be back to contributing to Beam
> next month.
>
> Thank You
>  - Tanay
>
> [1] https://gist.github.com/ttanay/80f84b7b852e0867d5a00d3b345e1dad
>
> On Fri, May 24, 2019 at 12:47 AM Tanay Tummalapalli 
> wrote:
>
>> Hi everyone,
>>
>> I made a Kanban board[1] on Github, on my fork of apache/beam to keep
>> track of progress for GSoC '19.
>>
>> Regards,
>> Tanay Tummalapalli
>>
>> [1] https://github.com/ttanay/beam/projects/1
>>
>> On Tue, May 7, 2019 at 6:39 PM Tanay Tummalapalli 
>> wrote:
>>
>>> Thank You!
>>>
>>> I'm really excited to work on Beam!
>>> I'd like to thank Pablo, Chamikara Jayalath and Tim Robertson for
>>> helping out with my proposal[1].
>>>
>>> Looking forward to working with everyone and learning a great deal.
>>>
>>> Regards
>>> Tanay Tummalapalli
>>> LinkedIn  | Github
>>> 
>>>
>>> [1]
>>> https://docs.google.com/document/d/15Peyd3Z_wu5rvGWw8lMLpZuTyyreM_JOAEFFWvF97YY/edit?usp=sharing
>>>
>>> On Tue, May 7, 2019 at 12:04 AM Pablo Estrada 
>>> wrote:
>>>
 Hello all,
 it is my pleasure to share with everyone that Tanay Tummalapalli has
 been accepted as a GSoC student with Beam, to implement support for File
 Loads into BigQuery for streaming pipelines[1].

 Tanay wrote a very strong proposal, and showed understanding of the
 tricky streaming considerations that will play out in this project.

 I speak on behalf of everyone welcoming you Tanay, and we'll be happy
 to see your contributions to Beam. : )
 Best
 -P.

 [1]
 https://summerofcode.withgoogle.com/projects/?sp-search=Tanay#4999837794172928

>>>


Re: Beam at Google Summer of Code 2019

2019-09-04 Thread Tanay Tummalapalli
Hi everyone,

I've completed Google Summer of Code '19[1].
I had fun working on Beam for the past 3 months and learning about Beam
internals.

Thank you Pablo for everything! None of it would have been possible without
you.
I'd also like to thank the Beam community for the code reviews and being
supportive and encouraging.

I'm moving to Bangalore this month. I'll be back to contributing to Beam
next month.

Thank You
 - Tanay

[1] https://gist.github.com/ttanay/80f84b7b852e0867d5a00d3b345e1dad

On Fri, May 24, 2019 at 12:47 AM Tanay Tummalapalli 
wrote:

> Hi everyone,
>
> I made a Kanban board[1] on Github, on my fork of apache/beam to keep
> track of progress for GSoC '19.
>
> Regards,
> Tanay Tummalapalli
>
> [1] https://github.com/ttanay/beam/projects/1
>
> On Tue, May 7, 2019 at 6:39 PM Tanay Tummalapalli 
> wrote:
>
>> Thank You!
>>
>> I'm really excited to work on Beam!
>> I'd like to thank Pablo, Chamikara Jayalath and Tim Robertson for helping
>> out with my proposal[1].
>>
>> Looking forward to working with everyone and learning a great deal.
>>
>> Regards
>> Tanay Tummalapalli
>> LinkedIn  | Github
>> 
>>
>> [1]
>> https://docs.google.com/document/d/15Peyd3Z_wu5rvGWw8lMLpZuTyyreM_JOAEFFWvF97YY/edit?usp=sharing
>>
>> On Tue, May 7, 2019 at 12:04 AM Pablo Estrada  wrote:
>>
>>> Hello all,
>>> it is my pleasure to share with everyone that Tanay Tummalapalli has
>>> been accepted as a GSoC student with Beam, to implement support for File
>>> Loads into BigQuery for streaming pipelines[1].
>>>
>>> Tanay wrote a very strong proposal, and showed understanding of the
>>> tricky streaming considerations that will play out in this project.
>>>
>>> I speak on behalf of everyone welcoming you Tanay, and we'll be happy to
>>> see your contributions to Beam. : )
>>> Best
>>> -P.
>>>
>>> [1]
>>> https://summerofcode.withgoogle.com/projects/?sp-search=Tanay#4999837794172928
>>>
>>


Re: Improve container support

2019-09-04 Thread Thomas Weise
Thanks for working on this. Do you happen to have publicly accessible
snapshots published for your testing currently (even when the final
location isn't sorted out)?

I would like to use a 2.16 based Python SDK image for working on my
downstream project, but could not find anything in
gcr.io/apache-beam-testing/beam/sdks/rc/snapshot

Thanks,
Thomas

On Fri, Aug 30, 2019 at 10:56 AM Valentyn Tymofieiev 
wrote:

> On Tue, Aug 27, 2019 at 3:35 PM Hannah Jiang 
> wrote:
>
>> Hi team
>>
>> I am working on improving docker container support for Beam. We would
>> like to publish prebuilt containers for each release version and daily
>> snapshot. Current work focuses on release images only and it would be part
>> of the release process.
>>
>> The release images will be pushed to GCR which is publicly
>> accessible(pullable). We will use the following locations.
>> *Repository*: gcr.io/beam
>> *Project*: apache-beam-testing
>> More details, including the naming and tagging scheme, can be found on the
>> wiki, which is written by several contributors.
>>
>> I would like to discuss these two questions.
>> *1. How many tests do we need to run before pushing images to gcr*?
>> Publishing artifacts is the last step of the release process, so at this
>> moment, we already verified all codebase. In addition, many Jenkins tests
>> use containers, so it is already verified several times. Do we need to run
>> it again?
>>
>
> In a docker repository, one container image can have multiple tags. One
> possibility is that  on the last step of the release process, after
> sufficient testing,  we place a production tag on an image that was already
> pushed with a dev tag.
>
> For example a dev tag may look like:
> gcr.io/apache-beam/python37:2.16.0-RC4, and production tag may look like:
> gcr.io/apache-beam/python37:2.16.0 and both will refer to the same image
> at the end.
>
> We should also plan what the process of updating the container image will
> look like, if we need to release the image with additional changes, and how
> we will test these changes before the final push (or placing production
> tag).
>
>
>>
>> *2. How many tests do we need to run to validate pushed images?*
>> When we push the images, we assume the images would work and pass all the
>> tests. After pushing, we should confirm the images are pullable and
>> useable. I suggest we run several tests on dataflow with each pushed image.
>> What do you think?
>>
>
> I think it makes sense to do -  Beam runners that use SDK container images
> should have some continuously running tests, which periodically check that
> all supported images  are pullable and still compatible with the runner.
>
> This work can be refined later as we explore more during our release
>> process.
>> Please comment or edit the wiki page or reply to this email with your
>> opinions.
>>
>> Thanks,
>> Hannah
>>
>
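
The dev-tag/production-tag scheme described in the quoted reply above can be sketched in a few lines. This is only an illustration under stated assumptions: the tag shape `<version>-RC<n>` is taken from the single example in the thread, and the helper names `split_image_ref` and `production_ref` are hypothetical, not part of any Beam or Docker tooling.

```python
import re
from typing import Tuple

# Assumed tag shape, taken from the example in the thread: <version>-RC<n>.
RC_TAG = re.compile(r"^(?P<version>\d+\.\d+\.\d+)-RC(?P<candidate>\d+)$")


def split_image_ref(ref: str) -> Tuple[str, str]:
    """Split 'repository:tag' at the first colon after the last slash.

    Searching only after the last slash keeps a registry port such as
    'localhost:5000' from being mistaken for the tag separator.
    """
    name_start = ref.rfind("/") + 1
    colon = ref.find(":", name_start)
    if colon == -1:
        raise ValueError(f"no tag in image reference: {ref}")
    return ref[:colon], ref[colon + 1:]


def production_ref(dev_ref: str) -> str:
    """Map a release-candidate reference to the production reference."""
    repository, tag = split_image_ref(dev_ref)
    match = RC_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a release-candidate tag: {tag}")
    return f"{repository}:{match.group('version')}"


print(production_ref("gcr.io/apache-beam/python37:2.16.0-RC4"))
```

With a mapping like this, the final release step would only add a second tag to an image that has already been pushed and tested, so no rebuild is needed between the last RC and the release.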


[discuss] Auto-close issues with a PR associated?

2019-09-04 Thread Pablo Estrada
Hello all,
this has been discussed before, and I believe we concluded that we did not
want to auto-close JIRA issues referenced by a PR.
I wanted us to revisit this decision, because I don't think everyone has
adopted the correct workflow:

Workflow A:
1. Send a PR to fix BEAM-X
2. Go to JIRA, and mark BEAM-X as resolved.

Many of us do (1), but not (2) as much. The main argument was to avoid
messing with this workflow:

Workflow B:
1. Send a PR related to BEAM-X
2. Send a PR related to BEAM-X

N-1. Send a PR that fully fixes BEAM-X
N. Go to JIRA, and mark BEAM-X as resolved.

But, perhaps we should optimize for the very common case of Workflow A, and
let community members manually manage the less common case of Workflow B.

What do others think?
-P.
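
A minimal sketch of how a bot could tell the two workflows apart, assuming PR titles carry the issue key in square brackets (as in `[BEAM-7274] ...`). The `(partial)` opt-out marker and the function name `issues_to_close` are hypothetical and only illustrate one way Workflow B could be kept manual.

```python
import re
from typing import List

# Assumed convention: a PR title names the issues it touches as [BEAM-1234].
ISSUE_KEY = re.compile(r"\[(BEAM-\d+)\]")
# Hypothetical opt-out marker for Workflow B (a PR that only partly fixes the issue).
PARTIAL_MARKER = "(partial)"


def issues_to_close(pr_title: str) -> List[str]:
    """Return the JIRA keys a merged PR would resolve under Workflow A."""
    if PARTIAL_MARKER in pr_title.lower():
        # Workflow B: leave the issue open; the author resolves it manually later.
        return []
    # Keep first-seen order while dropping repeated keys.
    return list(dict.fromkeys(ISSUE_KEY.findall(pr_title)))


print(issues_to_close("[BEAM-7274] Implement the Protobuf schema provider"))
print(issues_to_close("[BEAM-5967] (partial) Add DynamicMessage handling"))
```

The default favors the common case (Workflow A), and a contributor on Workflow B only has to add the marker to each intermediate PR.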


Re: [VOTE] Vendored Dependencies Release

2019-09-04 Thread Rui Wang
I'm happy to announce that we have unanimously approved this release.

There are 5 approving votes, 3 of which are binding:

* Lukasz Cwik

* Kenneth Knowles

* Ahmet Altay

There are no disapproving votes.

Thanks everyone!
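
For reference, the approval rule quoted further down this thread (majority approval, with at least 3 PMC affirmative votes) can be written as a small check. This is a sketch under one assumption: only binding (PMC) votes count toward the majority. The function name `release_approved` is hypothetical.

```python
from typing import Iterable, Tuple

# Each vote is (value, binding): value is +1 or -1, binding marks a PMC vote.
Vote = Tuple[int, bool]


def release_approved(votes: Iterable[Vote], min_binding: int = 3) -> bool:
    """Apply the rule: more binding +1 than binding -1, and at least min_binding +1."""
    binding_values = [value for value, binding in votes if binding]
    approvals = sum(1 for value in binding_values if value > 0)
    disapprovals = sum(1 for value in binding_values if value < 0)
    return approvals >= min_binding and approvals > disapprovals


# Tally reported in this thread: 5 approvals, 3 of them binding, no -1 votes.
thread_votes = [(+1, True)] * 3 + [(+1, False)] * 2
print(release_approved(thread_votes))
```

Non-binding votes are still carried in the input so the same list can be used to report the full tally.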

On Tue, Sep 3, 2019 at 1:29 PM Lukasz Cwik  wrote:

> +1
>
> On Tue, Sep 3, 2019 at 1:22 PM Kenneth Knowles  wrote:
>
>> +1
>>
>> On Tue, Sep 3, 2019 at 11:00 AM Ahmet Altay  wrote:
>>
>>> +1
>>>
>>> On Tue, Sep 3, 2019 at 10:52 AM Andrew Pilloud 
>>> wrote:
>>>
 +1

 Inspected the jar it looked reasonable.

 Andrew

 On Tue, Sep 3, 2019 at 9:06 AM Rui Wang  wrote:

> Friendly ping.
>
>
> -Rui
>
> On Thu, Aug 29, 2019 at 9:50 AM Rui Wang  wrote:
>
>> Thanks Kai and Andrew. Now orgapachebeam-1083 is publicly exposed.
>>
>> I also found a useful link[1] to explain staging repos in Apache Nexus
>>
>>
>> [1]:
>> https://help.sonatype.com/repomanager2/staging-releases/managing-staging-repositories#ManagingStagingRepositories-ClosinganOpenRepository
>>
>> -Rui
>>
>> On Wed, Aug 28, 2019 at 9:19 PM Andrew Pilloud 
>> wrote:
>>
>>> You need to close the release for it to be published to the staging
>>> server. I can help if you still have questions.
>>>
>>> Andrew
>>>
>>> On Wed, Aug 28, 2019, 8:48 PM Rui Wang  wrote:
>>>
 I can see orgapachebeam-1083 is in open status in the staging
 repository. I am not sure why it is not publicly exposed. I probably need
 some guidance on it.


 -Rui

 On Wed, Aug 28, 2019 at 3:50 PM Kai Jiang 
 wrote:

> Hi Rui,
>
> For accessing artifacts [1] in Maven Central Repository, is this
> intent to be not public exposed?
>
> Best,
> Kai
>
> [1]
> https://repository.apache.org/content/repositories/orgapachebeam-1083/
>
> On Wed, Aug 28, 2019 at 11:57 AM Kai Jiang 
> wrote:
>
>> +1 (non-binding) Thanks Rui!
>>
>> On Tue, Aug 27, 2019 at 10:46 PM Rui Wang 
>> wrote:
>>
>>> Please review the release of the following artifacts that we
>>> vendor:
>>>
>>>  * beam-vendor-calcite-1_20_0
>>>
>>> Hi everyone,
>>>
>>> Please review and vote on the release candidate #1 for the
>>> org.apache.beam:beam-vendor-calcite-1_20_0:0.1, as follows:
>>>
>>> [ ] +1, Approve the release
>>>
>>> [ ] -1, Do not approve the release (please provide specific
>>> comments)
>>>
>>>
>>> The complete staging area is available for your review, which
>>> includes:
>>>
>>> * the official Apache source release to be deployed to
>>> dist.apache.org [1], which is signed with the key with
>>> fingerprint 0D7BE1A252DBCEE89F6491BBDFA64862B703F5C8 [2],
>>>
>>> * all artifacts to be deployed to the Maven Central Repository
>>> [3],
>>>
>>> * commit hash "664e25019fc1977e7041e4b834e8d9628b912473" [4],
>>>
>>> The vote will be open for at least 72 hours. It is adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>>
>>> Rui
>>>
>>> [1]
>>> https://dist.apache.org/repos/dist/dev/beam/vendor/calcite/1_20_0
>>>
>>> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>
>>> [3]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1083/
>>>
>>> [4]
>>> https://github.com/apache/beam/commit/664e25019fc1977e7041e4b834e8d9628b912473
>>>
>>>


Re: Problems with KafkaIOIT in Beam

2019-09-04 Thread Chamikara Jayalath
On Wed, Sep 4, 2019 at 4:23 AM Michał Walenia 
wrote:

> Hi all,
> recently I've been struggling to adapt Java's KafkaIOIT to work with a
> large dataset generated by a SyntheticSource. I want to push 100M records
> through a Kafka topic and verify data correctness and at the same time
> check the performance of KafkaIO.Write and KafkaIO.Read.
>
> To perform the tests I'm using a Kafka cluster on Kubernetes from the Beam
> repo.
>
> To give you an overview of what should happen, first the records are
> generated in a deterministic way (using hashes of list positions as Random
> seeds), next they are written to Kafka - this concludes the write pipeline.
> As for reading and correctness checking - first, the data is read from the
> topic and after being decoded into String representations, a hashcode of
> the whole PCollection is calculated (For details, check KafkaIOIT.java).
>
> During the testing I ran into several problems:
> 1. When all the records are read from the Kafka topic, the hash is
> different each time.
> 2. Sometimes not all the records are read and the Dataflow task waits for
> the input indefinitely, occasionally throwing exceptions.
>
> I suspect that something is wrong with the Kafka cluster configuration,
> but unfortunately, I lack experience with this tool to know what could be
> missing here.
> I would be very grateful for your help, Kafka config hints or anything
> else you can add.
>

Do you see consistent values in Dataflow step element counters ? That might
help you to figure out where the non-determinism is. Also, trying to run
read and write jobs separately might be helpful. I'm not too familiar with
Kafka cluster configurations though.

Thanks,
Cham


> Thanks and have a good day,
>
> Michal
>
> --
>
> Michał Walenia
> Polidea  | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.wale...@polidea.com
>
> Unique Tech
> Check out our projects! 
>


Need help reviewing Protobuf enhancement

2019-09-04 Thread Alex Van Boxel
Hi all,

I need some help getting the following PRs into Beam. Both PRs are tested
on the DirectRunner and the Dataflow runner. I'm currently using my own
build of Beam, because I need this support for the next product we're
building, so the code has been extensively battle-tested in my own pipelines
(yes, I found some bugs, and they are fixed in the PRs).

I still have lots of enhancements planned, but I don't want to do them on the
current branches, as that would introduce even more delays.

It would be cool if, at next week's Beam Summit, I could at least say it's
on master!

[BEAM-7274] Implement the Protobuf schema provider
https://github.com/apache/beam/pull/8690

This PR introduces Row support for Protobuf. Extensive work has gone into
supporting DynamicMessages; the design is even dynamic-first. Nothing is
enabled by default, so existing workflows should not see any disruption.

[BEAM-5967] Add handling of DynamicMessage in ProtoCoder
https://github.com/apache/beam/pull/8496

This is an older PR that makes sure DynamicMessage can be supported.
It uses ProtoDomain (a serializable way of wrapping the descriptor), which
was back-ported from [BEAM-7274] so that both PRs are aligned.

Thanks for the help.

 _/
_/ Alex Van Boxel


Re: Cassandra flaky on Jenkins?

2019-09-04 Thread Jean-Baptiste Onofré
Thanks David,

it makes sense, it gives me time to investigate and fix.

Regards
JB

On 04/09/2019 15:01, David Morávek wrote:
> Hi, temporarily disabling the test until BEAM-8025 is resolved (marking it
> as a blocker for 2.16), so we can unblock ongoing pull requests.
> 
> Best,
> D.
> 
> On Tue, Sep 3, 2019 at 3:57 PM Jean-Baptiste Onofré wrote:
> 
> Hi Max,
> 
> yup, I'm starting the investigation.
> 
> I keep you posted.
> 
> Regards
> JB
> 
> On 03/09/2019 15:34, Maximilian Michels wrote:
> > The newest incarnation of this is here:
> > https://jira.apache.org/jira/browse/BEAM-8025
> >
> > Would be good if you could take a look JB.
> >
> > Thanks,
> > Max
> >
> > On 03.09.19 15:32, David Morávek wrote:
> >> yes, that looks similar. example:
> >>
> >> https://github.com/apache/beam/pull/9464
> >>
> >> D.
> >>
> >> On 3 Sep 2019, at 15:18, Jean-Baptiste Onofré wrote:
> >>
> >>> Thanks David,
> >>>
> >>> the build is running on my machine to see if I can reproduce locally.
> >>>
> >>> It sounds like https://issues.apache.org/jira/browse/BEAM-7355 right?
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 03/09/2019 15:11, David Morávek wrote:
>  I’m running into these failures too
> 
>  D.
> 
>  Sent from my iPhone
> 
> > On 3 Sep 2019, at 14:34, Jean-Baptiste Onofré wrote:
> >
> > Hi,
> >
> > Let me take a look. Do you always have this issue on Jenkins or
> > randomly ?
> >
> > Regards
> > JB
> >
> >> On 03/09/2019 14:19, Alex Van Boxel wrote:
> >> Hi, is it only me that are bumping on the flaky Cassandra on
> >> Jenkins? I
> >> like to get my PR approved but I can't get past the Cassandra
> >> error...
> >>
> >> * org.apache.beam.sdk.io.cassandra.CassandraIOTest.classMethod
> >>
>   
> 
> >>
> >>
> >>
> >>
> >> _/
> >> _/ Alex Van Boxel
> >
> > -- 
> > Jean-Baptiste Onofré
> > jbono...@apache.org 
> >
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >>>
> >>> -- 
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org 
> >
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Cassandra flaky on Jenkins?

2019-09-04 Thread Alex Van Boxel
Thanks, got my PR green now,

 _/
_/ Alex Van Boxel


On Wed, Sep 4, 2019 at 3:02 PM David Morávek 
wrote:

> Hi, temporarily disabling the test until BEAM-8025 is resolved (marking it
> as a blocker for 2.16), so we can unblock ongoing pull requests.
>
> Best,
> D.
>
> On Tue, Sep 3, 2019 at 3:57 PM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Max,
>>
>> yup, I'm starting the investigation.
>>
>> I keep you posted.
>>
>> Regards
>> JB
>>
>> On 03/09/2019 15:34, Maximilian Michels wrote:
>> > The newest incarnation of this is here:
>> > https://jira.apache.org/jira/browse/BEAM-8025
>> >
>> > Would be good if you could take a look JB.
>> >
>> > Thanks,
>> > Max
>> >
>> > On 03.09.19 15:32, David Morávek wrote:
>> >> yes, that looks similar. example:
>> >>
>> >> https://github.com/apache/beam/pull/9464
>> >>
>> >> D.
>> >>
>> >> On 3 Sep 2019, at 15:18, Jean-Baptiste Onofré wrote:
>> >>
>> >>> Thanks David,
>> >>>
>> >>> the build is running on my machine to see if I can reproduce locally.
>> >>>
>> >>> It sounds like https://issues.apache.org/jira/browse/BEAM-7355 right?
>> >>>
>> >>> Regards
>> >>> JB
>> >>>
>> >>> On 03/09/2019 15:11, David Morávek wrote:
>>  I’m running into these failures too
>> 
>>  D.
>> 
>>  Sent from my iPhone
>> 
>> > On 3 Sep 2019, at 14:34, Jean-Baptiste Onofré wrote:
>> >
>> > Hi,
>> >
>> > Let me take a look. Do you always have this issue on Jenkins or
>> > randomly ?
>> >
>> > Regards
>> > JB
>> >
>> >> On 03/09/2019 14:19, Alex Van Boxel wrote:
>> >> Hi, is it only me that are bumping on the flaky Cassandra on
>> >> Jenkins? I
>> >> like to get my PR approved but I can't get past the Cassandra
>> >> error...
>> >>
>> >> * org.apache.beam.sdk.io.cassandra.CassandraIOTest.classMethod
>> >>   <
>> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1300/testReport/junit/org.apache.beam.sdk.io.cassandra/CassandraIOTest/classMethod/
>> >
>> >>
>> >>
>> >>
>> >>
>> >> _/
>> >> _/ Alex Van Boxel
>> >
>> > --
>> > Jean-Baptiste Onofré
>> > jbono...@apache.org 
>> > http://blog.nanthrax.net
>> > Talend - http://www.talend.com
>> >>>
>> >>> --
>> >>> Jean-Baptiste Onofré
>> >>> jbono...@apache.org 
>> >>> http://blog.nanthrax.net
>> >>> Talend - http://www.talend.com
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>


Re: Cassandra flaky on Jenkins?

2019-09-04 Thread David Morávek
Hi, temporarily disabling the test until BEAM-8025 is resolved (marking it as a
blocker for 2.16), so we can unblock ongoing pull requests.

Best,
D.

On Tue, Sep 3, 2019 at 3:57 PM Jean-Baptiste Onofré  wrote:

> Hi Max,
>
> yup, I'm starting the investigation.
>
> I keep you posted.
>
> Regards
> JB
>
> On 03/09/2019 15:34, Maximilian Michels wrote:
> > The newest incarnation of this is here:
> > https://jira.apache.org/jira/browse/BEAM-8025
> >
> > Would be good if you could take a look JB.
> >
> > Thanks,
> > Max
> >
> > On 03.09.19 15:32, David Morávek wrote:
> >> yes, that looks similar. example:
> >>
> >> https://github.com/apache/beam/pull/9464
> >>
> >> D.
> >>
> >> On 3 Sep 2019, at 15:18, Jean-Baptiste Onofré wrote:
> >>
> >>> Thanks David,
> >>>
> >>> the build is running on my machine to see if I can reproduce locally.
> >>>
> >>> It sounds like https://issues.apache.org/jira/browse/BEAM-7355 right ?
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 03/09/2019 15:11, David Morávek wrote:
>  I’m running into these failures too
> 
>  D.
> 
>  Sent from my iPhone
> 
> > On 3 Sep 2019, at 14:34, Jean-Baptiste Onofré wrote:
> >
> > Hi,
> >
> > Let me take a look. Do you always have this issue on Jenkins or
> > randomly ?
> >
> > Regards
> > JB
> >
> >> On 03/09/2019 14:19, Alex Van Boxel wrote:
> >> Hi, is it only me that are bumping on the flaky Cassandra on
> >> Jenkins? I
> >> like to get my PR approved but I can't get past the Cassandra
> >> error...
> >>
> >> * org.apache.beam.sdk.io.cassandra.CassandraIOTest.classMethod
> >>   <
> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1300/testReport/junit/org.apache.beam.sdk.io.cassandra/CassandraIOTest/classMethod/
> >
> >>
> >>
> >>
> >>
> >> _/
> >> _/ Alex Van Boxel
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org 
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org 
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Problems with KafkaIOIT in Beam

2019-09-04 Thread Michał Walenia
Hi all,
recently I've been struggling to adapt Java's KafkaIOIT to work with a
large dataset generated by a SyntheticSource. I want to push 100M records
through a Kafka topic and verify data correctness and at the same time
check the performance of KafkaIO.Write and KafkaIO.Read.

To perform the tests I'm using a Kafka cluster on Kubernetes from the Beam
repo.

To give you an overview of what should happen, first the records are
generated in a deterministic way (using hashes of list positions as Random
seeds), next they are written to Kafka - this concludes the write pipeline.
As for reading and correctness checking - first, the data is read from the
topic and after being decoded into String representations, a hashcode of
the whole PCollection is calculated (For details, check KafkaIOIT.java).

During the testing I ran into several problems:
1. When all the records are read from the Kafka topic, the hash is
different each time.
2. Sometimes not all the records are read and the Dataflow task waits for
the input indefinitely, occasionally throwing exceptions.

I suspect that something is wrong with the Kafka cluster configuration, but
unfortunately, I lack experience with this tool to know what could be
missing here.
I would be very grateful for your help, Kafka config hints or anything else
you can add.
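
To make the generate-then-hash idea above concrete, here is a small standalone sketch. It is not the code in KafkaIOIT.java: the record format, the helper names `generate_record` and `collection_digest`, and the choice of summing per-record SHA-256 values are all assumptions made for illustration. The point is only that a digest for data read back from a multi-partition topic should not depend on arrival order, while still changing when a record is missing or duplicated.

```python
import hashlib
import random
from typing import Iterable

MODULUS = 2 ** 256


def generate_record(position: int) -> str:
    """Derive a record from its position only, so reruns produce the same data."""
    digest = hashlib.sha256(str(position).encode()).digest()
    seed = int.from_bytes(digest[:8], "big")  # hash of the position used as the seed
    rng = random.Random(seed)
    return f"{position}:{rng.getrandbits(64):016x}"


def collection_digest(records: Iterable[str]) -> str:
    """Order-independent digest: sum of per-record SHA-256 values mod 2**256."""
    total = 0
    for record in records:
        value = int.from_bytes(hashlib.sha256(record.encode()).digest(), "big")
        total = (total + value) % MODULUS
    return f"{total:064x}"


written = [generate_record(i) for i in range(1000)]
shuffled = written[:]
random.Random(0).shuffle(shuffled)

# Same records in a different order give the same digest.
print(collection_digest(written) == collection_digest(shuffled))
# A missing record or a duplicated record changes the digest.
print(collection_digest(written) == collection_digest(written[:-1]))
print(collection_digest(written) == collection_digest(written + written[:1]))
```

A sum is used rather than XOR because XOR would cancel out a record that is delivered twice, and duplicate delivery is one of the things such a check would want to catch.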

Thanks and have a good day,

Michal

-- 

Michał Walenia
Polidea  | Software Engineer

M: +48 791 432 002 <+48791432002>
E: michal.wale...@polidea.com

Unique Tech
Check out our projects!