Re: [VOTE] Apache Beam, version 2.6.0, release candidate #1

2018-08-02 Thread Suneel Marthi
+1 non-binding

1. tested with beam samples
2. verified sigs and hashes of artifacts
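
For anyone repeating the hash check in step 2, here is a minimal Python sketch. It assumes a sha512sum-style checksum file whose first whitespace-separated token is the hex digest; the helper names are made up for illustration and are not part of any Beam tooling. Signature verification additionally needs GnuPG and the KEYS file, which this sketch does not cover.

```python
import hashlib
import hmac
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(artifact: Path, checksum_file: Path) -> bool:
    """Compare an artifact's digest with the first token of its checksum file.

    Assumption: the checksum file looks like "<hex digest>  <file name>".
    """
    published = checksum_file.read_text().split()[0].lower()
    return hmac.compare_digest(sha512_of(artifact), published)
```

The function returns True only when the locally computed digest equals the published one; the paths passed in would be whatever artifact and checksum file were downloaded from the staging area.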


On Fri, Aug 3, 2018 at 12:43 AM, Jean-Baptiste Onofré 
wrote:

> +1 (binding)
>
> Tested with beam-samples.
>
> I didn't have time to include three Jira, but 2.7.0 should be in vote in
> soon ;)
>
> Regards
> JB
>
> On 01/08/2018 01:50, Pablo Estrada wrote:
> > Hello everyone!
> >
> > I have been able to prepare a release candidate for Beam 2.6.0. : D
> >
> > Please review and vote on the release candidate #1 for the version
> > 2.6.0, as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staged set of artifacts is available for your review, which
> > includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> >  [2], which is signed with the key with
> > fingerprint 2F1FEDCDF6DD7990422F482F65224E0292DD8A51 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v2.6.0-RC1" [5],
> > * website pull request listing the release and publishing the API
> > reference manual [6].
> > * Python artifacts are deployed along with the source release to the
> > dist.apache.org  [2].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Regards
> > -Pablo.
> >
> > [1]
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12343392
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.6.0/
> > [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
> > [4] https://repository.apache.org/content/repositories/orgapachebeam-1044/
> > [5] https://github.com/apache/beam/tree/v2.6.0-RC1
> > [6] https://github.com/apache/beam-site/pull/518
> >
> > --
> > Got feedback? go/pabloem-feedback
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [VOTE] Apache Beam, version 2.6.0, release candidate #1

2018-08-02 Thread Jean-Baptiste Onofré
+1 (binding)

Tested with beam-samples.

I didn't have time to include three Jira issues, but 2.7.0 should be up for
a vote soon ;)

Regards
JB

On 01/08/2018 01:50, Pablo Estrada wrote:
> Hello everyone!
> 
> I have been able to prepare a release candidate for Beam 2.6.0. : D
> 
> Please review and vote on the release candidate #1 for the version
> 2.6.0, as follows:
> 
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> The complete staged set of artifacts is available for your review, which
> includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
>  [2], which is signed with the key with
> fingerprint 2F1FEDCDF6DD7990422F482F65224E0292DD8A51 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.6.0-RC1" [5],
> * website pull request listing the release and publishing the API
> reference manual [6].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org  [2].
> 
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> 
> Regards
> -Pablo.
> 
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12343392
> [2] https://dist.apache.org/repos/dist/dev/beam/2.6.0/
> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1044/
> [5] https://github.com/apache/beam/tree/v2.6.0-RC1
> [6] https://github.com/apache/beam-site/pull/518
> 
> -- 
> Got feedback? go/pabloem-feedback

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
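
The adoption rule quoted in the vote email above (majority approval, with at least 3 PMC affirmative votes) can be sketched in Python as follows. This is an illustration under stated assumptions, not an official tally tool: it assumes only binding (PMC) votes count toward both the threshold and the majority, and it does not model the 72-hour minimum voting window. The `Vote` type and `release_approved` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1 to approve, -1 to reject
    binding: bool   # True when cast by a PMC member


def release_approved(votes: list[Vote], min_binding_plus_ones: int = 3) -> bool:
    """Apply the quoted rule to binding votes only (an assumption, see above)."""
    binding = [vote for vote in votes if vote.binding]
    plus = sum(1 for vote in binding if vote.value > 0)
    minus = sum(1 for vote in binding if vote.value < 0)
    # Needs at least three binding +1 votes and more binding +1 than -1 votes.
    return plus >= min_binding_plus_ones and plus > minus
```

With only the two votes shown in this digest (one binding +1 and one non-binding +1), the function returns False, because the three-binding-vote threshold has not yet been reached.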


Re: Nexmark PostCommit tests fail

2018-08-02 Thread Mikhail Gryzykhin
Thank you for looking into this, Andrew.

--Mikhail

Have feedback ?


On Thu, Aug 2, 2018 at 5:17 PM Andrew Pilloud  wrote:

> These are timeouts at 4 hours. Dataflow is slow to start compared with
> other runners. https://github.com/apache/beam/pull/6127 will run tests in
> parallel and fix the issue.
>
> Andrew
>
> On Thu, Aug 2, 2018, 5:05 PM Reuven Lax  wrote:
>
>> I'm having a hard time understanding what the failure is. I see a bunch
>> of Jenkins errors in the log. Did the actual Beam pipeline fail, or is this
>> a Jenkins-level failure?
>>
>> Reuven
>>
>> On Thu, Aug 2, 2018 at 4:54 PM Mikhail Gryzykhin 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> We have Nexmark PostCommit tests fail. Should we disable or fix these?
>>>
>>> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow/
>>>
>>> Regards,
>>> --Mikhail
>>>
>>> Have feedback ?
>>>
>>


Re: Removing documentation for old Beam versions

2018-08-02 Thread Udi Meiri
[image: pr-520.png]
(trying that image again)

On Thu, Aug 2, 2018 at 7:00 PM Udi Meiri  wrote:

> Alright, created https://github.com/apache/beam-site/pull/520
> [image: pr-520.png]
> Reduces staging upload from 500M down to 270M, and halves the number of
> files from ~22k to 11k.
>
>
>
> On Thu, Aug 2, 2018 at 6:58 PM Pablo Estrada  wrote:
>
>> I believe tags will be necessarily because for anyone looking for old
>> docs that have been removed, they will need to browse back in history, not
>> just browse the tree of directories.
>> -P.
>>
>> On Thu, Aug 2, 2018, 6:46 PM Mikhail Gryzykhin  wrote:
>>
>>> Last time I talked with Scott I brought this idea in. I believe the plan
>>> was either to publish compiled site to website directly, or keep it in
>>> separate storage from apache/beam repo.
>>>
>>> One of the main reasons not to check in compiled version of website is
>>> that every developer will have to pull all the versions of website every
>>> time they clone repo, which is not that good of an idea to do.
>>>
>>> Regards,
>>> --Mikhail
>>>
>>> Have feedback ?
>>>
>>>
>>> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri  wrote:
>>>
 Pablo, the docs are generated into versioned paths, e.g.,
 https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
 not necessary?
 Also, once apache/beam-site is merged with apache/beam the release
 branch should have the relevant docs (although perhaps it's better to put
 them in a different repo or storage system).

 Thomas, I would very much like to not have javadoc/pydoc generation be
 part of the website review process, as it takes up a lot of time when
 changes are staged (10s of thousands of files), especially when a PR is
 updated and existing staged files need to be deleted.


 On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin 
 wrote:

> +1 For removing old documentation.
>
> @Thomas: Migration work is in the backlog and will be picked up in the
> near future.
>
> --Mikhail
>
> Have feedback ?
>
>
> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise  wrote:
>
>> +1 for removing pre 2.0 documentation (as well as the entries from
>> https://beam.apache.org/get-started/downloads/)
>>
>> Isn't it part of the beam-site changes that we will no longer check
>> in generated documentation into the repository? Those can be generated 
>> and
>> deployed independently (when a commit to a branch occurs), such as done 
>> in
>> the Apex and Flink projects.
>>
>> I was told that Scott who was working in the beam-site changes is on
>> leave now and the migration is still pending (see note at
>> https://github.com/apache/beam/tree/master/website). Is anyone else
>> going to pick it up?
>>
>> Thanks,
>> Thomas
>>
>>
>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada 
>> wrote:
>>
>>> Is it worth adding a tag / branch to the repositories every time we
>>> make a release, so that people are able to dive in and find the docs?
>>> Best
>>> -P.
>>>
>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay 
>>> wrote:
>>>
 I would guess that users are still using some of these old
 releases. It is unclear from Beam website which releases are still
 supported or not. It probably makes sense to drop documentation for
 releases < 2.0. (I would suggest keeping docs for 2.0). For the future 
 I
 can work on updating the Beam website to clarify the state of each 
 release.

 On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri 
 wrote:

> The older docs are not directly linked to and are in Github commit
> history.
>
> If there are no objections I'm going to delete javadocs and pydocs
> for releases older than 1 year,
> meaning 2.0.0 and older (going by the dates here
> ).
>
> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
> danolive...@google.com> wrote:
>
>> The older docs should be recorded in the commit history of the
>> website repository, right? If they're not currently used in the 
>> website and
>> they're in the commit history then I don't see a reason to save them.
>>
>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri 
>> wrote:
>>
>>> Hi all,
>>> I'm writing a PR for apache/beam-site and
>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, 
>>> because it's
>>> trying to delete 22k files and then copy 22k files (warning
>>> large file
>>> 
>>> ).
>>>
>>> It seems that we could save a 

Re: Removing documentation for old Beam versions

2018-08-02 Thread Udi Meiri
Alright, created https://github.com/apache/beam-site/pull/520
[image: pr-520.png]
Reduces staging upload from 500M down to 270M, and halves the number of
files from ~22k to 11k.
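
One way to express the selection discussed in this thread (dropping generated docs for releases at or below a cutoff version) is sketched below in Python. The cutoff of 2.0.0 is the value floated in the quoted messages, and whether 2.0 itself should stay was still under discussion, so it is left as a parameter. The function names and the sample directory list are illustrative assumptions, not the contents of the pull request.

```python
def parse_version(name: str) -> tuple[int, ...]:
    """Turn a directory name such as "2.5.0" into a comparable tuple (2, 5, 0)."""
    return tuple(int(part) for part in name.split("."))


def versions_to_prune(doc_dirs: list[str], cutoff: str = "2.0.0") -> list[str]:
    """Return the versioned doc directories at or below the cutoff, oldest first.

    Assumption: every entry is a dotted numeric version; a name such as
    "current" would raise ValueError and needs filtering beforehand.
    """
    limit = parse_version(cutoff)
    selected = [name for name in doc_dirs if parse_version(name) <= limit]
    return sorted(selected, key=parse_version)


# Illustrative input only; not the site's actual directory listing.
print(versions_to_prune(["2.5.0", "0.6.0", "2.0.0", "2.4.0"]))
# prints ['0.6.0', '2.0.0']
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as "10.0.0" sorting before "2.0.0".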



On Thu, Aug 2, 2018 at 6:58 PM Pablo Estrada  wrote:

> I believe tags will be necessarily because for anyone looking for old docs
> that have been removed, they will need to browse back in history, not just
> browse the tree of directories.
> -P.
>
> On Thu, Aug 2, 2018, 6:46 PM Mikhail Gryzykhin  wrote:
>
>> Last time I talked with Scott I brought this idea in. I believe the plan
>> was either to publish compiled site to website directly, or keep it in
>> separate storage from apache/beam repo.
>>
>> One of the main reasons not to check in compiled version of website is
>> that every developer will have to pull all the versions of website every
>> time they clone repo, which is not that good of an idea to do.
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback ?
>>
>>
>> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri  wrote:
>>
>>> Pablo, the docs are generated into versioned paths, e.g.,
>>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>>> not necessary?
>>> Also, once apache/beam-site is merged with apache/beam the release
>>> branch should have the relevant docs (although perhaps it's better to put
>>> them in a different repo or storage system).
>>>
>>> Thomas, I would very much like to not have javadoc/pydoc generation be
>>> part of the website review process, as it takes up a lot of time when
>>> changes are staged (10s of thousands of files), especially when a PR is
>>> updated and existing staged files need to be deleted.
>>>
>>>
>>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin 
>>> wrote:
>>>
 +1 For removing old documentation.

 @Thomas: Migration work is in the backlog and will be picked up in the
 near future.

 --Mikhail

 Have feedback ?


 On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise  wrote:

> +1 for removing pre 2.0 documentation (as well as the entries from
> https://beam.apache.org/get-started/downloads/)
>
> Isn't it part of the beam-site changes that we will no longer check in
> generated documentation into the repository? Those can be generated and
> deployed independently (when a commit to a branch occurs), such as done in
> the Apex and Flink projects.
>
> I was told that Scott who was working in the beam-site changes is on
> leave now and the migration is still pending (see note at
> https://github.com/apache/beam/tree/master/website). Is anyone else
> going to pick it up?
>
> Thanks,
> Thomas
>
>
> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada 
> wrote:
>
>> Is it worth adding a tag / branch to the repositories every time we
>> make a release, so that people are able to dive in and find the docs?
>> Best
>> -P.
>>
>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay  wrote:
>>
>>> I would guess that users are still using some of these old releases.
>>> It is unclear from Beam website which releases are still supported or 
>>> not.
>>> It probably makes sense to drop documentation for releases < 2.0. (I 
>>> would
>>> suggest keeping docs for 2.0). For the future I can work on updating the
>>> Beam website to clarify the state of each release.
>>>
>>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri  wrote:
>>>
 The older docs are not directly linked to and are in Github commit
 history.

 If there are no objections I'm going to delete javadocs and pydocs
 for releases older than 1 year,
 meaning 2.0.0 and older (going by the dates here
 ).

 On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
 danolive...@google.com> wrote:

> The older docs should be recorded in the commit history of the
> website repository, right? If they're not currently used in the 
> website and
> they're in the commit history then I don't see a reason to save them.
>
> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri 
> wrote:
>
>> Hi all,
>> I'm writing a PR for apache/beam-site and
>> beam_PreCommit_Website_Stage is timing out after 100 minutes, 
>> because it's
>> trying to delete 22k files and then copy 22k files (warning
>> large file
>> 
>> ).
>>
>> It seems that we could save a lot of time by deleting the older
>> javadoc and pydoc files for older versions. Is there a good reason 
>> to keep
>> around this kind of documentation for older versions (say 1 year 
>> back)?
>>
>
>>> --

Re: Removing documentation for old Beam versions

2018-08-02 Thread Pablo Estrada
I believe tags will be necessary, because anyone looking for old docs that
have been removed will need to browse back in history, not just browse the
tree of directories.
-P.

On Thu, Aug 2, 2018, 6:46 PM Mikhail Gryzykhin  wrote:

> Last time I talked with Scott I brought this idea in. I believe the plan
> was either to publish compiled site to website directly, or keep it in
> separate storage from apache/beam repo.
>
> One of the main reasons not to check in compiled version of website is
> that every developer will have to pull all the versions of website every
> time they clone repo, which is not that good of an idea to do.
>
> Regards,
> --Mikhail
>
> Have feedback ?
>
>
> On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri  wrote:
>
>> Pablo, the docs are generated into versioned paths, e.g.,
>> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are
>> not necessary?
>> Also, once apache/beam-site is merged with apache/beam the release branch
>> should have the relevant docs (although perhaps it's better to put them in
>> a different repo or storage system).
>>
>> Thomas, I would very much like to not have javadoc/pydoc generation be
>> part of the website review process, as it takes up a lot of time when
>> changes are staged (10s of thousands of files), especially when a PR is
>> updated and existing staged files need to be deleted.
>>
>>
>> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin 
>> wrote:
>>
>>> +1 For removing old documentation.
>>>
>>> @Thomas: Migration work is in the backlog and will be picked up in the
>>> near future.
>>>
>>> --Mikhail
>>>
>>> Have feedback ?
>>>
>>>
>>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise  wrote:
>>>
 +1 for removing pre 2.0 documentation (as well as the entries from
 https://beam.apache.org/get-started/downloads/)

 Isn't it part of the beam-site changes that we will no longer check in
 generated documentation into the repository? Those can be generated and
 deployed independently (when a commit to a branch occurs), such as done in
 the Apex and Flink projects.

 I was told that Scott who was working in the beam-site changes is on
 leave now and the migration is still pending (see note at
 https://github.com/apache/beam/tree/master/website). Is anyone else
 going to pick it up?

 Thanks,
 Thomas


 On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada 
 wrote:

> Is it worth adding a tag / branch to the repositories every time we
> make a release, so that people are able to dive in and find the docs?
> Best
> -P.
>
> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay  wrote:
>
>> I would guess that users are still using some of these old releases.
>> It is unclear from Beam website which releases are still supported or 
>> not.
>> It probably makes sense to drop documentation for releases < 2.0. (I 
>> would
>> suggest keeping docs for 2.0). For the future I can work on updating the
>> Beam website to clarify the state of each release.
>>
>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri  wrote:
>>
>>> The older docs are not directly linked to and are in Github commit
>>> history.
>>>
>>> If there are no objections I'm going to delete javadocs and pydocs
>>> for releases older than 1 year,
>>> meaning 2.0.0 and older (going by the dates here
>>> ).
>>>
>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>>> danolive...@google.com> wrote:
>>>
 The older docs should be recorded in the commit history of the
 website repository, right? If they're not currently used in the 
 website and
 they're in the commit history then I don't see a reason to save them.

 On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri  wrote:

> Hi all,
> I'm writing a PR for apache/beam-site and
> beam_PreCommit_Website_Stage is timing out after 100 minutes, because 
> it's
> trying to delete 22k files and then copy 22k files (warning
> large file
> 
> ).
>
> It seems that we could save a lot of time by deleting the older
> javadoc and pydoc files for older versions. Is there a good reason to 
> keep
> around this kind of documentation for older versions (say 1 year 
> back)?
>

>> --
> Got feedback? go/pabloem-feedback
> 
>
 --
Got feedback? go/pabloem-feedback


Re: Removing documentation for old Beam versions

2018-08-02 Thread Mikhail Gryzykhin
The last time I talked with Scott, I brought this idea up. I believe the
plan was either to publish the compiled site to the website directly, or to
keep it in storage separate from the apache/beam repo.

One of the main reasons not to check in the compiled version of the website
is that every developer would have to pull every version of the website each
time they clone the repo, which is not a good idea.

Regards,
--Mikhail

Have feedback ?


On Thu, Aug 2, 2018 at 6:42 PM Udi Meiri  wrote:

> Pablo, the docs are generated into versioned paths, e.g.,
> https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are not
> necessary?
> Also, once apache/beam-site is merged with apache/beam the release branch
> should have the relevant docs (although perhaps it's better to put them in
> a different repo or storage system).
>
> Thomas, I would very much like to not have javadoc/pydoc generation be
> part of the website review process, as it takes up a lot of time when
> changes are staged (10s of thousands of files), especially when a PR is
> updated and existing staged files need to be deleted.
>
>
> On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin 
> wrote:
>
>> +1 For removing old documentation.
>>
>> @Thomas: Migration work is in the backlog and will be picked up in the
>> near future.
>>
>> --Mikhail
>>
>> Have feedback ?
>>
>>
>> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise  wrote:
>>
>>> +1 for removing pre 2.0 documentation (as well as the entries from
>>> https://beam.apache.org/get-started/downloads/)
>>>
>>> Isn't it part of the beam-site changes that we will no longer check in
>>> generated documentation into the repository? Those can be generated and
>>> deployed independently (when a commit to a branch occurs), such as done in
>>> the Apex and Flink projects.
>>>
>>> I was told that Scott who was working in the beam-site changes is on
>>> leave now and the migration is still pending (see note at
>>> https://github.com/apache/beam/tree/master/website). Is anyone else
>>> going to pick it up?
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada 
>>> wrote:
>>>
 Is it worth adding a tag / branch to the repositories every time we
 make a release, so that people are able to dive in and find the docs?
 Best
 -P.

 On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay  wrote:

> I would guess that users are still using some of these old releases.
> It is unclear from Beam website which releases are still supported or not.
> It probably makes sense to drop documentation for releases < 2.0. (I would
> suggest keeping docs for 2.0). For the future I can work on updating the
> Beam website to clarify the state of each release.
>
> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri  wrote:
>
>> The older docs are not directly linked to and are in Github commit
>> history.
>>
>> If there are no objections I'm going to delete javadocs and pydocs
>> for releases older than 1 year,
>> meaning 2.0.0 and older (going by the dates here
>> ).
>>
>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
>> danolive...@google.com> wrote:
>>
>>> The older docs should be recorded in the commit history of the
>>> website repository, right? If they're not currently used in the website 
>>> and
>>> they're in the commit history then I don't see a reason to save them.
>>>
>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri  wrote:
>>>
 Hi all,
 I'm writing a PR for apache/beam-site and
 beam_PreCommit_Website_Stage is timing out after 100 minutes, because 
 it's
 trying to delete 22k files and then copy 22k files (warning large
 file
 
 ).

 It seems that we could save a lot of time by deleting the older
 javadoc and pydoc files for older versions. Is there a good reason to 
 keep
 around this kind of documentation for older versions (say 1 year back)?

>>>
> --
 Got feedback? go/pabloem-feedback
 

>>>


Re: Removing documentation for old Beam versions

2018-08-02 Thread Udi Meiri
Pablo, the docs are generated into versioned paths, e.g.,
https://beam.apache.org/documentation/sdks/javadoc/2.5.0/ so tags are not
necessary?
Also, once apache/beam-site is merged with apache/beam the release branch
should have the relevant docs (although perhaps it's better to put them in
a different repo or storage system).

Thomas, I would very much like to not have javadoc/pydoc generation be part
of the website review process, as it takes up a lot of time when changes
are staged (10s of thousands of files), especially when a PR is updated and
existing staged files need to be deleted.


On Thu, Aug 2, 2018 at 1:15 PM Mikhail Gryzykhin  wrote:

> +1 For removing old documentation.
>
> @Thomas: Migration work is in the backlog and will be picked up in the
> near future.
>
> --Mikhail
>
> Have feedback ?
>
>
> On Thu, Aug 2, 2018 at 12:54 PM Thomas Weise  wrote:
>
>> +1 for removing pre 2.0 documentation (as well as the entries from
>> https://beam.apache.org/get-started/downloads/)
>>
>> Isn't it part of the beam-site changes that we will no longer check in
>> generated documentation into the repository? Those can be generated and
>> deployed independently (when a commit to a branch occurs), such as done in
>> the Apex and Flink projects.
>>
>> I was told that Scott who was working in the beam-site changes is on
>> leave now and the migration is still pending (see note at
>> https://github.com/apache/beam/tree/master/website). Is anyone else
>> going to pick it up?
>>
>> Thanks,
>> Thomas
>>
>>
>> On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada  wrote:
>>
>>> Is it worth adding a tag / branch to the repositories every time we make
>>> a release, so that people are able to dive in and find the docs?
>>> Best
>>> -P.
>>>
>>> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay  wrote:
>>>
 I would guess that users are still using some of these old releases. It
 is unclear from Beam website which releases are still supported or not. It
 probably makes sense to drop documentation for releases < 2.0. (I would
 suggest keeping docs for 2.0). For the future I can work on updating the
 Beam website to clarify the state of each release.

 On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri  wrote:

> The older docs are not directly linked to and are in Github commit
> history.
>
> If there are no objections I'm going to delete javadocs and pydocs for
> releases older than 1 year,
> meaning 2.0.0 and older (going by the dates here
> ).
>
> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira <
> danolive...@google.com> wrote:
>
>> The older docs should be recorded in the commit history of the
>> website repository, right? If they're not currently used in the website 
>> and
>> they're in the commit history then I don't see a reason to save them.
>>
>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri  wrote:
>>
>>> Hi all,
>>> I'm writing a PR for apache/beam-site and
>>> beam_PreCommit_Website_Stage is timing out after 100 minutes, because 
>>> it's
>>> trying to delete 22k files and then copy 22k files (warning large
>>> file
>>> 
>>> ).
>>>
>>> It seems that we could save a lot of time by deleting the older
>>> javadoc and pydoc files for older versions. Is there a good reason to 
>>> keep
>>> around this kind of documentation for older versions (say 1 year back)?
>>>
>>
 --
>>> Got feedback? go/pabloem-feedback
>>> 
>>>
>>



Re: [Call for items] Beam August Newsletter

2018-08-02 Thread Pablo Estrada
Thanks for doing this Rose! I'll add a couple of things.
-P.

On Thu, Aug 2, 2018, 4:18 PM Rose Nguyen  wrote:

> Hi all:
>
> Here's [1] the template for the August Beam Newsletter!
>
> *Add the highlights from June and July that you want to share with
> the community by 8/8 11:59 p.m. PDT.*
>
> I'm working with Gris--we've heard your requests and will collect the
> notes via Google docs but send out the final version directly to the user
> mailing list. I'll edit and send the newsletter on 8/10.
>
> Thanks!
>
> [1]
> https://docs.google.com/document/d/124klHcJcIi_gD6rvMXwbbToINl1KTdXPmYrgGB998FQ/edit
> --
>
>
> Rose Thi Nguyen
>
>   Technical Writer
>
> (281) 683-6900
>
-- 
Got feedback? go/pabloem-feedback


Re: Nexmark PostCommit tests fail

2018-08-02 Thread Andrew Pilloud
These are timeouts at 4 hours. Dataflow is slow to start compared with
other runners. https://github.com/apache/beam/pull/6127 will run tests in
parallel and fix the issue.

Andrew
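
To illustrate why running suites in parallel helps when each one pays a slow startup cost, here is a small self-contained Python sketch. It is not the change made in the pull request mentioned above; the suite names and durations passed in by a caller are hypothetical stand-ins, and sleeping merely models startup plus run time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_suite(name: str, seconds: float) -> str:
    """Stand-in for one benchmark suite; sleeping models startup plus run time."""
    time.sleep(seconds)
    return name


def timed_sequential(suites: dict[str, float]) -> float:
    """Run every suite one after another and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for name, seconds in suites.items():
        run_suite(name, seconds)
    return time.perf_counter() - start


def timed_parallel(suites: dict[str, float], max_workers: int) -> float:
    """Run suites concurrently and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_suite, n, s) for n, s in suites.items()]
        for future in futures:
            future.result()  # re-raise any failure from a worker
    return time.perf_counter() - start
```

Sequential wall-clock time grows with the sum of the suite durations, while the parallel version is bounded by roughly the slowest suite when there are enough workers, which is what keeps a long run under a fixed timeout.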

On Thu, Aug 2, 2018, 5:05 PM Reuven Lax  wrote:

> I'm having a hard time understanding what the failure is. I see a bunch of
> Jenkins errors in the log. Did the actual Beam pipeline fail, or is this a
> Jenkins-level failure?
>
> Reuven
>
> On Thu, Aug 2, 2018 at 4:54 PM Mikhail Gryzykhin 
> wrote:
>
>> Hi everyone,
>>
>> We have Nexmark PostCommit tests fail. Should we disable or fix these?
>>
>> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow/
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback ?
>>
>


Re: Nexmark PostCommit tests fail

2018-08-02 Thread Reuven Lax
I'm having a hard time understanding what the failure is. I see a bunch of
Jenkins errors in the log. Did the actual Beam pipeline fail, or is this a
Jenkins-level failure?

Reuven

On Thu, Aug 2, 2018 at 4:54 PM Mikhail Gryzykhin  wrote:

> Hi everyone,
>
> We have Nexmark PostCommit tests fail. Should we disable or fix these?
>
> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow/
>
> Regards,
> --Mikhail
>
> Have feedback ?
>


Nexmark PostCommit tests fail

2018-08-02 Thread Mikhail Gryzykhin
Hi everyone,

Our Nexmark PostCommit tests are failing. Should we disable them or fix them?

https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow/

Regards,
--Mikhail

Have feedback ?


[Call for items] Beam August Newsletter

2018-08-02 Thread Rose Nguyen
Hi all:

Here's [1] the template for the August Beam Newsletter!

*Add the highlights from June and July that you want to share with
the community by 8/8 11:59 p.m. PDT.*

I'm working with Gris--we've heard your requests and will collect the notes
via Google docs but send out the final version directly to the user mailing
list. I'll edit and send the newsletter on 8/10.

Thanks!

[1]
https://docs.google.com/document/d/124klHcJcIi_gD6rvMXwbbToINl1KTdXPmYrgGB998FQ/edit
-- 


Rose Thi Nguyen

  Technical Writer

(281) 683-6900


Re: Parallelizing test runs

2018-08-02 Thread Mikhail Gryzykhin
I've disabled concurrency for the auto-triggered post-commit jobs. That
should reduce job scheduling considerably.

I believe this change should resolve the quota issue we saw this time. I'll
monitor whether the problem reappears.

--Mikhail

Have feedback ?
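
Option 1 in the quoted summary below (retry logic for the CreateRequestsPerMinutePerUser quota, 60 requests per user by default) could look roughly like the following Python sketch. It is an assumption-laden illustration: `QuotaExceeded` and `call_with_retry` are hypothetical names, and a real test harness would have to map the runner's actual quota error onto something like them.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical error raised when the job-creation quota is exhausted."""


def call_with_retry(
    create_job: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call create_job, retrying on QuotaExceeded with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return create_job()
        except QuotaExceeded:
            if attempt >= max_attempts:
                raise  # give up and surface the quota error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% jitter
            # so that concurrent jobs do not all retry at the same instant.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
            attempt += 1
```

The `sleep` parameter is injectable so the backoff schedule can be checked in a unit test without actually waiting. As the quoted message notes, retries alone do not raise the overall rate limit; they only spread job starts out over time.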


On Wed, Aug 1, 2018 at 9:40 AM Pablo Estrada  wrote:

> It feels to me like a peak of 60 jobs per minute is pretty high. If I
> understand correctly, we run up to 20 dataflow jobs in parallel per test
> suite? Or what's the number here?
>
> It is also true that most of our tests are simple NeedsRunner tests that
> test a couple of elements, so the whole pipeline overhead is on startup. This
> may be improved by lumping tests together (though might we lose
> debuggability?).  Our average number of jobs is, I hope, muuuch smaller
> than 60 per minute...
>
> With all these considerations, I would lean more towards having a retry
> policy as the immediate solution.
> -P.
>
> On Wed, Aug 1, 2018 at 9:07 AM Andrew Pilloud  wrote:
>
>> I like 1 and 2. How do credentials get into Jenkins? Could we create a
>> user per Jenkins host?
>>
>> On Tue, Jul 31, 2018 at 4:33 PM Reuven Lax  wrote:
>>
>>> There was also a proposal to lump multiple tests into a single Dataflow
>>> job instead of spinning up a separate Dataflow job for each test.
>>>
>>> On Tue, Jul 31, 2018 at 4:26 PM Mikhail Gryzykhin 
>>> wrote:
>>>
 I synced with Rafael. Below is a summary of the discussion.

 This quota is CreateRequestsPerMinutePerUser, and it allows 60 requests per
 user by default.

 I've created Jira [BEAM-5053](
 https://issues.apache.org/jira/browse/BEAM-5053) for this.

 I see the following options:
 1. Add retry logic. This still limits us to one Dataflow job start per
 second across all of Jenkins, and in the long run one test job can be
 blocked if other jobs take all the slots.
 2. Use different users to spin up Dataflow jobs.
 3. Find a way to raise the quota limit on Dataflow. By default the field
 limits the value to 60 requests per minute.
 4. Long-term, generic suggestion: limit the number of Dataflow jobs we spin
 up and move tests to unit or component tests.

 Please share any insights or ideas you have on this.

 Regards,
 --Mikhail

 Have feedback ?


 On Tue, Jul 31, 2018 at 3:55 PM Mikhail Gryzykhin 
 wrote:

> Hi Everyone,
>
> Seems that we hit quota issue again:
> https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/553/consoleFull
>
> Can someone share information on how was this triaged last time or
> guide me on possible follow-up actions?
>
> Regards,
> --Mikhail
>
> Have feedback ?
>
>
> On Tue, Jul 3, 2018 at 9:12 PM Rafael Fernandez 
> wrote:
>
>> Summary for all folks following this story -- and many thanks for
>> explaining configs to me and pointing me to files and such.
>>
>> - Scott made changes to the config and we can now run 3
>> ValidatesRunner.Dataflow in parallel (each run is about 2 hours)
>> - With the latest quota changes, we peaked at ~70% capacity in
>> concurrent Dataflow jobs when running those
>> - I've been keeping an eye on quota peaks for all resources today and
>> have not seen any worrisome limits overall.
>> - Also note there are improvements planned to the
>> ValidatesRunner.Dataflow test so various items get batched and the test
>> itself runs faster -- I believe it's on Alan's radar
>>
>> Cheers,
>> r
>>
>> On Mon, Jul 2, 2018 at 4:23 PM Rafael Fernandez 
>> wrote:
>>
>>> Done!
>>>
>>> On Mon, Jul 2, 2018 at 4:10 PM Scott Wegner 
>>> wrote:
>>>
 Hey Rafael, looks like we need more 'INSTANCE_TEMPLATES' quota [1].
 Can you take a look? I've filed [BEAM-4722]:
 https://issues.apache.org/jira/browse/BEAM-4722

 [1] https://github.com/apache/beam/pull/5861#issuecomment-401963630


 On Mon, Jul 2, 2018 at 11:33 AM Rafael Fernandez <
 rfern...@google.com> wrote:

> OK, Scott just sent https://github.com/apache/beam/pull/5860 .
> Quotas should not be a problem, if they are, please file a JIRA under
> gcp-quota.
>
> Cheers,
> r
>
> On Mon, Jul 2, 2018 at 10:06 AM Kenneth Knowles 
> wrote:
>
>> One thing that is nice when you do this is to be able to share
>> your results. Though if all you are sharing is "they passed" then I guess
>> we don't have to insist on evidence.
>>
>> Kenn
>>
>> On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner 
>> wrote:
>>
>>> A few thoughts:
>>>
>>> * The Jenkins job getting backed up
>>> is 

Re: Live coding & reviewing adventures

2018-08-02 Thread Holden Karau
Ok, Gris has an even more delayed laptop so I'm going to push it out a week
and hope it shows up by then. Sorry about that one and thanks to everyone
who tuned in for the Go SDK one :)

On Mon, Jul 30, 2018 at 1:54 PM, Holden Karau  wrote:

> So small schedule changes.
> I’ll be doing some poking at the Go SDK at 2pm today -
> https://www.youtube.com/watch?v=9UAu1DOZJhM and the one with Gris setting
> up Beam on a new machine will be moved to Friday because her laptop got
> delayed - https://www.youtube.com/watch?v=x8Wg7qCDA5k
>
> On Tue, Jul 24, 2018 at 8:41 PM Holden Karau  wrote:
>
>> I'll be doing this again this week & next looking at a few different
>> topics.
>>
>> Tomorrow (July 25th @ 10am pacific) Gris & I will be updating the PR from
>> my last live stream (adding Python dependency handling) -
>> https://www.twitch.tv/events/P92irbgYR9Sx6nMQ-lGY3g /
>> https://www.youtube.com/watch?v=4xDsY5QL2zM
>>
>> In the afternoon @ 3 pm pacific I'll be looking at the dev tools we've
>> had some discussions around with respect to reviews -
>> https://www.twitch.tv/events/vNzcZ7DdSuGFNYURW_9WEQ /
>> https://www.youtube.com/watch?v=6cTmC_fP9B0
>>
>> Next week on Thursday August 1st @ 2pm pacific Gris & I will be setting
>> up Beam on her new laptop together, so for any new users looking to see how
>> to install Beam from source this one is for you (or for devs looking to see
>> how painful set up is) -
>> https://www.twitch.tv/events/YAYvNp3tT0COkcpNBxnp6A / https://www.youtube.com/watch?v=x8Wg7qCDA5k
>>
>> P.S.
>>
>> As always I'll be doing my regular Friday code reviews in Spark -
>> https://www.youtube.com/watch?v=O4rRx-3PTiM . You can see the other ones
>> I have planned on my twitch events and youtube.
>>
>> On Fri, Jul 13, 2018 at 11:54 AM, Holden Karau 
>> wrote:
>>
>>> Hi folks! I've been doing some live coding in my other projects and I
>>> figured I'd do some with Apache Beam as well.
>>>
>>> Today @ 3pm pacific I'm going to be doing some impromptu exploration of better
>>> review tooling possibilities (looking at forking spark-pr-dashboard for
>>> other projects like beam and setting up mentionbot to work with ASF infra)
>>> - https://www.youtube.com/watch?v=ff8_jbzC8JI
>>>
>>> Next week (Thursday the 19th at 2pm pacific) I'm going to be working on
>>> trying to get easier dependency management for the Python portable runner
>>> in place - https://www.youtube.com/watch?v=Sv0XhS2pYqA
>>>
>>> If you're interested in seeing more of the development process I hope you
>>> will join me :)
>>>
>>> P.S.
>>>
>>> You can also follow on twitch which does a better job of notifications
>>> https://www.twitch.tv/holdenkarau
>>>
>>> Also one of the other things I do is "live reviews" of PRs but they are
>>> generally opt-in and I don't have enough opt-ins from the Beam community to
>>> do live reviews in Beam. If you work on Beam and would be OK with me doing
>>> a live streamed review of your PRs let me know (if you're curious what
>>> they look like you can see some of them in Spark land).
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>


-- 
Twitter: https://twitter.com/holdenkarau


Re: [VOTE] Apache Beam, version 2.6.0, release candidate #1

2018-08-02 Thread Mingmin Xu
+1
Verified with SQL component.

On Thu, Aug 2, 2018 at 10:05 AM, Thomas Weise  wrote:

> It does include *some* of the portable Flink runner (you will be able to
> run wordcount as documented on
> https://beam.apache.org/contribute/portability/#status).
>
> I would recommend continuing to use master though, as we are still not
> fully at MVP and are still adding test coverage and a Flink version update.
>
>
> On Thu, Aug 2, 2018 at 9:52 AM Suneel Marthi  wrote:
>
>> Does this release include the Portability runner for Flink ? - Sorry I
>> have not read the release notes yet, pardon my asking again.
>>
>> On Wed, Aug 1, 2018 at 7:03 PM, Boyuan Zhang  wrote:
>>
>>> +1
>>> Tested Dataflow related items in:
>>> https://s.apache.org/beam-release-validation
>>>
>>> On Wed, Aug 1, 2018 at 11:40 AM Yifan Zou  wrote:
>>>
 +1
 Tested Python quickstarts and mobile gaming examples against tar and
 wheel versions.
 https://builds.apache.org/job/beam_PostRelease_Python_Candidate/123/

 On Wed, Aug 1, 2018 at 8:27 AM Andrew Pilloud 
 wrote:

> +1 tested the Beam SQL jar from the Maven Central repo, it worked.
>
> On Wed, Aug 1, 2018 at 7:37 AM Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>> Hi Pablo,
>>
>> +1, tested on my apps and libs and it works after some fixes due to some
>> breaking changes in ArgProvider - but I guess it is not "public" enough to
>> need to be reported.
>>
>> Romain Manni-Bucau
>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>>
>>
>> On Wed, Aug 1, 2018 at 01:50, Pablo Estrada wrote:
>>
>>> Hello everyone!
>>>
>>> I have been able to prepare a release candidate for Beam 2.6.0. : D
>>>
>>> Please review and vote on the release candidate #1 for the version
>>> 2.6.0, as follows:
>>>
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> The complete staged set of artifacts is available for your review,
>>> which includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with fingerprint
>>> 2F1FEDCDF6DD7990422F482F65224E0292DD8A51 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.6.0-RC1" [5],
>>> * website pull request listing the release and publishing the API
>>> reference manual [6].
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>>
>>> Regards
>>> -Pablo.
>>>
>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12343392
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.6.0/
>>> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
>>> [4] https://repository.apache.org/content/repositories/orgapachebeam-1044/
>>> [5] https://github.com/apache/beam/tree/v2.6.0-RC1
>>> [6] https://github.com/apache/beam-site/pull/518
>>>
>>> --
>>> Got feedback? go/pabloem-feedback
>>> 
>>>
>>
>>


-- 

Mingmin


Re: Removing documentation for old Beam versions

2018-08-02 Thread Thomas Weise
+1 for removing pre 2.0 documentation (as well as the entries from
https://beam.apache.org/get-started/downloads/)

Isn't it part of the beam-site changes that we will no longer check in
generated documentation into the repository? Those can be generated and
deployed independently (when a commit to a branch occurs), such as done in
the Apex and Flink projects.

I was told that Scott, who was working on the beam-site changes, is on leave
now and the migration is still pending (see note at
https://github.com/apache/beam/tree/master/website). Is anyone else going
to pick it up?

Thanks,
Thomas


On Thu, Aug 2, 2018 at 12:33 PM Pablo Estrada  wrote:

> Is it worth adding a tag / branch to the repositories every time we make a
> release, so that people are able to dive in and find the docs?
> Best
> -P.
>
> On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay  wrote:
>
>> I would guess that users are still using some of these old releases. It
>> is unclear from Beam website which releases are still supported or not. It
>> probably makes sense to drop documentation for releases < 2.0. (I would
>> suggest keeping docs for 2.0). For the future I can work on updating the
>> Beam website to clarify the state of each release.
>>
>> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri  wrote:
>>
>>> The older docs are not directly linked to and are in Github commit
>>> history.
>>>
>>> If there are no objections I'm going to delete javadocs and pydocs for
>>> releases older than 1 year,
>>> meaning 2.0.0 and older (going by the release dates).
>>>
>>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira 
>>> wrote:
>>>
 The older docs should be recorded in the commit history of the website
 repository, right? If they're not currently used in the website and they're
 in the commit history then I don't see a reason to save them.

 On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri  wrote:

> Hi all,
> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage
> is timing out after 100 minutes, because it's trying to delete 22k files
> and then copy 22k files.
>
> It seems that we could save a lot of time by deleting the older
> javadoc and pydoc files for older versions. Is there a good reason to keep
> around this kind of documentation for older versions (say 1 year back)?
>

> --
> Got feedback? go/pabloem-feedback
>


Re: Removing documentation for old Beam versions

2018-08-02 Thread Pablo Estrada
Is it worth adding a tag / branch to the repositories every time we make a
release, so that people are able to dive in and find the docs?
Best
-P.

On Thu, Aug 2, 2018 at 12:09 PM Ahmet Altay  wrote:

> I would guess that users are still using some of these old releases. It is
> unclear from Beam website which releases are still supported or not. It
> probably makes sense to drop documentation for releases < 2.0. (I would
> suggest keeping docs for 2.0). For the future I can work on updating the
> Beam website to clarify the state of each release.
>
> On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri  wrote:
>
>> The older docs are not directly linked to and are in Github commit
>> history.
>>
>> If there are no objections I'm going to delete javadocs and pydocs for
>> releases older than 1 year,
>> meaning 2.0.0 and older (going by the release dates).
>>
>> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira 
>> wrote:
>>
>>> The older docs should be recorded in the commit history of the website
>>> repository, right? If they're not currently used in the website and they're
>>> in the commit history then I don't see a reason to save them.
>>>
>>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri  wrote:
>>>
 Hi all,
 I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage
 is timing out after 100 minutes, because it's trying to delete 22k files
 and then copy 22k files.

 It seems that we could save a lot of time by deleting the older javadoc
 and pydoc files for older versions. Is there a good reason to keep around
 this kind of documentation for older versions (say 1 year back)?

>>>
--
Got feedback? go/pabloem-feedback


Re: Removing documentation for old Beam versions

2018-08-02 Thread Ahmet Altay
I would guess that users are still using some of these old releases. It is
unclear from Beam website which releases are still supported or not. It
probably makes sense to drop documentation for releases < 2.0. (I would
suggest keeping docs for 2.0). For the future I can work on updating the
Beam website to clarify the state of each release.

On Thu, Aug 2, 2018 at 12:06 PM, Udi Meiri  wrote:

> The older docs are not directly linked to and are in Github commit history.
>
> If there are no objections I'm going to delete javadocs and pydocs for
> releases older than 1 year,
> meaning 2.0.0 and older (going by the release dates).
>
> On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira 
> wrote:
>
>> The older docs should be recorded in the commit history of the website
>> repository, right? If they're not currently used in the website and they're
>> in the commit history then I don't see a reason to save them.
>>
>> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri  wrote:
>>
>>> Hi all,
>>> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage
>>> is timing out after 100 minutes, because it's trying to delete 22k files
>>> and then copy 22k files.
>>>
>>> It seems that we could save a lot of time by deleting the older javadoc
>>> and pydoc files for older versions. Is there a good reason to keep around
>>> this kind of documentation for older versions (say 1 year back)?
>>>
>>


Re: Removing documentation for old Beam versions

2018-08-02 Thread Udi Meiri
The older docs are not directly linked to and are in Github commit history.

If there are no objections I'm going to delete javadocs and pydocs for
releases older than 1 year,
meaning 2.0.0 and older (going by the release dates).
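
The "older than 1 year" rule could be applied mechanically along these lines.
This is only a sketch: versions_to_delete is a hypothetical helper, and the
release names and dates in the example are placeholders made up for
illustration, not the real Beam release dates.

```python
from datetime import date, timedelta


def versions_to_delete(release_dates, today, max_age_days=365):
    """Return versions released more than max_age_days before today.

    release_dates maps a version label to its release date. The result is
    ordered from oldest to newest release.
    """
    cutoff = today - timedelta(days=max_age_days)
    stale = [v for v, released in release_dates.items() if released < cutoff]
    return sorted(stale, key=release_dates.get)


# Placeholder input, not real release data.
example_release_dates = {
    "old-release": date(2017, 1, 15),
    "recent-release": date(2018, 6, 1),
}
stale = versions_to_delete(example_release_dates, today=date(2018, 8, 2))
```

A release published exactly on the cutoff date is kept, since the comparison
is strict.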

On Thu, Aug 2, 2018 at 11:51 AM Daniel Oliveira 
wrote:

> The older docs should be recorded in the commit history of the website
> repository, right? If they're not currently used in the website and they're
> in the commit history then I don't see a reason to save them.
>
> On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri  wrote:
>
>> Hi all,
>> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage is
>> timing out after 100 minutes, because it's trying to delete 22k files and
>> then copy 22k files.
>>
>> It seems that we could save a lot of time by deleting the older javadoc
>> and pydoc files for older versions. Is there a good reason to keep around
>> this kind of documentation for older versions (say 1 year back)?
>>
>


Re: Removing documentation for old Beam versions

2018-08-02 Thread Daniel Oliveira
The older docs should be recorded in the commit history of the website
repository, right? If they're not currently used in the website and they're
in the commit history then I don't see a reason to save them.

On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri  wrote:

> Hi all,
> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage is
> timing out after 100 minutes, because it's trying to delete 22k files and
> then copy 22k files.
>
> It seems that we could save a lot of time by deleting the older javadoc
> and pydoc files for older versions. Is there a good reason to keep around
> this kind of documentation for older versions (say 1 year back)?
>


Re: Community Examples Repository

2018-08-02 Thread Ankur Goenka
I like the initiative but I feel that fragmenting the codebase will make it
harder to discover examples. Having examples in a separate repo makes it
easier to forget that examples should get the same love as the rest of the
codebase.
The other challenge is the tooling and integration, which is harder with
multiple repos.
It makes sense to isolate the examples and make them more obvious.
A sub project of examples as mentioned in the discussion might be
sufficient without having much overhead.

Thanks,
Ankur


On Thu, Aug 2, 2018 at 10:52 AM Kai Jiang  wrote:

> Agreed with Rui. We could also add more SQL examples (like different IOs)
> for everyone to get started with.
>
> Best,
> Kai
>
> On 2018/08/02 17:40:32, Rui Wang  wrote:
> > I might have missed it: do the examples to be moved include those that are not
> > under example/? For example there are some BeamSQL examples in
> > org/apache/beam/sdk/extensions/sql/example
> > <https://github.com/apache/beam/tree/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example>.
> >
> >
> > It's better to keep the BeamSQL examples where they are because the related
> > API might still change.
> >
> > -Rui
> >
> > On Thu, Aug 2, 2018 at 8:58 AM Ahmet Altay  wrote:
> >
> > > Robert, I agree with you in general. However there is also a second
> > > motivation. There is an increase in new PRs that are coming to add new
> > > examples. This is great however the core code (including distributions) is
> > > not a great place to host such examples. An examples repo would help in
> > > this case. It could also serve as an entry point for new contributors.
> > >
> > >
> > >
> > > On Thu, Aug 2, 2018 at 12:40 AM, Robert Bradshaw 
> > > wrote:
> > >
> > >> I have to admit I'm generally -1 on moving examples to a separate
> > >> repository. In particular, I think it would actually inhibit the
> > >> stated goals of increasing visibility and better keeping them up to
> > >> date, and for all the reasons we just migrated the beam-site directory
> > >> in. It seems the primary motivation is that it's difficult in Java to
> > >> have a portion of the repo that depends on another as if it were
> > >> "external" (i.e. the way others would use Beam) rather than being a
> > >> sub-project of Beam. Is this not doable?
> > >> On Wed, Aug 1, 2018 at 10:59 PM Charles Chen  wrote:
> > >> >
> > >> > I would also prefer that examples be linked to releases so that we can
> > >> build and test them during development; i.e. if your commit breaks
> > >> wordcount, we want to know right away so we can revert.  Perhaps we can
> > >> keep these in the repo but more clearly modularize the artifacts we release?
> > >> >
> > >> > For the Python SDK, if we separate this out in any way, there is the
> > >> separate issue of dealing with namespace packages (which are kind of broken
> > >> and poorly supported:
> > >> https://github.com/pypa/python-packaging-user-guide/issues/265), if we
> > >> want to keep the examples under the apache_beam.examples module path.  See
> > >> also https://packaging.python.org/guides/packaging-namespace-packages/.
> > >> >
> > >> > On Wed, Aug 1, 2018 at 9:29 PM j...@nanthrax.net wrote:
> > >> >>
> > >> >> Hi,
> > >> >>
> > >> >> I don't have a problem with moving the examples to a dedicated repository.
> > >> However, IMHO, we have to:
> > >> >>
> > >> >> 1. Keep a build of examples linked to latest core release/SNAPSHOT
> > >> >> 2. Include the examples in the distribution (convenient for the users)
> > >> >>
> > >> >> On another topic, I think it would be better to avoid usage of Google
> > >> Doc for this kind of discussion and directly share on the mailing list (at
> > >> least a summary/light details).
> > >> >>
> > >> >> Regards
> > >> >> JB
> > >> >>
> > >> >> On Thursday, August 02, 2018 00:12 CEST, David Cavazos <
> > >> dcava...@google.com> wrote:
> > >> >>
> > >> >>
> > >> >> Hi everyone!
> > >> >>
> > >> >> We wanted to migrate the examples from the core repository to a new
> > >> Beam community examples repository. As the number of examples grows, it
> > >> makes sense to modularize and decouple the core functionality from the
> > >> examples.
> > >> >>
> > >> >> We will also create some guidelines with the best practices for new
> > >> examples to be submitted.
> > >> >>
> > >> >> For more details, feel free to take a look and comment on the proposal.
> > >> >>
> > >> >> Cheers,
> > >> >> David
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >>
> > >
> > >
> >
>


Re: Community Examples Repository

2018-08-02 Thread Kai Jiang
Agreed with Rui. We could also add more SQL examples (like different IOs) for
everyone to get started with.

Best,
Kai

On 2018/08/02 17:40:32, Rui Wang  wrote: 
> I might have missed it: do the examples to be moved include those that are not
> under example/? For example there are some BeamSQL examples in
> org/apache/beam/sdk/extensions/sql/example.
> 
> 
> It's better to keep the BeamSQL examples where they are because the related
> API might still change.
> 
> -Rui
> 
> On Thu, Aug 2, 2018 at 8:58 AM Ahmet Altay  wrote:
> 
> > Robert, I agree with you in general. However there is also a second
> > motivation. There is an increase in new PRs that are coming to add new
> > examples. This is great however the core code (including distributions) is
> > not a great place to host such examples. An examples repo would help in
> > this case. It could also serve as an entry point for new contributors.
> >
> >
> >
> > On Thu, Aug 2, 2018 at 12:40 AM, Robert Bradshaw 
> > wrote:
> >
> >> I have to admit I'm generally -1 on moving examples to a separate
> >> repository. In particular, I think it would actually inhibit the
> >> stated goals of increasing visibility and better keeping them up to
> >> date, and for all the reasons we just migrated the beam-site directory
> >> in. It seems the primary motivation is that it's difficult in Java to
> >> have a portion of the repo that depends on another as if it were
> >> "external" (i.e. the way others would use Beam) rather than being a
> >> sub-project of Beam. Is this not doable?
> >> On Wed, Aug 1, 2018 at 10:59 PM Charles Chen  wrote:
> >> >
> >> > I would also prefer that examples be linked to releases so that we can
> >> build and test them during development; i.e. if your commit breaks
> >> wordcount, we want to know right away so we can revert.  Perhaps we can
> >> keep these in the repo but more clearly modularize the artifacts we release?
> >> >
> >> > For the Python SDK, if we separate this out in any way, there is the
> >> separate issue of dealing with namespace packages (which are kind of broken
> >> and poorly supported:
> >> https://github.com/pypa/python-packaging-user-guide/issues/265), if we
> >> want to keep the examples under the apache_beam.examples module path.  See
> >> also https://packaging.python.org/guides/packaging-namespace-packages/.
> >> >
> >> > On Wed, Aug 1, 2018 at 9:29 PM j...@nanthrax.net wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I don't have a problem with moving the examples to a dedicated repository.
> >> However, IMHO, we have to:
> >> >>
> >> >> 1. Keep a build of examples linked to latest core release/SNAPSHOT
> >> >> 2. Include the examples in the distribution (convenient for the users)
> >> >>
> >> >> On another topic, I think it would be better to avoid usage of Google
> >> Doc for such kind of discussion and directly share on the mailing list (at
> >> least a summary/light details).
> >> >>
> >> >> Regards
> >> >> JB
> >> >>
> >> >> On Thursday, August 02, 2018 00:12 CEST, David Cavazos <
> >> dcava...@google.com> wrote:
> >> >>
> >> >>
> >> >> Hi everyone!
> >> >>
> >> >> We wanted to migrate the examples from the core repository to a new
> >> Beam community examples repository. As the number of examples grows, it
> >> makes sense to modularize and decouple the core functionality from the
> >> examples.
> >> >>
> >> >> We will also create some guidelines with the best practices for new
> >> examples to be submitted.
> >> >>
> >> >> For more details, feel free to take a look and comment on the proposal.
> >> >>
> >> >> Cheers,
> >> >> David
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >>
> >
> >
> 


Re: Community Examples Repository

2018-08-02 Thread Rui Wang
I might have missed it: do the examples to be moved include those that are not
under example/? For example there are some BeamSQL examples in
org/apache/beam/sdk/extensions/sql/example.


It's better to keep the BeamSQL examples where they are because the related
API might still change.

-Rui

On Thu, Aug 2, 2018 at 8:58 AM Ahmet Altay  wrote:

> Robert, I agree with you in general. However there is also a second
> motivation. There is an increase in new PRs that are coming to add new
> examples. This is great however the core code (including distributions) is
> not a great place to host such examples. An examples repo would help in
> this case. It could also serve as an entry point for new contributors.
>
>
>
> On Thu, Aug 2, 2018 at 12:40 AM, Robert Bradshaw 
> wrote:
>
>> I have to admit I'm generally -1 on moving examples to a separate
>> repository. In particular, I think it would actually inhibit the
>> stated goals of increasing visibility and better keeping them up to
>> date, and for all the reasons we just migrated the beam-site directory
>> in. It seems the primary motivation is that it's difficult in Java to
>> have a portion of the repo that depends on another as if it were
>> "external" (i.e. the way others would use Beam) rather than being a
>> sub-project of Beam. Is this not doable?
>> On Wed, Aug 1, 2018 at 10:59 PM Charles Chen  wrote:
>> >
>> > I would also prefer that examples be linked to releases so that we can
>> build and test them during development; i.e. if your commit breaks
>> wordcount, we want to know right away so we can revert.  Perhaps we can
>> keep these in the repo but more clearly modularize the artifacts we release?
>> >
>> > For the Python SDK, if we separate this out in any way, there is the
>> separate issue of dealing with namespace packages (which are kind of broken
>> and poorly supported:
>> https://github.com/pypa/python-packaging-user-guide/issues/265), if we
>> want to keep the examples under the apache_beam.examples module path.  See
>> also https://packaging.python.org/guides/packaging-namespace-packages/.
>> >
>> > On Wed, Aug 1, 2018 at 9:29 PM j...@nanthrax.net  wrote:
>> >>
>> >> Hi,
>> >>
>> >> I don't have a problem with moving the examples to a dedicated repository.
>> However, IMHO, we have to:
>> >>
>> >> 1. Keep a build of examples linked to latest core release/SNAPSHOT
>> >> 2. Include the examples in the distribution (convenient for the users)
>> >>
>> >> On another topic, I think it would be better to avoid usage of Google
>> Doc for such kind of discussion and directly share on the mailing list (at
>> least a summary/light details).
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On Thursday, August 02, 2018 00:12 CEST, David Cavazos <
>> dcava...@google.com> wrote:
>> >>
>> >>
>> >> Hi everyone!
>> >>
>> >> We wanted to migrate the examples from the core repository to a new
>> Beam community examples repository. As the number of examples grows, it
>> makes sense to modularize and decouple the core functionality from the
>> examples.
>> >>
>> >> We will also create some guidelines with the best practices for new
>> examples to be submitted.
>> >>
>> >> For more details, feel free to take a look and comment on the proposal.
>> >>
>> >> Cheers,
>> >> David
>> >>
>> >>
>> >>
>> >>
>> >>
>>
>
>


Re: Cleanup resources on pipeline cancelation

2018-08-02 Thread Reuven Lax
Romain is correct, you would need some global reference counting here to
use the close() callback. The problem is that the input subscription is a
pipeline-wide resource, it's not a per-reader resource.
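
To make the reference-counting idea concrete, here is a minimal in-process
sketch. SharedResource and its create/delete callables are hypothetical names
invented for this illustration and are not Beam APIs. The counter here lives
in one process, whereas real readers run on many workers, so an actual fix
would need the count kept somewhere all of them can reach. That is exactly
the hard part.

```python
import threading


class SharedResource:
    """Reference-counted handle around one pipeline-wide resource."""

    def __init__(self, create, delete):
        self._create = create  # called once, by the first reader to acquire
        self._delete = delete  # called once, by the last reader to release
        self._lock = threading.Lock()
        self._count = 0
        self._resource = None

    def acquire(self):
        """Register one more reader and return the shared resource."""
        with self._lock:
            if self._count == 0:
                self._resource = self._create()
            self._count += 1
            return self._resource

    def release(self):
        """Unregister a reader. Delete the resource when none remain."""
        with self._lock:
            if self._count == 0:
                raise RuntimeError("release() without matching acquire()")
            self._count -= 1
            if self._count == 0:
                self._delete(self._resource)
                self._resource = None
```

In this picture each reader would call acquire() when it starts and release()
from its close() callback, so only the last reader to close deletes the
subscription.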

On Thu, Aug 2, 2018 at 10:07 AM Romain Manni-Bucau 
wrote:

>
>
> On Thu, Aug 2, 2018 at 18:32, Andrew Pilloud wrote:
>
>> The subscriptions I want to clean up are ones that are implicitly created
>> by the PubsubIO. These subscriptions are created then leaked, they aren't
>> reused in future pipelines so the data loss issues are moot here. I agree
>> that we don't want to tear down user supplied subscriptions.
>>
>> I've been doing some more digging, it looks like the Source.Reader
>> interface has a close() callback.
>> Is that a place I might be able to do cleanup? (It appears this is hooked
>> up to RichFunction.close() callback on Flink and called from the Direct
>> Runner but possibly not called from other runners.)
>>
>
>
> It is called after the parallelization (you can have N>1 readers in
> parallel), so yes if you have some global reference counting to clean up
> once; otherwise it will be hard.
>
>
>> Andrew
>>
>> On Thu, Aug 2, 2018 at 1:07 AM Reuven Lax  wrote:
>>
>>> Actually I think SDF is the right way to fix this. The SDF can set a
>>> timer at infinity (which will only fire when the pipeline shuts down). I
>>> believe that SDF support is being added to the portability layer now, so
>>> eventually all portable runners will support it, and maybe we can live with
>>> the status quo until then.
>>>
>>> On Wed, Aug 1, 2018 at 9:59 PM Romain Manni-Bucau 
>>> wrote:
>>>
 I agree Reuven. But leaking in a source doesn't give any guarantee
 regarding the execution since it will depend on the runner and the current API
 will not provide you that feature. Using a reference counting state can
 work better but would require an SDF migration (and will hit runner support
 issues :().


 On Thu, Aug 2, 2018 at 05:39, Reuven Lax wrote:

> Hi Romain,
>
> Andrew's example actually wouldn't work for that. With Google Cloud
> Pub/Sub (the example source he referenced), if there is no subscription to
> a topic, all publishes to that topic are dropped on the floor; if you don't
> want to lose data, you are expected to keep the subscription around
> continuously. In this example, leaking a subscription is probably
> preferable to losing data (especially since Pub/Sub itself garbage collects
> subscriptions that have been inactive for a long time).
>
> The answer might be that Beam does not have a good lifecycle story
> here, and something needs to be built.
>
> Reuven
>
> On Tue, Jul 31, 2018 at 10:04 PM Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>> Hi Andrew,
>>
>> IIRC sources should clean up their resources per method since they
>> don't have a better lifecycle. Readers can create anything longer and
>> release it at close time.
>>
>>
>> On Wed, Aug 1, 2018 at 00:31, Andrew Pilloud wrote:
>>
>>> Some of our IOs create external resources that need to be cleaned up
>>> when a pipeline is terminated. It looks like the
>>> org.apache.beam.sdk.io.UnboundedSource interface is called on creation, but
>>> there is no call for cleanup. For example, PubsubIO creates a
>>> Pubsub subscription in createReader()/split() and it should be deleted at
>>> shutdown. Does anyone have ideas on how I might make this happen?
>>>
>>> (I filed https://issues.apache.org/jira/browse/BEAM-5051 tracking
>>> the PubSub specific issue.)
>>>
>>> Andrew
>>>
>>


Re: Cleanup resources on pipeline cancelation

2018-08-02 Thread Romain Manni-Bucau
On Thu, Aug 2, 2018 at 18:32, Andrew Pilloud wrote:

> The subscriptions I want to clean up are ones that are implicitly created
> by the PubsubIO. These subscriptions are created then leaked, they aren't
> reused in future pipelines so the data loss issues are moot here. I agree
> that we don't want to tear down user supplied subscriptions.
>
> I've been doing some more digging, it looks like the Source.Reader
> interface has a close() callback.
> Is that a place I might be able to do cleanup? (It appears this is hooked
> up to RichFunction.close() callback on Flink and called from the Direct
> Runner but possibly not called from other runners.)
>


It is called after the parallelization (you can have N>1 readers in
parallel), so yes if you have some global reference counting to clean up
once; otherwise it will be hard.


> Andrew
>
> On Thu, Aug 2, 2018 at 1:07 AM Reuven Lax  wrote:
>
>> Actually I think SDF is the right way to fix this. The SDF can set a
>> timer at infinity (which will only fire when the pipeline shuts down). I
>> believe that SDF support is being added to the portability layer now, so
>> eventually all portable runners will support it, and maybe we can live with
>> the status quo until then.
>>
>> On Wed, Aug 1, 2018 at 9:59 PM Romain Manni-Bucau 
>> wrote:
>>
>>> I agree, Reuven. But leaking in a source doesn't give any guarantee
>>> regarding the execution, since it will depend on the runner, and the current
>>> API will not provide you that feature. Using a reference-counting state can
>>> work better but would require an SDF migration (and will hit runner support
>>> issues :().
>>>
>>>
>>> On Thu, Aug 2, 2018 at 05:39, Reuven Lax wrote:
>>>
>>>> Hi Romain,
>>>>
>>>> Andrew's example actually wouldn't work for that. With Google Cloud
>>>> Pub/Sub (the example source he referenced), if there is no subscription to
>>>> a topic, all publishes to that topic are dropped on the floor; if you don't
>>>> want to lose data, you are expected to keep the subscription around
>>>> continuously. In this example, leaking a subscription is probably
>>>> preferable to losing data (especially since Pub/Sub itself garbage collects
>>>> subscriptions that have been inactive for a long time).
>>>>
>>>> The answer might be that Beam does not have a good lifecycle story
>>>> here, and something needs to be built.
>>>>
>>>> Reuven
>>>>
>>>> On Tue, Jul 31, 2018 at 10:04 PM Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> IIRC sources should clean up their resources per method, since they
>>>>> don't have a better lifecycle. Readers can create anything longer-lived and
>>>>> release it at close time.
>>>>>
>>>>>
>>>>> On Wed, Aug 1, 2018 at 00:31, Andrew Pilloud wrote:
>>>>>
>>>>>> Some of our IOs create external resources that need to be cleaned up
>>>>>> when a pipeline is terminated. It looks like the
>>>>>> org.apache.beam.sdk.io.UnboundedSource interface is called on creation,
>>>>>> but there is no call for cleanup. For example, PubsubIO creates a Pubsub
>>>>>> subscription in createReader()/split() and it should be deleted at
>>>>>> shutdown. Does anyone have ideas on how I might make this happen?
>>>>>>
>>>>>> (I filed https://issues.apache.org/jira/browse/BEAM-5051 tracking
>>>>>> the PubSub specific issue.)
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>


Re: [VOTE] Apache Beam, version 2.6.0, release candidate #1

2018-08-02 Thread Thomas Weise
It does include *some* of the portable Flink runner (you will be able to
run wordcount as documented on
https://beam.apache.org/contribute/portability/#status).

I would recommend continuing to use master, though, as we are still not
fully at MVP: we are still adding test coverage and a Flink version update.


On Thu, Aug 2, 2018 at 9:52 AM Suneel Marthi  wrote:

> Does this release include the Portability runner for Flink? Sorry, I
> have not read the release notes yet; pardon my asking again.
>
> On Wed, Aug 1, 2018 at 7:03 PM, Boyuan Zhang  wrote:
>
>> +1
>> Tested Dataflow related items in:
>> https://s.apache.org/beam-release-validation
>>
>> On Wed, Aug 1, 2018 at 11:40 AM Yifan Zou  wrote:
>>
>>> +1
>>> Tested Python quickstarts and mobile gaming examples against tar and
>>> wheel versions.
>>> https://builds.apache.org/job/beam_PostRelease_Python_Candidate/123/
>>>
>>> On Wed, Aug 1, 2018 at 8:27 AM Andrew Pilloud 
>>> wrote:
>>>
>>>> +1. I tested the Beam SQL jar from the Maven Central repo, and it worked.
>>>>
>>>> On Wed, Aug 1, 2018 at 7:37 AM Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> Hi Pablo,
>>>>>
>>>>> +1, tested on my apps and libs, and it works after some fixes due to some
>>>>> breaking changes in ArgProvider - but I guess it is not "public" enough to
>>>>> need to be reported.
>>>>>
>>>>> Romain Manni-Bucau
>>>>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>>>>>
>>>>>
>>>>> On Wed, Aug 1, 2018 at 01:50, Pablo Estrada wrote:
>>>>>
>>>>>> Hello everyone!
>>>>>>
>>>>>> I have been able to prepare a release candidate for Beam 2.6.0. : D
>>>>>>
>>>>>> Please review and vote on the release candidate #1 for the version
>>>>>> 2.6.0, as follows:
>>>>>>
>>>>>> [ ] +1, Approve the release
>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>>
>>>>>> The complete staged set of artifacts is available for your review,
>>>>>> which includes:
>>>>>> * JIRA release notes [1],
>>>>>> * the official Apache source release to be deployed to
>>>>>> dist.apache.org [2], which is signed with the key with fingerprint
>>>>>> 2F1FEDCDF6DD7990422F482F65224E0292DD8A51 [3],
>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>>> * source code tag "v2.6.0-RC1" [5],
>>>>>> * website pull request listing the release and publishing the API
>>>>>> reference manual [6].
>>>>>> * Python artifacts are deployed along with the source release to the
>>>>>> dist.apache.org [2].
>>>>>>
>>>>>> The vote will be open for at least 72 hours. It is adopted by
>>>>>> majority approval, with at least 3 PMC affirmative votes.
>>>>>>
>>>>>> Regards
>>>>>> -Pablo.
>>>>>>
>>>>>> [1]
>>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12343392
>>>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.6.0/
>>>>>> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
>>>>>> [4]
>>>>>> https://repository.apache.org/content/repositories/orgapachebeam-1044/
>>>>>> [5] https://github.com/apache/beam/tree/v2.6.0-RC1
>>>>>> [6] https://github.com/apache/beam-site/pull/518
>>>>>>
>>>>>> --
>>>>>> Got feedback? go/pabloem-feedback
>>>>>>
>>>>>>
>>>>>
>>>>>


Re: Cleanup resources on pipeline cancelation

2018-08-02 Thread Andrew Pilloud
The subscriptions I want to clean up are ones that are implicitly created
by the PubsubIO. These subscriptions are created and then leaked; they aren't
reused in future pipelines, so the data loss issues are moot here. I agree
that we don't want to tear down user-supplied subscriptions.

I've been doing some more digging, and it looks like the Source.Reader
interface has a close() callback.
Is that a place I might be able to do cleanup? (It appears this is hooked
up to the RichFunction.close() callback on Flink and called from the Direct
Runner, but possibly not called from other runners.)
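
To make the idea concrete, here is a rough sketch of a close() that deletes
only a subscription the reader created itself and leaves user-supplied ones
alone. SubscriptionAdmin and OwnedSubscriptionReader are hypothetical names
for illustration, not the actual PubsubIO or Pub/Sub client classes, and the
sketch assumes the runner really does call close().

```java
/** Hypothetical stand-in for a Pub/Sub admin client; not a real Beam or Google API. */
interface SubscriptionAdmin {
  void deleteSubscription(String name);
}

/** Sketch of a reader that deletes only the subscription it created implicitly. */
final class OwnedSubscriptionReader implements AutoCloseable {
  private final SubscriptionAdmin admin;
  private final String subscription;
  private final boolean createdImplicitly; // false for user-supplied subscriptions
  private boolean closed;

  OwnedSubscriptionReader(SubscriptionAdmin admin, String subscription, boolean createdImplicitly) {
    this.admin = admin;
    this.subscription = subscription;
    this.createdImplicitly = createdImplicitly;
  }

  /** Safe to call more than once; only the first call can delete anything. */
  @Override
  public void close() {
    if (closed) {
      return;
    }
    closed = true;
    if (createdImplicitly) {
      admin.deleteSubscription(subscription);
    }
  }
}
```

This does not cover the case of several readers sharing one implicit
subscription, where a single reader's close() should not delete it.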

Andrew

On Thu, Aug 2, 2018 at 1:07 AM Reuven Lax  wrote:

> Actually I think SDF is the right way to fix this. The SDF can set a timer
> at infinity (which will only fire when the pipeline shuts down). I believe
> that SDF support is being added to the portability layer now, so eventually
> all portable runners will support it, and maybe we can live with the status
> quo until then.
>
> On Wed, Aug 1, 2018 at 9:59 PM Romain Manni-Bucau 
> wrote:
>
>> I agree, Reuven. But leaking in a source doesn't give any guarantee
>> regarding the execution, since it will depend on the runner, and the current
>> API will not provide you that feature. Using a reference-counting state can
>> work better but would require an SDF migration (and will hit runner support
>> issues :().
>>
>>
>> On Thu, Aug 2, 2018 at 05:39, Reuven Lax wrote:
>>
>>> Hi Romain,
>>>
>>> Andrew's example actually wouldn't work for that. With Google Cloud
>>> Pub/Sub (the example source he referenced), if there is no subscription to
>>> a topic, all publishes to that topic are dropped on the floor; if you don't
>>> want to lose data, you are expected to keep the subscription around
>>> continuously. In this example, leaking a subscription is probably
>>> preferable to losing data (especially since Pub/Sub itself garbage collects
>>> subscriptions that have been inactive for a long time).
>>>
>>> The answer might be that Beam does not have a good lifecycle story here,
>>> and something needs to be built.
>>>
>>> Reuven
>>>
>>> On Tue, Jul 31, 2018 at 10:04 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> IIRC sources should clean up their resources per method, since they don't
>>>> have a better lifecycle. Readers can create anything longer-lived and release it
>>>> at close time.
>>>>
>>>>
>>>> On Wed, Aug 1, 2018 at 00:31, Andrew Pilloud wrote:
>>>>
>>>>> Some of our IOs create external resources that need to be cleaned up
>>>>> when a pipeline is terminated. It looks like the
>>>>> org.apache.beam.sdk.io.UnboundedSource interface is called on creation,
>>>>> but there is no call for cleanup. For example, PubsubIO creates a Pubsub
>>>>> subscription in createReader()/split() and it should be deleted at
>>>>> shutdown. Does anyone have ideas on how I might make this happen?
>>>>>
>>>>> (I filed https://issues.apache.org/jira/browse/BEAM-5051 tracking the
>>>>> PubSub specific issue.)
>>>>>
>>>>> Andrew
>>>>>



Re: Community Examples Repository

2018-08-02 Thread Ahmet Altay
Robert, I agree with you in general. However, there is also a second
motivation: there is an increase in new PRs that add new examples. This is
great; however, the core code (including distributions) is not a great place
to host such examples. An examples repo would help in this case. It could
also serve as an entry point for new contributors.



On Thu, Aug 2, 2018 at 12:40 AM, Robert Bradshaw 
wrote:

> I have to admit I'm generally -1 on moving examples to a separate
> repository. In particular, I think it would actually inhibit the
> stated goals of increasing visibility and better keeping them up to
> date, and for all the reasons we just migrated the beam-site directory
> in. It seems the primary motivation is that it's difficult in Java to
> have a portion of the repo that depends on another as if it were
> "external" (i.e. the way others would use Beam) rather than being a
> sub-project of Beam. Is this not doable?
> On Wed, Aug 1, 2018 at 10:59 PM Charles Chen  wrote:
> >
> > I would also prefer that examples be linked to releases so that we can
> > build and test them during development; i.e. if your commit breaks
> > wordcount, we want to know right away so we can revert.  Perhaps we can
> > keep these in the repo but more clearly modularize the artifacts we release?
> >
> > For the Python SDK, if we separate this out in any way, there is the
> > separate issue of dealing with namespace packages (which are kind of broken
> > and poorly supported:
> > https://github.com/pypa/python-packaging-user-guide/issues/265), if we
> > want to keep the examples under the apache_beam.examples module path.
> > See also https://packaging.python.org/guides/packaging-namespace-packages/.
> >
> > On Wed, Aug 1, 2018 at 9:29 PM j...@nanthrax.net  wrote:
> >>
> >> Hi,
> >>
> >> I have no problem with moving the examples to a dedicated repository.
> >> However, IMHO, we have to:
> >>
> >> 1. Keep a build of examples linked to latest core release/SNAPSHOT
> >> 2. Include the examples in the distribution (convenient for the users)
> >>
> >> On another topic, I think it would be better to avoid using Google Docs
> >> for this kind of discussion and to share directly on the mailing list (at
> >> least a summary/light details).
> >>
> >> Regards
> >> JB
> >>
> >> On Thursday, August 02, 2018 00:12 CEST, David Cavazos <dcava...@google.com> wrote:
> >>
> >>
> >> Hi everyone!
> >>
> >> We wanted to migrate the examples from the core repository to a new
> >> Beam community examples repository. As the number of examples grows, it
> >> makes sense to modularize and decouple the core functionality from the
> >> examples.
> >>
> >> We will also create some guidelines with the best practices for new
> >> examples to be submitted.
> >>
> >> For more details, feel free to take a look and comment on the proposal.
> >>
> >> Cheers,
> >> David
> >>
> >>
> >>
> >>
> >>
>


Re: Cleanup resources on pipeline cancelation

2018-08-02 Thread Reuven Lax
Actually I think SDF is the right way to fix this. The SDF can set a timer
at infinity (which will only fire when the pipeline shuts down). I believe
that SDF support is being added to the portability layer now, so eventually
all portable runners will support it, and maybe we can live with the status
quo until then.

On Wed, Aug 1, 2018 at 9:59 PM Romain Manni-Bucau 
wrote:

> I agree, Reuven. But leaking in a source doesn't give any guarantee
> regarding the execution, since it will depend on the runner, and the current
> API will not provide you that feature. Using a reference-counting state can
> work better but would require an SDF migration (and will hit runner support
> issues :().
>
>
> On Thu, Aug 2, 2018 at 05:39, Reuven Lax wrote:
>
>> Hi Romain,
>>
>> Andrew's example actually wouldn't work for that. With Google Cloud
>> Pub/Sub (the example source he referenced), if there is no subscription to
>> a topic, all publishes to that topic are dropped on the floor; if you don't
>> want to lose data, you are expected to keep the subscription around
>> continuously. In this example, leaking a subscription is probably
>> preferable to losing data (especially since Pub/Sub itself garbage collects
>> subscriptions that have been inactive for a long time).
>>
>> The answer might be that Beam does not have a good lifecycle story here,
>> and something needs to be built.
>>
>> Reuven
>>
>> On Tue, Jul 31, 2018 at 10:04 PM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> Hi Andrew,
>>>
>>> IIRC sources should clean up their resources per method, since they don't
>>> have a better lifecycle. Readers can create anything longer-lived and release it
>>> at close time.
>>>
>>>
>>> On Wed, Aug 1, 2018 at 00:31, Andrew Pilloud wrote:
>>>
>>>> Some of our IOs create external resources that need to be cleaned up
>>>> when a pipeline is terminated. It looks like the
>>>> org.apache.beam.sdk.io.UnboundedSource interface is called on creation, but
>>>> there is no call for cleanup. For example, PubsubIO creates a Pubsub
>>>> subscription in createReader()/split() and it should be deleted at shutdown.
>>>> Does anyone have ideas on how I might make this happen?
>>>>
>>>> (I filed https://issues.apache.org/jira/browse/BEAM-5051 tracking the
>>>> PubSub specific issue.)
>>>>
>>>> Andrew
>>>>
>>>


Re: Community Examples Repository

2018-08-02 Thread Robert Bradshaw
I have to admit I'm generally -1 on moving examples to a separate
repository. In particular, I think it would actually inhibit the
stated goals of increasing visibility and better keeping them up to
date, and for all the reasons we just migrated the beam-site directory
in. It seems the primary motivation is that it's difficult in Java to
have a portion of the repo that depends on another as if it were
"external" (i.e. the way others would use Beam) rather than being a
sub-project of Beam. Is this not doable?
On Wed, Aug 1, 2018 at 10:59 PM Charles Chen  wrote:
>
> I would also prefer that examples be linked to releases so that we can build 
> and test them during development; i.e. if your commit breaks wordcount, we 
> want to know right away so we can revert.  Perhaps we can keep these in the 
> repo but more clearly modularize the artifacts we release?
>
> For the Python SDK, if we separate this out in any way, there is the separate 
> issue of dealing with namespace packages (which are kind of broken and poorly 
> supported: https://github.com/pypa/python-packaging-user-guide/issues/265), 
> if we want to keep the examples under the apache_beam.examples module path.  
> See also https://packaging.python.org/guides/packaging-namespace-packages/.
>
> On Wed, Aug 1, 2018 at 9:29 PM j...@nanthrax.net  wrote:
>>
>> Hi,
>>
>> I have no problem with moving the examples to a dedicated repository.
>> However, IMHO, we have to:
>>
>> 1. Keep a build of examples linked to latest core release/SNAPSHOT
>> 2. Include the examples in the distribution (convenient for the users)
>>
>> On another topic, I think it would be better to avoid using Google Docs
>> for this kind of discussion and to share directly on the mailing list (at
>> least a summary/light details).
>>
>> Regards
>> JB
>>
>> On Thursday, August 02, 2018 00:12 CEST, David Cavazos wrote:
>>
>>
>> Hi everyone!
>>
>> We wanted to migrate the examples from the core repository to a new Beam
>> community examples repository. As the number of examples grows, it makes
>> sense to modularize and decouple the core functionality from the examples.
>>
>> We will also create some guidelines with the best practices for new examples 
>> to be submitted.
>>
>> For more details, feel free to take a look and comment on the proposal.
>>
>> Cheers,
>> David
>>
>>
>>
>>
>>