Re: [VOTE] Release 2.7.0, release candidate #3

2018-09-26 Thread Robert Bradshaw
+1 (binding), same verification as before.

On Wed, Sep 26, 2018 at 7:36 AM Charles Chen  wrote:

> To clarify, the only difference between RC2 and RC3 is the Python fix
> https://github.com/apache/beam/pull/6494.
>
> This means that the Java validations from RC2 should carry over, though I
> reran validations with RC3 anyway, as detailed on the spreadsheet.
>
> On Wed, Sep 26, 2018 at 12:41 AM Charles Chen  wrote:
>
>> As before, please add any validation performed to the spreadsheet
>> here:
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1675964688
>>
>> On Wed, Sep 26, 2018 at 12:30 AM Charles Chen  wrote:
>>
>>> Hi everyone,
>>>
>>> Please review and vote on the release candidate #3 for the version
>>> 2.7.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> The complete staging area is available for your review, which includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint 45C60AAAD115F560 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.7.0-RC3" [5],
>>> * website pull request listing the release and publishing the API
>>> reference manual [6].
>>> * Java artifacts were built with Gradle 4.8 and OpenJDK
>>> 1.8.0_181-8u181-b13-1~deb9u1.
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>> Charles
>>>
>>> [1]
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12343654
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.7.0
>>> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1048/
>>> [5] https://github.com/apache/beam/tree/v2.7.0-RC3
>>> [6] https://github.com/apache/beam-site/pull/549
>>>
>>


Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #184

2018-09-26 Thread Apache Jenkins Server
See 


Changes:

[pablo] Adding display data to BQ write transform

[qinyeli] Interactive Beam -- read_cache_ids and write_cache_ids

[qinyeli] Interactive Beam -- renaming variables and functions

[qinyeli] Interactive Beam -- fixing PTransform # display issue

[github] typo Python -> Go

[mxm] Add tests for port supplier methods in ServerFactory

[ptomasroos] Fix combine.go to run on dataflow

[github] Fix sliding time windows example documentation

[mergebot] Fix a few typos in beam site docs

[apilloud] [BEAM-5491] Update NightlySnapshot job name

[ehudm] GCSIO: Allow empty object prefix in list_prefix().

[swegner] Upgrade assertj dependecny to latest (3.11.1)

[github] Improved Flatten input error message

[migryz] Add metrics dashboard deployment script and logic

[migryz] Fix rat issues

[aaltay] [BEAM-1251] Upgrade pylint version for py27-lint3 (#6489)

[aaltay] [BEAM-5319] Partially port runners (#6451)

[pablo] Fixing Py2-3 lint issue

[ccy] Revert "[BEAM-4747] mkdirs if they don't exist in localfilesystem

--
[...truncated 23.10 MB...]
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hadoop-input-format/2.8.0-SNAPSHOT/beam-sdks-java-io-hadoop-input-format-2.8.0-20180926.082741-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hadoop-input-format/2.8.0-SNAPSHOT/beam-sdks-java-io-hadoop-input-format-2.8.0-20180926.082741-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

45: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-hbase:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hbase/2.8.0-SNAPSHOT/beam-sdks-java-io-hbase-2.8.0-20180926.082755-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hbase/2.8.0-SNAPSHOT/beam-sdks-java-io-hbase-2.8.0-20180926.082755-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

46: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-hcatalog:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hcatalog/2.8.0-SNAPSHOT/beam-sdks-java-io-hcatalog-2.8.0-20180926.082804-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hcatalog/2.8.0-SNAPSHOT/beam-sdks-java-io-hcatalog-2.8.0-20180926.082804-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

47: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-jdbc:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-jdbc/2.8.0-SNAPSHOT/beam-sdks-java-io-jdbc-2.8.0-20180926.082813-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-jdbc/2.8.0-SNAPSHOT/beam-sdks-java-io-jdbc-2.8.0-20180926.082813-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

48: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-jms:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repos

Re: Removing documentation for old Beam versions

2018-09-26 Thread Scott Wegner
Alan found the place where website publishing is configured [1], which has
examples of project sites being configured with more than one git root.
This is great for us because it allows us to leave generated
javadocs/pydocs in the beam-site repository and publish website markdown
content from the main repo.

Alan has a PR ready to publish generated HTML in a post-commit job [2].
Once that goes through, the last step is to update the publishing config.

[1]
https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
[2] https://github.com/apache/beam/pull/6431

On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner  wrote:

> > We could add a new default branch (master?) and keep all the
> non-generated files (src/) there, and put generated files (content/) in the
> asf-site branch (like we already do).
>
> I'm strongly in favor of having sources in a single repository. We have
> significant process and infrastructure built up for the apache/beam repo
> (for build, PR, CI, release, etc.) that we can take advantage of by putting
> website sources in the same repo. The current beam-site repo PR automation
> is flaky because it was custom-built and not given the same level of
> attention as the main repo.
>
> The caveat to consolidating website sources in the main repo is that it
> incentivizes putting the generated sources branch on the same repo. I've
> documented a few of the reasons in the Appendix of the design doc [1]:
>  - It's easier to maintain a single repository; easily apply existing
> tooling/infrastructure
> - Jenkins tooling for publishing generated HTML may not work cross-repo [2]
>
> My preference is to move forward with the migration of sources to
> apache/beam [master], and website generated HTML to apache/beam [asf-site].
> I like the idea of separating the publishing/hosting of generated
> javadocs/pydocs since they add so much cruft, but it should not hold up the
> migration.
>
> [1]
> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>
> [2]
> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>
> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri  wrote:
>
>> Staying on beam-site SGTM. We could add a new default branch (master?)
>> and keep all the non-generated files (src/) there, and put generated files
>> (content/) in the asf-site branch (like we already do).
>> That way there's no confusion as to which files you should update.
>> (This is of course assuming we still place generated docs in git repos.)
>>
>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise  wrote:
>>
>>> My thought was to leave the asf-site branch in the beam-site repository,
>>> add generated docs to that branch (until we have a better solution), and
>>> have only sources in the beam repo.
>>>
>>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>>> it would eliminate the need to place generated docs into git repos.
>>>
>>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri  wrote:
>>>
>>>> I believe that beam.apache.org is populated from the asf-site branch
>>>> of the apache/beam-site repo. (gitpubsub:
>>>> https://www.apache.org/dev/project-site.html#intro)
>>>> If we move the markdown-based docs to apache/beam, leave generated
>>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>>> then javadoc and pydoc will not get pushed to the website.
>>>>
>>>> Is there some place where we can push javadoc and pydoc files? Or
>>>> perhaps there is an alternative way to push updates to beam.apache.org?
>>>> (not requiring the asf-site branch)
>>>>
>>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise  wrote:
>>>>
>>>>> Hi Scott,
>>>>>
>>>>> Thanks for bringing the discussion back here.
>>>>>
>>>>> I agree that we should separate the changes for hosting of generated
>>>>> java/pydocs from the rest of website automation so that we can make the
>>>>> switch and fix the contributor headache soon.
>>>>>
>>>>> But perhaps we can avoid adding 4m lines of generated code to the main
>>>>> beam repository (and keep on adding with every release) if we continue to
>>>>> serve the site from the old beam-site repo? (I left a comment in the doc.)
>>>>>
>>>>> About trying buildbot, as mentioned earlier I would be happy to help
>>>>> with it. I prefer a setup that keeps the docs separate from the web site.
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner 
>>>>> wrote:
>>>>>
>>>>>> Re-opening this thread as it came up today in the discussion for
>>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>>> Reliability improvements; design doc here:
>>>>>> https://s.apache.org/beam-site-automation
>>>>>>
>>>>>> The current plan is to keep generated javadoc/pydoc sources only on
>>>>>> the asf-site branch, which is necessary for the current githubpubsub
>>>>>> publishing mechanism. This maintains our current approach, the only 
>>>>>>

Re: Removing documentation for old Beam versions

2018-09-26 Thread Robert Bradshaw
I am also definitely in favor of a single repository. Perhaps I'm just
misunderstanding why the generated docs must be put in a git repository at
all--is it because that's the easiest way to serve them?

On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner  wrote:

> Alan found the place where website publishing is configured [1], which has
> examples of project sites being configured with more than one git root.
> This is great for us because it allows us to leave generated
> javadocs/pydocs in the beam-site repository and publish website markdown
> content from the main repo.
>
> Alan has a PR ready to publish generated HTML in a post-commit job [2].
> Once that goes through, the last step is to update the publishing config.
>
> [1]
> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
> [2] https://github.com/apache/beam/pull/6431
>
> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner  wrote:
>
>> > We could add a new default branch (master?) and keep all the
>> non-generated files (src/) there, and put generated files (content/) in the
>> asf-site branch (like we already do).
>>
>> I'm strongly in favor of having sources in a single repository. We have
>> significant process and infrastructure built up for the apache/beam repo
>> (for build, PR, CI, release, etc.) that we can take advantage of by putting
>> website sources in the same repo. The current beam-site repo PR automation
>> is flaky because it was custom-built and not given the same level of
>> attention as the main repo.
>>
>> The caveat to consolidating website sources in the main repo is that it
>> incentivizes putting the generated sources branch on the same repo. I've
>> documented a few of the reasons in the Appendix of the design doc [1]:
>>  - It's easier to maintain a single repository; easily apply existing
>> tooling/infrastructure
>> - Jenkins tooling for publishing generated HTML may not work cross-repo
>> [2]
>>
>> My preference is to move forward with the migration of sources to
>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>> I like the idea of separating the publishing/hosting of generated
>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>> migration.
>>
>> [1]
>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>
>> [2]
>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>>
>> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri  wrote:
>>
>>> Staying on beam-site SGTM. We could add a new default branch (master?)
>>> and keep all the non-generated files (src/) there, and put generated files
>>> (content/) in the asf-site branch (like we already do).
>>> That way there's no confusion as to which files you should update.
>>> (This is of course assuming we still place generated docs in git repos.)

Re: Removing documentation for old Beam versions

2018-09-26 Thread Scott Wegner
Yes. There are a few options for publishing your ASF website, described here:
https://www.apache.org/dev/project-site.html. We can publish from a Git
repo, SVN, or a UI-based CMS interface.

On Wed, Sep 26, 2018 at 9:45 AM Robert Bradshaw  wrote:

> I am also definitely in favor of a single repository. Perhaps I'm just
> misunderstanding why the generated docs must be put in a git repository at
> all--is it because that's the easiest way to serve them?
>
> On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner  wrote:
>
>> Alan found the place where website publishing is configured [1], which
>> has examples of project sites being configured with more than one git root.
>> This is great for us because it allows us to leave generated
>> javadocs/pydocs in the beam-site repository and publish website markdown
>> content from the main repo.
>>
>> Alan has a PR ready to publish generated HTML in a post-commit job [2].
>> Once that goes through, the last step is to update the publishing config.
>>
>> [1]
>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
>> [2] https://github.com/apache/beam/pull/6431
>>
>> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner  wrote:
>>
>>> > We could add a new default branch (master?) and keep all the
>>> non-generated files (src/) there, and put generated files (content/) in the
>>> asf-site branch (like we already do).
>>>
>>> I'm strongly in favor of having sources in a single repository. We have
>>> significant process and infrastructure built up for the apache/beam repo
>>> (for build, PR, CI, release, etc.) that we can take advantage of by putting
>>> website sources in the same repo. The current beam-site repo PR automation
>>> is flaky because it was custom-built and not given the same level of
>>> attention as the main repo.
>>>
>>> The caveat to consolidating website sources in the main repo is that it
>>> incentivizes putting the generated sources branch on the same repo. I've
>>> documented a few of the reasons in the Appendix of the design doc [1]:
>>>  - It's easier to maintain a single repository; easily apply existing
>>> tooling/infrastructure
>>> - Jenkins tooling for publishing generated HTML may not work cross-repo
>>> [2]
>>>
>>> My preference is to move forward with the migration of sources to
>>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>>> I like the idea of separating the publishing/hosting of generated
>>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>>> migration.
>>>
>>> [1]
>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>>
>>> [2]
>>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace

Re: Removing documentation for old Beam versions

2018-09-26 Thread Robert Bradshaw
OK, thanks. That link was very helpful. Of the three options we must choose
from, checking into git seems preferable to checking into svn, let alone the
CMS. Keeping the same repo means that it's harder to generate the docs for
version X while head is checked out.

I'm in favor of moving forward with this in the short term, but we should
explore other options (like Flink has) for the longer term.



On Wed, Sep 26, 2018 at 3:53 PM Scott Wegner  wrote:

> Yes. There are a few options for publishing your ASF website, described
> here: https://www.apache.org/dev/project-site.html. We can publish from a
> Git repo, SVN, or a UI-based CMS interface.
>
> On Wed, Sep 26, 2018 at 9:45 AM Robert Bradshaw 
> wrote:
>
>> I am also definitely in favor of a single repository. Perhaps I'm just
>> misunderstanding why the generated docs must be put in a git repository at
>> all--is it because that's the easiest way to serve them?
>>
>> On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner  wrote:
>>
>>> Alan found the place where website publishing is configured [1], which
>>> has examples of project sites being configured with more than one git root.
>>> This is great for us because it allows us to leave generated
>>> javadocs/pydocs in the beam-site repository and publish website markdown
>>> content from the main repo.
>>>
>>> Alan has a PR ready to publish generated HTML in a post-commit job [2].
>>> Once that goes through, the last step is to update the publishing config.
>>>
>>> [1]
>>> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
>>> [2] https://github.com/apache/beam/pull/6431

Are there plans for removing joda-time from the beam java SDK?

2018-09-26 Thread Jeff Klukas
It looks like there are a few spots in the Beam Java API where users have to
provide joda-time objects, such as FixedWindows#of(org.joda.time.Duration).

Are there any plans to support java.time objects in addition to joda
objects? Any plans to eventually remove joda-time?

My personal interest is that my team would like to eventually standardize
on usage of java.time and remove all explicit usage of joda-time in our
codebases. Even if joda-time is still pulled in transitively by the beam
java SDK and used internally, it would be nice for users to be able to
avoid explicit interaction with joda-time. I'm imagining it would be
possible to provide additional methods like
FixedWindows#of(java.time.Duration) and potentially marking the joda-based
variants as deprecated.
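
Until such overloads exist, user code can stay on java.time and convert only
at the boundary. The sketch below is one possible way to do that, not an
existing Beam API: the class name JavaTimeDurations and the method
toJodaMillis are hypothetical helpers made up for illustration. It uses only
java.time so that it runs without joda-time or Beam on the classpath; the
comment in main shows where the joda-based FixedWindows#of call would go.

```java
import java.time.Duration;

// Hypothetical helper (not part of Beam): keeps java.time in user code and
// converts to milliseconds, the precision joda-time durations use.
public final class JavaTimeDurations {
    private JavaTimeDurations() {}

    /** Returns the duration as whole milliseconds. */
    public static long toJodaMillis(Duration duration) {
        // toMillis() drops any sub-millisecond part and throws
        // ArithmeticException if the value does not fit in a long.
        return duration.toMillis();
    }

    public static void main(String[] args) {
        long millis = toJodaMillis(Duration.ofMinutes(5));
        System.out.println(millis);
        // With joda-time and the Beam Java SDK on the classpath, the value
        // could then be handed to the existing joda-based method:
        //   FixedWindows.of(org.joda.time.Duration.millis(millis));
    }
}
```

Note that java.time.Duration carries nanosecond precision while joda-time
durations are millisecond-based, so any java.time overload would have to
decide how to handle the sub-millisecond part; the sketch simply truncates it.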

Does this seem worthy of opening a JIRA issue?


Re: Removing documentation for old Beam versions

2018-09-26 Thread Thomas Weise
Looks like there is agreement that all sources should be in the main beam
repository; the remaining discussion is about where the generated content
should be served from, specifically the generated docs.

If the setup that Alan found allows us to keep using the beam-site
repository for the generated stuff and that does not unreasonably
complicate the CI process, then I'm in favor of that. It looks cleaner to
not mingle source and generated files in the same repo. Otherwise we can do
the asf-site branch in the main repo and get rid of docs from it once we
find a better solution.


On Wed, Sep 26, 2018 at 7:09 AM Robert Bradshaw  wrote:

> OK, thanks. That link was very helpful. Of the three options we must choose
> from, checking into git seems preferable to checking into svn, let alone the
> CMS. Keeping the same repo means that it's harder to generate the docs for
> version X while head is checked out.
>
> I'm in favor of moving forward with this in the short term, but we should
> explore other options (like Flink has) for the longer term.
>
>
>
> On Wed, Sep 26, 2018 at 3:53 PM Scott Wegner  wrote:
>
>> Yes. There are a few options for publishing your ASF website, described
>> here: https://www.apache.org/dev/project-site.html. We can publish from
>> a Git repo, SVN, or a UI-based CMS interface.
>>
>> On Wed, Sep 26, 2018 at 9:45 AM Robert Bradshaw 
>> wrote:
>>
>>> I am also definitely in favor of a single repository. Perhaps I'm just
>>> misunderstanding why the generated docs must be put in a git repository at
>>> all--is it because that's the easiest way to serve them?

Re: Removing documentation for old Beam versions

2018-09-26 Thread Udi Meiri
Just to be clear, generated html for javadoc and pydoc will be put in
apache/beam-site, but generated html for .md files will be put in
apache/beam under the asf-site branch.

On Wed, Sep 26, 2018 at 9:34 AM Thomas Weise  wrote:

> Looks like there is agreement that all sources should be in the main beam
> repository, the remaining discussion was where the generated content should
> be served from, specifically the generated docs.
>
> If the setup that Alan found allows us to keep using the beam-site
> repository for the generated stuff and that does not unreasonably
> complicate the CI process, then I'm in favor of that. It looks cleaner to
> not mingle source and generated files in the same repo. Otherwise we can do
> the asf-site branch in the main repo and get rid of docs from it once we
> find a better solution.
>
>
> On Wed, Sep 26, 2018 at 7:09 AM Robert Bradshaw 
> wrote:
>
>> OK, thanks. That link was very helpful. Of the three options we must use,
>> checking into git seems preferable to checking into svn, let alone the
>> CMS. Keeping the same repo means that it's harder to generate the docs for
>> version X while head is checked out.
>>
>> I'm in favor of moving forward with this in the short term, but we should
>> explore other options (like Flink has) for the longer term.
>>
>>
>>
>> On Wed, Sep 26, 2018 at 3:53 PM Scott Wegner  wrote:
>>
>>> Yes. There are a few options for publishing your ASF website, described
>>> here: https://www.apache.org/dev/project-site.html. We can publish from
>>> a Git repo, SVN, or a UI-based CMS interface.
>>>
>>> On Wed, Sep 26, 2018 at 9:45 AM Robert Bradshaw 
>>> wrote:
>>>
 I am also definitely in favor of a single repository. Perhaps I'm just
 misunderstanding why the generated content must be put in a git repository at
 all--is it because that's the easiest way to serve them?

 On Wed, Sep 26, 2018 at 3:39 PM Scott Wegner  wrote:

> Alan found the place where website publishing is configured [1], which
> has examples of project sites being configured with more than one git root.
> This is great for us because it allows us to leave generated
> javadocs/pydocs in the beam-site repository and publish website markdown
> content from the main repo.
>
> Alan has a PR ready to publish generated HTML in a post-commit job
> [2]. Once that goes through the last step is to upgrade the publishing
> config.
>
> [1]
> https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitwcsub/files/config/gitwcsub.cfg
> [2] https://github.com/apache/beam/pull/6431
>
> On Mon, Sep 24, 2018 at 4:35 PM Scott Wegner 
> wrote:
>
>> > We could add a new default branch (master?) and keep all the
>> non-generated files (src/) there, and put generated files (content/) in the
>> asf-site branch (like we already do).
>>
>> I'm strongly in favor of having sources in a single repository. We
>> have significant process and infrastructure built up for the apache/beam
>> repo (for build, PR, CI, release, etc.) that we can take advantage of by
>> putting website sources in the same repo. The current beam-site repo PR
>> automation is flaky because it was custom-built and not given the same
>> level of attention as the main repo.
>>
>> The caveat to consolidating website sources in the main repo is that
>> it incentivizes putting the generated sources branch on the same repo. I've
>> documented a few of the reasons in the Appendix of the design doc [1]:
>>  - It's easier to maintain a single repository; easily apply existing
>> tooling/infrastructure
>> - Jenkins tooling for publishing generated HTML may not work
>> cross-repo [2]
>>
>> My preference is to move forward with the migration of sources to
>> apache/beam [master], and website generated HTML to apache/beam [asf-site].
>> I like the idea of separating the publishing/hosting of generated
>> javadocs/pydocs since they add so much cruft, but it should not hold up the
>> migration.
>>
>> [1]
>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc
>>
>> [2]
>> https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace
>>
>> On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri  wrote:
>>
>>> Staying on beam-site SGTM. We could add a new default branch
>>> (master?) and keep all the non-generated files (src/) there, and put
>>> generated files (content/) in the asf-site branch (like we already do).
>>> That way there's no confusion as to which files you should update.
>>> (This is of course assuming we still place generated docs in git
>>> repos.)
>>>
>>> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise 
>>> wrote:
>>>
 My thought was to leave the asf-si

Re: Are there plans for removing joda-time from the beam java SDK?

2018-09-26 Thread Andrew Pilloud
Last I heard we were actually moving the other way, replacing java.time
with joda-time. See the giant schema PR here:
https://github.com/apache/beam/pull/4964 I don't think this was ever
discussed on the list though.

Andrew

On Wed, Sep 26, 2018 at 9:21 AM Jeff Klukas  wrote:

> It looks like there are a few spots in the Beam Java API where users have to
> provide joda-time objects, such as FixedWindows#of(org.joda.time.Duration).
>
> Are there any plans to support java.time objects in addition to joda
> objects? Any plans to eventually remove joda-time?
>
> My personal interest is that my team would like to eventually standardize
> on usage of java.time and remove all explicit usage of joda-time in our
> codebases. Even if joda-time is still pulled in transitively by the beam
> java SDK and used internally, it would be nice for users to be able to
> avoid explicit interaction with joda-time. I'm imagining it would be
> possible to provide additional methods like
> FixedWindows#of(java.time.Duration) and potentially marking the joda-based
> variants as deprecated.
>
> Does this seem worthy of opening a JIRA issue?
>
>


Re: Are there plans for removing joda-time from the beam java SDK?

2018-09-26 Thread Reuven Lax
We started with Joda because Java 7 time classes were insufficient for our
needs. Now that we're on Java 8 we could use Java 8's time libraries (which
are much better), but unfortunately that would create
backwards-incompatible changes to our APIs. We should add this to the list
of things to do when we release Beam 3.0.

Reuven

On Wed, Sep 26, 2018 at 10:43 AM Andrew Pilloud  wrote:

> Last I heard we were actually moving the other way, replacing java.time
> with joda-time. See the giant schema PR here:
> https://github.com/apache/beam/pull/4964 I don't think this was ever
> discussed on the list though.
>
> Andrew
>
> On Wed, Sep 26, 2018 at 9:21 AM Jeff Klukas  wrote:
>
>> It looks like there are a few spots in the Beam Java API where users have to
>> provide joda-time objects, such as FixedWindows#of(org.joda.time.Duration).
>>
>> Are there any plans to support java.time objects in addition to joda
>> objects? Any plans to eventually remove joda-time?
>>
>> My personal interest is that my team would like to eventually standardize
>> on usage of java.time and remove all explicit usage of joda-time in our
>> codebases. Even if joda-time is still pulled in transitively by the beam
>> java SDK and used internally, it would be nice for users to be able to
>> avoid explicit interaction with joda-time. I'm imagining it would be
>> possible to provide additional methods like
>> FixedWindows#of(java.time.Duration) and potentially marking the joda-based
>> variants as deprecated.
>>
>> Does this seem worthy of opening a JIRA issue?
>>
>>


Re: Are there plans for removing joda-time from the beam java SDK?

2018-09-26 Thread Jean-Baptiste Onofré
+1

It makes sense to me, and there is also a plan to "split" the core into
finer-grained modules and give Beam 3 more of an API flavor.

Regards
JB

On Sep 26, 2018, at 13:49, Reuven Lax wrote:
>We started with Joda because Java 7 time classes were insufficient for
>our
>needs. Now that we're on Java 8 we could use Java 8's time libraries
>(which
>are much better), but unfortunately that would create
>backwards-incompatible changes to our APIs. We should add this to the
>list
>of things to do when we release Beam 3.0.
>
>Reuven
>
>On Wed, Sep 26, 2018 at 10:43 AM Andrew Pilloud 
>wrote:
>
>> Last I heard we were actually moving the other way, replacing
>java.time
>> with joda-time. See the giant schema PR here:
>> https://github.com/apache/beam/pull/4964 I don't think this was ever
>> discussed on the list though.
>>
>> Andrew
>>
>> On Wed, Sep 26, 2018 at 9:21 AM Jeff Klukas 
>wrote:
>>
>>> It looks like there are a few spots in the Beam Java API where users
>have to
>>> provide joda-time objects, such as
>FixedWindows#of(org.joda.time.Duration).
>>>
>>> Are there any plans to support java.time objects in addition to joda
>>> objects? Any plans to eventually remove joda-time?
>>>
>>> My personal interest is that my team would like to eventually
>standardize
>>> on usage of java.time and remove all explicit usage of joda-time in
>our
>>> codebases. Even if joda-time is still pulled in transitively by the
>beam
>>> java SDK and used internally, it would be nice for users to be able
>to
>>> avoid explicit interaction with joda-time. I'm imagining it would be
>>> possible to provide additional methods like
>>> FixedWindows#of(java.time.Duration) and potentially marking the
>joda-based
>>> variants as deprecated.
>>>
>>> Does this seem worthy of opening a JIRA issue?
>>>
>>>


Re: Are there plans for removing joda-time from the beam java SDK?

2018-09-26 Thread Jeff Klukas
Looks like https://github.com/apache/beam/pull/4964 is somewhat different
from what I had in mind. As Reuven mentioned, I'm specifically interested
in using the Java 8 java.time API as a drop-in replacement for joda-time
objects so that we don't have to rely on an external library. PR 4964 is
using joda-time objects to replace older java.util and java.sql objects
with richer joda-time alternatives.

Reuven mentioned a "list of things to do when we release Beam 3.0". Is
there a JIRA issue or other document that's tracking Beam 3.0 work?

Reuven also mentioned that using java.time would introduce
backwards-incompatible changes to the Beam 2.x API, but in many cases (such
as FixedWindows) we could introduce alternative method signatures so that
we can support both joda and java.time. If there are methods that return
joda-time objects, it may be less feasible to support both.
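
To make the "alternative method signatures" idea concrete, here is a minimal
sketch of what an added overload could look like. It is not Beam code:
LegacyDuration is a hypothetical stand-in for org.joda.time.Duration and
FixedWindowsSketch is a hypothetical stand-in for FixedWindows, used only so
the example compiles without extra dependencies.

```java
import java.time.Duration;

// Hypothetical stand-in for org.joda.time.Duration (millisecond precision).
final class LegacyDuration {
  private final long millis;

  LegacyDuration(long millis) {
    this.millis = millis;
  }

  long getMillis() {
    return millis;
  }
}

// Hypothetical stand-in for a windowing class such as FixedWindows.
final class FixedWindowsSketch {
  private final LegacyDuration size;

  private FixedWindowsSketch(LegacyDuration size) {
    this.size = size;
  }

  // Existing joda-style factory, left unchanged for backwards compatibility.
  static FixedWindowsSketch of(LegacyDuration size) {
    return new FixedWindowsSketch(size);
  }

  // Added overload: accept java.time.Duration and convert at the API boundary.
  static FixedWindowsSketch of(Duration size) {
    return of(new LegacyDuration(size.toMillis()));
  }

  long sizeMillis() {
    return size.getMillis();
  }

  public static void main(String[] args) {
    // Callers can stay entirely within java.time.
    System.out.println(FixedWindowsSketch.of(Duration.ofMinutes(1)).sizeMillis());
  }
}
```

Note that java.time.Duration carries nanosecond precision, so a conversion
like toMillis() truncates anything below a millisecond. Methods that return
joda-time objects cannot be overloaded this way, which is the harder case
mentioned above.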

On Wed, Sep 26, 2018 at 1:51 PM Jean-Baptiste Onofré 
wrote:

> +1
>
> It makes sense to me, and there is also a plan to "split" the core into
> finer-grained modules and give Beam 3 more of an API flavor.
>
> Regards
> JB
> On Sep 26, 2018, at 13:49, Reuven Lax wrote:
>>
>> We started with Joda because Java 7 time classes were insufficient for
>> our needs. Now that we're on Java 8 we could use Java 8's time libraries
>> (which are much better), but unfortunately that would create
>> backwards-incompatible changes to our APIs. We should add this to the list
>> of things to do when we release Beam 3.0.
>>
>> Reuven
>>
>> On Wed, Sep 26, 2018 at 10:43 AM Andrew Pilloud < apill...@google.com>
>> wrote:
>>
>>> Last I heard we were actually moving the other way, replacing java.time
>>> with joda-time. See the giant schema PR here:
>>> https://github.com/apache/beam/pull/4964 I don't think this was ever
>>> discussed on the list though.
>>>
>>> Andrew
>>>
>>> On Wed, Sep 26, 2018 at 9:21 AM Jeff Klukas < jklu...@mozilla.com>
>>> wrote:
>>>
 It looks like there are a few spots in the Beam Java API where users have
 to provide joda-time objects, such as
 FixedWindows#of(org.joda.time.Duration).

 Are there any plans to support java.time objects in addition to joda
 objects? Any plans to eventually remove joda-time?

 My personal interest is that my team would like to eventually
 standardize on usage of java.time and remove all explicit usage of
 joda-time in our codebases. Even if joda-time is still pulled in
 transitively by the beam java SDK and used internally, it would be nice for
 users to be able to avoid explicit interaction with joda-time. I'm
 imagining it would be possible to provide additional methods like
 FixedWindows#of(java.time.Duration) and potentially marking the joda-based
 variants as deprecated.

 Does this seem worthy of opening a JIRA issue?




Python License code

2018-09-26 Thread Udi Meiri
Hi,
I'm reviewing a PR that has code licensed under the Python License. It is
under category A, so it's okay to include.
The question is: where do we put the license notice? Is it sufficient to
place the code in a separate module with the license pasted at the top?

https://github.com/apache/beam/pull/6423/files#r220284399



Re: Python License code

2018-09-26 Thread Kenneth Knowles
We do have a top-level NOTICE file. I am neither a lawyer nor familiar with
the details of the requirements of the Python Software Foundation License.

In addition to the page you linked, have you followed the link to the LEGAL
Jira space (https://www.apache.org/legal/resolved.html#asking-questions)?
You may find your answer there, or could ask.

Kenn

On Wed, Sep 26, 2018 at 11:25 AM Udi Meiri  wrote:

> Hi,
> I'm reviewing a PR that has code licensed under Python License. It is
> under category A, so
> it's okay to include.
> The question is: where do we put the license notice? Is it sufficient to
> place the code in a separate module with the license pasted at the top?
>
> https://github.com/apache/beam/pull/6423/files#r220284399
>


Re: Python License code

2018-09-26 Thread Udi Meiri
I've opened https://issues.apache.org/jira/browse/LEGAL-417

On Wed, Sep 26, 2018 at 11:43 AM Kenneth Knowles  wrote:

> We do have a top-level NOTICE file. I am neither a lawyer nor familiar
> with the details of the requirements of the Python Software Foundation
> License.
>
> In addition to the page you linked, have you followed the link to the
> LEGAL Jira space (
> https://www.apache.org/legal/resolved.html#asking-questions)? You may
> find your answer there, or could ask.
>
> Kenn
>
> On Wed, Sep 26, 2018 at 11:25 AM Udi Meiri  wrote:
>
>> Hi,
>> I'm reviewing a PR that has code licensed under Python License. It is
>> under category A, so
>> it's okay to include.
>> The question is: where do we put the license notice? Is it sufficient to
>> place the code in a separate module with the license pasted at the top?
>>
>> https://github.com/apache/beam/pull/6423/files#r220284399
>>
>



Re: Bug of the MqttIO.java

2018-09-26 Thread Lukasz Cwik
Yes, please create a JIRA account on issues.apache.org

Once you have one, please tell me the JIRA id and I'll add you as a
contributor to Apache Beam and assign BEAM-5496 to you.

Also this guide https://beam.apache.org/contribute/ helps people learn how
to contribute. It has useful information about how to build, test, open
PRs, find reviewers.

On Tue, Sep 25, 2018 at 9:37 PM flyisland  wrote:

> Cool, I'd like to.
>
> Is there anything I should do first, like creating an account?
>
> On Tue, Sep 25, 2018 at 11:46 PM Lukasz Cwik  wrote:
>
>> Thanks, I filed https://issues.apache.org/jira/browse/BEAM-5496 with the
>> details of your report.
>>
>> Would you be interested in submitting a patch with a test that exercises
>> the bug?
>>
>> On Tue, Sep 25, 2018 at 1:21 AM flyisland  wrote:
>>
>>> Hi
>>>
>>> There is a bug in the built-in MqttIO; please check <
>>> https://github.com/apache/beam/blob/master/sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java#L336>,
>>> where the readObject() method forgets to invoke the
>>> "stream.defaultReadObject()" method.
>>>
>>> // set an empty list to messages when deserialize
>>> private void readObject(java.io.ObjectInputStream stream)
>>> throws IOException, ClassNotFoundException {
>>> messages = new ArrayList<>();
>>> }
>>> }
>>>
>>> As a result, an exception is thrown when the runner tries to deserialize the
>>> checkpoint object:
>>> java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException:
>>> 95 unexpected extra bytes after decoding
>>> org.apache.beam.sdk.io.mqtt.MqttIO$MqttCheckpointMark@6764e219 at
>>> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:340)
>>> ...
>>>
>>>
>>>
>>>
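
For readers following BEAM-5496, the pattern the report asks for can be
sketched as below. This is not the actual MqttIO code or the eventual patch:
CheckpointMarkSketch and its fields are hypothetical stand-ins. The sketch
only shows a custom readObject() that first consumes the default-serialized
fields and then resets a transient list.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a checkpoint mark; field names are illustrative.
class CheckpointMarkSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  String clientId; // written by default serialization
  transient List<String> messages = new ArrayList<>(); // never serialized

  CheckpointMarkSketch(String clientId) {
    this.clientId = clientId;
  }

  private void readObject(ObjectInputStream stream)
      throws IOException, ClassNotFoundException {
    // Read the non-transient fields written by default serialization first.
    stream.defaultReadObject();
    // Then give the transient list a fresh, empty value.
    messages = new ArrayList<>();
  }

  // Serializes and deserializes the mark in memory.
  static CheckpointMarkSketch roundTrip(CheckpointMarkSketch mark) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(mark);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (CheckpointMarkSketch) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    CheckpointMarkSketch original = new CheckpointMarkSketch("client-1");
    original.messages.add("pending");
    CheckpointMarkSketch copy = roundTrip(original);
    System.out.println(copy.clientId + " " + copy.messages.size());
  }
}
```

With defaultReadObject() in place, the non-transient field survives the round
trip and the transient list comes back empty rather than null.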


Re: Bug of the MqttIO.java

2018-09-26 Thread Jean-Baptiste Onofré
Yes please. Create a jira, I will tackle that. Thanks.

Regards
JB

On Sep 26, 2018, at 15:12, Lukasz Cwik wrote:
>Yes, please create a JIRA account on issues.apache.org
>
>Once you have one, please tell me the JIRA id and I'll add you as a
>contributor to Apache Beam and assign BEAM-5496 to you.
>
>Also this guide https://beam.apache.org/contribute/ helps people learn
>how
>to contribute. It has useful information about how to build, test, open
>PRs, find reviewers.
>
>On Tue, Sep 25, 2018 at 9:37 PM flyisland  wrote:
>
>> Cool, I'd like to.
>>
>> Is there anything I should do first, like creating an account?
>>
>> On Tue, Sep 25, 2018 at 11:46 PM Lukasz Cwik 
>wrote:
>>
>>> Thanks, I filed https://issues.apache.org/jira/browse/BEAM-5496 with
>the
>>> details of your report.
>>>
>>> Would you be interested in submitting a patch with a test that
>exercises
>>> the bug?
>>>
>>> On Tue, Sep 25, 2018 at 1:21 AM flyisland 
>wrote:
>>>
 Hi

 There is a bug of the built-in MqttIO, please check the <

>https://github.com/apache/beam/blob/master/sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java#L336>,
 this readObject() method forget to invoke the
>"stream.defaultReadObject()"
 method.

 // set an empty list to messages when deserialize
 private void readObject(java.io.ObjectInputStream stream)
 throws IOException, ClassNotFoundException {
 messages = new ArrayList<>();
 }
 }

 So there is an exception while the runner tried to deserialize the
 checkpoint object.
 java.lang.RuntimeException:
>org.apache.beam.sdk.coders.CoderException:
 95 unexpected extra bytes after decoding
 org.apache.beam.sdk.io.mqtt.MqttIO$MqttCheckpointMark@6764e219 at

>org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:340)
 ...






Modular IO presentation at Apachecon

2018-09-26 Thread Ismaël Mejía
Hello, today Eugene and I gave a talk about modular APIs for IO
at ApacheCon. This talk introduces some common patterns that we have
found while creating IO connectors and also presents recent ideas like
dynamic destinations, sequential writes among others using FileIO as a
use case.

In case you guys want to take a look, here is a copy of the slides, we
will probably add this to the IO authoring documentation too.

https://s.apache.org/beam-modular-io-talk


Re: [VOTE] Release 2.7.0, release candidate #3

2018-09-26 Thread Charles Chen
+1. Performed additional validations as listed in the spreadsheet.

On Wed, Sep 26, 2018, 3:24 AM Robert Bradshaw  wrote:

> +1 (binding), same verification as before.
>
> On Wed, Sep 26, 2018 at 7:36 AM Charles Chen  wrote:
>
>> To clarify, the only difference between RC2 and RC3 is the Python fix
>> https://github.com/apache/beam/pull/6494.
>>
>> This means that the Java validations from RC2 should carry over, though I
>> reran validations with RC3 anyway, as detailed on the spreadsheet.
>>
>> On Wed, Sep 26, 2018 at 12:41 AM Charles Chen  wrote:
>>
>>> As with before, please add any validation performed to the spreadsheet
>>> here:
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1675964688
>>>
>>> On Wed, Sep 26, 2018 at 12:30 AM Charles Chen  wrote:
>>>
 Hi everyone,

 Please review and vote on the release candidate #3 for the version
 2.7.0, as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific comments)

 The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release to be deployed to dist.apache.org
 [2], which is signed with the key with fingerprint 45C60AAAD115F560 [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "v2.7.0-RC3" [5],
 * website pull request listing the release and publishing the API
 reference manual [6].
 * Java artifacts were built with Gradle 4.8 and OpenJDK
 1.8.0_181-8u181-b13-1~deb9u1.
 * Python artifacts are deployed along with the source release to the
 dist.apache.org [2].

 The vote will be open for at least 72 hours. It is adopted by majority
 approval, with at least 3 PMC affirmative votes.

 Thanks,
 Charles

 [1]
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12343654
 [2] https://dist.apache.org/repos/dist/dev/beam/2.7.0
 [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
 [4]
 https://repository.apache.org/content/repositories/orgapachebeam-1048/
 [5] https://github.com/apache/beam/tree/v2.7.0-RC3
 [6] https://github.com/apache/beam-site/pull/549

>>>


Re: Modular IO presentation at Apachecon

2018-09-26 Thread Alan Myrvold
Thanks for the slides.
Really enjoyed the talk in person, especially the concept that IO is a
transformation and that a source or sink is not special, as well as the
splittable DoFn explanation.

On Wed, Sep 26, 2018 at 2:17 PM Ismaël Mejía  wrote:

> Hello, today Eugene and I gave a talk about modular APIs for IO
> at ApacheCon. This talk introduces some common patterns that we have
> found while creating IO connectors and also presents recent ideas like
> dynamic destinations, sequential writes among others using FileIO as a
> use case.
>
> In case you guys want to take a look, here is a copy of the slides, we
> will probably add this to the IO authoring documentation too.
>
> https://s.apache.org/beam-modular-io-talk
>


Re: SplittableDoFn

2018-09-26 Thread Reuven Lax
Is the synchronization over an entire work item, or just inside the restriction
tracker? My concern is that some runners (especially streaming runners)
might have hundreds or thousands of parallel work items being processed for
the same SDF (for different keys), and I'm afraid of creating
lock-contention bottlenecks.

On Fri, Sep 21, 2018 at 3:42 PM Lukasz Cwik  wrote:

> The synchronization is related to Java thread safety since there is likely
> to be concurrent access needed to a restriction tracker to properly handle
> accessing the backlog and splitting concurrently from when the user's DoFn
> is executing and updating the restriction tracker. This is similar to the
> Java thread safety needed in BoundedSource and UnboundedSource for fraction
> consumed, backlog bytes, and splitting.
>
> On Fri, Sep 21, 2018 at 2:38 PM Reuven Lax  wrote:
>
>> Can you give details on what the synchronization is per? Is it per key,
>> or global to each worker?
>>
>> On Fri, Sep 21, 2018 at 2:10 PM Lukasz Cwik  wrote:
>>
>>> As I was looking at the SplittableDoFn API while working towards making
>>> a proposal for how the backlog/splitting API could look, I found some sharp
>>> edges that could be improved.
>>>
>>> I noticed that:
>>> 1) We require users to write thread-safe code; this is something that we
>>> haven't asked of users when writing a DoFn.
>>> 2) We have "internal" methods within the RestrictionTracker that are not
>>> meant to be used by the runner.
>>>
>>> I can fix these issues by giving the user a forwarding restriction
>>> tracker[1] that provides an appropriate level of synchronization as needed
>>> and also provides the necessary observation hooks to see when a claim
>>> failed or succeeded.
>>>
>>> This requires a change to our experimental API since we need to pass
>>> a RestrictionTracker to the @ProcessElement method instead of a sub-type of
>>> RestrictionTracker.
>>> @ProcessElement
>>> processElement(ProcessContext context, OffsetRangeTracker tracker) { ... }
>>> becomes:
>>> @ProcessElement
>>> processElement(ProcessContext context, RestrictionTracker<OffsetRange, Long> tracker) { ... }
>>>
>>> This provides an additional benefit that it prevents users from working
>>> around the RestrictionTracker APIs and potentially making underlying
>>> changes to the tracker outside of the tryClaim call.
>>>
>>> Full implementation is available within this PR[2] and was wondering
>>> what people thought.
>>>
>>> 1:
>>> https://github.com/apache/beam/pull/6467/files#diff-ed95abb6bc30a9ed07faef5c3fea93f0R72
>>> 2: https://github.com/apache/beam/pull/6467
>>>
>>>
>>> On Mon, Sep 17, 2018 at 12:45 PM Lukasz Cwik  wrote:
>>>
 The changes to the API have not been proposed yet. So far it has all
 been about what is the representation and why.

 For splitting, the current idea has been about using the backlog as a
 way of telling the SplittableDoFn where to split, so it would be in terms
 of whatever the SDK decided to report.
 The runner always chooses a number for backlog that is relative to the
 SDK's reported backlog. It would be up to the SDK to round/clamp the number
 given by the Runner to represent something meaningful for itself.
 For example if the backlog that the SDK was reporting was bytes
 remaining in a file such as 500, then the Runner could provide some value
 like 212.2 which the SDK would then round to 212.
 If the backlog that the SDK was reporting was 57 pubsub messages, then the
 Runner could provide a value like 300 which would mean to read 57 values
 and then another 243 as part of the current restriction.

 I believe that BoundedSource/UnboundedSource will have wrappers added
 that provide a basic SplittableDoFn implementation so existing IOs should
 be migrated over without API changes.

 On Mon, Sep 17, 2018 at 1:11 AM Ismaël Mejía  wrote:

> Thanks a lot Luke for bringing this back to the mailing list and Ryan
> for taking
> the notes.
>
> I would like to know if there was some discussion, or if you guys have
> given
> some thought to the required changes in the SDK (API) part. What will
> be the
> equivalent of `splitAtFraction` and what should IO authors do to
> support it..
>
> On Sat, Sep 15, 2018 at 1:52 AM Lukasz Cwik  wrote:
> >
> > Thanks to everyone who joined and for the questions asked.
> >
> > Ryan graciously collected notes of the discussion:
> https://docs.google.com/document/d/1kjJLGIiNAGvDiUCMEtQbw8tyOXESvwGeGZLL-0M06fQ/edit?usp=sharing
> >
> > The summary was that bringing BoundedSource/UnboundedSource into
> using a unified backlog-reporting mechanism with optional other signals
> that Dataflow has found useful (such as is the remaining restriction
> splittable (yes, no, unknown)). Other runners can use or not. SDFs should
> report backlog and watermark as minimum bar. The backlog should use an
> arbitrary precis

Re: [VOTE] Release 2.7.0, release candidate #3

2018-09-26 Thread Ahmet Altay
+1. Thank you all!

On Wed, Sep 26, 2018 at 2:33 PM, Charles Chen  wrote:

> +1. Performed additional validations as listed in the spreadsheet.
>
>
> On Wed, Sep 26, 2018, 3:24 AM Robert Bradshaw  wrote:
>
>> +1 (binding), same verification as before.
>>
>> On Wed, Sep 26, 2018 at 7:36 AM Charles Chen  wrote:
>>
>>> To clarify, the only difference between RC2 and RC3 is the Python fix
>>> https://github.com/apache/beam/pull/6494.
>>>
>>> This means that the Java validations from RC2 should carry over, though
>>> I reran validations with RC3 anyway, as detailed on the spreadsheet.
>>>
>>> On Wed, Sep 26, 2018 at 12:41 AM Charles Chen  wrote:
>>>
 As with before, please add any validation performed to the spreadsheet
 here: https://docs.google.com/spreadsheets/d/1qk-
 N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1675964688

 On Wed, Sep 26, 2018 at 12:30 AM Charles Chen  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #3 for the version
> 2.7.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint 45C60AAAD115F560 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.7.0-RC3" [5],
> * website pull request listing the release and publishing the API
> reference manual [6].
> * Java artifacts were built with Gradle 4.8 and OpenJDK
> 1.8.0_181-8u181-b13-1~deb9u1.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Charles
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12319527&version=12343654
> [2] https://dist.apache.org/repos/dist/dev/beam/2.7.0
> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
> [4] https://repository.apache.org/content/repositories/
> orgapachebeam-1048/
> [5] https://github.com/apache/beam/tree/v2.7.0-RC3
> [6] https://github.com/apache/beam-site/pull/549
>



Re: [VOTE] Release 2.7.0, release candidate #3

2018-09-26 Thread Jean-Baptiste Onofré
+1 (binding)

Regards
JB

On Sep 26, 2018, at 18:00, Ahmet Altay wrote:
>+1. Thank you all!
>
>On Wed, Sep 26, 2018 at 2:33 PM, Charles Chen  wrote:
>
>> +1. Performed additional validations as listed in the spreadsheet.
>>
>>
>> On Wed, Sep 26, 2018, 3:24 AM Robert Bradshaw 
>wrote:
>>
>>> +1 (binding), same verification as before.
>>>
>>> On Wed, Sep 26, 2018 at 7:36 AM Charles Chen  wrote:
>>>
 To clarify, the only difference between RC2 and RC3 is the Python
>fix
 https://github.com/apache/beam/pull/6494.

 This means that the Java validations from RC2 should carry over,
>though
 I reran validations with RC3 anyway, as detailed on the
>spreadsheet.

 On Wed, Sep 26, 2018 at 12:41 AM Charles Chen 
>wrote:

> As with before, please add any validation performed to the
>spreadsheet
> here: https://docs.google.com/spreadsheets/d/1qk-
> N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1675964688
>
> On Wed, Sep 26, 2018 at 12:30 AM Charles Chen 
>wrote:
>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #3 for the
>version
>> 2.7.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific
>comments)
>>
>> The complete staging area is available for your review, which
>includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to
>dist.apache.org
>> [2], which is signed with the key with fingerprint
>45C60AAAD115F560 [3],
>> * all artifacts to be deployed to the Maven Central Repository
>[4],
>> * source code tag "v2.7.0-RC3" [5],
>> * website pull request listing the release and publishing the API
>> reference manual [6].
>> * Java artifacts were built with Gradle 4.8 and OpenJDK
>> 1.8.0_181-8u181-b13-1~deb9u1.
>> * Python artifacts are deployed along with the source release to
>the
>> dist.apache.org [2].
>>
>> The vote will be open for at least 72 hours. It is adopted by
>majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Charles
>>
>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?
>> projectId=12319527&version=12343654
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.7.0
>> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
>> [4] https://repository.apache.org/content/repositories/
>> orgapachebeam-1048/
>> [5] https://github.com/apache/beam/tree/v2.7.0-RC3
>> [6] https://github.com/apache/beam-site/pull/549
>>
>


Re: SplittableDoFn

2018-09-26 Thread Lukasz Cwik
Reuven, the synchronization is just inside the restriction tracker itself,
which is scoped per executing SplittableDoFn. Since users are currently
responsible for writing that synchronization, though, they could get it wrong.
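
As a rough illustration of why the lock stays local, here is a minimal sketch
of a forwarding tracker. It is not the code from PR 6467: TrackerSketch,
OffsetTrackerSketch, and SynchronizedForwardingTracker are hypothetical names
with a deliberately reduced interface. The point is that each wrapper
synchronizes on itself, so work items with separate trackers never share a
lock, and the wrapper can also observe whether a claim succeeded.

```java
import java.util.function.Consumer;

// Hypothetical, minimal tracker interface; Beam's real RestrictionTracker differs.
interface TrackerSketch<PositionT> {
  boolean tryClaim(PositionT position);
}

// A simple, non-thread-safe tracker over the offsets [from, to).
final class OffsetTrackerSketch implements TrackerSketch<Long> {
  private final long to;
  private long lastClaimed;

  OffsetTrackerSketch(long from, long to) {
    this.to = to;
    this.lastClaimed = from - 1;
  }

  @Override
  public boolean tryClaim(Long position) {
    if (position <= lastClaimed || position >= to) {
      return false;
    }
    lastClaimed = position;
    return true;
  }
}

// Forwarding wrapper: adds per-instance locking and claim observation hooks.
final class SynchronizedForwardingTracker<PositionT> implements TrackerSketch<PositionT> {
  private final TrackerSketch<PositionT> delegate;
  private final Consumer<PositionT> onClaimed;
  private final Consumer<PositionT> onClaimFailed;

  SynchronizedForwardingTracker(
      TrackerSketch<PositionT> delegate,
      Consumer<PositionT> onClaimed,
      Consumer<PositionT> onClaimFailed) {
    this.delegate = delegate;
    this.onClaimed = onClaimed;
    this.onClaimFailed = onClaimFailed;
  }

  @Override
  public synchronized boolean tryClaim(PositionT position) {
    // The lock is this wrapper instance, so it is never shared across trackers.
    boolean claimed = delegate.tryClaim(position);
    if (claimed) {
      onClaimed.accept(position);
    } else {
      onClaimFailed.accept(position);
    }
    return claimed;
  }

  public static void main(String[] args) {
    SynchronizedForwardingTracker<Long> tracker =
        new SynchronizedForwardingTracker<>(
            new OffsetTrackerSketch(0, 2),
            p -> System.out.println("claimed " + p),
            p -> System.out.println("failed " + p));
    tracker.tryClaim(0L);
    tracker.tryClaim(1L);
    tracker.tryClaim(2L); // outside [0, 2), so the claim fails
  }
}
```

In this sketch the user-written tracker stays free of locking code, which is
the burden the forwarding approach is meant to remove.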

On Wed, Sep 26, 2018 at 2:51 PM Reuven Lax  wrote:

> Is synchronization over an entire work item, or just inside the restriction
> tracker? My concern is that some runners (especially streaming runners)
> might have hundreds or thousands of parallel work items being processed for
> the same SDF (for different keys), and I'm afraid of creating
> lock-contention bottlenecks.
>
> On Fri, Sep 21, 2018 at 3:42 PM Lukasz Cwik  wrote:
>
>> The synchronization is related to Java thread safety since there is
>> likely to be concurrent access needed to a restriction tracker to properly
>> handle accessing the backlog and splitting concurrently from when the user's
>> DoFn is executing and updating the restriction tracker. This is similar to
>> the Java thread safety needed in BoundedSource and UnboundedSource for
>> fraction consumed, backlog bytes, and splitting.
>>
>> On Fri, Sep 21, 2018 at 2:38 PM Reuven Lax  wrote:
>>
>>> Can you give details on what the synchronization is per? Is it per key,
>>> or global to each worker?
>>>
>>> On Fri, Sep 21, 2018 at 2:10 PM Lukasz Cwik  wrote:
>>>
 As I was looking at the SplittableDoFn API while working towards making
 a proposal for how the backlog/splitting API could look, I found some sharp
 edges that could be improved.

 I noticed that:
 1) We require users to write thread-safe code, which is something that
 we haven't asked of users when writing a DoFn.
 2) We have "internal" methods within the RestrictionTracker that are not
 meant to be used by the runner.

 I can fix these issues by giving the user a forwarding restriction
 tracker[1] that provides an appropriate level of synchronization as needed
 and also provides the necessary observation hooks to see when a claim
 failed or succeeded.
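
 To make the forwarding idea above concrete, here is a minimal sketch. It is
 not the code from the PR: every name in it (SimpleTracker, ClaimObserver,
 SimpleOffsetTracker, RecordingObserver, SynchronizedForwardingTracker) is a
 hypothetical stand-in rather than one of Beam's classes, and the half-open
 offset range is an assumption made for illustration. The point is only that
 the wrapper owns the lock and the observation hooks, so the wrapped tracker
 can stay single-threaded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified tracker interface; not Beam's RestrictionTracker. */
interface SimpleTracker<PositionT> {
  boolean tryClaim(PositionT position);
}

/** Hypothetical hook that lets a caller observe the outcome of each claim. */
interface ClaimObserver<PositionT> {
  void onClaimed(PositionT position);

  void onClaimFailed(PositionT position);
}

/** Non-thread-safe tracker over the half-open offset range [from, to). */
final class SimpleOffsetTracker implements SimpleTracker<Long> {
  private final long to;
  private long lastClaimed;

  SimpleOffsetTracker(long from, long to) {
    this.to = to;
    this.lastClaimed = from - 1;
  }

  @Override
  public boolean tryClaim(Long position) {
    if (position <= lastClaimed) {
      throw new IllegalArgumentException("positions must be claimed in increasing order");
    }
    if (position >= to) {
      return false; // outside the restriction, nothing is claimed
    }
    lastClaimed = position;
    return true;
  }
}

/** Observer that records every outcome as a string, for inspection. */
final class RecordingObserver<PositionT> implements ClaimObserver<PositionT> {
  final List<String> events = new ArrayList<>();

  @Override
  public void onClaimed(PositionT position) {
    events.add("claimed:" + position);
  }

  @Override
  public void onClaimFailed(PositionT position) {
    events.add("failed:" + position);
  }
}

/** Forwards to a delegate while holding a lock, then reports the outcome. */
final class SynchronizedForwardingTracker<PositionT> implements SimpleTracker<PositionT> {
  private final SimpleTracker<PositionT> delegate;
  private final ClaimObserver<PositionT> observer;

  SynchronizedForwardingTracker(
      SimpleTracker<PositionT> delegate, ClaimObserver<PositionT> observer) {
    this.delegate = delegate;
    this.observer = observer;
  }

  @Override
  public synchronized boolean tryClaim(PositionT position) {
    // The delegate is only ever touched while this wrapper's monitor is held.
    boolean claimed = delegate.tryClaim(position);
    if (claimed) {
      observer.onClaimed(position);
    } else {
      observer.onClaimFailed(position);
    }
    return claimed;
  }
}
```

 In this sketch the observer callbacks run while the lock is held, which keeps
 the observed order equal to the claim order. Whether that trade-off is right
 for a real runner is a separate question.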

 This requires a change to our experimental API since we need to pass
 a RestrictionTracker to the @ProcessElement method instead of a sub-type of
 RestrictionTracker.
 @ProcessElement
 processElement(ProcessContext context, OffsetRangeTracker tracker) { ... }
 becomes:
 @ProcessElement
 processElement(ProcessContext context, RestrictionTracker<OffsetRange, Long> tracker) { ... }

 This provides an additional benefit that it prevents users from working
 around the RestrictionTracker APIs and potentially making underlying
 changes to the tracker outside of the tryClaim call.

 A full implementation is available in this PR [2], and I was wondering
 what people think.

 1:
 https://github.com/apache/beam/pull/6467/files#diff-ed95abb6bc30a9ed07faef5c3fea93f0R72
 2: https://github.com/apache/beam/pull/6467


 On Mon, Sep 17, 2018 at 12:45 PM Lukasz Cwik  wrote:

> The changes to the API have not been proposed yet. So far it has all
> been about what is the representation and why.
>
> For splitting, the current idea has been about using the backlog as a
> way of telling the SplittableDoFn where to split, so it would be in terms
> of whatever the SDK decided to report.
> The runner always chooses a number for backlog that is relative to the
> SDK's reported backlog. It would be up to the SDK to round/clamp the number
> given by the Runner to represent something meaningful for itself.
> For example, if the backlog that the SDK was reporting was bytes
> remaining in a file, such as 500, then the Runner could provide some value
> like 212.2, which the SDK would then round to 212.
> If the backlog that the SDK was reporting was 57 pubsub messages, then the
> Runner could provide a value like 300, which would mean to read 57 values
> and then another 243 as part of the current restriction.
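
The arithmetic in the two examples above can be written out as a small sketch.
This is only an illustration: the class and method names (BacklogSplit,
toSdkUnits, beyondReported) are hypothetical, and rounding to the nearest
whole unit is an assumption, since the message only says the SDK would
"round/clamp" the value.

```java
/** Hypothetical helper illustrating the round/clamp step; not a Beam API. */
final class BacklogSplit {
  private BacklogSplit() {}

  /**
   * Converts the runner-provided value into whole SDK units (bytes, messages, ...).
   * Assumption: round to the nearest unit and never go below zero.
   */
  static long toSdkUnits(double runnerValue) {
    return Math.max(0L, Math.round(runnerValue));
  }

  /**
   * Number of units to read beyond the backlog the SDK currently reports, or zero
   * when the runner's value falls inside the reported backlog.
   */
  static long beyondReported(long reportedBacklog, double runnerValue) {
    return Math.max(0L, toSdkUnits(runnerValue) - reportedBacklog);
  }
}
```

With the numbers from the message, BacklogSplit.toSdkUnits(212.2) gives 212
for the file case, and BacklogSplit.beyondReported(57, 300) gives the 243
additional messages for the pubsub case.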
>
> I believe that BoundedSource/UnboundedSource will have wrappers added
> that provide a basic SplittableDoFn implementation so existing IOs should
> be migrated over without API changes.
>
> On Mon, Sep 17, 2018 at 1:11 AM Ismaël Mejía 
> wrote:
>
>> Thanks a lot Luke for bringing this back to the mailing list and Ryan
>> for taking
>> the notes.
>>
>> I would like to know if there was some discussion, or if you guys
>> have given
>> some thought to the required changes in the SDK (API) part. What will
>> be the
>> equivalent of `splitAtFraction` and what should IO authors do to
>> support it..
>>
>> On Sat, Sep 15, 2018 at 1:52 AM Lukasz Cwik  wrote:
>> >
>> > Thanks to everyone who joined and for the questions asked.
>> >
>> > Ryan graciously collected notes of the discussion:
>> https://docs.google.com/document/d/1kjJLGIiNAGvDiUCMEtQbw8tyOXESvwGeGZLL-0M06fQ/edit?usp=sharing
>> >
>> > The summary was that 

Re: Bug of the MqttIO.java

2018-09-26 Thread flyisland
Hi, My jira id is "flyisland", thanks!

On Thu, Sep 27, 2018 at 3:25 AM Jean-Baptiste Onofré 
wrote:

> Yes please. Create a jira, I will tackle that. Thanks.
>
> Regards
> JB
> Le 26 sept. 2018, à 15:12, Lukasz Cwik  a écrit:
>>
>> Yes, please create a JIRA account on issues.apache.org
>>
>> Once you have one, please tell me the JIRA id and I'll add you as a
>> contributor to Apache Beam and assign BEAM-5496 to you.
>>
>> Also this guide https://beam.apache.org/contribute/ helps people learn
>> how to contribute. It has useful information about how to build, test, open
>> PRs, find reviewers.
>>
>> On Tue, Sep 25, 2018 at 9:37 PM flyisland  wrote:
>>
>>> Cool, I'd like to.
>>>
>>> Is there anything I should do first, like create an account?
>>>
>>> On Tue, Sep 25, 2018 at 11:46 PM Lukasz Cwik  wrote:
>>>
 Thanks, I filed https://issues.apache.org/jira/browse/BEAM-5496 with
 the details of your report.

 Would you be interested in submitting a patch with a test that
 exercises the bug?

 On Tue, Sep 25, 2018 at 1:21 AM flyisland  wrote:

> Hi
>
> There is a bug in the built-in MqttIO; please check <
> https://github.com/apache/beam/blob/master/sdks/java/io/mqtt/src/main/java/org/apache/beam/sdk/io/mqtt/MqttIO.java#L336>.
> This readObject() method forgets to invoke the "stream.defaultReadObject()"
> method.
>
> // set an empty list to messages when deserialize
> private void readObject(java.io.ObjectInputStream stream)
> throws IOException, ClassNotFoundException {
> messages = new ArrayList<>();
> }
> }
>
> As a result, an exception is thrown when the runner tries to deserialize the
> checkpoint object:
> java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException:
> 95 unexpected extra bytes after decoding
> org.apache.beam.sdk.io.mqtt.MqttIO$MqttCheckpointMark@6764e219 at
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:340)
> ...
>
>
>
>
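
For readers following this thread, here is a minimal, self-contained sketch of
the pattern the report refers to: a custom readObject() that first calls
stream.defaultReadObject() and only then resets a transient field. The class
ExampleCheckpointMark, its fields, and the roundTrip helper are hypothetical
and written for illustration only; this is not the MqttIO source and not
necessarily the patch that will be applied for BEAM-5496.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a checkpoint mark; not the actual MqttIO class. */
final class ExampleCheckpointMark implements Serializable {
  private static final long serialVersionUID = 1L;

  // Non-transient state that must survive serialization.
  String clientId;
  // Transient state that is rebuilt, not restored, on deserialization.
  transient List<String> messages = new ArrayList<>();

  ExampleCheckpointMark(String clientId) {
    this.clientId = clientId;
  }

  private void readObject(ObjectInputStream stream)
      throws IOException, ClassNotFoundException {
    // Read the non-transient fields first; without this call the serialized
    // field data for this class is never consumed from the stream.
    stream.defaultReadObject();
    // Then set an empty list for the transient field.
    messages = new ArrayList<>();
  }

  /** Serializes and deserializes the mark in memory (illustration helper). */
  static ExampleCheckpointMark roundTrip(ExampleCheckpointMark mark) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(mark);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (ExampleCheckpointMark) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }
}
```

After ExampleCheckpointMark.roundTrip(new ExampleCheckpointMark("client-1")),
the copy keeps its clientId and has a fresh, empty messages list.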


Re: Modular IO presentation at Apachecon

2018-09-26 Thread Thomas Weise
Thanks for sharing. I'm looking forward to seeing the recording of the talk
(hopefully!).

This will be very helpful for Beam users. IO still is typically the
unexpectedly hard and time consuming part of authoring pipelines.


On Wed, Sep 26, 2018 at 2:48 PM Alan Myrvold  wrote:

> Thanks for the slides.
> Really enjoyed the talk in person, especially the concept that IO is a
> transformation, and a source or sink are not special and the splittable
> DoFn explanation.
>
> On Wed, Sep 26, 2018 at 2:17 PM Ismaël Mejía  wrote:
>
>> Hello, today Eugene and I gave a talk about modular APIs for IO
>> at ApacheCon. This talk introduces some common patterns that we have
>> found while creating IO connectors and also presents recent ideas like
>> dynamic destinations, sequential writes among others using FileIO as a
>> use case.
>>
>> In case you guys want to take a look, here is a copy of the slides, we
>> will probably add this to the IO authoring documentation too.
>>
>> https://s.apache.org/beam-modular-io-talk
>>
>


Re: Modular IO presentation at Apachecon

2018-09-26 Thread Ankur Goenka
Thanks for sharing. Great slides, and looking forward to the recorded session.

Do we have a central location where we link all the beam presentations for
discoverability?

On Wed, Sep 26, 2018 at 9:35 PM Thomas Weise  wrote:

> Thanks for sharing. I'm looking forward to seeing the recording of the talk
> (hopefully!).
>
> This will be very helpful for Beam users. IO still is typically the
> unexpectedly hard and time consuming part of authoring pipelines.
>
>
> On Wed, Sep 26, 2018 at 2:48 PM Alan Myrvold  wrote:
>
>> Thanks for the slides.
>> Really enjoyed the talk in person, especially the concept that IO is a
>> transformation, and a source or sink are not special and the splittable
>> DoFn explanation.
>>
>> On Wed, Sep 26, 2018 at 2:17 PM Ismaël Mejía  wrote:
>>
>>> Hello, today Eugene and I gave a talk about modular APIs for IO
>>> at ApacheCon. This talk introduces some common patterns that we have
>>> found while creating IO connectors and also presents recent ideas like
>>> dynamic destinations, sequential writes among others using FileIO as a
>>> use case.
>>>
>>> In case you guys want to take a look, here is a copy of the slides, we
>>> will probably add this to the IO authoring documentation too.
>>>
>>> https://s.apache.org/beam-modular-io-talk
>>>
>>