[jira] [Work logged] (BEAM-5730) Migrate Java test to use a staged worker jar
[ https://issues.apache.org/jira/browse/BEAM-5730?focusedWorklogId=155253&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155253 ]

ASF GitHub Bot logged work on BEAM-5730:

    Author: ASF GitHub Bot
    Created on: 17/Oct/18 03:30
    Start Date: 17/Oct/18 03:30
    Worklog Time Spent: 10m
    Work Description: boyuanzz commented on issue #6694: [BEAM-5730] Migrate ITs using DataflowRunner to use custom worker
URL: https://github.com/apache/beam/pull/6694#issuecomment-430477290

All tests passed. Please review this PR @lukecwik

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id: (was: 155253)
    Time Spent: 40m (was: 0.5h)

> Migrate Java test to use a staged worker jar
>
> Key: BEAM-5730
> URL: https://issues.apache.org/jira/browse/BEAM-5730
> Project: Beam
> Issue Type: Sub-task
> Components: runner-dataflow
> Reporter: Boyuan Zhang
> Assignee: Boyuan Zhang
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4505) Archive/Retire apache/beam-site repository
[ https://issues.apache.org/jira/browse/BEAM-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Wegner resolved BEAM-4505.

    Resolution: Fixed
    Fix Version/s: Not applicable

> Archive/Retire apache/beam-site repository
> --
>
> Key: BEAM-4505
> URL: https://issues.apache.org/jira/browse/BEAM-4505
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Scott Wegner
> Assignee: Scott Wegner
> Priority: Major
> Labels: beam-site-automation-reliability
> Fix For: Not applicable
>
> Time Spent: 40m
> Remaining Estimate: 0h

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-4493) Beam-Site Automation Reliability
[ https://issues.apache.org/jira/browse/BEAM-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Wegner closed BEAM-4493.
--
    Resolution: Fixed
    Fix Version/s: Not applicable

This migration is now complete!

> Beam-Site Automation Reliability
>
> Key: BEAM-4493
> URL: https://issues.apache.org/jira/browse/BEAM-4493
> Project: Beam
> Issue Type: Improvement
> Components: website
> Reporter: Scott Wegner
> Assignee: Scott Wegner
> Priority: Major
> Labels: beam-site-automation-reliability
> Fix For: Not applicable
>
> https://s.apache.org/beam-site-automation

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-4504) Disconnect mergebot from apache/beam-site repository
[ https://issues.apache.org/jira/browse/BEAM-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Wegner closed BEAM-4504.
--
    Resolution: Fixed
    Fix Version/s: Not applicable

> Disconnect mergebot from apache/beam-site repository
> -
>
> Key: BEAM-4504
> URL: https://issues.apache.org/jira/browse/BEAM-4504
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Scott Wegner
> Assignee: Scott Wegner
> Priority: Major
> Labels: beam-site-automation-reliability
> Fix For: Not applicable
>
> Time Spent: 40m
> Remaining Estimate: 0h

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4504) Disconnect mergebot from apache/beam-site repository
[ https://issues.apache.org/jira/browse/BEAM-4504?focusedWorklogId=155245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155245 ]

ASF GitHub Bot logged work on BEAM-4504:

    Author: ASF GitHub Bot
    Created on: 17/Oct/18 02:16
    Start Date: 17/Oct/18 02:16
    Worklog Time Spent: 10m
    Work Description: swegner closed pull request #6713: [BEAM-4504] Retire Jenkins jobs from apache/beam-site repository
URL: https://github.com/apache/beam/pull/6713

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/.test-infra/jenkins/CommonJobProperties.groovy b/.test-infra/jenkins/CommonJobProperties.groovy
index d098e1a8c7b..641cdfbd051 100644
--- a/.test-infra/jenkins/CommonJobProperties.groovy
+++ b/.test-infra/jenkins/CommonJobProperties.groovy
@@ -24,66 +24,15 @@ class CommonJobProperties {

   static String checkoutDir = 'src'

-  static void setSCM(def context, String repositoryName, boolean allowRemotePoll = true) {
-    context.scm {
-      git {
-        remote {
-          // Double quotes here mean ${repositoryName} is interpolated.
-          github("apache/${repositoryName}")
-          // Single quotes here mean that ${ghprbPullId} is not interpolated and instead passed
-          // through to Jenkins where it refers to the environment variable.
-          refspec('+refs/heads/*:refs/remotes/origin/* ' +
-                  '+refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*')
-        }
-        branch('${sha1}')
-        extensions {
-          cleanAfterCheckout()
-          relativeTargetDirectory(checkoutDir)
-          if (!allowRemotePoll) {
-            disableRemotePoll()
-          }
-        }
-      }
-    }
-  }
-
-  // Sets common top-level job properties for website repository jobs.
-  static void setTopLevelWebsiteJobProperties(def context,
-                                              String branch = 'asf-site',
-                                              int timeout = 100) {
-    setTopLevelJobProperties(
-        context,
-        'beam-site',
-        branch,
-        timeout)
-  }
-
   // Sets common top-level job properties for main repository jobs.
   static void setTopLevelMainJobProperties(def context,
-                                           String branch = 'master',
-                                           int timeout = 100,
+                                           String defaultBranch = 'master',
+                                           int defaultTimeout = 100,
                                            boolean allowRemotePoll = true,
                                            String jenkinsExecutorLabel = 'beam') {
-    setTopLevelJobProperties(
-        context,
-        'beam',
-        branch,
-        timeout,
-        allowRemotePoll,
-        jenkinsExecutorLabel)
-  }
-
-  // Sets common top-level job properties. Accessed through one of the above
-  // methods to protect jobs from internal details of param defaults.
-  private static void setTopLevelJobProperties(def context,
-                                               String repositoryName,
-                                               String defaultBranch,
-                                               int defaultTimeout,
-                                               boolean allowRemotePoll = true,
-                                               String jenkinsExecutorLabel = 'beam') {
     // GitHub project.
     context.properties {
-      githubProjectUrl('https://github.com/apache/' + repositoryName + '/')
+      githubProjectUrl('https://github.com/apache/beam/')
     }

     // Set JDK version.
@@ -98,7 +47,25 @@ class CommonJobProperties {
     }

     // Source code management.
-    setSCM(context, repositoryName, allowRemotePoll)
+    context.scm {
+      git {
+        remote {
+          github("apache/beam")
+          // Single quotes here mean that ${ghprbPullId} is not interpolated and instead passed
+          // through to Jenkins where it refers to the environment variable.
+          refspec('+refs/heads/*:refs/remotes/origin/* ' +
+                  '+refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*')
+        }
+        branch('${sha1}')
+        extensions {
+          cleanAfterCheckout()
+          relativeTargetDirectory(checkoutDir)
+          if (!allowRemotePoll) {
+            disableRemotePoll()
+          }
+        }
+      }
+    }

     context.parameters {
       // This is a recommended setup if you want to run the job manually. The
@@ -196,14 +163,6 @@ class CommonJobProperties {
     context.switches("-Dorg.gradle.j
[jira] [Work logged] (BEAM-4130) Portable Flink runner JobService entry point in a Docker container
[ https://issues.apache.org/jira/browse/BEAM-4130?focusedWorklogId=155238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155238 ]

ASF GitHub Bot logged work on BEAM-4130:

    Author: ASF GitHub Bot
    Created on: 17/Oct/18 01:14
    Start Date: 17/Oct/18 01:14
    Worklog Time Spent: 10m
    Work Description: tweise closed pull request #6703: [BEAM-4130] Add tests for FlinkJobServerDriver
URL: https://github.com/apache/beam/pull/6703

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java
index 34f2edb5abb..93dc6f0121c 100644
--- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java
+++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java
@@ -17,6 +17,7 @@
  */
 package org.apache.beam.runners.flink;

+import com.google.common.annotations.VisibleForTesting;
 import com.google.common.util.concurrent.ListeningExecutorService;
 import com.google.common.util.concurrent.MoreExecutors;
 import com.google.common.util.concurrent.ThreadFactoryBuilder;
@@ -45,7 +46,7 @@
   private static final Logger LOG = LoggerFactory.getLogger(FlinkJobServerDriver.class);

   private final ListeningExecutorService executor;
-  private final ServerConfiguration configuration;
+  @VisibleForTesting ServerConfiguration configuration;
   private final ServerFactory jobServerFactory;
   private final ServerFactory artifactServerFactory;
   private GrpcFnServer jobServer;
@@ -54,34 +55,34 @@

   /** Configuration for the jobServer. */
   public static class ServerConfiguration {
     @Option(name = "--job-host", usage = "The job server host name")
-    private String host = "";
+    String host = "localhost";

     @Option(
       name = "--job-port",
       usage = "The job service port. 0 to use a dynamic port. (Default: 8099)"
     )
-    private int port = 8099;
+    int port = 8099;

     @Option(
       name = "--artifact-port",
       usage = "The artifact service port. 0 to use a dynamic port. (Default: 8098)"
     )
-    private int artifactPort = 8098;
+    int artifactPort = 8098;

     @Option(name = "--artifacts-dir", usage = "The location to store staged artifact files")
-    private String artifactStagingPath =
+    String artifactStagingPath =
         Paths.get(System.getProperty("java.io.tmpdir"), "beam-artifact-staging").toString();

     @Option(
       name = "--clean-artifacts-per-job",
       usage = "When true, remove each job's staged artifacts when it completes"
     )
-    private Boolean cleanArtifactsPerJob = false;
+    boolean cleanArtifactsPerJob = false;

     @Option(name = "--flink-master-url", usage = "Flink master url to submit job.")
-    private String flinkMasterUrl = "[auto]";
+    String flinkMasterUrl = "[auto]";

-    public String getFlinkMasterUrl() {
+    String getFlinkMasterUrl() {
       return this.flinkMasterUrl;
     }

@@ -89,9 +90,9 @@ public String getFlinkMasterUrl() {
       name = "--sdk-worker-parallelism",
       usage = "Default parallelism for SDK worker processes (see portable pipeline options)"
     )
-    private String sdkWorkerParallelism = PortablePipelineOptions.SDK_WORKER_PARALLELISM_PIPELINE;
+    String sdkWorkerParallelism = PortablePipelineOptions.SDK_WORKER_PARALLELISM_PIPELINE;

-    public String getSdkWorkerParallelism() {
+    String getSdkWorkerParallelism() {
       return this.sdkWorkerParallelism;
     }
   }
@@ -209,7 +210,7 @@ public void stop() {
               .build();
       jobServiceGrpcFnServer = GrpcFnServer.create(service, descriptor, jobServerFactory);
     }
-    LOG.info("JobServer started on {}", jobServiceGrpcFnServer.getApiServiceDescriptor().getUrl());
+    LOG.info("JobService started on {}", jobServiceGrpcFnServer.getApiServiceDescriptor().getUrl());
     return jobServiceGrpcFnServer;
   }

diff --git a/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkJobServerDriverTest.java b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkJobServerDriverTest.java
new file mode 100644
index 000..fc44d8edf31
--- /dev/null
+++ b/runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkJobServerDriverTest.java
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache L
[jira] [Assigned] (BEAM-5760) Portable Flink support for maxBundleSize/maxBundleMillis
[ https://issues.apache.org/jira/browse/BEAM-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Weise reassigned BEAM-5760:
--
    Assignee: Thomas Weise

> Portable Flink support for maxBundleSize/maxBundleMillis
>
> Key: BEAM-5760
> URL: https://issues.apache.org/jira/browse/BEAM-5760
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Affects Versions: 2.8.0
> Reporter: Thomas Weise
> Assignee: Thomas Weise
> Priority: Major
> Labels: portability-flink
> Fix For: 2.9.0
>
> The portable runner needs to support larger bundles in streaming mode.
> Currently every element is a separate bundle, which is very inefficient due
> to the per bundle SDK worker overhead. The old Java SDK runner already
> supports these parameters.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
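
To illustrate the two limits the issue asks for, here is a minimal sketch in Python of a buffer that closes a bundle once it holds `max_bundle_size` elements or once `max_bundle_millis` have elapsed since the bundle was opened. It is not the Flink runner's implementation: the class name `BundleBuffer`, the `on_bundle` callback and the injectable `clock` are hypothetical names introduced only for this example.

```python
import time
from typing import Any, Callable, List


class BundleBuffer:
    """Groups elements into bundles bounded by a count and an age limit.

    Hypothetical illustration only; not part of Beam or the Flink runner.
    """

    def __init__(self, max_bundle_size: int, max_bundle_millis: int,
                 on_bundle: Callable[[List[Any]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_bundle_size
        self._max_age_s = max_bundle_millis / 1000.0
        self._on_bundle = on_bundle
        self._clock = clock  # injectable so the age limit can be tested
        self._elements: List[Any] = []
        self._opened_at = 0.0

    def add(self, element: Any) -> None:
        # The first element of a bundle starts the age timer.
        if not self._elements:
            self._opened_at = self._clock()
        self._elements.append(element)
        too_big = len(self._elements) >= self._max_size
        too_old = self._clock() - self._opened_at >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        # Hands the current bundle to the callback and starts a new one.
        if self._elements:
            bundle, self._elements = self._elements, []
            self._on_bundle(bundle)
```

In this sketch the age limit is only checked when a new element arrives; a real streaming runner would presumably also need a timer so that a quiet input does not hold a partial bundle open indefinitely.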
[jira] [Work logged] (BEAM-5775) Make the spark runner not serialize data unless spark is spilling to disk
[ https://issues.apache.org/jira/browse/BEAM-5775?focusedWorklogId=155233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155233 ]

ASF GitHub Bot logged work on BEAM-5775:

    Author: ASF GitHub Bot
    Created on: 17/Oct/18 00:51
    Start Date: 17/Oct/18 00:51
    Worklog Time Spent: 10m
    Work Description: mikekap opened a new pull request #6714: [BEAM-5775] Spark: implement a custom class to lazily encode values for persistence.
URL: https://github.com/apache/beam/pull/6714

Spark's `StorageLevel` is the preferred mechanism to decide what is serialized when and where. With this change, Beam respects Spark's wish to keep data deserialized in memory, even if the storage level *may* swap to disk (e.g. MEMORY_AND_DISK).

This PR also drive-by fixes using the `MEMORY_ONLY_2` storage level. The code previously assumed that no serialization was necessary, which isn't strictly true since the `_2` means "replicate to other nodes" - i.e. serialize over network.

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/) Python 
| [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache
[jira] [Work logged] (BEAM-5058) Python precommits should run E2E tests
[ https://issues.apache.org/jira/browse/BEAM-5058?focusedWorklogId=155230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155230 ]

ASF GitHub Bot logged work on BEAM-5058:

    Author: ASF GitHub Bot
    Created on: 17/Oct/18 00:48
    Start Date: 17/Oct/18 00:48
    Worklog Time Spent: 10m
    Work Description: udim commented on a change in pull request #6707: [BEAM-5058] Run basic ITs in Python Precommit
URL: https://github.com/apache/beam/pull/6707#discussion_r225749774

## File path: sdks/python/build.gradle ##
@@ -226,6 +228,26 @@ task directRunnerIT(dependsOn: 'installGcpTest') {
   }
 }

+task precommitIT(dependsOn: ['installGcpTest', 'sdist']) {

Review comment: This task should run in parallel to the rest of the precommit tasks. This can be done if it is placed in a separate sub-project. Sub-projects are created by creating a new build.gradle file in a subdirectory, such as `sdks/python/precommit/dataflow/build.gradle`. (example of making tests parallel: https://github.com/apache/beam/pull/5731/files)

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id: (was: 155230)
    Time Spent: 0.5h (was: 20m)

> Python precommits should run E2E tests
> --
>
> Key: BEAM-5058
> URL: https://issues.apache.org/jira/browse/BEAM-5058
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core, testing
> Reporter: Udi Meiri
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> According to [https://beam.apache.org/contribute/testing/] (which I'm working
> on), end-to-end tests should be run in precommit on each combination of
> \{batch, streaming}x\{SDK language}x\{supported runner}.
> At least 2 tests need to be added to Python's precommit: wordcount and
> wordcount_streaming on Dataflow, and possibly on other supported runners
> (direct runner and new runners plz).
> These tests should be configured to run from a Gradle sub-project, so that
> they're run in parallel to the unit tests.
> Example that parallelizes Java precommit integration tests:
> [https://github.com/apache/beam/pull/5731]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
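
The coverage matrix described in the issue can be enumerated mechanically. The following is a minimal sketch; the lists of SDK languages and runners are placeholder values chosen for illustration, not the project's authoritative list of supported combinations, and `precommit_matrix` is a hypothetical helper name.

```python
from itertools import product

MODES = ["batch", "streaming"]
SDK_LANGUAGES = ["java", "python", "go"]  # placeholder values
RUNNERS = ["direct", "dataflow"]          # placeholder values


def precommit_matrix(modes, sdks, runners):
    """Returns every (mode, sdk, runner) combination as a list of tuples."""
    return list(product(modes, sdks, runners))


if __name__ == "__main__":
    for mode, sdk, runner in precommit_matrix(MODES, SDK_LANGUAGES, RUNNERS):
        print(f"{sdk}: {mode} wordcount on the {runner} runner")
```

The number of combinations is the product of the three list lengths, which is why the issue asks for these tests to run in parallel with the unit tests rather than after them.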
[jira] [Work logged] (BEAM-5058) Python precommits should run E2E tests
[ https://issues.apache.org/jira/browse/BEAM-5058?focusedWorklogId=155231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155231 ]

ASF GitHub Bot logged work on BEAM-5058:

    Author: ASF GitHub Bot
    Created on: 17/Oct/18 00:48
    Start Date: 17/Oct/18 00:48
    Worklog Time Spent: 10m
    Work Description: udim commented on a change in pull request #6707: [BEAM-5058] Run basic ITs in Python Precommit
URL: https://github.com/apache/beam/pull/6707#discussion_r225750474

## File path: sdks/python/build.gradle ##
@@ -226,6 +228,26 @@ task directRunnerIT(dependsOn: 'installGcpTest') {
   }
 }

+task precommitIT(dependsOn: ['installGcpTest', 'sdist']) {
+  doLast {
+    // List of integration tests running in Python PreCommit.
+    def precommitTests = [
+        "apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it",
+        "apache_beam.examples.streaming_wordcount_it_test:StreamingWordCountIT.test_streaming_wordcount_it",
+    ]
+    def testOpts = [

Review comment: No need for `--attr=IT`, `--nologcapture`, `--nocapture`?

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id: (was: 155231)
    Time Spent: 0.5h (was: 20m)

> Python precommits should run E2E tests
> --
>
> Key: BEAM-5058
> URL: https://issues.apache.org/jira/browse/BEAM-5058
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core, testing
> Reporter: Udi Meiri
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> According to [https://beam.apache.org/contribute/testing/] (which I'm working
> on), end-to-end tests should be run in precommit on each combination of
> \{batch, streaming}x\{SDK language}x\{supported runner}.
> At least 2 tests need to be added to Python's precommit: wordcount and
> wordcount_streaming on Dataflow, and possibly on other supported runners
> (direct runner and new runners plz).
> These tests should be configured to run from a Gradle sub-project, so that
> they're run in parallel to the unit tests.
> Example that parallelizes Java precommit integration tests:
> [https://github.com/apache/beam/pull/5731]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5058) Python precommits should run E2E tests
[ https://issues.apache.org/jira/browse/BEAM-5058?focusedWorklogId=155229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155229 ]

ASF GitHub Bot logged work on BEAM-5058:

    Author: ASF GitHub Bot
    Created on: 17/Oct/18 00:48
    Start Date: 17/Oct/18 00:48
    Worklog Time Spent: 10m
    Work Description: udim commented on a change in pull request #6707: [BEAM-5058] Run basic ITs in Python Precommit
URL: https://github.com/apache/beam/pull/6707#discussion_r225751687

## File path: sdks/python/build.gradle ##
@@ -226,6 +228,26 @@ task directRunnerIT(dependsOn: 'installGcpTest') {
   }
 }

+task precommitIT(dependsOn: ['installGcpTest', 'sdist']) {
+  doLast {
+    // List of integration tests running in Python PreCommit.
+    def precommitTests = [
+        "apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it",
+        "apache_beam.examples.streaming_wordcount_it_test:StreamingWordCountIT.test_streaming_wordcount_it",
+    ]
+    def testOpts = [
+        "--tests=${precommitTests.join(',')}",
+        "--processes=4",
+        "--process-timeout=1800", // Total timeout includes all tests run.
+    ]
+
+    exec {
+      executable 'sh'
+      args '-c', ". ${envdir}/bin/activate && ./scripts/run_integration_test.sh --test_opts \"${testOpts.join(' ')}\""

Review comment: After parallelizing this task, please add a copy of it in `sdks/python/precommit/directrunner/build.gradle` with the option `--runner=TestDirectRunner`.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id: (was: 155229)
    Time Spent: 0.5h (was: 20m)

> Python precommits should run E2E tests
> --
>
> Key: BEAM-5058
> URL: https://issues.apache.org/jira/browse/BEAM-5058
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core, testing
> Reporter: Udi Meiri
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> According to [https://beam.apache.org/contribute/testing/] (which I'm working
> on), end-to-end tests should be run in precommit on each combination of
> \{batch, streaming}x\{SDK language}x\{supported runner}.
> At least 2 tests need to be added to Python's precommit: wordcount and
> wordcount_streaming on Dataflow, and possibly on other supported runners
> (direct runner and new runners plz).
> These tests should be configured to run from a Gradle sub-project, so that
> they're run in parallel to the unit tests.
> Example that parallelizes Java precommit integration tests:
> [https://github.com/apache/beam/pull/5731]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5058) Python precommits should run E2E tests
[ https://issues.apache.org/jira/browse/BEAM-5058?focusedWorklogId=155232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155232 ]

ASF GitHub Bot logged work on BEAM-5058:

    Author: ASF GitHub Bot
    Created on: 17/Oct/18 00:48
    Start Date: 17/Oct/18 00:48
    Worklog Time Spent: 10m
    Work Description: udim commented on a change in pull request #6707: [BEAM-5058] Run basic ITs in Python Precommit
URL: https://github.com/apache/beam/pull/6707#discussion_r225750634

## File path: sdks/python/build.gradle ##
@@ -226,6 +228,26 @@ task directRunnerIT(dependsOn: 'installGcpTest') {
   }
 }

+task precommitIT(dependsOn: ['installGcpTest', 'sdist']) {

Review comment: sdist should run before installGcpTest. I believe gradle has a rule for that.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id: (was: 155232)
    Time Spent: 40m (was: 0.5h)

> Python precommits should run E2E tests
> --
>
> Key: BEAM-5058
> URL: https://issues.apache.org/jira/browse/BEAM-5058
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core, testing
> Reporter: Udi Meiri
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> According to [https://beam.apache.org/contribute/testing/] (which I'm working
> on), end-to-end tests should be run in precommit on each combination of
> \{batch, streaming}x\{SDK language}x\{supported runner}.
> At least 2 tests need to be added to Python's precommit: wordcount and
> wordcount_streaming on Dataflow, and possibly on other supported runners
> (direct runner and new runners plz).
> These tests should be configured to run from a Gradle sub-project, so that
> they're run in parallel to the unit tests.
> Example that parallelizes Java precommit integration tests:
> [https://github.com/apache/beam/pull/5731]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4130) Portable Flink runner JobService entry point in a Docker container
[ https://issues.apache.org/jira/browse/BEAM-4130?focusedWorklogId=155228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155228 ]

ASF GitHub Bot logged work on BEAM-4130:

    Author: ASF GitHub Bot
    Created on: 17/Oct/18 00:32
    Start Date: 17/Oct/18 00:32
    Worklog Time Spent: 10m
    Work Description: tweise commented on issue #6703: [BEAM-4130] Add tests for FlinkJobServerDriver
URL: https://github.com/apache/beam/pull/6703#issuecomment-430448030

Run Java PreCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id: (was: 155228)
    Time Spent: 14h 20m (was: 14h 10m)

> Portable Flink runner JobService entry point in a Docker container
> --
>
> Key: BEAM-4130
> URL: https://issues.apache.org/jira/browse/BEAM-4130
> Project: Beam
> Issue Type: New Feature
> Components: runner-flink
> Reporter: Ben Sidhom
> Assignee: Maximilian Michels
> Priority: Minor
> Fix For: 2.7.0
>
> Time Spent: 14h 20m
> Remaining Estimate: 0h
>
> The portable Flink runner exists as a Job Service that runs somewhere. We
> need a main entry point that itself spins up the job service (and artifact
> staging service). The main program itself should be packaged into an uberjar
> such that it can be run locally or submitted to a Flink deployment via `flink
> run`.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5663) Add tox suites for various Python 3 versions
[ https://issues.apache.org/jira/browse/BEAM-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652724#comment-16652724 ]

Valentyn Tymofieiev commented on BEAM-5663:
---

I may be wrong, but I suspect that for some reason the `@unittest.skipIf` annotation was not triggered in your test suite, and the suite then ran into BEAM-5623, which takes a long time to finish. The Travis logs are truncated, so I could not see whether we ran those tests or not.

I tried running the py3 tox test suite from a python:3.4-stretch container (see: https://s.apache.org/beam-py3-conversion-quick-start) and 117 tests didn't pass. The test suite finished within a few minutes.

> Add tox suites for various Python 3 versions
>
> Key: BEAM-5663
> URL: https://issues.apache.org/jira/browse/BEAM-5663
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Manu Zhang
> Priority: Minor
>
> Currently, Python 3.5.2 is set up for Jenkins tests but we've seen test
> failures across various Python 3 versions. It will be valuable to add tox
> suites for Python 3.4, 3.5, 3.6 and 3.7

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
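
For reference, the standard-library decorator is spelled `unittest.skipIf` (capital `I`). Below is a minimal sketch of a version-gated skip together with a way to check programmatically whether the skip actually fired. The test class, test names and skip reason are hypothetical and are not taken from the Beam test suite.

```python
import io
import sys
import unittest


class ExampleTest(unittest.TestCase):

    @unittest.skipIf(sys.version_info[0] == 3,
                     "hypothetical reason: not yet ported to Python 3")
    def test_python2_only(self):
        # Reaching this line on Python 3 means the decorator did not apply.
        self.fail("should not run on Python 3")

    def test_always_runs(self):
        self.assertEqual(1 + 1, 2)


def run_example():
    """Runs ExampleTest quietly and returns the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)
```

Inspecting `run_example().skipped` on a Python 3 interpreter shows one entry for `test_python2_only`; an empty list there would indicate the situation suspected in the comment above, where the skip condition never took effect.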
[jira] [Created] (BEAM-5775) Make the spark runner not serialize data unless spark is spilling to disk
Mike Kaplinskiy created BEAM-5775:
-
    Summary: Make the spark runner not serialize data unless spark is spilling to disk
    Key: BEAM-5775
    URL: https://issues.apache.org/jira/browse/BEAM-5775
    Project: Beam
    Issue Type: Improvement
    Components: runner-spark
    Reporter: Mike Kaplinskiy
    Assignee: Amit Sela

Currently for storage level MEMORY_ONLY, Beam does not coder-ify the data. This lets Spark keep the data in memory, avoiding the serialization round trip. Unfortunately the logic is fairly coarse: as soon as you switch to MEMORY_AND_DISK, Beam coder-ifys the data even though Spark might have chosen to keep the data in memory, incurring the serialization overhead.

Ideally Beam would serialize the data lazily, as Spark chooses to spill to disk. This would be a change in behavior when using Beam, but luckily Spark has a solution for folks that want data serialized in memory: MEMORY_AND_DISK_SER will keep the data serialized.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
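
The lazy approach can be illustrated with a small wrapper that holds the deserialized value and only runs the coder when the storage layer actually asks for bytes. This is a conceptual sketch in Python, not the code from the eventual pull request: `LazilyEncoded`, `CountingCoder` and the `spill` method are hypothetical names, and a JSON-based encoder stands in for a Beam coder.

```python
import json


class CountingCoder:
    """Stand-in for a coder; counts how often encode() is called."""

    def __init__(self):
        self.encode_calls = 0

    def encode(self, value):
        self.encode_calls += 1
        return json.dumps(value).encode("utf-8")

    @staticmethod
    def decode(data):
        return json.loads(data.decode("utf-8"))


class LazilyEncoded:
    """Keeps a value deserialized until bytes are explicitly requested."""

    def __init__(self, value, coder):
        self._value = value
        self._coder = coder

    def get(self):
        # In-memory access: no serialization round trip.
        return self._value

    def spill(self):
        # Hypothetical hook, called only when data must leave memory.
        return self._coder.encode(self._value)
```

With this shape, reads through `get()` never touch the coder, so the encoding cost is paid only for the values that are actually written out. In the Spark runner the equivalent hook would be whatever serialization path the chosen storage level triggers, which this sketch does not model.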
[jira] [Commented] (BEAM-5741) Move "Contact Us" to a top-level link
[ https://issues.apache.org/jira/browse/BEAM-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652719#comment-16652719 ]

Melissa Pashniak commented on BEAM-5741:

Another possible option is removing or combining existing nav item(s)? But I'm not sure which we'd want to remove/combine as they all seem useful.

> Move "Contact Us" to a top-level link
> -
>
> Key: BEAM-5741
> URL: https://issues.apache.org/jira/browse/BEAM-5741
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Scott Wegner
> Priority: Major
>
> It should be very easy to figure out how to get in touch with community.
> "Contact Us" should be a top-level link on the page.
> The page can also be improved with:
> * Some basic text on how to use subscribe / unsubscribe links
> * Recommendations on how to use various communications channels (Slack for
> quick questions, dev@ for longer conversations. And all decisions should make
> it back to dev@)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5741) Move "Contact Us" to a top-level link
[ https://issues.apache.org/jira/browse/BEAM-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652712#comment-16652712 ] Melissa Pashniak edited comment on BEAM-5741 at 10/17/18 12:15 AM: --- Would this be an addition section, or replace the Community section? I don't really agree with adding yet another item, there are two aspects here: 1) We have done a lot of rearranging of the navigation on the site, and as part of this, we looked at many sites when we landed on our current breakdown. On a majority of the sites that use a top nav structure, the contact us/mailing list pages are in a nav item "Community" (for example many Apache sites - Spark, Apex, Hadoop, Gearpump, and other big projects such as Tensorflow, Kubernetes, etc.) Because of this, we used the same "Community" terminology for consistency, and made the contact us page the default page that shows up when someone chooses Community. We used to have pull-down menus on the top nav, but we received feedback that it caused trouble for mobile devices because the menus were too long. We could attempt to put that back and only have a small subset of options, though it might be confusing to show some things but not all unless they click on the item. We could also move to a permanent static left nav structure to show more items at once (such as Flink, which has a "Getting help" page that is always visible), but then we'd lose the section-specific left nav when you choose a top nav item. 2) The other issue is one of remaining horizontal space. I am looking into adding searching capability/a search bar for the site, which would take up a big chunk of the remaining space after the top nav items. we are already nearing (imo) too many top level nav items. (some actually weren't enthused with how many are there even now) was (Author: melap): Would this replace the Community section? 
I don't really agree with this, there are two aspects here: 1) We have done a lot of rearranging of the navigation on the site, and as part of this, we looked at many sites when we landed on our current breakdown. On a majority of the sites that use a top nav structure, the contact us/mailing list pages are in a nav item "Community" (for example many Apache sites - Spark, Apex, Hadoop, Gearpump, and other big projects such as Tensorflow, Kubernetes, etc.) Because of this, we used the same "Community" terminology for consistency, and made the contact us page the default page that shows up when someone chooses Community. We used to have pull-down menus on the top nav, but we received feedback that it caused trouble for mobile devices because the menus were too long. We could attempt to put that back and only have a small subset of options, though it might be confusing to show some things but not all unless they click on the item. We could also move to a permanent static left nav structure to show more items at once (such as Flink, which has a "Getting help" page that is always visible), but then we'd lose the section-specific left nav when you choose a top nav item. 2) The other issue is one of remaining horizontal space. I am looking into adding searching capability/a search bar for the site, which would take up a big chunk of the remaining space after the top nav items. we are already nearing (imo) too many top level nav items. (some actually weren't enthused with how many are there even now) > Move "Contact Us" to a top-level link > - > > Key: BEAM-5741 > URL: https://issues.apache.org/jira/browse/BEAM-5741 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Priority: Major > > It should be very easy to figure out how to get in touch with community. > "Contact Us" should be a top-level link on the page. 
> The page can also be improved with: > * Some basic text on how to use subscribe / unsubscribe links > * Recommendations on how to use various communications channels (Slack for > quick questions, dev@ for longer conversations. And all decisions should make > it back to dev@) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5741) Move "Contact Us" to a top-level link
[ https://issues.apache.org/jira/browse/BEAM-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652712#comment-16652712 ] Melissa Pashniak edited comment on BEAM-5741 at 10/17/18 12:15 AM: --- Would this be an additional section, or replace the Community section? I don't really agree with adding yet another item, there are two aspects here: 1) We have done a lot of rearranging of the navigation on the site, and as part of this, we looked at many sites when we landed on our current breakdown. On a majority of the sites that use a top nav structure, the contact us/mailing list pages are in a nav item "Community" (for example many Apache sites - Spark, Apex, Hadoop, Gearpump, and other big projects such as Tensorflow, Kubernetes, etc.) Because of this, we used the same "Community" terminology for consistency, and made the contact us page the default page that shows up when someone chooses Community. We used to have pull-down menus on the top nav, but we received feedback that it caused trouble for mobile devices because the menus were too long. We could attempt to put that back and only have a small subset of options, though it might be confusing to show some things but not all unless they click on the item. We could also move to a permanent static left nav structure to show more items at once (such as Flink, which has a "Getting help" page that is always visible), but then we'd lose the section-specific left nav when you choose a top nav item. 2) The other issue is one of remaining horizontal space. I am looking into adding searching capability/a search bar for the site, which would take up a big chunk of the remaining space after the top nav items. we are already nearing (imo) too many top level nav items. (some actually weren't enthused with how many are there even now) was (Author: melap): Would this be an addition section, or replace the Community section? 
I don't really agree with adding yet another item, there are two aspects here: 1) We have done a lot of rearranging of the navigation on the site, and as part of this, we looked at many sites when we landed on our current breakdown. On a majority of the sites that use a top nav structure, the contact us/mailing list pages are in a nav item "Community" (for example many Apache sites - Spark, Apex, Hadoop, Gearpump, and other big projects such as Tensorflow, Kubernetes, etc.) Because of this, we used the same "Community" terminology for consistency, and made the contact us page the default page that shows up when someone chooses Community. We used to have pull-down menus on the top nav, but we received feedback that it caused trouble for mobile devices because the menus were too long. We could attempt to put that back and only have a small subset of options, though it might be confusing to show some things but not all unless they click on the item. We could also move to a permanent static left nav structure to show more items at once (such as Flink, which has a "Getting help" page that is always visible), but then we'd lose the section-specific left nav when you choose a top nav item. 2) The other issue is one of remaining horizontal space. I am looking into adding searching capability/a search bar for the site, which would take up a big chunk of the remaining space after the top nav items. we are already nearing (imo) too many top level nav items. (some actually weren't enthused with how many are there even now) > Move "Contact Us" to a top-level link > - > > Key: BEAM-5741 > URL: https://issues.apache.org/jira/browse/BEAM-5741 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Priority: Major > > It should be very easy to figure out how to get in touch with community. > "Contact Us" should be a top-level link on the page. 
> The page can also be improved with: > * Some basic text on how to use subscribe / unsubscribe links > * Recommendations on how to use various communications channels (Slack for > quick questions, dev@ for longer conversations. And all decisions should make > it back to dev@) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5741) Move "Contact Us" to a top-level link
[ https://issues.apache.org/jira/browse/BEAM-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652712#comment-16652712 ] Melissa Pashniak commented on BEAM-5741: Would this replace the Community section? I don't really agree with this, there are two aspects here: 1) We have done a lot of rearranging of the navigation on the site, and as part of this, we looked at many sites when we landed on our current breakdown. On a majority of the sites that use a top nav structure, the contact us/mailing list pages are in a nav item "Community" (for example many Apache sites - Spark, Apex, Hadoop, Gearpump, and other big projects such as Tensorflow, Kubernetes, etc.) Because of this, we used the same "Community" terminology for consistency, and made the contact us page the default page that shows up when someone chooses Community. We used to have pull-down menus on the top nav, but we received feedback that it caused trouble for mobile devices because the menus were too long. We could attempt to put that back and only have a small subset of options, though it might be confusing to show some things but not all unless they click on the item. We could also move to a permanent static left nav structure to show more items at once (such as Flink, which has a "Getting help" page that is always visible), but then we'd lose the section-specific left nav when you choose a top nav item. 2) The other issue is one of remaining horizontal space. I am looking into adding searching capability/a search bar for the site, which would take up a big chunk of the remaining space after the top nav items. we are already nearing (imo) too many top level nav items. 
(some actually weren't enthused with how many are there even now) > Move "Contact Us" to a top-level link > - > > Key: BEAM-5741 > URL: https://issues.apache.org/jira/browse/BEAM-5741 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Priority: Major > > It should be very easy to figure out how to get in touch with community. > "Contact Us" should be a top-level link on the page. > The page can also be improved with: > * Some basic text on how to use subscribe / unsubscribe links > * Recommendations on how to use various communications channels (Slack for > quick questions, dev@ for longer conversations. And all decisions should make > it back to dev@) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4504) Disconnect mergebot from apache/beam-site repository
[ https://issues.apache.org/jira/browse/BEAM-4504?focusedWorklogId=155221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155221 ] ASF GitHub Bot logged work on BEAM-4504: Author: ASF GitHub Bot Created on: 16/Oct/18 23:49 Start Date: 16/Oct/18 23:49 Worklog Time Spent: 10m Work Description: swegner commented on issue #6713: [BEAM-4504] Retire Jenkins jobs from apache/beam-site repository URL: https://github.com/apache/beam/pull/6713#issuecomment-430440123 Seed job [succeeded](https://builds.apache.org/job/beam_SeedJob/2809/). R: @Ardagan This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155221) Time Spent: 0.5h (was: 20m) > Disconnect mergebot from apache/beam-site repository > - > > Key: BEAM-4504 > URL: https://issues.apache.org/jira/browse/BEAM-4504 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > Labels: beam-site-automation-reliability > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5741) Move "Contact Us" to a top-level link
[ https://issues.apache.org/jira/browse/BEAM-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652674#comment-16652674 ] Scott Wegner commented on BEAM-5741: Yes, the feedback we got was that one of the most important pieces of documentation is how to reach out to the community. The 'contact-us' page is pretty good, but finding it is a bit difficult. It would be useful as a top-level link. /cc [~rohdesam] > Move "Contact Us" to a top-level link > - > > Key: BEAM-5741 > URL: https://issues.apache.org/jira/browse/BEAM-5741 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Priority: Major > > It should be very easy to figure out how to get in touch with community. > "Contact Us" should be a top-level link on the page. > The page can also be improved with: > * Some basic text on how to use subscribe / unsubscribe links > * Recommendations on how to use various communications channels (Slack for > quick questions, dev@ for longer conversations. And all decisions should make > it back to dev@) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4504) Disconnect mergebot from apache/beam-site repository
[ https://issues.apache.org/jira/browse/BEAM-4504?focusedWorklogId=155219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155219 ] ASF GitHub Bot logged work on BEAM-4504: Author: ASF GitHub Bot Created on: 16/Oct/18 23:45 Start Date: 16/Oct/18 23:45 Worklog Time Spent: 10m Work Description: swegner opened a new pull request #6713: [BEAM-4504] Retire Jenkins jobs from apache/beam-site repository URL: https://github.com/apache/beam/pull/6713 Website sources have been moved to the apache/beam repository. This cleans up the Jenkins job definitions and removes some common code that was only used by those jobs. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155219) Time Spent: 10m Remaining Estimate: 0h > Disconnect mergebot from apache/beam-site repository > - > > Key: BEAM-4504 > URL: https://issues.apache.org/jira/
[jira] [Work logged] (BEAM-4504) Disconnect mergebot from apache/beam-site repository
[ https://issues.apache.org/jira/browse/BEAM-4504?focusedWorklogId=155220&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155220 ] ASF GitHub Bot logged work on BEAM-4504: Author: ASF GitHub Bot Created on: 16/Oct/18 23:45 Start Date: 16/Oct/18 23:45 Worklog Time Spent: 10m Work Description: swegner commented on issue #6713: [BEAM-4504] Retire Jenkins jobs from apache/beam-site repository URL: https://github.com/apache/beam/pull/6713#issuecomment-430439411 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155220) Time Spent: 20m (was: 10m) > Disconnect mergebot from apache/beam-site repository > - > > Key: BEAM-4504 > URL: https://issues.apache.org/jira/browse/BEAM-4504 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > Labels: beam-site-automation-reliability > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5663) Add tox suites for various Python 3 versions
[ https://issues.apache.org/jira/browse/BEAM-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652641#comment-16652641 ] Manu Zhang commented on BEAM-5663: -- [~tvalentyn], I simply run "./gradlew testPython3" for each environment. The Python 3.4 test is much longer. Is there a flag that doesn't work in 3.4? > Add tox suites for various Python 3 versions > > > Key: BEAM-5663 > URL: https://issues.apache.org/jira/browse/BEAM-5663 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Manu Zhang >Priority: Minor > > Currently, Python 3.5.2 is set up for Jenkins tests but we've seen test > failures across various Python 3 versions. It will be valuable to add tox > suites for Python 3.4, 3.5, 3.6 and 3.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
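A minimal sketch of what a tox configuration covering the versions named in BEAM-5663 could look like. It is a hypothetical fragment, not Beam's actual tox.ini: the environment list simply mirrors the versions in the issue description, and the test command is an assumed placeholder based on the `python setup.py test` invocation quoted elsewhere in this thread.

```ini
# Hypothetical tox.ini sketch; not Beam's actual configuration.
[tox]
# One environment per interpreter version named in the issue.
envlist = py34,py35,py36,py37

[testenv]
# Assumed test command; the real suites may need extra dependencies or flags.
commands = python setup.py test
```

Each environment only runs if the matching interpreter is installed on the machine, so a Jenkins worker would need all four Python versions available.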
[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard
[ https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=155209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155209 ] ASF GitHub Bot logged work on BEAM-5240: Author: ASF GitHub Bot Created on: 16/Oct/18 23:19 Start Date: 16/Oct/18 23:19 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6711: [BEAM-5240] Add Jira data to Beam post-commits dashboard URL: https://github.com/apache/beam/pull/6711#issuecomment-430434590 run python precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155209) Time Spent: 5h 10m (was: 5h) > Create post-commit tests dashboard > -- > > Key: BEAM-5240 > URL: https://issues.apache.org/jira/browse/BEAM-5240 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard
[ https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=155208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155208 ] ASF GitHub Bot logged work on BEAM-5240: Author: ASF GitHub Bot Created on: 16/Oct/18 23:19 Start Date: 16/Oct/18 23:19 Worklog Time Spent: 10m Work Description: Ardagan removed a comment on issue #6711: [BEAM-5240] Add Jira data to Beam post-commits dashboard URL: https://github.com/apache/beam/pull/6711#issuecomment-430429358 run go precommits This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155208) Time Spent: 5h (was: 4h 50m) > Create post-commit tests dashboard > -- > > Key: BEAM-5240 > URL: https://issues.apache.org/jira/browse/BEAM-5240 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard
[ https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=155211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155211 ] ASF GitHub Bot logged work on BEAM-5240: Author: ASF GitHub Bot Created on: 16/Oct/18 23:19 Start Date: 16/Oct/18 23:19 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6711: [BEAM-5240] Add Jira data to Beam post-commits dashboard URL: https://github.com/apache/beam/pull/6711#issuecomment-430434708 Running precommits to execute :rat target. Need to implement BEAM-5499 to avoid it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155211) Time Spent: 5h 20m (was: 5h 10m) > Create post-commit tests dashboard > -- > > Key: BEAM-5240 > URL: https://issues.apache.org/jira/browse/BEAM-5240 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard
[ https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=155206&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155206 ] ASF GitHub Bot logged work on BEAM-5240: Author: ASF GitHub Bot Created on: 16/Oct/18 23:19 Start Date: 16/Oct/18 23:19 Worklog Time Spent: 10m Work Description: Ardagan removed a comment on issue #6711: [BEAM-5240] Add Jira data to Beam post-commits dashboard URL: https://github.com/apache/beam/pull/6711#issuecomment-430429424 run python precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155206) Time Spent: 4h 40m (was: 4.5h) > Create post-commit tests dashboard > -- > > Key: BEAM-5240 > URL: https://issues.apache.org/jira/browse/BEAM-5240 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5240) Create post-commit tests dashboard
[ https://issues.apache.org/jira/browse/BEAM-5240?focusedWorklogId=155207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155207 ] ASF GitHub Bot logged work on BEAM-5240: Author: ASF GitHub Bot Created on: 16/Oct/18 23:19 Start Date: 16/Oct/18 23:19 Worklog Time Spent: 10m Work Description: Ardagan removed a comment on issue #6711: [BEAM-5240] Add Jira data to Beam post-commits dashboard URL: https://github.com/apache/beam/pull/6711#issuecomment-430429390 run go precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155207) Time Spent: 4h 50m (was: 4h 40m) > Create post-commit tests dashboard > -- > > Key: BEAM-5240 > URL: https://issues.apache.org/jira/browse/BEAM-5240 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5609) Improve Grafana dashboard: Add local testing infrastructure
[ https://issues.apache.org/jira/browse/BEAM-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652630#comment-16652630 ] Mikhail Gryzykhin commented on BEAM-5609: - I believe this is covered by .test-infra/metrics/docker-compose.yml by now. https://github.com/apache/beam/blob/master/.test-infra/metrics/docker-compose.yml It spins up the whole service, including data fetching. I will resolve the ticket. > Improve Grafana dashboard: Add local testing infrastructure > --- > > Key: BEAM-5609 > URL: https://issues.apache.org/jira/browse/BEAM-5609 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Pablo Estrada >Assignee: Mikhail Gryzykhin >Priority: Major > Fix For: Not applicable > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5609) Improve Grafana dashboard: Add local testing infrastructure
[ https://issues.apache.org/jira/browse/BEAM-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Gryzykhin resolved BEAM-5609. - Resolution: Fixed Fix Version/s: Not applicable > Improve Grafana dashboard: Add local testing infrastructure > --- > > Key: BEAM-5609 > URL: https://issues.apache.org/jira/browse/BEAM-5609 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Pablo Estrada >Assignee: Mikhail Gryzykhin >Priority: Major > Fix For: Not applicable > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5663) Add tox suites for various Python 3 versions
[ https://issues.apache.org/jira/browse/BEAM-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652629#comment-16652629 ] Valentyn Tymofieiev commented on BEAM-5663: --- [~mauzhang] It also seems that your runs included some tests that we currently skip in Python 3. For example, I think the 3.4 logs include a skipped test, apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTestWithGrpc.test_pardo_metrics > Add tox suites for various Python 3 versions > > > Key: BEAM-5663 > URL: https://issues.apache.org/jira/browse/BEAM-5663 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Manu Zhang >Priority: Minor > > Currently, Python 3.5.2 is set up for Jenkins tests but we've seen test > failures across various Python 3 versions. It will be valuable to add tox > suites for Python 3.4, 3.5, 3.6 and 3.7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5774) beam_Release_Gradle_NightlySnapshot timed out
[ https://issues.apache.org/jira/browse/BEAM-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652623#comment-16652623 ] Kenneth Knowles commented on BEAM-5774: --- No, it appears to be a plain-and-simple timeout. > beam_Release_Gradle_NightlySnapshot timed out > - > > Key: BEAM-5774 > URL: https://issues.apache.org/jira/browse/BEAM-5774 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > > https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/209/ > Looking at the trend, this is not surprising: > https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/buildTimeTrend -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5774) beam_Release_Gradle_NightlySnapshot timed out
[ https://issues.apache.org/jira/browse/BEAM-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652622#comment-16652622 ] Kenneth Knowles commented on BEAM-5774: --- TBD whether this is BEAM-5249 > beam_Release_Gradle_NightlySnapshot timed out > - > > Key: BEAM-5774 > URL: https://issues.apache.org/jira/browse/BEAM-5774 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > > https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/209/ > Looking at the trend, this is not surprising: > https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/buildTimeTrend -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5663) Add tox suites for various Python 3 versions
[ https://issues.apache.org/jira/browse/BEAM-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652621#comment-16652621 ] Valentyn Tymofieiev edited comment on BEAM-5663 at 10/16/18 11:14 PM: -- Thanks, [~mauzhang]. I looked at the logs, and also verifed myself that some tests that pass on Python 3.5 on Jenkins, fail in other versions of the interpreter. FYI [~matthiasml6] [~RobbeSneyders] [~splovyt] [~Juta]. For example: python ./setup.py test -s apache_beam.typehints.typed_pipeline_test.SideInputTest.test_basic_side_input_hint fails on Python 3.4 with: == ERROR: test_basic_side_input_hint (apache_beam.typehints.typed_pipeline_test.SideInputTest) -- Traceback (most recent call last): File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 173, in test_basic_side_input_hint self._run_repeat_test(repeat) File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 144, in _run_repeat_test self._run_repeat_test_good(repeat) File "/beam/sdks/python/apache_beam/options/pipeline_options.py", line 803, in wrapper f(*args, **kwargs) File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 150, in _run_repeat_test_good result = ['a', 'bb', 'c'] | beam.Map(repeat, 3) File "/beam/sdks/python/apache_beam/transforms/ptransform.py", line 496, in __ror__ p.run().wait_until_finish() File "/beam/sdks/python/apache_beam/pipeline.py", line 403, in run self.to_runner_api(), self.runner, self._options).run(False) File "/beam/sdks/python/apache_beam/pipeline.py", line 416, in run return self.runner.run_pipeline(self) File "/beam/sdks/python/apache_beam/runners/direct/direct_runner.py", line 139, in run_pipeline return runner.run_pipeline(pipeline) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 231, in run_pipeline return self.run_via_runner_api(pipeline.to_runner_api()) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 234, in 
run_via_runner_api return self.run_stages(*self.create_stages(pipeline_proto)) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 967, in create_stages pcoll.coder_id = coders.get_id(coder) File "/beam/sdks/python/apache_beam/runners/pipeline_context.py", line 79, in get_id self._id_to_proto[id] = obj.to_runner_api(self._pipeline_context) File "/beam/sdks/python/apache_beam/coders/coders.py", line 259, in to_runner_api component_coder_ids=[context.coders.get_id(c) for c in components]) File "/beam/sdks/python/apache_beam/coders/coders.py", line 259, in <listcomp> component_coder_ids=[context.coders.get_id(c) for c in components]) File "/beam/sdks/python/apache_beam/runners/pipeline_context.py", line 79, in get_id self._id_to_proto[id] = obj.to_runner_api(self._pipeline_context) File "/beam/sdks/python/apache_beam/coders/coders.py", line 250, in to_runner_api urn, typed_param, components = self.to_runner_api_parameter(context) File "/beam/sdks/python/apache_beam/coders/coders.py", line 276, in to_runner_api_parameter google.protobuf.wrappers_pb2.BytesValue(value=serialize_coder(self)), File "/beam/sdks/python/apache_beam/coders/coders.py", line 67, in serialize_coder pickler.dumps(coder)) TypeError: unsupported operand type(s) for %: 'bytes' and 'tuple' was (Author: tvalentyn): Thanks, [~mauzhang]. I looked at the logs and also verified myself that some tests that pass on Python 3.5 on Jenkins fail on Python 3.4. FYI [~matthiasml6] [~RobbeSneyders] [~splovyt] [~Juta]. 
For example: python ./setup.py test -s apache_beam.typehints.typed_pipeline_test.SideInputTest.test_basic_side_input_hint fails on Python 3.4 with: == ERROR: test_basic_side_input_hint (apache_beam.typehints.typed_pipeline_test.SideInputTest) -- Traceback (most recent call last): File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 173, in test_basic_side_input_hint self._run_repeat_test(repeat) File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 144, in _run_repeat_test self._run_repeat_test_good(repeat) File "/beam/sdks/python/apache_beam/options/pipeline_options.py", line 803, in wrapper f(*args, **kwargs) File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 150, in _run_repeat_test_good result = ['a', 'bb', 'c'] | beam.Map(repeat, 3) File "/beam/sdks/python/apache_beam/transforms/ptransform.py", line 496, in __ror__ p.run().wait_until_finish() File "/beam/sdks/python/a
[jira] [Commented] (BEAM-5663) Add tox suites for various Python 3 versions
[ https://issues.apache.org/jira/browse/BEAM-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652621#comment-16652621 ] Valentyn Tymofieiev commented on BEAM-5663: --- Thanks, [~mauzhang]. I verified that some tests that pass on Python 3.5 on Jenkins fail on Python 3.4. FYI [~matthiasml6] [~RobbeSneyders] [~splovyt] [~Juta]. For example: python ./setup.py test -s apache_beam.typehints.typed_pipeline_test.SideInputTest.test_basic_side_input_hint fails on Python 3.4 with: == ERROR: test_basic_side_input_hint (apache_beam.typehints.typed_pipeline_test.SideInputTest) -- Traceback (most recent call last): File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 173, in test_basic_side_input_hint self._run_repeat_test(repeat) File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 144, in _run_repeat_test self._run_repeat_test_good(repeat) File "/beam/sdks/python/apache_beam/options/pipeline_options.py", line 803, in wrapper f(*args, **kwargs) File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 150, in _run_repeat_test_good result = ['a', 'bb', 'c'] | beam.Map(repeat, 3) File "/beam/sdks/python/apache_beam/transforms/ptransform.py", line 496, in __ror__ p.run().wait_until_finish() File "/beam/sdks/python/apache_beam/pipeline.py", line 403, in run self.to_runner_api(), self.runner, self._options).run(False) File "/beam/sdks/python/apache_beam/pipeline.py", line 416, in run return self.runner.run_pipeline(self) File "/beam/sdks/python/apache_beam/runners/direct/direct_runner.py", line 139, in run_pipeline return runner.run_pipeline(pipeline) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 231, in run_pipeline return self.run_via_runner_api(pipeline.to_runner_api()) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 234, in run_via_runner_api return self.run_stages(*self.create_stages(pipeline_proto)) File 
"/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 967, in create_stages pcoll.coder_id = coders.get_id(coder) File "/beam/sdks/python/apache_beam/runners/pipeline_context.py", line 79, in get_id self._id_to_proto[id] = obj.to_runner_api(self._pipeline_context) File "/beam/sdks/python/apache_beam/coders/coders.py", line 259, in to_runner_api component_coder_ids=[context.coders.get_id(c) for c in components]) File "/beam/sdks/python/apache_beam/coders/coders.py", line 259, in <listcomp> component_coder_ids=[context.coders.get_id(c) for c in components]) File "/beam/sdks/python/apache_beam/runners/pipeline_context.py", line 79, in get_id self._id_to_proto[id] = obj.to_runner_api(self._pipeline_context) File "/beam/sdks/python/apache_beam/coders/coders.py", line 250, in to_runner_api urn, typed_param, components = self.to_runner_api_parameter(context) File "/beam/sdks/python/apache_beam/coders/coders.py", line 276, in to_runner_api_parameter google.protobuf.wrappers_pb2.BytesValue(value=serialize_coder(self)), File "/beam/sdks/python/apache_beam/coders/coders.py", line 67, in serialize_coder pickler.dumps(coder)) TypeError: unsupported operand type(s) for %: 'bytes' and 'tuple' > Add tox suites for various Python 3 versions > > > Key: BEAM-5663 > URL: https://issues.apache.org/jira/browse/BEAM-5663 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Manu Zhang >Priority: Minor > > Currently, Python 3.5.2 is set up for Jenkins tests but we've seen test > failures across various Python 3 versions. It will be valuable to add tox > suites for Python 3.4, 3.5, 3.6 and 3.7. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5663) Add tox suites for various Python 3 versions
[ https://issues.apache.org/jira/browse/BEAM-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652621#comment-16652621 ] Valentyn Tymofieiev edited comment on BEAM-5663 at 10/16/18 11:13 PM: -- Thanks, [~mauzhang]. I looked at the logs and also verified myself that some tests that pass on Python 3.5 on Jenkins fail on Python 3.4. FYI [~matthiasml6] [~RobbeSneyders] [~splovyt] [~Juta]. For example: python ./setup.py test -s apache_beam.typehints.typed_pipeline_test.SideInputTest.test_basic_side_input_hint fails on Python 3.4 with: == ERROR: test_basic_side_input_hint (apache_beam.typehints.typed_pipeline_test.SideInputTest) -- Traceback (most recent call last): File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 173, in test_basic_side_input_hint self._run_repeat_test(repeat) File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 144, in _run_repeat_test self._run_repeat_test_good(repeat) File "/beam/sdks/python/apache_beam/options/pipeline_options.py", line 803, in wrapper f(*args, **kwargs) File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 150, in _run_repeat_test_good result = ['a', 'bb', 'c'] | beam.Map(repeat, 3) File "/beam/sdks/python/apache_beam/transforms/ptransform.py", line 496, in __ror__ p.run().wait_until_finish() File "/beam/sdks/python/apache_beam/pipeline.py", line 403, in run self.to_runner_api(), self.runner, self._options).run(False) File "/beam/sdks/python/apache_beam/pipeline.py", line 416, in run return self.runner.run_pipeline(self) File "/beam/sdks/python/apache_beam/runners/direct/direct_runner.py", line 139, in run_pipeline return runner.run_pipeline(pipeline) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 231, in run_pipeline return self.run_via_runner_api(pipeline.to_runner_api()) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 234, in run_via_runner_api 
return self.run_stages(*self.create_stages(pipeline_proto)) File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 967, in create_stages pcoll.coder_id = coders.get_id(coder) File "/beam/sdks/python/apache_beam/runners/pipeline_context.py", line 79, in get_id self._id_to_proto[id] = obj.to_runner_api(self._pipeline_context) File "/beam/sdks/python/apache_beam/coders/coders.py", line 259, in to_runner_api component_coder_ids=[context.coders.get_id(c) for c in components]) File "/beam/sdks/python/apache_beam/coders/coders.py", line 259, in <listcomp> component_coder_ids=[context.coders.get_id(c) for c in components]) File "/beam/sdks/python/apache_beam/runners/pipeline_context.py", line 79, in get_id self._id_to_proto[id] = obj.to_runner_api(self._pipeline_context) File "/beam/sdks/python/apache_beam/coders/coders.py", line 250, in to_runner_api urn, typed_param, components = self.to_runner_api_parameter(context) File "/beam/sdks/python/apache_beam/coders/coders.py", line 276, in to_runner_api_parameter google.protobuf.wrappers_pb2.BytesValue(value=serialize_coder(self)), File "/beam/sdks/python/apache_beam/coders/coders.py", line 67, in serialize_coder pickler.dumps(coder)) TypeError: unsupported operand type(s) for %: 'bytes' and 'tuple' was (Author: tvalentyn): Thanks, [~mauzhang]. I verified that some tests that pass on Python 3.5 on Jenkins fail on Python 3.4. FYI [~matthiasml6] [~RobbeSneyders] [~splovyt] [~Juta]. 
For example: python ./setup.py test -s apache_beam.typehints.typed_pipeline_test.SideInputTest.test_basic_side_input_hint fails on Python 3.4 with: == ERROR: test_basic_side_input_hint (apache_beam.typehints.typed_pipeline_test.SideInputTest) -- Traceback (most recent call last): File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 173, in test_basic_side_input_hint self._run_repeat_test(repeat) File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 144, in _run_repeat_test self._run_repeat_test_good(repeat) File "/beam/sdks/python/apache_beam/options/pipeline_options.py", line 803, in wrapper f(*args, **kwargs) File "/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 150, in _run_repeat_test_good result = ['a', 'bb', 'c'] | beam.Map(repeat, 3) File "/beam/sdks/python/apache_beam/transforms/ptransform.py", line 496, in __ror__ p.run().wait_until_finish() File "/beam/sdks/python/apache_beam/pipeline.py", line 403, in run self.to_runne
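Note on the BEAM-5663 traceback above: the final `TypeError: unsupported operand type(s) for %: 'bytes' and 'tuple'` is consistent with `%`-formatting applied to a `bytes` object, which Python 3 only supports from version 3.5 onwards (PEP 461). The traceback does not show which frame below `pickler.dumps` performs the formatting, so the following is only a minimal sketch of a version-tolerant workaround. The helper name `format_bytes` is hypothetical and is not part of Beam or of any library it uses.

```python
import sys


def format_bytes(template, *args):
    """Apply %-style formatting to a bytes template on any Python 3 version.

    Hypothetical helper for illustration only; not Beam code.
    """
    if sys.version_info >= (3, 5):
        # PEP 461: bytes objects support the % operator from Python 3.5 on.
        return template % args
    # Python 3.0-3.4: bytes has no % operator, so format as text and
    # encode back. latin-1 maps every byte value 1:1 to a code point.
    text_args = tuple(
        a.decode('latin-1') if isinstance(a, bytes) else a for a in args)
    return (template.decode('latin-1') % text_args).encode('latin-1')


print(format_bytes(b'%s-%d', b'ab', 3))
```

On Python 3.4 the expression `b'%s-%d' % (b'ab', 3)` raises the same `TypeError` as in the traceback, which is why a per-version tox suite would surface this class of failure early.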
[jira] [Commented] (BEAM-5057) beam_Release_Gradle_NightlySnapshot failing due to a Javadoc error
[ https://issues.apache.org/jira/browse/BEAM-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652620#comment-16652620 ] Kenneth Knowles commented on BEAM-5057: --- Is this now obsolete? > beam_Release_Gradle_NightlySnapshot failing due to a Javadoc error > -- > > Key: BEAM-5057 > URL: https://issues.apache.org/jira/browse/BEAM-5057 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Chamikara Jayalath >Assignee: Chamikara Jayalath >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/127/console] > [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/125/console] > > * What went wrong: > Execution failed for task ':beam-sdks-java-core:javadoc'. > > Javadoc generation failed. Generated Javadoc options file (useful for > > troubleshooting): > > '/home/jenkins/jenkins-slave/workspace/beam_Release_Gradle_NightlySnapshot/src/sdks/java/core/build/tmp/javadoc/javadoc.options' > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5774) beam_Release_Gradle_NightlySnapshot timed out
Kenneth Knowles created BEAM-5774: - Summary: beam_Release_Gradle_NightlySnapshot timed out Key: BEAM-5774 URL: https://issues.apache.org/jira/browse/BEAM-5774 Project: Beam Issue Type: Bug Components: build-system Reporter: Kenneth Knowles Assignee: Kenneth Knowles https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/209/ Looking at the trend, this is not surprising: https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/buildTimeTrend -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5773) Failure in beam_PostCommit_Py_VR_Dataflow "There is insufficient memory for the Java Runtime Environment to continue."
[ https://issues.apache.org/jira/browse/BEAM-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652617#comment-16652617 ] Kenneth Knowles commented on BEAM-5773: --- Looks like the same thing that happened on [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1401/], only in this case Gradle could start threads but the Python test framework could not. {code} OpenBLAS blas_thread_init: RLIMIT_NPROC 10240 current, 10240 max Process SyncManager-1: Traceback (most recent call last): File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap self.run() File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run self._target(*self._args, **self._kwargs) File "/usr/lib/python2.7/multiprocessing/managers.py", line 558, in _run_server server.serve_forever() File "/usr/lib/python2.7/multiprocessing/managers.py", line 184, in serve_forever t.start() File "/usr/lib/python2.7/threading.py", line 736, in start _start_new_thread(self.__bootstrap, ()) error: can't start new thread interrupted ./scripts/run_postcommit.sh: line 124: 32380 Segmentation fault (core dumped) python setup.py nosetests --attr $1 --nologcapture --processes=8 --process-timeout=3000 --test-pipeline-options="$JOINED_OPTS" $TESTS {code} > Failure in beam_PostCommit_Py_VR_Dataflow "There is insufficient memory for > the Java Runtime Environment to continue." > -- > > Key: BEAM-5773 > URL: https://issues.apache.org/jira/browse/BEAM-5773 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > > Jenkins failed on the Python Dataflow ValidatesRunner postcommit because > Gradle could not allocate a thread. > [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1402/console] > Likely transient, but filing this to track if that is the case. 
> {code} > 15:07:52 [src] $ > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/gradlew > --info --continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g > -Dorg.gradle.jvmargs=-Xmx4g :beam-sdks-python:validatesRunnerBatchTests > :beam-sdks-python:validatesRunnerStreamingTests > 15:07:52 # > 15:07:52 # There is insufficient memory for the Java Runtime Environment to > continue. > 15:07:52 # Cannot create GC thread. Out of system resources. > 15:07:52 # An error report file with more information is saved as: > 15:07:52 # > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/hs_err_pid31336.log > 15:07:53 Build step 'Invoke Gradle script' changed build result to FAILURE > 15:07:53 Build step 'Invoke Gradle script' marked build as failure > 15:07:56 Sending e-mails to: comm...@beam.apache.org > 15:07:57 No emails were triggered. > 15:07:57 Finished: FAILURE > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5773) Failure in beam_PostCommit_Py_VR_Dataflow "There is insufficient memory for the Java Runtime Environment to continue."
[ https://issues.apache.org/jira/browse/BEAM-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5773: - Assignee: Kenneth Knowles > Failure in beam_PostCommit_Py_VR_Dataflow "There is insufficient memory for > the Java Runtime Environment to continue." > -- > > Key: BEAM-5773 > URL: https://issues.apache.org/jira/browse/BEAM-5773 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > > Jenkins failed on the Python Dataflow ValidatesRunner postcommit because > Gradle could not allocate a thread. > [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1402/console] > Likely transient, but filing this to track if that is the case. > {code} > 15:07:52 [src] $ > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/gradlew > --info --continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g > -Dorg.gradle.jvmargs=-Xmx4g :beam-sdks-python:validatesRunnerBatchTests > :beam-sdks-python:validatesRunnerStreamingTests > 15:07:52 # > 15:07:52 # There is insufficient memory for the Java Runtime Environment to > continue. > 15:07:52 # Cannot create GC thread. Out of system resources. > 15:07:52 # An error report file with more information is saved as: > 15:07:52 # > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/hs_err_pid31336.log > 15:07:53 Build step 'Invoke Gradle script' changed build result to FAILURE > 15:07:53 Build step 'Invoke Gradle script' marked build as failure > 15:07:56 Sending e-mails to: comm...@beam.apache.org > 15:07:57 No emails were triggered. > 15:07:57 Finished: FAILURE > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
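Note on the BEAM-5773 logs: both failure modes in this thread ("Cannot create GC thread. Out of system resources." from the JVM and "can't start new thread" from Python) are thread-creation failures, and the OpenBLAS line in the comment's log excerpt reports the `RLIMIT_NPROC` limit. As a rough sketch of how a test harness could log that limit before starting worker threads, assuming a Unix host where Python's `resource` module exposes `RLIMIT_NPROC`; the function name `thread_headroom` is hypothetical and not part of Beam's scripts:

```python
import resource
import threading


def thread_headroom():
    """Return (active_threads, soft_limit, hard_limit) for this process.

    Hypothetical diagnostic helper. On Linux, RLIMIT_NPROC caps the number
    of processes and threads of the owning user, so it is the limit that
    matters when thread creation fails. A limit of resource.RLIM_INFINITY
    means no cap is configured.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return threading.active_count(), soft, hard


active, soft, hard = thread_headroom()
print('threads in this process: %d, RLIMIT_NPROC soft=%s hard=%s'
      % (active, soft, hard))
```

The count returned here covers only the current process, while the limit applies to everything the user is running on the machine, so on a shared Jenkins agent the limit can be exhausted by other jobs.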
[jira] [Commented] (BEAM-5773) Failure in beam_PostCommit_Py_VR_Dataflow "There is insufficient memory for the Java Runtime Environment to continue."
[ https://issues.apache.org/jira/browse/BEAM-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652615#comment-16652615 ] Kenneth Knowles commented on BEAM-5773: --- Removed auto-assignee. > Failure in beam_PostCommit_Py_VR_Dataflow "There is insufficient memory for > the Java Runtime Environment to continue." > -- > > Key: BEAM-5773 > URL: https://issues.apache.org/jira/browse/BEAM-5773 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Priority: Major > > Jenkins failed on the Python Dataflow ValidatesRunner postcommit because > Gradle could not allocate a thread. > [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1402/console] > Likely transient, but filing this to track if that is the case. > {code} > 15:07:52 [src] $ > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/gradlew > --info --continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g > -Dorg.gradle.jvmargs=-Xmx4g :beam-sdks-python:validatesRunnerBatchTests > :beam-sdks-python:validatesRunnerStreamingTests > 15:07:52 # > 15:07:52 # There is insufficient memory for the Java Runtime Environment to > continue. > 15:07:52 # Cannot create GC thread. Out of system resources. > 15:07:52 # An error report file with more information is saved as: > 15:07:52 # > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/hs_err_pid31336.log > 15:07:53 Build step 'Invoke Gradle script' changed build result to FAILURE > 15:07:53 Build step 'Invoke Gradle script' marked build as failure > 15:07:56 Sending e-mails to: comm...@beam.apache.org > 15:07:57 No emails were triggered. > 15:07:57 Finished: FAILURE > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5773) Failure in beam_PostCommit_Py_VR_Dataflow "There is insufficient memory for the Java Runtime Environment to continue."
Kenneth Knowles created BEAM-5773: - Summary: Failure in beam_PostCommit_Py_VR_Dataflow "There is insufficient memory for the Java Runtime Environment to continue." Key: BEAM-5773 URL: https://issues.apache.org/jira/browse/BEAM-5773 Project: Beam Issue Type: Bug Components: build-system Reporter: Kenneth Knowles Assignee: Luke Cwik Jenkins failed on the Python Dataflow ValidatesRunner postcommit because Gradle could not allocate a thread. [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1402/console] Likely transient, but filing this to track if that is the case. {code} 15:07:52 [src] $ /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/gradlew --info --continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g -Dorg.gradle.jvmargs=-Xmx4g :beam-sdks-python:validatesRunnerBatchTests :beam-sdks-python:validatesRunnerStreamingTests 15:07:52 # 15:07:52 # There is insufficient memory for the Java Runtime Environment to continue. 15:07:52 # Cannot create GC thread. Out of system resources. 15:07:52 # An error report file with more information is saved as: 15:07:52 # /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/hs_err_pid31336.log 15:07:53 Build step 'Invoke Gradle script' changed build result to FAILURE 15:07:53 Build step 'Invoke Gradle script' marked build as failure 15:07:56 Sending e-mails to: comm...@beam.apache.org 15:07:57 No emails were triggered. 15:07:57 Finished: FAILURE {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5730) Migrate Java test to use a staged worker jar
[ https://issues.apache.org/jira/browse/BEAM-5730?focusedWorklogId=155203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155203 ] ASF GitHub Bot logged work on BEAM-5730: Author: ASF GitHub Bot Created on: 16/Oct/18 23:07 Start Date: 16/Oct/18 23:07 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #6694: [BEAM-5730] Migrate ITs using DataflowRunner to use custom worker URL: https://github.com/apache/beam/pull/6694#issuecomment-430432035 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155203) Time Spent: 0.5h (was: 20m) > Migrate Java test to use a staged worker jar > > > Key: BEAM-5730 > URL: https://issues.apache.org/jira/browse/BEAM-5730 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5773) Failure in beam_PostCommit_Py_VR_Dataflow "There is insufficient memory for the Java Runtime Environment to continue."
[ https://issues.apache.org/jira/browse/BEAM-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5773: - Assignee: (was: Luke Cwik) > Failure in beam_PostCommit_Py_VR_Dataflow "There is insufficient memory for > the Java Runtime Environment to continue." > -- > > Key: BEAM-5773 > URL: https://issues.apache.org/jira/browse/BEAM-5773 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Kenneth Knowles >Priority: Major > > Jenkins failed on the Python Dataflow ValidatesRunner postcommit because > Gradle could not allocate a thread. > [https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1402/console] > Likely transient, but filing this to track if that is the case. > {code} > 15:07:52 [src] $ > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/gradlew > --info --continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g > -Dorg.gradle.jvmargs=-Xmx4g :beam-sdks-python:validatesRunnerBatchTests > :beam-sdks-python:validatesRunnerStreamingTests > 15:07:52 # > 15:07:52 # There is insufficient memory for the Java Runtime Environment to > continue. > 15:07:52 # Cannot create GC thread. Out of system resources. > 15:07:52 # An error report file with more information is saved as: > 15:07:52 # > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/hs_err_pid31336.log > 15:07:53 Build step 'Invoke Gradle script' changed build result to FAILURE > 15:07:53 Build step 'Invoke Gradle script' marked build as failure > 15:07:56 Sending e-mails to: comm...@beam.apache.org > 15:07:57 No emails were triggered. > 15:07:57 Finished: FAILURE > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5772) GCP IO tests slow down general Beam PostCommits
[ https://issues.apache.org/jira/browse/BEAM-5772?focusedWorklogId=155197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155197 ] ASF GitHub Bot logged work on BEAM-5772: Author: ASF GitHub Bot Created on: 16/Oct/18 22:58 Start Date: 16/Oct/18 22:58 Worklog Time Spent: 10m Work Description: pabloem opened a new pull request #6712: [BEAM-5772] Moving GCP IO tests to a new post commit suite URL: https://github.com/apache/beam/pull/6712 r: @Ardagan Can you help me with these jenkins jobs? : ) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155197) Time Spent: 10m Remaining Estimate: 0h > GCP IO tests slow down general Beam PostCommits > --- > > Key: BEAM-5772 > URL: https://issues.apache.org/jira/browse/BEAM-5772 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, testing >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5772) GCP IO tests slow down general Beam PostCommits
Pablo Estrada created BEAM-5772: --- Summary: GCP IO tests slow down general Beam PostCommits Key: BEAM-5772 URL: https://issues.apache.org/jira/browse/BEAM-5772 Project: Beam Issue Type: Bug Components: io-java-gcp, testing Reporter: Pablo Estrada Assignee: Pablo Estrada -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5685) TopWikipediaSessionsIT is flaky
[ https://issues.apache.org/jira/browse/BEAM-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada resolved BEAM-5685. - Resolution: Fixed Fix Version/s: 2.8.0 > TopWikipediaSessionsIT is flaky > --- > > Key: BEAM-5685 > URL: https://issues.apache.org/jira/browse/BEAM-5685 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.8.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5693) Python SDK tests failing on Windows
[ https://issues.apache.org/jira/browse/BEAM-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada resolved BEAM-5693. - Resolution: Fixed Fix Version/s: 2.8.0 > Python SDK tests failing on Windows > --- > > Key: BEAM-5693 > URL: https://issues.apache.org/jira/browse/BEAM-5693 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.8.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5684) Need a test that verifies Flattening / not-flattening of BQ nested records
[ https://issues.apache.org/jira/browse/BEAM-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada resolved BEAM-5684. - Resolution: Fixed Fix Version/s: 2.8.0 > Need a test that verifies Flattening / not-flattening of BQ nested records > -- > > Key: BEAM-5684 > URL: https://issues.apache.org/jira/browse/BEAM-5684 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.8.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5407) [beam_PostCommit_Go_GradleBuild][testE2ETopWikiPages][RolledBack] Breaks post commit
[ https://issues.apache.org/jira/browse/BEAM-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada resolved BEAM-5407. - Resolution: Fixed Fix Version/s: 2.8.0 > [beam_PostCommit_Go_GradleBuild][testE2ETopWikiPages][RolledBack] Breaks post > commit > > > Key: BEAM-5407 > URL: https://issues.apache.org/jira/browse/BEAM-5407 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Batkhuyag Batsaikhan >Assignee: Pablo Estrada >Priority: Major > Fix For: 2.8.0 > > > Failing job url: > https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1482/testReport/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4130) Portable Flink runner JobService entry point in a Docker container
[ https://issues.apache.org/jira/browse/BEAM-4130?focusedWorklogId=155188&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155188 ] ASF GitHub Bot logged work on BEAM-4130: Author: ASF GitHub Bot Created on: 16/Oct/18 22:48 Start Date: 16/Oct/18 22:48 Worklog Time Spent: 10m Work Description: tweise commented on issue #6703: [BEAM-4130] Add tests for FlinkJobServerDriver URL: https://github.com/apache/beam/pull/6703#issuecomment-430428033 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155188) Time Spent: 14h 10m (was: 14h) > Portable Flink runner JobService entry point in a Docker container > -- > > Key: BEAM-4130 > URL: https://issues.apache.org/jira/browse/BEAM-4130 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Maximilian Michels >Priority: Minor > Fix For: 2.7.0 > > Time Spent: 14h 10m > Remaining Estimate: 0h > > The portable Flink runner exists as a Job Service that runs somewhere. We > need a main entry point that itself spins up the job service (and artifact > staging service). The main program itself should be packaged into an uberjar > such that it can be run locally or submitted to a Flink deployment via `flink > run`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155183 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 16/Oct/18 22:44 Start Date: 16/Oct/18 22:44 Worklog Time Spent: 10m Work Description: aaltay commented on a change in pull request #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#discussion_r225731222 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -520,6 +520,12 @@ def _add_argparse_args(cls, parser): type=str, help='GCE minimum CPU platform. Default is determined by GCP.' ) +parser.add_argument( Review comment: In light of the discussion on the dev@ list related to runner options (https://lists.apache.org/thread.html/78fe33dc41b04886f5355d66d50359265bfa2985580bb70f79c53545@%3Cdev.beam.apache.org%3E), would it be better to expose this as a runner option? @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155183) Time Spent: 4h 50m (was: 4h 40m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155184 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 16/Oct/18 22:44 Start Date: 16/Oct/18 22:44 Worklog Time Spent: 10m Work Description: aaltay commented on a change in pull request #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#discussion_r225732036 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -520,6 +520,12 @@ def _add_argparse_args(cls, parser): type=str, help='GCE minimum CPU platform. Default is determined by GCP.' ) +parser.add_argument( +'--dataflow_worker_jar', +dest='dataflow_worker_jar', +type=str, +help='Dataflow worker jar.' Review comment: Could you update the description here. We would not expect users to use this option typically. Biggest use case is probably development related changes. And it also cannot be used for legacy pipelines either. (Should this be an error, if fn api experiment is not set but this flag is used?) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155184) Time Spent: 5h (was: 4h 50m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. 
That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
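The two review points above (a more descriptive help string, and possibly rejecting the flag when the Fn API experiment is not set) could look roughly like the sketch below. This is only an illustration and not the code in PR #6680: the help wording, the `validate_worker_jar` helper and the use of `beam_fn_api` as the experiment name are assumptions made here.

```python
import argparse


def build_parser():
  """Builds a tiny parser with the two options this sketch needs."""
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--dataflow_worker_jar',
      dest='dataflow_worker_jar',
      type=str,
      default=None,
      # Hypothetical wording; the review only asks for a fuller description.
      help='Path to a custom Dataflow worker jar. Intended mainly for '
           'development; not supported for legacy (non Fn API) pipelines.')
  parser.add_argument(
      '--experiments',
      dest='experiments',
      action='append',
      default=None,
      help='Experiment names; may be repeated.')
  return parser


def validate_worker_jar(options):
  """Returns a list of error strings (empty if the options are consistent)."""
  errors = []
  experiments = options.experiments or []
  # Assumption: 'beam_fn_api' is the experiment that marks an Fn API pipeline.
  if options.dataflow_worker_jar and 'beam_fn_api' not in experiments:
    errors.append(
        '--dataflow_worker_jar requires the beam_fn_api experiment.')
  return errors
```

Returning a list of errors mirrors the `errors = []` pattern visible in `validate(self, validator)` in the diff context; whether the flag should actually be an error is the open question raised in the review.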
[jira] [Commented] (BEAM-5741) Move "Contact Us" to a top-level link
[ https://issues.apache.org/jira/browse/BEAM-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652569#comment-16652569 ] Melissa Pashniak commented on BEAM-5741: What do you mean by top-level link here? As in another item in the top items (documentation, SDKs, community, etc.)? > Move "Contact Us" to a top-level link > - > > Key: BEAM-5741 > URL: https://issues.apache.org/jira/browse/BEAM-5741 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Scott Wegner >Priority: Major > > It should be very easy to figure out how to get in touch with the community. > "Contact Us" should be a top-level link on the page. > The page can also be improved with: > * Some basic text on how to use subscribe / unsubscribe links > * Recommendations on how to use various communications channels (Slack for > quick questions, dev@ for longer conversations; all decisions should make > it back to dev@) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=155171&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155171 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 16/Oct/18 22:39 Start Date: 16/Oct/18 22:39 Worklog Time Spent: 10m Work Description: pabloem commented on issue #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#issuecomment-430425870 Very cool. Thanks Micah! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155171) Time Spent: 5h 10m (was: 5h) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 5h 10m > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=155172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155172 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 16/Oct/18 22:39 Start Date: 16/Oct/18 22:39 Worklog Time Spent: 10m Work Description: pabloem closed pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
index 42b9c1114a7..2b276f404c7 100644
--- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
+++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java
@@ -17,6 +17,9 @@
  */
 package org.apache.beam.runners.flink;

+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.auto.service.AutoService;
 import com.google.common.collect.BiMap;
 import com.google.common.collect.HashMultiset;
 import com.google.common.collect.ImmutableMap;
@@ -34,6 +37,7 @@
 import java.util.TreeMap;
 import org.apache.beam.model.pipeline.v1.RunnerApi;
 import org.apache.beam.runners.core.SystemReduceFn;
+import org.apache.beam.runners.core.construction.NativeTransforms;
 import org.apache.beam.runners.core.construction.PTransformTranslation;
 import org.apache.beam.runners.core.construction.RehydratedComponents;
 import org.apache.beam.runners.core.construction.RunnerPCollectionView;
@@ -52,6 +56,7 @@
 import org.apache.beam.runners.flink.translation.wrappers.streaming.SingletonKeyedWorkItemCoder;
 import org.apache.beam.runners.flink.translation.wrappers.streaming.WindowDoFnOperator;
 import org.apache.beam.runners.flink.translation.wrappers.streaming.WorkItemKeySelector;
+import org.apache.beam.runners.flink.translation.wrappers.streaming.io.StreamingImpulseSource;
 import org.apache.beam.runners.fnexecution.provisioning.JobInfo;
 import org.apache.beam.runners.fnexecution.wire.WireCoders;
 import org.apache.beam.sdk.coders.ByteArrayCoder;
@@ -156,6 +161,9 @@ public StreamExecutionEnvironment getExecutionEnvironment() {
     void translate(String id, RunnerApi.Pipeline pipeline, T t);
   }

+  private static final String STREAMING_IMPULSE_TRANSFORM_URN =
+      "flink:transform:streaming_impulse:v1";
+
   private final Map> urnToTransformTranslator;

@@ -165,6 +173,7 @@ public StreamExecutionEnvironment getExecutionEnvironment() {
     translatorMap.put(PTransformTranslation.FLATTEN_TRANSFORM_URN, this::translateFlatten);
     translatorMap.put(PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, this::translateGroupByKey);
     translatorMap.put(PTransformTranslation.IMPULSE_TRANSFORM_URN, this::translateImpulse);
+    translatorMap.put(STREAMING_IMPULSE_TRANSFORM_URN, this::translateStreamingImpulse);
     translatorMap.put(
         PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, this::translateAssignWindows);
     translatorMap.put(ExecutableStage.URN, this::translateExecutableStage);
@@ -403,6 +412,40 @@ private void translateImpulse(
     context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()), source);
   }

+  /** Predicate to determine whether a URN is a Flink native transform. */
+  @AutoService(NativeTransforms.IsNativeTransform.class)
+  public static class IsFlinkNativeTransform implements NativeTransforms.IsNativeTransform {
+    @Override
+    public boolean test(RunnerApi.PTransform pTransform) {
+      return STREAMING_IMPULSE_TRANSFORM_URN.equals(
+          PTransformTranslation.urnForTransformOrNull(pTransform));
+    }
+  }
+
+  private void translateStreamingImpulse(
+      String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) {
+    RunnerApi.PTransform pTransform = pipeline.getComponents().getTransformsOrThrow(id);
+
+    ObjectMapper objectMapper = new ObjectMapper();
+
+    int intervalMillis;
+    int messageCount;
+    try {
+      JsonNode config = objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray());
+      intervalMillis = config.path("interval_ms").asInt(100);
+      messageCount = config.path("message_count").asInt(0);
+    } catch (IOException e) {
+      throw new RuntimeException("Failed to parse configuration for streaming impulse", e);
+    }
+
+    Dat
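The diff is cut off above, but the visible part shows that the transform payload is a JSON document with two optional keys, `interval_ms` (default 100) and `message_count` (default 0). As a rough Python sketch of the same payload contract, assuming UTF-8 encoded JSON; the helper names `make_payload` and `parse_payload` are hypothetical and are not part of the PR:

```python
import json

# URN taken from the diff above.
STREAMING_IMPULSE_TRANSFORM_URN = 'flink:transform:streaming_impulse:v1'


def make_payload(interval_ms=None, message_count=None):
  """Encodes the optional settings as a JSON byte string (hypothetical helper)."""
  config = {}
  if interval_ms is not None:
    config['interval_ms'] = interval_ms
  if message_count is not None:
    config['message_count'] = message_count
  return json.dumps(config).encode('utf-8')


def parse_payload(payload):
  """Returns (interval_ms, message_count), using the defaults seen in the diff."""
  try:
    config = json.loads(payload.decode('utf-8'))
  except ValueError as e:  # covers JSON and UTF-8 decoding errors
    raise RuntimeError(
        'Failed to parse configuration for streaming impulse') from e
  # Approximates config.path(...).asInt(default): missing keys use the default.
  return int(config.get('interval_ms', 100)), int(config.get('message_count', 0))
```

This only approximates the Jackson behaviour: `asInt(default)` also falls back to the default for non-numeric values, which the sketch does not attempt to reproduce.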
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155170 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 16/Oct/18 22:38 Start Date: 16/Oct/18 22:38 Worklog Time Spent: 10m Work Description: aaltay commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#issuecomment-430425707 How does this interact with installing the packages in boot.go. Would not this (https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L104) fail? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155170) Time Spent: 4h 40m (was: 4.5h) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5627) Several IO tests fail in Python 3 when accessing a temporary file with TypeError: a bytes-like object is required, not 'str'
[ https://issues.apache.org/jira/browse/BEAM-5627?focusedWorklogId=155169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155169 ] ASF GitHub Bot logged work on BEAM-5627: Author: ASF GitHub Bot Created on: 16/Oct/18 22:37 Start Date: 16/Oct/18 22:37 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6671: [BEAM-5627] Fix sources test for py3. URL: https://github.com/apache/beam/pull/6671#issuecomment-430425580 Thanks. Please hold on this PR. Not ready yet, fails on beamimport internal testing. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155169) Time Spent: 2h 40m (was: 2.5h) > Several IO tests fail in Python 3 when accessing a temporary file with > TypeError: a bytes-like object is required, not 'str' > -- > > Key: BEAM-5627 > URL: https://issues.apache.org/jira/browse/BEAM-5627 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Rakesh Kumar >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 40m > Remaining Estimate: 0h > > ERROR: test_split_at_fraction_exhaustive > (apache_beam.io.source_test_utils_test.SourceTestUtilsTest) > -- > Traceback (most recent call last): >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py", > line 120, in test_split_at_fraction_exhaustive > source = self._create_source(data) >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py", > line 43, in _create_source > source = LineSource(self._create_file_with_data(data)) >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py", > 
line 35, in _create_file_with_data > f.write(line + '\n') >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/tempfile.py", > line 622, in func_wrapper > return func(*args, **kwargs) > TypeError: a bytes-like object is required, not 'str' > Also similar: > == > ERROR: test_file_sink_writing > (apache_beam.io.filebasedsink_test.TestFileBasedSink) > -- > Traceback (most recent call last): >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/io/filebasedsink_test.py", line 121, in > test_file_sink_writing > init_token, writer_results = self._common_init(sink) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/io/filebasedsink_test.py", line 103, in _common_init > writer1 = sink.open_writer(init_token, '1') > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/options/value_provider.py", line 133, in _f > return fnc(self, *args, **kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/io/filebasedsink.py", line 185, in open_writer > return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/io/filebasedsink.py", line 385, in __init__ > self.temp_handle = self.sink.open(temp_shard_path) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/io/filebasedsink_test.py", line 82, in open > file_handle.write('[start]') > TypeError: a bytes-like object is required, not 'str' -- This message was sent by Atlassian JIRA (v7.6.3#76005)
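Both tracebacks above end in a `write()` of a `str` to a file handle opened in binary mode. With `tempfile.NamedTemporaryFile` the default mode is `'w+b'`, so Python 3 rejects `str` arguments. The following is a minimal reproduction and one possible fix, not necessarily the change made in PR #6671; the function names are made up for this sketch:

```python
import tempfile


def write_str_fails():
  """Returns True if writing a str to a binary temp file raises TypeError."""
  with tempfile.NamedTemporaryFile() as f:  # default mode is 'w+b'
    try:
      f.write('x' + '\n')
    except TypeError:
      return True
  return False


def create_file_with_data(lines):
  """Writes each line as UTF-8 bytes and returns the temp file's path."""
  with tempfile.NamedTemporaryFile(delete=False) as f:
    for line in lines:
      # Encode explicitly so the write works on a binary handle.
      f.write(line.encode('utf-8') + b'\n')
    return f.name
```

An alternative is to open the temporary file in text mode (`mode='w'`); which option fits depends on whether the code under test expects bytes or text.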
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=155168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155168 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 16/Oct/18 22:37 Start Date: 16/Oct/18 22:37 Worklog Time Spent: 10m Work Description: mwylde commented on issue #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#issuecomment-430425385 Style checks are passing, should be good to merge. Thanks for the reviews! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155168) Time Spent: 5h (was: 4h 50m) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 5h > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5627) Several IO tests fail in Python 3 when accessing a temporary file with TypeError: a bytes-like object is required, not 'str'
[ https://issues.apache.org/jira/browse/BEAM-5627?focusedWorklogId=155165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155165 ] ASF GitHub Bot logged work on BEAM-5627: Author: ASF GitHub Bot Created on: 16/Oct/18 22:34 Start Date: 16/Oct/18 22:34 Worklog Time Spent: 10m Work Description: manuzhang commented on issue #6671: [BEAM-5627] Fix sources test for py3. URL: https://github.com/apache/beam/pull/6671#issuecomment-430424774 R: @tvalentyn @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155165) Time Spent: 2.5h (was: 2h 20m) > Several IO tests fail in Python 3 when accessing a temporary file with > TypeError: a bytes-like object is required, not 'str' > -- > > Key: BEAM-5627 > URL: https://issues.apache.org/jira/browse/BEAM-5627 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Rakesh Kumar >Priority: Major > Fix For: Not applicable > > Time Spent: 2.5h > Remaining Estimate: 0h > > ERROR: test_split_at_fraction_exhaustive > (apache_beam.io.source_test_utils_test.SourceTestUtilsTest) > -- > Traceback (most recent call last): >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py", > line 120, in test_split_at_fraction_exhaustive > source = self._create_source(data) >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py", > line 43, in _create_source > source = LineSource(self._create_file_with_data(data)) >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils_test.py", > line 35, in _create_file_with_data > f.write(line + '\n') >File 
> "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/tempfile.py", > line 622, in func_wrapper > return func(*args, **kwargs) > TypeError: a bytes-like object is required, not 'str' > Also similar: > == > ERROR: test_file_sink_writing > (apache_beam.io.filebasedsink_test.TestFileBasedSink) > -- > Traceback (most recent call last): >File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/io/filebasedsink_test.py", line 121, in > test_file_sink_writing > init_token, writer_results = self._common_init(sink) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/io/filebasedsink_test.py", line 103, in _common_init > writer1 = sink.open_writer(init_token, '1') > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/options/value_provider.py", line 133, in _f > return fnc(self, *args, **kwargs) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/io/filebasedsink.py", line 185, in open_writer > return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/io/filebasedsink.py", line 385, in __init__ > self.temp_handle = self.sink.open(temp_shard_path) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/ >apache_beam/io/filebasedsink_test.py", line 82, in open > file_handle.write('[start]') > TypeError: a bytes-like object is required, not 'str' -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=155152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155152 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 16/Oct/18 22:10 Start Date: 16/Oct/18 22:10 Worklog Time Spent: 10m Work Description: mwylde commented on issue #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#issuecomment-430419035 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155152) Time Spent: 4h 50m (was: 4h 40m) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 4h 50m > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5176) FailOnWarnings behave differently between CLI and Intellij build
[ https://issues.apache.org/jira/browse/BEAM-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652531#comment-16652531 ] Kenneth Knowles commented on BEAM-5176: --- My mistake; I was on a funky branch. > FailOnWarnings behave differently between CLI and Intellij build > - > > Key: BEAM-5176 > URL: https://issues.apache.org/jira/browse/BEAM-5176 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Etienne Chauchot >Assignee: Kenneth Knowles >Priority: Major > > In command line the build passes but fails on the IDE because of warnings. > To make it pass I had to put false in failOnWarnings in ApplyJavaNature -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155151 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 16/Oct/18 22:06 Start Date: 16/Oct/18 22:06 Worklog Time Spent: 10m Work Description: pabloem closed pull request #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/options/pipeline_options.py b/sdks/python/apache_beam/options/pipeline_options.py
index a0059dbb381..357c97ea6da 100644
--- a/sdks/python/apache_beam/options/pipeline_options.py
+++ b/sdks/python/apache_beam/options/pipeline_options.py
@@ -520,6 +520,12 @@ def _add_argparse_args(cls, parser):
         type=str,
         help='GCE minimum CPU platform. Default is determined by GCP.'
     )
+    parser.add_argument(
+        '--dataflow_worker_jar',
+        dest='dataflow_worker_jar',
+        type=str,
+        help='Dataflow worker jar.'
+    )

   def validate(self, validator):
     errors = []
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index 1acd3488524..4143f2dbb1d 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -381,6 +381,13 @@ def run_pipeline(self, pipeline):
     self.dataflow_client = apiclient.DataflowApplicationClient(
         pipeline._options)

+    dataflow_worker_jar = getattr(worker_options, 'dataflow_worker_jar', None)
+    if dataflow_worker_jar is not None:
+      experiments = ["use_staged_dataflow_worker_jar"]
+      if debug_options.experiments is not None:
+        experiments = list(set(experiments + debug_options.experiments))
+      debug_options.experiments = experiments
+
     # Create the job description and send a request to the service. The result
     # can be None if there is no need to send a request to the service (e.g.
     # template creation). If a request was sent and failed then the call will
diff --git a/sdks/python/apache_beam/runners/portability/stager.py b/sdks/python/apache_beam/runners/portability/stager.py
index ef7401ac6aa..cd7e24fce51 100644
--- a/sdks/python/apache_beam/runners/portability/stager.py
+++ b/sdks/python/apache_beam/runners/portability/stager.py
@@ -59,6 +59,7 @@
 from apache_beam.internal import pickler
 from apache_beam.io.filesystems import FileSystems
 from apache_beam.options.pipeline_options import SetupOptions
+from apache_beam.options.pipeline_options import WorkerOptions
 # TODO(angoenka): Remove reference to dataflow internal names
 from apache_beam.runners.dataflow.internal import names
 from apache_beam.utils import processes
@@ -123,8 +124,7 @@ def stage_job_resources(self,

     Returns:
       A list of file names (no paths) for the resources staged. All the
-      files
-      are assumed to be staged at staging_location.
+      files are assumed to be staged at staging_location.

     Raises:
       RuntimeError: If files specified are not found or error encountered
@@ -256,6 +256,14 @@ def stage_job_resources(self,
             'The file "%s" cannot be found. Its location was specified by '
             'the --sdk_location command-line option.' % sdk_path)

+    worker_options = options.view_as(WorkerOptions)
+    dataflow_worker_jar = getattr(worker_options, 'dataflow_worker_jar', None)
+    if dataflow_worker_jar is not None:
+      jar_staged_filename = 'dataflow-worker.jar'
+      staged_path = FileSystems.join(staging_location, jar_staged_filename)
+      self.stage_artifact(dataflow_worker_jar, staged_path)
+      resources.append(jar_staged_filename)
+
     # Delete all temp files created while staging job resources.
     shutil.rmtree(temp_dir)
     retrieval_token = self.commit_manifest()

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155151) Time Spent: 4.5h (was: 4h 20m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > P
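In the `dataflow_runner.py` hunk of this diff, the `use_staged_dataflow_worker_jar` experiment is merged into the existing experiments with `list(set(...))`, which removes duplicates but does not preserve order. A small standalone sketch of an order-preserving variant follows. It is an alternative written for illustration, not what the PR does, and `add_experiment` is a hypothetical helper name:

```python
def add_experiment(existing, name='use_staged_dataflow_worker_jar'):
  """Returns a new list with `name` first, followed by the other experiments.

  `existing` may be None (no experiments set), matching the None check in
  the diff. Duplicates are dropped and the original order is kept.
  """
  merged = [name]
  for experiment in existing or []:
    if experiment not in merged:
      merged.append(experiment)
  return merged
```

For example, `add_experiment(['beam_fn_api'])` yields a list with the staged-jar experiment followed by `beam_fn_api`. A deterministic order mostly helps when comparing job descriptions in tests; for the service itself the set-based merge in the diff is presumably equivalent.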
[jira] [Commented] (BEAM-5176) FailOnWarnings behave differently between CLI and Intellij build
[ https://issues.apache.org/jira/browse/BEAM-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652520#comment-16652520 ] Kenneth Knowles commented on BEAM-5176: --- Incidentally this repros on the command line for me suddenly. > FailOnWarnings behave differently between CLI and Intellij build > - > > Key: BEAM-5176 > URL: https://issues.apache.org/jira/browse/BEAM-5176 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Etienne Chauchot >Assignee: Kenneth Knowles >Priority: Major > > In command line the build passes but fails on the IDE because of warnings. > To make it pass I had to put false in failOnWarnings in ApplyJavaNature -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5176) FailOnWarnings behave differently between CLI and Intellij build
[ https://issues.apache.org/jira/browse/BEAM-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652517#comment-16652517 ] Kenneth Knowles commented on BEAM-5176: --- Actually when I do {{./gradlew --debug}} I see the following passed to the failing javac command: {code:java} -Xlint:all -Werror -XepDisableWarningsInGeneratedCode -XepExcludedPaths:(.*/)?(build/generated.*avro-java|build/generated)/.* -Xep:MutableConstantField:OFF -Xlint:-options -Xlint:-cast -Xlint:-deprecation -Xlint:-processing -Xlint:-rawtypes -Xlint:-serial -Xlint:-try -Xlint:-unchecked -Xlint:-varargs{code} > FailOnWarnings behave differently between CLI and Intellij build > - > > Key: BEAM-5176 > URL: https://issues.apache.org/jira/browse/BEAM-5176 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Etienne Chauchot >Assignee: Kenneth Knowles >Priority: Major > > In command line the build passes but fails on the IDE because of warnings. > To make it pass I had to put false in failOnWarnings in ApplyJavaNature -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-5765) Document IntelliJ workflow: Perform a full build
[ https://issues.apache.org/jira/browse/BEAM-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner closed BEAM-5765. -- Resolution: Fixed Assignee: Scott Wegner Fix Version/s: Not applicable > Document IntelliJ workflow: Perform a full build > > > Key: BEAM-5765 > URL: https://issues.apache.org/jira/browse/BEAM-5765 > Project: Beam > Issue Type: Sub-task > Components: build-system, website >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > Fix For: Not applicable > > > The current IntelliJ documentation is not well organized. The plan is to > re-organize it into a set of developer workflows, with very prescriptive > steps that are easy to follow and validate that they are still working. > This task tracks writing documentation for the scenario: "How-to: Perform a > full build" > The proposed set of workflows to document is listed in this notes doc: > https://docs.google.com/document/d/18eXrO9IYll4oOnFb53EBhOtIfx-JLOinTWZSIBFkLk4/edit?usp=sharing > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5767) Document IntelliJ workflow: Run a single unit test
[ https://issues.apache.org/jira/browse/BEAM-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner reassigned BEAM-5767: -- Assignee: Scott Wegner > Document IntelliJ workflow: Run a single unit test > -- > > Key: BEAM-5767 > URL: https://issues.apache.org/jira/browse/BEAM-5767 > Project: Beam > Issue Type: Sub-task > Components: build-system, website >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > > The current IntelliJ documentation is not well organized. The plan is to > re-organize it into a set of developer workflows, with very prescriptive > steps that are easy to follow and validate that they are still working. > This task tracks writing documentation for the scenario: "How-to: Run a > single unit test" > The proposed set of workflows to document is listed in this notes doc: > https://docs.google.com/document/d/18eXrO9IYll4oOnFb53EBhOtIfx-JLOinTWZSIBFkLk4/edit?usp=sharing > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-5766) Document IntelliJ workflow: Build and test a single module
[ https://issues.apache.org/jira/browse/BEAM-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner closed BEAM-5766. -- Resolution: Fixed Fix Version/s: Not applicable > Document IntelliJ workflow: Build and test a single module > -- > > Key: BEAM-5766 > URL: https://issues.apache.org/jira/browse/BEAM-5766 > Project: Beam > Issue Type: Sub-task > Components: build-system, website >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > Fix For: Not applicable > > > The current IntelliJ documentation is not well organized. The plan is to > re-organize it into a set of developer workflows, with very prescriptive > steps that are easy to follow and validate that they are still working. > This task tracks writing documentation for the scenario: "How-to: Build and > test a single module" > The proposed set of workflows to document is listed in this notes doc: > https://docs.google.com/document/d/18eXrO9IYll4oOnFb53EBhOtIfx-JLOinTWZSIBFkLk4/edit?usp=sharing > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5766) Document IntelliJ workflow: Build and test a single module
[ https://issues.apache.org/jira/browse/BEAM-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner reassigned BEAM-5766: -- Assignee: Scott Wegner > Document IntelliJ workflow: Build and test a single module > -- > > Key: BEAM-5766 > URL: https://issues.apache.org/jira/browse/BEAM-5766 > Project: Beam > Issue Type: Sub-task > Components: build-system, website >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > > The current IntelliJ documentation is not well organized. The plan is to > re-organize it into a set of developer workflows, with very prescriptive > steps that are easy to follow and validate that they are still working. > This task tracks writing documentation for the scenario: "How-to: Build and > test a single module" > The proposed set of workflows to document is listed in this notes doc: > https://docs.google.com/document/d/18eXrO9IYll4oOnFb53EBhOtIfx-JLOinTWZSIBFkLk4/edit?usp=sharing > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4663) Implement Cost calculations for Cost-Based Optimization (CBO)
[ https://issues.apache.org/jira/browse/BEAM-4663?focusedWorklogId=155114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155114 ] ASF GitHub Bot logged work on BEAM-4663: Author: ASF GitHub Bot Created on: 16/Oct/18 20:50 Start Date: 16/Oct/18 20:50 Worklog Time Spent: 10m Work Description: apilloud commented on issue #6656: [BEAM-4663] [SQL] CBO cost calculation URL: https://github.com/apache/beam/pull/6656#issuecomment-430395733 Overriding Calcite's cost functions in Beam SQL isn't going to buy us much until we implement `getStatistic` in BeamCalciteTable instead of using [UNKNOWN](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/schema/Statistics.java#L37). Calcite heavily weights RowCount and [it is the only attribute considered](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoCost.java#L98) in the initial sort. This also drops important internal information in the cost model. The builtin [Aggregate](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java#L317) prefers the `$SUM0` operator via the cost model. The builtin [Join](https://github.com/apache/calcite/blob/d59b639d27da704f00eff616324a2c04aa06f84c/core/src/main/java/org/apache/calcite/rel/core/Join.java#L196) takes into account the join condition via the row count estimate. If we are going to do this, we need to extend the builtin cost model rather than overriding it to preserve this. I'm also not convinced that the internal model's assumption that dIo = 0 is wrong. (That appears to be the primary difference here.) Outside of Aggregate operators that assumption is effectively true in Dataflow. This is an area where we should have tests showing that our model produces better plans than the default. 
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155114) Time Spent: 1h 50m (was: 1h 40m) > Implement Cost calculations for Cost-Based Optimization (CBO) > -- > > Key: BEAM-4663 > URL: https://issues.apache.org/jira/browse/BEAM-4663 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kai Jiang >Assignee: Kai Jiang >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > To support CBO, we should implement methods in each Beam*Rel.java. > computeSelfCost(...) as our first step. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
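A rough, self-contained sketch of the point made in the review comment above, assuming a cost made of rows, cpu and io in which (as that comment describes) only the row count is compared. SketchCost, SketchRel and SketchBeamRel are hypothetical names invented for illustration; they are not Calcite or Beam classes, and the 0.5 io factor is arbitrary.

```java
/** Hypothetical stand-in for a planner cost; NOT Calcite's VolcanoCost. */
final class SketchCost {
  final double rows;
  final double cpu;
  final double io;

  SketchCost(double rows, double cpu, double io) {
    this.rows = rows;
    this.cpu = cpu;
    this.io = io;
  }

  /** Assumption taken from the comment above: only the row count is compared. */
  boolean isLe(SketchCost other) {
    return this.rows <= other.rows;
  }

  /** Component-wise sum of two costs. */
  SketchCost plus(SketchCost other) {
    return new SketchCost(rows + other.rows, cpu + other.cpu, io + other.io);
  }
}

/** Hypothetical "builtin" node whose cost depends on its input row count. */
class SketchRel {
  final double inputRows;

  SketchRel(double inputRows) {
    this.inputRows = inputRows;
  }

  SketchCost computeSelfCost() {
    return new SketchCost(inputRows, inputRows, 0.0); // io assumed to be 0
  }
}

/** Hypothetical subclass that extends the inherited cost instead of replacing it. */
class SketchBeamRel extends SketchRel {
  SketchBeamRel(double inputRows) {
    super(inputRows);
  }

  @Override
  SketchCost computeSelfCost() {
    // Keep whatever the parent estimated and add an illustrative io term on top.
    return super.computeSelfCost().plus(new SketchCost(0.0, 0.0, inputRows * 0.5));
  }
}
```

Under this row-count-only comparison the extra io term does not change how the two costs are ordered, which is one way to read the remark that overriding the cost functions will not help much until real row-count statistics are available.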
[jira] [Work logged] (BEAM-5058) Python precommits should run E2E tests
[ https://issues.apache.org/jira/browse/BEAM-5058?focusedWorklogId=155113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155113 ] ASF GitHub Bot logged work on BEAM-5058: Author: ASF GitHub Bot Created on: 16/Oct/18 20:48 Start Date: 16/Oct/18 20:48 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #6707: [BEAM-5058] Run basic ITs in Python Precommit URL: https://github.com/apache/beam/pull/6707#issuecomment-430395043 PreCommit passed. @udim @aaltay Please take a look. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155113) Time Spent: 20m (was: 10m) > Python precommits should run E2E tests > -- > > Key: BEAM-5058 > URL: https://issues.apache.org/jira/browse/BEAM-5058 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, testing >Reporter: Udi Meiri >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > According to [https://beam.apache.org/contribute/testing/] (which I'm working > on), end-to-end tests should be run in precommit on each combination of > \{batch, streaming}x\{SDK language}x\{supported runner}. > At least 2 tests need to be added to Python's precommit: wordcount and > wordcount_streaming on Dataflow, and possibly on other supported runners > (direct runner and new runners plz). > These tests should be configured to run from a Gradle sub-project, so that > they're run in parallel to the unit tests. > Example that parallelizes Java precommit integration tests: > [https://github.com/apache/beam/pull/5731] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4130) Portable Flink runner JobService entry point in a Docker container
[ https://issues.apache.org/jira/browse/BEAM-4130?focusedWorklogId=155112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155112 ] ASF GitHub Bot logged work on BEAM-4130: Author: ASF GitHub Bot Created on: 16/Oct/18 20:44 Start Date: 16/Oct/18 20:44 Worklog Time Spent: 10m Work Description: tweise edited a comment on issue #6703: [BEAM-4130] Add tests for FlinkJobServerDriver URL: https://github.com/apache/beam/pull/6703#issuecomment-430379066 @aaltay note that we are waiting to merge this PR - it is blocked by unrelated Java pre-commit issues. This PR needs to go into the release. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155112) Time Spent: 14h (was: 13h 50m) > Portable Flink runner JobService entry point in a Docker container > -- > > Key: BEAM-4130 > URL: https://issues.apache.org/jira/browse/BEAM-4130 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Maximilian Michels >Priority: Minor > Fix For: 2.7.0 > > Time Spent: 14h > Remaining Estimate: 0h > > The portable Flink runner exists as a Job Service that runs somewhere. We > need a main entry point that itself spins up the job service (and artifact > staging service). The main program itself should be packaged into an uberjar > such that it can be run locally or submitted to a Flink deployment via `flink > run`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5384) [SQL] Calcite optimizes away LogicalProject
[ https://issues.apache.org/jira/browse/BEAM-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652385#comment-16652385 ] Rui Wang edited comment on BEAM-5384 at 10/16/18 8:19 PM: -- I have seen that in BeamSQL, LogicalProject is gone for query "SELECT key, COUNT( * ) FROM TABLE GROUP BY key". was (Author: amaliujia): I have seen that in BeamSQL, LogicalProject is gone for query "SELECT key, COUNT(*) FROM TABLE GROUP BY key". > [SQL] Calcite optimizes away LogicalProject > --- > > Key: BEAM-5384 > URL: https://issues.apache.org/jira/browse/BEAM-5384 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > *From > [https://stackoverflow.com/questions/52313324/beam-sql-wont-work-when-using-aggregation-in-statement-cannot-plan-execution] > :* > I have a basic Beam pipeline that reads from GCS, does a Beam SQL transform > and writes the results to BigQuery. > When I don't do any aggregation in my SQL statement it works fine: > {code:java} > .. > PCollection outputStream = > sqlRows.apply( > "sql_transform", > SqlTransform.query("select views from PCOLLECTION")); > outputStream.setCoder(SCHEMA.getRowCoder()); > .. > {code} > However, when I try to aggregate with a sum then it fails (throws a > CannotPlanException exception): > {code:java} > .. > PCollection outputStream = > sqlRows.apply( > "sql_transform", > SqlTransform.query("select wikimedia_project, > sum(views) from PCOLLECTION group by wikimedia_project")); > outputStream.setCoder(SCHEMA.getRowCoder()); > .. > {code} > Stacktrace: > {code:java} > Step #1: 11:47:37,562 0[main] INFO > org.apache.beam.runners.dataflow.DataflowRunner - > PipelineOptions.filesToStage was not specified. Defaulting to files from the > classpath: will stage 117 files. Enable logging at DEBUG level to see which > files will be staged. 
> Step #1: 11:47:39,845 2283 [main] INFO > org.apache.beam.sdk.extensions.sql.impl.BeamQueryPlanner - SQL: > Step #1: SELECT `PCOLLECTION`.`wikimedia_project`, SUM(`PCOLLECTION`.`views`) > Step #1: FROM `beam`.`PCOLLECTION` AS `PCOLLECTION` > Step #1: GROUP BY `PCOLLECTION`.`wikimedia_project` > Step #1: 11:47:40,387 2825 [main] INFO > org.apache.beam.sdk.extensions.sql.impl.BeamQueryPlanner - SQLPlan> > Step #1: LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)]) > Step #1: BeamIOSourceRel(table=[[beam, PCOLLECTION]]) > Step #1: > Step #1: Exception in thread "main" > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException: > Node [rel#7:Subset#1.BEAM_LOGICAL.[]] could not be implemented; planner > state: > Step #1: > Step #1: Root: rel#7:Subset#1.BEAM_LOGICAL.[] > Step #1: Original rel: > Step #1: LogicalAggregate(subset=[rel#7:Subset#1.BEAM_LOGICAL.[]], > group=[{0}], EXPR$1=[SUM($1)]): rowcount = 10.0, cumulative cost = > {11.375000476837158 rows, 0.0 cpu, 0.0 io}, id = 5 > Step #1: BeamIOSourceRel(subset=[rel#4:Subset#0.BEAM_LOGICAL.[]], > table=[[beam, PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 > rows, 101.0 cpu, 0.0 io}, id = 2 > Step #1: > Step #1: Sets: > Step #1: Set#0, type: RecordType(VARCHAR wikimedia_project, BIGINT views) > Step #1:rel#4:Subset#0.BEAM_LOGICAL.[], best=rel#2, importance=0.81 > Step #1:rel#2:BeamIOSourceRel.BEAM_LOGICAL.[](table=[beam, > PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} > Step #1:rel#10:Subset#0.ENUMERABLE.[], best=rel#9, importance=0.405 > Step #1: > rel#9:BeamEnumerableConverter.ENUMERABLE.[](input=rel#4:Subset#0.BEAM_LOGICAL.[]), > rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, > 1.7976931348623157E308 cpu, 1.7976931348623157E308 io} > Step #1: Set#1, type: RecordType(VARCHAR wikimedia_project, BIGINT EXPR$1) > Step #1:rel#6:Subset#1.NONE.[], best=null, importance=0.9 > Step #1: > 
rel#5:LogicalAggregate.NONE.[](input=rel#4:Subset#0.BEAM_LOGICAL.[],group={0},EXPR$1=SUM($1)), > rowcount=10.0, cumulative cost={inf} > Step #1:rel#7:Subset#1.BEAM_LOGICAL.[], best=null, importance=1.0 > Step #1: > rel#8:AbstractConverter.BEAM_LOGICAL.[](input=rel#6:Subset#1.NONE.[],convention=BEAM_LOGICAL,sort=[]), > rowcount=10.0, cumulative cost={inf} > Step #1: > Step #1: > Step #1:at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:448) > Step #1:at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset.buildChea
[jira] [Commented] (BEAM-5384) [SQL] Calcite optimizes away LogicalProject
[ https://issues.apache.org/jira/browse/BEAM-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652385#comment-16652385 ] Rui Wang commented on BEAM-5384: I have seen that in BeamSQL, LogicalProject is gone for query "SELECT key, COUNT(*) FROM TABLE GROUP BY key". > [SQL] Calcite optimizes away LogicalProject > --- > > Key: BEAM-5384 > URL: https://issues.apache.org/jira/browse/BEAM-5384 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > *From > [https://stackoverflow.com/questions/52313324/beam-sql-wont-work-when-using-aggregation-in-statement-cannot-plan-execution] > :* > I have a basic Beam pipeline that reads from GCS, does a Beam SQL transform > and writes the results to BigQuery. > When I don't do any aggregation in my SQL statement it works fine: > {code:java} > .. > PCollection outputStream = > sqlRows.apply( > "sql_transform", > SqlTransform.query("select views from PCOLLECTION")); > outputStream.setCoder(SCHEMA.getRowCoder()); > .. > {code} > However, when I try to aggregate with a sum then it fails (throws a > CannotPlanException exception): > {code:java} > .. > PCollection outputStream = > sqlRows.apply( > "sql_transform", > SqlTransform.query("select wikimedia_project, > sum(views) from PCOLLECTION group by wikimedia_project")); > outputStream.setCoder(SCHEMA.getRowCoder()); > .. > {code} > Stacktrace: > {code:java} > Step #1: 11:47:37,562 0[main] INFO > org.apache.beam.runners.dataflow.DataflowRunner - > PipelineOptions.filesToStage was not specified. Defaulting to files from the > classpath: will stage 117 files. Enable logging at DEBUG level to see which > files will be staged. 
> Step #1: 11:47:39,845 2283 [main] INFO > org.apache.beam.sdk.extensions.sql.impl.BeamQueryPlanner - SQL: > Step #1: SELECT `PCOLLECTION`.`wikimedia_project`, SUM(`PCOLLECTION`.`views`) > Step #1: FROM `beam`.`PCOLLECTION` AS `PCOLLECTION` > Step #1: GROUP BY `PCOLLECTION`.`wikimedia_project` > Step #1: 11:47:40,387 2825 [main] INFO > org.apache.beam.sdk.extensions.sql.impl.BeamQueryPlanner - SQLPlan> > Step #1: LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)]) > Step #1: BeamIOSourceRel(table=[[beam, PCOLLECTION]]) > Step #1: > Step #1: Exception in thread "main" > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException: > Node [rel#7:Subset#1.BEAM_LOGICAL.[]] could not be implemented; planner > state: > Step #1: > Step #1: Root: rel#7:Subset#1.BEAM_LOGICAL.[] > Step #1: Original rel: > Step #1: LogicalAggregate(subset=[rel#7:Subset#1.BEAM_LOGICAL.[]], > group=[{0}], EXPR$1=[SUM($1)]): rowcount = 10.0, cumulative cost = > {11.375000476837158 rows, 0.0 cpu, 0.0 io}, id = 5 > Step #1: BeamIOSourceRel(subset=[rel#4:Subset#0.BEAM_LOGICAL.[]], > table=[[beam, PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 > rows, 101.0 cpu, 0.0 io}, id = 2 > Step #1: > Step #1: Sets: > Step #1: Set#0, type: RecordType(VARCHAR wikimedia_project, BIGINT views) > Step #1:rel#4:Subset#0.BEAM_LOGICAL.[], best=rel#2, importance=0.81 > Step #1:rel#2:BeamIOSourceRel.BEAM_LOGICAL.[](table=[beam, > PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} > Step #1:rel#10:Subset#0.ENUMERABLE.[], best=rel#9, importance=0.405 > Step #1: > rel#9:BeamEnumerableConverter.ENUMERABLE.[](input=rel#4:Subset#0.BEAM_LOGICAL.[]), > rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, > 1.7976931348623157E308 cpu, 1.7976931348623157E308 io} > Step #1: Set#1, type: RecordType(VARCHAR wikimedia_project, BIGINT EXPR$1) > Step #1:rel#6:Subset#1.NONE.[], best=null, importance=0.9 > Step #1: > 
rel#5:LogicalAggregate.NONE.[](input=rel#4:Subset#0.BEAM_LOGICAL.[],group={0},EXPR$1=SUM($1)), > rowcount=10.0, cumulative cost={inf} > Step #1:rel#7:Subset#1.BEAM_LOGICAL.[], best=null, importance=1.0 > Step #1: > rel#8:AbstractConverter.BEAM_LOGICAL.[](input=rel#6:Subset#1.NONE.[],convention=BEAM_LOGICAL,sort=[]), > rowcount=10.0, cumulative cost={inf} > Step #1: > Step #1: > Step #1:at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:448) > Step #1:at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:298) > Step #1:at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:666
[jira] [Closed] (BEAM-5763) Re-organize IntelliJ docs into workflow tasks
[ https://issues.apache.org/jira/browse/BEAM-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner closed BEAM-5763. -- Resolution: Fixed Fix Version/s: Not applicable The IntelliJ documentation is now organized as a set of task-focused pages: https://cwiki.apache.org/confluence/display/BEAM/Using+IntelliJ+IDE > Re-organize IntelliJ docs into workflow tasks > - > > Key: BEAM-5763 > URL: https://issues.apache.org/jira/browse/BEAM-5763 > Project: Beam > Issue Type: Sub-task > Components: build-system, website >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > Fix For: Not applicable > > > The current documentation is not well organized. It mostly focuses on how to > get an initial setup working, but doesn't talk about common developer tasks > (building from scratch, testing a single module / unit test / integration > test, recovering from project corruption). > I'd like to re-organize the documentation to make it very prescriptive to > follow and easy to validate that it works. > Current set of proposed "workflows" listed in this doc: > https://docs.google.com/document/d/18eXrO9IYll4oOnFb53EBhOtIfx-JLOinTWZSIBFkLk4/edit?usp=sharing > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4130) Portable Flink runner JobService entry point in a Docker container
[ https://issues.apache.org/jira/browse/BEAM-4130?focusedWorklogId=155093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155093 ] ASF GitHub Bot logged work on BEAM-4130: Author: ASF GitHub Bot Created on: 16/Oct/18 19:59 Start Date: 16/Oct/18 19:59 Worklog Time Spent: 10m Work Description: tweise commented on issue #6703: [BEAM-4130] Add tests for FlinkJobServerDriver URL: https://github.com/apache/beam/pull/6703#issuecomment-430379066 @aaltay note that we are waiting to merge this PR - it is blocked by unrelated Java pre-commit issues. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155093) Time Spent: 13h 50m (was: 13h 40m) > Portable Flink runner JobService entry point in a Docker container > -- > > Key: BEAM-4130 > URL: https://issues.apache.org/jira/browse/BEAM-4130 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Maximilian Michels >Priority: Minor > Fix For: 2.7.0 > > Time Spent: 13h 50m > Remaining Estimate: 0h > > The portable Flink runner exists as a Job Service that runs somewhere. We > need a main entry point that itself spins up the job service (and artifact > staging service). The main program itself should be packaged into an uberjar > such that it can be run locally or submitted to a Flink deployment via `flink > run`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5759) ConcurrentModificationException on JmsIO checkpoint finalization
[ https://issues.apache.org/jira/browse/BEAM-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652348#comment-16652348 ] Jean-Baptiste Onofré commented on BEAM-5759: Thanks for catching. I'm reviewing the PR. > ConcurrentModificationException on JmsIO checkpoint finalization > > > Key: BEAM-5759 > URL: https://issues.apache.org/jira/browse/BEAM-5759 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.8.0 >Reporter: Andrew Fulton >Assignee: Andrew Fulton > Fix For: 2.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When reading from a JmsIO source, a ConcurrentModificationException can be > thrown when checkpoint finalization occurs under heavy load. > For example: > {{jsonPayload: {}} > {{ exception: "java.util.ConcurrentModificationException}} > {{ at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:903)}} > {{ at java.util.ArrayList$Itr.next(ArrayList.java:853)}} > {{ at > org.apache.beam.sdk.io.jms.JmsCheckpointMark.finalizeCheckpoint(JmsCheckpointMark.java:65)}} > {{ at > com.google.cloud.dataflow.worker.StreamingModeExecutionContext$1.run(StreamingModeExecutionContext.java:379)}} > {{ at > com.google.cloud.dataflow.worker.StreamingDataflowWorker$8.run(StreamingDataflowWorker.java:846)}} > {{ at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)}} > {{ at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)}} > {{ at java.lang.Thread.run(Thread.java:745)}} > {{"}} > {{ job: "2018-09-27_08_55_18-6454085774348718625" }} > {{ logger: "com.google.cloud.dataflow.worker.StreamingDataflowWorker" }} > {{ message: "Source checkpoint finalization failed:" }} > {{ thread: "309" }} > {{ work: "" }} > {{ worker: "test-andrew-092715504-09270855-tkfp-harness-dnmb" }} > > Looking at the JmsCheckpointMark code, it appears that access to the pending > message list is unprotected - thus if a thread calls finalizeCheckpoint while > a 
separate processing thread adds more messages to the checkpoint mark list > then an exception will be thrown. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
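A minimal sketch of the race described in the issue above and one possible way to guard against it, assuming the pending messages live in a plain ArrayList shared by a reading thread and a finalizing thread. GuardedCheckpointMark and its methods are hypothetical names for illustration only; this is not the actual JmsCheckpointMark code nor the change made in the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a checkpoint mark; NOT the real JmsCheckpointMark. */
class GuardedCheckpointMark {
  private final List<String> pending = new ArrayList<>();
  private int acknowledged = 0;

  /** Called by the processing thread for every message it reads. */
  synchronized void add(String messageId) {
    pending.add(messageId);
  }

  /** Called by the finalization thread; iterates and clears under the same lock as add(). */
  synchronized int finalizeCheckpoint() {
    int count = 0;
    for (String id : pending) {
      count++; // a real source would acknowledge the message here
    }
    pending.clear();
    acknowledged += count;
    return count;
  }

  synchronized int acknowledgedCount() {
    return acknowledged;
  }

  /** Adds messages on one thread while another finalizes repeatedly; returns the total acknowledged. */
  static int runDemo(int messages) {
    GuardedCheckpointMark mark = new GuardedCheckpointMark();
    Thread producer = new Thread(() -> {
      for (int i = 0; i < messages; i++) {
        mark.add("msg-" + i);
      }
    });
    Thread finalizer = new Thread(() -> {
      for (int i = 0; i < 100; i++) {
        mark.finalizeCheckpoint();
      }
    });
    producer.start();
    finalizer.start();
    try {
      producer.join();
      finalizer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    mark.finalizeCheckpoint(); // drain whatever is still pending
    return mark.acknowledgedCount();
  }
}
```

Without the synchronized keywords, the for-each loop in finalizeCheckpoint() can observe a structural modification made by add() and fail with ConcurrentModificationException, which matches the ArrayList$Itr frames in the stack trace. Holding one lock around both methods is only one option; swapping in a fresh list under a short lock would be another.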
[jira] [Work logged] (BEAM-5759) ConcurrentModificationException on JmsIO checkpoint finalization
[ https://issues.apache.org/jira/browse/BEAM-5759?focusedWorklogId=155092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155092 ] ASF GitHub Bot logged work on BEAM-5759: Author: ASF GitHub Bot Created on: 16/Oct/18 19:52 Start Date: 16/Oct/18 19:52 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #6702: [BEAM-5759] Ensuring JmsIO checkpoint state is accessed and modified safely URL: https://github.com/apache/beam/pull/6702#issuecomment-430376482 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155092) Time Spent: 0.5h (was: 20m) > ConcurrentModificationException on JmsIO checkpoint finalization > > > Key: BEAM-5759 > URL: https://issues.apache.org/jira/browse/BEAM-5759 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.8.0 >Reporter: Andrew Fulton >Assignee: Andrew Fulton > Fix For: 2.9.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > When reading from a JmsIO source, a ConcurrentModificationException can be > thrown when checkpoint finalization occurs under heavy load. 
> For example: > {{jsonPayload: {}} > {{ exception: "java.util.ConcurrentModificationException}} > {{ at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:903)}} > {{ at java.util.ArrayList$Itr.next(ArrayList.java:853)}} > {{ at > org.apache.beam.sdk.io.jms.JmsCheckpointMark.finalizeCheckpoint(JmsCheckpointMark.java:65)}} > {{ at > com.google.cloud.dataflow.worker.StreamingModeExecutionContext$1.run(StreamingModeExecutionContext.java:379)}} > {{ at > com.google.cloud.dataflow.worker.StreamingDataflowWorker$8.run(StreamingDataflowWorker.java:846)}} > {{ at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)}} > {{ at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)}} > {{ at java.lang.Thread.run(Thread.java:745)}} > {{"}} > {{ job: "2018-09-27_08_55_18-6454085774348718625" }} > {{ logger: "com.google.cloud.dataflow.worker.StreamingDataflowWorker" }} > {{ message: "Source checkpoint finalization failed:" }} > {{ thread: "309" }} > {{ work: "" }} > {{ worker: "test-andrew-092715504-09270855-tkfp-harness-dnmb" }} > > Looking at the JmsCheckpointMark code, it appears that access to the pending > message list is unprotected - thus if a thread calls finalizeCheckpoint while > a separate processing thread adds more messages to the checkpoint mark list > then an exception will be thrown. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4663) Implement Cost calculations for Cost-Based Optimization (CBO)
[ https://issues.apache.org/jira/browse/BEAM-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-4663: - Assignee: Kai Jiang > Implement Cost calculations for Cost-Based Optimization (CBO) > -- > > Key: BEAM-4663 > URL: https://issues.apache.org/jira/browse/BEAM-4663 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kai Jiang >Assignee: Kai Jiang >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > To support CBO, we should implement methods in each Beam*Rel.java. > computeSelfCost(...) as our first step. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4663) Implement Cost calculations for Cost-Based Optimization (CBO)
[ https://issues.apache.org/jira/browse/BEAM-4663?focusedWorklogId=155089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155089 ] ASF GitHub Bot logged work on BEAM-4663: Author: ASF GitHub Bot Created on: 16/Oct/18 19:49 Start Date: 16/Oct/18 19:49 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #6656: [BEAM-4663] [SQL] CBO cost calculation URL: https://github.com/apache/beam/pull/6656#discussion_r225684401 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java ## @@ -111,6 +114,15 @@ public RelWriter explainTerms(RelWriter pw) { return pw; } + @Override + public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery metadata) { +RelNode child = getInput(); +Double rowCnt = metadata.getRowCount(child); Review comment: A `BoundedSource` does have size estimation that we might be able to use. What I think is important is the ability to correctly guide Calcite to apply the desired rules. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155089) Time Spent: 1h 40m (was: 1.5h) > Implement Cost calculations for Cost-Based Optimization (CBO) > -- > > Key: BEAM-4663 > URL: https://issues.apache.org/jira/browse/BEAM-4663 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kai Jiang >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > To support CBO, we should implement methods in each Beam*Rel.java. > computeSelfCost(...) as our first step. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155083 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 16/Oct/18 19:30 Start Date: 16/Oct/18 19:30 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#issuecomment-430369135 Run Python PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155083) Time Spent: 4h 20m (was: 4h 10m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3900) Introduce Euphoria Java 8 DSL
[ https://issues.apache.org/jira/browse/BEAM-3900?focusedWorklogId=155081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155081 ] ASF GitHub Bot logged work on BEAM-3900: Author: ASF GitHub Bot Created on: 16/Oct/18 19:29 Start Date: 16/Oct/18 19:29 Worklog Time Spent: 10m Work Description: je-ik opened a new pull request #6709: [BEAM-3900] docs: TopPerKey is supported by euphoria URL: https://github.com/apache/beam/pull/6709 TopPerKey is supported by Euphoria DSL, it was just a left-over in documentation. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155081) Time Spent: 3.5h (was: 3h 20m) > Introduce Euphoria Java 8 DSL > - > > Key: BEAM-3900 > URL: https://issues.apache.org/jira/browse/BEAM-3900 > Project: Beam > Issue Type: New Feature > Components: dsl-euphoria >Reporter: David Moravek >Assignee: David Moravek >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > This is the umbrella issue for integrating [Euphoria > API|http://github.com/seznam/euphoria] into Beam. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4663) Implement Cost calculations for Cost-Based Optimization (CBO)
[ https://issues.apache.org/jira/browse/BEAM-4663?focusedWorklogId=155058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155058 ] ASF GitHub Bot logged work on BEAM-4663: Author: ASF GitHub Bot Created on: 16/Oct/18 18:54 Start Date: 16/Oct/18 18:54 Worklog Time Spent: 10m Work Description: amaliujia commented on a change in pull request #6656: [BEAM-4663] [SQL] CBO cost calculation URL: https://github.com/apache/beam/pull/6656#discussion_r22599 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java ## @@ -111,6 +114,15 @@ public RelWriter explainTerms(RelWriter pw) { return pw; } + @Override + public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery metadata) { +RelNode child = getInput(); +Double rowCnt = metadata.getRowCount(child); Review comment: Because there is no support to get the relatively estimate row count yet, I suggest we don't work on CBO but focus on something we can control (e.g. logical optimization). For example, we can work on figuring out whether rules work in [BeamRuleSets.java](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java). I have seen an issue that we add a rule in the list but that rule triggers a bug at a moment. Also some rules definitely are not working because of traits setup. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155058) Time Spent: 1h 20m (was: 1h 10m) > Implement Cost calculations for Cost-Based Optimization (CBO) > -- > > Key: BEAM-4663 > URL: https://issues.apache.org/jira/browse/BEAM-4663 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kai Jiang >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > To support CBO, we should implement methods in each Beam*Rel.java. > computeSelfCost(...) as our first step. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4663) Implement Cost calculations for Cost-Based Optimization (CBO)
[ https://issues.apache.org/jira/browse/BEAM-4663?focusedWorklogId=155059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155059 ] ASF GitHub Bot logged work on BEAM-4663: Author: ASF GitHub Bot Created on: 16/Oct/18 18:54 Start Date: 16/Oct/18 18:54 Worklog Time Spent: 10m Work Description: amaliujia commented on a change in pull request #6656: [BEAM-4663] [SQL] CBO cost calculation URL: https://github.com/apache/beam/pull/6656#discussion_r22599 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java ## @@ -111,6 +114,15 @@ public RelWriter explainTerms(RelWriter pw) { return pw; } + @Override + public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery metadata) { +RelNode child = getInput(); +Double rowCnt = metadata.getRowCount(child); Review comment: Because there is no support yet for getting a relatively correct row count estimate, I suggest we don't work on CBO but focus on something we can control (e.g. logical optimization). For example, we can work on figuring out whether rules work in [BeamRuleSets.java](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java). I have seen an issue where we add a rule to the list but that rule then triggers a bug. Also, some rules are definitely not working because of the traits setup. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155059) Time Spent: 1.5h (was: 1h 20m) > Implement Cost calculations for Cost-Based Optimization (CBO) > -- > > Key: BEAM-4663 > URL: https://issues.apache.org/jira/browse/BEAM-4663 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kai Jiang >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > To support CBO, we should implement methods in each Beam*Rel.java. > computeSelfCost(...) as our first step. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
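The review comment above turns on the row count estimate being unavailable or unreliable. As a rough illustration only, the following sketch shows one way a self-cost could be derived defensively when the estimate may be missing. It deliberately does not use the Calcite or Beam APIs: `CostSketch`, `RowCountEstimator`, `Cost`, `selfCost`, and the fallback value are all hypothetical names and assumptions introduced here, not code from the pull request.

```java
// Hypothetical, JDK-only sketch; none of these types exist in Beam or Calcite.
final class CostSketch {

  /** Stand-in for a metadata provider; a null result means "no estimate available". */
  interface RowCountEstimator {
    Double estimateRowCount();
  }

  /** Simple cost triple (rows, cpu, io), assumed here for illustration. */
  static final class Cost {
    final double rows;
    final double cpu;
    final double io;

    Cost(double rows, double cpu, double io) {
      this.rows = rows;
      this.cpu = cpu;
      this.io = io;
    }
  }

  /** Arbitrary fallback used when no usable estimate exists (an assumption). */
  static final double FALLBACK_ROW_COUNT = 1_000_000d;

  static Cost selfCost(RowCountEstimator estimator) {
    Double estimate = estimator.estimateRowCount();
    // Treat null, NaN, and negative values as "unknown" and use the fallback.
    boolean unusable = estimate == null || estimate.isNaN() || estimate < 0;
    double rows = unusable ? FALLBACK_ROW_COUNT : estimate;
    // Assume CPU cost grows linearly with the row count and there is no IO cost.
    return new Cost(rows, rows, 0d);
  }

  public static void main(String[] args) {
    Cost known = selfCost(() -> 42d);
    Cost unknown = selfCost(() -> null);
    System.out.println("known rows: " + known.rows);
    System.out.println("unknown rows: " + unknown.rows);
  }
}
```

A fixed fallback keeps the planner from failing on a missing estimate, but it also means plans are ranked on a guess, which is the concern the reviewer raises about working on CBO before better estimates exist.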
[jira] [Work logged] (BEAM-5058) Python precommits should run E2E tests
[ https://issues.apache.org/jira/browse/BEAM-5058?focusedWorklogId=155052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155052 ] ASF GitHub Bot logged work on BEAM-5058: Author: ASF GitHub Bot Created on: 16/Oct/18 18:32 Start Date: 16/Oct/18 18:32 Worklog Time Spent: 10m Work Description: markflyhigh opened a new pull request #6707: [BEAM-5058] Run basic ITs in Python Precommit URL: https://github.com/apache/beam/pull/6707 According to https://beam.apache.org/contribute/testing/, we want to run basic e2e tests in Python Precommit. This change adds the following suites to Python precommit: - directRunnerIT. 3 integration tests that run with DirectRunner. They finish within 1 min. - precommitIT. Includes wordcount batch and streaming integration tests that run against DataflowRunner. I expect this change to increase precommit time by ~10 mins, for an overall runtime of 30 - 40 mins based on recent data. Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. 
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking -
[jira] [Work logged] (BEAM-5114) Create example uber jars for supported runners
[ https://issues.apache.org/jira/browse/BEAM-5114?focusedWorklogId=155047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155047 ] ASF GitHub Bot logged work on BEAM-5114: Author: ASF GitHub Bot Created on: 16/Oct/18 18:18 Start Date: 16/Oct/18 18:18 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #6191: [BEAM-5114] Create example uber jars URL: https://github.com/apache/beam/pull/6191#issuecomment-430343037 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155047) Time Spent: 2h (was: 1h 50m) > Create example uber jars for supported runners > -- > > Key: BEAM-5114 > URL: https://issues.apache.org/jira/browse/BEAM-5114 > Project: Beam > Issue Type: New Feature > Components: examples-java >Reporter: Ben Sidhom >Assignee: Ben Sidhom >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Producing these artifacts results in several benefits > * Gives an example of how to package user code for different runners > * Enables ad-hoc testing of runner changes against real user pipelines easier > * Enables integration testing end-to-end pipelines against different runner > services -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5114) Create example uber jars for supported runners
[ https://issues.apache.org/jira/browse/BEAM-5114?focusedWorklogId=155048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155048 ] ASF GitHub Bot logged work on BEAM-5114: Author: ASF GitHub Bot Created on: 16/Oct/18 18:18 Start Date: 16/Oct/18 18:18 Worklog Time Spent: 10m Work Description: stale[bot] closed pull request #6191: [BEAM-5114] Create example uber jars URL: https://github.com/apache/beam/pull/6191 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/examples/java/direct/build.gradle b/examples/java/direct/build.gradle new file mode 100644 index 000..751b2f35457 --- /dev/null +++ b/examples/java/direct/build.gradle @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import groovy.json.JsonOutput + +apply plugin: org.apache.beam.gradle.BeamModulePlugin +// Disable default shadow jar closure and include all class files and resources. 
+applyJavaNature(shadowClosure: {}) + +dependencies { +compile project(path: ":beam-examples-java", configuration: "shadow") +compile project(path: ":beam-examples-java", configuration: "directRunnerPreCommit") +} diff --git a/examples/java/flink/build.gradle b/examples/java/flink/build.gradle new file mode 100644 index 000..c0674f4d48a --- /dev/null +++ b/examples/java/flink/build.gradle @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import groovy.json.JsonOutput + +apply plugin: org.apache.beam.gradle.BeamModulePlugin +// Disable default shadow jar closure and include all class files and resources. +applyJavaNature(shadowClosure: {}) + +dependencies { +compile project(path: ":beam-examples-java", configuration: "shadow") +compile project(path: ":beam-examples-java", configuration: "flinkRunnerPreCommit") +} diff --git a/examples/java/portable/build.gradle b/examples/java/portable/build.gradle new file mode 100644 index 000..8e342feab22 --- /dev/null +++ b/examples/java/portable/build.gradle @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import groovy.json.JsonOutput + +apply plugin: org.apache.beam.gradle.BeamModulePlugin +// Disable default shadow jar closure and include all class files and resources. +applyJavaNature(shadowClosure: {}) + +dependencies { +compile project(path: ":beam-examples-java", configuration: "shadow") +
[jira] [Commented] (BEAM-5315) Finish Python 3 porting for io module
[ https://issues.apache.org/jira/browse/BEAM-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652186#comment-16652186 ] Valentyn Tymofieiev commented on BEAM-5315: --- Hi [~udim], thanks for upgrading the Datastore dependency. You could: 1. Make sure existing ~35 datastore unit tests pass on Python 2. 2. Modify setup.py to remove the restriction that prevents installation of datastore client library on Python 3: https://github.com/apache/beam/blob/3e7e0346492f9c70903590c50133f9f5a5acf9ee/sdks/python/setup.py#L143. 3. Run Datastore unit tests in Python 3. You can follow https://s.apache.org/beam-py3-conversion-quick-start for instructions. Chances are some tests may still be failing for other reasons, since not all IO tests currently work in Python 3. Eventually, we should run all tests in IO module as part of Python 3 presubmit suite, currently we only run a subset: https://github.com/apache/beam/blob/3e7e0346492f9c70903590c50133f9f5a5acf9ee/sdks/python/tox.ini#L61. > Finish Python 3 porting for io module > - > > Key: BEAM-5315 > URL: https://issues.apache.org/jira/browse/BEAM-5315 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Simon >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4130) Portable Flink runner JobService entry point in a Docker container
[ https://issues.apache.org/jira/browse/BEAM-4130?focusedWorklogId=155040&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155040 ] ASF GitHub Bot logged work on BEAM-4130: Author: ASF GitHub Bot Created on: 16/Oct/18 17:56 Start Date: 16/Oct/18 17:56 Worklog Time Spent: 10m Work Description: tweise commented on issue #6703: [BEAM-4130] Add tests for FlinkJobServerDriver URL: https://github.com/apache/beam/pull/6703#issuecomment-430335378 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155040) Time Spent: 13h 40m (was: 13.5h) > Portable Flink runner JobService entry point in a Docker container > -- > > Key: BEAM-4130 > URL: https://issues.apache.org/jira/browse/BEAM-4130 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Maximilian Michels >Priority: Minor > Fix For: 2.7.0 > > Time Spent: 13h 40m > Remaining Estimate: 0h > > The portable Flink runner exists as a Job Service that runs somewhere. We > need a main entry point that itself spins up the job service (and artifact > staging service). The main program itself should be packaged into an uberjar > such that it can be run locally or submitted to a Flink deployment via `flink > run`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4492) Update Python bigquery library to latest version
[ https://issues.apache.org/jira/browse/BEAM-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri resolved BEAM-4492. - Resolution: Duplicate Fix Version/s: Not applicable I'm taking over upgrading bigquery client. See PR in linked issue. > Update Python bigquery library to latest version > > > Key: BEAM-4492 > URL: https://issues.apache.org/jira/browse/BEAM-4492 > Project: Beam > Issue Type: Task > Components: testing >Reporter: Mark Liu >Assignee: Charles Chen >Priority: Major > Fix For: Not applicable > > > Current google-cloud-bigquery is set to 0.25.0 in > https://github.com/apache/beam/blob/master/sdks/python/setup.py#L130 > However, the latest version is 1.2.0. > According to comment in setup.py, this library is only used for testing, so > it should be easy to update. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=155035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155035 ] ASF GitHub Bot logged work on BEAM-4176: Author: ASF GitHub Bot Created on: 16/Oct/18 17:54 Start Date: 16/Oct/18 17:54 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink URL: https://github.com/apache/beam/pull/6592#discussion_r225645564 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java ## @@ -115,6 +115,7 @@ private void scheduleRelease(JobInfo jobInfo) { int environmentCacheTTLMillis = pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis(); if (environmentCacheTTLMillis > 0) { + // Do immediate cleanup if this class is not loaded on Flink parent classloader. if (this.getClass().getClassLoader() != ExecutionEnvironment.class.getClassLoader()) { Review comment: *Flink* classes (org.apache.flink.*) are always loaded through the parent classloader. Loaded classes are always tied to the classloader that was used to load them. By default, all other classes are loaded first through the child classloader, then through the parent. ```yaml classloader.resolve-order: Whether Flink should use a child-first ClassLoader when loading user-code classes or a parent-first ClassLoader. Can be one of parent-first or child-first. (default: child-first) ``` https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/config.html#common-options This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155035) Time Spent: 36.5h (was: 36h 20m) > Java: Portable batch runner passes all ValidatesRunner tests that > non-portable runner passes > > > Key: BEAM-4176 > URL: https://issues.apache.org/jira/browse/BEAM-4176 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Ankur Goenka >Priority: Major > Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 > PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png > > Time Spent: 36.5h > Remaining Estimate: 0h > > We need this as a sanity check that runner execution is correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=155037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155037 ] ASF GitHub Bot logged work on BEAM-4176: Author: ASF GitHub Bot Created on: 16/Oct/18 17:54 Start Date: 16/Oct/18 17:54 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink URL: https://github.com/apache/beam/pull/6592#discussion_r225645778 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java ## @@ -115,6 +115,7 @@ private void scheduleRelease(JobInfo jobInfo) { int environmentCacheTTLMillis = pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis(); if (environmentCacheTTLMillis > 0) { + // Do immediate cleanup if this class is not loaded on Flink parent classloader. if (this.getClass().getClassLoader() != ExecutionEnvironment.class.getClassLoader()) { Review comment: `FlinkUserCodeClassLoader` has been removed, and the new ones, `ChildFirstClassLoader` and `ParentFirstClassloader`, are package-private. So checking for them is not really an option anymore. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155037) Time Spent: 36h 40m (was: 36.5h) > Java: Portable batch runner passes all ValidatesRunner tests that > non-portable runner passes > > > Key: BEAM-4176 > URL: https://issues.apache.org/jira/browse/BEAM-4176 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Ankur Goenka >Priority: Major > Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 > PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png > > Time Spent: 36h 40m > Remaining Estimate: 0h > > We need this as a sanity check that runner execution is correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
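The check discussed in these review comments compares the classloader of the current class with the classloader of a class that is known to come from the parent loader. A minimal, JDK-only sketch of that comparison follows. `LoaderCheck` and `loadedByDifferentLoaders` are illustrative names introduced here, and `String` merely stands in for a parent-loaded class (it is loaded by the bootstrap loader) in the way `ExecutionEnvironment` is used in the pull request; this is not the Beam or Flink code itself.

```java
// Illustrative sketch using only the JDK; names are assumptions, not Beam/Flink APIs.
final class LoaderCheck {

  /**
   * Returns true when the two classes were defined by different classloaders.
   * The comparison is by reference on purpose: a class is tied to the exact
   * loader instance that defined it. The bootstrap loader is reported as null.
   */
  static boolean loadedByDifferentLoaders(Class<?> a, Class<?> b) {
    return a.getClassLoader() != b.getClassLoader();
  }

  public static void main(String[] args) {
    // Application class vs. a JDK core class: normally different loaders.
    System.out.println(loadedByDifferentLoaders(LoaderCheck.class, String.class));
    // Two JDK core classes: both come from the bootstrap loader.
    System.out.println(loadedByDifferentLoaders(String.class, Integer.class));
  }
}
```

Comparing loader references avoids depending on concrete loader class names, which matters when, as noted above, the loader implementations are package-private.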
[jira] [Work logged] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=155034&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155034 ] ASF GitHub Bot logged work on BEAM-1251: Author: ASF GitHub Bot Created on: 16/Oct/18 17:51 Start Date: 16/Oct/18 17:51 Worklog Time Spent: 10m Work Description: swegner commented on issue #6679: [BEAM-1251] Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on Beam site. URL: https://github.com/apache/beam/pull/6679#issuecomment-430333767 @tvalentyn I had looked into the possibility of multiple templates. However, it appears that the workflow for selecting a pull request template requires using a special pull request URL with query parameters ([docs](https://help.github.com/articles/about-automation-for-issues-and-pull-requests-with-query-parameters/)). I don't like the idea of requiring contributors to mess with URLs to open a PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155034) Time Spent: 23.5h (was: 23h 20m) > Python 3 Support > > > Key: BEAM-1251 > URL: https://issues.apache.org/jira/browse/BEAM-1251 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Eyad Sibai >Assignee: Robbe >Priority: Major > Time Spent: 23.5h > Remaining Estimate: 0h > > I have been trying to use google datalab with python3. As I see there are > several packages that do not support python3 yet which google datalab > depends on. This is one of them. > https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=155033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155033 ] ASF GitHub Bot logged work on BEAM-4176: Author: ASF GitHub Bot Created on: 16/Oct/18 17:45 Start Date: 16/Oct/18 17:45 Worklog Time Spent: 10m Work Description: angoenka commented on issue #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink URL: https://github.com/apache/beam/pull/6592#issuecomment-430331548 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 155033) Time Spent: 36h 20m (was: 36h 10m) > Java: Portable batch runner passes all ValidatesRunner tests that > non-portable runner passes > > > Key: BEAM-4176 > URL: https://issues.apache.org/jira/browse/BEAM-4176 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Ankur Goenka >Priority: Major > Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 > PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png > > Time Spent: 36h 20m > Remaining Estimate: 0h > > We need this as a sanity check that runner execution is correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=155032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155032 ] ASF GitHub Bot logged work on BEAM-4176: Author: ASF GitHub Bot Created on: 16/Oct/18 17:44 Start Date: 16/Oct/18 17:44 Worklog Time Spent: 10m Work Description: tweise closed pull request #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink URL: https://github.com/apache/beam/pull/6592 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Flink.groovy b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Flink.groovy new file mode 100644 index 000..ad09a0ab53d --- /dev/null +++ b/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Flink.groovy @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import CommonJobProperties as commonJobProperties +import PostcommitJobBuilder + +// This job runs the suite of ValidatesRunner tests against the Flink runner. +PostcommitJobBuilder.postCommitJob('beam_PostCommit_Java_PVR_Flink', + 'Run Java Flink PortableValidatesRunner', 'Java Flink PortableValidatesRunner Tests', this) { + description('Runs the Java PortableValidatesRunner suite on the Flink runner.') + + // Set common parameters. + commonJobProperties.setTopLevelMainJobProperties(delegate) + + // Publish all test results to Jenkins + publishers { +archiveJunit('**/build/test-results/**/*.xml') + } + + // Gradle goals for this job. + steps { +gradle { + rootBuildScriptDir(commonJobProperties.checkoutDir) + tasks(':beam-runners-flink_2.11-job-server:validatesPortableRunner') + commonJobProperties.setGradleSwitches(delegate) +} + } +} diff --git a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy index d1a80dc5e29..28548c09ddd 100644 --- a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy +++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy @@ -1499,6 +1499,7 @@ artifactId=${project.name} testClassesDirs = project.files(project.project(":beam-sdks-java-core").sourceSets.test.output.classesDirs, project.project(":beam-runners-core-java").sourceSets.test.output.classesDirs) maxParallelForks config.parallelism useJUnit(config.testCategories) +dependsOn ':beam-sdks-java-container:docker' } } } diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java index 679c7cc4bb9..93dc6f0121c 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobServerDriver.java @@ -170,6 +170,11 @@ public void 
run() { } } + public String start() throws IOException { +jobServer = createJobServer(); +return jobServer.getApiServiceDescriptor().getUrl(); + } + public void stop() { if (jobServer != null) { try { diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java index bb2b9dcbe16..90d291ea28a 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java +++ b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java @@ -115,6 +115,7 @@ private void scheduleRelease(Job
[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=155030&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155030 ]

ASF GitHub Bot logged work on BEAM-4176:

Author: ASF GitHub Bot
Created on: 16/Oct/18 17:42
Start Date: 16/Oct/18 17:42
Worklog Time Spent: 10m
Work Description: tweise commented on a change in pull request #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink
URL: https://github.com/apache/beam/pull/6592#discussion_r225641625

## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java ##
@@ -115,6 +115,7 @@ private void scheduleRelease(JobInfo jobInfo) {
     int environmentCacheTTLMillis =
         pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis();
     if (environmentCacheTTLMillis > 0) {
+      // Do immediate cleanup if this class is not loaded on Flink parent classloader.
       if (this.getClass().getClassLoader() != ExecutionEnvironment.class.getClassLoader()) {

Review comment:
Let's continue investigating this as a follow-up, since it isn't directly linked to this PR. Merging..

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 155030)
Time Spent: 36h (was: 35h 50m)

> Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
>
> Key: BEAM-4176
> URL: https://issues.apache.org/jira/browse/BEAM-4176
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Ben Sidhom
> Assignee: Ankur Goenka
> Priority: Major
> Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png
> Time Spent: 36h
> Remaining Estimate: 0h
>
> We need this as a sanity check that runner execution is correct.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3746) Count.globally should override getIncompatibleGlobalWindowErrorMessage to tell the user the usage that is currently only in javadoc
[ https://issues.apache.org/jira/browse/BEAM-3746?focusedWorklogId=155022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155022 ]

ASF GitHub Bot logged work on BEAM-3746:

Author: ASF GitHub Bot
Created on: 16/Oct/18 17:35
Start Date: 16/Oct/18 17:35
Worklog Time Spent: 10m
Work Description: kennknowles closed pull request #6632: [BEAM-3746] Change incompatible message from referencing the output collection to referencing the input collection
URL: https://github.com/apache/beam/pull/6632

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/CombineFnBase.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/CombineFnBase.java
index 3756f1fd42d..d2cfaecad0c 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/CombineFnBase.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/CombineFnBase.java
@@ -107,7 +107,7 @@ abstract static class AbstractGlobalCombineFn implements GlobalCombineFn, Serializable {
     private static final String INCOMPATIBLE_GLOBAL_WINDOW_ERROR_MESSAGE =
-        "Default values are not supported in Combine.globally() if the output "
+        "Default values are not supported in Combine.globally() if the input "
             + "PCollection is not windowed by GlobalWindows. Instead, use "
             + "Combine.globally().withoutDefaults() to output an empty PCollection if the input "
             + "PCollection is empty, or Combine.globally().asSingletonView() to get the default "
diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Top.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Top.java
index 59e569e09a6..354bcb139a3 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Top.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Top.java
@@ -414,10 +414,10 @@ public void populateDisplayData(DisplayData.Builder builder) {
     @Override
     public String getIncompatibleGlobalWindowErrorMessage() {
-      return "Default values are not supported in Top.[of, smallest, largest]() if the output "
+      return "Default values are not supported in Top.[of, smallest, largest]() if the input "
           + "PCollection is not windowed by GlobalWindows. Instead, use "
-          + "Top.[of, smallest, largest]().withoutDefaults() to output an empty PCollection if the"
-          + " input PCollection is empty, or Top.[of, smallest, largest]().asSingletonView() to "
+          + "Top.[of, smallest, largest]().withoutDefaults() to output an empty PCollection if the "
+          + "input PCollection is empty, or Top.[of, smallest, largest]().asSingletonView() to "
           + "get a PCollection containing the empty list if the input PCollection is empty.";
     }
   }

Issue Time Tracking
---
Worklog Id: (was: 155022)
Time Spent: 3h (was: 2h 50m)

> Count.globally should override getIncompatibleGlobalWindowErrorMessage to tell the user the usage that is currently only in javadoc
> ---
>
> Key: BEAM-3746
> URL: https://issues.apache.org/jira/browse/BEAM-3746
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Sam Rohde
> Priority: Major
> Labels: beginner, newbie, starter
> Time Spent: 3h
> Remaining Estimate: 0h
>
> https://beam.apache.org/documentation/sdks/javadoc/2.3.0/org/apache/beam/sdk/transforms/Count.html#globally--
> "Note: if the input collection uses a windowing strategy other than GlobalWindows, use Combine.globally(Count.combineFn()).withoutDefaults() instead."
> But the actual crash a user gets is:
> "java.lang.IllegalStateException: Default values are not supported in Combine.globally() if the output PCollection is not windowed by GlobalWindows. Instead, use Combine.globally().withoutDefaults() to output an empty PCollection if the input PCollection is empty, or Combine.globally().asSingletonView() to get the default output of the CombineFn if t
[jira] [Work logged] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=155020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155020 ]

ASF GitHub Bot logged work on BEAM-1251:

Author: ASF GitHub Bot
Created on: 16/Oct/18 17:32
Start Date: 16/Oct/18 17:32
Worklog Time Spent: 10m
Work Description: aaltay closed pull request #6679: [BEAM-1251] Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on Beam site.
URL: https://github.com/apache/beam/pull/6679

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/website/src/contribute/index.md b/website/src/contribute/index.md
index 230c3df2ae1..9414a7e748f 100644
--- a/website/src/contribute/index.md
+++ b/website/src/contribute/index.md
@@ -328,8 +328,9 @@ Work is in progress to add Python 3 support to Beam. Current goal is to make Be
 - [Proposal](https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE)
 - [Kanban Board](https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail)
+- [Python 3 Conversion Quick Start Guide](https://docs.google.com/document/d/1s1BJVCY65LB_SYK1SU1u7NbZiFANoq-nEYaEvzRbYlA)
 
-Contributions are welcome! If you are interested to help, you can select a subpackage to port and assign yourself the corresponding issue. Comment on the issue if you cannot assign it yourself.
+Contributions are welcome! If you are interested to help, you can select an unassigned issue in the Kanban board and assign it to yourself. Comment on the issue if you cannot assign it yourself.
 When submitting a new PR, please tag [@RobbeSneyders](https://github.com/robbesneyders), [@aaltay](https://github.com/aaltay), and [@tvalentyn](https://github.com/tvalentyn).
 
 ### Next Java LTS version support (Java 11 / 18.9)

Issue Time Tracking
---
Worklog Id: (was: 155020)
Time Spent: 23h 20m (was: 23h 10m)

> Python 3 Support
>
> Key: BEAM-1251
> URL: https://issues.apache.org/jira/browse/BEAM-1251
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Eyad Sibai
> Assignee: Robbe
> Priority: Major
> Time Spent: 23h 20m
> Remaining Estimate: 0h
>
> I have been trying to use Google Datalab with Python 3. As far as I can see, several packages that Google Datalab depends on do not support Python 3 yet. This is one of them.
> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6
[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=155016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155016 ]

ASF GitHub Bot logged work on BEAM-4176:

Author: ASF GitHub Bot
Created on: 16/Oct/18 17:24
Start Date: 16/Oct/18 17:24
Worklog Time Spent: 10m
Work Description: angoenka commented on a change in pull request #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink
URL: https://github.com/apache/beam/pull/6592#discussion_r225635682

## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java ##
@@ -115,6 +115,7 @@ private void scheduleRelease(JobInfo jobInfo) {
     int environmentCacheTTLMillis =
         pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis();
     if (environmentCacheTTLMillis > 0) {
+      // Do immediate cleanup if this class is not loaded on Flink parent classloader.
       if (this.getClass().getClassLoader() != ExecutionEnvironment.class.getClassLoader()) {

Review comment:
FlinkUserCodeClassloader seems to have been removed since Flink 1.1, so we can't use it. This PR says that Flink now loads all Flink classes in the parent class loader: https://github.com/apache/flink/pull/4891

Issue Time Tracking
---
Worklog Id: (was: 155016)
Time Spent: 35h 50m (was: 35h 40m)

> Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
>
> Key: BEAM-4176
> URL: https://issues.apache.org/jira/browse/BEAM-4176
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Ben Sidhom
> Assignee: Ankur Goenka
> Priority: Major
> Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png
> Time Spent: 35h 50m
> Remaining Estimate: 0h
>
> We need this as a sanity check that runner execution is correct.
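
The check in the diff above compares two classloader references to decide whether the factory class was loaded by the same loader as Flink's ExecutionEnvironment. As a minimal, Flink-free sketch of that comparison, plain JDK classes can stand in for the two sides. The class name ClassLoaderCheck and the helper loadedBySameClassLoader are hypothetical and are not part of Beam or Flink; this only illustrates the reference comparison, not the cleanup logic in the PR.

```java
public class ClassLoaderCheck {

  /**
   * Hypothetical helper: returns true when both classes were defined by the
   * same classloader instance. The comparison is by reference, as in the
   * reviewed diff. Classes defined by the bootstrap loader report null.
   */
  public static boolean loadedBySameClassLoader(Class<?> a, Class<?> b) {
    return a.getClassLoader() == b.getClassLoader();
  }

  public static void main(String[] args) {
    // Both are core JDK classes defined by the bootstrap loader (null == null).
    System.out.println(loadedBySameClassLoader(String.class, Integer.class));

    // This class is defined by an application-level loader, String by the
    // bootstrap loader, so the references differ.
    System.out.println(loadedBySameClassLoader(ClassLoaderCheck.class, String.class));
  }
}
```

In the PR's setting, a mismatch of this kind is what triggers the immediate-cleanup branch. Whether the two loaders match in a given deployment depends on how Flink is configured to load user code, which this sketch does not model.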