[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=154120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154120 ] ASF GitHub Bot logged work on BEAM-4176: Author: ASF GitHub Bot Created on: 13/Oct/18 18:40 Start Date: 13/Oct/18 18:40 Worklog Time Spent: 10m Work Description: tweise commented on a change in pull request #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink URL: https://github.com/apache/beam/pull/6592#discussion_r224968563 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1498,6 +1498,7 @@ artifactId=${project.name} testClassesDirs = project.files(project.project(":beam-sdks-java-core").sourceSets.test.output.classesDirs, project.project(":beam-runners-core-java").sourceSets.test.output.classesDirs) maxParallelForks config.parallelism useJUnit(config.testCategories) +dependsOn ':beam-sdks-java-container:docker' Review comment: Is this later going to change to use the process environment? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154120) Time Spent: 33h 40m (was: 33.5h) > Java: Portable batch runner passes all ValidatesRunner tests that > non-portable runner passes > > > Key: BEAM-4176 > URL: https://issues.apache.org/jira/browse/BEAM-4176 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Ankur Goenka >Priority: Major > Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 > PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png > > Time Spent: 33h 40m > Remaining Estimate: 0h > > We need this as a sanity check that runner execution is correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink
[ https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=154119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154119 ] ASF GitHub Bot logged work on BEAM-5708: Author: ASF GitHub Bot Created on: 13/Oct/18 18:21 Start Date: 13/Oct/18 18:21 Worklog Time Spent: 10m Work Description: tweise closed pull request #6638: [BEAM-5708] Cache environment in portable flink runner URL: https://github.com/apache/beam/pull/6638 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy index 290f72399db..0ba980b27e5 100644 --- a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy +++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy @@ -1484,6 +1484,7 @@ artifactId=${project.name} def beamTestPipelineOptions = [ "--runner=org.apache.beam.runners.reference.testing.TestPortableRunner", "--jobServerDriver=${config.jobServerDriver}", +"--environmentCacheMillis=1", ] if (config.jobServerConfig) { beamTestPipelineOptions.add("--jobServerConfig=${config.jobServerConfig}") diff --git a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java index 988a94826fb..bb2b9dcbe16 100644 --- a/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java +++ 
b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java @@ -24,12 +24,17 @@ import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.Executors; import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicInteger; +import org.apache.beam.runners.core.construction.PipelineOptionsTranslation; import org.apache.beam.runners.core.construction.graph.ExecutableStage; import org.apache.beam.runners.fnexecution.control.StageBundleFactory; import org.apache.beam.runners.fnexecution.provisioning.JobInfo; import org.apache.beam.sdk.fn.function.ThrowingFunction; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PortablePipelineOptions; import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.java.ExecutionEnvironment; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -104,9 +109,30 @@ private void scheduleRelease(JobInfo jobInfo) { WrappedContext wrapper = getCache().get(jobInfo.jobId()); Preconditions.checkState( wrapper != null, "Releasing context for unknown job: " + jobInfo.jobId()); -// Do not release this asynchronously, as the releasing could fail due to the classloader not being -// available anymore after the tasks have been removed from the execution engine. -release(wrapper); + +PipelineOptions pipelineOptions = +PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions()); +int environmentCacheTTLMillis = + pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis(); +if (environmentCacheTTLMillis > 0) { + if (this.getClass().getClassLoader() != ExecutionEnvironment.class.getClassLoader()) { +LOG.warn( +"{} is not loaded on parent Flink classloader. 
" ++ "Falling back to synchronous environment release for job {}.", +this.getClass(), +jobInfo.jobId()); +release(wrapper); + } else { +// Schedule task to clean the container later. +// Ensure that this class is loaded in the parent Flink classloader. +getExecutor() +.schedule(() -> release(wrapper), environmentCacheTTLMillis, TimeUnit.MILLISECONDS); + } +} else { + // Do not release this asynchronously, as the releasing could fail due to the classloader not + // being available anymore after the tasks have been removed from the execution engine. + release(wrapper); +} } private ConcurrentHashMap getCache() { diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java index 5107c389bb1..a8dfa8e0665 100644 ---
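The TTL branch in the diff above can be modeled outside Flink: when caching is disabled the environment is released synchronously, otherwise the release is scheduled on a timer so a follow-up stage can reuse the warm SDK harness. A minimal Python sketch with hypothetical names (the real logic is the Java `ReferenceCountingFlinkExecutableStageContextFactory` shown in the diff):

```python
import threading

class CachedEnvironment:
    """Toy model of the delayed-release pattern from the diff above."""

    def __init__(self, cache_ttl_millis=0):
        self.cache_ttl_millis = cache_ttl_millis
        self.released = threading.Event()

    def _release(self):
        # Stand-in for tearing down the SDK harness container.
        self.released.set()

    def schedule_release(self):
        if self.cache_ttl_millis > 0:
            # Keep the environment alive for the TTL so the next stage or
            # job can reuse it instead of paying container startup again.
            threading.Timer(self.cache_ttl_millis / 1000.0, self._release).start()
        else:
            # Caching disabled: release synchronously, since a deferred
            # release could outlive the classloader in the real runner.
            self._release()
```

This also explains the `--environmentCacheMillis=1` added to the test pipeline options: the tests exercise the scheduled path while still expiring almost immediately.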
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=154104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154104 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 13/Oct/18 17:07 Start Date: 13/Oct/18 17:07 Worklog Time Spent: 10m Work Description: tweise closed pull request #6683: [BEAM-5442] Revert #6675 "Revert PRs #6557 #6589 #6600" URL: https://github.com/apache/beam/pull/6683 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154104) Time Spent: 9h 50m (was: 9h 40m) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Fix For: 2.9.0 > > Time Spent: 9h 50m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
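The bug described above is that options the proto translation does not recognize (e.g. `--parallelism=4`) were silently dropped. The intended behavior can be sketched as partitioning flags into known and unknown buckets instead of discarding the latter (a hypothetical parser for illustration, not Beam's actual PipelineOptions machinery):

```python
def partition_flags(args, known):
    """Split --key=value flags into known and unknown dicts instead of
    silently dropping unrecognized ones, so a runner can still see them."""
    known_opts, unknown_opts = {}, {}
    for arg in args:
        key, _, value = arg.lstrip("-").partition("=")
        target = known_opts if key in known else unknown_opts
        target[key] = value
    return known_opts, unknown_opts
```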
[jira] [Work logged] (BEAM-5097) Increment counter for "small words" in go SDK example
[ https://issues.apache.org/jira/browse/BEAM-5097?focusedWorklogId=154100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154100 ] ASF GitHub Bot logged work on BEAM-5097: Author: ASF GitHub Bot Created on: 13/Oct/18 14:07 Start Date: 13/Oct/18 14:07 Worklog Time Spent: 10m Work Description: stale[bot] closed pull request #6157: [BEAM-5097][WIP] Add counter to combine example in go sdk URL: https://github.com/apache/beam/pull/6157 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/go/examples/cookbook/combine/combine.go b/sdks/go/examples/cookbook/combine/combine.go index 7e24aa1fb30..1950a687d24 100644 --- a/sdks/go/examples/cookbook/combine/combine.go +++ b/sdks/go/examples/cookbook/combine/combine.go @@ -63,11 +63,16 @@ type extractFn struct { MinLength int `json:"min_length"` } +// A global context for simplicity. +var ctx = context.Background() + func (f *extractFn) ProcessElement(row WordRow, emit func(string, string)) { + small_words := beam.NewCounter("example.namespace", "small_words") if len(row.Word) >= f.MinLength { emit(row.Word, row.Corpus) + } else { + small_words.Inc(ctx, 1) } - // TODO(herohde) 7/14/2017: increment counter for "small words" } // TODO(herohde) 7/14/2017: the choice of a string (instead of []string) for the This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154100) Time Spent: 1h 40m (was: 1.5h) > Increment counter for "small words" in go SDK example > - > > Key: BEAM-5097 > URL: https://issues.apache.org/jira/browse/BEAM-5097 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: holdenk >Assignee: holdenk >Priority: Trivial > Time Spent: 1h 40m > Remaining Estimate: 0h > > Increment counter for "small words" in go SDK example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
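The Go diff above resolves the long-standing TODO by incrementing a "small words" counter instead of silently dropping short words. The same shape in plain Python, with hypothetical names (the Go SDK example uses `beam.NewCounter` with a metrics context):

```python
from collections import Counter

def extract_words(rows, min_length, metrics=None):
    """Emit (word, corpus) pairs for words of at least min_length;
    count the rest instead of dropping them silently."""
    metrics = metrics if metrics is not None else Counter()
    emitted = []
    for word, corpus in rows:
        if len(word) >= min_length:
            emitted.append((word, corpus))
        else:
            # Mirror of small_words.Inc(ctx, 1) in the Go diff.
            metrics["small_words"] += 1
    return emitted, metrics
```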
[jira] [Work logged] (BEAM-5097) Increment counter for "small words" in go SDK example
[ https://issues.apache.org/jira/browse/BEAM-5097?focusedWorklogId=154099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154099 ] ASF GitHub Bot logged work on BEAM-5097: Author: ASF GitHub Bot Created on: 13/Oct/18 14:07 Start Date: 13/Oct/18 14:07 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #6157: [BEAM-5097][WIP] Add counter to combine example in go sdk URL: https://github.com/apache/beam/pull/6157#issuecomment-429545012 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. Issue Time Tracking --- Worklog Id: (was: 154099) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-5626) Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)}
[ https://issues.apache.org/jira/browse/BEAM-5626?focusedWorklogId=154082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154082 ] ASF GitHub Bot logged work on BEAM-5626: Author: ASF GitHub Bot Created on: 13/Oct/18 04:25 Start Date: 13/Oct/18 04:25 Worklog Time Spent: 10m Work Description: HuangLED commented on a change in pull request #6587: [BEAM-5626] Fix hadoop filesystem test for py3. URL: https://github.com/apache/beam/pull/6587#discussion_r224949182 ## File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py ## @@ -214,6 +214,11 @@ def setUp(self): url = self.fs.join(self.tmpdir, filename) self.fs.create(url).close() +try:# Python 2 Review comment: @tvalentyn I came across this python2->python3 doc from python.org, LINK: python-2-3 Difference . An interesting read. The section "Use feature detection instead of version detection" argues against relying on sys.version_info; figured we could weigh the point this doc brings up before we apply the change everywhere.
Issue Time Tracking --- Worklog Id: (was: 154082) Time Spent: 5h 40m (was: 5.5h) > Several IO tests fail in Python 3 with RuntimeError('dictionary changed size > during iteration',)} > - > > Key: BEAM-5626 > URL: https://issues.apache.org/jira/browse/BEAM-5626 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core > Reporter: Valentyn Tymofieiev > Assignee: Ruoyun Huang > Priority: Major > Fix For: 2.8.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > ERROR: test_delete_dir > (apache_beam.io.hadoopfilesystem_test.HadoopFileSystemTest) > -- > Traceback (most recent call last): > File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem_test.py", line 506, in test_delete_dir > self.fs.delete([url_t1]) > File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/hadoopfilesystem.py", line 370, in delete > raise BeamIOError("Delete operation failed", exceptions) > apache_beam.io.filesystem.BeamIOError: Delete operation failed with > exceptions {'hdfs://test_dir/new_dir1': RuntimeError('dictionary changed size > during iteration', )}
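The RuntimeError in the traceback above is Python 3 complaining about a dict being mutated while its live key view is iterated. The standard fix is to snapshot the keys before deleting; a minimal illustration (hypothetical helper, not the actual Beam code):

```python
def delete_matching(entries, prefix):
    """Delete all keys starting with prefix without tripping
    'dictionary changed size during iteration' on Python 3."""
    for key in list(entries):  # list() snapshots the keys before mutation
        if key.startswith(prefix):
            del entries[key]
    return entries
```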
[jira] [Work logged] (BEAM-5626) Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)}
[ https://issues.apache.org/jira/browse/BEAM-5626?focusedWorklogId=154083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154083 ] ASF GitHub Bot logged work on BEAM-5626: Author: ASF GitHub Bot Created on: 13/Oct/18 04:28 Start Date: 13/Oct/18 04:28 Worklog Time Spent: 10m Work Description: HuangLED commented on a change in pull request #6587: [BEAM-5626] Fix hadoop filesystem test for py3. URL: https://github.com/apache/beam/pull/6587#discussion_r224949182 ## File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py ## @@ -214,6 +214,11 @@ def setUp(self): url = self.fs.join(self.tmpdir, filename) self.fs.create(url).close() +try:# Python 2 Review comment: @tvalentyn I came across this python2->python3 doc from python.org, LINK: python-2-3 Difference . An interesting read. The section "Use feature detection instead of version detection" argues against relying on sys.version_info, though that point counts against our use here. Just FYI.
Issue Time Tracking --- Worklog Id: (was: 154083) Time Spent: 5h 50m (was: 5h 40m)
[jira] [Work logged] (BEAM-5626) Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)}
[ https://issues.apache.org/jira/browse/BEAM-5626?focusedWorklogId=154081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154081 ] ASF GitHub Bot logged work on BEAM-5626: Author: ASF GitHub Bot Created on: 13/Oct/18 04:24 Start Date: 13/Oct/18 04:24 Worklog Time Spent: 10m Work Description: HuangLED commented on a change in pull request #6587: [BEAM-5626] Fix hadoop filesystem test for py3. URL: https://github.com/apache/beam/pull/6587#discussion_r224949182 ## File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py ## @@ -214,6 +214,11 @@ def setUp(self): url = self.fs.join(self.tmpdir, filename) self.fs.create(url).close() +try:# Python 2 Review comment: I came across this python2->python3 doc from python.org: python-2-3 Difference . An interesting read. The section "Use feature detection instead of version detection" argues against relying on sys.version_info; figured we could weigh the point this doc brings up before we apply the change everywhere.
Issue Time Tracking --- Worklog Id: (was: 154081) Time Spent: 5.5h (was: 5h 20m)
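The porting-guide point the reviewer cites contrasts two styles for bridging Python 2 and 3. Both are shown side by side below, purely for illustration (neither snippet is Beam code):

```python
import sys

def iter_items_feature(d):
    """Feature detection: try the Python 2 API, fall back to Python 3."""
    try:
        return d.iteritems()  # Python 2 only
    except AttributeError:
        return iter(d.items())  # Python 3

def iter_items_version(d):
    """Version detection via sys.version_info, the style under discussion."""
    if sys.version_info[0] < 3:
        return d.iteritems()
    return iter(d.items())
```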
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154080 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 13/Oct/18 02:44 Start Date: 13/Oct/18 02:44 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#issuecomment-429505015 @herohde this PR is ready to merge. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154080) Time Spent: 1h 20m (was: 1h 10m) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154078 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 13/Oct/18 01:29 Start Date: 13/Oct/18 01:29 Worklog Time Spent: 10m Work Description: HuangLED edited a comment on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#issuecomment-429468108 R: @herohde cc: @boyuanzz @pabloem Addressed. Also, option definition moved to WorkerOptions per Pablo's suggestion. Thanks to Boyuan for pointing out the right place for error message. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154078) Time Spent: 2.5h (was: 2h 20m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
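Both worker-jar issues above hinge on the same subtlety: when a custom dataflow worker jar is supplied, the container boot code must skip that jar when assembling the user classpath. A hypothetical sketch of that filtering step (the real logic lives in the containers' boot.go, linked in the issues):

```python
def staged_artifacts_for_classpath(staged_jars, custom_worker_jar=None):
    """Drop the custom worker jar from the staged artifact list so it is
    not also placed on the user classpath. Hypothetical helper for
    illustration only."""
    return [jar for jar in staged_jars if jar != custom_worker_jar]
```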
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154077 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 13/Oct/18 01:03 Start Date: 13/Oct/18 01:03 Worklog Time Spent: 10m Work Description: mwylde commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224944539 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java ## @@ -406,6 +417,56 @@ private void translateImpulse( context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()), source); } + @AutoService(NativeTransforms.IsNativeTransform.class) + public static class IsFlinkNativeTransform implements NativeTransforms.IsNativeTransform { +@Override +public boolean test(RunnerApi.PTransform pTransform) { + return STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform)); +} + } + + private void translateStreamingImpulse( + String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) { +RunnerApi.PTransform pTransform = pipeline.getComponents().getTransformsOrThrow(id); + +ObjectMapper objectMapper = new ObjectMapper(); + +int intervalMillis; +int messageCount; +try { + JsonNode config = objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray()); + intervalMillis = config.path("interval_ms").asInt(100); + messageCount = config.path("message_count").asInt(0); +} catch (IOException e) { +throw new RuntimeException("Failed to parse configuration for streaming impulse", e); +} + +DataStreamSource> source = +context +.getExecutionEnvironment() +.addSource( +new RichParallelSourceFunction>() { + private AtomicBoolean cancelled = new AtomicBoolean(false); + private AtomicLong count = new AtomicLong(); + + @Override + 
public void run(SourceContext> ctx) throws Exception { +while (!cancelled.get() && (messageCount == 0 || count.getAndIncrement() < messageCount)) { + ctx.collect(WindowedValue.valueInGlobalWindow(new byte[] {})); + Thread.sleep(intervalMillis); Review comment: đ This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154077) Time Spent: 2h 20m (was: 2h 10m) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
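The source in the diff above emits an empty byte array every `interval_ms`, stopping after `message_count` elements unless that is 0, in which case it runs until cancelled. A generator analogue in Python (hypothetical sketch; the real source is a Flink `RichParallelSourceFunction` emitting `WindowedValue`s):

```python
import time

def streaming_impulse(interval_ms=100, message_count=0):
    """Yield empty byte strings periodically; run indefinitely when
    message_count == 0, otherwise stop after message_count elements."""
    emitted = 0
    while message_count == 0 or emitted < message_count:
        yield b""
        emitted += 1
        time.sleep(interval_ms / 1000.0)
```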
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154075 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 13/Oct/18 00:54 Start Date: 13/Oct/18 00:54 Worklog Time Spent: 10m Work Description: tvalentyn edited a comment on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429497899 I was not able to reproduce the issue where negative numbers are represented as `long` type I mentioned in https://github.com/apache/beam/pull/6602#issuecomment-429468709 using a fresh installation of apache-beam==2.7.0, so I instead opened https://issues.apache.org/jira/browse/BEAM-5744, to investigate if this is a regression in 2.8.0. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154075) Time Spent: 3h (was: 2h 50m) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 3h > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > 
windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
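Python 3 refuses to order values of mismatched types, which is exactly what `sorted(expected)` hits when a test mixes strings and ints. One common repair (illustrative; the actual Beam patch may differ) is a type-aware sort key so no cross-type comparison ever happens:

```python
def sorted_mixed(items):
    """Sort heterogeneous values by grouping on type name first, so
    comparisons stay within a single type."""
    return sorted(items, key=lambda x: (type(x).__name__, repr(x)))
```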
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154074 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 13/Oct/18 00:54 Start Date: 13/Oct/18 00:54 Worklog Time Spent: 10m Work Description: mwylde commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224944203 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java ## @@ -406,6 +417,56 @@ private void translateImpulse( context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()), source); } + @AutoService(NativeTransforms.IsNativeTransform.class) + public static class IsFlinkNativeTransform implements NativeTransforms.IsNativeTransform { +@Override +public boolean test(RunnerApi.PTransform pTransform) { + return STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform)); +} + } + + private void translateStreamingImpulse( + String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) { +RunnerApi.PTransform pTransform = pipeline.getComponents().getTransformsOrThrow(id); + +ObjectMapper objectMapper = new ObjectMapper(); + +int intervalMillis; +int messageCount; +try { + JsonNode config = objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray()); + intervalMillis = config.path("interval_ms").asInt(100); + messageCount = config.path("message_count").asInt(0); +} catch (IOException e) { +throw new RuntimeException("Failed to parse configuration for streaming impulse", e); +} + +DataStreamSource> source = +context +.getExecutionEnvironment() +.addSource( +new RichParallelSourceFunction>() { + private AtomicBoolean cancelled = new AtomicBoolean(false); + private AtomicLong count = new AtomicLong(); Review comment: 
Good to know that this will never be called concurrently. I'll change this to a long. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154074) Time Spent: 2h 10m (was: 2h) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154073 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 13/Oct/18 00:53 Start Date: 13/Oct/18 00:53 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429497899 I was not able to reproduce https://github.com/apache/beam/pull/6602#issuecomment-429468709 using a fresh installation of apache-beam==2.7.0, so I instead opened https://issues.apache.org/jira/browse/BEAM-5744, to investigate if this is a regression in 2.8.0. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154073) Time Spent: 2h 50m (was: 2h 40m) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 50m > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > 
"/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in <lambda> > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
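The failing call is `sorted(expected)` in `apache_beam/testing/util.py`: Python 2 imposed an arbitrary but consistent ordering across types, while Python 3 raises `TypeError` when comparing `str` with `int`. One common workaround (shown for illustration — not necessarily the fix adopted in PR #6602) is to sort by a type-aware key:

```python
def mixed_type_sort_key(value):
    # Group by type name first, then compare within each homogeneous group,
    # restoring a deterministic (if arbitrary) cross-type ordering.
    return (type(value).__name__, value)

expected = ["b", 2, "a", 1]
try:
    sorted(expected)            # raises on Python 3, unlike Python 2
    py3_raises = False
except TypeError:
    py3_raises = True

sorted_expected = sorted(expected, key=mixed_type_sort_key)
```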
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154072 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 13/Oct/18 00:51 Start Date: 13/Oct/18 00:51 Worklog Time Spent: 10m Work Description: mwylde commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224944097 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java ## @@ -406,6 +417,56 @@ private void translateImpulse( context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()), source); } + @AutoService(NativeTransforms.IsNativeTransform.class) + public static class IsFlinkNativeTransform implements NativeTransforms.IsNativeTransform { +@Override +public boolean test(RunnerApi.PTransform pTransform) { + return STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform)); +} + } + + private void translateStreamingImpulse( + String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) { +RunnerApi.PTransform pTransform = pipeline.getComponents().getTransformsOrThrow(id); + +ObjectMapper objectMapper = new ObjectMapper(); + +int intervalMillis; +int messageCount; +try { + JsonNode config = objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray()); + intervalMillis = config.path("interval_ms").asInt(100); + messageCount = config.path("message_count").asInt(0); +} catch (IOException e) {
throw new RuntimeException("Failed to parse configuration for streaming impulse", e); +} + +DataStreamSource<WindowedValue<byte[]>> source = +context +.getExecutionEnvironment() +.addSource( +new RichParallelSourceFunction<WindowedValue<byte[]>>() { Review comment: I've moved it to org.apache.beam.runners.flink.translation.wrappers.streaming.io This is an automated 
message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154072) Time Spent: 2h (was: 1h 50m) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154071 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 13/Oct/18 00:49 Start Date: 13/Oct/18 00:49 Worklog Time Spent: 10m Work Description: tvalentyn edited a comment on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429478105 Unfortunately I think https://github.com/apache/beam/pull/6602#issuecomment-429468709 may be a very unintuitive change, so we need to roll it back and either fix the underlying issue with typing of negative numbers or proceed with a different solution here. We would need to cherry-pick the change into the release branch, so I'll mark BEAM-5621 as release blocker until cherry-pick is in. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154071) Time Spent: 2h 40m (was: 2.5h) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 40m > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in <lambda> > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5714) RedisIO emit error of EXEC without MULTI
[ https://issues.apache.org/jira/browse/BEAM-5714?focusedWorklogId=154070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154070 ] ASF GitHub Bot logged work on BEAM-5714: Author: ASF GitHub Bot Created on: 13/Oct/18 00:47 Start Date: 13/Oct/18 00:47 Worklog Time Spent: 10m Work Description: aaltay commented on issue #6651: [BEAM-5714] Fix RedisIO EXEC without MULTI error URL: https://github.com/apache/beam/pull/6651#issuecomment-429497451 cc: @vvarma might have an opinion This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154070) Time Spent: 40m (was: 0.5h) > RedisIO emit error of EXEC without MULTI > > > Key: BEAM-5714 > URL: https://issues.apache.org/jira/browse/BEAM-5714 > Project: Beam > Issue Type: Bug > Components: io-java-redis >Affects Versions: 2.7.0 >Reporter: K.K. POON >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > RedisIO has EXEC without MULTI error after SET a batch of records. >  > By looking at the source code, I guess there is missing `pipeline.multi();` > after exec() the last batch. > [https://github.com/apache/beam/blob/master/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java#L555] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
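The reported Redis failure mode can be modeled with a toy transaction state machine: `EXEC` is only legal after `MULTI`, so a writer that flushes batches with `exec()` must re-issue `multi()` before queueing the next batch — which matches the reporter's guess about the missing `pipeline.multi()`. A hypothetical sketch (`FakePipeline` and `write_batches` are illustrative, not Jedis or RedisIO APIs):

```python
class FakePipeline:
    """Toy model of a Redis pipeline's MULTI/EXEC state."""
    def __init__(self):
        self.in_multi = False
        self.queued = 0

    def multi(self):
        self.in_multi = True

    def set(self, key, value):
        self.queued += 1

    def exec(self):
        if not self.in_multi:
            raise RuntimeError("ERR EXEC without MULTI")
        flushed, self.queued, self.in_multi = self.queued, 0, False
        return flushed

def write_batches(pipeline, records, batch_size):
    # Flush every batch_size records. The multi() after each exec() is the
    # crucial line: without it, the next exec() fails exactly as reported.
    pipeline.multi()
    for i, (key, value) in enumerate(records, start=1):
        pipeline.set(key, value)
        if i % batch_size == 0:
            pipeline.exec()
            pipeline.multi()  # re-open the transaction for the next batch
    pipeline.exec()
```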
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=154069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154069 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 13/Oct/18 00:42 Start Date: 13/Oct/18 00:42 Worklog Time Spent: 10m Work Description: mwylde commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224943673 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java ## @@ -63,6 +64,8 @@ public static final String GROUP_BY_KEY_TRANSFORM_URN = getUrn(StandardPTransforms.Primitives.GROUP_BY_KEY); public static final String IMPULSE_TRANSFORM_URN = getUrn(StandardPTransforms.Primitives.IMPULSE); + public static final String STREAMING_IMPULSE_TRANSFORM_URL = "flink:transform:streaming_impulse:v1"; Review comment: moved to FlinkStreamingPortablePipelineTranslator This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154069) Time Spent: 1h 50m (was: 1h 40m) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. 
> It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154064 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 12/Oct/18 23:58 Start Date: 12/Oct/18 23:58 Worklog Time Spent: 10m Work Description: pabloem closed pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/model/fn-execution/src/main/proto/beam_fn_api.proto b/model/fn-execution/src/main/proto/beam_fn_api.proto index a0795a7c285..915686de6b3 100644 --- a/model/fn-execution/src/main/proto/beam_fn_api.proto +++ b/model/fn-execution/src/main/proto/beam_fn_api.proto @@ -40,6 +40,7 @@ option java_outer_classname = "BeamFnApi"; import "beam_runner_api.proto"; import "endpoints.proto"; +import "google/protobuf/descriptor.proto"; import "google/protobuf/timestamp.proto"; import "google/protobuf/wrappers.proto"; @@ -250,11 +251,16 @@ message ProcessBundleRequest { message ProcessBundleResponse { // (Optional) If metrics reporting is supported by the SDK, this represents // the final metrics to record for this bundle. + // DEPRECATED Metrics metrics = 1; // (Optional) Specifies that the bundle has been split since the last // ProcessBundleProgressResponse was sent. BundleSplit split = 2; + + // (Required) The list of metrics or other MonitoredState + // collected while processing this bundle. + repeated MonitoringInfo monitoring_infos = 3; } // A request to report progress information for a given bundle. @@ -275,9 +281,9 @@ message MonitoringInfo { // Sub types like field formats - int64, double, string. 
// Aggregation methods - SUM, LATEST, TOP-N, BOTTOM-N, DISTRIBUTION // valid values are: - // beam:metrics:[SumInt64|LatestInt64|Top-NInt64|Bottom-NInt64| - // SumDouble|LatestDouble|Top-NDouble|Bottom-NDouble|DistributionInt64| - // DistributionDouble|MonitoringDataTable] + // beam:metrics:[sum_int_64|latest_int_64|top_n_int_64|bottom_n_int_64| + // sum_double|latest_double|top_n_double|bottom_n_double| + // distribution_int_64|distribution_double|monitoring_data_table string type = 2; // The Metric or monitored state. @@ -302,6 +308,45 @@ message MonitoringInfo { // Some systems such as Stackdriver will be able to aggregate the metrics // using a subset of the provided labels map<string, string> labels = 5; + + // The walltime of the most recent update. + // Useful for aggregation for Latest types such as LatestInt64. + google.protobuf.Timestamp timestamp = 6; +} + +message MonitoringInfoUrns { + enum Enum { +USER_COUNTER_URN_PREFIX = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metric:user"]; + +ELEMENT_COUNT = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metric:element_count:v1"]; + +START_BUNDLE_MSECS = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) = + "beam:metric:pardo_execution_time:start_bundle_msecs:v1"]; + +PROCESS_BUNDLE_MSECS = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metric:pardo_execution_time:process_bundle_msecs:v1"]; + +FINISH_BUNDLE_MSECS = 4 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metric:pardo_execution_time:finish_bundle_msecs:v1"]; + +TOTAL_MSECS = 5 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metric:ptransform_execution_time:total_msecs:v1"]; + } +} + +message MonitoringInfoTypeUrns { + enum Enum { +SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metrics:sum_int_64"]; + +DISTRIBUTION_INT64_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) = +"beam:metrics:distribution_int_64"]; + +LATEST_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) = + 
"beam:metrics:latest_int_64"]; + } } message Metric { @@ -525,12 +570,16 @@ message Metrics { } message ProcessBundleProgressResponse { - // (Required) + // DEPRECATED (Required) Metrics metrics = 1; // (Optional) Specifies that the bundle has been split since the last // ProcessBundleProgressResponse was sent. BundleSplit split = 2; + + // (Required) The list of metrics or other MonitoredState + // collected while processing this bundle. + repeated MonitoringInfo monitoring_infos = 3; } message ProcessBundleSplitRequest { @@ -795,7 +844,6 @@ message LogEntry { enum Enum { // Unspecified level information. Will be logged at the TRACE level. UNSPECIFIED = 0; - // Trace
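For illustration of the schema quoted in the diff above, a `MonitoringInfo`-shaped record for an element-count metric carries a URN, a type URN, a value, labels, and the walltime of the most recent update. This is a plain-dict sketch, not the Beam proto or any Beam API; the URNs are copied from the diff, while the field and label names here are assumptions:

```python
import time

# URNs copied from the proto text above; everything else is illustrative.
ELEMENT_COUNT_URN = "beam:metric:element_count:v1"
SUM_INT64_TYPE = "beam:metrics:sum_int_64"

def element_count_info(ptransform, pcollection, count):
    # Assemble a MonitoringInfo-shaped record: urn, type, value, labels
    # (which systems like Stackdriver can aggregate over), and a timestamp.
    return {
        "urn": ELEMENT_COUNT_URN,
        "type": SUM_INT64_TYPE,
        "metric": count,
        "labels": {"PTRANSFORM": ptransform, "PCOLLECTION": pcollection},
        "timestamp": time.time(),
    }
```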
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154063 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 12/Oct/18 23:58 Start Date: 12/Oct/18 23:58 Worklog Time Spent: 10m Work Description: pabloem commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-429492504 Okay, as this looks good, I'll go ahead and merge. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154063) Time Spent: 8h 20m (was: 8h 10m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 8h 20m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154062 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 23:54 Start Date: 12/Oct/18 23:54 Worklog Time Spent: 10m Work Description: herohde commented on issue #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#issuecomment-429492075 @boyuanzz Btw, please also update the doc describing this change to include Java. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154062) Time Spent: 1h 10m (was: 1h) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154061 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 23:40 Start Date: 12/Oct/18 23:40 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#discussion_r224939002 ## File path: sdks/java/container/boot.go ## @@ -103,7 +103,17 @@ func main() { filepath.Join(jarsDir, "slf4j-jdk14.jar"), filepath.Join(jarsDir, "beam-sdks-java-harness.jar"), } + + var hasWorkerExperiment = strings.Contains(options, "use_staged_dataflow_worker_jar") for _, md := range artifacts { + if hasWorkerExperiment { + if strings.HasPrefix(md.Name, "beam-runners-google-cloud-dataflow-java-fn-api-worker") { + continue + } + if strings.HasPrefix(md.Name, "dataflow-worker.jar") { Review comment: Small comment: this can be == instead. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154061) Time Spent: 1h (was: 50m) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. 
That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3587) User reports TextIO failure in FlinkRunner on master
[ https://issues.apache.org/jira/browse/BEAM-3587?focusedWorklogId=154059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154059 ] ASF GitHub Bot logged work on BEAM-3587: Author: ASF GitHub Bot Created on: 12/Oct/18 23:36 Start Date: 12/Oct/18 23:36 Worklog Time Spent: 10m Work Description: swegner closed pull request #384: [BEAM-3587] Add a note to Gradle shadowJar for merge service files URL: https://github.com/apache/beam-site/pull/384 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/src/documentation/runners/flink.md b/src/documentation/runners/flink.md index 6dc6e7b69d..df64ba46f9 100644 --- a/src/documentation/runners/flink.md +++ b/src/documentation/runners/flink.md @@ -51,7 +51,7 @@ For more information, the [Flink Documentation](https://ci.apache.org/projects/f ```java org.apache.beam - beam-runners-flink_2.10 + beam-runners-flink_2.11 {{ site.release_latest }} runtime @@ -81,6 +81,62 @@ $ mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \ If you have a Flink `JobManager` running on your local machine you can give `localhost:6123` for `flinkMaster`. +Behind the hood, to create your shaded jar (containing your pipeline and the Flink runner dependencies), you have to use the `maven-shade-plugin`: + +```java + +org.apache.beam +beam-runners-flink_2.10 +{{ site.release_latest }} + +``` + +```java + +org.apache.maven.plugins +maven-shade-plugin +${maven-shade-plugin.version} + + false + + +*:* + +META-INF/*.SF +META-INF/*.DSA +META-INF/*.RSA + + + + + + +package + +shade + + + true +shaded + + + + + + + +``` + +Then, Maven build will create the shaded jar. 
+ +If you prefer to use Gradle, you can achieve the same using `shadowJar`: + +```java +shadowJar { +mergeServiceFiles() +} +``` + ## Pipeline options for the Flink Runner When executing your pipeline with the Flink Runner, you can set these pipeline options. diff --git a/src/documentation/runners/spark.md b/src/documentation/runners/spark.md index 1502f242c0..4b4479e0e3 100644 --- a/src/documentation/runners/spark.md +++ b/src/documentation/runners/spark.md @@ -37,7 +37,7 @@ You can add a dependency on the latest version of the Spark runner by adding to ### Deploying Spark with your application -In some cases, such as running in local mode/Standalone, your (self-contained) application would be required to pack Spark by explicitly adding the following dependencies in your pom.xml: +Most of the time (running in local mode/Standalone or using `spark-submit`), your (self-contained) application would be required to pack Spark by explicitly adding the following dependencies in your pom.xml: ```java org.apache.spark @@ -94,6 +94,17 @@ After running mvn package, run ls target and you shoul beam-examples-1.0.0-shaded.jar ``` +If you are using gradle, you have to use `shadowJar` to create the shaded jar enabling `mergeServiceFiles()`: +```java +shadowJar { +transform(AppendingTransformer) { +resource = 'reference.conf' +} +relocate 'com.google.protobuf', 'shaded.protobuf' +mergeServiceFiles() +} +``` + To run against a Standalone cluster simply run: ``` spark-submit --class com.beam.examples.BeamPipeline --master spark://HOST:PORT target/beam-examples-1.0.0-shaded.jar --runner=SparkRunner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154059) Time Spent: 1h 20m (was: 1h 10m) > User reports TextIO failure in FlinkRunne
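The `mergeServiceFiles()` call in the documentation change above matters because each dependency jar may ship its own `META-INF/services/<interface>` file listing providers; a naively shaded jar keeps only one such file and silently drops the other registrars, which is how TextIO-style failures arise. A simplified model of the merge (jar contents represented as dicts; this is not the Shadow plugin's actual implementation):

```python
def merge_service_files(jars):
    # jars: list of {service_interface: [provider_class, ...]} mappings, one
    # per dependency jar. Merging concatenates providers, de-duplicated,
    # instead of letting the last jar's file clobber the others.
    merged = {}
    for jar in jars:
        for service, providers in jar.items():
            bucket = merged.setdefault(service, [])
            for provider in providers:
                if provider not in bucket:
                    bucket.append(provider)
    return merged
```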
[jira] [Work logged] (BEAM-3587) User reports TextIO failure in FlinkRunner on master
[ https://issues.apache.org/jira/browse/BEAM-3587?focusedWorklogId=154058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154058 ] ASF GitHub Bot logged work on BEAM-3587: Author: ASF GitHub Bot Created on: 12/Oct/18 23:36 Start Date: 12/Oct/18 23:36 Worklog Time Spent: 10m Work Description: swegner commented on issue #384: [BEAM-3587] Add a note to Gradle shadowJar for merge service files URL: https://github.com/apache/beam-site/pull/384#issuecomment-429489867 I've migrated the changes for this pull request onto the migrated website sources in the `apache/beam` repository: https://github.com/swegner/beam/tree/migrated-pr-384 To pull the migrated changes into your local git client, run: ``` git remote add swegner g...@github.com:swegner/beam.git && git fetch swegner git checkout -B bigqueryio swegner/migrated-pr-384 ``` You can then push the changes to your own branch and [open a new pull request](https://github.com/apache/beam/compare?expand=1) against the apache/beam repository. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154058) Time Spent: 1h 10m (was: 1h) > User reports TextIO failure in FlinkRunner on master > > > Key: BEAM-3587 > URL: https://issues.apache.org/jira/browse/BEAM-3587 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Kenneth Knowles >Assignee: Jean-Baptiste Onofré >Priority: Minor > Fix For: Not applicable > > Attachments: screen1.png, screen2.png > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Reported here: > [https://lists.apache.org/thread.html/47b16c94032392782505415e010970fd2a9480891c55c2f7b5de92bd@%3Cuser.beam.apache.org%3E] > "I'm trying to run a pipeline containing just a TextIO.read() step on a Flink > cluster, using the latest Beam git revision (ff37337). The job fails to start > with the Exception: > {{java.lang.UnsupportedOperationException: The transform is currently not > supported.}} > It does work with Beam 2.2.0 though. All code, logs, and reproduction steps > [https://github.com/pelletier/beam-flink-example]"; > My initial thoughts: I have a guess that this has to do with switching to > running from a portable pipeline representation, and it looks like there's a > non-composite transform with an empty URN and it threw a bad error message. > We can try to root cause but may also mitigate short-term by removing the > round-trip through pipeline proto for now. > What is curious is that the ValidatesRunner and WordCountIT are working - > they only run on a local Flink, yet this seems to be a translation issue that > would occur for local or distributed runs. > We need to certainly run this repro on the RC if we don't totally get to the > bottom of it quickly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1081) annotations should support custom messages and classes
[ https://issues.apache.org/jira/browse/BEAM-1081?focusedWorklogId=154060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154060 ] ASF GitHub Bot logged work on BEAM-1081: Author: ASF GitHub Bot Created on: 12/Oct/18 23:36 Start Date: 12/Oct/18 23:36 Worklog Time Spent: 10m Work Description: aaltay commented on issue #6670: [BEAM-1081] Annotations custom message support and classes tests. URL: https://github.com/apache/beam/pull/6670#issuecomment-429489882 @jglezt it looks like there are test issues, could you look at those? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154060) Time Spent: 50m (was: 40m) > annotations should support custom messages and classes > -- > > Key: BEAM-1081 > URL: https://issues.apache.org/jira/browse/BEAM-1081 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > Time Spent: 50m > Remaining Estimate: 0h > > Update > https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py > to add 2 new features: > 1. ability to customize message > 2. ability to tag classes (not only functions) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154057 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 23:35 Start Date: 12/Oct/18 23:35 Worklog Time Spent: 10m Work Description: boyuanzz commented on a change in pull request #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#discussion_r224938513 ## File path: sdks/java/container/boot.go ## @@ -103,7 +103,17 @@ func main() { filepath.Join(jarsDir, "slf4j-jdk14.jar"), filepath.Join(jarsDir, "beam-sdks-java-harness.jar"), } + + var has_worker_experiment = strings.Contains(options, "use_staged_dataflow_worker_jar") Review comment: Fixed, thanks~ This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154057) Time Spent: 50m (was: 40m) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154055 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 23:17 Start Date: 12/Oct/18 23:17 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#discussion_r224936531 ## File path: sdks/java/container/boot.go ## @@ -103,7 +103,17 @@ func main() { filepath.Join(jarsDir, "slf4j-jdk14.jar"), filepath.Join(jarsDir, "beam-sdks-java-harness.jar"), } + + var has_worker_experiment = strings.Contains(options, "use_staged_dataflow_worker_jar") Review comment: nit: go uses camelCase for local variables. You also need to run "gofmt -w ." to fix the indentation. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154055) Time Spent: 40m (was: 0.5h) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink
[ https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=154054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154054 ] ASF GitHub Bot logged work on BEAM-5708: Author: ASF GitHub Bot Created on: 12/Oct/18 23:07 Start Date: 12/Oct/18 23:07 Worklog Time Spent: 10m Work Description: tweise commented on issue #6638: [BEAM-5708] Cache environment in portable flink runner URL: https://github.com/apache/beam/pull/6638#issuecomment-429485973 @angoenka did you test the fallback case? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154054) Time Spent: 1h 50m (was: 1h 40m) > Support caching of SDKHarness environments in flink > --- > > Key: BEAM-5708 > URL: https://issues.apache.org/jira/browse/BEAM-5708 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Cache and reuse environment to improve performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154050 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 22:27 Start Date: 12/Oct/18 22:27 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#issuecomment-429479809 Re: @herohde Please take another look at this. Thanks~ This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154050) Time Spent: 0.5h (was: 20m) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154049 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 22:17 Start Date: 12/Oct/18 22:17 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429478105 Unfortunately I think https://github.com/apache/beam/pull/6602#issuecomment-429468709 may be a very unintuitive change, so we need to roll it back and either fix the underlying issue with typing of negative numbers or proceed with a different solution here. We would need to cherry-pick the change into the release branch, so I'll mark BEAM-5615 as release blocker until cherry-pick is in. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154049) Time Spent: 2.5h (was: 2h 20m) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2.5h > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
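The TypeError quoted above comes from Python 3 refusing to order mixed-type sequences such as `sorted(["a", 1])`, which Python 2 silently permitted. A common workaround is a type-aware sort key; this is a sketch of the general technique, not necessarily the fix the PR adopted:

```python
# Python 3 raises "TypeError: '<' not supported" when sorting mixed types.
# Sorting on (type name, value) groups values by type first, so comparisons
# only ever happen between values of the same type.
def mixed_type_key(value):
    return (type(value).__name__, value)

expected = ["a", 1, "b", -2]
sorted_expected = sorted(expected, key=mixed_type_key)
# ints sort together before strs ("int" < "str"): [-2, 1, "a", "b"]
```

The key function never compares the second tuple element across different types, because tuples stop comparing as soon as the first elements differ.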
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154048 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 22:16 Start Date: 12/Oct/18 22:16 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429477876 And it's green! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154048) Time Spent: 4h 10m (was: 4h) Remaining Estimate: 67h 50m (was: 68h) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 4h 10m > Remaining Estimate: 67h 50m > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154047 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 12/Oct/18 22:10 Start Date: 12/Oct/18 22:10 Worklog Time Spent: 10m Work Description: ajamato commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-429476625 Squashed all the commits, FYI I imported this PR and internal google tests are also passing. @robertwb, happy to iterate more on your suggestions but we would like to submit this PR, and finish up this iteration This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154047) Time Spent: 8h 10m (was: 8h) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 8h 10m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154046 ] ASF GitHub Bot logged work on BEAM-1251: Author: ASF GitHub Bot Created on: 12/Oct/18 22:00 Start Date: 12/Oct/18 22:00 Worklog Time Spent: 10m Work Description: swegner commented on issue #6679: [BEAM-1251] Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on Beam site. URL: https://github.com/apache/beam/pull/6679#issuecomment-429474553 Yup! If you pop open the "All checks have passed" bar, you'll see a link to the "Website_Stage_GCS" job [results](https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Commit/79/), which contains a link to your staged changes for review: http://apache-beam-website-pull-requests.storage.googleapis.com/6679/index.html (I'm brainstorming a way to make that link more prominent on GitHub; let me know if you have ideas) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154046) Time Spent: 22h 50m (was: 22h 40m) > Python 3 Support > > > Key: BEAM-1251 > URL: https://issues.apache.org/jira/browse/BEAM-1251 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Eyad Sibai >Assignee: Robbe >Priority: Major > Time Spent: 22h 50m > Remaining Estimate: 0h > > I have been trying to use google datalab with python3. As I see there are > several packages that does not support python3 yet which google datalab > depends on. This is one of them. > https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5734) RedisIO: finishBundle calls Jedis.exec without checking if there are operations in the pipeline
[ https://issues.apache.org/jira/browse/BEAM-5734?focusedWorklogId=154045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154045 ] ASF GitHub Bot logged work on BEAM-5734: Author: ASF GitHub Bot Created on: 12/Oct/18 21:56 Start Date: 12/Oct/18 21:56 Worklog Time Spent: 10m Work Description: casidiablo commented on a change in pull request #6682: [BEAM-5734] RedisIO: only call Jedis.exec() on finishBundle if there is something to send URL: https://github.com/apache/beam/pull/6682#discussion_r224925261 ## File path: sdks/java/io/redis/src/test/java/org/apache/beam/sdk/io/redis/RedisIOTest.java ## @@ -86,7 +86,7 @@ public void testBulkRead() throws Exception { @Test public void testWriteReadUsingDefaultAppendMethod() throws Exception { ArrayList> data = new ArrayList<>(); -for (int i = 0; i < 100; i++) { +for (int i = 0; i < 8000; i++) { Review comment: If the test had used this value instead, unit tests would have detected the issue. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154045) Time Spent: 20m (was: 10m) > RedisIO: finishBundle calls Jedis.exec without checking if there are > operations in the pipeline > --- > > Key: BEAM-5734 > URL: https://issues.apache.org/jira/browse/BEAM-5734 > Project: Beam > Issue Type: Bug > Components: io-java-redis >Reporter: Cristian >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > It throws: >  > {code:java} > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > redis.clients.jedis.exceptions.JedisDataException: EXEC without MULTI > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350) > at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331) > at > org.apache.beam.sdk.io.redis.RedisIOTest.testWriteReadUsingDefaultAppendMethod(RedisIOTest.java:100) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) > at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestC
[jira] [Work logged] (BEAM-5734) RedisIO: finishBundle calls Jedis.exec without checking if there are operations in the pipeline
[ https://issues.apache.org/jira/browse/BEAM-5734?focusedWorklogId=154044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154044 ] ASF GitHub Bot logged work on BEAM-5734: Author: ASF GitHub Bot Created on: 12/Oct/18 21:55 Start Date: 12/Oct/18 21:55 Worklog Time Spent: 10m Work Description: casidiablo opened a new pull request #6682: [BEAM-5734] RedisIO: only call Jedis.exec() on finishBundle if there is something to send URL: https://github.com/apache/beam/pull/6682 This fixes a bug in the RedisIO.write sink. The `finishBundle()` calls Jedis' `pipeline.exec()` method without checking if there is actually something to flush. That results in this exception being thrown: ``` org.apache.beam.sdk.Pipeline$PipelineExecutionException: redis.clients.jedis.exceptions.JedisDataException: EXEC without MULTI at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350) at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331) at org.apache.beam.sdk.io.redis.RedisIOTest.testWriteReadUsingDefaultAppendMethod(RedisIOTest.java:100) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) at org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.ap
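The fix described in the PR title guards the flush in `finishBundle` so the transaction is only committed when something was actually queued. The real change is in RedisIO's Java `WriteFn`; the pattern itself can be sketched in a few lines of Python (class and names here are illustrative, the batch threshold is invented):

```python
class BufferedWriter:
    """Sketch of the finishBundle guard: commit only when writes are pending."""

    def __init__(self, sink):
        self.sink = sink      # stands in for the Jedis pipeline
        self.pending = []

    def write(self, record):
        self.pending.append(record)
        if len(self.pending) >= 1000:  # illustrative batch size
            self.flush()

    def flush(self):
        # Guard against an empty batch: issuing the exec()-style commit with
        # nothing queued is what produced "EXEC without MULTI" in the report.
        if self.pending:
            self.sink.extend(self.pending)
            self.pending.clear()
```

An empty `flush()` (the bundle-finish path when no elements arrived) is then a no-op instead of an error.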
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154043&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154043 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 21:52 Start Date: 12/Oct/18 21:52 Worklog Time Spent: 10m Work Description: HuangLED edited a comment on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#issuecomment-429468108 R: @herohde cc: @boyuanzz @pabloem Addressed. Also, option definition moved to WorkerOptions. Thanks to Boyuan for pointing out the right place for error message. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154043) Time Spent: 2h 20m (was: 2h 10m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154042&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154042 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 21:47 Start Date: 12/Oct/18 21:47 Worklog Time Spent: 10m Work Description: tvalentyn edited a comment on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429468709 Another somewhat related observation: Following snippets fails on Python 2, after this PR (in Direct Runner), but will pass in Python 3 where there is no distinction between `int` and `long`. ``` p = TestPipeline(options=PipelineOptions(pipeline_args)) input_data = p | beam.Create([1, -2]) # This becomes a [1, -2L]! (Unrelated to this PR). expected_result = [-2, 1] assert_that(input_data, equal_to(expected_result)) ``` ``` apache_beam.testing.util.BeamAssertException: Failed assert: [-2, 1] == [1, -2L] [while running 'assert_that/Match'] ``` Do we know why negatives are represented as longs? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154042) Time Spent: 2h 20m (was: 2h 10m) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 20m > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154039 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 21:39 Start Date: 12/Oct/18 21:39 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429470182 I merged the purported fix for that, so you should have better luck. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154039) Time Spent: 4h (was: 3h 50m) Remaining Estimate: 68h (was: 68h 10m) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 4h > Remaining Estimate: 68h > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154037 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 21:33 Start Date: 12/Oct/18 21:33 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429468709 Another somewhat related observation: Following snippets fails on Python 2, after this PR (in Direct Runner), but will pass in Python 3 where there is no distinction between `int` and `long`. ``` p = TestPipeline(options=PipelineOptions(pipeline_args)) input_data = p | beam.Create([1, -2]) # This becomes a [1, -2L]! (Unrelated to this PR). expected_result = [-2, 1] assert_that(input_data, equal_to(expected_result)) ``` Do we know why negatives are represented as longs? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154037) Time Spent: 2h 10m (was: 2h) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 10m > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
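Since both failures in this thread stem from sorting inside the assertion helper, one ordering-free alternative is a multiset comparison. This is a sketch of the idea only (not the approach the PR settled on), and it assumes the elements are hashable:

```python
from collections import Counter

def multiset_equal(actual, expected):
    # Compare as multisets: no sorting, so mixed or unorderable types
    # (and Python 2 int/long representation quirks, where 1 == 1L and
    # hash(1) == hash(1L)) never trigger a comparison error.
    return Counter(actual) == Counter(expected)
```

For example, `multiset_equal([1, -2], [-2, 1])` holds regardless of element order, while a count mismatch such as `[1]` vs `[1, 1]` still fails.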
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154036 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 21:30 Start Date: 12/Oct/18 21:30 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#issuecomment-429468108 R: @boyuanzz @herohde @pabloem Addressed. Also, option definition moved to WorkerOptions. Thanks to Boyuan for pointing out the right place for error message. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154036) Time Spent: 2h 10m (was: 2h) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154028 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 21:00 Start Date: 12/Oct/18 21:00 Worklog Time Spent: 10m Work Description: boyuanzz commented on a change in pull request #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#discussion_r224913769 ## File path: sdks/java/container/boot.go ## @@ -104,6 +104,9 @@ func main() { filepath.Join(jarsDir, "beam-sdks-java-harness.jar"), } for _, md := range artifacts { + if strings.HasPrefix(md.Name, "beam-runners-google-cloud-dataflow-java-fn-api-worker") { Review comment: The purpose here is that if there is a Java worker jar in the artifacts, that jar should not be included in the SDK harness classpath, so it seems we don't need to check the experiment. WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154028) Time Spent: 20m (was: 10m) > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
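[Editor's note] The filter discussed in the review above lives in Go (sdks/java/container/boot.go), but the idea — drop any staged artifact whose name marks it as the Dataflow worker jar before building the SDK harness classpath — can be sketched compactly. This is an illustration only: the function name is hypothetical, and the prefix string is the one quoted in the review (herohde notes it may later change to "dataflow-worker.jar").

```python
# Hypothetical Python sketch of the classpath filter discussed above.
# The real implementation is Go code in sdks/java/container/boot.go.
WORKER_JAR_PREFIX = "beam-runners-google-cloud-dataflow-java-fn-api-worker"

def harness_classpath(artifact_names):
    """Exclude the staged Dataflow worker jar from the harness classpath."""
    return [name for name in artifact_names
            if not name.startswith(WORKER_JAR_PREFIX)]
```

For example, `harness_classpath(["beam-runners-google-cloud-dataflow-java-fn-api-worker-2.8.0.jar", "udf.jar"])` keeps only `"udf.jar"`.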
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154027 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 20:59 Start Date: 12/Oct/18 20:59 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680#issuecomment-429460451 Run Python PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154027) Time Spent: 2h (was: 1h 50m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154025 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 20:49 Start Date: 12/Oct/18 20:49 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429457986 Precommits fail due to a flaky test; see [BEAM-5709](https://issues.apache.org/jira/browse/BEAM-5709) Succeeded precommit run: https://builds.apache.org/job/beam_PreCommit_Java_Phrase/318/ Failing precommit run: https://builds.apache.org/job/beam_PreCommit_Java_Phrase/319/ I accidentally started the precommit twice in a row. Can we merge this? I feel this will be safer than trying to get a green precommit. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154025) Time Spent: 3h 50m (was: 3h 40m) Remaining Estimate: 68h 10m (was: 68h 20m) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 3h 50m > Remaining Estimate: 68h 10m > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154024 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 20:47 Start Date: 12/Oct/18 20:47 Worklog Time Spent: 10m Work Description: HuangLED opened a new pull request #6680: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6680 Python support for custom worker jar (as a staged file). Tested positive and negative cases by starting actual jobs. PreCommit passes locally. Follow this checklist to help us incorporate your contribution quickly and easily: - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/) Python 
| [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154024) Time Spent: 1h 50m (was: 1h 40m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Iss
[jira] [Work logged] (BEAM-5636) Java support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5636?focusedWorklogId=154022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154022 ] ASF GitHub Bot logged work on BEAM-5636: Author: ASF GitHub Bot Created on: 12/Oct/18 20:46 Start Date: 12/Oct/18 20:46 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #6665: [BEAM-5636] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#discussion_r224910454 ## File path: sdks/java/container/boot.go ## @@ -104,6 +104,9 @@ func main() { filepath.Join(jarsDir, "beam-sdks-java-harness.jar"), } for _, md := range artifacts { + if strings.HasPrefix(md.Name, "beam-runners-google-cloud-dataflow-java-fn-api-worker") { Review comment: We should only make this check if the experiment is set. Also, the name will change to "dataflow-worker.jar" when the artifact bug is fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154022) Time Spent: 10m Remaining Estimate: 0h > Java support for custom dataflow worker jar > --- > > Key: BEAM-5636 > URL: https://issues.apache.org/jira/browse/BEAM-5636 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Java jobs. That requires a change to the Java boot > code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/java/container/boot.go#L107 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154021 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 20:44 Start Date: 12/Oct/18 20:44 Worklog Time Spent: 10m Work Description: HuangLED commented on a change in pull request #6667: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6667#discussion_r224910055 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -674,7 +674,12 @@ def _add_argparse_args(cls, parser): 'job submission, the files will be staged in the staging area ' '(--staging_location option) and the workers will install them in ' 'same order they were specified on the command line.')) - +parser.add_argument( +'--dataflow_worker_jar', +dest='dataflow_worker_jar', +type=str, +help='Dataflow worker jar.' +) Review comment: Thanks! The issue is addressed, but this PR lost its review status due to my sub-optimal git operations. Opening another PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154021) Time Spent: 1h 40m (was: 1.5h) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. 
That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154019 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 20:42 Start Date: 12/Oct/18 20:42 Worklog Time Spent: 10m Work Description: aaltay commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429456279 This is clearly backward incompatible change, however I think this is the right behavior. In that sense I do not think it is a regression. However, we should clearly highlight this in our release notes/blog post etc. @tvalentyn Could you create a JIRA, mark it for 2.8.0, explain the change in behaviour and mark it as fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154019) Time Spent: 2h (was: 1h 50m) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 2h > Remaining Estimate: 0h > > == > ERROR: test_remove_duplicates > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 677, in process > self.do_fn_invoker.invoke_process(windowed_value) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", > line 414, in invoke_process > windowed_value, 
self.process_method(windowed_value.value)) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", > line 1068, in > wrapper = lambda x: [fn(x)] > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", > line 115, in _equal > sorted_expected = sorted(expected) > TypeError: unorderable types: str() < int() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
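[Editor's note] The traceback above comes from the equality matcher in apache_beam/testing/util.py calling `sorted(expected)`: Python 2 fell back to an arbitrary but total ordering across types, while Python 3 raises `TypeError` when ordering `str` against `int`. One common workaround — shown here as an illustration, not necessarily the exact fix merged in #6602 — is to sort on a key that groups values by type first, so comparisons only ever happen between values of the same type:

```python
# Python 3 raises TypeError when ordering unlike types; Python 2 did not.
try:
    sorted([1, "a"])
except TypeError:
    pass  # expected on Python 3: unorderable types

# Workaround sketch: sort on (type name, value). Tuples compare element by
# element, so values of different types are ordered by their type's name and
# never compared to each other directly.
def sort_mixed(values):
    return sorted(values, key=lambda v: (type(v).__name__, v))
```

With this key, `sort_mixed([3, "a", 1, "b"])` orders the ints before the strings (since `"int" < "str"`), giving `[1, 3, "a", "b"]`.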
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154020 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 20:42 Start Date: 12/Oct/18 20:42 Worklog Time Spent: 10m Work Description: HuangLED closed pull request #6667: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6667 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/python/apache_beam/options/pipeline_options.py b/sdks/python/apache_beam/options/pipeline_options.py index a172535b100..2c061e0ec52 100644 --- a/sdks/python/apache_beam/options/pipeline_options.py +++ b/sdks/python/apache_beam/options/pipeline_options.py @@ -674,7 +674,12 @@ def _add_argparse_args(cls, parser): 'job submission, the files will be staged in the staging area ' '(--staging_location option) and the workers will install them in ' 'same order they were specified on the command line.')) - +parser.add_argument( +'--dataflow_worker_jar', +dest='dataflow_worker_jar', +type=str, +help='Dataflow worker jar.' 
+) class PortableOptions(PipelineOptions): diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py index 1acd3488524..5be60bd701b 100644 --- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py +++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py @@ -381,6 +381,12 @@ def run_pipeline(self, pipeline): self.dataflow_client = apiclient.DataflowApplicationClient( pipeline._options) +if setup_options.dataflow_worker_jar: + experiments = ["use_staged_dataflow_worker_jar"] + if debug_options.experiments is not None: +experiments = list(set(experiments + debug_options.experiments)) + debug_options.experiments = experiments + # Create the job description and send a request to the service. The result # can be None if there is no need to send a request to the service (e.g. # template creation). If a request was sent and failed then the call will diff --git a/sdks/python/apache_beam/runners/portability/stager.py b/sdks/python/apache_beam/runners/portability/stager.py index ef7401ac6aa..e336fd3f9b9 100644 --- a/sdks/python/apache_beam/runners/portability/stager.py +++ b/sdks/python/apache_beam/runners/portability/stager.py @@ -123,8 +123,7 @@ def stage_job_resources(self, Returns: A list of file names (no paths) for the resources staged. All the - files - are assumed to be staged at staging_location. + files are assumed to be staged at staging_location. Raises: RuntimeError: If files specified are not found or error encountered @@ -256,6 +255,13 @@ def stage_job_resources(self, 'The file "%s" cannot be found. Its location was specified by ' 'the --sdk_location command-line option.' 
% sdk_path) +if hasattr(setup_options, 'dataflow_worker_jar') and \ +setup_options.dataflow_worker_jar: + jar_staged_filename = 'dataflow-worker.jar' + staged_path = FileSystems.join(staging_location, jar_staged_filename) + self.stage_artifact(setup_options.dataflow_worker_jar, staged_path) + resources.append(jar_staged_filename) + # Delete all temp files created while staging job resources. shutil.rmtree(temp_dir) retrieval_token = self.commit_manifest() This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154020) Time Spent: 1.5h (was: 1h 20m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https:
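[Editor's note] Two details of the diff above are worth calling out: run_pipeline merges the `use_staged_dataflow_worker_jar` experiment into any user-supplied experiments via `list(set(...))` (duplicates are dropped, but order is not preserved), and the stager stages the jar under the fixed name `dataflow-worker.jar`. A standalone sketch of the experiment merge (the function name is mine, not Beam's; sorting is added only to make the result deterministic):

```python
# Sketch of the experiment-flag merge from the dataflow_runner.py hunk above.
# The diff uses list(set(...)), which deduplicates but leaves the order
# unspecified; sorted() here just makes the output reproducible.
def merge_experiments(user_experiments,
                      required=("use_staged_dataflow_worker_jar",)):
    merged = set(required)
    if user_experiments:  # mirrors the `is not None` guard in the diff
        merged.update(user_experiments)
    return sorted(merged)
```

Passing the experiment twice is harmless: `merge_experiments(["use_staged_dataflow_worker_jar"])` still yields a single copy.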
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154016 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 20:41 Start Date: 12/Oct/18 20:41 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429456093 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154016) Time Spent: 3h 20m (was: 3h 10m) Remaining Estimate: 68h 40m (was: 68h 50m) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 3h 20m > Remaining Estimate: 68h 40m > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154018 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 20:41 Start Date: 12/Oct/18 20:41 Worklog Time Spent: 10m Work Description: Ardagan removed a comment on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429396509 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154018) Time Spent: 3h 40m (was: 3.5h) Remaining Estimate: 68h 20m (was: 68.5h) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 3h 40m > Remaining Estimate: 68h 20m > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=154017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154017 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 20:41 Start Date: 12/Oct/18 20:41 Worklog Time Spent: 10m Work Description: Ardagan removed a comment on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429456093 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154017) Time Spent: 3.5h (was: 3h 20m) Remaining Estimate: 68.5h (was: 68h 40m) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 3.5h > Remaining Estimate: 68.5h > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154015 ] ASF GitHub Bot logged work on BEAM-1251: Author: ASF GitHub Bot Created on: 12/Oct/18 20:37 Start Date: 12/Oct/18 20:37 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #6679: [BEAM-1251] Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on Beam site. URL: https://github.com/apache/beam/pull/6679#issuecomment-429455025 Hey @swegner, am I using a correct way to change the Beam-site? Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154015) Time Spent: 22h 40m (was: 22.5h) > Python 3 Support > > > Key: BEAM-1251 > URL: https://issues.apache.org/jira/browse/BEAM-1251 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Eyad Sibai >Assignee: Robbe >Priority: Major > Time Spent: 22h 40m > Remaining Estimate: 0h > > I have been trying to use google datalab with python3. As I see there are > several packages that does not support python3 yet which google datalab > depends on. This is one of them. > https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154014 ] ASF GitHub Bot logged work on BEAM-1251: Author: ASF GitHub Bot Created on: 12/Oct/18 20:34 Start Date: 12/Oct/18 20:34 Worklog Time Spent: 10m Work Description: tvalentyn opened a new pull request #6679: [BEAM-1251] Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on Beam site. URL: https://github.com/apache/beam/pull/6679 Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. 
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154014) Time Spent: 22.5h (was: 22h 20m) > Python 3 Support > > > Key: BEAM-1251 > URL: https://issues.apache.org/jira/browse/BEAM-1251 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Eyad Sibai >Assignee: Robbe >Priority: Major > Time Spent: 22.5h > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-1251) Python 3 Support
[ https://issues.apache.org/jira/browse/BEAM-1251?focusedWorklogId=154013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154013 ] ASF GitHub Bot logged work on BEAM-1251: Author: ASF GitHub Bot Created on: 12/Oct/18 20:32 Start Date: 12/Oct/18 20:32 Worklog Time Spent: 10m Work Description: tvalentyn opened a new pull request #6678: [BEAM-1251] Add a link to Python 3 Conversion Quick Start Guide to the list of ongoing efforts on Beam site. URL: https://github.com/apache/beam/pull/6678 Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. 
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154013) Time Spent: 22h 20m (was: 22h 10m) > Python 3 Support > > > Key: BEAM-1251 > URL: https://issues.apache.org/jira/browse/BEAM-1251 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Eyad Sibai >Assignee: Robbe >Priority: Major > Time S
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154011 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 20:26 Start Date: 12/Oct/18 20:26 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429452493 CC: @Juta @robertwb @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154011) Time Spent: 1h 50m (was: 1h 40m)

> Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
> ---
>
> Key: BEAM-5621
> URL: https://issues.apache.org/jira/browse/BEAM-5621
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Juta Staes
> Priority: Major
> Fix For: Not applicable
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> ==
> ERROR: test_remove_duplicates (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", line 677, in process
>     self.do_fn_invoker.invoke_process(windowed_value)
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/common.py", line 414, in invoke_process
>     windowed_value, self.process_method(windowed_value.value))
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", line 1068, in <lambda>
>     wrapper = lambda x: [fn(x)]
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/testing/util.py", line 115, in _equal
>     sorted_expected = sorted(expected)
> TypeError: unorderable types: str() < int()

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
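The failure mode in the quoted traceback is easy to reproduce: Python 3 removed the arbitrary cross-type ordering that Python 2 allowed, so sorting a mixed list raises `TypeError`. A minimal, hypothetical sketch (not Beam code) of both the failure and one common type-keyed workaround:

```python
# Python 3 refuses to order values of unrelated types, which is exactly
# what the quoted traceback shows for sorted(expected) in equal_to().
mixed = ["a", 1]

try:
    sorted(mixed)
    raised = False
except TypeError:
    # e.g. "'<' not supported between instances of 'int' and 'str'"
    raised = True

# One workaround: sort by (type name, repr) so values are grouped by type
# and only compared to values of the same type.
ordered = sorted(mixed, key=lambda v: (type(v).__name__, repr(v)))
print(raised, ordered)
```

Within one type the relative order is preserved; across types the order is now determined by the type name, which is the behavior change debated in the review comments below.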
[jira] [Work logged] (BEAM-5621) Several tests fail on Python 3 with TypeError: unorderable types: str() < int()
[ https://issues.apache.org/jira/browse/BEAM-5621?focusedWorklogId=154010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154010 ] ASF GitHub Bot logged work on BEAM-5621: Author: ASF GitHub Bot Created on: 12/Oct/18 20:26 Start Date: 12/Oct/18 20:26 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #6602: [BEAM-5621] Fix unorderable types in python 3 URL: https://github.com/apache/beam/pull/6602#issuecomment-429452434 Because we now sort by types, we may encounter different behavior when using different string types. For example, previously `assert_that(equal_to(['a', u'b', b'c'], ['a', 'b', 'c']))` worked, but now it may not because the sort order now depends on the exact type (i.e. the sorting may produce `[u'b', 'a', b'c']`) even for orderable types. Should we consider this a regression? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154010) Time Spent: 1h 40m (was: 1.5h) > Several tests fail on Python 3 with TypeError: unorderable types: str() < > int() > --- > > Key: BEAM-5621 > URL: https://issues.apache.org/jira/browse/BEAM-5621 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 40m > Remaining Estimate: 0h -- This message was sent by Atlassian JIRA (v7.6.3#76005)
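The caveat in the comment above can be seen directly. In Python 3, `'b'` and `u'b'` are the same type, so the interesting split is `str` versus `bytes`: once the sort key includes the concrete type, `bytes` values no longer interleave with `str` values by content. A hedged illustration (the key function here is hypothetical, not Beam's actual implementation):

```python
values = ["a", "b", b"c"]

# Type-aware key: compare type names first, then the values themselves.
# Values are only compared against values of the same type, so this never
# raises TypeError on a mixed list.
type_aware = sorted(values, key=lambda v: (type(v).__name__, v))

# 'bytes' < 'str' alphabetically, so b"c" now sorts before every str,
# even though a content-based ordering would put it last.
print(type_aware)
```

This is the regression question being raised: equality-by-sorted-comparison can change results for lists that mix `str` and `bytes`, even though each type is orderable on its own.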
[jira] [Work logged] (BEAM-1081) annotations should support custom messages and classes
[ https://issues.apache.org/jira/browse/BEAM-1081?focusedWorklogId=154009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154009 ] ASF GitHub Bot logged work on BEAM-1081: Author: ASF GitHub Bot Created on: 12/Oct/18 20:17 Start Date: 12/Oct/18 20:17 Worklog Time Spent: 10m Work Description: jglezt commented on issue #6670: [BEAM-1081] Annotations custom message support and classes tests. URL: https://github.com/apache/beam/pull/6670#issuecomment-429450311 Hi Pablo! I resolved the merge problems. Waiting for the Jenkins build. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 154009) Time Spent: 40m (was: 0.5h) > annotations should support custom messages and classes > -- > > Key: BEAM-1081 > URL: https://issues.apache.org/jira/browse/BEAM-1081 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > Time Spent: 40m > Remaining Estimate: 0h > > Update > https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py > to add 2 new features: > 1. ability to customize message > 2. ability to tag classes (not only functions) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
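The two features requested in BEAM-1081 — a customizable message and the ability to tag classes, not only functions — can be sketched with a plain decorator. This is an illustrative, hypothetical design, not the actual `apache_beam.utils.annotations` implementation:

```python
import functools
import warnings


def deprecated(message=None):
    """Mark a function *or* a class as deprecated, with an optional custom message."""
    def decorate(obj):
        note = message or "%s is deprecated." % obj.__name__

        if isinstance(obj, type):
            # Feature 2: tagging a class -- warn when it is instantiated.
            original_init = obj.__init__

            @functools.wraps(original_init)
            def new_init(self, *args, **kwargs):
                warnings.warn(note, DeprecationWarning, stacklevel=2)
                original_init(self, *args, **kwargs)

            obj.__init__ = new_init
            return obj

        # Tagging a plain function.
        @functools.wraps(obj)
        def wrapper(*args, **kwargs):
            warnings.warn(note, DeprecationWarning, stacklevel=2)
            return obj(*args, **kwargs)

        return wrapper
    return decorate
```

Usage: `@deprecated("use new_fn instead")` on a function, or `@deprecated()` on a class to fall back to the default message.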
[jira] [Work logged] (BEAM-5709) Tests in BeamFnControlServiceTest are flaky.
[ https://issues.apache.org/jira/browse/BEAM-5709?focusedWorklogId=154008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154008 ] ASF GitHub Bot logged work on BEAM-5709: Author: ASF GitHub Bot Created on: 12/Oct/18 20:15 Start Date: 12/Oct/18 20:15 Worklog Time Spent: 10m Work Description: kennknowles closed pull request #6639: [BEAM-5709] Fix flaky tests in BeamFnControlServiceTest. URL: https://github.com/apache/beam/pull/6639 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java
index 7ba41fd7cc2..7f08a562ce5 100644
--- a/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java
+++ b/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java
@@ -88,6 +88,8 @@ public void testClientConnecting() throws Exception {
     server.shutdown();
     server.awaitTermination(1, TimeUnit.SECONDS);
     server.shutdownNow();
+    Thread.sleep(1000); // Wait for stub to close stream.
+
     verify(requestObserver).onCompleted();
     verifyNoMoreInteractions(requestObserver);
   }
@@ -126,6 +128,8 @@ public void testMultipleClientsConnecting() throws Exception {
     server.shutdown();
     server.awaitTermination(1, TimeUnit.SECONDS);
     server.shutdownNow();
+    Thread.sleep(1000); // Wait for stub to close stream.
+
     verify(requestObserver).onCompleted();
     verifyNoMoreInteractions(requestObserver);
     verify(anotherRequestObserver).onCompleted();

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 154008) Time Spent: 1.5h (was: 1h 20m)

> Tests in BeamFnControlServiceTest are flaky.
>
> Key: BEAM-5709
> URL: https://issues.apache.org/jira/browse/BEAM-5709
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Daniel Oliveira
> Assignee: Daniel Oliveira
> Priority: Minor
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/BeamFnControlServiceTest.java
> Tests for BeamFnControlService are currently flaky. The test attempts to verify that onCompleted was called on the mock streams, but that function gets called on a separate thread, so occasionally the function will not have been called yet, despite the server being shut down.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
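The `Thread.sleep(1000)` fix above papers over a race: the mock's `onCompleted` callback runs on another thread, so a fixed sleep can still be too short (or needlessly long). A more deterministic pattern — shown here as a generic Python sketch with hypothetical names, not the Beam Java test — is to block on a synchronization primitive that the callback itself signals:

```python
import threading

completed = threading.Event()


def on_completed():
    # Callback invoked from a worker thread, as a gRPC stub would do
    # when the server-side stream closes.
    completed.set()


# Simulate server teardown firing the callback asynchronously.
worker = threading.Thread(target=on_completed)
worker.start()

# Instead of a fixed sleep, wait (with a timeout) for the signal before
# asserting on the mock.
assert completed.wait(timeout=5.0), "onCompleted was never called"
worker.join()
```

The timeout only bounds the worst case; in the common case the wait returns as soon as the callback fires, so the test is both faster and less flaky than a fixed sleep.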
[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=153992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153992 ] ASF GitHub Bot logged work on BEAM-4176: Author: ASF GitHub Bot Created on: 12/Oct/18 18:57 Start Date: 12/Oct/18 18:57 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink URL: https://github.com/apache/beam/pull/6592#discussion_r224885162 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java ## @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) { WrappedContext wrapper = getCache().get(jobInfo.jobId()); Preconditions.checkState( wrapper != null, "Releasing context for unknown job: " + jobInfo.jobId()); -// Do not release this asynchronously, as the releasing could fail due to the classloader not being -// available anymore after the tasks have been removed from the execution engine. -release(wrapper); + +PipelineOptions pipelineOptions = +PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions()); +int environmentCacheTTLMillis = + pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis(); +if (environmentCacheTTLMillis > 0) { + // Schedule task to clean the container later. + // Ensure that this class is loaded in the parent Flink classloader. + getExecutor() + .schedule(() -> release(wrapper), environmentCacheTTLMillis, TimeUnit.MILLISECONDS); Review comment: You can base a PR on another PR's branch. Not sure how much sense that makes because then you would merge into your fork instead of the upstream repo. So I take that back :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153992) Time Spent: 33.5h (was: 33h 20m) > Java: Portable batch runner passes all ValidatesRunner tests that > non-portable runner passes > > > Key: BEAM-4176 > URL: https://issues.apache.org/jira/browse/BEAM-4176 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Ankur Goenka >Priority: Major > Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 > PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png > > Time Spent: 33.5h > Remaining Estimate: 0h > > We need this as a sanity check that runner execution is correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake
[ https://issues.apache.org/jira/browse/BEAM-5683?focusedWorklogId=153987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153987 ] ASF GitHub Bot logged work on BEAM-5683: Author: ASF GitHub Bot Created on: 12/Oct/18 18:48 Start Date: 12/Oct/18 18:48 Worklog Time Spent: 10m Work Description: pabloem closed pull request #6646: [BEAM-5683] Print command logs on failure URL: https://github.com/apache/beam/pull/6646 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/sdks/python/apache_beam/runners/portability/stager.py b/sdks/python/apache_beam/runners/portability/stager.py
index ef7401ac6aa..885d69c7866 100644
--- a/sdks/python/apache_beam/runners/portability/stager.py
+++ b/sdks/python/apache_beam/runners/portability/stager.py
@@ -408,7 +408,7 @@ def _populate_requirements_cache(requirements_file, cache_dir):
         ':all:'
     ]
     logging.info('Executing command: %s', cmd_args)
-    processes.check_call(cmd_args)
+    processes.check_output(cmd_args)

   @staticmethod
@@ -421,7 +421,7 @@ def _build_setup_package(setup_file, temp_dir, build_setup_args=None):
         os.path.basename(setup_file), 'sdist', '--dist-dir', temp_dir
     ]
     logging.info('Executing command: %s', build_setup_args)
-    processes.check_call(build_setup_args)
+    processes.check_output(build_setup_args)
     output_files = glob.glob(os.path.join(temp_dir, '*.tar.gz'))
     if not output_files:
       raise RuntimeError(
@@ -549,7 +549,7 @@ def _download_pypi_sdk_package(temp_dir,
     logging.info('Executing command: %s', cmd_args)
     try:
-      processes.check_call(cmd_args)
+      processes.check_output(cmd_args)
     except subprocess.CalledProcessError as e:
       raise RuntimeError(repr(e))

diff --git a/sdks/python/apache_beam/runners/portability/stager_test.py b/sdks/python/apache_beam/runners/portability/stager_test.py
index 9edc4eb4a2d..43b24967d82 100644
--- a/sdks/python/apache_beam/runners/portability/stager_test.py
+++ b/sdks/python/apache_beam/runners/portability/stager_test.py
@@ -65,7 +65,7 @@ def populate_requirements_cache(self, requirements_file, cache_dir):
     self.create_temp_file(os.path.join(cache_dir, 'def.txt'), 'nothing')

   def build_fake_pip_download_command_handler(self, has_wheels):
-    """A stub for apache_beam.utils.processes.check_call that imitates pip.
+    """A stub for apache_beam.utils.processes.check_output that imitates pip.

     Args:
       has_wheels: Whether pip fake should have a whl distribution of packages.
@@ -291,7 +291,7 @@ def test_sdk_location_default(self):
     options.view_as(SetupOptions).sdk_location = 'default'
     with mock.patch(
-        'apache_beam.utils.processes.check_call',
+        'apache_beam.utils.processes.check_output',
         self.build_fake_pip_download_command_handler(has_wheels=False)):
       _, staged_resources = self.stager.stage_job_resources(
           options, temp_dir=self.make_temp_dir(), staging_location=staging_dir)
@@ -309,7 +309,7 @@ def test_sdk_location_default_with_wheels(self):
     options.view_as(SetupOptions).sdk_location = 'default'
     with mock.patch(
-        'apache_beam.utils.processes.check_call',
+        'apache_beam.utils.processes.check_output',
         self.build_fake_pip_download_command_handler(has_wheels=True)):
       _, staged_resources = self.stager.stage_job_resources(
           options, temp_dir=self.make_temp_dir(), staging_location=staging_dir)

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153987) Time Spent: 1h (was: 50m) > [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to > pip download flake > -- > > Key: BEAM-5683 > URL: https://issues.apache.org/jira/browse/BEAM-5683 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Scott Wegner >Assignee: Ankur Goenka >Priority: Major > Labels: currently-failing > Time Spent: 1h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ >
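The swap from `check_call` to `check_output` in the diff above matters because `subprocess.check_output` captures the child's output and attaches it to the raised `CalledProcessError`, so a failing `pip` invocation can be logged instead of surfacing only an exit code. A minimal standalone demonstration using plain `subprocess` (not Beam's `processes` wrapper):

```python
import subprocess
import sys

# A stand-in for a failing pip command: prints a diagnostic, then exits 1.
cmd = [sys.executable, "-c", "import sys; print('pip blew up'); sys.exit(1)"]

try:
    # check_call would only give us the exit code; check_output also
    # preserves what the command printed before failing.
    subprocess.check_output(cmd, stderr=subprocess.STDOUT)
except subprocess.CalledProcessError as exc:
    print("exit code:", exc.returncode)
    print("captured output:", exc.output.decode().strip())
```

Passing `stderr=subprocess.STDOUT` folds the command's stderr into the captured output, which is usually where pip writes its error details.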
[jira] [Work logged] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake
[ https://issues.apache.org/jira/browse/BEAM-5683?focusedWorklogId=153986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153986 ] ASF GitHub Bot logged work on BEAM-5683: Author: ASF GitHub Bot Created on: 12/Oct/18 18:48 Start Date: 12/Oct/18 18:48 Worklog Time Spent: 10m Work Description: pabloem commented on issue #6646: [BEAM-5683] Print command logs on failure URL: https://github.com/apache/beam/pull/6646#issuecomment-429423506 Ah thanks Ankur! lgtm This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153986) Time Spent: 50m (was: 40m) > [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to > pip download flake > -- > > Key: BEAM-5683 > URL: https://issues.apache.org/jira/browse/BEAM-5683 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Scott Wegner >Assignee: Ankur Goenka >Priority: Major > Labels: currently-failing > Time Spent: 50m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/] > * [Gradle Build > Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests] > * [Test source > code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390] > Initial investigation: > Seems to be failing on pip download. 
> == > ERROR: test_multiple_empty_outputs > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/ptransform_test.py", > line 277, in test_multiple_empty_outputs > pipeline.run() > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py", > line 104, in run > result = super(TestPipeline, self).run(test_runner_api) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py", > line 403, in run > self.to_runner_api(), self.runner, self._options).run(False) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py", > line 416, in run > return self.runner.run_pipeline(self) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 50, in run_pipeline > self.result = super(TestDataflowRunner, self).run_pipeline(pipeline) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", > line 389, in run_pipeline > self.dataflow_client.create_job(self.job), self) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/retry.py", > line 184, in wrapper > return fun(*args, **kwargs) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", > line 490, in create_job > self.create_job_description(job) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", > line 519, in create_job_description > resources = 
self._stage_resources(job.options) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", > line 452, in _stage_resources > staging_location=google_cloud_options.staging_location) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py", > line 161, in stage_job_resources > requirements_cache_path) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py", > line 411, in _populate_re
[jira] [Work logged] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake
[ https://issues.apache.org/jira/browse/BEAM-5683?focusedWorklogId=153983&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153983 ] ASF GitHub Bot logged work on BEAM-5683: Author: ASF GitHub Bot Created on: 12/Oct/18 18:42 Start Date: 12/Oct/18 18:42 Worklog Time Spent: 10m Work Description: angoenka commented on issue #6646: [BEAM-5683] Print command logs on failure URL: https://github.com/apache/beam/pull/6646#issuecomment-429421690 Fixed the test case. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153983) Time Spent: 40m (was: 0.5h) > [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to > pip download flake > -- > > Key: BEAM-5683 > URL: https://issues.apache.org/jira/browse/BEAM-5683 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Scott Wegner >Assignee: Ankur Goenka >Priority: Major > Labels: currently-failing > Time Spent: 40m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/] > * [Gradle Build > Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests] > * [Test source > code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390] > Initial investigation: > Seems to be failing on pip download. 
[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink
[ https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=153978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153978 ] ASF GitHub Bot logged work on BEAM-5708: Author: ASF GitHub Bot Created on: 12/Oct/18 18:29 Start Date: 12/Oct/18 18:29 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #6638: [BEAM-5708] Cache environment in portable flink runner URL: https://github.com/apache/beam/pull/6638#discussion_r224877854 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java ## @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) { WrappedContext wrapper = getCache().get(jobInfo.jobId()); Preconditions.checkState( wrapper != null, "Releasing context for unknown job: " + jobInfo.jobId()); -// Do not release this asynchronously, as the releasing could fail due to the classloader not being -// available anymore after the tasks have been removed from the execution engine. -release(wrapper); + +PipelineOptions pipelineOptions = +PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions()); +int environmentCacheTTLMillis = + pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis(); +if (environmentCacheTTLMillis > 0) { + // Schedule task to clean the container later. + // Ensure that this class is loaded in the parent Flink classloader. + getExecutor() + .schedule(() -> release(wrapper), environmentCacheTTLMillis, TimeUnit.MILLISECONDS); Review comment: @mxm The class is first loaded when we create the environment, so we can be assured that the class is loaded before release. Also, we require classes to be loaded in the parent classloader for async container destruction. This inherently means that once the class is loaded in the parent classloader, it's not going to be unloaded in this scenario. With the additional check mentioned, we will enforce this requirement. 
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153978) Time Spent: 1h 40m (was: 1.5h) > Support caching of SDKHarness environments in flink > --- > > Key: BEAM-5708 > URL: https://issues.apache.org/jira/browse/BEAM-5708 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Cache and reuse environment to improve performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
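The review thread above is about deferring environment release behind a TTL (`getEnvironmentCacheMillis`), with a fall back to synchronous release when the TTL is unset. A minimal Python sketch of that scheduling pattern (class and method names here are illustrative, not Beam's actual API):

```python
import threading

class EnvironmentCache:
    """Sketch: release cached SDK environments after a TTL, or immediately when TTL <= 0."""

    def __init__(self, ttl_millis):
        self.ttl_millis = ttl_millis
        self.released = []  # stands in for actually destroying containers

    def release(self, env):
        self.released.append(env)

    def schedule_release(self, env):
        if self.ttl_millis > 0:
            # Defer the release so a following job can reuse the environment;
            # the real code must also ensure the releasing class stays loadable.
            timer = threading.Timer(self.ttl_millis / 1000.0, self.release, args=(env,))
            timer.daemon = True
            timer.start()
            return timer
        # TTL disabled: fall back to synchronous release, as before the change.
        self.release(env)
        return None
```

The TTL-zero branch preserves the old behavior, which matters when deferred cleanup might run after the execution engine has torn down the classloader.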
[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3
[ https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153980 ] ASF GitHub Bot logged work on BEAM-5720: Author: ASF GitHub Bot Created on: 12/Oct/18 18:29 Start Date: 12/Oct/18 18:29 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #6659: [BEAM-5720] Fix encoding of large python ints in Python 3. URL: https://github.com/apache/beam/pull/6659#discussion_r224876803 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -293,8 +299,21 @@ def encode_to_stream(self, value, stream, nested): if value is None: stream.write_byte(NONE_TYPE) elif t is int: - stream.write_byte(INT_TYPE) - stream.write_var_int64(value) + # In Python 3, an int may be larger than 64 bits. + # Note that an OverflowError on stream.write_var_int64 would happen + # *after* the marker byte is written, so we must check earlier. + try: +# This may throw an overflow error when compiled. +int_value = value +# Otherwise, we must check ourselves. +if not is_compiled: + if not fits_in_64_bits(value): Review comment: Thanks. Consider the following wording for comments:
```
# In Python 3, an int may be larger than 64 bits.
# We need to check whether value fits into a 64 bit integer before
# writing the marker byte.
try:
  # In Cython-compiled code this will throw an overflow error
  # when value does not fit into int64.
  int_value = value
  # If Cython is not used, we must do a (slower) check ourselves.
  if not is_compiled:
    ...
```
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153980) Time Spent: 1h 10m (was: 1h) > Default coder breaks with large ints on Python 3 > > > Key: BEAM-5720 > URL: https://issues.apache.org/jira/browse/BEAM-5720 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > The test for `int` includes greater than 64-bit values, which causes an > overflow error later in the code. We need to only use that coding scheme for > machine-sized ints. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
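The fix under review checks the 64-bit range before the marker byte goes out. A rough pure-Python sketch of the range check and the ordering concern (the marker values and the big-int fallback encoding here are made up for illustration; Beam's real coder is different):

```python
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1

def fits_in_64_bits(value):
    # True when value can be stored in a signed 64-bit integer.
    return INT64_MIN <= value <= INT64_MAX

INT_TYPE, BIGINT_TYPE = 3, 100  # illustrative marker bytes, not Beam's

def encode_int(value, out):
    """Append a marker byte then the payload to `out` (a list stand-in for a stream).

    The range check happens *before* the marker byte is written; checking
    afterwards would leave a stray marker in the stream on overflow.
    """
    if fits_in_64_bits(value):
        out.append(INT_TYPE)
        out.append(value)  # a real coder would varint-encode this
    else:
        out.append(BIGINT_TYPE)
        out.append(value.to_bytes((value.bit_length() + 8) // 8, 'big', signed=True))
```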
[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink
[ https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=153977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153977 ] ASF GitHub Bot logged work on BEAM-5708: Author: ASF GitHub Bot Created on: 12/Oct/18 18:29 Start Date: 12/Oct/18 18:29 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #6638: [BEAM-5708] Cache environment in portable flink runner URL: https://github.com/apache/beam/pull/6638#discussion_r224875048 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java ## @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) { WrappedContext wrapper = getCache().get(jobInfo.jobId()); Preconditions.checkState( wrapper != null, "Releasing context for unknown job: " + jobInfo.jobId()); -// Do not release this asynchronously, as the releasing could fail due to the classloader not being -// available anymore after the tasks have been removed from the execution engine. -release(wrapper); + +PipelineOptions pipelineOptions = +PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions()); +int environmentCacheTTLMillis = + pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis(); +if (environmentCacheTTLMillis > 0) { + // Schedule task to clean the container later. + // Ensure that this class is loaded in the parent Flink classloader. + getExecutor() + .schedule(() -> release(wrapper), environmentCacheTTLMillis, TimeUnit.MILLISECONDS); Review comment: @tweise Makes sense. I will add that check. In addition to falling back to immediate release, I will also log a warning. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153977) Time Spent: 1.5h (was: 1h 20m) > Support caching of SDKHarness environments in flink > --- > > Key: BEAM-5708 > URL: https://issues.apache.org/jira/browse/BEAM-5708 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Cache and reuse environment to improve performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3
[ https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153979 ] ASF GitHub Bot logged work on BEAM-5720: Author: ASF GitHub Bot Created on: 12/Oct/18 18:29 Start Date: 12/Oct/18 18:29 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #6659: [BEAM-5720] Fix encoding of large python ints in Python 3. URL: https://github.com/apache/beam/pull/6659#discussion_r224870551 ## File path: sdks/python/apache_beam/coders/coder_impl.pxd ## @@ -69,8 +69,9 @@ cdef class DeterministicFastPrimitivesCoderImpl(CoderImpl): cdef object NoneType -cdef char UNKNOWN_TYPE, NONE_TYPE, INT_TYPE, FLOAT_TYPE, BOOL_TYPE -cdef char BYTES_TYPE, UNICODE_TYPE, LIST_TYPE, TUPLE_TYPE, DICT_TYPE, SET_TYPE +cdef unsigned char UNKNOWN_TYPE, NONE_TYPE, INT_TYPE, FLOAT_TYPE, BOOL_TYPE Review comment: Curious, what was a motivation for this? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153979) Time Spent: 1h (was: 50m) > Default coder breaks with large ints on Python 3 > > > Key: BEAM-5720 > URL: https://issues.apache.org/jira/browse/BEAM-5720 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > The test for `int` includes greater than 64-bit values, which causes an > overflow error later in the code. We need to only use that coding scheme for > machine-sized ints. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
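The `cdef char` to `cdef unsigned char` question above most likely comes down to range: the type markers are single bytes, and a C signed char only covers -128..127, so any marker value above 127 would overflow it. A small illustration of that range difference using the stdlib `struct` module (this demonstrates the C types' ranges, not Beam's Cython code itself):

```python
import struct

# A C `signed char` spans -128..127, so a one-byte type marker such as
# 200 overflows it; `unsigned char` spans 0..255 and holds any
# single-byte marker value.
def fits_signed_char(value):
    try:
        struct.pack('b', value)  # 'b' is a C signed char
        return True
    except struct.error:
        return False

def fits_unsigned_char(value):
    try:
        struct.pack('B', value)  # 'B' is a C unsigned char
        return True
    except struct.error:
        return False
```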
[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=153976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153976 ] ASF GitHub Bot logged work on BEAM-5326: Author: ASF GitHub Bot Created on: 12/Oct/18 18:28 Start Date: 12/Oct/18 18:28 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #6665: [BEAM-5326] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#issuecomment-429417765 Re: @pabloem Thanks! Working on minor changes now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153976) Time Spent: 2h (was: 1h 50m) > SDK support for custom dataflow worker jar > -- > > Key: BEAM-5326 > URL: https://issues.apache.org/jira/browse/BEAM-5326 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Doc: > https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=153975&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153975 ] ASF GitHub Bot logged work on BEAM-5326: Author: ASF GitHub Bot Created on: 12/Oct/18 18:21 Start Date: 12/Oct/18 18:21 Worklog Time Spent: 10m Work Description: pabloem commented on issue #6665: [BEAM-5326] Java support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6665#issuecomment-429415545 Fwiw you can run `./gradlew :..project..:check` to run unit tests and lint checks without running dataflow jobs, which should be useful ; ) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153975) Time Spent: 1h 50m (was: 1h 40m) > SDK support for custom dataflow worker jar > -- > > Key: BEAM-5326 > URL: https://issues.apache.org/jira/browse/BEAM-5326 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Doc: > https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=153974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153974 ] ASF GitHub Bot logged work on BEAM-5637: Author: ASF GitHub Bot Created on: 12/Oct/18 18:20 Start Date: 12/Oct/18 18:20 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6667: [BEAM-5637] Python support for custom dataflow worker jar URL: https://github.com/apache/beam/pull/6667#discussion_r224874588 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -674,7 +674,12 @@ def _add_argparse_args(cls, parser): 'job submission, the files will be staged in the staging area ' '(--staging_location option) and the workers will install them in ' 'same order they were specified on the command line.')) - +parser.add_argument( +'--dataflow_worker_jar', +dest='dataflow_worker_jar', +type=str, +help='Dataflow worker jar.' +) Review comment: I'm thinking that the option would be better in `WorkerOptions`, or some other options class related to Dataflow+Portability. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153974) Time Spent: 1h 20m (was: 1h 10m) > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. 
That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
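The review suggests moving the flag into `WorkerOptions` or another Dataflow-specific options class; either way, the registration is plain argparse underneath. A hypothetical standalone sketch of the flag wiring from the quoted diff (this uses a bare `ArgumentParser`, not Beam's options machinery):

```python
import argparse

def add_worker_options(parser):
    # Re-registration of the flag from the quoted diff, for illustration only.
    parser.add_argument(
        '--dataflow_worker_jar',
        dest='dataflow_worker_jar',
        type=str,
        help='Path to a custom Dataflow worker jar to stage.')

parser = argparse.ArgumentParser()
add_worker_options(parser)
```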
[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=153972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153972 ] ASF GitHub Bot logged work on BEAM-4176: Author: ASF GitHub Bot Created on: 12/Oct/18 18:13 Start Date: 12/Oct/18 18:13 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink URL: https://github.com/apache/beam/pull/6592#discussion_r224873292 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java ## @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) { WrappedContext wrapper = getCache().get(jobInfo.jobId()); Preconditions.checkState( wrapper != null, "Releasing context for unknown job: " + jobInfo.jobId()); -// Do not release this asynchronously, as the releasing could fail due to the classloader not being -// available anymore after the tasks have been removed from the execution engine. -release(wrapper); + +PipelineOptions pipelineOptions = +PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions()); +int environmentCacheTTLMillis = + pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis(); +if (environmentCacheTTLMillis > 0) { + // Schedule task to clean the container later. + // Ensure that this class is loaded in the parent Flink classloader. + getExecutor() + .schedule(() -> release(wrapper), environmentCacheTTLMillis, TimeUnit.MILLISECONDS); Review comment: Maybe I am missing something. For this PR, I based my branch java_pvr_jenkins on java_pvr_cache_environments and then created the PR (merge 1 commit into apache:master from angoenka:java_pvr_jenkins). This picks the commits from both branches java_pvr_jenkins and java_pvr_cache_environments. Is there any way to avoid picking commits from the base PR? This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153972) Time Spent: 33h 20m (was: 33h 10m) > Java: Portable batch runner passes all ValidatesRunner tests that > non-portable runner passes > > > Key: BEAM-4176 > URL: https://issues.apache.org/jira/browse/BEAM-4176 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Ankur Goenka >Priority: Major > Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 > PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png > > Time Spent: 33h 20m > Remaining Estimate: 0h > > We need this as a sanity check that runner execution is correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=153971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153971 ] ASF GitHub Bot logged work on BEAM-4176: Author: ASF GitHub Bot Created on: 12/Oct/18 18:12 Start Date: 12/Oct/18 18:12 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink URL: https://github.com/apache/beam/pull/6592#discussion_r224873292 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java ## @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) { WrappedContext wrapper = getCache().get(jobInfo.jobId()); Preconditions.checkState( wrapper != null, "Releasing context for unknown job: " + jobInfo.jobId()); -// Do not release this asynchronously, as the releasing could fail due to the classloader not being -// available anymore after the tasks have been removed from the execution engine. -release(wrapper); + +PipelineOptions pipelineOptions = +PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions()); +int environmentCacheTTLMillis = + pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis(); +if (environmentCacheTTLMillis > 0) { + // Schedule task to clean the container later. + // Ensure that this class is loaded in the parent Flink classloader. + getExecutor() + .schedule(() -> release(wrapper), environmentCacheTTLMillis, TimeUnit.MILLISECONDS); Review comment: Maybe I am missing something. For this PR, I based my branch java_pvr_jenkins on java_pvr_cache_environments and then created the PR (merge 1 commit into apache:master from angoenka:java_pvr_jenkins). This picks the commits from both branches java_pvr_jenkins and java_pvr_cache_environments. Is there any way to avoid picking commits from the base PR? This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153971) Time Spent: 33h 10m (was: 33h) > Java: Portable batch runner passes all ValidatesRunner tests that > non-portable runner passes > > > Key: BEAM-4176 > URL: https://issues.apache.org/jira/browse/BEAM-4176 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Ankur Goenka >Priority: Major > Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 > PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png > > Time Spent: 33h 10m > Remaining Estimate: 0h > > We need this as a sanity check that runner execution is correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1081) annotations should support custom messages and classes
[ https://issues.apache.org/jira/browse/BEAM-1081?focusedWorklogId=153967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153967 ] ASF GitHub Bot logged work on BEAM-1081: Author: ASF GitHub Bot Created on: 12/Oct/18 17:59 Start Date: 12/Oct/18 17:59 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6670: [BEAM-1081] Annotations custom message support and classes tests. URL: https://github.com/apache/beam/pull/6670#discussion_r224868863 ## File path: sdks/python/apache_beam/utils/annotations_test.py ## @@ -78,6 +131,29 @@ def fnc_test_deprecated_without_since_should_fail(): fnc_test_deprecated_without_since_should_fail() assert not w +<<< HEAD + def test_deprecated_without_since_custom_should_fail(self): +with warnings.catch_warnings(record=True) as w: + with self.assertRaises(TypeError): +@deprecated(custom_message='Test %since%') +def fnc_test_deprecated_without_since_custom_should_fail(): + return 'lol' +fnc_test_deprecated_without_since_custom_should_fail() +=== + def test_deprecated_without_since_should_fail_class(self): +with warnings.catch_warnings(record=True) as w: + with self.assertRaises(TypeError): + +@deprecated() +class Class_test_deprecated_without_since_should_fail(): + fooo = 'lol' + def foo(self): +return 'lol' +foo = Class_test_deprecated_without_since_should_fail() +foo.foo() +>>> ae3ee47cf9e131cf66644532022178d559d9237d Review comment: merge trouble This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153967) Time Spent: 20m (was: 10m) > annotations should support custom messages and classes > -- > > Key: BEAM-1081 > URL: https://issues.apache.org/jira/browse/BEAM-1081 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > Time Spent: 20m > Remaining Estimate: 0h > > Update > https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py > to add 2 new features: > 1. ability to customize message > 2. ability to tag classes (not only functions) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1081) annotations should support custom messages and classes
[ https://issues.apache.org/jira/browse/BEAM-1081?focusedWorklogId=153968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153968 ] ASF GitHub Bot logged work on BEAM-1081: Author: ASF GitHub Bot Created on: 12/Oct/18 17:59 Start Date: 12/Oct/18 17:59 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6670: [BEAM-1081] Annotations custom message support and classes tests. URL: https://github.com/apache/beam/pull/6670#discussion_r224868835 ## File path: sdks/python/apache_beam/utils/annotations_test.py ## @@ -78,6 +131,29 @@ def fnc_test_deprecated_without_since_should_fail(): fnc_test_deprecated_without_since_should_fail() assert not w +<<< HEAD Review comment: it seems that there was some trouble merging here This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153968) Time Spent: 20m (was: 10m) > annotations should support custom messages and classes > -- > > Key: BEAM-1081 > URL: https://issues.apache.org/jira/browse/BEAM-1081 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > Time Spent: 20m > Remaining Estimate: 0h > > Update > https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py > to add 2 new features: > 1. ability to customize message > 2. ability to tag classes (not only functions) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-1081) annotations should support custom messages and classes
[ https://issues.apache.org/jira/browse/BEAM-1081?focusedWorklogId=153969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153969 ] ASF GitHub Bot logged work on BEAM-1081: Author: ASF GitHub Bot Created on: 12/Oct/18 17:59 Start Date: 12/Oct/18 17:59 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6670: [BEAM-1081] Annotations custom message support and classes tests. URL: https://github.com/apache/beam/pull/6670#discussion_r224868893 ## File path: sdks/python/apache_beam/utils/annotations_test.py ## @@ -78,6 +131,29 @@ def fnc_test_deprecated_without_since_should_fail(): fnc_test_deprecated_without_since_should_fail() assert not w +<<< HEAD + def test_deprecated_without_since_custom_should_fail(self): +with warnings.catch_warnings(record=True) as w: + with self.assertRaises(TypeError): +@deprecated(custom_message='Test %since%') +def fnc_test_deprecated_without_since_custom_should_fail(): + return 'lol' +fnc_test_deprecated_without_since_custom_should_fail() +=== Review comment: merge trouble This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153969) Time Spent: 0.5h (was: 20m) > annotations should support custom messages and classes > -- > > Key: BEAM-1081 > URL: https://issues.apache.org/jira/browse/BEAM-1081 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > Time Spent: 0.5h > Remaining Estimate: 0h > > Update > https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py > to add 2 new features: > 1. ability to customize message > 2. ability to tag classes (not only functions) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
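The BEAM-1081 thread is about a `deprecated` annotation that accepts a custom message and can tag classes as well as functions. A self-contained sketch of such a decorator (an illustration of the requested behavior, not Beam's actual `annotations.py`; the `%name%`/`%since%` placeholders mirror the quoted tests):

```python
import functools
import warnings

def deprecated(since=None, custom_message=None):
    # Require `since` whenever the message template actually uses it,
    # matching the TypeError behavior exercised in the quoted tests.
    if since is None and (custom_message is None or '%since%' in custom_message):
        raise TypeError('deprecated() requires a since= value')

    def decorator(obj):
        message = custom_message or '%name% is deprecated since %since%'
        message = message.replace('%name%', obj.__name__)
        message = message.replace('%since%', str(since))
        if isinstance(obj, type):
            # Tagging a class: warn when it is instantiated.
            original_init = obj.__init__
            def init_with_warning(self, *args, **kwargs):
                warnings.warn(message, DeprecationWarning, stacklevel=2)
                original_init(self, *args, **kwargs)
            obj.__init__ = init_with_warning
            return obj
        # Tagging a plain function: warn on each call.
        @functools.wraps(obj)
        def wrapper(*args, **kwargs):
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return obj(*args, **kwargs)
        return wrapper
    return decorator
```

Wrapping `__init__` (rather than the class object) keeps `isinstance` checks and subclassing working on the decorated class.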
[jira] [Work logged] (BEAM-5706) PubSub dependency upgrade causes internal issues for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-5706?focusedWorklogId=153961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153961 ] ASF GitHub Bot logged work on BEAM-5706: Author: ASF GitHub Bot Created on: 12/Oct/18 17:50 Start Date: 12/Oct/18 17:50 Worklog Time Spent: 10m Work Description: charlesccychen closed pull request #6674: [BEAM-5706] Revert 324f0b3e3c (pull request #6564 from udim/pubsub-0-35-4) URL: https://github.com/apache/beam/pull/6674 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py b/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py index 2fc19daa1af..6dc60d0a807 100644 --- a/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py +++ b/sdks/python/apache_beam/examples/complete/game/game_stats_it_test.py @@ -72,15 +72,15 @@ def setUp(self): # Set up PubSub environment. 
from google.cloud import pubsub -self.pub_client = pubsub.PublisherClient() -self.input_topic = self.pub_client.create_topic( -self.pub_client.topic_path(self.project, self.INPUT_TOPIC + _unique_id)) +self.pubsub_client = pubsub.Client(project=self.project) +unique_topic_name = self.INPUT_TOPIC + _unique_id +unique_subscrition_name = self.INPUT_SUB + _unique_id +self.input_topic = self.pubsub_client.topic(unique_topic_name) +self.input_sub = self.input_topic.subscription(unique_subscrition_name) -self.sub_client = pubsub.SubscriberClient() -self.input_sub = self.sub_client.create_subscription( -self.sub_client.subscription_path(self.project, - self.INPUT_SUB + _unique_id), -self.input_topic.name) +self.input_topic.create() +test_utils.wait_for_topics_created([self.input_topic]) +self.input_sub.create() # Set up BigQuery environment from google.cloud import bigquery @@ -95,15 +95,14 @@ def _inject_pubsub_game_events(self, topic, message_count): """Inject game events as test data to PubSub.""" logging.debug('Injecting %d game events to topic %s', - message_count, topic.name) + message_count, topic.full_name) for _ in range(message_count): - self.pub_client.publish(topic.name, - self.INPUT_EVENT % self._test_timestamp) + topic.publish(self.INPUT_EVENT % self._test_timestamp) def _cleanup_pubsub(self): -test_utils.cleanup_subscriptions(self.sub_client, [self.input_sub]) -test_utils.cleanup_topics(self.pub_client, [self.input_topic]) +test_utils.cleanup_subscriptions([self.input_sub]) +test_utils.cleanup_topics([self.input_topic]) def _cleanup_dataset(self): self.dataset.delete() @@ -124,9 +123,9 @@ def test_game_stats_it(self): # TODO(mariagh): Add teams table verifier once game_stats.py is fixed. 
-extra_opts = {'subscription': self.input_sub.name, +extra_opts = {'subscription': self.input_sub.full_name, 'dataset': self.dataset.name, - 'topic': self.input_topic.name, + 'topic': self.input_topic.full_name, 'fixed_window_duration': 1, 'user_activity_window_duration': 1, 'wait_until_finish_duration': @@ -144,6 +143,8 @@ def test_game_stats_it(self): self.dataset.name, self.OUTPUT_TABLE_TEAMS) # Generate input data and inject to PubSub. +test_utils.wait_for_subscriptions_created([self.input_topic, + self.input_sub]) self._inject_pubsub_game_events(self.input_topic, self.DEFAULT_INPUT_COUNT) # Get pipeline options from command argument: --test-pipeline-options, diff --git a/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py b/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py index e0e309b1265..ab109425eb6 100644 --- a/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py +++ b/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py @@ -73,16 +73,15 @@ def setUp(self): # Set up PubSub environment. from google.cloud import pubsub +self.pubsub_client = pubsub.Client(project=self.project) +unique_topic_name = self.INPUT_TOPIC + _unique_id +unique_subscrition_name = self.INPUT_SUB + _unique_id +self.input_topic = self.pubsub_client.topic(unique_topic_name) +self.input_sub = self.input_topic.subscription(unique_subscrition_name) -self.pub_client = pubsub.PublisherClient() -self.input_
[jira] [Work logged] (BEAM-5615) Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword argument for this function
[ https://issues.apache.org/jira/browse/BEAM-5615?focusedWorklogId=153952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153952 ] ASF GitHub Bot logged work on BEAM-5615: Author: ASF GitHub Bot Created on: 12/Oct/18 17:22 Start Date: 12/Oct/18 17:22 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #6570: [BEAM-5615] fix cmp is an invalid keyword for sort function in python 3 URL: https://github.com/apache/beam/pull/6570#issuecomment-429399025 Disallowing the compare parameter starting now in Python 3 sounds reasonable to me. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153952) Time Spent: 3.5h (was: 3h 20m) > Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword > argument for this function > - > > Key: BEAM-5615 > URL: https://issues.apache.org/jira/browse/BEAM-5615 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Fix For: Not applicable > > Time Spent: 3.5h > Remaining Estimate: 0h > > ERROR: test_top (apache_beam.transforms.combiners_test.CombineTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners_test.py", > line 89, in test_top > names) # Note parameter passed to comparator. 
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py", line 111, in __or__
>     return self.pipeline.apply(ptransform, self)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", line 467, in apply
>     label or transform.label)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", line 477, in apply
>     return self.apply(transform, pvalueish)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", line 513, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", line 199, in apply_PTransform
>     return transform.expand(input)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/ptransform.py", line 759, in expand
>     return self._fn(pcoll, *args, **kwargs)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py", line 185, in Of
>     TopCombineFn(n, compare, key, reverse), *args, **kwargs)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pvalue.py", line 111, in __or__
>     return self.pipeline.apply(ptransform, self)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/pipeline.py", line 513, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/runners/runner.py", line 199, in apply_PTransform
>     return transform.expand(input)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", line 1251, in expand
>     default_value = combine_fn.apply([], *self.args, **self.kwargs)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/core.py", line 623, in apply
>     *args, **kwargs)
> File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/transforms/combiners.py", line 362, in extract_output
>     self._sort_buffer(buffer, lt)
> File "/usr/local/google/home/valentyn
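The `TypeError` in this traceback comes from Python 3 removing the `cmp` keyword from `sorted()` and `list.sort()`, which is the subject of the fix in #6570. A minimal sketch of the standard migration path; the comparator and data below are illustrative, not Beam's actual test code:

```python
# Python 2 accepted sorted(values, cmp=compare); Python 3 raises
# "TypeError: 'cmp' is an invalid keyword argument for sort()".
# A cmp-style comparator can still be reused via functools.cmp_to_key.
from functools import cmp_to_key

def compare(a, b):
    # Hypothetical comparator: order by string length, then alphabetically.
    return (len(a) - len(b)) or ((a > b) - (a < b))

names = ["cherry", "fig", "apple"]

# Python 2 style (fails on Python 3):
#   sorted(names, cmp=compare)
# Python 3 replacement:
result = sorted(names, key=cmp_to_key(compare))
print(result)  # ['fig', 'apple', 'cherry']
```

Where the comparator only wraps an attribute or derived value, converting it to a plain `key=` function is usually simpler and faster than `cmp_to_key`.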
[jira] [Work logged] (BEAM-5706) PubSub dependency upgrade causes internal issues for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-5706?focusedWorklogId=153951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153951 ] ASF GitHub Bot logged work on BEAM-5706: Author: ASF GitHub Bot Created on: 12/Oct/18 17:18 Start Date: 12/Oct/18 17:18 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #6674: [BEAM-5706] Revert 324f0b3e3c (pull request #6564 from udim/pubsub-0-35-4) URL: https://github.com/apache/beam/pull/6674#issuecomment-429397904 Thanks, this LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153951) Time Spent: 10m Remaining Estimate: 0h > PubSub dependency upgrade causes internal issues for Dataflow > - > > Key: BEAM-5706 > URL: https://issues.apache.org/jira/browse/BEAM-5706 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.8.0 >Reporter: Charles Chen >Assignee: Udi Meiri >Priority: Blocker > Fix For: 2.8.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The PubSub dependency upgrade in https://github.com/apache/beam/pull/6564 > causes internal issues for Dataflow. The Dataflow team needs to resolve this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=153949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153949 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 17:13 Start Date: 12/Oct/18 17:13 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429396509 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153949) Time Spent: 3h 10m (was: 3h) Remaining Estimate: 68h 50m (was: 69h) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 3h 10m > Remaining Estimate: 68h 50m > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=153948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153948 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 12/Oct/18 17:09 Start Date: 12/Oct/18 17:09 Worklog Time Spent: 10m Work Description: mxm commented on issue #6600: [BEAM-5442] Store duplicate unknown options in a list argument URL: https://github.com/apache/beam/pull/6600#issuecomment-429395314 @charlesccychen Fair points. Let's fix this more programmatically then. Builtin SDK options and user-defined options should be the only top-level options. "Unknown" options should only be available to the Runners via a separate option list which is transmitted through the Proto alongside the regular options. @aaltay +1 Would make sense to revert this on the release branch. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153948) Time Spent: 9h 40m (was: 9.5h) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Fix For: 2.8.0 > > Time Spent: 9h 40m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=153947&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153947 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 12/Oct/18 17:03 Start Date: 12/Oct/18 17:03 Worklog Time Spent: 10m Work Description: aaltay commented on issue #6600: [BEAM-5442] Store duplicate unknown options in a list argument URL: https://github.com/apache/beam/pull/6600#issuecomment-429393801 Should we revert this change (and the 2 related changes before) for the release branch? I think we should address @charlesccychen's concerns before we release with these changes. (Perhaps a mailing discussion would help.) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153947) Time Spent: 9.5h (was: 9h 20m) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Fix For: 2.8.0 > > Time Spent: 9.5h > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=153944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153944 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 16:59 Start Date: 12/Oct/18 16:59 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429392567 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153944) Time Spent: 2h 50m (was: 2h 40m) Remaining Estimate: 69h 10m (was: 69h 20m) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 2h 50m > Remaining Estimate: 69h 10m > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5653) Dataflow FnApi Worker overrides some of Coders due to coder ID generation collision.
[ https://issues.apache.org/jira/browse/BEAM-5653?focusedWorklogId=153945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153945 ] ASF GitHub Bot logged work on BEAM-5653: Author: ASF GitHub Bot Created on: 12/Oct/18 16:59 Start Date: 12/Oct/18 16:59 Worklog Time Spent: 10m Work Description: Ardagan removed a comment on issue #6649: [BEAM-5653] Fix overriding coders due to duplicate coderId generation URL: https://github.com/apache/beam/pull/6649#issuecomment-429392567 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153945) Time Spent: 3h (was: 2h 50m) Remaining Estimate: 69h (was: 69h 10m) > Dataflow FnApi Worker overrides some of Coders due to coder ID generation > collision. > > > Key: BEAM-5653 > URL: https://issues.apache.org/jira/browse/BEAM-5653 > Project: Beam > Issue Type: Test > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Blocker > Fix For: 2.8.0 > > Original Estimate: 72h > Time Spent: 3h > Remaining Estimate: 69h > > Due to one of latest refactorings, we got a bug in Java FnApi Worker that it > overrides Coders in ProcessBundleDescriptor sent to SDK Harness that causes > jobs to fail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
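The coder-overriding bug described in BEAM-5653 is an ID-collision failure: two call sites derive the same coder ID, and the second registration silently replaces the first in the ProcessBundleDescriptor. A hypothetical sketch of collision-free ID generation (not Beam's actual implementation) keeps a registry of issued IDs and suffixes a counter on collision:

```python
# Sketch of a generator that never hands out the same ID twice, so a second
# coder with the same base name cannot overwrite an earlier registration.
class IdGenerator:
    def __init__(self):
        self._used = set()

    def unique(self, base):
        candidate = base
        counter = 0
        # Probe base, base_1, base_2, ... until an unused ID is found.
        while candidate in self._used:
            counter += 1
            candidate = f"{base}_{counter}"
        self._used.add(candidate)
        return candidate

gen = IdGenerator()
print(gen.unique("VarIntCoder"))  # VarIntCoder
print(gen.unique("VarIntCoder"))  # VarIntCoder_1 -- distinct, no overwrite
```

The key property is that uniqueness is enforced at generation time rather than relying on callers to pick distinct names.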
[jira] [Work logged] (BEAM-5717) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] 404 The destination bucket gs://apache-beam-website-pull-requests does not exist or the write to the destination
[ https://issues.apache.org/jira/browse/BEAM-5717?focusedWorklogId=153937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153937 ] ASF GitHub Bot logged work on BEAM-5717: Author: ASF GitHub Bot Created on: 12/Oct/18 16:39 Start Date: 12/Oct/18 16:39 Worklog Time Spent: 10m Work Description: swegner closed pull request #: [BEAM-5717] Use the PR context from the environment rather than Gradle properties URL: https://github.com/apache/beam/pull/ This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/.test-infra/jenkins/PrecommitJobBuilder.groovy b/.test-infra/jenkins/PrecommitJobBuilder.groovy
index af0ef837c53..34f6e6f45ec 100644
--- a/.test-infra/jenkins/PrecommitJobBuilder.groovy
+++ b/.test-infra/jenkins/PrecommitJobBuilder.groovy
@@ -103,9 +103,6 @@ class PrecommitJobBuilder {
       rootBuildScriptDir(commonJobProperties.checkoutDir)
       tasks(gradleTask)
       commonJobProperties.setGradleSwitches(delegate)
-      if (scope.binding.hasVariable('ghprbPullId')) {
-        switches('-PgithubPullRequestId=${ghprbPullId}')
-      }
       if (nameBase == 'Java') {
         // BEAM-5035: Parallel builds are very flaky
         switches('--no-parallel')
diff --git a/website/build.gradle b/website/build.gradle
index db852dc7ce8..0a2dab65bb9 100644
--- a/website/build.gradle
+++ b/website/build.gradle
@@ -139,7 +139,7 @@ createBuildTask( name:'Gcs', useTestConfig: true, baseUrl: getBaseUrl(), dockerW
 createBuildTask( name:'Apache', dockerWorkDir: dockerWorkDir)
 def getBaseUrl() {
-  project.findProperty('githubPullRequestId')?.trim() ?: 'latest'
+  System.getenv('ghprbPullId')?.trim() ?: 'latest'
 }
 def buildContentDir(name) {
   "${project.rootDir}/build/website/generated-${name.toLowerCase()}-content"
@@ -255,7 +255,7 @@ publishWebsite.dependsOn commitWebsite
 /*
  * Stages a pull request on GCS
  * For example:
- * ./gradlew :beam-website:stageWebsite -PgithubPullRequestId=${ghprbPullId} -PwebsiteBucket=foo
+ * ./gradlew :beam-website:stageWebsite -PwebsiteBucket=foo
  */
 task stageWebsite << {
   def baseUrl = getBaseUrl()

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153937) Time Spent: 4h 10m (was: 4h) > [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] 404 The destination > bucket gs://apache-beam-website-pull-requests does not exist or the write to > the destination must be restarted > - > > Key: BEAM-5717 > URL: https://issues.apache.org/jira/browse/BEAM-5717 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > Labels: currently-failing > Fix For: Not applicable > > Time Spent: 4h 10m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/28/] > * [Gradle Build > Scan|https://scans.gradle.com/s/3a7ujky4ekojw/console-log?task=:beam-website:stageWebsite#L4610] > * [Test source > code|https://github.com/apache/beam/blob/280277e7788b4c28680dd8ca02d54a55195b24ba/website/build.gradle#L271] > Initial investigation: > I haven't seen this issue before; it's not clear if this is a persistent > failure or a flake. I will investigate further. > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. 
See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
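The merged change reads the pull-request ID from the `ghprbPullId` process environment variable instead of a Gradle property. The Groovy fallback expression `System.getenv('ghprbPullId')?.trim() ?: 'latest'` behaves like this Python sketch (the function name here is ours, for illustration only):

```python
import os

def get_base_url(env=os.environ):
    # Equivalent of Groovy's System.getenv('ghprbPullId')?.trim() ?: 'latest':
    # use the trimmed variable when present and non-empty, else fall back.
    value = (env.get("ghprbPullId") or "").strip()
    return value or "latest"

print(get_base_url({"ghprbPullId": " 6672 "}))  # 6672
print(get_base_url({}))                          # latest
```

Note that a variable set to whitespace only also falls back to `latest`, matching the Elvis-operator behavior after `trim()`.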
[jira] [Work logged] (BEAM-5717) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] 404 The destination bucket gs://apache-beam-website-pull-requests does not exist or the write to the destination
[ https://issues.apache.org/jira/browse/BEAM-5717?focusedWorklogId=153935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153935 ] ASF GitHub Bot logged work on BEAM-5717: Author: ASF GitHub Bot Created on: 12/Oct/18 16:38 Start Date: 12/Oct/18 16:38 Worklog Time Spent: 10m Work Description: alanmyrvold commented on issue #: [BEAM-5717] Use the PR context from the environment rather than Gradle properties URL: https://github.com/apache/beam/pull/#issuecomment-429386629 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153935) Time Spent: 4h (was: 3h 50m) > [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] 404 The destination > bucket gs://apache-beam-website-pull-requests does not exist or the write to > the destination must be restarted > - > > Key: BEAM-5717 > URL: https://issues.apache.org/jira/browse/BEAM-5717 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > Labels: currently-failing > Fix For: Not applicable > > Time Spent: 4h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/28/] > * [Gradle Build > Scan|https://scans.gradle.com/s/3a7ujky4ekojw/console-log?task=:beam-website:stageWebsite#L4610] > * [Test source > code|https://github.com/apache/beam/blob/280277e7788b4c28680dd8ca02d54a55195b24ba/website/build.gradle#L271] > Initial investigation: > I haven't seen this issue before; it's not clear if this is a persistent > failure or a flake. I will investigate further. 
> > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5714) RedisIO emit error of EXEC without MULTI
[ https://issues.apache.org/jira/browse/BEAM-5714?focusedWorklogId=153936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153936 ] ASF GitHub Bot logged work on BEAM-5714: Author: ASF GitHub Bot Created on: 12/Oct/18 16:38 Start Date: 12/Oct/18 16:38 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #6651: [BEAM-5714] Fix RedisIO EXEC without MULTI error URL: https://github.com/apache/beam/pull/6651#issuecomment-429386703 R: @jbonofre are you the expert on RedisIO? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153936) Time Spent: 0.5h (was: 20m) > RedisIO emit error of EXEC without MULTI > > > Key: BEAM-5714 > URL: https://issues.apache.org/jira/browse/BEAM-5714 > Project: Beam > Issue Type: Bug > Components: io-java-redis >Affects Versions: 2.7.0 >Reporter: K.K. POON >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > RedisIO has EXEC without MULTI error after SET a batch of records. >  > By looking at the source code, I guess there is missing `pipeline.multi();` > after exec() the last batch. > [https://github.com/apache/beam/blob/master/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java#L555] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
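As the reporter guesses, the failure pattern in BEAM-5714 is a batching loop that calls `exec()` per batch but never re-opens the transaction with `multi()`, so the final flush fires without an open transaction. A toy simulation of this pattern (not RedisIO or Jedis code; the class and method names mirror the Redis transaction commands only):

```python
# Minimal stand-in for a Redis pipeline with MULTI/EXEC semantics.
class FakePipeline:
    def __init__(self):
        self.in_multi = False
        self.flushed = []
        self.queued = []

    def multi(self):
        self.in_multi = True

    def set(self, key, value):
        self.queued.append((key, value))

    def exec(self):
        if not self.in_multi:
            raise RuntimeError("EXEC without MULTI")
        self.flushed.extend(self.queued)
        self.queued = []
        self.in_multi = False

def write_batched(pipeline, records, batch_size=2):
    pipeline.multi()
    for i, (k, v) in enumerate(records, start=1):
        pipeline.set(k, v)
        if i % batch_size == 0:
            pipeline.exec()
            pipeline.multi()  # the re-open the report suggests was missing
    pipeline.exec()  # flush the final partial batch

p = FakePipeline()
write_batched(p, [("a", 1), ("b", 2), ("c", 3)])
print(len(p.flushed))  # 3
```

Dropping the `pipeline.multi()` inside the loop makes the trailing `exec()` raise, which matches the reported error.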
[jira] [Work logged] (BEAM-4796) SpannerIO waits for all input before writing
[ https://issues.apache.org/jira/browse/BEAM-4796?focusedWorklogId=153924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153924 ] ASF GitHub Bot logged work on BEAM-4796: Author: ASF GitHub Bot Created on: 12/Oct/18 16:03 Start Date: 12/Oct/18 16:03 Worklog Time Spent: 10m Work Description: nielm commented on issue #6409: [BEAM-4796] SpannerIO: Add option to wait for Schema to be ready. URL: https://github.com/apache/beam/pull/6409#issuecomment-429376495 > Can we close this PR given that #6478 includes this ? yes, closed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153924) Time Spent: 40m (was: 0.5h) > SpannerIO waits for all input before writing > > > Key: BEAM-4796 > URL: https://issues.apache.org/jira/browse/BEAM-4796 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.5.0 >Reporter: Niel Markwick >Assignee: Niel Markwick >Priority: Major > Fix For: 2.9.0 > > Time Spent: 40m > Remaining Estimate: 0h > > SpannerIO.Write waits for all input in the window to arrive before getting > the schema: > [https://github.com/apache/beam/blame/release-2.5.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java#L841] >  > In streaming mode, this is not an issue, but in batch mode, this causes the > pipeline to stall until all input is read, which could be a significant > amount of time (and temp data). >  >  -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4796) SpannerIO waits for all input before writing
[ https://issues.apache.org/jira/browse/BEAM-4796?focusedWorklogId=153925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153925 ] ASF GitHub Bot logged work on BEAM-4796: Author: ASF GitHub Bot Created on: 12/Oct/18 16:03 Start Date: 12/Oct/18 16:03 Worklog Time Spent: 10m Work Description: nielm closed pull request #6409: [BEAM-4796] SpannerIO: Add option to wait for Schema to be ready. URL: https://github.com/apache/beam/pull/6409 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java index af2a3b00d1f..5cb2469bcfb 100644 --- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java +++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java @@ -163,6 +163,11 @@ * Write#withBatchSizeBytes(long)}. Setting batch size to a small value or zero practically disables * batching. * + * The write transform reads the database schema on pipeline start. If the schema is created as + * part of the same pipline, this transform needs to wait until the schema has been created. Use + * {@link Write#withSchemaReadySignal(PCollection)} to pass a {@link PCollection} which will be used + * with {@link Wait#on(PCollection[])} to prevent the schema from being read until it is ready. + * * The transform does not provide same transactional guarantees as Cloud Spanner. 
In particular, * * @@ -682,6 +687,9 @@ public CreateTransaction withTimestampBound(TimestampBound timestampBound) { abstract PTransform>, PCollection>>> getSampler(); +@Nullable +abstract PCollection getSchemaReadySignal(); + abstract Builder toBuilder(); @AutoValue.Builder @@ -701,6 +709,8 @@ abstract Builder setSampler( PTransform>, PCollection>>> sampler); + abstract Builder setSchemaReadySignal(PCollection schemaReadySignal); + abstract Write build(); } @@ -786,6 +796,15 @@ public Write withMaxNumMutations(long maxNumMutations) { return toBuilder().setMaxNumMutations(maxNumMutations).build(); } +/** + * Specifies an input PCollection that can be used with a {@code Wait.on(signal)} to indicate + * when the database schema is ready. To be used when the schema creation is part of the + * pipeline to prevent the connector reading the schema too early. + */ +public Write withSchemaReadySignal(PCollection signal) { + return toBuilder().setSchemaReadySignal(signal).build(); +} + @Override public SpannerWriteResult expand(PCollection input) { getSpannerConfig().validate(); @@ -835,13 +854,16 @@ public SpannerWriteResult expand(PCollection input) { if (sampler == null) { sampler = createDefaultSampler(); } + // First, read the Cloud Spanner schema. + PCollection schemaSeed = + input.getPipeline().apply("Create Seed", Create.of((Void) null)); + if (spec.getSchemaReadySignal() != null) { +// Wait for external signal before reading schema. +schemaSeed = schemaSeed.apply("Wait for schema", Wait.on(spec.getSchemaReadySignal())); + } final PCollectionView schemaView = - input - .getPipeline() - .apply("Create seed", Create.of((Void) null)) - // Wait for input mutations so it is possible to chain transforms. - .apply(Wait.on(input)) + schemaSeed .apply( "Read information schema", ParDo.of(new ReadSpannerSchema(spec.getSpannerConfig( This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153925) Time Spent: 50m (was: 40m) > SpannerIO waits for all input before writing > > > Key: BEAM-4796 > URL: https://issues.apache.org/jira/browse/BEAM-4796 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.5.0 >Reporter: Niel Markwick >Assignee: Niel Markwick >Prio
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=153918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153918 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 12/Oct/18 15:36 Start Date: 12/Oct/18 15:36 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #6600: [BEAM-5442] Store duplicate unknown options in a list argument URL: https://github.com/apache/beam/pull/6600#issuecomment-429367754 CC: @tweise This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153918) Time Spent: 9h 20m (was: 9h 10m) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Fix For: 2.8.0 > > Time Spent: 9h 20m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. > (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=153917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153917 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 12/Oct/18 15:36 Start Date: 12/Oct/18 15:36 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #6600: [BEAM-5442] Store duplicate unknown options in a list argument URL: https://github.com/apache/beam/pull/6600#issuecomment-429367628 To clarify again, I believe that with the current approach, a user no longer needs to explicitly define a pipeline option to use it--they can just pass it (as `--myparam abc`) and it will be "magically" available for use (as `options.myparam`). This is not good for backwards compatibility, since the user should not rely on this implementation detail, and it will become problematic if we decide to change this after the user starts using it. I would therefore prefer to isolate these options so that they are at least not user-visible. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153917) Time Spent: 9h 10m (was: 9h) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Fix For: 2.8.0 > > Time Spent: 9h 10m > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. 
> (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5442) PortableRunner swallows custom options for Runner
[ https://issues.apache.org/jira/browse/BEAM-5442?focusedWorklogId=153916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153916 ] ASF GitHub Bot logged work on BEAM-5442: Author: ASF GitHub Bot Created on: 12/Oct/18 15:33 Start Date: 12/Oct/18 15:33 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #6600: [BEAM-5442] Store duplicate unknown options in a list argument URL: https://github.com/apache/beam/pull/6600#issuecomment-429366775 Thanks. My concern wasn't about the runtime cost. It introduces an inconsistency where single and multiply passed options are treated differently (and this requires special casing when using the value) and it also promotes the use of the "magical" behavior as opposed to explicit definition of pipeline options. We should not have users depend on this, since it would discourage explicit definition of pipeline options in the user pipeline. I would therefore suggest passing "unused options" which can be parsed by the runner using a (potentially runner-specific) explicitly-defined parser. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153916) Time Spent: 9h (was: 8h 50m) > PortableRunner swallows custom options for Runner > - > > Key: BEAM-5442 > URL: https://issues.apache.org/jira/browse/BEAM-5442 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Fix For: 2.8.0 > > Time Spent: 9h > Remaining Estimate: 0h > > The PortableRunner doesn't pass custom PipelineOptions to the executing > Runner. > Example: {{--parallelism=4}} won't be forwarded to the FlinkRunner. 
> (The option is just removed during proto translation without any warning) > We should allow some form of customization through the options, even for the > PortableRunner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
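The approach mxm describes in this thread — keeping unknown options out of the top-level namespace and accumulating repeated flags into lists for the runner to interpret — can be sketched as follows. This is a hypothetical parser, not Beam's actual `PipelineOptions` implementation:

```python
import argparse
from collections import defaultdict

def split_options(argv):
    # Explicitly defined options are parsed normally; everything else is
    # collected into a separate mapping for the runner, with repeated
    # flags accumulated into lists rather than silently overwritten.
    parser = argparse.ArgumentParser()
    parser.add_argument("--runner")  # stands in for the built-in SDK options
    known, remainder = parser.parse_known_args(argv)

    unknown = defaultdict(list)
    i = 0
    while i < len(remainder):
        token = remainder[i]
        i += 1
        if not token.startswith("--"):
            continue  # this sketch ignores stray positionals
        name, sep, value = token[2:].partition("=")
        if not sep:
            # `--name value` form; a following flag means a bare boolean.
            if i < len(remainder) and not remainder[i].startswith("--"):
                value = remainder[i]
                i += 1
            else:
                value = "true"
        unknown[name].append(value)
    return known, dict(unknown)

known, unknown = split_options(
    ["--runner", "PortableRunner", "--parallelism=4", "--jar=a.jar", "--jar=b.jar"])
print(known.runner)  # PortableRunner
print(unknown)       # {'parallelism': ['4'], 'jar': ['a.jar', 'b.jar']}
```

Keeping the unknown options in a separate structure avoids the backwards-compatibility concern raised above: user code cannot come to depend on undeclared options appearing as attributes of the options object.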
[jira] [Work logged] (BEAM-4130) Portable Flink runner JobService entry point in a Docker container
[ https://issues.apache.org/jira/browse/BEAM-4130?focusedWorklogId=153903&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153903 ] ASF GitHub Bot logged work on BEAM-4130: Author: ASF GitHub Bot Created on: 12/Oct/18 15:01 Start Date: 12/Oct/18 15:01 Worklog Time Spent: 10m Work Description: mxm opened a new pull request #6672: [BEAM-4130] Add tests for FlinkJobServerDriver URL: https://github.com/apache/beam/pull/6672 This adds a few test cases for FlinkJobServerDriver. It also changes the default host from empty host to `localhost`. CC @angoenka @tweise This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153903) Time Spent: 11h 40m (was: 11.5h) > Portable Flink runner JobService entry point in a Docker container > -- > > Key: BEAM-4130 > URL: https://issues.apache.org/jira/browse/BEAM-4130 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Maximilian Michels >Priority: Minor > Fix For: 2.7.0 > > Time Spent: 11h 40m > Remaining Estimate: 0h > > The portable Flink runner exists as a Job Service that runs somewhere. We > need a main entry point that itself spins up the job service (and artifact > staging service). The main program itself should be packaged into an uberjar > such that it can be run locally or submitted to a Flink deployment via `flink > run`. -- This message was sent by
[jira] [Work logged] (BEAM-5708) Support caching of SDKHarness environments in flink
[ https://issues.apache.org/jira/browse/BEAM-5708?focusedWorklogId=153897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153897 ] ASF GitHub Bot logged work on BEAM-5708: Author: ASF GitHub Bot Created on: 12/Oct/18 14:27 Start Date: 12/Oct/18 14:27 Worklog Time Spent: 10m Work Description: tweise commented on a change in pull request #6638: [BEAM-5708] Cache environment in portable flink runner URL: https://github.com/apache/beam/pull/6638#discussion_r224803034 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java ## @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) { WrappedContext wrapper = getCache().get(jobInfo.jobId()); Preconditions.checkState( wrapper != null, "Releasing context for unknown job: " + jobInfo.jobId()); -// Do not release this asynchronously, as the releasing could fail due to the classloader not being -// available anymore after the tasks have been removed from the execution engine. -release(wrapper); + +PipelineOptions pipelineOptions = +PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions()); +int environmentCacheTTLMillis = + pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis(); +if (environmentCacheTTLMillis > 0) { + // Schedule task to clean the container later. + // Ensure that this class is loaded in the parent Flink classloader. + getExecutor() + .schedule(() -> release(wrapper), environmentCacheTTLMillis, TimeUnit.MILLISECONDS); Review comment: Something like this (disclaimer: not tested): ``` if (environmentCacheTTLMillis > 0 && this.getClass().getClassLoader() == ExecutionEnvironment.class.getClassLoader()) ``` When executing in the job server, the class loader will be the same (this applies to Jenkins). On a remote Flink cluster (the default), the user class loader will be different and will be removed. This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153897) Time Spent: 1h 20m (was: 1h 10m) > Support caching of SDKHarness environments in flink > --- > > Key: BEAM-5708 > URL: https://issues.apache.org/jira/browse/BEAM-5708 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Cache and reuse environment to improve performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
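The TTL logic under discussion (release the environment immediately when no cache TTL is configured, otherwise defer the release) can be sketched with a plain timer. This is a hedged Python illustration of the control flow only, not Beam's ReferenceCountingFlinkExecutableStageContextFactory, and it deliberately ignores the classloader concern raised in the review:

```python
import threading

def schedule_release(release, environment_cache_ttl_millis):
    """Release now when no TTL is set; otherwise defer release by the TTL."""
    if environment_cache_ttl_millis > 0:
        # Deferred cleanup, analogous to getExecutor().schedule(...) above.
        timer = threading.Timer(environment_cache_ttl_millis / 1000.0, release)
        timer.daemon = True
        timer.start()
        return timer
    # No caching requested: tear the environment down synchronously.
    release()
    return None
```

The deferred branch is what makes the classloader check necessary in the Java code: the callback may fire after the task's user classloader has been removed, so it must only be scheduled when the class lives in the parent classloader.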
[jira] [Work logged] (BEAM-5713) Flink portable runner schedules all tasks of streaming job on same task manager
[ https://issues.apache.org/jira/browse/BEAM-5713?focusedWorklogId=153877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153877 ] ASF GitHub Bot logged work on BEAM-5713: Author: ASF GitHub Bot Created on: 12/Oct/18 12:59 Start Date: 12/Oct/18 12:59 Worklog Time Spent: 10m Work Description: mxm commented on issue #6654: [BEAM-5713] Make ImpulseSourceFunction execute in parallel URL: https://github.com/apache/beam/pull/6654#issuecomment-429316980 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153877) Time Spent: 40m (was: 0.5h) > Flink portable runner schedules all tasks of streaming job on same task > manager > --- > > Key: BEAM-5713 > URL: https://issues.apache.org/jira/browse/BEAM-5713 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.8.0 >Reporter: Thomas Weise >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Attachments: Different SlotSharingGroup.png, With > RichParallelSourceFunction and parallelism 5.png, > image-2018-10-11-11-43-50-333.png, image-2018-10-11-16-20-45-221.png > > Time Spent: 40m > Remaining Estimate: 0h > > The cluster has 9 task managers and 144 task slots total. A simple streaming > pipeline with parallelism of 8 will get all tasks scheduled on the same task > manager, causing the host to be fully booked and the remaining cluster idle. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3
[ https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153847 ] ASF GitHub Bot logged work on BEAM-5720: Author: ASF GitHub Bot Created on: 12/Oct/18 09:40 Start Date: 12/Oct/18 09:40 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #6659: [BEAM-5720] Fix encoding of large python ints in Python 3. URL: https://github.com/apache/beam/pull/6659#discussion_r224728870 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -293,8 +299,21 @@ def encode_to_stream(self, value, stream, nested): if value is None: stream.write_byte(NONE_TYPE) elif t is int: - stream.write_byte(INT_TYPE) - stream.write_var_int64(value) + # In Python 3, an int may be larger than 64 bits. + # Note that an OverflowError on stream.write_var_int64 would happen + # *after* the marker byte is written, so we must check earlier. + try: +# This may throw an overflow error when compiled. +int_value = value +# Otherwise, we must check ourselves. +if not is_compiled: + if not fits_in_64_bits(value): Review comment: Yep. Current code: ``` small_int, FastPrimitivesCoder, 1000 element(s): per element median time cost: 1.1301e-07 sec, relative std: 19.00% ``` Removing the is_compiled check ``` small_int, FastPrimitivesCoder, 1000 element(s): per element median time cost: 1.88589e-07 sec, relative std: 18.08% ``` That's over a 60% increase. I tried inlining it and using constants rather than computing the bounds each time, which helps some but the check is entirely redundant on compiled code. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153847) Time Spent: 50m (was: 40m) > Default coder breaks with large ints on Python 3 > > > Key: BEAM-5720 > URL: https://issues.apache.org/jira/browse/BEAM-5720 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > The test for `int` includes greater than 64-bit values, which causes an > overflow error later in the code. We need to only use that coding scheme for > machine-sized ints. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
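The bounds check being benchmarked above fits in a few lines of Python. This is a sketch of the idea (the signed 64-bit range) rather than the exact helper in coder_impl.py:

```python
# A signed 64-bit varint holds values in [-2**63, 2**63 - 1]; any Python int
# outside that range must take the arbitrary-precision encoding path instead.
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1

def fits_in_64_bits(value):
    return INT64_MIN <= value <= INT64_MAX
```

As the comment thread notes, the check must happen before the marker byte is written; otherwise an overflow during `write_var_int64` would leave a partially written, corrupt stream.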
[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3
[ https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153844 ] ASF GitHub Bot logged work on BEAM-5720: Author: ASF GitHub Bot Created on: 12/Oct/18 09:35 Start Date: 12/Oct/18 09:35 Worklog Time Spent: 10m Work Description: robertwb commented on issue #6659: [BEAM-5720] Fix encoding of large python ints in Python 3. URL: https://github.com/apache/beam/pull/6659#issuecomment-429266297 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153844) Time Spent: 40m (was: 0.5h) > Default coder breaks with large ints on Python 3 > > > Key: BEAM-5720 > URL: https://issues.apache.org/jira/browse/BEAM-5720 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > The test for `int` includes greater than 64-bit values, which causes an > overflow error later in the code. We need to only use that coding scheme for > machine-sized ints. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4176) Java: Portable batch runner passes all ValidatesRunner tests that non-portable runner passes
[ https://issues.apache.org/jira/browse/BEAM-4176?focusedWorklogId=153840&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153840 ] ASF GitHub Bot logged work on BEAM-4176: Author: ASF GitHub Bot Created on: 12/Oct/18 09:04 Start Date: 12/Oct/18 09:04 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #6592: [BEAM-4176] Enable Post Commit JAVA PVR tests for Flink URL: https://github.com/apache/beam/pull/6592#discussion_r224718474 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/ReferenceCountingFlinkExecutableStageContextFactory.java ## @@ -104,9 +108,21 @@ private void scheduleRelease(JobInfo jobInfo) { WrappedContext wrapper = getCache().get(jobInfo.jobId()); Preconditions.checkState( wrapper != null, "Releasing context for unknown job: " + jobInfo.jobId()); -// Do not release this asynchronously, as the releasing could fail due to the classloader not being -// available anymore after the tasks have been removed from the execution engine. -release(wrapper); + +PipelineOptions pipelineOptions = +PipelineOptionsTranslation.fromProto(jobInfo.pipelineOptions()); +int environmentCacheTTLMillis = + pipelineOptions.as(PortablePipelineOptions.class).getEnvironmentCacheMillis(); +if (environmentCacheTTLMillis > 0) { + // Schedule task to clean the container later. + // Ensure that this class is loaded in the parent Flink classloader. + getExecutor() + .schedule(() -> release(wrapper), environmentCacheTTLMillis, TimeUnit.MILLISECONDS); Review comment: > There is no way to make PR depend upon other PR in git so had to resort to adding 2 separate commits. Not true, you can just specify the branch of the other PR as base branch. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153840) Time Spent: 33h (was: 32h 50m) > Java: Portable batch runner passes all ValidatesRunner tests that > non-portable runner passes > > > Key: BEAM-4176 > URL: https://issues.apache.org/jira/browse/BEAM-4176 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Ben Sidhom >Assignee: Ankur Goenka >Priority: Major > Attachments: 81VxNWtFtke.png, Screen Shot 2018-08-14 at 4.18.31 > PM.png, Screen Shot 2018-09-03 at 11.07.38 AM.png > > Time Spent: 33h > Remaining Estimate: 0h > > We need this as a sanity check that runner execution is correct. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=153837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153837 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 12/Oct/18 09:03 Start Date: 12/Oct/18 09:03 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224718069 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java ## @@ -406,6 +417,56 @@ private void translateImpulse( context.addDataStream(Iterables.getOnlyElement(pTransform.getOutputsMap().values()), source); } + @AutoService(NativeTransforms.IsNativeTransform.class) + public static class IsFlinkNativeTransform implements NativeTransforms.IsNativeTransform { +@Override +public boolean test(RunnerApi.PTransform pTransform) { + return STREAMING_IMPULSE_TRANSFORM_URL.equals(PTransformTranslation.urnForTransformOrNull(pTransform)); +} + } + + private void translateStreamingImpulse( + String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext context) { +RunnerApi.PTransform pTransform = pipeline.getComponents().getTransformsOrThrow(id); + +ObjectMapper objectMapper = new ObjectMapper(); + +int intervalMillis; +int messageCount; +try { + JsonNode config = objectMapper.readTree(pTransform.getSpec().getPayload().toByteArray()); + intervalMillis = config.path("interval_ms").asInt(100); + messageCount = config.path("message_count").asInt(0); +} catch (IOException e) { +throw new RuntimeException("Failed to parse configuration for streaming impulse", e); +} + +DataStreamSource<WindowedValue<byte[]>> source = +context +.getExecutionEnvironment() +.addSource( +new RichParallelSourceFunction<WindowedValue<byte[]>>() { + private AtomicBoolean cancelled = new AtomicBoolean(false); + private AtomicLong count = new AtomicLong(); + + @Override + 
public void run(SourceContext<WindowedValue<byte[]>> ctx) throws Exception { +while (!cancelled.get() && (messageCount == 0 || count.getAndIncrement() < messageCount)) { + ctx.collect(WindowedValue.valueInGlobalWindow(new byte[] {})); + Thread.sleep(intervalMillis); Review comment: You could also handle `InterruptedException` here since we typically want to continue processing the source, unless `cancel()` has been called. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153837) Time Spent: 1h 40m (was: 1.5h) > Add a portable Flink streaming synthetic source for testing > --- > > Key: BEAM-5707 > URL: https://issues.apache.org/jira/browse/BEAM-5707 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Micah Wylde >Assignee: Aljoscha Krettek >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently there are no built-in streaming sources for portable pipelines. > This makes it hard to test streaming functionality in the Python SDK. > It would be very useful to add a periodic impulse source that (with some > configurable frequency) outputs an empty byte array, which can then be > transformed as desired inside the python pipeline. More context in this > [mailing list > discussion|https://lists.apache.org/thread.html/b44a648ab1d0cb200d8bfe4b280e9dad6368209c4725609cbfbbe410@%3Cdev.beam.apache.org%3E]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
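As a rough, runner-agnostic illustration of the behaviour implemented by the Java source above (not the implementation itself): emit an empty payload every `interval_ms` milliseconds, forever when `message_count` is 0, otherwise stopping after `message_count` elements.

```python
import time

def streaming_impulse(interval_ms=100, message_count=0):
    """Yield empty payloads periodically; message_count == 0 means forever."""
    count = 0
    while message_count == 0 or count < message_count:
        count += 1
        yield b""
        time.sleep(interval_ms / 1000.0)
```

With `interval_ms=0` and `message_count=3`, this yields exactly three empty byte strings, mirroring how `config.path("interval_ms").asInt(100)` and `config.path("message_count").asInt(0)` default in the translator.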
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=153836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153836 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 12/Oct/18 09:02 Start Date: 12/Oct/18 09:02 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224717924 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java ## Review comment: Does the source have to be checkpointing? If so we should synchronize on the checkpoint lock in `ctx` and release it before sleeping. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153836) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=153832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153832 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 12/Oct/18 09:01 Start Date: 12/Oct/18 09:01 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224716717 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java ## Review comment: Why is this an `AtomicLong`? A normal `long` should suffice because it is not accessed concurrently. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153832) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-5707) Add a portable Flink streaming synthetic source for testing
[ https://issues.apache.org/jira/browse/BEAM-5707?focusedWorklogId=153831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153831 ] ASF GitHub Bot logged work on BEAM-5707: Author: ASF GitHub Bot Created on: 12/Oct/18 09:01 Start Date: 12/Oct/18 09:01 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #6637: [BEAM-5707] Add a periodic, streaming impulse source for Flink portable pipelines URL: https://github.com/apache/beam/pull/6637#discussion_r224717078 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java ## Review comment: This should be a top-level class with the parameters passed via a constructor. 
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 153831) Time Spent: 1h 10m (was: 1h)