[jira] [Work logged] (BEAM-6155) Migrate the Go SDK to the modern GCS library
[ https://issues.apache.org/jira/browse/BEAM-6155?focusedWorklogId=172911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172911 ]

ASF GitHub Bot logged work on BEAM-6155:
Author: ASF GitHub Bot
Created on: 07/Dec/18 02:03
Start Date: 07/Dec/18 02:03
Worklog Time Spent: 10m
Work Description: aaltay closed pull request #7182: [BEAM-6155] Updates the GCS library the Go SDK uses.
URL: https://github.com/apache/beam/pull/7182

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go b/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go
index dede7a51c1af..a59b81d93ffc 100644
--- a/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go
+++ b/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go
@@ -19,11 +19,11 @@ import (
     "fmt"
     "io"
 
+    "cloud.google.com/go/storage"
     pb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1"
     "github.com/apache/beam/sdks/go/pkg/beam/util/gcsx"
     "github.com/golang/protobuf/proto"
     "golang.org/x/net/context"
-    "google.golang.org/api/storage/v1"
 )
 
 // RetrievalServer is a artifact retrieval server backed by Google
@@ -41,7 +41,7 @@ func ReadProxyManifest(ctx context.Context, object string) (*pb.ProxyManifest, e
         return nil, fmt.Errorf("invalid manifest object %v: %v", object, err)
     }
 
-    cl, err := gcsx.NewClient(ctx, storage.DevstorageReadOnlyScope)
+    cl, err := gcsx.NewClient(ctx, storage.ScopeReadOnly)
     if err != nil {
         return nil, fmt.Errorf("failed to create GCS client: %v", err)
     }
@@ -88,22 +88,22 @@ func (s *RetrievalServer) GetArtifact(req *pb.GetArtifactRequest, stream pb.Arti
 
     bucket, object := parseObject(blob)
 
-    client, err := gcsx.NewClient(stream.Context(), storage.DevstorageReadOnlyScope)
+    client, err := gcsx.NewClient(stream.Context(), storage.ScopeReadOnly)
     if err != nil {
         return fmt.Errorf("Failed to create client for %v: %v", key, err)
     }
 
     // Stream artifact in up to 1MB chunks.
-
-    resp, err := client.Objects.Get(bucket, object).Download()
+    ctx := context.TODO()
+    r, err := client.Bucket(bucket).Object(object).NewReader(ctx)
     if err != nil {
         return fmt.Errorf("Failed to read object for %v: %v", key, err)
     }
-    defer resp.Body.Close()
+    defer r.Close()
 
     data := make([]byte, 1<<20)
     for {
-        n, err := resp.Body.Read(data)
+        n, err := r.Read(data)
         if n > 0 {
             if err := stream.Send({Data: data[:n]}); err != nil {
                 return fmt.Errorf("chunk send failed: %v", err)
diff --git a/sdks/go/pkg/beam/artifact/gcsproxy/staging.go b/sdks/go/pkg/beam/artifact/gcsproxy/staging.go
index 51ffec6fe129..109eb29cdb43 100644
--- a/sdks/go/pkg/beam/artifact/gcsproxy/staging.go
+++ b/sdks/go/pkg/beam/artifact/gcsproxy/staging.go
@@ -26,11 +26,11 @@ import (
     "path"
     "sync"
 
+    "cloud.google.com/go/storage"
     pb "github.com/apache/beam/sdks/go/pkg/beam/model/jobmanagement_v1"
     "github.com/apache/beam/sdks/go/pkg/beam/util/gcsx"
     "github.com/golang/protobuf/proto"
     "golang.org/x/net/context"
-    "google.golang.org/api/storage/v1"
 )
 
 // StagingServer is a artifact staging server backed by Google Cloud Storage
@@ -81,7 +81,7 @@ func (s *StagingServer) CommitManifest(ctx context.Context, req *pb.CommitManife
         return nil, fmt.Errorf("failed to marshal proxy manifest: %v", err)
     }
 
-    cl, err := gcsx.NewClient(ctx, storage.DevstorageReadWriteScope)
+    cl, err := gcsx.NewClient(ctx, storage.ScopeReadWrite)
     if err != nil {
         return nil, fmt.Errorf("failed to create GCS client: %v", err)
     }
@@ -135,7 +135,7 @@ func (s *StagingServer) PutArtifact(ps pb.ArtifactStagingService_PutArtifactServ
     // Stream content to GCS. We don't have to worry about partial
     // or abandoned writes, because object writes are atomic.
 
-    cl, err := gcsx.NewClient(ps.Context(), storage.DevstorageReadWriteScope)
+    cl, err := gcsx.NewClient(ps.Context(), storage.ScopeReadWrite)
     if err != nil {
         return fmt.Errorf("failed to create GCS client: %v", err)
     }
diff --git a/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go b/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go
index 9a7b265796d7..356c718a024d 100644
--- a/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go
+++ b/sdks/go/pkg/beam/io/filesystem/gcs/gcs.go
@@
[jira] [Created] (BEAM-6192) Release related scripts should be more resilient to failures
Chamikara Jayalath created BEAM-6192:
Summary: Release related scripts should be more resilient to failures
Key: BEAM-6192
URL: https://issues.apache.org/jira/browse/BEAM-6192
Project: Beam
Issue Type: Improvement
Components: project-management, testing
Reporter: Chamikara Jayalath
Assignee: Boyuan Zhang

Currently we have a number of great release-related scripts that are mentioned in the Beam release guide.
[https://beam.apache.org/contribute/release-guide/]

These tools are great, but given the length of these scripts I think we should make them more resilient to error conditions and have them retry commands (after user action) as much as possible.

For example, the RC creation script might fail for a number of reasons (it failed for me because tox was not available) and halt at an intermediate state (some files staged, some commits made to GitHub, but the process not complete). Re-running the script then fails because of the state left behind by the previous run. After this it is up to the release manager to figure out where the script failed, do cleanups, and resume the process.

Also, have we considered using some release automation tooling instead of shell scripts?

[~boyuanz] assigning this to you since you developed some of these. Please feel free to re-direct.

cc: [~altay]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
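One way to picture the resilience asked for here is a step runner that checkpoints after every successful step and offers a retry after a failure. The following is only a minimal sketch in Python, not a proposal for the actual Beam release scripts: `run_steps`, the JSON state file, and the `ask_retry` callback are all hypothetical names introduced for illustration.

```python
import json
import os


def run_steps(steps, state_path, ask_retry=lambda name, err: False):
    """Run (name, callable) steps in order, skipping steps already recorded.

    state_path is a hypothetical JSON checkpoint file listing finished steps.
    ask_retry(name, err) is called after a failure; returning True retries the
    same step, for example once the user has installed a missing tool.
    """
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)

    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run; do not repeat side effects
        while True:
            try:
                action()
                break
            except Exception as err:
                if not ask_retry(name, err):
                    raise  # state file still lists everything that succeeded
        done.append(name)
        with open(state_path, "w") as f:
            json.dump(done, f)  # checkpoint after every successful step
    return done
```

With this shape, a second invocation after a failure resumes at the failed step instead of tripping over files staged or commits made by the first run.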
[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.
[ https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172909=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172909 ] ASF GitHub Bot logged work on BEAM-5850: Author: ASF GitHub Bot Created on: 07/Dec/18 01:37 Start Date: 07/Dec/18 01:37 Worklog Time Spent: 10m Work Description: lukecwik closed pull request #6786: [BEAM-5850] Add QueueingBeamFnDataClient and make process, finish and start run on the same thread to support metrics. URL: https://github.com/apache/beam/pull/6786 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java index 748efb47005a..f53257fe07f6 100644 --- a/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java +++ b/runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/RemoteExecutionTest.java @@ -21,6 +21,7 @@ import static org.hamcrest.Matchers.containsInAnyOrder; import static org.hamcrest.Matchers.equalTo; import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertTrue; import static org.junit.Assert.fail; import com.google.common.base.Optional; @@ -278,6 +279,91 @@ public void process(ProcessContext ctxt) { } } + @Test + public void testBundleProcessorThrowsExecutionExceptionWhenUserCodeThrows() throws Exception { +Pipeline p = Pipeline.create(); +p.apply("impulse", Impulse.create()) +.apply( +"create", +ParDo.of( +new DoFn>() { + @ProcessElement + public void process(ProcessContext ctxt) throws Exception { +String element = 
+CoderUtils.decodeFromByteArray(StringUtf8Coder.of(), ctxt.element()); +if (element.equals("X")) { + throw new Exception("testBundleExecutionFailure"); +} +ctxt.output(KV.of(element, element)); + } +})) +.apply("gbk", GroupByKey.create()); + +RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(p); +FusedPipeline fused = GreedyPipelineFuser.fuse(pipelineProto); +checkState(fused.getFusedStages().size() == 1, "Expected exactly one fused stage"); +ExecutableStage stage = fused.getFusedStages().iterator().next(); + +ExecutableProcessBundleDescriptor descriptor = +ProcessBundleDescriptors.fromExecutableStage( +"my_stage", stage, dataServer.getApiServiceDescriptor()); + +BundleProcessor processor = +controlClient.getProcessor( +descriptor.getProcessBundleDescriptor(), descriptor.getRemoteInputDestinations()); +Map>> outputTargets = descriptor.getOutputTargetCoders(); +Map>> outputValues = new HashMap<>(); +Map> outputReceivers = new HashMap<>(); +for (Entry>> targetCoder : outputTargets.entrySet()) { + List> outputContents = + Collections.synchronizedList(new ArrayList<>()); + outputValues.put(targetCoder.getKey(), outputContents); + outputReceivers.put( + targetCoder.getKey(), + RemoteOutputReceiver.of( + (Coder) targetCoder.getValue(), + (FnDataReceiver>) outputContents::add)); +} + +try (ActiveBundle bundle = +processor.newBundle(outputReceivers, BundleProgressHandler.ignored())) { + Iterables.getOnlyElement(bundle.getInputReceivers().values()) + .accept( + WindowedValue.valueInGlobalWindow( + CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "Y"))); +} + +try { + try (ActiveBundle bundle = + processor.newBundle(outputReceivers, BundleProgressHandler.ignored())) { +Iterables.getOnlyElement(bundle.getInputReceivers().values()) +.accept( +WindowedValue.valueInGlobalWindow( +CoderUtils.encodeToByteArray(StringUtf8Coder.of(), "X"))); + } + // Fail the test if we reach this point and never threw the exception. 
+ fail(); +} catch (ExecutionException e) { + assertTrue(e.getMessage().contains("testBundleExecutionFailure")); +} + +try (ActiveBundle bundle = +processor.newBundle(outputReceivers, BundleProgressHandler.ignored())) { + Iterables.getOnlyElement(bundle.getInputReceivers().values()) + .accept( + WindowedValue.valueInGlobalWindow(
[jira] [Work logged] (BEAM-6155) Migrate the Go SDK to the modern GCS library
[ https://issues.apache.org/jira/browse/BEAM-6155?focusedWorklogId=172905=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172905 ]

ASF GitHub Bot logged work on BEAM-6155:
Author: ASF GitHub Bot
Created on: 07/Dec/18 00:59
Start Date: 07/Dec/18 00:59
Worklog Time Spent: 10m
Work Description: aaltay commented on issue #7182: [BEAM-6155] Updates the GCS library the Go SDK uses.
URL: https://github.com/apache/beam/pull/7182#issuecomment-445085546

Run Go PostCommit

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 172905)
Time Spent: 20m (was: 10m)

> Migrate the Go SDK to the modern GCS library
>
> Key: BEAM-6155
> URL: https://issues.apache.org/jira/browse/BEAM-6155
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Andrew Brampton
> Assignee: Robert Burke
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The gcsx package is using the google.golang.org/api/storage/v1 GCS library.
> That library has been deprecated for ~6 months, and the recommendation is to
> use the newer
> [cloud.google.com/go/storage|https://godoc.org/cloud.google.com/go/storage]
> package. That package supports newer features, and has built in connection
> pooling, timeout support, retry with exponential backoff, etc.
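For readers unfamiliar with the "retry with exponential backoff" feature mentioned in the issue description, the general idea can be sketched as below. This is a generic Python illustration under stated assumptions, not the behaviour of cloud.google.com/go/storage: the function name, the default delays, and the jitter range are all made up for the example.

```python
import random
import time


def retry_with_backoff(op, attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep):
    """Call op() until it succeeds or the attempts are used up.

    All parameters are illustrative defaults, not values taken from any
    real client library. `sleep` is injectable so callers can avoid waiting.
    """
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # The upper bound on the delay doubles with each failed attempt,
            # is capped at max_delay, and is randomised to spread out retries.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))
```

A library that builds this in spares each caller from wrapping every request in a loop like this one.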
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172902 ]

ASF GitHub Bot logged work on BEAM-6178:
Author: ASF GitHub Bot
Created on: 07/Dec/18 00:27
Start Date: 07/Dec/18 00:27
Worklog Time Spent: 10m
Work Description: garrettjonesgoogle commented on issue #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
URL: https://github.com/apache/beam/pull/7197#issuecomment-445079897

Your high-level understanding is correct. In isolation, using a beam version property works just fine. The problem is when you want to set up another BOM that specifies the modules of Beam. In that case, it's not sustainable for that aggregated BOM to specify each module of Beam; it's much more sustainable for Beam to generate a BOM with Beam's module versions, which adds dependency declarations automatically when new modules are added to Beam, and then have the aggregated BOM just import the Beam BOM.

We are trying to set up such an aggregator BOM at https://github.com/GoogleCloudPlatform/cloud-opensource-java/blob/master/boms/cloud-oss-bom/pom.xml .

Issue Time Tracking
---
Worklog Id: (was: 172902)
Time Spent: 2h 20m (was: 2h 10m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Affects Versions: 2.10.0
> Reporter: Garrett Jones
> Assignee: Garrett Jones
> Priority: Critical
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.
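The point about generating the BOM from the module list, so that new modules are picked up automatically, can be illustrated with a small sketch. It is written in Python rather than Gradle and is not the PR's implementation; `bom_dependencies` and its parameters are hypothetical, and the output is only the `dependencyManagement` fragment of a POM.

```python
from xml.sax.saxutils import escape


def bom_dependencies(group_id, version, modules):
    """Render a <dependencyManagement> fragment with one entry per module.

    Hypothetical helper: every module gets the same version, which is the
    property an aggregated BOM relies on when it imports this BOM.
    """
    entries = []
    for name in sorted(modules):
        entries.append(
            "    <dependency>\n"
            "      <groupId>{}</groupId>\n"
            "      <artifactId>{}</artifactId>\n"
            "      <version>{}</version>\n"
            "    </dependency>".format(
                escape(group_id), escape(name), escape(version)))
    return ("<dependencyManagement>\n"
            "  <dependencies>\n"
            "{}\n"
            "  </dependencies>\n"
            "</dependencyManagement>").format("\n".join(entries))
```

Because the entries are derived from whatever module list the build passes in, adding a module to the build adds it to the generated BOM without anyone editing the aggregated BOM downstream.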
[jira] [Work logged] (BEAM-5321) Finish Python 3 porting for transforms module
[ https://issues.apache.org/jira/browse/BEAM-5321?focusedWorklogId=172901=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172901 ]

ASF GitHub Bot logged work on BEAM-5321:
Author: ASF GitHub Bot
Created on: 07/Dec/18 00:26
Start Date: 07/Dec/18 00:26
Worklog Time Spent: 10m
Work Description: aaltay closed pull request #7104: [BEAM-5321] Port transforms package to Python 3
URL: https://github.com/apache/beam/pull/7104

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/transforms/ptransform_test.py b/sdks/python/apache_beam/transforms/ptransform_test.py
index 95ea03f2ba28..ffb43317e221 100644
--- a/sdks/python/apache_beam/transforms/ptransform_test.py
+++ b/sdks/python/apache_beam/transforms/ptransform_test.py
@@ -668,7 +668,7 @@ def test_chained_ptransforms(self):
   def test_apply_to_list(self):
     self.assertCountEqual(
         [1, 2, 3], [0, 1, 2] | 'AddOne' >> beam.Map(lambda x: x + 1))
-    self.assertItemsEqual([1],
+    self.assertCountEqual([1],
                           [0, 1, 2] | 'Odd' >> beam.Filter(lambda x: x % 2))
     self.assertCountEqual([1, 2, 100, 3],
                           ([1, 2, 3], [100]) | beam.Flatten())
@@ -947,7 +947,7 @@ def process(self, element, prefix):
          | 'Upper' >> beam.ParDo(ToUpperCaseWithPrefix(), 'hello'))
 
     self.assertEqual("Type hint violation for 'Upper': "
-                     "requires <type 'str'> but got <type 'int'> for element",
+                     "requires {} but got {} for element".format(str, int),
                      e.exception.args[0])
 
   def test_do_fn_pipeline_runtime_type_check_satisfied(self):
@@ -982,7 +982,7 @@ def process(self, element, num):
       self.p.run()
 
     self.assertEqual("Type hint violation for 'Add': "
-                     "requires <type 'int'> but got <type 'str'> for element",
+                     "requires {} but got {} for element".format(int, str),
                      e.exception.args[0])
 
   def test_pardo_does_not_type_check_using_type_hint_decorators(self):
@@ -999,7 +999,7 @@ def int_to_str(a):
          | 'ToStr' >> beam.FlatMap(int_to_str))
 
     self.assertEqual("Type hint violation for 'ToStr': "
-                     "requires <type 'int'> but got <type 'str'> for a",
+                     "requires {} but got {} for a".format(int, str),
                      e.exception.args[0])
 
   def test_pardo_properly_type_checks_using_type_hint_decorators(self):
@@ -1031,7 +1031,7 @@ def test_pardo_does_not_type_check_using_type_hint_methods(self):
           .with_input_types(str).with_output_types(str)))
 
     self.assertEqual("Type hint violation for 'Upper': "
-                     "requires <type 'str'> but got <type 'int'> for x",
+                     "requires {} but got {} for x".format(str, int),
                      e.exception.args[0])
 
   def test_pardo_properly_type_checks_using_type_hint_methods(self):
@@ -1056,7 +1056,7 @@ def test_map_does_not_type_check_using_type_hints_methods(self):
          .with_input_types(str).with_output_types(str))
 
     self.assertEqual("Type hint violation for 'Upper': "
-                     "requires <type 'str'> but got <type 'int'> for x",
+                     "requires {} but got {} for x".format(str, int),
                      e.exception.args[0])
 
   def test_map_properly_type_checks_using_type_hints_methods(self):
@@ -1082,7 +1082,7 @@ def upper(s):
          | 'Upper' >> beam.Map(upper))
 
     self.assertEqual("Type hint violation for 'Upper': "
-                     "requires <type 'str'> but got <type 'int'> for s",
+                     "requires {} but got {} for s".format(str, int),
                      e.exception.args[0])
 
   def test_map_properly_type_checks_using_type_hints_decorator(self):
@@ -1109,7 +1109,7 @@ def test_filter_does_not_type_check_using_type_hints_method(self):
          | 'Below 3' >> beam.Filter(lambda x: x < 3).with_input_types(int))
 
     self.assertEqual("Type hint violation for 'Below 3': "
-                     "requires <type 'int'> but got <type 'str'> for x",
+                     "requires {} but got {} for x".format(int, str),
                      e.exception.args[0])
 
   def test_filter_type_checks_using_type_hints_method(self):
@@ -1134,7 +1134,7 @@ def more_than_half(a):
          | 'Half' >> beam.Filter(more_than_half))
 
     self.assertEqual("Type hint violation for 'Half': "
-                     "requires <type 'float'> but got <type 'int'> for a",
+                     "requires {} but got {} for a".format(float, int),
                      e.exception.args[0])
 
   def test_filter_type_checks_using_type_hints_decorator(self):
@@ -1183,7 +1183,7 @@ def test_group_by_key_only_does_not_type_check(self):
     self.assertEqual("Input type hint violation at F: "
                      "expected
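The two kinds of change in this diff can be shown in isolation. The sketch below is not taken from the Beam test suite: `type_hint_violation` is a hypothetical helper that only mimics the message shape used in the tests. It shows why building the expected string with `format(str, int)` is portable (a type renders as `<type 'str'>` on Python 2 and `<class 'str'>` on Python 3), and it uses `assertCountEqual`, the Python 3 name for Python 2's `assertItemsEqual`.

```python
import unittest


def type_hint_violation(label, expected, actual, arg):
    # Hypothetical helper that mimics the message shape checked in the tests.
    return "Type hint violation for '{}': requires {} but got {} for {}".format(
        label, expected, actual, arg)


class PortabilityExample(unittest.TestCase):

    def test_message_uses_interpreter_repr(self):
        msg = type_hint_violation('Upper', str, int, 'x')
        # The expectation is formatted from the types themselves, so it never
        # hard-codes one interpreter's spelling of the type name.
        self.assertEqual(
            "Type hint violation for 'Upper': "
            "requires {} but got {} for x".format(str, int),
            msg)

    def test_unordered_comparison(self):
        # Compares element counts while ignoring order.
        self.assertCountEqual([1, 3], [x for x in [3, 2, 1] if x % 2])
```

The example targets Python 3, where `unittest.TestCase.assertCountEqual` is available out of the box.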
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172899 ] ASF GitHub Bot logged work on BEAM-6178: Author: ASF GitHub Bot Created on: 07/Dec/18 00:24 Start Date: 07/Dec/18 00:24 Worklog Time Spent: 10m Work Description: garrettjonesgoogle commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature URL: https://github.com/apache/beam/pull/7197#discussion_r239662361 ## File path: sdks/java/bom/build.gradle ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +apply plugin: org.apache.beam.gradle.BeamModulePlugin + +apply plugin: "java" +apply plugin: "maven-publish" + +def isRelease(Project project) { + return project.hasProperty('isRelease') +} + +ext { + mavenJavaDir = "$project.buildDir/publications/mavenJava" + mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml") +} + +for (String projName : rootProject.ext.allProjectNames) { + if (!projName.equals(':' + project.name)) { +evaluationDependsOn(projName) + } +} + +// Copy our pom.xml to the location where a generated POM would go +task copyPom() { + doLast { +def moduleNames = new ArrayList<>() +for (String projName : rootProject.ext.allProjectNames) { + def subproject = project(projName) + if (subproject.ext.properties.containsKey('includeInJavaBom') && + subproject.ext.properties.includeInJavaBom) { +moduleNames.add(subproject.name) + } +} + +new File(mavenJavaDir).mkdirs() +copy { + from 'pom.xml.template' + into mavenJavaDir + rename 'pom.xml.template', 'pom-default.xml' + expand(version: project.version, modules: moduleNames) +} + } +} + +assemble.dependsOn copyPom + +// We want to use our own pom.xml instead of the generated one, so we disable +// the pom.xml generation and have the publish tasks depend on `copyPom` instead. +tasks.whenTaskAdded { task -> + if (task.name == 'generatePomFileForMavenJavaPublication') { +task.enabled = false + } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') { +task.dependsOn copyPom + } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') { +task.dependsOn copyPom + } +} + +jar.enabled = false + +// Remove the default jar archive which is added by the 'java' plugin. 
+configurations.archives.artifacts.with { archives -> + def artifacts = [] + archives.each { +if (it.file =~ 'jar') { + // We can't just call `archives.remove(it)` here because it triggers + // a `ConcurrentModificationException`, so we add matching artifacts + // to another list, then remove those elements outside of this iteration. + artifacts.add(it) +} + } + artifacts.each { +archives.remove(it) + } +} + +artifacts { + archives(mavenJavaBomOutputFile) { +builtBy copyPom + } +} + +afterEvaluate { + // We can't use the `publishing` section from applyJavaNature because + // we don't want all the Java artifacts, and we want to use our own pom.xml + // instead of the generated one. + publishing { +publications { + mavenJava(MavenPublication) { Review comment: I think it should become the recommended method. I was thinking that the archetypes could be modified in a following PR. (FYI I tried it out on a project generated from an archetype locally and it worked great.) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172899) Time Spent: 2h 10m (was: 2h) > Create BOM for Beam > --- > > Key: BEAM-6178 > URL: https://issues.apache.org/jira/browse/BEAM-6178 > Project: Beam > Issue
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172897=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172897 ] ASF GitHub Bot logged work on BEAM-6178: Author: ASF GitHub Bot Created on: 07/Dec/18 00:22 Start Date: 07/Dec/18 00:22 Worklog Time Spent: 10m Work Description: garrettjonesgoogle commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature URL: https://github.com/apache/beam/pull/7197#discussion_r239662081 ## File path: sdks/java/bom/build.gradle ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +apply plugin: org.apache.beam.gradle.BeamModulePlugin + +apply plugin: "java" +apply plugin: "maven-publish" + +def isRelease(Project project) { + return project.hasProperty('isRelease') +} + +ext { + mavenJavaDir = "$project.buildDir/publications/mavenJava" + mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml") +} + +for (String projName : rootProject.ext.allProjectNames) { + if (!projName.equals(':' + project.name)) { +evaluationDependsOn(projName) + } +} + +// Copy our pom.xml to the location where a generated POM would go +task copyPom() { + doLast { +def moduleNames = new ArrayList<>() +for (String projName : rootProject.ext.allProjectNames) { + def subproject = project(projName) + if (subproject.ext.properties.containsKey('includeInJavaBom') && + subproject.ext.properties.includeInJavaBom) { +moduleNames.add(subproject.name) + } +} + +new File(mavenJavaDir).mkdirs() +copy { + from 'pom.xml.template' + into mavenJavaDir + rename 'pom.xml.template', 'pom-default.xml' + expand(version: project.version, modules: moduleNames) +} + } +} + +assemble.dependsOn copyPom + +// We want to use our own pom.xml instead of the generated one, so we disable +// the pom.xml generation and have the publish tasks depend on `copyPom` instead. +tasks.whenTaskAdded { task -> + if (task.name == 'generatePomFileForMavenJavaPublication') { +task.enabled = false + } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') { +task.dependsOn copyPom + } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') { +task.dependsOn copyPom + } +} + +jar.enabled = false + +// Remove the default jar archive which is added by the 'java' plugin. +configurations.archives.artifacts.with { archives -> Review comment: It is not sufficient unfortunately (as I discovered when implementing bom support in gax-java). 
If I remember correctly, the problem is that there is something else in the build process that still expects the jar archive, so when you only set `jar.enabled = false`, then you get an error. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172897) Time Spent: 1h 50m (was: 1h 40m) > Create BOM for Beam > --- > > Key: BEAM-6178 > URL: https://issues.apache.org/jira/browse/BEAM-6178 > Project: Beam > Issue Type: Improvement > Components: build-system >Affects Versions: 2.10.0 >Reporter: Garrett Jones >Assignee: Garrett Jones >Priority: Critical > Time Spent: 1h 50m > Remaining Estimate: 0h > > Add a module to Beam which generates a BOM for Beam modules. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172898=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172898 ] ASF GitHub Bot logged work on BEAM-6178: Author: ASF GitHub Bot Created on: 07/Dec/18 00:22 Start Date: 07/Dec/18 00:22 Worklog Time Spent: 10m Work Description: garrettjonesgoogle commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature URL: https://github.com/apache/beam/pull/7197#discussion_r239662155 ## File path: sdks/java/bom/build.gradle ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +apply plugin: org.apache.beam.gradle.BeamModulePlugin + +apply plugin: "java" +apply plugin: "maven-publish" + +def isRelease(Project project) { + return project.hasProperty('isRelease') +} + +ext { + mavenJavaDir = "$project.buildDir/publications/mavenJava" + mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml") +} + +for (String projName : rootProject.ext.allProjectNames) { + if (!projName.equals(':' + project.name)) { +evaluationDependsOn(projName) + } +} + +// Copy our pom.xml to the location where a generated POM would go +task copyPom() { + doLast { +def moduleNames = new ArrayList<>() +for (String projName : rootProject.ext.allProjectNames) { + def subproject = project(projName) + if (subproject.ext.properties.containsKey('includeInJavaBom') && + subproject.ext.properties.includeInJavaBom) { +moduleNames.add(subproject.name) + } +} + +new File(mavenJavaDir).mkdirs() +copy { + from 'pom.xml.template' + into mavenJavaDir + rename 'pom.xml.template', 'pom-default.xml' + expand(version: project.version, modules: moduleNames) +} + } +} + +assemble.dependsOn copyPom + +// We want to use our own pom.xml instead of the generated one, so we disable +// the pom.xml generation and have the publish tasks depend on `copyPom` instead. +tasks.whenTaskAdded { task -> + if (task.name == 'generatePomFileForMavenJavaPublication') { +task.enabled = false + } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') { +task.dependsOn copyPom + } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') { +task.dependsOn copyPom + } +} + +jar.enabled = false + +// Remove the default jar archive which is added by the 'java' plugin. 
+configurations.archives.artifacts.with { archives -> + def artifacts = [] + archives.each { +if (it.file =~ 'jar') { + // We can't just call `archives.remove(it)` here because it triggers + // a `ConcurrentModificationException`, so we add matching artifacts + // to another list, then remove those elements outside of this iteration. + artifacts.add(it) +} + } + artifacts.each { +archives.remove(it) + } +} + +artifacts { + archives(mavenJavaBomOutputFile) { +builtBy copyPom + } +} + +afterEvaluate { + // We can't use the `publishing` section from applyJavaNature because Review comment: I played around with that a little but kept hitting evaluation ordering problems. I can try to take another crack at it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172898) Time Spent: 2h (was: 1h 50m) > Create BOM for Beam > --- > > Key: BEAM-6178 > URL: https://issues.apache.org/jira/browse/BEAM-6178 > Project: Beam > Issue Type: Improvement > Components: build-system >Affects Versions: 2.10.0 >Reporter: Garrett Jones >Assignee: Garrett Jones >Priority: Critical > Time Spent: 2h > Remaining Estimate: 0h > > Add a module to Beam which generates
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172896 ]

ASF GitHub Bot logged work on BEAM-6178:
Author: ASF GitHub Bot
Created on: 07/Dec/18 00:20
Start Date: 07/Dec/18 00:20
Worklog Time Spent: 10m
Work Description: garrettjonesgoogle commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
URL: https://github.com/apache/beam/pull/7197#discussion_r239661750

##
File path: runners/samza/build.gradle
##
@@ -19,7 +19,7 @@ import groovy.json.JsonOutput
 apply plugin: org.apache.beam.gradle.BeamModulePlugin
-applyJavaNature()
+applyJavaNature(exportJavadoc: false)

Review comment:
That was my thought process.

Issue Time Tracking
---
Worklog Id: (was: 172896)
Time Spent: 1h 40m (was: 1.5h)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Affects Versions: 2.10.0
> Reporter: Garrett Jones
> Assignee: Garrett Jones
> Priority: Critical
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172895 ] ASF GitHub Bot logged work on BEAM-6178: Author: ASF GitHub Bot Created on: 07/Dec/18 00:19 Start Date: 07/Dec/18 00:19 Worklog Time Spent: 10m Work Description: garrettjonesgoogle commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature URL: https://github.com/apache/beam/pull/7197#discussion_r239661705 ## File path: sdks/java/javadoc/build.gradle ## @@ -26,64 +26,33 @@ description = "Apache Beam :: SDKs :: Java :: Aggregated Javadoc" apply plugin: 'java' -def exportedJavadocProjects = [ - ':beam-runners-apex', - ':beam-runners-core-construction-java', - ':beam-runners-core-java', - ':beam-runners-java-fn-execution', - ':beam-runners-local-java-core', - ':beam-runners-direct-java', - ':beam-runners-reference-java', - ':beam-runners-flink_2.11', - ':beam-runners-google-cloud-dataflow-java', - ':beam-runners-spark', - ':beam-runners-gearpump', - ':beam-sdks-java-core', - ':beam-sdks-java-fn-execution', - ':beam-sdks-java-extensions-google-cloud-platform-core', - ':beam-sdks-java-extensions-join-library', - ':beam-sdks-java-extensions-json-jackson', - ':beam-sdks-java-extensions-protobuf', - ':beam-sdks-java-extensions-sketching', - ':beam-sdks-java-extensions-sorter', - ':beam-sdks-java-extensions-sql', - ':beam-sdks-java-harness', - ':beam-sdks-java-io-amazon-web-services', - ':beam-sdks-java-io-amqp', - ':beam-sdks-java-io-cassandra', - ':beam-sdks-java-io-elasticsearch', - ':beam-sdks-java-io-elasticsearch-tests-2', - ':beam-sdks-java-io-elasticsearch-tests-5', - ':beam-sdks-java-io-elasticsearch-tests-6', - ':beam-sdks-java-io-elasticsearch-tests-common', - ':beam-sdks-java-io-google-cloud-platform', - ':beam-sdks-java-io-hadoop-common', - ':beam-sdks-java-io-hadoop-file-system', - ':beam-sdks-java-io-hadoop-input-format', - 
':beam-sdks-java-io-hbase', - ':beam-sdks-java-io-hcatalog', - ':beam-sdks-java-io-jdbc', - ':beam-sdks-java-io-jms', - ':beam-sdks-java-io-kafka', - ':beam-sdks-java-io-kinesis', - ':beam-sdks-java-io-mongodb', - ':beam-sdks-java-io-mqtt', - ':beam-sdks-java-io-parquet', - ':beam-sdks-java-io-redis', - ':beam-sdks-java-io-solr', - ':beam-sdks-java-io-tika', - ':beam-sdks-java-io-xml', -] +for (String projName : rootProject.ext.allProjectNames) { + if (!projName.equals(':' + project.name) && !projName.equals(':beam-sdks-java-bom')) { +evaluationDependsOn(projName) Review comment: The opposite of your first sentence is what is happening - this project depends on every other project. But anyway, the rest of what you say seems true, you can encounter cycles. I actually did encounter a cycle because `beam-sdks-java-bom` also uses `evaluationDependsOn` for all modules. When there is a cycle, you just need to determine the order of precedence. My decision was all modules -> javadoc -> bom. That is why if you look down further in this file, you will see that `:beam-sdks-java-bom` is skipped. Thus, you can never have javadoc for the bom, but the bom can include the javadoc module. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172895) Time Spent: 1.5h (was: 1h 20m) > Create BOM for Beam > --- > > Key: BEAM-6178 > URL: https://issues.apache.org/jira/browse/BEAM-6178 > Project: Beam > Issue Type: Improvement > Components: build-system >Affects Versions: 2.10.0 >Reporter: Garrett Jones >Assignee: Garrett Jones >Priority: Critical > Time Spent: 1.5h > Remaining Estimate: 0h > > Add a module to Beam which generates a BOM for Beam modules. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172891=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172891 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 07/Dec/18 00:07 Start Date: 07/Dec/18 00:07 Worklog Time Spent: 10m Work Description: swegner closed pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java index 2d840e3f4356..28f239784bf7 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java @@ -263,6 +263,7 @@ private synchronized WorkItemStatus createStatusUpdate(boolean isFinal) { return status; } + // todo(migryz) this method should return List instead of updating member variable @VisibleForTesting synchronized void populateCounterUpdates(WorkItemStatus status) { if (worker == null) { @@ -270,13 +271,18 @@ synchronized void populateCounterUpdates(WorkItemStatus status) { } boolean isFinalUpdate = Boolean.TRUE.equals(status.getCompleted()); -ImmutableList.Builder counterUpdatesBuilder = ImmutableList.builder(); -counterUpdatesBuilder.addAll(extractCounters(worker.getOutputCounters())); -counterUpdatesBuilder.addAll(extractMetrics(isFinalUpdate)); 
-counterUpdatesBuilder.addAll(extractMsecCounters(isFinalUpdate)); -counterUpdatesBuilder.addAll(worker.extractMetricUpdates()); -ImmutableList counterUpdates = counterUpdatesBuilder.build(); +ImmutableList.Builder counterUpdatesListBuilder = ImmutableList.builder(); +// Output counters + counterUpdatesListBuilder.addAll(extractCounters(worker.getOutputCounters())); +// User metrics reported in Worker +counterUpdatesListBuilder.addAll(extractMetrics(isFinalUpdate)); +// MSec counters reported in worker +counterUpdatesListBuilder.addAll(extractMsecCounters(isFinalUpdate)); +// Metrics reported in SDK runner. +counterUpdatesListBuilder.addAll(worker.extractMetricUpdates()); + +ImmutableList counterUpdates = counterUpdatesListBuilder.build(); status.setCounterUpdates(counterUpdates); } diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java index bc0fb54d9521..6c1d43f952f9 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java @@ -17,20 +17,25 @@ */ package org.apache.beam.runners.dataflow.worker.fn.control; +import com.google.api.services.dataflow.model.CounterMetadata; +import com.google.api.services.dataflow.model.CounterStructuredName; +import com.google.api.services.dataflow.model.CounterStructuredNameAndMetadata; import com.google.api.services.dataflow.model.CounterUpdate; import com.google.common.annotations.VisibleForTesting; import com.google.common.base.Preconditions; -import com.google.common.collect.FluentIterable; import com.google.common.collect.Iterables; import io.opencensus.common.Scope; import 
io.opencensus.trace.SpanBuilder; import io.opencensus.trace.Tracer; import io.opencensus.trace.Tracing; +import java.util.ArrayList; import java.util.Arrays; +import java.util.Collections; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.NoSuchElementException; +import java.util.Objects; import java.util.concurrent.CancellationException; import java.util.concurrent.ExecutionException; import java.util.concurrent.Executors; @@ -39,20 +44,27 @@ import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicReference; import java.util.function.Consumer; +import java.util.stream.Collectors; +import java.util.stream.StreamSupport; import javax.annotation.Nullable; import
[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.
[ https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172892 ] ASF GitHub Bot logged work on BEAM-5850: Author: ASF GitHub Bot Created on: 07/Dec/18 00:08 Start Date: 07/Dec/18 00:08 Worklog Time Spent: 10m Work Description: ajamato commented on issue #6786: [BEAM-5850] Add QueueingBeamFnDataClient and make process, finish and start run on the same thread to support metrics. URL: https://github.com/apache/beam/pull/6786#issuecomment-445076183 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172892) Time Spent: 16h (was: 15h 50m) > Make process, finish and start run on the same thread to support metrics. > - > > Key: BEAM-5850 > URL: https://issues.apache.org/jira/browse/BEAM-5850 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 16h > Remaining Estimate: 0h > > Update BeamFnDataReceiver to place elements into a Queue and consumer then > and call the element processing receiver in blockTillReadFinishes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
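The issue description above (enqueue incoming elements, then process them on the thread that waits for the read to finish) can be sketched roughly as follows. This is a hedged illustration only, not Beam's `QueueingBeamFnDataClient`: the class `QueueingReceiverSketch`, its method names, and the end-of-stream sentinel are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: producers only enqueue; one thread drains and processes.
class QueueingReceiverSketch<T> {
  private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
  private final T endOfStream; // sentinel instance, compared by identity
  private final Consumer<T> processor;

  QueueingReceiverSketch(T endOfStream, Consumer<T> processor) {
    this.endOfStream = endOfStream;
    this.processor = processor;
  }

  // May be called from any producer thread; never runs the processor.
  void accept(T element) {
    queue.add(element);
  }

  // Signals that no more elements will arrive.
  void close() {
    queue.add(endOfStream);
  }

  // Runs the processor on the calling thread until the sentinel is seen.
  void drainUntilClosed() {
    try {
      while (true) {
        T next = queue.take();
        if (next == endOfStream) {
          return;
        }
        processor.accept(next);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException("Interrupted while draining the queue", e);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    List<String> processed = new ArrayList<>();
    String eos = new String("EOS"); // distinct instance used only as the sentinel
    QueueingReceiverSketch<String> receiver =
        new QueueingReceiverSketch<>(
            eos, e -> processed.add(e + "@" + Thread.currentThread().getName()));
    Thread producer =
        new Thread(
            () -> {
              receiver.accept("a");
              receiver.accept("b");
              receiver.close();
            },
            "producer");
    producer.start();
    receiver.drainUntilClosed(); // processing happens here, on the main thread
    producer.join();
    System.out.println(processed);
  }
}
```

Because every element is processed by the thread that calls `drainUntilClosed`, any thread-local state such as a metrics container would be the same one used by the start and finish steps that run on that thread.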
[jira] [Commented] (BEAM-6172) Flink metrics are not generated in standard format
[ https://issues.apache.org/jira/browse/BEAM-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712160#comment-16712160 ] Micah Wylde commented on BEAM-6172: --- In addition to those two changes, there's an extra "group" in the name. For example, a group-specific native Flink metric (as output from the slf4j reporter with default configs) looks like {code} 10.100.208.242.taskmanager.4f13adf64e7315b7198911465ca44119.BeamApp-mwylde-1206234804-16211aec.group.0.numRecordsIn: 41 {code} while a Beam one looks like {code} 10.100.208.242.taskmanager.4f13adf64e7315b7198911465ca44119.BeamApp-mwylde-1206234804-16211aec.group.0.__counter__group__org.apache.beam.runners.core.ReduceFnRunner__droppedDueToClosedWindow: 41 {code} I think ideally the Beam metric would be {code} 10.100.208.242.taskmanager.4f13adf64e7315b7198911465ca44119.BeamApp-mwylde-1206234804-16211aec.group.0.org.apache.beam.runners.core.ReduceFnRunner.droppedDueToClosedWindow: 41 {code} Our high-level goal is to be able to use the same metric parsing logic as we use for existing Flink metrics (both internal and application-specific) and get reasonable results back. > Flink metrics are not generated in standard format > -- > > Key: BEAM-6172 > URL: https://issues.apache.org/jira/browse/BEAM-6172 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Affects Versions: 2.8.0 >Reporter: Micah Wylde >Assignee: Maximilian Michels >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > The metrics that the Flink runner exports do not follow the standard format > used by Flink, and do not respect Flink metric configuration options. 
> For example (with the default metrics configuration) beam produces a metric: > {code} > 10-100-209-71.taskmanager.0f29b420b63fea58f6f321bc0cbf45f3.BeamApp-mwylde-1203224439-a7d8fdf6.group.0.__counter__group__org-apache-beam-runners-core-ReduceFnRunner__droppedDueToClosedWindow > {code} > whereas a native Flink metric looks like: > {code} > 10-100-209-71.taskmanager.0f29b420b63fea58f6f321bc0cbf45f3.BeamApp-mwylde-1203224439-a7d8fdf6.Source-Custom-Source-7Kinesis-None-beam-env-docker-v1-0-ToKeyedWorkItem.0.numRecordsOut > {code} > In particular, Beam should respect the > [metric.scope.delimiter|https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/config.html#metrics-scope-delimiter] > configuration for separating components of a metric (currently it uses > "__"), and should not include the type of metric (counter, gauge, etc.). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
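The renaming requested in the comment above (drop the metric type and the extra "group", and join the remaining components with the configured delimiter instead of "__") can be sketched as below. This is only a minimal sketch under stated assumptions: `FlinkMetricNameSketch` and `normalize` are hypothetical names, not part of Beam or Flink, and the set of type prefixes beyond "counter" is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; illustrates the desired name shape only.
class FlinkMetricNameSketch {
  // Only "counter" appears in the thread; the other type names are assumptions.
  private static final Set<String> TYPE_PREFIXES =
      new HashSet<>(Arrays.asList("counter", "gauge", "distribution"));

  // Rewrites the Beam-specific suffix of a metric name using the given delimiter.
  static String normalize(String beamSuffix, String delimiter) {
    List<String> parts = new ArrayList<>();
    for (String part : beamSuffix.split("__")) {
      if (!part.isEmpty()) {
        parts.add(part);
      }
    }
    // Drop the leading metric type, if present.
    if (!parts.isEmpty() && TYPE_PREFIXES.contains(parts.get(0))) {
      parts.remove(0);
    }
    // Drop the extra "group" component, if present.
    if (!parts.isEmpty() && parts.get(0).equals("group")) {
      parts.remove(0);
    }
    return String.join(delimiter, parts);
  }

  public static void main(String[] args) {
    String suffix =
        "__counter__group__org.apache.beam.runners.core.ReduceFnRunner__droppedDueToClosedWindow";
    System.out.println(normalize(suffix, "."));
  }
}
```

In a real fix the delimiter would presumably come from the Flink configuration referenced in the issue rather than being passed in by hand.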
[jira] [Commented] (BEAM-3660) Port ReadSpannerSchemaTest off DoFnTester
[ https://issues.apache.org/jira/browse/BEAM-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712155#comment-16712155 ] Kenneth Knowles commented on BEAM-3660: --- I've added you to the Contributors role so you can assign tickets to yourself now. > Port ReadSpannerSchemaTest off DoFnTester > - > > Key: BEAM-3660 > URL: https://issues.apache.org/jira/browse/BEAM-3660 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Kenneth Knowles >Assignee: Evgeniy Musin >Priority: Major > Labels: beginner, newbie, starter > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-3660) Port ReadSpannerSchemaTest off DoFnTester
[ https://issues.apache.org/jira/browse/BEAM-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-3660: - Assignee: Evgeniy Musin > Port ReadSpannerSchemaTest off DoFnTester > - > > Key: BEAM-3660 > URL: https://issues.apache.org/jira/browse/BEAM-3660 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Kenneth Knowles >Assignee: Evgeniy Musin >Priority: Major > Labels: beginner, newbie, starter > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.
[ https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172887=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172887 ] ASF GitHub Bot logged work on BEAM-5850: Author: ASF GitHub Bot Created on: 06/Dec/18 23:36 Start Date: 06/Dec/18 23:36 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6786: [BEAM-5850] Add QueueingBeamFnDataClient and make process, finish and start run on the same thread to support metrics. URL: https://github.com/apache/beam/pull/6786#issuecomment-445069759 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172887) Time Spent: 15h 40m (was: 15.5h) > Make process, finish and start run on the same thread to support metrics. > - > > Key: BEAM-5850 > URL: https://issues.apache.org/jira/browse/BEAM-5850 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 15h 40m > Remaining Estimate: 0h > > Update BeamFnDataReceiver to place elements into a Queue and consumer then > and call the element processing receiver in blockTillReadFinishes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.
[ https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172888 ] ASF GitHub Bot logged work on BEAM-5850: Author: ASF GitHub Bot Created on: 06/Dec/18 23:37 Start Date: 06/Dec/18 23:37 Worklog Time Spent: 10m Work Description: Ardagan edited a comment on issue #6786: [BEAM-5850] Add QueueingBeamFnDataClient and make process, finish and start run on the same thread to support metrics. URL: https://github.com/apache/beam/pull/6786#issuecomment-445069662 Failed precommit: https://builds.apache.org/job/beam_PreCommit_Java_Phrase/468/ This one seems to be a flake. Will trigger another one. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172888) Time Spent: 15h 50m (was: 15h 40m) > Make process, finish and start run on the same thread to support metrics. > - > > Key: BEAM-5850 > URL: https://issues.apache.org/jira/browse/BEAM-5850 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 15h 50m > Remaining Estimate: 0h > > Update BeamFnDataReceiver to place elements into a Queue and consumer then > and call the element processing receiver in blockTillReadFinishes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.
[ https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172886=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172886 ] ASF GitHub Bot logged work on BEAM-5850: Author: ASF GitHub Bot Created on: 06/Dec/18 23:36 Start Date: 06/Dec/18 23:36 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #6786: [BEAM-5850] Add QueueingBeamFnDataClient and make process, finish and start run on the same thread to support metrics. URL: https://github.com/apache/beam/pull/6786#issuecomment-445069662 Failed precommit: https://builds.apache.org/job/beam_PreCommit_Java_Phrase/468/ Will trigger another one. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172886) Time Spent: 15.5h (was: 15h 20m) > Make process, finish and start run on the same thread to support metrics. > - > > Key: BEAM-5850 > URL: https://issues.apache.org/jira/browse/BEAM-5850 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 15.5h > Remaining Estimate: 0h > > Update BeamFnDataReceiver to place elements into a Queue and consumer then > and call the element processing receiver in blockTillReadFinishes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172885=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172885 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 23:34 Start Date: 06/Dec/18 23:34 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#issuecomment-445069349 Thank you Scott. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172885) Time Spent: 6h 20m (was: 6h 10m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 6h 20m > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This tasks includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172884 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 23:34 Start Date: 06/Dec/18 23:34 Worklog Time Spent: 10m Work Description: swegner commented on issue #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#issuecomment-445069348 Pre-commits before my fixup are green: ![image](https://user-images.githubusercontent.com/674021/49618183-6456e280-f96c-11e8-937c-9e5f41604c88.png) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172884) Time Spent: 6h 10m (was: 6h) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This tasks includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172882 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 23:33 Start Date: 06/Dec/18 23:33 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239653685 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -423,32 +420,28 @@ private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInf String type = monitoringInfo.getType(); -// todomigryz: run MonitoringInfo through validation process. -// refer to https://github.com/apache/beam/pull/6799 - +// todo(migryz): run MonitoringInfo through Proto validation process. +// Requires https://github.com/apache/beam/pull/6799 to be merged. if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) { if (!type.equals("beam:metrics:sum_int_64")) { -LOG.warn( +throw new RuntimeException( "Ignoring user-counter MonitoringInfo with unexpected type." Review comment: I pushed a commit with this fixup. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172882) Time Spent: 5h 50m (was: 5h 40m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. 
> --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This task includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172883=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172883 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 23:33 Start Date: 06/Dec/18 23:33 Worklog Time Spent: 10m Work Description: swegner commented on issue #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#issuecomment-445069206 LGTM, will squash and merge. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172883) Time Spent: 6h (was: 5h 50m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This tasks includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172881 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 23:29 Start Date: 06/Dec/18 23:29 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239652977 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -423,32 +420,28 @@ private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInf String type = monitoringInfo.getType(); -// todomigryz: run MonitoringInfo through validation process. -// refer to https://github.com/apache/beam/pull/6799 - +// todo(migryz): run MonitoringInfo through Proto validation process. +// Requires https://github.com/apache/beam/pull/6799 to be merged. if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) { if (!type.equals("beam:metrics:sum_int_64")) { -LOG.warn( +throw new RuntimeException( "Ignoring user-counter MonitoringInfo with unexpected type." Review comment: These error messages don't quite make sense for throwing exceptions. We're no longer ignoring them, we're failing execution. Suggestion: ```java throw new RuntimeException("Encountered MonitoringInfo with unexpected type. Expected [..]") ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172881) Time Spent: 5h 40m (was: 5.5h) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. 
> --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This task includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
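To make the review suggestion above concrete, here is a minimal sketch of a check that fails with a message describing what was expected and what was received, rather than saying the value is being ignored. The class `MonitoringInfoTypeCheckSketch`, the method `requireSumInt64`, the exact message wording, and the example URN are hypothetical; only the type string `beam:metrics:sum_int_64` and the use of `RuntimeException` come from the diff under review.

```java
// Hypothetical sketch; not the code that was merged in the pull request.
class MonitoringInfoTypeCheckSketch {
  static final String SUM_INT64_TYPE = "beam:metrics:sum_int_64";

  // Throws if the type is not the expected one; the message names both values.
  static void requireSumInt64(String urn, String type) {
    if (!SUM_INT64_TYPE.equals(type)) {
      throw new RuntimeException(
          String.format(
              "Encountered user-counter MonitoringInfo with unexpected type. "
                  + "Expected: %s. Received: %s (urn: %s)",
              SUM_INT64_TYPE, type, urn));
    }
  }

  public static void main(String[] args) {
    // "example:urn" is a placeholder, not a real Beam URN.
    requireSumInt64("example:urn", "beam:metrics:sum_int_64");
    try {
      requireSumInt64("example:urn", "some:other:type");
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Including both the expected and the received type in the message means a failed work item can be diagnosed from the log line alone.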
[jira] [Work logged] (BEAM-6191) Redundant error messages for failures in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6191?focusedWorklogId=172880=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172880 ] ASF GitHub Bot logged work on BEAM-6191: Author: ASF GitHub Bot Created on: 06/Dec/18 23:25 Start Date: 06/Dec/18 23:25 Worklog Time Spent: 10m Work Description: swegner commented on issue #7220: [BEAM-6191] Remove redundant error logging for Dataflow exception handling URL: https://github.com/apache/beam/pull/7220#issuecomment-445067501 R: @ajamato @foegler This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172880) Time Spent: 20m (was: 10m) > Redundant error messages for failures in Dataflow runner > > > Key: BEAM-6191 > URL: https://issues.apache.org/jira/browse/BEAM-6191 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > The Dataflow runner harness has redundant error logging from a couple > different components, which creates log spam and confusion when failures do > occur. We should dedupe redundant logs. > From a typical user-code exception, we see at least 3 error logs from the > worker: > http://screen/QZxsJOVnvt6 > "Aborting operations" > "Uncaught exception occurred during work unit execution. This will be > retried." > "Failure processing work item" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6191) Redundant error messages for failures in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-6191?focusedWorklogId=172879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172879 ] ASF GitHub Bot logged work on BEAM-6191: Author: ASF GitHub Bot Created on: 06/Dec/18 23:19 Start Date: 06/Dec/18 23:19 Worklog Time Spent: 10m Work Description: swegner opened a new pull request #7220: [BEAM-6191] Remove redundant error logging for Dataflow exception handling URL: https://github.com/apache/beam/pull/7220 The Dataflow runner harness has redundant error logging from a couple different components, which creates log spam and confusion when failures do occur. This change deduplicates some of the redundant logs. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172879)
[jira] [Work logged] (BEAM-5321) Finish Python 3 porting for transforms module
[ https://issues.apache.org/jira/browse/BEAM-5321?focusedWorklogId=172877=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172877 ] ASF GitHub Bot logged work on BEAM-5321: Author: ASF GitHub Bot Created on: 06/Dec/18 23:17 Start Date: 06/Dec/18 23:17 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #7104: [BEAM-5321] Port transforms package to Python 3 URL: https://github.com/apache/beam/pull/7104#issuecomment-445065635 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172877) Time Spent: 2h 50m (was: 2h 40m) > Finish Python 3 porting for transforms module > - > > Key: BEAM-5321 > URL: https://issues.apache.org/jira/browse/BEAM-5321 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Robbe >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6191) Redundant error messages for failures in Dataflow runner
Scott Wegner created BEAM-6191: -- Summary: Redundant error messages for failures in Dataflow runner Key: BEAM-6191 URL: https://issues.apache.org/jira/browse/BEAM-6191 Project: Beam Issue Type: New Feature Components: runner-dataflow Reporter: Scott Wegner Assignee: Scott Wegner The Dataflow runner harness has redundant error logging from a couple different components, which creates log spam and confusion when failures do occur. We should dedupe redundant logs. From a typical user-code exception, we see at least 3 error logs from the worker: http://screen/QZxsJOVnvt6 "Aborting operations" "Uncaught exception occurred during work unit execution. This will be retried." "Failure processing work item" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (BEAM-4150) Standardize use of PCollection coder proto attribute
[ https://issues.apache.org/jira/browse/BEAM-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Bradshaw reopened BEAM-4150: --- We need to update the SDKs and workers (and remove fallback code). > Standardize use of PCollection coder proto attribute > > > Key: BEAM-4150 > URL: https://issues.apache.org/jira/browse/BEAM-4150 > Project: Beam > Issue Type: Task > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Kenneth Knowles >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 10m > Remaining Estimate: 0h > > In some places it's expected to be a WindowedCoder, in others the raw > ElementCoder. We should use the same convention (decided in discussion to be > the raw ElementCoder) everywhere. The WindowCoder can be pulled out of the > attached windowing strategy, and the input/output ports should specify the > encoding directly rather than read the adjacent PCollection coder fields. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3660) Port ReadSpannerSchemaTest off DoFnTester
[ https://issues.apache.org/jira/browse/BEAM-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712125#comment-16712125 ] Evgeniy Musin commented on BEAM-3660: - Hello! Can I get this ticket? > Port ReadSpannerSchemaTest off DoFnTester > - > > Key: BEAM-3660 > URL: https://issues.apache.org/jira/browse/BEAM-3660 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Kenneth Knowles >Priority: Major > Labels: beginner, newbie, starter > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6120) Support retrieval of large gbk iterables over the state API.
[ https://issues.apache.org/jira/browse/BEAM-6120?focusedWorklogId=172874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172874 ] ASF GitHub Bot logged work on BEAM-6120: Author: ASF GitHub Bot Created on: 06/Dec/18 22:50 Start Date: 06/Dec/18 22:50 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #7127: [BEAM-6120] Support retrieval of large gbk iterables over the state API. URL: https://github.com/apache/beam/pull/7127#discussion_r239644468 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -690,6 +720,35 @@ def decode_from_stream(self, in_stream, nested): elements.append(self._elem_coder.decode_from_stream(in_stream, True)) count = in_stream.read_var_int64() + if count == -1: +if self._read_state is None: + raise ValueError( + 'Cannot read state-written iterable without state reader.') + +class FullIterable(object): + def __init__(self, head, tail): +self._head = head +self._tail = tail + + def __iter__(self): +for elem in self._head: + yield elem +for elem in self._tail: + yield elem + + def __eq__(self, other): +return list(self) == list(other) + + def __hash__(self): +raise NotImplementedError + + def __reduce__(self): +return list, (list(self),) + +state_token = in_stream.read_all(True) Review comment: Yes, that's what the `read_all([nested=]True)` does. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172874) Time Spent: 3h 20m (was: 3h 10m) > Support retrieval of large gbk iterables over the state API. 
> > > Key: BEAM-6120 > URL: https://issues.apache.org/jira/browse/BEAM-6120 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
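For readers following the review thread above, here is a minimal, self-contained sketch of the head/tail iterable quoted in the diff. The `FullIterable` class mirrors the quoted code; `FakeStateTail` is a hypothetical stand-in for whatever the state reader returns and is not part of Beam. It only illustrates that the tail is consumed lazily and that pickling collapses the whole thing into a plain list.

```python
import pickle


class FullIterable(object):
    """Chains an in-memory head with a lazily read tail (mirrors the quoted diff)."""

    def __init__(self, head, tail):
        self._head = head
        self._tail = tail

    def __iter__(self):
        for elem in self._head:
            yield elem
        for elem in self._tail:
            yield elem

    def __eq__(self, other):
        return list(self) == list(other)

    def __hash__(self):
        raise NotImplementedError

    def __reduce__(self):
        # Pickle as a plain list so the receiver needs no state reader.
        return list, (list(self),)


class FakeStateTail(object):
    """Hypothetical re-iterable tail; each page stands for one state read."""

    def __init__(self, pages):
        self._pages = pages
        self.page_reads = 0

    def __iter__(self):
        for page in self._pages:
            self.page_reads += 1
            for elem in page:
                yield elem


tail = FakeStateTail([[3, 4], [5]])
full = FullIterable([1, 2], tail)
reads_before_iteration = tail.page_reads  # nothing is read until iteration starts
as_list = list(full)
round_tripped = pickle.loads(pickle.dumps(full))
```

Because `__reduce__` materializes the elements, anything that pickles the iterable pays the full cost of reading the tail; that trade-off is implied by the quoted code rather than stated in the thread.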
[jira] [Work logged] (BEAM-6120) Support retrieval of large gbk iterables over the state API.
[ https://issues.apache.org/jira/browse/BEAM-6120?focusedWorklogId=172873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172873 ] ASF GitHub Bot logged work on BEAM-6120: Author: ASF GitHub Bot Created on: 06/Dec/18 22:48 Start Date: 06/Dec/18 22:48 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #7127: [BEAM-6120] Support retrieval of large gbk iterables over the state API. URL: https://github.com/apache/beam/pull/7127#discussion_r239643988 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -637,20 +637,33 @@ class SequenceCoderImpl(StreamCoderImpl): countX element(0) element(1) ... element(countX - 1) 0 + If writing to state is enabled, the final terminating 0 will instead be + replaced with:: + +-1 +len(state_token) Review comment: I would rather keep other negative values for possible future expansion. 9 bytes is inconsequential once this code is hit. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172873) Time Spent: 3h 10m (was: 3h) > Support retrieval of large gbk iterables over the state API. > > > Key: BEAM-6120 > URL: https://issues.apache.org/jira/browse/BEAM-6120 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
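The framing discussed in the docstring above (counted batches ending in 0, or in -1 followed by a length-prefixed state token) can be illustrated with a toy codec. This is a sketch under stated assumptions, not Beam's implementation: the varint helpers, the integer-only elements, and the function names `encode_sequence` and `decode_sequence` are all hypothetical, and the bytes produced are not claimed to match Beam's wire format.

```python
import io


def write_var_int64(out, value):
    """Writes a 64-bit value as a base-128 varint (assumed layout for this sketch)."""
    value &= (1 << 64) - 1  # two's complement view of negative values
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([bits | 0x80]))
        else:
            out.write(bytes([bits]))
            return


def read_var_int64(inp):
    """Reads a varint written by write_var_int64 and restores the sign."""
    shift = 0
    result = 0
    while True:
        raw = inp.read(1)
        if not raw:
            raise EOFError("stream ended inside a varint")
        result |= (raw[0] & 0x7F) << shift
        shift += 7
        if not raw[0] & 0x80:
            break
    if result >= 1 << 63:
        result -= 1 << 64
    return result


def encode_sequence(batches, state_token=None):
    """Writes count-prefixed batches, then 0 or (-1, len(token), token)."""
    out = io.BytesIO()
    for batch in batches:
        if not batch:
            continue  # a zero count would be read as the terminator
        write_var_int64(out, len(batch))
        for elem in batch:
            write_var_int64(out, elem)
    if state_token is None:
        write_var_int64(out, 0)
    else:
        write_var_int64(out, -1)
        write_var_int64(out, len(state_token))
        out.write(state_token)
    return out.getvalue()


def decode_sequence(data):
    """Returns (elements read inline, state token or None)."""
    inp = io.BytesIO(data)
    head = []
    count = read_var_int64(inp)
    while count > 0:
        for _ in range(count):
            head.append(read_var_int64(inp))
        count = read_var_int64(inp)
    if count == -1:
        token_length = read_var_int64(inp)
        return head, inp.read(token_length)
    return head, None


plain = decode_sequence(encode_sequence([[1, 2], [3]]))
with_token = decode_sequence(encode_sequence([[1, 2], [3]], b"token-7"))
```

Reserving only -1 as the sentinel, as the review comment prefers, leaves the other negative counts free for later extensions; in this sketch they simply fall through and are treated like a terminator.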
[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly
[ https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172867=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172867 ] ASF GitHub Bot logged work on BEAM-5514: Author: ASF GitHub Bot Created on: 06/Dec/18 21:55 Start Date: 06/Dec/18 21:55 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly URL: https://github.com/apache/beam/pull/7189#discussion_r239629510 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java ## @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String datasetId) try { return insert.execute().getInsertErrors(); } catch (IOException e) { - if (new ApiErrorExtractor().rateLimited(e)) { + if (ApiErrorExtractor.INSTANCE.rateLimited(e)) { LOG.info("BigQuery insertAll exceeded rate limit, retrying"); -try { - sleeper.sleep(backoff1.nextBackOffMillis()); -} catch (InterruptedException interrupted) { - throw new IOException( - "Interrupted while waiting before retrying insertAll"); -} + } else if (ApiErrorExtractor.INSTANCE + .getErrorMessage(e) + .startsWith("Quota exceeded")) { Review comment: > My biggest concern is that parsing text out of the error message seems very brittle. Rather than risk problems there, seems better to just retry all es here. ApiErrorExtractor doesn't have a method for determining if the given exception indicates 'quota exceeded'. https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java Is there any other class we can safely extract such information? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172867) Time Spent: 2h 50m (was: 2h 40m) > BigQueryIO doesn't handle quotaExceeded errors properly > --- > > Key: BEAM-5514 > URL: https://issues.apache.org/jira/browse/BEAM-5514 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Kevin Peterson >Assignee: Heejong Lee >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > When exceeding a streaming quota for BigQuery insertAll requests, BigQuery > returns a 403 with reason "quotaExceeded". > The current implementation of BigQueryIO does not consider this to be a rate > limited exception, and therefore does not perform exponential backoff > properly, leading to repeated calls to BQ. > The actual error is in the > [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739] > class, which is called from > [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263] > to determine how to retry the failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly
[ https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172861 ] ASF GitHub Bot logged work on BEAM-5514: Author: ASF GitHub Bot Created on: 06/Dec/18 21:40 Start Date: 06/Dec/18 21:40 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly URL: https://github.com/apache/beam/pull/7189#discussion_r239624631 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java ## @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String datasetId) try { return insert.execute().getInsertErrors(); } catch (IOException e) { - if (new ApiErrorExtractor().rateLimited(e)) { + if (ApiErrorExtractor.INSTANCE.rateLimited(e)) { LOG.info("BigQuery insertAll exceeded rate limit, retrying"); -try { - sleeper.sleep(backoff1.nextBackOffMillis()); -} catch (InterruptedException interrupted) { - throw new IOException( - "Interrupted while waiting before retrying insertAll"); -} + } else if (ApiErrorExtractor.INSTANCE + .getErrorMessage(e) + .startsWith("Quota exceeded")) { Review comment: @rangadi should know the exact place. I think we inject a 10 second wait before re-trying a failed workitem for Dataflow streaming. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172861) Time Spent: 2h 40m (was: 2.5h) > BigQueryIO doesn't handle quotaExceeded errors properly > --- > > Key: BEAM-5514 > URL: https://issues.apache.org/jira/browse/BEAM-5514 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Kevin Peterson >Assignee: Heejong Lee >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > When exceeding a streaming quota for BigQuery insertAll requests, BigQuery > returns a 403 with reason "quotaExceeded". > The current implementation of BigQueryIO does not consider this to be a rate > limited exception, and therefore does not perform exponential backoff > properly, leading to repeated calls to BQ. > The actual error is in the > [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739] > class, which is called from > [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263] > to determine how to retry the failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly
[ https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172856 ] ASF GitHub Bot logged work on BEAM-5514: Author: ASF GitHub Bot Created on: 06/Dec/18 21:33 Start Date: 06/Dec/18 21:33 Worklog Time Spent: 10m Work Description: reuvenlax commented on a change in pull request #7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly URL: https://github.com/apache/beam/pull/7189#discussion_r239622676 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java ## @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String datasetId) try { return insert.execute().getInsertErrors(); } catch (IOException e) { - if (new ApiErrorExtractor().rateLimited(e)) { + if (ApiErrorExtractor.INSTANCE.rateLimited(e)) { LOG.info("BigQuery insertAll exceeded rate limit, retrying"); -try { - sleeper.sleep(backoff1.nextBackOffMillis()); -} catch (InterruptedException interrupted) { - throw new IOException( - "Interrupted while waiting before retrying insertAll"); -} + } else if (ApiErrorExtractor.INSTANCE + .getErrorMessage(e) + .startsWith("Quota exceeded")) { Review comment: Where do you get a 10-second wait for failed workitem? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172856) Time Spent: 2.5h (was: 2h 20m) > BigQueryIO doesn't handle quotaExceeded errors properly > --- > > Key: BEAM-5514 > URL: https://issues.apache.org/jira/browse/BEAM-5514 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Kevin Peterson >Assignee: Heejong Lee >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > When exceeding a streaming quota for BigQuery insertAll requests, BigQuery > returns a 403 with reason "quotaExceeded". > The current implementation of BigQueryIO does not consider this to be a rate > limited exception, and therefore does not perform exponential backoff > properly, leading to repeated calls to BQ. > The actual error is in the > [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739] > class, which is called from > [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263] > to determine how to retry the failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly
[ https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172840=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172840 ] ASF GitHub Bot logged work on BEAM-5514: Author: ASF GitHub Bot Created on: 06/Dec/18 21:21 Start Date: 06/Dec/18 21:21 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly URL: https://github.com/apache/beam/pull/7189#discussion_r239618701 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java ## @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String datasetId) try { return insert.execute().getInsertErrors(); } catch (IOException e) { - if (new ApiErrorExtractor().rateLimited(e)) { + if (ApiErrorExtractor.INSTANCE.rateLimited(e)) { LOG.info("BigQuery insertAll exceeded rate limit, retrying"); -try { - sleeper.sleep(backoff1.nextBackOffMillis()); -} catch (InterruptedException interrupted) { - throw new IOException( - "Interrupted while waiting before retrying insertAll"); -} + } else if (ApiErrorExtractor.INSTANCE + .getErrorMessage(e) + .startsWith("Quota exceeded")) { Review comment: Error code rateLimitExceeded is well documented and probably it will be less brittle to check for that instead of text "Quota exceeded". https://cloud.google.com/bigquery/troubleshooting-errors Also, I think the main concern so far has been Beam sending a large number of messages to BQ even after BQ service raises quota exceeded errors. I think this will be somewhat exacerbated by introducing exponential backoff here (more messages before the 10 second failed workitem wait) so this has to be combined with a solution where we perform exponential backoff across all BQ streaming write threads started by a given workitem (which can be a separate PR). This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172840) Time Spent: 2h 10m (was: 2h) > BigQueryIO doesn't handle quotaExceeded errors properly > --- > > Key: BEAM-5514 > URL: https://issues.apache.org/jira/browse/BEAM-5514 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Kevin Peterson >Assignee: Heejong Lee >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > When exceeding a streaming quota for BigQuery insertAll requests, BigQuery > returns a 403 with reason "quotaExceeded". > The current implementation of BigQueryIO does not consider this to be a rate > limited exception, and therefore does not perform exponential backoff > properly, leading to repeated calls to BQ. > The actual error is in the > [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739] > class, which is called from > [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263] > to determine how to retry the failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
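The suggestion above, checking a structured reason code instead of matching message text, might look like the following sketch. It assumes the error arrives as a Google-API-style JSON body with an `error.errors[].reason` field; the function names and the sample payloads are hypothetical and are not taken from the pull request.

```python
# Reason codes named in the thread; treating both as "back off and retry"
# is the position argued in the review, not a settled decision.
BACKOFF_REASONS = frozenset(["rateLimitExceeded", "quotaExceeded"])


def error_reasons(error_body):
    """Extracts the reason codes from a parsed JSON error body."""
    errors = error_body.get("error", {}).get("errors", [])
    return [entry["reason"] for entry in errors if entry.get("reason")]


def should_back_off(error_body):
    """True when any reported reason is one we want to retry with backoff."""
    return any(reason in BACKOFF_REASONS for reason in error_reasons(error_body))


# Hypothetical payloads for illustration only.
quota_error = {
    "error": {
        "code": 403,
        "message": "Quota exceeded: ...",
        "errors": [{"domain": "usageLimits", "reason": "quotaExceeded"}],
    }
}
auth_error = {
    "error": {
        "code": 401,
        "message": "Unauthorized",
        "errors": [{"domain": "global", "reason": "unauthorized"}],
    }
}
```

Matching on the reason keeps the check stable if the human-readable message is reworded, which is the brittleness concern raised earlier in the thread.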
[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly
[ https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172842 ] ASF GitHub Bot logged work on BEAM-5514: Author: ASF GitHub Bot Created on: 06/Dec/18 21:22 Start Date: 06/Dec/18 21:22 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly URL: https://github.com/apache/beam/pull/7189#discussion_r239618917 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java ## @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String datasetId) try { return insert.execute().getInsertErrors(); } catch (IOException e) { - if (new ApiErrorExtractor().rateLimited(e)) { + if (ApiErrorExtractor.INSTANCE.rateLimited(e)) { LOG.info("BigQuery insertAll exceeded rate limit, retrying"); -try { - sleeper.sleep(backoff1.nextBackOffMillis()); -} catch (InterruptedException interrupted) { - throw new IOException( - "Interrupted while waiting before retrying insertAll"); -} + } else if (ApiErrorExtractor.INSTANCE + .getErrorMessage(e) + .startsWith("Quota exceeded")) { Review comment: Sorry I meant error code quotaExceeded. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172842) Time Spent: 2h 20m (was: 2h 10m) > BigQueryIO doesn't handle quotaExceeded errors properly > --- > > Key: BEAM-5514 > URL: https://issues.apache.org/jira/browse/BEAM-5514 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Kevin Peterson >Assignee: Heejong Lee >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > When exceeding a streaming quota for BigQuery insertAll requests, BigQuery > returns a 403 with reason "quotaExceeded". > The current implementation of BigQueryIO does not consider this to be a rate > limited exception, and therefore does not perform exponential backoff > properly, leading to repeated calls to BQ. > The actual error is in the > [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739] > class, which is called from > [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263] > to determine how to retry the failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly
[ https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172830 ] ASF GitHub Bot logged work on BEAM-5514: Author: ASF GitHub Bot Created on: 06/Dec/18 20:57 Start Date: 06/Dec/18 20:57 Worklog Time Spent: 10m Work Description: reuvenlax commented on a change in pull request #7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly URL: https://github.com/apache/beam/pull/7189#discussion_r239611514 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java ## @@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String datasetId) try { return insert.execute().getInsertErrors(); } catch (IOException e) { - if (new ApiErrorExtractor().rateLimited(e)) { + if (ApiErrorExtractor.INSTANCE.rateLimited(e)) { LOG.info("BigQuery insertAll exceeded rate limit, retrying"); -try { - sleeper.sleep(backoff1.nextBackOffMillis()); -} catch (InterruptedException interrupted) { - throw new IOException( - "Interrupted while waiting before retrying insertAll"); -} + } else if (ApiErrorExtractor.INSTANCE + .getErrorMessage(e) + .startsWith("Quota exceeded")) { Review comment: Failing is very expensive. If we throw an exception, not only this record, but all other records in the bundle will be retried. Realistically those other errors will be retried anyway (as the whole work item will be retried). My biggest concern is that parsing text out of the error message seems very brittle. Rather than risk problems there, seems better to just retry all es here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172830) Time Spent: 2h (was: 1h 50m) > BigQueryIO doesn't handle quotaExceeded errors properly > --- > > Key: BEAM-5514 > URL: https://issues.apache.org/jira/browse/BEAM-5514 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Kevin Peterson >Assignee: Heejong Lee >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > When exceeding a streaming quota for BigQuery insertAll requests, BigQuery > returns a 403 with reason "quotaExceeded". > The current implementation of BigQueryIO does not consider this to be a rate > limited exception, and therefore does not perform exponential backoff > properly, leading to repeated calls to BQ. > The actual error is in the > [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739] > class, which is called from > [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263] > to determine how to retry the failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.
[ https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172831=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172831 ] ASF GitHub Bot logged work on BEAM-5850: Author: ASF GitHub Bot Created on: 06/Dec/18 20:59 Start Date: 06/Dec/18 20:59 Worklog Time Spent: 10m Work Description: ajamato commented on issue #6786: [BEAM-5850] Add QueueingBeamFnDataClient and make process, finish and start run on the same thread to support metrics. URL: https://github.com/apache/beam/pull/6786#issuecomment-445027620 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172831) Time Spent: 15h 20m (was: 15h 10m) > Make process, finish and start run on the same thread to support metrics. > - > > Key: BEAM-5850 > URL: https://issues.apache.org/jira/browse/BEAM-5850 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 15h 20m > Remaining Estimate: 0h > > Update BeamFnDataReceiver to place elements into a Queue, then consume them > and call the element processing receiver in blockTillReadFinishes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
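The idea in the issue description above, having the data thread only enqueue elements while a single processing thread dequeues and processes them, can be sketched as follows. This is an illustration in Python rather than the Java of the pull request; `QueueingReceiver`, `accept`, `close` and `drain_until_done` are hypothetical names, with `drain_until_done` playing the role the issue assigns to `blockTillReadFinishes`.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


class QueueingReceiver(object):
    """Buffers elements from a data thread for one processing thread."""

    def __init__(self, process_fn, maxsize=16):
        self._queue = queue.Queue(maxsize)
        self._process_fn = process_fn

    def accept(self, element):
        # Called on the data thread; blocks when the queue is full.
        self._queue.put(element)

    def close(self):
        # Called on the data thread once no more elements will arrive.
        self._queue.put(_DONE)

    def drain_until_done(self):
        # Called on the processing thread; every element is processed here.
        while True:
            item = self._queue.get()
            if item is _DONE:
                return
            self._process_fn(item)


processed = []


def process(element):
    processed.append((element, threading.current_thread().name))


receiver = QueueingReceiver(process)


def produce():
    for i in range(5):
        receiver.accept(i)
    receiver.close()


producer = threading.Thread(target=produce, name="data-thread")
producer.start()
receiver.drain_until_done()
producer.join()
```

Since all calls to `process` happen on the thread that drains the queue, thread-local state such as a metrics container stays consistent across start, process and finish, which is the motivation given in the issue title.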
[jira] [Work logged] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly
[ https://issues.apache.org/jira/browse/BEAM-5514?focusedWorklogId=172828=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172828 ]

ASF GitHub Bot logged work on BEAM-5514:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 20:50
    Start Date: 06/Dec/18 20:50
    Worklog Time Spent: 10m
    Work Description: ihji commented on a change in pull request #7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
    URL: https://github.com/apache/beam/pull/7189#discussion_r239609349

## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java ##
@@ -736,17 +736,21 @@ public void deleteDataset(String projectId, String datasetId)
           try {
             return insert.execute().getInsertErrors();
           } catch (IOException e) {
-            if (new ApiErrorExtractor().rateLimited(e)) {
+            if (ApiErrorExtractor.INSTANCE.rateLimited(e)) {
               LOG.info("BigQuery insertAll exceeded rate limit, retrying");
-              try {
-                sleeper.sleep(backoff1.nextBackOffMillis());
-              } catch (InterruptedException interrupted) {
-                throw new IOException(
-                    "Interrupted while waiting before retrying insertAll");
-              }
+            } else if (ApiErrorExtractor.INSTANCE
+                .getErrorMessage(e)
+                .startsWith("Quota exceeded")) {

Review comment:
AFAIK, the worker will fail and retry on all other errors after ten seconds anyway. The question here is whether the given error needs to be retried silently (no explicit error log) with exponential backoff or not. I think it makes sense to use exponential backoff for `quota exceeded` and `rate limit exceeded` errors, since they are transient and there is a high chance they will resolve by themselves within the next few retries. I'm not sure the same holds true for other possible errors like `field size too large`, `unauthorized`, or `user project missing`.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id: (was: 172828)
    Time Spent: 1h 50m (was: 1h 40m)

> BigQueryIO doesn't handle quotaExceeded errors properly
> ---
>
> Key: BEAM-5514
> URL: https://issues.apache.org/jira/browse/BEAM-5514
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Reporter: Kevin Peterson
> Assignee: Heejong Lee
> Priority: Major
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a rate
> limited exception, and therefore does not perform exponential backoff
> properly, leading to repeated calls to BQ.
> The actual error is in the
> [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
> class, which is called from
> [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
> to determine how to retry the failure.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
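
The retry policy discussed in the review comment above (back off exponentially on quota and rate-limit errors, fail immediately on anything else) can be sketched roughly as follows. This is a minimal Python illustration, not the Java code from the pull request: `ServiceError`, `is_retryable`, `call_with_backoff`, and the message prefixes are hypothetical names chosen for the example, and the delays are arbitrary.

```python
# Hypothetical message prefixes that are treated as transient in this sketch.
RETRYABLE_PREFIXES = ("Quota exceeded", "Rate limit exceeded")


class ServiceError(Exception):
    """Stand-in for an API error that carries a message (hypothetical type)."""


def is_retryable(err):
    """Return True if the error message marks a transient quota/rate failure."""
    return str(err).startswith(RETRYABLE_PREFIXES)


def call_with_backoff(fn, sleep, max_attempts=5, base_s=1.0, cap_s=60.0):
    """Call fn(); on a retryable error wait base_s * 2**attempt (capped), then retry.

    Non-retryable errors, and the last failed attempt, are re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ServiceError as err:
            if not is_retryable(err) or attempt == max_attempts - 1:
                raise
            sleep(min(cap_s, base_s * 2 ** attempt))
```

Passing the `sleep` callable in (for example `time.sleep` in production) keeps the sketch easy to exercise without real delays.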
[jira] [Work logged] (BEAM-6120) Support retrieval of large gbk iterables over the state API.
[ https://issues.apache.org/jira/browse/BEAM-6120?focusedWorklogId=172824=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172824 ]

ASF GitHub Bot logged work on BEAM-6120:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 20:29
    Start Date: 06/Dec/18 20:29
    Worklog Time Spent: 10m
    Work Description: lukecwik commented on a change in pull request #7127: [BEAM-6120] Support retrieval of large gbk iterables over the state API.
    URL: https://github.com/apache/beam/pull/7127#discussion_r239601740

## File path: sdks/python/apache_beam/coders/coder_impl.py ##
@@ -690,6 +720,35 @@ def decode_from_stream(self, in_stream, nested):
         elements.append(self._elem_coder.decode_from_stream(in_stream, True))
       count = in_stream.read_var_int64()
 
+    if count == -1:
+      if self._read_state is None:
+        raise ValueError(
+            'Cannot read state-written iterable without state reader.')
+
+      class FullIterable(object):
+        def __init__(self, head, tail):
+          self._head = head
+          self._tail = tail
+
+        def __iter__(self):
+          for elem in self._head:
+            yield elem
+          for elem in self._tail:
+            yield elem
+
+        def __eq__(self, other):
+          return list(self) == list(other)
+
+        def __hash__(self):
+          raise NotImplementedError
+
+        def __reduce__(self):
+          return list, (list(self),)
+
+      state_token = in_stream.read_all(True)

Review comment:
Shouldn't this read the length of the state token from in_stream?

Issue Time Tracking
---
    Worklog Id: (was: 172824)
    Time Spent: 3h (was: 2h 50m)

> Support retrieval of large gbk iterables over the state API.
>
> Key: BEAM-6120
> URL: https://issues.apache.org/jira/browse/BEAM-6120
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-harness
> Reporter: Robert Bradshaw
> Assignee: Robert Bradshaw
> Priority: Major
> Time Spent: 3h
> Remaining Estimate: 0h
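
The `FullIterable` helper in the diff above chains an in-memory head with a tail that lives elsewhere. A standalone sketch of the same idea, assuming the tail is produced by a hypothetical zero-argument callable `tail_factory` (so the object can be iterated more than once), might look like this. The class name and the factory parameter are illustrative and are not taken from the pull request.

```python
class HeadTailIterable(object):
    """Iterates over an in-memory head, then over a lazily produced tail."""

    def __init__(self, head, tail_factory):
        self._head = head
        # Hypothetical: a zero-argument callable returning a fresh tail iterator.
        self._tail_factory = tail_factory

    def __iter__(self):
        for elem in self._head:
            yield elem
        for elem in self._tail_factory():
            yield elem

    def __eq__(self, other):
        # Equality materializes both sides, as in the quoted diff.
        return list(self) == list(other)

    # Mark the type unhashable explicitly (the quoted diff raises instead).
    __hash__ = None

    def __reduce__(self):
        # Pickle as a plain list so the receiver needs no state reader.
        return list, (list(self),)
```

Pickling such an object yields an ordinary `list`, which trades laziness for portability once the value leaves the worker.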
[jira] [Work logged] (BEAM-6120) Support retrieval of large gbk iterables over the state API.
[ https://issues.apache.org/jira/browse/BEAM-6120?focusedWorklogId=172822=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172822 ]

ASF GitHub Bot logged work on BEAM-6120:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 20:29
    Start Date: 06/Dec/18 20:29
    Worklog Time Spent: 10m
    Work Description: lukecwik commented on a change in pull request #7127: [BEAM-6120] Support retrieval of large gbk iterables over the state API.
    URL: https://github.com/apache/beam/pull/7127#discussion_r239593123

## File path: sdks/python/apache_beam/coders/coder_impl.py ##
@@ -637,20 +637,33 @@ class SequenceCoderImpl(StreamCoderImpl):
     countX element(0) element(1) ... element(countX - 1)
     0
 
+  If writing to state is enabled, the final terminating 0 will instead be
+  repaced with::
+
+    -1
+    len(state_token)

Review comment:
important to state that len(state_token) is encoded as varint64

nit: not that important but you could drop the `-1` and encode it as `-len(state_token)`. Will save a few bytes within the encoding since -1 will be 9 bytes already as a varint64

Issue Time Tracking
---
    Worklog Id: (was: 172822)
    Time Spent: 2h 50m (was: 2h 40m)

> Support retrieval of large gbk iterables over the state API.
>
> Key: BEAM-6120
> URL: https://issues.apache.org/jira/browse/BEAM-6120
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-harness
> Reporter: Robert Bradshaw
> Assignee: Robert Bradshaw
> Priority: Major
> Time Spent: 2h 50m
> Remaining Estimate: 0h
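
To make the size argument in the comment above concrete, here is a small sketch of a base-128 varint codec for signed 64-bit values, assuming negatives are written in two's complement (an assumption about the wire format, not a copy of Beam's stream code). Under that assumption any negative value occupies ten bytes, so a `-1` marker followed by a separate length costs more than a single negated length. The function names are hypothetical.

```python
def encode_varint64(value):
    """Encode a signed 64-bit integer as a base-128 varint (two's complement)."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value out of int64 range")
    value &= (1 << 64) - 1  # reinterpret negatives as unsigned 64-bit
    out = bytearray()
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            out.append(bits | 0x80)  # more bytes follow
        else:
            out.append(bits)
            return bytes(out)


def decode_varint64(data, pos=0):
    """Decode one varint starting at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if result >= 1 << 63:
        result -= 1 << 64  # map back to the signed range
    return result, pos


def marker_sizes(token_len):
    """Return (bytes for '-1' then len, bytes for '-len') for a token length."""
    separate = len(encode_varint64(-1)) + len(encode_varint64(token_len))
    combined = len(encode_varint64(-token_len))
    return separate, combined
```

With these definitions, `marker_sizes(20)` compares the two layouts for a 20-byte token; the saving from the combined form is exactly the bytes of the separate length field.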
[jira] [Work logged] (BEAM-6120) Support retrieval of large gbk iterables over the state API.
[ https://issues.apache.org/jira/browse/BEAM-6120?focusedWorklogId=172823=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172823 ]

ASF GitHub Bot logged work on BEAM-6120:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 20:29
    Start Date: 06/Dec/18 20:29
    Worklog Time Spent: 10m
    Work Description: lukecwik commented on a change in pull request #7127: [BEAM-6120] Support retrieval of large gbk iterables over the state API.
    URL: https://github.com/apache/beam/pull/7127#discussion_r239591997

## File path: model/pipeline/src/main/proto/beam_runner_api.proto ##
@@ -588,6 +588,11 @@ message StandardCoders {
     // of the element
     // Components: The element coder and the window coder, in that order
     WINDOWED_VALUE = 8 [(beam_urn) = "beam:coder:windowed_value:v1"];
+
+    // Encodes an iterable of elements, some of which may be stored in state.

Review comment:
Can we document the encoding here similar to TIMER?

Issue Time Tracking
---
    Worklog Id: (was: 172823)

> Support retrieval of large gbk iterables over the state API.
>
> Key: BEAM-6120
> URL: https://issues.apache.org/jira/browse/BEAM-6120
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-harness
> Reporter: Robert Bradshaw
> Assignee: Robert Bradshaw
> Priority: Major
> Time Spent: 2h 50m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172807=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172807 ]

ASF GitHub Bot logged work on BEAM-6181:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 19:49
    Start Date: 06/Dec/18 19:49
    Worklog Time Spent: 10m
    Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner.
    URL: https://github.com/apache/beam/pull/7202#discussion_r239590504

## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ##
@@ -351,7 +398,115 @@ void updateProgress() {
       }
     }
 
-    private void updateMetrics(BeamFnApi.Metrics metrics) {
+    // Keeping this as static class for this iteration. Will extract to separate file and generalize
+    // when more counter types are added.
+    // todomigryz: define counter transformer factory
+    // that can provide respective counter transformer for different type of counters.
+    // (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc)
+    private static class MonitoringInfoToCounterUpdateTransformer {
+
+      private final Map transformIdMapping;
+
+      public MonitoringInfoToCounterUpdateTransformer(
+          final Map transformIdMapping) {
+        this.transformIdMapping = transformIdMapping;
+      }
+
+      // todo: search code for "beam:metrics"... and replace them with relevant enums from
+      // proto after rebasing above https://github.com/apache/beam/pull/6799 that
+      // introduces relevant proto entries.
+      final String BEAM_METRICS_USER_PREFIX = "beam:metric:user";
+
+      private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) {
+        long value = monitoringInfo.getMetric().getCounterData().getInt64Value();
+        String urn = monitoringInfo.getUrn();
+
+        String type = monitoringInfo.getType();
+
+        // todomigryz: run MonitoringInfo through validation process.
+        // refer to https://github.com/apache/beam/pull/6799
+
+        if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) {
+          if (!type.equals("beam:metrics:sum_int_64")) {
+            LOG.warn(

Review comment:
It would be good to understand the scenarios in which we could reach these states and design error handling appropriately. If the only scenario that would produce this is a bug in the SDK, then we should expose that bug as early as possible to get it fixed. Log-and-continue means that we'll ignore it, and unless we have an explicit test our users will notice before we do.

I doubt we have a consistent methodology for error handling in Beam, but I remember this came up when designing DisplayData in the Java SDK. We decided errors in constructing DisplayData from user-defined PTransforms should throw and block job submission, rather than continuing with empty DisplayData. This feels similar.

Issue Time Tracking
---
    Worklog Id: (was: 172807)
    Time Spent: 5.5h (was: 5h 20m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
> ---
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
> Issue Type: Bug
> Components: java-fn-execution
> Reporter: Mikhail Gryzykhin
> Assignee: Mikhail Gryzykhin
> Priority: Major
> Time Spent: 5.5h
> Remaining Estimate: 0h
>
> New approach to report metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in Python SDK and work is ongoing in Java SDK.
> This task includes plumbing User metrics reported via MetricInfos through
> Dataflow Java Runner.
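
The trade-off raised in the review comment above (log-and-continue versus failing fast) can be illustrated with two variants of the same conversion. This is a hedged Python sketch rather than the Java code under review: the two string constants are copied from the quoted diff, while the function names and the dictionary result are hypothetical.

```python
import logging

LOG = logging.getLogger(__name__)

USER_METRIC_PREFIX = "beam:metric:user"      # string taken from the quoted diff
SUM_INT64_TYPE = "beam:metrics:sum_int_64"   # string taken from the quoted diff


def to_counter_update_lenient(urn, type_urn, value):
    """Log-and-continue: unexpected input is logged and silently dropped."""
    if urn.startswith(USER_METRIC_PREFIX) and type_urn != SUM_INT64_TYPE:
        LOG.warning("Unsupported type %s for metric %s", type_urn, urn)
        return None
    return {"name": urn, "value": value}


def to_counter_update_strict(urn, type_urn, value):
    """Fail-fast: unexpected input raises, so a producer bug surfaces at once."""
    if urn.startswith(USER_METRIC_PREFIX) and type_urn != SUM_INT64_TYPE:
        raise ValueError("Unsupported type %r for metric %r" % (type_urn, urn))
    return {"name": urn, "value": value}
```

The lenient variant keeps the pipeline running but hides the malformed metric unless someone reads the logs; the strict variant turns the same condition into an error that a test or a job submission would catch.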
[jira] [Resolved] (BEAM-5920) Add additional OWNERS
[ https://issues.apache.org/jira/browse/BEAM-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Wegner resolved BEAM-5920.

    Resolution: Fixed
    Fix Version/s: 2.10.0

> Add additional OWNERS
> -
>
> Key: BEAM-5920
> URL: https://issues.apache.org/jira/browse/BEAM-5920
> Project: Beam
> Issue Type: Sub-task
> Components: project-management
> Reporter: Scott Wegner
> Assignee: Scott Wegner
> Priority: Minor
> Fix For: 2.10.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> We should spread knowledge and ownership of this new project. We have two
> owners [currently
> documented|https://github.com/apache/beam/blob/master/.test-infra/metrics/OWNERS].
> I plan to monitor the new infrastructure closely during its initial rollout,
> and then once it's stabilized add additional owners.
> I'd like to add additional owners in ~1 month
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172794 ]

ASF GitHub Bot logged work on BEAM-6178:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 19:16
    Start Date: 06/Dec/18 19:16
    Worklog Time Spent: 10m
    Work Description: swegner commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
    URL: https://github.com/apache/beam/pull/7197#discussion_r239569411

## File path: sdks/java/bom/build.gradle ##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+    evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+    def moduleNames = new ArrayList<>()
+    for (String projName : rootProject.ext.allProjectNames) {
+      def subproject = project(projName)
+      if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+          subproject.ext.properties.includeInJavaBom) {
+        moduleNames.add(subproject.name)
+      }
+    }
+
+    new File(mavenJavaDir).mkdirs()
+    copy {

Review comment:
I believe it would be more idiomatic to make the `copyPom` task `type: Copy`, and move this logic outside of the `doLast {}` block. That would move the `moduleNames` aggregation to the evaluation phase, which also seems idiomatic.

Issue Time Tracking
---
    Worklog Id: (was: 172794)
    Time Spent: 1h (was: 50m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Affects Versions: 2.10.0
> Reporter: Garrett Jones
> Assignee: Garrett Jones
> Priority: Critical
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172792=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172792 ]

ASF GitHub Bot logged work on BEAM-6178:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 19:16
    Start Date: 06/Dec/18 19:16
    Worklog Time Spent: 10m
    Work Description: swegner commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
    URL: https://github.com/apache/beam/pull/7197#discussion_r239541511

## File path: build.gradle ##
@@ -281,3 +281,107 @@ release {
     requireBranch = 'release-.*|master'
   }
 }
+
+project.ext.allProjectNames = [

Review comment:
This set is already defined in [settings.gradle](https://github.com/apache/beam/blob/master/settings.gradle). Would it work to reference it via [`Project.allProjects`](https://docs.gradle.org/current/dsl/org.gradle.api.Project.html#org.gradle.api.Project:allprojects) ?

Issue Time Tracking
---
    Worklog Id: (was: 172792)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Affects Versions: 2.10.0
> Reporter: Garrett Jones
> Assignee: Garrett Jones
> Priority: Critical
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172797=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172797 ]

ASF GitHub Bot logged work on BEAM-6178:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 19:16
    Start Date: 06/Dec/18 19:16
    Worklog Time Spent: 10m
    Work Description: swegner commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
    URL: https://github.com/apache/beam/pull/7197#discussion_r239578916

## File path: sdks/java/bom/build.gradle ##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+    evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+    def moduleNames = new ArrayList<>()
+    for (String projName : rootProject.ext.allProjectNames) {
+      def subproject = project(projName)
+      if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+          subproject.ext.properties.includeInJavaBom) {
+        moduleNames.add(subproject.name)
+      }
+    }
+
+    new File(mavenJavaDir).mkdirs()
+    copy {
+      from 'pom.xml.template'
+      into mavenJavaDir
+      rename 'pom.xml.template', 'pom-default.xml'
+      expand(version: project.version, modules: moduleNames)
+    }
+  }
+}
+
+assemble.dependsOn copyPom
+
+// We want to use our own pom.xml instead of the generated one, so we disable
+// the pom.xml generation and have the publish tasks depend on `copyPom` instead.
+tasks.whenTaskAdded { task ->
+  if (task.name == 'generatePomFileForMavenJavaPublication') {
+    task.enabled = false
+  } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') {
+    task.dependsOn copyPom
+  } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') {
+    task.dependsOn copyPom
+  }
+}
+
+jar.enabled = false
+
+// Remove the default jar archive which is added by the 'java' plugin.
+configurations.archives.artifacts.with { archives ->
+  def artifacts = []
+  archives.each {
+    if (it.file =~ 'jar') {
+      // We can't just call `archives.remove(it)` here because it triggers
+      // a `ConcurrentModificationException`, so we add matching artifacts
+      // to another list, then remove those elements outside of this iteration.
+      artifacts.add(it)
+    }
+  }
+  artifacts.each {
+    archives.remove(it)
+  }
+}
+
+artifacts {
+  archives(mavenJavaBomOutputFile) {
+    builtBy copyPom
+  }
+}
+
+afterEvaluate {
+  // We can't use the `publishing` section from applyJavaNature because
+  // we don't want all the Java artifacts, and we want to use our own pom.xml
+  // instead of the generated one.
+  publishing {
+    publications {
+      mavenJava(MavenPublication) {

Review comment:
Should consuming this BOM become the recommended way to depend on Beam? If so, perhaps we should implement it in the [maven-archetypes](https://github.com/apache/beam/tree/master/sdks/java/maven-archetypes). The archetype projects also get continuous testing with nightly releases, so that would help ensure our produced BOM is valid.

Issue Time Tracking
---
    Worklog Id: (was: 172797)
    Time Spent: 1h 20m (was: 1h 10m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
>
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172795 ]

ASF GitHub Bot logged work on BEAM-6178:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 19:16
    Start Date: 06/Dec/18 19:16
    Worklog Time Spent: 10m
    Work Description: swegner commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
    URL: https://github.com/apache/beam/pull/7197#discussion_r239574541

## File path: sdks/java/bom/build.gradle ##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+    evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+    def moduleNames = new ArrayList<>()
+    for (String projName : rootProject.ext.allProjectNames) {
+      def subproject = project(projName)
+      if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+          subproject.ext.properties.includeInJavaBom) {
+        moduleNames.add(subproject.name)
+      }
+    }
+
+    new File(mavenJavaDir).mkdirs()
+    copy {
+      from 'pom.xml.template'
+      into mavenJavaDir
+      rename 'pom.xml.template', 'pom-default.xml'
+      expand(version: project.version, modules: moduleNames)
+    }
+  }
+}
+
+assemble.dependsOn copyPom
+
+// We want to use our own pom.xml instead of the generated one, so we disable
+// the pom.xml generation and have the publish tasks depend on `copyPom` instead.
+tasks.whenTaskAdded { task ->
+  if (task.name == 'generatePomFileForMavenJavaPublication') {
+    task.enabled = false
+  } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') {
+    task.dependsOn copyPom
+  } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') {
+    task.dependsOn copyPom
+  }
+}
+
+jar.enabled = false
+
+// Remove the default jar archive which is added by the 'java' plugin.
+configurations.archives.artifacts.with { archives ->
+  def artifacts = []
+  archives.each {
+    if (it.file =~ 'jar') {
+      // We can't just call `archives.remove(it)` here because it triggers
+      // a `ConcurrentModificationException`, so we add matching artifacts
+      // to another list, then remove those elements outside of this iteration.
+      artifacts.add(it)
+    }
+  }
+  artifacts.each {
+    archives.remove(it)
+  }
+}
+
+artifacts {
+  archives(mavenJavaBomOutputFile) {
+    builtBy copyPom
+  }
+}
+
+afterEvaluate {
+  // We can't use the `publishing` section from applyJavaNature because

Review comment:
Same comment as above: this is a lot of redundancy, and I worry that as they are maintained over time the implementations will drift. Can we refactor the main publishing config to have the necessary hooks for this task?

Issue Time Tracking
---
    Worklog Id: (was: 172795)
    Time Spent: 1h 10m (was: 1h)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Affects Versions: 2.10.0
> Reporter: Garrett Jones
> Assignee: Garrett Jones
> Priority: Critical
>
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172789=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172789 ]

ASF GitHub Bot logged work on BEAM-6178:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 19:16
    Start Date: 06/Dec/18 19:16
    Worklog Time Spent: 10m
    Work Description: swegner commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
    URL: https://github.com/apache/beam/pull/7197#discussion_r239572487

## File path: sdks/java/bom/build.gradle ##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+
+apply plugin: "java"
+apply plugin: "maven-publish"
+
+def isRelease(Project project) {
+  return project.hasProperty('isRelease')
+}
+
+ext {
+  mavenJavaDir = "$project.buildDir/publications/mavenJava"
+  mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml")
+}
+
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name)) {
+    evaluationDependsOn(projName)
+  }
+}
+
+// Copy our pom.xml to the location where a generated POM would go
+task copyPom() {
+  doLast {
+    def moduleNames = new ArrayList<>()
+    for (String projName : rootProject.ext.allProjectNames) {
+      def subproject = project(projName)
+      if (subproject.ext.properties.containsKey('includeInJavaBom') &&
+          subproject.ext.properties.includeInJavaBom) {
+        moduleNames.add(subproject.name)
+      }
+    }
+
+    new File(mavenJavaDir).mkdirs()
+    copy {
+      from 'pom.xml.template'
+      into mavenJavaDir
+      rename 'pom.xml.template', 'pom-default.xml'
+      expand(version: project.version, modules: moduleNames)
+    }
+  }
+}
+
+assemble.dependsOn copyPom
+
+// We want to use our own pom.xml instead of the generated one, so we disable
+// the pom.xml generation and have the publish tasks depend on `copyPom` instead.
+tasks.whenTaskAdded { task ->
+  if (task.name == 'generatePomFileForMavenJavaPublication') {

Review comment:
This introduces a second method of pom file generation and a duplicated set of POM boilerplate. I worry that when one gets updated we'll forget to update the other. Do you think we could consolidate on a single way of generating POM's?

Issue Time Tracking
---
    Worklog Id: (was: 172789)
    Time Spent: 50m (was: 40m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Affects Versions: 2.10.0
> Reporter: Garrett Jones
> Assignee: Garrett Jones
> Priority: Critical
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172788 ]

ASF GitHub Bot logged work on BEAM-6178:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 19:16
    Start Date: 06/Dec/18 19:16
    Worklog Time Spent: 10m
    Work Description: swegner commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
    URL: https://github.com/apache/beam/pull/7197#discussion_r239566618

## File path: runners/samza/build.gradle ##
@@ -19,7 +19,7 @@ import groovy.json.JsonOutput
 
 apply plugin: org.apache.beam.gradle.BeamModulePlugin
-applyJavaNature()
+applyJavaNature(exportJavadoc: false)

Review comment:
I suspect that some of these exclusions are oversights. I appreciate that you've matched the existing behavior. It will be easy to follow up later and enable javadocs for the modules that should probably have them.

Issue Time Tracking
---
    Worklog Id: (was: 172788)
    Time Spent: 50m (was: 40m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Affects Versions: 2.10.0
> Reporter: Garrett Jones
> Assignee: Garrett Jones
> Priority: Critical
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172790=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172790 ]

ASF GitHub Bot logged work on BEAM-6178:

    Author: ASF GitHub Bot
    Created on: 06/Dec/18 19:16
    Start Date: 06/Dec/18 19:16
    Worklog Time Spent: 10m
    Work Description: swegner commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
    URL: https://github.com/apache/beam/pull/7197#discussion_r239546811

## File path: sdks/java/javadoc/build.gradle ##
@@ -26,64 +26,33 @@ description = "Apache Beam :: SDKs :: Java :: Aggregated Javadoc"
 
 apply plugin: 'java'
 
-def exportedJavadocProjects = [
-  ':beam-runners-apex',
-  ':beam-runners-core-construction-java',
-  ':beam-runners-core-java',
-  ':beam-runners-java-fn-execution',
-  ':beam-runners-local-java-core',
-  ':beam-runners-direct-java',
-  ':beam-runners-reference-java',
-  ':beam-runners-flink_2.11',
-  ':beam-runners-google-cloud-dataflow-java',
-  ':beam-runners-spark',
-  ':beam-runners-gearpump',
-  ':beam-sdks-java-core',
-  ':beam-sdks-java-fn-execution',
-  ':beam-sdks-java-extensions-google-cloud-platform-core',
-  ':beam-sdks-java-extensions-join-library',
-  ':beam-sdks-java-extensions-json-jackson',
-  ':beam-sdks-java-extensions-protobuf',
-  ':beam-sdks-java-extensions-sketching',
-  ':beam-sdks-java-extensions-sorter',
-  ':beam-sdks-java-extensions-sql',
-  ':beam-sdks-java-harness',
-  ':beam-sdks-java-io-amazon-web-services',
-  ':beam-sdks-java-io-amqp',
-  ':beam-sdks-java-io-cassandra',
-  ':beam-sdks-java-io-elasticsearch',
-  ':beam-sdks-java-io-elasticsearch-tests-2',
-  ':beam-sdks-java-io-elasticsearch-tests-5',
-  ':beam-sdks-java-io-elasticsearch-tests-6',
-  ':beam-sdks-java-io-elasticsearch-tests-common',
-  ':beam-sdks-java-io-google-cloud-platform',
-  ':beam-sdks-java-io-hadoop-common',
-  ':beam-sdks-java-io-hadoop-file-system',
-  ':beam-sdks-java-io-hadoop-input-format',
-  ':beam-sdks-java-io-hbase',
-  ':beam-sdks-java-io-hcatalog',
-  ':beam-sdks-java-io-jdbc',
-  ':beam-sdks-java-io-jms',
-  ':beam-sdks-java-io-kafka',
-  ':beam-sdks-java-io-kinesis',
-  ':beam-sdks-java-io-mongodb',
-  ':beam-sdks-java-io-mqtt',
-  ':beam-sdks-java-io-parquet',
-  ':beam-sdks-java-io-redis',
-  ':beam-sdks-java-io-solr',
-  ':beam-sdks-java-io-tika',
-  ':beam-sdks-java-io-xml',
-]
+for (String projName : rootProject.ext.allProjectNames) {
+  if (!projName.equals(':' + project.name) && !projName.equals(':beam-sdks-java-bom')) {
+    evaluationDependsOn(projName)

Review comment:
Forcing every other project to depend on this one feels a bit dirty. If another project used the same pattern, would it create a build cycle? The previous behavior was still hacky, but it only included the projects listed here for javadoc support; so if there were another uber project using `evaluationDependsOn` it wouldn't be contained in the previous list and thus avoid a build cycle.

However, I had a lot of trouble finding a better way of structuring this. Some relevant docs/threads:

* [Multi-project Builds: Depending on the task output produced by another project](https://docs.gradle.org/current/userguide/multi_project_builds.html#sec:depending_on_output_of_another_project)
* [Discuss: Best approach Gradle multi-module project: generate just one global javadoc](https://discuss.gradle.org/t/best-approach-gradle-multi-module-project-generate-just-one-global-javadoc/18657/22)
* [gradle-aggregate-javadocs-plugin](https://github.com/nebula-plugins/gradle-aggregate-javadocs-plugin)

One thought is that it seems easier to structure aggregation logic by putting it in the root build script. By default the root build script is configured first so you have some guarantees on ordering.

With all that said, I'm happy deferring this. You've stepped into an awkward part of the build process, and I don't expect you to fix it here.

Issue Time Tracking
---
    Worklog Id: (was: 172790)
    Time Spent: 50m (was: 40m)

> Create BOM for Beam
> ---
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Affects Versions: 2.10.0
> Reporter: Garrett Jones
> Assignee: Garrett Jones
> Priority: Critical
> Time Spent: 50m
>
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172796=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172796 ] ASF GitHub Bot logged work on BEAM-6178: Author: ASF GitHub Bot Created on: 06/Dec/18 19:16 Start Date: 06/Dec/18 19:16 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature URL: https://github.com/apache/beam/pull/7197#discussion_r239573794 ## File path: sdks/java/bom/build.gradle ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +apply plugin: org.apache.beam.gradle.BeamModulePlugin + +apply plugin: "java" +apply plugin: "maven-publish" + +def isRelease(Project project) { + return project.hasProperty('isRelease') +} + +ext { + mavenJavaDir = "$project.buildDir/publications/mavenJava" + mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml") +} + +for (String projName : rootProject.ext.allProjectNames) { + if (!projName.equals(':' + project.name)) { +evaluationDependsOn(projName) + } +} + +// Copy our pom.xml to the location where a generated POM would go +task copyPom() { + doLast { +def moduleNames = new ArrayList<>() +for (String projName : rootProject.ext.allProjectNames) { + def subproject = project(projName) + if (subproject.ext.properties.containsKey('includeInJavaBom') && + subproject.ext.properties.includeInJavaBom) { +moduleNames.add(subproject.name) + } +} + +new File(mavenJavaDir).mkdirs() +copy { + from 'pom.xml.template' + into mavenJavaDir + rename 'pom.xml.template', 'pom-default.xml' + expand(version: project.version, modules: moduleNames) +} + } +} + +assemble.dependsOn copyPom + +// We want to use our own pom.xml instead of the generated one, so we disable +// the pom.xml generation and have the publish tasks depend on `copyPom` instead. +tasks.whenTaskAdded { task -> + if (task.name == 'generatePomFileForMavenJavaPublication') { +task.enabled = false + } else if (task.name == 'publishMavenJavaPublicationToMavenLocal') { +task.dependsOn copyPom + } else if (task.name == 'publishMavenJavaPublicationToMavenRepository') { +task.dependsOn copyPom + } +} + +jar.enabled = false + +// Remove the default jar archive which is added by the 'java' plugin. +configurations.archives.artifacts.with { archives -> Review comment: Why is this needed? is the above `jar.enabled = false` not sufficient? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172796) > Create BOM for Beam > --- > > Key: BEAM-6178 > URL: https://issues.apache.org/jira/browse/BEAM-6178 > Project: Beam > Issue Type: Improvement > Components: build-system >Affects Versions: 2.10.0 >Reporter: Garrett Jones >Assignee: Garrett Jones >Priority: Critical > Time Spent: 1h 10m > Remaining Estimate: 0h > > Add a module to Beam which generates a BOM for Beam modules. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172793=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172793 ] ASF GitHub Bot logged work on BEAM-6178: Author: ASF GitHub Bot Created on: 06/Dec/18 19:16 Start Date: 06/Dec/18 19:16 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature URL: https://github.com/apache/beam/pull/7197#discussion_r239567323 ## File path: sdks/java/bom/build.gradle ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +apply plugin: org.apache.beam.gradle.BeamModulePlugin + +apply plugin: "java" +apply plugin: "maven-publish" + +def isRelease(Project project) { + return project.hasProperty('isRelease') +} + +ext { + mavenJavaDir = "$project.buildDir/publications/mavenJava" + mavenJavaBomOutputFile = file(mavenJavaDir + "/pom-default.xml") +} + +for (String projName : rootProject.ext.allProjectNames) { Review comment: Same comment from javadocs; can this instead use `allProjects` to remove the redundant `allProjectNames` array? This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172793) > Create BOM for Beam > --- > > Key: BEAM-6178 > URL: https://issues.apache.org/jira/browse/BEAM-6178 > Project: Beam > Issue Type: Improvement > Components: build-system >Affects Versions: 2.10.0 >Reporter: Garrett Jones >Assignee: Garrett Jones >Priority: Critical > Time Spent: 50m > Remaining Estimate: 0h > > Add a module to Beam which generates a BOM for Beam modules. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172791 ] ASF GitHub Bot logged work on BEAM-6178: Author: ASF GitHub Bot Created on: 06/Dec/18 19:16 Start Date: 06/Dec/18 19:16 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature URL: https://github.com/apache/beam/pull/7197#discussion_r239567036 ## File path: sdks/java/bom/build.gradle ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +apply plugin: org.apache.beam.gradle.BeamModulePlugin + +apply plugin: "java" Review comment: Can you add a small comment up here about what a BOM is and why it's useful? Feel free to link to existing context/documentation. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172791) Time Spent: 50m (was: 40m) > Create BOM for Beam > --- > > Key: BEAM-6178 > URL: https://issues.apache.org/jira/browse/BEAM-6178 > Project: Beam > Issue Type: Improvement > Components: build-system >Affects Versions: 2.10.0 >Reporter: Garrett Jones >Assignee: Garrett Jones >Priority: Critical > Time Spent: 50m > Remaining Estimate: 0h > > Add a module to Beam which generates a BOM for Beam modules. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6190) "Processing stuck" messages should be visible on Pantheon
Dustin Rhodes created BEAM-6190: --- Summary: "Processing stuck" messages should be visible on Pantheon Key: BEAM-6190 URL: https://issues.apache.org/jira/browse/BEAM-6190 Project: Beam Issue Type: Improvement Components: runner-dataflow Affects Versions: 2.8.0 Environment: Running on Google Cloud Dataflow Reporter: Dustin Rhodes Assignee: Tyler Akidau Fix For: Not applicable When user processing results in an exception, it is clearly visible on the Pantheon landing page for a streaming Dataflow job. But when user processing becomes stuck, there is no indication on that page, even though the worker logs it. Most users don't check worker logs, and checking them is not convenient. Ideally a stuck worker would result in a visible error on the Pantheon landing page. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5850) Make process, finish and start run on the same thread to support metrics.
[ https://issues.apache.org/jira/browse/BEAM-5850?focusedWorklogId=172780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172780 ] ASF GitHub Bot logged work on BEAM-5850: Author: ASF GitHub Bot Created on: 06/Dec/18 18:52 Start Date: 06/Dec/18 18:52 Worklog Time Spent: 10m Work Description: ajamato commented on issue #6786: [BEAM-5850] Add QueueingBeamFnDataClient and make process, finish and start run on the same thread to support metrics. URL: https://github.com/apache/beam/pull/6786#issuecomment-444985648 @lukecwik Rebased this PR after #7214 was merged into master. Would you please merge this after precommit finishes? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172780) Time Spent: 15h 10m (was: 15h) > Make process, finish and start run on the same thread to support metrics. > - > > Key: BEAM-5850 > URL: https://issues.apache.org/jira/browse/BEAM-5850 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Time Spent: 15h 10m > Remaining Estimate: 0h > > Update BeamFnDataReceiver to place elements into a Queue, then consume them > and call the element processing receiver in blockTillReadFinishes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
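The queueing idea described in BEAM-5850 can be sketched roughly as follows. This is a minimal illustration only, not the code from PR #6786: `QueueingReceiver`, `drainAndBlock`, and the demo class are hypothetical names, and the real `QueueingBeamFnDataClient` may be structured quite differently. The point it shows is that the data thread only enqueues elements, while processing happens on whichever thread drains the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical stand-in for a queueing data receiver; not the actual Beam class. */
final class QueueingReceiver<T> {
  /** Wrapper so an end-of-stream marker can be told apart from real elements. */
  private static final class Item<T> {
    final T value;
    final boolean done;

    Item(T value, boolean done) {
      this.value = value;
      this.done = done;
    }
  }

  private final BlockingQueue<Item<T>> queue = new LinkedBlockingQueue<>();
  private final Consumer<T> processor;

  QueueingReceiver(Consumer<T> processor) {
    this.processor = processor;
  }

  /** Called from the data thread: only enqueues, never processes. */
  void accept(T element) {
    queue.add(new Item<>(element, false));
  }

  /** Called from the data thread once the inbound stream is complete. */
  void complete() {
    queue.add(new Item<>(null, true));
  }

  /** Called from the bundle thread: processes elements here until the stream completes. */
  void drainAndBlock() throws InterruptedException {
    while (true) {
      Item<T> item = queue.take();
      if (item.done) {
        return;
      }
      processor.accept(item.value);
    }
  }
}

final class QueueingReceiverDemo {
  /** Returns the name of the thread that processed each element. */
  static List<String> run() {
    List<String> processingThreads = new ArrayList<>();
    QueueingReceiver<Integer> receiver =
        new QueueingReceiver<>(e -> processingThreads.add(Thread.currentThread().getName()));
    Thread producer =
        new Thread(
            () -> {
              for (int i = 0; i < 3; i++) {
                receiver.accept(i);
              }
              receiver.complete();
            },
            "data-thread");
    producer.start();
    try {
      receiver.drainAndBlock();
      producer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return processingThreads;
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

Because every element is processed by the thread that calls `drainAndBlock`, any thread-local state set up on that thread (which is presumably what the metrics support relies on) stays visible during element processing.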
[jira] [Work logged] (BEAM-5920) Add additional OWNERS
[ https://issues.apache.org/jira/browse/BEAM-5920?focusedWorklogId=172765=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172765 ] ASF GitHub Bot logged work on BEAM-5920: Author: ASF GitHub Bot Created on: 06/Dec/18 18:34 Start Date: 06/Dec/18 18:34 Worklog Time Spent: 10m Work Description: yifanzou commented on issue #7186: [BEAM-5920] Add additional owners for Community Metrics URL: https://github.com/apache/beam/pull/7186#issuecomment-444979962 LGTM, thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172765) Time Spent: 40m (was: 0.5h) > Add additional OWNERS > - > > Key: BEAM-5920 > URL: https://issues.apache.org/jira/browse/BEAM-5920 > Project: Beam > Issue Type: Sub-task > Components: project-management >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > We should spread knowledge and ownership of this new project. We have two > owners [currently > documented|https://github.com/apache/beam/blob/master/.test-infra/metrics/OWNERS]. > I plan to monitor the new infrastructure closely during its initial rollout, > and then once it's stabilized add additional owners. > I'd like to add additional owners in ~1 month -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172762 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 18:29 Start Date: 06/Dec/18 18:29 Worklog Time Spent: 10m Work Description: Ardagan commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239541582 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -351,7 +398,115 @@ void updateProgress() { } } -private void updateMetrics(BeamFnApi.Metrics metrics) { +// Keeping this as static class for this iteration. Will extract to separate file and generalize +// when more counter types are added. +// todomigryz: define counter transformer factory +// that can provide respective counter transformer for different type of counters. +// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc) +private static class MonitoringInfoToCounterUpdateTransformer { + + private final Map transformIdMapping; + + public MonitoringInfoToCounterUpdateTransformer( + final Map transformIdMapping) { +this.transformIdMapping = transformIdMapping; + } + + // todo: search code for "beam:metrics"... and replace them with relevant enums from + // proto after rebasing above https://github.com/apache/beam/pull/6799 that + // introduces relevant proto entries. + final String BEAM_METRICS_USER_PREFIX = "beam:metric:user"; + + private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) { +long value = monitoringInfo.getMetric().getCounterData().getInt64Value(); +String urn = monitoringInfo.getUrn(); + +String type = monitoringInfo.getType(); + +// todomigryz: run MonitoringInfo through validation process. 
+// refer to https://github.com/apache/beam/pull/6799 + +if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) { + if (!type.equals("beam:metrics:sum_int_64")) { +LOG.warn( Review comment: I believe that logging a warning is a good balance here. Most of the time you want your job to process data and use counters only for monitoring, so you usually do not want to lose data because of a monitoring issue. This approach lets SDK authors find errors in their SDK implementations without breaking user pipelines. @ajamato Alex, what's your opinion on this topic? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172762) Time Spent: 5h 20m (was: 5h 10m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > The new approach to reporting metrics in the FnApi is to use MetricInfo structures. > This approach is implemented in the Python SDK and work is ongoing in the Java SDK. > This task includes plumbing user metrics reported via MetricInfos through > the Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
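The trade-off discussed in this review comment (warn on an unexpected counter type rather than fail the pipeline) can be sketched as below. This is an illustrative sketch under assumptions: `CounterUpdateSketch` and its nested `Info` class are hypothetical simplifications of `MonitoringInfo` and `CounterUpdate`, only the two string constants are taken from the diff, and silently skipping non-user URNs is an assumption because that branch is not shown in the quoted code.

```java
import java.util.Optional;
import java.util.logging.Logger;

/** Hypothetical sketch of "warn and skip" handling; not the code under review. */
final class CounterUpdateSketch {
  private static final Logger LOG = Logger.getLogger(CounterUpdateSketch.class.getName());

  // Both constants appear in the quoted diff.
  static final String USER_PREFIX = "beam:metric:user";
  static final String SUM_INT64_TYPE = "beam:metrics:sum_int_64";

  /** Simplified stand-in for a MonitoringInfo proto. */
  static final class Info {
    final String urn;
    final String type;
    final long value;

    Info(String urn, String type, long value) {
      this.urn = urn;
      this.type = type;
      this.value = value;
    }
  }

  /** Returns the counter value, or empty if the info cannot be translated. */
  static Optional<Long> toCounterValue(Info info) {
    if (!info.urn.startsWith(USER_PREFIX)) {
      // Assumption: non-user metrics are ignored by this sketch.
      return Optional.empty();
    }
    if (!SUM_INT64_TYPE.equals(info.type)) {
      // Warn instead of throwing, so a metrics problem never drops pipeline data.
      LOG.warning("Unsupported counter type '" + info.type + "' for urn " + info.urn);
      return Optional.empty();
    }
    return Optional.of(info.value);
  }
}
```

Returning an empty `Optional` keeps the caller's loop running: the unsupported metric is dropped and logged, while the bundle itself continues to be processed.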
[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle
[ https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172750 ] ASF GitHub Bot logged work on BEAM-6159: Author: ASF GitHub Bot Created on: 06/Dec/18 18:13 Start Date: 06/Dec/18 18:13 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #7015: [BEAM-6159] Migrate dataflow portable worker using shared library to process bundle URL: https://github.com/apache/beam/pull/7015#discussion_r239556933 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironmentFactory.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.environment; + +import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; +import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; +import org.apache.beam.runners.fnexecution.control.ControlClientPool; +import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService; +import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler; +import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService; +import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService; +import org.apache.beam.sdk.fn.IdGenerator; + +/** + * An {@link EnvironmentFactory} that creates StaticRemoteEnvironment used by Dataflow runner + * harness. + */ +public class StaticRemoteEnvironmentFactory implements EnvironmentFactory { + public static StaticRemoteEnvironmentFactory forService( + InstructionRequestHandler instructionRequestHandler) { +StaticRemoteEnvironmentFactory factory = new StaticRemoteEnvironmentFactory(); Review comment: nit: In a follow up PR, why not define and use a constructor that takes this parameter. This would allow you to make the member variable final. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172750) > Dataflow portable runner harness should use ExecutableStage to process bundle > - > > Key: BEAM-6159 > URL: https://issues.apache.org/jira/browse/BEAM-6159 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
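The nit above (pass the handler through a constructor so the field can be `final`) might look roughly like the following. This is a sketch with hypothetical names: `SketchHandler` stands in for `InstructionRequestHandler`, `StaticFactorySketch` stands in for `StaticRemoteEnvironmentFactory`, and the real class has further members that are omitted here.

```java
import java.util.Objects;

/** Hypothetical stand-in for InstructionRequestHandler. */
interface SketchHandler {
  String handle(String request);
}

/** Hypothetical stand-in for StaticRemoteEnvironmentFactory. */
final class StaticFactorySketch {
  // final: assigned exactly once, in the constructor, and never mutated afterwards.
  private final SketchHandler handler;

  private StaticFactorySketch(SketchHandler handler) {
    this.handler = Objects.requireNonNull(handler, "handler");
  }

  /** Static factory that delegates to the constructor instead of setting a field later. */
  static StaticFactorySketch forService(SketchHandler handler) {
    return new StaticFactorySketch(handler);
  }

  SketchHandler getHandler() {
    return handler;
  }
}
```

With this shape an instance can never be observed without its handler, which is the practical benefit of the suggested follow-up change.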
[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle
[ https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172753 ] ASF GitHub Bot logged work on BEAM-6159: Author: ASF GitHub Bot Created on: 06/Dec/18 18:13 Start Date: 06/Dec/18 18:13 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #7015: [BEAM-6159] Migrate dataflow portable worker using shared library to process bundle URL: https://github.com/apache/beam/pull/7015#discussion_r239554369 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/CreateExecutableStageNodeFunction.java ## @@ -0,0 +1,449 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.dataflow.worker.graph; + +import static com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.runners.dataflow.util.Structs.getBytes; +import static org.apache.beam.runners.dataflow.util.Structs.getString; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.api.services.dataflow.model.InstructionOutput; +import com.google.api.services.dataflow.model.MapTask; +import com.google.api.services.dataflow.model.MultiOutputInfo; +import com.google.api.services.dataflow.model.ParDoInstruction; +import com.google.api.services.dataflow.model.ParallelInstruction; +import com.google.api.services.dataflow.model.ReadInstruction; +import com.google.common.collect.ImmutableMap; +import com.google.common.collect.Iterables; +import com.google.common.graph.MutableNetwork; +import com.google.common.graph.Network; +import java.io.IOException; +import java.util.Base64; +import java.util.HashMap; +import java.util.HashSet; +import java.util.Map; +import java.util.Set; +import java.util.function.Function; +import javax.annotation.Nullable; +import org.apache.beam.model.pipeline.v1.RunnerApi; +import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; +import org.apache.beam.model.pipeline.v1.RunnerApi.StandardPTransforms; +import org.apache.beam.runners.core.construction.BeamUrns; +import org.apache.beam.runners.core.construction.CoderTranslation; +import org.apache.beam.runners.core.construction.Environments; +import org.apache.beam.runners.core.construction.PTransformTranslation; +import org.apache.beam.runners.core.construction.ParDoTranslation; +import org.apache.beam.runners.core.construction.RehydratedComponents; +import org.apache.beam.runners.core.construction.SdkComponents; +import org.apache.beam.runners.core.construction.WindowingStrategyTranslation; +import org.apache.beam.runners.core.construction.graph.ExecutableStage; +import 
org.apache.beam.runners.core.construction.graph.ImmutableExecutableStage; +import org.apache.beam.runners.core.construction.graph.PipelineNode; +import org.apache.beam.runners.core.construction.graph.PipelineNode.PCollectionNode; +import org.apache.beam.runners.core.construction.graph.PipelineNode.PTransformNode; +import org.apache.beam.runners.core.construction.graph.SideInputReference; +import org.apache.beam.runners.core.construction.graph.TimerReference; +import org.apache.beam.runners.core.construction.graph.UserStateReference; +import org.apache.beam.runners.dataflow.util.CloudObject; +import org.apache.beam.runners.dataflow.util.CloudObjects; +import org.apache.beam.runners.dataflow.util.PropertyNames; +import org.apache.beam.runners.dataflow.worker.CombinePhase; +import org.apache.beam.runners.dataflow.worker.counters.NameContext; +import org.apache.beam.runners.dataflow.worker.graph.Edges.DefaultEdge; +import org.apache.beam.runners.dataflow.worker.graph.Edges.Edge; +import org.apache.beam.runners.dataflow.worker.graph.Edges.MultiOutputInfoEdge; +import org.apache.beam.runners.dataflow.worker.graph.Nodes.ExecutableStageNode; +import org.apache.beam.runners.dataflow.worker.graph.Nodes.InstructionOutputNode; +import org.apache.beam.runners.dataflow.worker.graph.Nodes.Node; +import org.apache.beam.runners.dataflow.worker.graph.Nodes.ParallelInstructionNode; +import
[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle
[ https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172749 ] ASF GitHub Bot logged work on BEAM-6159: Author: ASF GitHub Bot Created on: 06/Dec/18 18:13 Start Date: 06/Dec/18 18:13 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #7015: [BEAM-6159] Migrate dataflow portable worker using shared library to process bundle URL: https://github.com/apache/beam/pull/7015#discussion_r239550773 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ProcessRemoteBundleOperation.java ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.dataflow.worker.fn.control; + +import com.google.common.collect.Iterables; +import java.io.Closeable; +import org.apache.beam.runners.dataflow.worker.util.common.worker.OperationContext; +import org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver; +import org.apache.beam.runners.dataflow.worker.util.common.worker.ReceivingOperation; +import org.apache.beam.runners.fnexecution.control.BundleProgressHandler; +import org.apache.beam.runners.fnexecution.control.OutputReceiverFactory; +import org.apache.beam.runners.fnexecution.control.RemoteBundle; +import org.apache.beam.runners.fnexecution.control.StageBundleFactory; +import org.apache.beam.runners.fnexecution.state.StateRequestHandler; +import org.apache.beam.sdk.fn.data.FnDataReceiver; +import org.apache.beam.sdk.util.WindowedValue; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * This {@link org.apache.beam.runners.dataflow.worker.util.common.worker.Operation} is responsible + * for communicating with the SDK harness and asking it to process a bundle of work. This operation + * request a RemoteBundle{@link org.apache.beam.runners.fnexecution.control.RemoteBundle}, send data + * elements to SDK and receive processed results from SDK, then pass these elements to next + * Operations. + */ Review comment: In a follow up PR: ``` * This {@link org.apache.beam.runners.dataflow.worker.util.common.worker.Operation} is responsible * for communicating with the SDK harness and asking it to process a bundle of work. This operation * requests a {@link org.apache.beam.runners.fnexecution.control.RemoteBundle}, sends * elements to SDK and receive processed results from SDK, passing these elements downstream. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172749) Time Spent: 1h 10m (was: 1h) > Dataflow portable runner harness should use ExecutableStage to process bundle > - > > Key: BEAM-6159 > URL: https://issues.apache.org/jira/browse/BEAM-6159 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle
[ https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172748=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172748 ] ASF GitHub Bot logged work on BEAM-6159: Author: ASF GitHub Bot Created on: 06/Dec/18 18:13 Start Date: 06/Dec/18 18:13 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #7015: [BEAM-6159] Migrate dataflow portable worker using shared library to process bundle URL: https://github.com/apache/beam/pull/7015#discussion_r239557879 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironmentFactory.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.fnexecution.environment; + +import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; +import org.apache.beam.runners.fnexecution.GrpcFnServer; +import org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService; +import org.apache.beam.runners.fnexecution.control.ControlClientPool; +import org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService; +import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler; +import org.apache.beam.runners.fnexecution.logging.GrpcLoggingService; +import org.apache.beam.runners.fnexecution.provisioning.StaticGrpcProvisionService; +import org.apache.beam.sdk.fn.IdGenerator; + +/** + * An {@link EnvironmentFactory} that creates StaticRemoteEnvironment used by Dataflow runner Review comment: In a follow up PR: this isn't specific to Dataflow; anyone should be able to use an already existing InstructionRequestHandler if they choose. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172748) Time Spent: 1h 10m (was: 1h) > Dataflow portable runner harness should use ExecutableStage to process bundle > - > > Key: BEAM-6159 > URL: https://issues.apache.org/jira/browse/BEAM-6159 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle
[ https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172747=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172747 ] ASF GitHub Bot logged work on BEAM-6159: Author: ASF GitHub Bot Created on: 06/Dec/18 18:13 Start Date: 06/Dec/18 18:13 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #7015: [BEAM-6159] Migrate dataflow portable worker using shared library to process bundle URL: https://github.com/apache/beam/pull/7015#discussion_r23962 ## File path: runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/IntrinsicMapTaskExecutorFactoryTest.java ## @@ -174,8 +174,8 @@ public void testCreateMapTaskExecutor() throws Exception { mapTaskExecutorFactory.create( null /* beamFnControlClientHandler */, null /* beamFnDataService */, -null, /* dataApiServiceDescriptor */ null /* beamFnStateService */, +null, Review comment: nit: in a follow up PR add the comment stating what the `null` represents. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172747) Time Spent: 1h 10m (was: 1h) > Dataflow portable runner harness should use ExecutableStage to process bundle > - > > Key: BEAM-6159 > URL: https://issues.apache.org/jira/browse/BEAM-6159 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle
[ https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172754=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172754 ] ASF GitHub Bot logged work on BEAM-6159: Author: ASF GitHub Bot Created on: 06/Dec/18 18:13 Start Date: 06/Dec/18 18:13 Worklog Time Spent: 10m Work Description: lukecwik closed pull request #7015: [BEAM-6159] Migrate dataflow portable worker using shared library to process bundle URL: https://github.com/apache/beam/pull/7015 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java index 3feb7064b895..1d1c56cb80f3 100644 --- a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java +++ b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/portable/ReferenceRunner.java @@ -197,7 +197,7 @@ public void execute() throws Exception { EnvironmentFactory environmentFactory = createEnvironmentFactory(control, logging, artifact, provisioning, controlClientPool); JobBundleFactory jobBundleFactory = - SingleEnvironmentInstanceJobBundleFactory.create(environmentFactory, data, state); + SingleEnvironmentInstanceJobBundleFactory.create(environmentFactory, data, state, null); TransformEvaluatorRegistry transformRegistry = TransformEvaluatorRegistry.portableRegistry( diff --git a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/RemoteStageEvaluatorFactoryTest.java b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/RemoteStageEvaluatorFactoryTest.java index f0f877283154..2dfaac966827 100644 --- 
a/runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/RemoteStageEvaluatorFactoryTest.java +++ b/runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/RemoteStageEvaluatorFactoryTest.java @@ -111,7 +111,7 @@ public void setup() throws Exception { bundleFactory = ImmutableListBundleFactory.create(); JobBundleFactory jobBundleFactory = SingleEnvironmentInstanceJobBundleFactory.create( -environmentFactory, dataServer, stateServer); +environmentFactory, dataServer, stateServer, null); factory = new RemoteStageEvaluatorFactory(bundleFactory, jobBundleFactory); } diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java index 1433702f714d..f0e8ddb73456 100644 --- a/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java +++ b/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/BatchDataflowWorker.java @@ -37,6 +37,7 @@ import org.apache.beam.runners.dataflow.worker.apiary.FixMultiOutputInfosOnParDoInstructions; import org.apache.beam.runners.dataflow.worker.counters.CounterSet; import org.apache.beam.runners.dataflow.worker.graph.CloneAmbiguousFlattensFunction; +import org.apache.beam.runners.dataflow.worker.graph.CreateExecutableStageNodeFunction; import org.apache.beam.runners.dataflow.worker.graph.CreateRegisterFnOperationFunction; import org.apache.beam.runners.dataflow.worker.graph.DeduceFlattenLocationsFunction; import org.apache.beam.runners.dataflow.worker.graph.DeduceNodeLocationsFunction; @@ -218,18 +219,32 @@ protected BatchDataflowWorker( // TODO: this conditional -> two implementations of common interface, or // param/injection if (DataflowRunner.hasExperiment(options, "beam_fn_api")) { - Function, Node> 
sdkFusedStage = - pipeline == null - ? RegisterNodeFunction.withoutPipeline( - idGenerator, sdkHarnessRegistry.beamFnStateApiServiceDescriptor()) - : RegisterNodeFunction.forPipeline( - pipeline, idGenerator, sdkHarnessRegistry.beamFnStateApiServiceDescriptor()); + Function, MutableNetwork> transformToRunnerNetwork; + Function, Node> sdkFusedStage; Function, MutableNetwork> lengthPrefixUnknownCoders = LengthPrefixUnknownCoders::forSdkNetwork; - Function, MutableNetwork> transformToRunnerNetwork = - new CreateRegisterFnOperationFunction( - idGenerator, this::createPortNode,
[jira] [Work logged] (BEAM-6159) Dataflow portable runner harness should use ExecutableStage to process bundle
[ https://issues.apache.org/jira/browse/BEAM-6159?focusedWorklogId=172752=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172752 ] ASF GitHub Bot logged work on BEAM-6159: Author: ASF GitHub Bot Created on: 06/Dec/18 18:13 Start Date: 06/Dec/18 18:13 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #7015: [BEAM-6159] Migrate dataflow portable worker using shared library to process bundle URL: https://github.com/apache/beam/pull/7015#discussion_r239556452 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironment.java ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.fnexecution.environment; + +import net.jcip.annotations.ThreadSafe; +import org.apache.beam.model.pipeline.v1.RunnerApi.Environment; +import org.apache.beam.runners.fnexecution.control.InstructionRequestHandler; + +/** A {@link RemoteEnvironment} that connects to Dataflow runner harness. 
*/ +@ThreadSafe +public class StaticRemoteEnvironment implements RemoteEnvironment { + + static StaticRemoteEnvironment create( + Environment environment, InstructionRequestHandler instructionRequestHandler) { +return new StaticRemoteEnvironment(environment, instructionRequestHandler); + } + + private final Object lock = new Object(); + private final Environment environment; + private final InstructionRequestHandler instructionRequestHandler; + + private boolean isClosed = false; + + private StaticRemoteEnvironment( + Environment environment, InstructionRequestHandler instructionRequestHandler) { +this.environment = environment; +this.instructionRequestHandler = instructionRequestHandler; + } + + @Override + public Environment getEnvironment() { +return this.environment; + } + + @Override + public InstructionRequestHandler getInstructionRequestHandler() { +return this.instructionRequestHandler; + } + + @Override + public void close() throws Exception { Review comment: nit: in a follow up PR, instead of using a lock object, make this method synchronized. Also prefer guard style statements: ``` if (isClosed) { return; } ... do other stuff to perform close ... ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172752) > Dataflow portable runner harness should use ExecutableStage to process bundle > - > > Key: BEAM-6159 > URL: https://issues.apache.org/jira/browse/BEAM-6159 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
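The review comment on `StaticRemoteEnvironment.close()` above asks for a synchronized method with a guard-style early return instead of a separate lock object. A minimal sketch of that shape follows. It is not the code from the PR: the class name `GuardedCloseSketch`, the `delegate` field, and the `closeTwiceAndCount` helper are hypothetical, and the sketch assumes that closing means closing one wrapped resource.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an environment that wraps one closeable resource. */
class GuardedCloseSketch implements AutoCloseable {
  // Assumed resource to release on close; not a name taken from the PR.
  private final AutoCloseable delegate;
  private boolean isClosed = false;

  GuardedCloseSketch(AutoCloseable delegate) {
    this.delegate = delegate;
  }

  // The method itself is synchronized, so no dedicated lock object is needed.
  @Override
  public synchronized void close() throws Exception {
    // Guard-style statement: return early if the work was already done.
    if (isClosed) {
      return;
    }
    isClosed = true;
    delegate.close();
  }

  /** Closes the same instance twice and reports how often the delegate was closed. */
  static int closeTwiceAndCount() throws Exception {
    AtomicInteger calls = new AtomicInteger();
    GuardedCloseSketch env = new GuardedCloseSketch(calls::incrementAndGet);
    env.close();
    env.close();
    return calls.get();
  }
}
```

Setting `isClosed` before calling the delegate means a failing delegate is not retried on a second `close()`; whether that is the desired behaviour is a design choice the review comment does not settle.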
[jira] [Created] (BEAM-6189) Deprecate and cleanup BeamFnApi.Metrics from Dataflow Worker
Mikhail Gryzykhin created BEAM-6189: --- Summary: Deprecate and cleanup BeamFnApi.Metrics from Dataflow Worker Key: BEAM-6189 URL: https://issues.apache.org/jira/browse/BEAM-6189 Project: Beam Issue Type: Bug Components: java-fn-execution Reporter: Mikhail Gryzykhin The current Dataflow Worker code contains multiple references to the Metrics proto. Some of these references implement a fallback for cases where the migration to MonitoringInfo has not happened yet. Once all SDKs have migrated to the new approach for reporting monitoring information, we need to clean up the Worker code and completely deprecate the Metrics proto. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172714 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 17:24 Start Date: 06/Dec/18 17:24 Worklog Time Spent: 10m Work Description: Ardagan commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239541582 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -351,7 +398,115 @@ void updateProgress() { } } -private void updateMetrics(BeamFnApi.Metrics metrics) { +// Keeping this as static class for this iteration. Will extract to separate file and generalize +// when more counter types are added. +// todomigryz: define counter transformer factory +// that can provide respective counter transformer for different type of counters. +// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc) +private static class MonitoringInfoToCounterUpdateTransformer { + + private final Map transformIdMapping; + + public MonitoringInfoToCounterUpdateTransformer( + final Map transformIdMapping) { +this.transformIdMapping = transformIdMapping; + } + + // todo: search code for "beam:metrics"... and replace them with relevant enums from + // proto after rebasing above https://github.com/apache/beam/pull/6799 that + // introduces relevant proto entries. + final String BEAM_METRICS_USER_PREFIX = "beam:metric:user"; + + private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) { +long value = monitoringInfo.getMetric().getCounterData().getInt64Value(); +String urn = monitoringInfo.getUrn(); + +String type = monitoringInfo.getType(); + +// todomigryz: run MonitoringInfo through validation process. 
+// refer to https://github.com/apache/beam/pull/6799 + +if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) { + if (!type.equals("beam:metrics:sum_int_64")) { +LOG.warn( Review comment: I believe that logging a warning is a good balance here. Most of the time you want your job to process data and use counters only for monitoring. This approach lets SDK authors find errors in their SDK implementations on the one hand, without breaking user pipelines on the other. @ajamato Alex, what's your opinion on this topic? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172714) Time Spent: 5h 10m (was: 5h) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > The new approach to reporting metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in the Python SDK and work is ongoing in the Java SDK. > This task includes plumbing user metrics reported via MetricInfos through > the Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
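The trade-off discussed in the review comment above (log a warning for an unexpected counter type rather than fail the job) could look roughly like the sketch below. It is an illustration, not the PR's code: the class and method names are hypothetical, `java.util.logging` stands in for the slf4j logger used in the worker, and the two string constants are copied from the diff, where a todo notes that they are expected to be replaced by proto enums later.

```java
import java.util.Optional;
import java.util.logging.Logger;

/** Hypothetical helper showing "warn and skip" handling of user counters. */
class UserCounterTypeCheckSketch {
  private static final Logger LOG =
      Logger.getLogger(UserCounterTypeCheckSketch.class.getName());

  // String constants as they appear in the diff under review.
  static final String BEAM_METRICS_USER_PREFIX = "beam:metric:user";
  static final String SUM_INT64_TYPE = "beam:metrics:sum_int_64";

  /**
   * Returns the counter value for a well-formed user counter. Returns empty for
   * non-user URNs and for user URNs with an unexpected type; the latter also logs
   * a warning so SDK authors can spot the problem without the pipeline failing.
   */
  static Optional<Long> userCounterValue(String urn, String type, long value) {
    if (!urn.startsWith(BEAM_METRICS_USER_PREFIX)) {
      return Optional.empty();
    }
    if (!type.equals(SUM_INT64_TYPE)) {
      LOG.warning("Unsupported type " + type + " for user counter " + urn + "; skipping.");
      return Optional.empty();
    }
    return Optional.of(value);
  }
}
```

Returning an empty `Optional` keeps the caller's loop over monitoring infos running, which is the "do not break user pipelines" half of the argument; the warning is the "let SDK authors find errors" half.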
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172708=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172708 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 17:20 Start Date: 06/Dec/18 17:20 Worklog Time Spent: 10m Work Description: Ardagan commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239540159 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -351,7 +398,115 @@ void updateProgress() { } } -private void updateMetrics(BeamFnApi.Metrics metrics) { +// Keeping this as static class for this iteration. Will extract to separate file and generalize +// when more counter types are added. +// todomigryz: define counter transformer factory +// that can provide respective counter transformer for different type of counters. +// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc) +private static class MonitoringInfoToCounterUpdateTransformer { + + private final Map transformIdMapping; + + public MonitoringInfoToCounterUpdateTransformer( + final Map transformIdMapping) { +this.transformIdMapping = transformIdMapping; + } + + // todo: search code for "beam:metrics"... and replace them with relevant enums from Review comment: This depends on https://github.com/apache/beam/pull/6799 Will be implemented in further PRs. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172708) Time Spent: 4h 50m (was: 4h 40m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. 
> --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > The new approach to reporting metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in the Python SDK and work is ongoing in the Java SDK. > This task includes plumbing user metrics reported via MetricInfos through > the Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172710 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 17:20 Start Date: 06/Dec/18 17:20 Worklog Time Spent: 10m Work Description: Ardagan commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239538937 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -351,7 +398,115 @@ void updateProgress() { } } -private void updateMetrics(BeamFnApi.Metrics metrics) { +// Keeping this as static class for this iteration. Will extract to separate file and generalize +// when more counter types are added. +// todomigryz: define counter transformer factory +// that can provide respective counter transformer for different type of counters. +// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc) +private static class MonitoringInfoToCounterUpdateTransformer { + + private final Map transformIdMapping; + + public MonitoringInfoToCounterUpdateTransformer( + final Map transformIdMapping) { +this.transformIdMapping = transformIdMapping; + } + + // todo: search code for "beam:metrics"... and replace them with relevant enums from + // proto after rebasing above https://github.com/apache/beam/pull/6799 that + // introduces relevant proto entries. + final String BEAM_METRICS_USER_PREFIX = "beam:metric:user"; + + private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) { +long value = monitoringInfo.getMetric().getCounterData().getInt64Value(); +String urn = monitoringInfo.getUrn(); + +String type = monitoringInfo.getType(); + +// todomigryz: run MonitoringInfo through validation process. 
Review comment: This requires the PR mentioned on the next line (https://github.com/apache/beam/pull/6799) to go in. Technically, I can copy data from that PR, but the format of counters is still being defined, and copying would cause more merging and conflict resolution in both PRs. I suggest we keep it as is for this PR and generalize/fix it in a following PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172710) Time Spent: 5h (was: 4h 50m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > The new approach to reporting metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in the Python SDK and work is ongoing in the Java SDK. > This task includes plumbing user metrics reported via MetricInfos through > the Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172706=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172706 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 17:16 Start Date: 06/Dec/18 17:16 Worklog Time Spent: 10m Work Description: Ardagan commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239538937 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -351,7 +398,115 @@ void updateProgress() { } } -private void updateMetrics(BeamFnApi.Metrics metrics) { +// Keeping this as static class for this iteration. Will extract to separate file and generalize +// when more counter types are added. +// todomigryz: define counter transformer factory +// that can provide respective counter transformer for different type of counters. +// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc) +private static class MonitoringInfoToCounterUpdateTransformer { + + private final Map transformIdMapping; + + public MonitoringInfoToCounterUpdateTransformer( + final Map transformIdMapping) { +this.transformIdMapping = transformIdMapping; + } + + // todo: search code for "beam:metrics"... and replace them with relevant enums from + // proto after rebasing above https://github.com/apache/beam/pull/6799 that + // introduces relevant proto entries. + final String BEAM_METRICS_USER_PREFIX = "beam:metric:user"; + + private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) { +long value = monitoringInfo.getMetric().getCounterData().getInt64Value(); +String urn = monitoringInfo.getUrn(); + +String type = monitoringInfo.getType(); + +// todomigryz: run MonitoringInfo through validation process. 
Review comment: This requires the PR mentioned below to go in. Technically, I can copy data from that PR, but the format of counters is still being defined, and copying would cause more merging and conflict resolution in both PRs. I suggest we keep it as is for this PR and generalize/fix it in a following PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172706) Time Spent: 4h 40m (was: 4.5h) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > The new approach to reporting metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in the Python SDK and work is ongoing in the Java SDK. > This task includes plumbing user metrics reported via MetricInfos through > the Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6067) Dataflow runner should include portable pipeline coder id in CloudObject coder representation
[ https://issues.apache.org/jira/browse/BEAM-6067?focusedWorklogId=172701=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172701 ] ASF GitHub Bot logged work on BEAM-6067: Author: ASF GitHub Bot Created on: 06/Dec/18 17:08 Start Date: 06/Dec/18 17:08 Worklog Time Spent: 10m Work Description: robertwb commented on issue #7081: [BEAM-6067] In Python SDK, specify pipeline_proto_coder_id property in non-Beam-standard CloudObject coders URL: https://github.com/apache/beam/pull/7081#issuecomment-444950367 Tests look good. Done. Issue Time Tracking --- Worklog Id: (was: 172701) Time Spent: 7.5h (was: 7h 20m) Remaining Estimate: 160.5h (was: 160h 40m) > Dataflow runner should include portable pipeline coder id in CloudObject > coder representation > - > > Key: BEAM-6067 > URL: https://issues.apache.org/jira/browse/BEAM-6067 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Craig Chambers >Assignee: Craig Chambers >Priority: Major > Original Estimate: 168h > Time Spent: 7.5h > Remaining Estimate: 160.5h > > When translating a BeamJava Coder into the DataflowRunner's CloudObject > property map, include a property that specifies the id in the Beam model > Pipeline coders map corresponding to that Coder. This will allow the > DataflowRunner to reference the corresponding Beam coder in the FnAPI > processing bundle.
[jira] [Work logged] (BEAM-6067) Dataflow runner should include portable pipeline coder id in CloudObject coder representation
[ https://issues.apache.org/jira/browse/BEAM-6067?focusedWorklogId=172700=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172700 ] ASF GitHub Bot logged work on BEAM-6067: Author: ASF GitHub Bot Created on: 06/Dec/18 17:08 Start Date: 06/Dec/18 17:08 Worklog Time Spent: 10m Work Description: robertwb closed pull request #7081: [BEAM-6067] In Python SDK, specify pipeline_proto_coder_id property in non-Beam-standard CloudObject coders URL: https://github.com/apache/beam/pull/7081 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/python/apache_beam/coders/coders.py b/sdks/python/apache_beam/coders/coders.py index f5c90a8ec99e..f2a4b2ee724a 100644 --- a/sdks/python/apache_beam/coders/coders.py +++ b/sdks/python/apache_beam/coders/coders.py @@ -190,7 +190,7 @@ def _get_component_coders(self): # refined in user-defined Coders. return [] - def as_cloud_object(self): + def as_cloud_object(self, coders_context=None): """For internal use only; no backwards-compatibility guarantees. Returns Google Cloud Dataflow API description of this coder.""" @@ -201,12 +201,17 @@ def as_cloud_object(self): # We pass coders in the form "$" to make the # job description JSON more readable. Data before the $ is ignored by # the worker. 
-'@type': serialize_coder(self), -'component_encodings': list( -component.as_cloud_object() +'@type': +serialize_coder(self), +'component_encodings': [ +component.as_cloud_object(coders_context) for component in self._get_component_coders() -), +], } + +if coders_context: + value['pipeline_proto_coder_id'] = coders_context.get_id(self) + return value def __repr__(self): @@ -370,7 +375,7 @@ def _create_impl(self): def is_deterministic(self): return True - def as_cloud_object(self): + def as_cloud_object(self, coders_context=None): return { '@type': 'kind:bytes', } @@ -394,7 +399,7 @@ def _create_impl(self): def is_deterministic(self): return True - def as_cloud_object(self): + def as_cloud_object(self, coders_context=None): return { '@type': 'kind:varint', } @@ -516,8 +521,8 @@ def is_deterministic(self): # GroupByKey operations. return False - def as_cloud_object(self, is_pair_like=True): -value = super(_PickleCoderBase, self).as_cloud_object() + def as_cloud_object(self, coders_context=None, is_pair_like=True): +value = super(_PickleCoderBase, self).as_cloud_object(coders_context) # We currently use this coder in places where we cannot infer the coder to # use for the value type in a more granular way. 
In places where the # service expects a pair, it checks for the "is_pair_like" key, in which @@ -525,8 +530,8 @@ def as_cloud_object(self, is_pair_like=True): if is_pair_like: value['is_pair_like'] = True value['component_encodings'] = [ - self.as_cloud_object(is_pair_like=False), - self.as_cloud_object(is_pair_like=False) + self.as_cloud_object(coders_context, is_pair_like=False), + self.as_cloud_object(coders_context, is_pair_like=False) ] return value @@ -615,8 +620,8 @@ def as_deterministic_coder(self, step_label, error_message=None): else: return DeterministicFastPrimitivesCoder(self, step_label) - def as_cloud_object(self, is_pair_like=True): -value = super(FastCoder, self).as_cloud_object() + def as_cloud_object(self, coders_context=None, is_pair_like=True): +value = super(FastCoder, self).as_cloud_object(coders_context) # We currently use this coder in places where we cannot infer the coder to # use for the value type in a more granular way. In places where the # service expects a pair, it checks for the "is_pair_like" key, in which @@ -624,8 +629,8 @@ def as_cloud_object(self, is_pair_like=True): if is_pair_like: value['is_pair_like'] = True value['component_encodings'] = [ - self.as_cloud_object(is_pair_like=False), - self.as_cloud_object(is_pair_like=False) + self.as_cloud_object(coders_context, is_pair_like=False), + self.as_cloud_object(coders_context, is_pair_like=False) ] return value @@ -744,18 +749,20 @@ def as_deterministic_coder(self, step_label, error_message=None): def from_type_hint(typehint, registry): return TupleCoder([registry.get_coder(t) for t in typehint.tuple_types]) - def as_cloud_object(self): + def as_cloud_object(self, coders_context=None):
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172698 ]

ASF GitHub Bot logged work on BEAM-6178:
----------------------------------------

Author: ASF GitHub Bot
Created on: 06/Dec/18 17:02
Start Date: 06/Dec/18 17:02
Worklog Time Spent: 10m
Work Description: garrettjonesgoogle commented on issue #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
URL: https://github.com/apache/beam/pull/7197#issuecomment-444948074

*sigh* it was clean 2 days ago. I will resolve the conflicts.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 172698)
Time Spent: 40m (was: 0.5h)

> Create BOM for Beam
> -------------------
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Affects Versions: 2.10.0
> Reporter: Garrett Jones
> Assignee: Garrett Jones
> Priority: Critical
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6067) Dataflow runner should include portable pipeline coder id in CloudObject coder representation
[ https://issues.apache.org/jira/browse/BEAM-6067?focusedWorklogId=172696=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172696 ] ASF GitHub Bot logged work on BEAM-6067: Author: ASF GitHub Bot Created on: 06/Dec/18 17:00 Start Date: 06/Dec/18 17:00 Worklog Time Spent: 10m Work Description: CraigChambersG commented on issue #7081: [BEAM-6067] In Python SDK, specify pipeline_proto_coder_id property in non-Beam-standard CloudObject coders URL: https://github.com/apache/beam/pull/7081#issuecomment-444947320 Is there something I need to do at this point to get this PR checked in? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172696) Time Spent: 7h 10m (was: 7h) Remaining Estimate: 160h 50m (was: 161h) > Dataflow runner should include portable pipeline coder id in CloudObject > coder representation > - > > Key: BEAM-6067 > URL: https://issues.apache.org/jira/browse/BEAM-6067 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Craig Chambers >Assignee: Craig Chambers >Priority: Major > Original Estimate: 168h > Time Spent: 7h 10m > Remaining Estimate: 160h 50m > > When translating a BeamJava Coder into the DataflowRunner's CloudObject > property map, include a property that specifies the id in the Beam model > Pipeline coders map corresponding to that Coder. This will allow the > DataflowRunner to reference the corresponding Beam coder in the FnAPI > processing bundle. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
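The change described in this issue (attach the pipeline coder id to each CloudObject coder description, and pass the context down to component coders) can be illustrated with a minimal, self-contained Python sketch. `ToyCoder` and `ToyCodersContext` are hypothetical stand-ins invented for illustration, not Beam classes; only the `as_cloud_object(coders_context=None)` signature, the `get_id` call, and the `pipeline_proto_coder_id` key are taken from the diff in PR #7081.

```python
class ToyCodersContext:
    """Hypothetical stand-in for the pipeline's coder-id registry."""

    def __init__(self):
        self._ids = {}

    def get_id(self, coder):
        # Hand out a stable id the first time a coder object is seen.
        key = id(coder)
        if key not in self._ids:
            self._ids[key] = 'coder_%d' % len(self._ids)
        return self._ids[key]


class ToyCoder:
    """Hypothetical coder with a type name and optional component coders."""

    def __init__(self, name, components=()):
        self.name = name
        self.components = list(components)

    def as_cloud_object(self, coders_context=None):
        value = {
            '@type': self.name,
            # The context is threaded through to every component coder.
            'component_encodings': [
                component.as_cloud_object(coders_context)
                for component in self.components
            ],
        }
        # The id is only attached when a context was supplied, so callers
        # that pass nothing get the same dictionary shape as before.
        if coders_context:
            value['pipeline_proto_coder_id'] = coders_context.get_id(self)
        return value


context = ToyCodersContext()
inner = ToyCoder('kind:bytes')
outer = ToyCoder('toy:pair', [inner, inner])
cloud_object = outer.as_cloud_object(context)
```

Making the new parameter optional with a default of `None` is what lets existing call sites keep working unchanged, which appears to be the reason every override in the diff gains the same keyword argument.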
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172695 ]

ASF GitHub Bot logged work on BEAM-6178:
----------------------------------------

Author: ASF GitHub Bot
Created on: 06/Dec/18 16:59
Start Date: 06/Dec/18 16:59
Worklog Time Spent: 10m
Work Description: swegner commented on issue #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
URL: https://github.com/apache/beam/pull/7197#issuecomment-444947221

FYI, there are some merge conflicts. I will get started on the review, but before we can merge please resolve conflicts by either merge in master or rebase your branch on top. (`git fetch origin && git rebase origin/master`)

Issue Time Tracking
-------------------

Worklog Id: (was: 172695)
Time Spent: 0.5h (was: 20m)

> Create BOM for Beam
> -------------------
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Affects Versions: 2.10.0
> Reporter: Garrett Jones
> Assignee: Garrett Jones
> Priority: Critical
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172686=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172686 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 16:51 Start Date: 06/Dec/18 16:51 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239525848 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -351,7 +398,115 @@ void updateProgress() { } } -private void updateMetrics(BeamFnApi.Metrics metrics) { +// Keeping this as static class for this iteration. Will extract to separate file and generalize +// when more counter types are added. +// todomigryz: define counter transformer factory +// that can provide respective counter transformer for different type of counters. +// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc) +private static class MonitoringInfoToCounterUpdateTransformer { + + private final Map transformIdMapping; + + public MonitoringInfoToCounterUpdateTransformer( + final Map transformIdMapping) { +this.transformIdMapping = transformIdMapping; + } + + // todo: search code for "beam:metrics"... and replace them with relevant enums from Review comment: Should this be fixed before merging? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172686) Time Spent: 4h 10m (was: 4h) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. 
> --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This tasks includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172690&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172690 ]

ASF GitHub Bot logged work on BEAM-6181:
----------------------------------------

Author: ASF GitHub Bot
Created on: 06/Dec/18 16:51
Start Date: 06/Dec/18 16:51
Worklog Time Spent: 10m
Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239524179

## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ##

@@ -307,10 +344,17 @@ void updateProgress() {
 grpcWriteOperation.abortWait();
 }

-BeamFnApi.Metrics metrics = MoreFutures.get(bundleProcessOperation.getMetrics());
+//TODO: Replace getProcessBundleProgress with getMonitoringInfos when Metrics is deprecated.

Review comment:
+1 to tag a JIRA. It's useful to have some additional context for when the TODO inevitably gets stale and needs clarification to be actionable.

The JIRA doesn't have to be explicitly for doing this cleanup. It should track when this TODO is ready to implement or obsolete. In this case, it would make sense to have a JIRA for deprecating the legacy Metrics.

Google's public style guides [offer similar advice](https://google.github.io/styleguide/cppguide.html#TODO_Comments) (we don't follow Google styleguides in Beam, although the advice is useful):

> Use TODO comments for code that is temporary, a short-term solution, or good-enough but not perfect.
>
> TODOs should include the string TODO in all caps, followed by the name, e-mail address, bug ID, or other identifier of the person or issue with the best context about the problem referenced by the TODO. The main purpose is to have a consistent TODO that can be searched to find out how to get more details upon request. A TODO is not a commitment that the person referenced will fix the problem. Thus when you create a TODO with a name, it is almost always your name that is given.
>
> `// TODO(k...@gmail.com): Use a "*" here for concatenation operator.`
> `// TODO(Zeke) change this to use relations.`
> `// TODO(bug 12345): remove the "Last visitors" feature`
>
> If your TODO is of the form "At a future date do something" make sure that you either include a very specific date ("Fix by November 2005") or a very specific event ("Remove this code when all clients can handle XML responses.").

Issue Time Tracking
-------------------

Worklog Id: (was: 172690)
Time Spent: 4.5h (was: 4h 20m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
> -------------------------------------------------------------------------------
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
> Issue Type: Bug
> Components: java-fn-execution
> Reporter: Mikhail Gryzykhin
> Assignee: Mikhail Gryzykhin
> Priority: Major
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> New approach to report metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in Python SDK and work is ongoing in Java SDK.
> This tasks includes plumbing User metrics reported via MetricInfos through
> Dataflow Java Runner.
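The TODO convention quoted in this review comment (the string `TODO` in all caps, followed by an identifier in parentheses) lends itself to a mechanical check. The following is a rough Python sketch under stated assumptions: the function name `find_untagged_todos` and both regular expressions are invented here for illustration, and nothing in the thread indicates that Beam runs such a check.

```python
import re

# Matches the recommended form, e.g. "TODO(migryz)" or "TODO(BEAM-6181)".
TODO_WITH_IDENTIFIER = re.compile(r'\bTODO\([^)]+\)')
# Matches any word starting with "todo" in any case, e.g. "todomigryz" or "TODO:".
ANY_TODO = re.compile(r'\btodo', re.IGNORECASE)


def find_untagged_todos(lines):
    """Return (line_number, text) pairs for TODOs lacking an identifier."""
    untagged = []
    for number, line in enumerate(lines, start=1):
        if ANY_TODO.search(line) and not TODO_WITH_IDENTIFIER.search(line):
            untagged.append((number, line.strip()))
    return untagged


# Sample comments taken from the diff excerpts in this thread.
sample = [
    '// todomigryz: define counter transformer factory',
    '//TODO: Replace getProcessBundleProgress with getMonitoringInfos',
    '// TODO(migryz): utilize monitoringInfos here.',
    'long value = 0;',
]
flagged = find_untagged_todos(sample)
```

With the sample above, the first two comments are flagged and the third passes, which mirrors the `todomigryz` -> `TODO(migryz)` requests made elsewhere in the review.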
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172691 ]

ASF GitHub Bot logged work on BEAM-6181:
----------------------------------------

Author: ASF GitHub Bot
Created on: 06/Dec/18 16:51
Start Date: 06/Dec/18 16:51
Worklog Time Spent: 10m
Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner.
URL: https://github.com/apache/beam/pull/7202#discussion_r239518531

## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WorkItemStatusClient.java ##

@@ -263,20 +263,26 @@ private synchronized WorkItemStatus createStatusUpdate(boolean isFinal) {
 return status;
 }

+ // todomigryz this method should return List instead of updating member variable

Review comment:
`todomigryz` -> `TODO(migryz)`

Issue Time Tracking
-------------------

Worklog Id: (was: 172691)
Time Spent: 4.5h (was: 4h 20m)

> Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
> -------------------------------------------------------------------------------
>
> Key: BEAM-6181
> URL: https://issues.apache.org/jira/browse/BEAM-6181
> Project: Beam
> Issue Type: Bug
> Components: java-fn-execution
> Reporter: Mikhail Gryzykhin
> Assignee: Mikhail Gryzykhin
> Priority: Major
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> New approach to report metrics in FnApi is to utilize MetricInfo structures.
> This approach is implemented in Python SDK and work is ongoing in Java SDK.
> This tasks includes plumbing User metrics reported via MetricInfos through
> Dataflow Java Runner.
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172688=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172688 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 16:51 Start Date: 06/Dec/18 16:51 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239524339 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -307,10 +347,17 @@ void updateProgress() { grpcWriteOperation.abortWait(); } -BeamFnApi.Metrics metrics = MoreFutures.get(bundleProcessOperation.getMetrics()); +//TODO: Replace getProcessBundleProgress with getMonitoringInfos when Metrics is deprecated. +ProcessBundleProgressResponse processBundleProgressResponse = +MoreFutures.get(bundleProcessOperation.getProcessBundleProgress()); +updateMetrics(processBundleProgressResponse.getMonitoringInfosList()); -updateMetrics(metrics); +// Supporting deprecated metrics until all supported runners are migrated to using +// MonitoringInfos +Metrics metrics = processBundleProgressResponse.getMetrics(); +updateMetricsDeprecated(metrics); +// todomigryz: utilize monitoringInfos here. Review comment: `todomigryz` -> `TODO(migryz)` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172688) Time Spent: 4.5h (was: 4h 20m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. 
> --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This tasks includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172685=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172685 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 16:51 Start Date: 06/Dec/18 16:51 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239524498 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -351,7 +398,115 @@ void updateProgress() { } } -private void updateMetrics(BeamFnApi.Metrics metrics) { +// Keeping this as static class for this iteration. Will extract to separate file and generalize +// when more counter types are added. +// todomigryz: define counter transformer factory Review comment: `todomigryz` -> `TODO(migryz)` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172685) Time Spent: 4h (was: 3h 50m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This tasks includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. 
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172687 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 16:51 Start Date: 06/Dec/18 16:51 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239527436 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -351,7 +398,115 @@ void updateProgress() { } } -private void updateMetrics(BeamFnApi.Metrics metrics) { +// Keeping this as static class for this iteration. Will extract to separate file and generalize +// when more counter types are added. +// todomigryz: define counter transformer factory +// that can provide respective counter transformer for different type of counters. +// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc) +private static class MonitoringInfoToCounterUpdateTransformer { + + private final Map transformIdMapping; + + public MonitoringInfoToCounterUpdateTransformer( + final Map transformIdMapping) { +this.transformIdMapping = transformIdMapping; + } + + // todo: search code for "beam:metrics"... and replace them with relevant enums from + // proto after rebasing above https://github.com/apache/beam/pull/6799 that + // introduces relevant proto entries. + final String BEAM_METRICS_USER_PREFIX = "beam:metric:user"; + + private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) { +long value = monitoringInfo.getMetric().getCounterData().getInt64Value(); +String urn = monitoringInfo.getUrn(); + +String type = monitoringInfo.getType(); + +// todomigryz: run MonitoringInfo through validation process. 
+// refer to https://github.com/apache/beam/pull/6799 + +if (urn.startsWith(BEAM_METRICS_USER_PREFIX)) { + if (!type.equals("beam:metrics:sum_int_64")) { +LOG.warn( Review comment: Are there scenarios where we expect invalid metric types which should be ignored? If so, please comment when this might be the case. If a correct implementation should always send valid data, it's best to fail rather than be resilient such that it's easier to find and fix bugs. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172687) Time Spent: 4h 20m (was: 4h 10m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This tasks includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
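The trade-off raised in this review comment (log a warning and skip an unexpected metric type, or fail so that the bug surfaces) can be sketched in Python. This is an illustration only and not the Java code under review: `to_counter_update`, its `strict` flag, and the returned dictionary shape are assumptions made for the example, while the two URN strings are copied from the quoted diff.

```python
USER_METRIC_URN_PREFIX = 'beam:metric:user'
SUM_INT64_TYPE = 'beam:metrics:sum_int_64'


def to_counter_update(urn, type_urn, value, strict=True):
    """Convert one monitoring-info triple into a counter-update dict.

    Returns None for entries that are skipped. How non-user metrics are
    treated in the real runner is not shown in the thread; skipping them
    here is an assumption of this sketch.
    """
    if not urn.startswith(USER_METRIC_URN_PREFIX):
        return None
    if type_urn != SUM_INT64_TYPE:
        if strict:
            # Fail fast: a correct sender should never produce this, so an
            # exception makes the bug visible at the point it occurs.
            raise ValueError(
                'Unsupported type %r for user metric %r' % (type_urn, urn))
        # Lenient mode: drop the entry, as a warn-and-continue path would.
        return None
    return {'name': urn, 'value': value}


ok = to_counter_update('beam:metric:user:ns:count', SUM_INT64_TYPE, 5)
skipped = to_counter_update('beam:metric:user:ns:count', 'other', 5, strict=False)
```

The strict branch corresponds to the reviewer's suggestion; the lenient branch would only be appropriate if invalid types are expected in some scenario, which is exactly what the comment asks the author to document.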
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172689 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 16:51 Start Date: 06/Dec/18 16:51 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239525981 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -351,7 +398,115 @@ void updateProgress() { } } -private void updateMetrics(BeamFnApi.Metrics metrics) { +// Keeping this as static class for this iteration. Will extract to separate file and generalize +// when more counter types are added. +// todomigryz: define counter transformer factory +// that can provide respective counter transformer for different type of counters. +// (ie RowCountCounterTranformer, MSecCounterTransformer, UserCounterTransformer, etc) +private static class MonitoringInfoToCounterUpdateTransformer { + + private final Map transformIdMapping; + + public MonitoringInfoToCounterUpdateTransformer( + final Map transformIdMapping) { +this.transformIdMapping = transformIdMapping; + } + + // todo: search code for "beam:metrics"... and replace them with relevant enums from + // proto after rebasing above https://github.com/apache/beam/pull/6799 that + // introduces relevant proto entries. + final String BEAM_METRICS_USER_PREFIX = "beam:metric:user"; + + private CounterUpdate monitoringInfoToCounterUpdate(MonitoringInfo monitoringInfo) { +long value = monitoringInfo.getMetric().getCounterData().getInt64Value(); +String urn = monitoringInfo.getUrn(); + +String type = monitoringInfo.getType(); + +// todomigryz: run MonitoringInfo through validation process. 
Review comment: Should this be fixed before merging? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172689) Time Spent: 4.5h (was: 4h 20m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This tasks includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172684=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172684 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 16:51 Start Date: 06/Dec/18 16:51 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239520301 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -139,30 +150,44 @@ private AutoCloseable progressTrackerCloseable(ProgressTracker progressTracker) */ @Override public Iterable extractMetricUpdates() { +List result = progressTracker.extractCounterUpdates(); +if ((result != null) && (result.size() > 0)) { + return result; +} + +// BeamFnApi.Metrics was deprecated and replaced with more flexible MonitoringInfo. +// However some of SDKs have implementations that utilize BeamFnApi.Metrics and were not yet +// migrated to using new approach. +// Falling back to using deprecated approach until all officially supported SDKs complete migration. Review comment: Is there a JIRA that tracks this migration? More generally: is it possible to make this comment more actionable? Something of the form: `once this condition is true (all SDKs are migrated), do this action (remove this fallback)`. It should be straightforward for anybody to check on the migration status and then delete the code. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172684) Time Spent: 3h 50m (was: 3h 40m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This tasks includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6181) Utilize MetricInfo for reporting user metrics in Portable Dataflow Java Runner.
[ https://issues.apache.org/jira/browse/BEAM-6181?focusedWorklogId=172692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172692 ] ASF GitHub Bot logged work on BEAM-6181: Author: ASF GitHub Bot Created on: 06/Dec/18 16:51 Start Date: 06/Dec/18 16:51 Worklog Time Spent: 10m Work Description: swegner commented on a change in pull request #7202: [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. URL: https://github.com/apache/beam/pull/7202#discussion_r239525644 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java ## @@ -351,7 +398,115 @@ void updateProgress() { } } -private void updateMetrics(BeamFnApi.Metrics metrics) { +// Keeping this as static class for this iteration. Will extract to separate file and generalize Review comment: This first sentence comment seems to refer specifically to the PR. I suggest removing it as the rest of the comment is sufficiently descriptive. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172692) Time Spent: 4.5h (was: 4h 20m) > Utilize MetricInfo for reporting user metrics in Portable Dataflow Java > Runner. > --- > > Key: BEAM-6181 > URL: https://issues.apache.org/jira/browse/BEAM-6181 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > New approach to report metrics in FnApi is to utilize MetricInfo structures. > This approach is implemented in Python SDK and work is ongoing in Java SDK. > This tasks includes plumbing User metrics reported via MetricInfos through > Dataflow Java Runner. 
[jira] [Created] (BEAM-6188) Create unbounded synthetic source
Lukasz Gajowy created BEAM-6188:
--------------------------------

Summary: Create unbounded synthetic source
Key: BEAM-6188
URL: https://issues.apache.org/jira/browse/BEAM-6188
Project: Beam
Issue Type: Bug
Components: testing
Reporter: Lukasz Gajowy
Assignee: Lukasz Gajowy

It is needed for streaming scenarios. It should provide ways to reason about time and recovering from checkpoints.
[jira] [Work logged] (BEAM-6178) Create BOM for Beam
[ https://issues.apache.org/jira/browse/BEAM-6178?focusedWorklogId=172674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172674 ]

ASF GitHub Bot logged work on BEAM-6178:
----------------------------------------

Author: ASF GitHub Bot
Created on: 06/Dec/18 16:22
Start Date: 06/Dec/18 16:22
Worklog Time Spent: 10m
Work Description: garrettjonesgoogle commented on issue #7197: [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature
URL: https://github.com/apache/beam/pull/7197#issuecomment-444931932

R: @swegner

Issue Time Tracking
-------------------

Worklog Id: (was: 172674)
Time Spent: 20m (was: 10m)

> Create BOM for Beam
> -------------------
>
> Key: BEAM-6178
> URL: https://issues.apache.org/jira/browse/BEAM-6178
> Project: Beam
> Issue Type: Improvement
> Components: build-system
> Affects Versions: 2.10.0
> Reporter: Garrett Jones
> Assignee: Garrett Jones
> Priority: Critical
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Add a module to Beam which generates a BOM for Beam modules.
[jira] [Work logged] (BEAM-5321) Finish Python 3 porting for transforms module
[ https://issues.apache.org/jira/browse/BEAM-5321?focusedWorklogId=172675=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172675 ] ASF GitHub Bot logged work on BEAM-5321: Author: ASF GitHub Bot Created on: 06/Dec/18 16:23 Start Date: 06/Dec/18 16:23 Worklog Time Spent: 10m Work Description: RobbeSneyders commented on issue #7104: [BEAM-5321] Port transforms package to Python 3 URL: https://github.com/apache/beam/pull/7104#issuecomment-444932183 PTAL This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172675) Time Spent: 2h 40m (was: 2.5h) > Finish Python 3 porting for transforms module > - > > Key: BEAM-5321 > URL: https://issues.apache.org/jira/browse/BEAM-5321 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Robbe >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6187) Drop Scala suffix of FlinkRunner artifacts
Maximilian Michels created BEAM-6187:
---
Summary: Drop Scala suffix of FlinkRunner artifacts
Key: BEAM-6187
URL: https://issues.apache.org/jira/browse/BEAM-6187
Project: Beam
Issue Type: Improvement
Components: runner-flink
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Fix For: 2.10.0

With BEAM-5419 we will build multiple versions of the Flink Runner against different Flink versions. The new artifacts will lead to confusing names like {{beam-runners-flink1.5_2.11}}. I think it is time to drop the Scala suffix and just build against the most stable Flink Scala version.

Projects like Scio have the option to cross-compile to different Scala versions.
[jira] [Commented] (BEAM-6187) Drop Scala suffix of FlinkRunner artifacts
[ https://issues.apache.org/jira/browse/BEAM-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711654#comment-16711654 ] Maximilian Michels commented on BEAM-6187: -- What do you think [~thw]? > Drop Scala suffix of FlinkRunner artifacts > -- > > Key: BEAM-6187 > URL: https://issues.apache.org/jira/browse/BEAM-6187 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.10.0 > > > With BEAM-5419 we will build multiple versions of the Flink Runner against > different Flink versions. The new artifacts will lead to confusing names like > {{beam-runners-flink1.5_2.11}}. I think it is time to drop the Scala suffix > and just build against the most stable Flink Scala version. > Projects like Scio have the option to cross-compile to different Scala > versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6185) Upgrade to Spark 2.4.0
[ https://issues.apache.org/jira/browse/BEAM-6185?focusedWorklogId=172673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172673 ] ASF GitHub Bot logged work on BEAM-6185: Author: ASF GitHub Bot Created on: 06/Dec/18 16:18 Start Date: 06/Dec/18 16:18 Worklog Time Spent: 10m Work Description: iemejia commented on issue #7216: [BEAM-6185] Upgrade to Spark 2.4.0 URL: https://github.com/apache/beam/pull/7216#issuecomment-444930062 Run Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172673) Time Spent: 20m (was: 10m) > Upgrade to Spark 2.4.0 > -- > > Key: BEAM-6185 > URL: https://issues.apache.org/jira/browse/BEAM-6185 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.10.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5410) Fail SpannerIO early for unsupported streaming mode
[ https://issues.apache.org/jira/browse/BEAM-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niel Markwick resolved BEAM-5410. - Resolution: Fixed Fix Version/s: 2.9.0 Fixed by BEAM-4796 in 2.9.0 > Fail SpannerIO early for unsupported streaming mode > --- > > Key: BEAM-5410 > URL: https://issues.apache.org/jira/browse/BEAM-5410 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.8.0 >Reporter: Chamikara Jayalath >Assignee: Chamikara Jayalath >Priority: Major > Fix For: 2.9.0 > > > Currently SpannerIO does not support streaming mode. We should fail with a > clear error till this is fixed and also update documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6186) Cleanup FnApiRunner optimization phases.
Robert Bradshaw created BEAM-6186:
---
Summary: Cleanup FnApiRunner optimization phases.
Key: BEAM-6186
URL: https://issues.apache.org/jira/browse/BEAM-6186
Project: Beam
Issue Type: Improvement
Components: sdk-py-core
Reporter: Robert Bradshaw
Assignee: Ahmet Altay

The optimization phases are currently expressed as functions with closures. It would be good to pull them out with explicit dependencies, both to make the code easier to follow and to make the phases testable and reusable.
[jira] [Work logged] (BEAM-5859) Improve Traceability of Pipeline translation
[ https://issues.apache.org/jira/browse/BEAM-5859?focusedWorklogId=172650=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172650 ] ASF GitHub Bot logged work on BEAM-5859: Author: ASF GitHub Bot Created on: 06/Dec/18 14:17 Start Date: 06/Dec/18 14:17 Worklog Time Spent: 10m Work Description: mxm closed pull request #7208: [BEAM-5859] Better handle fused composite stage names. URL: https://github.com/apache/beam/pull/7208 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java index 09d1a991c279..4fe4a0f3669e 100644 --- a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java +++ b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ExecutableStageTranslation.java @@ -19,7 +19,15 @@ import static com.google.common.base.Preconditions.checkArgument; +import com.google.common.base.Joiner; +import com.google.common.collect.Iterables; +import com.google.common.collect.LinkedHashMultimap; +import com.google.common.collect.Multimap; import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Map; +import java.util.stream.Collectors; import org.apache.beam.model.pipeline.v1.RunnerApi; import org.apache.beam.model.pipeline.v1.RunnerApi.ExecutableStagePayload; import org.apache.beam.runners.core.construction.graph.ExecutableStage; @@ -45,19 +53,80 @@ public static String generateNameFromStagePayload(ExecutableStagePayload stagePa RunnerApi.Components 
components = stagePayload.getComponents(); final int transformsCount = stagePayload.getTransformsCount(); sb.append("[").append(transformsCount).append("]"); -sb.append("{"); +Collection<String> names = new ArrayList<>(); for (int i = 0; i < transformsCount; i++) { String name = components.getTransformsOrThrow(stagePayload.getTransforms(i)).getUniqueName(); - // Python: Remove the 'ref_AppliedPTransform_' prefix which just makes the name longer - name = name.replaceFirst("^ref_AppliedPTransform_", ""); // Java: Remove the 'ParMultiDo(Anonymous)' suffix which just makes the name longer name = name.replaceFirst("/ParMultiDo\\(Anonymous\\)$", ""); - sb.append(name); - if (i + 1 < transformsCount) { -sb.append(", "); - } + names.add(name); } -sb.append("}"); +sb.append(generateNameFromTransformNames(names, true)); return sb.toString(); } + + /** + * Creates a human-readable name for a set of stage names that occur in a single stage. + * + * This name reflects the nested structure of the stages, as inferred by slashes in the stage + * names. Sibling stages will be listed as {A, B}, nested stages as A/B, and according to the + * value of truncateSiblingComposites the nesting stops at the first level that siblings are + * encountered. + * + * This is best understood via examples, of which there are several in the tests for this + * class. + * + * @param names a list of full stage names in this fused operation + * @param truncateSiblingComposites whether to recursively descend into composite operations that + * have siblings, or stop the recursion at that level. 
+ * @return a single string representation of all the stages in this fused operation + */ + public static String generateNameFromTransformNames( + Collection<String> names, boolean truncateSiblingComposites) { +Multimap<String, String> groupByOuter = LinkedHashMultimap.create(); +for (String name : names) { + int index = name.indexOf('/'); + if (index == -1) { +groupByOuter.put(name, ""); + } else { +groupByOuter.put(name.substring(0, index), name.substring(index + 1)); + } +} +if (groupByOuter.keySet().size() == 1) { + Map.Entry<String, Collection<String>> outer = + Iterables.getOnlyElement(groupByOuter.asMap().entrySet()); + if (outer.getValue().size() == 1 && outer.getValue().contains("")) { +// Names consisted of a single name without any slashes. +return outer.getKey(); + } else { +// Everything is in the same outer stage, enumerate at one level down. +return String.format( +"%s/%s", +outer.getKey(), +generateNameFromTransformNames(outer.getValue(), truncateSiblingComposites)); + } +} else { + Collection<String> parts; +
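The diff above is cut off before the sibling branch, so the naming scheme described in its Javadoc (siblings as {A, B}, nesting as A/B) may be easier to follow in a stand-alone form. The following is only a minimal sketch using the JDK alone instead of Guava. The class name `StageNameSketch` and method name `nameFor` are hypothetical, and the handling of the multi-sibling case is an assumption based on the Javadoc, not the code that was merged.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-alone sketch; not the class from the pull request. */
final class StageNameSketch {

  static String nameFor(Collection<String> names, boolean truncateSiblingComposites) {
    // Group every name by its first path segment, preserving insertion order.
    Map<String, Set<String>> groupByOuter = new LinkedHashMap<>();
    for (String name : names) {
      int index = name.indexOf('/');
      String outer = index == -1 ? name : name.substring(0, index);
      String inner = index == -1 ? "" : name.substring(index + 1);
      groupByOuter.computeIfAbsent(outer, k -> new LinkedHashSet<>()).add(inner);
    }

    if (groupByOuter.size() == 1) {
      Map.Entry<String, Set<String>> only = groupByOuter.entrySet().iterator().next();
      if (only.getValue().size() == 1 && only.getValue().contains("")) {
        // A single name without any slashes.
        return only.getKey();
      }
      // Everything shares one outer stage: enumerate one level down.
      return only.getKey() + "/" + nameFor(only.getValue(), truncateSiblingComposites);
    }

    // ASSUMPTION: this branch is not visible in the truncated diff.
    // Siblings are listed in braces; when truncating, only the outer names are kept.
    List<String> parts = new ArrayList<>();
    for (Map.Entry<String, Set<String>> entry : groupByOuter.entrySet()) {
      if (truncateSiblingComposites) {
        parts.add(entry.getKey());
      } else {
        String inner = nameFor(entry.getValue(), truncateSiblingComposites);
        parts.add(inner.isEmpty() ? entry.getKey() : entry.getKey() + "/" + inner);
      }
    }
    return "{" + String.join(", ", parts) + "}";
  }
}
```

Under these assumptions, the names `A/B` and `A/C` collapse to `A/{B, C}`, while `A/B` and `C/D` become `{A, C}` when truncating and `{A/B, C/D}` otherwise. The tests accompanying the pull request remain the authoritative examples.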
[jira] [Work logged] (BEAM-6167) Create a Class to read content of a file keeping track of the file path (python)
[ https://issues.apache.org/jira/browse/BEAM-6167?focusedWorklogId=172646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172646 ] ASF GitHub Bot logged work on BEAM-6167: Author: ASF GitHub Bot Created on: 06/Dec/18 14:02 Start Date: 06/Dec/18 14:02 Worklog Time Spent: 10m Work Description: robertwb commented on issue #7193: [BEAM-6167] Add class ReadFromTextWithFilename (Python) URL: https://github.com/apache/beam/pull/7193#issuecomment-444880407 Thanks, looks good modulo some precommit issues: ``` $ tox -e docs,py27-lint ... 12:09:32 /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/io/textio.py:docstring of apache_beam.io.textio.ReadFromTextWithFilename:1: WARNING: py:class reference target not found: apache_beam.io.ReadFromText ... 12:10:32 C: 41, 0: Line too long (88/80) (line-too-long) 12:10:32 C:325, 0: Line too long (94/80) (line-too-long) 12:10:32 C:543, 0: Line too long (88/80) (line-too-long) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172646) Time Spent: 1h 20m (was: 1h 10m) > Create a Class to read content of a file keeping track of the file path > (python) > > > Key: BEAM-6167 > URL: https://issues.apache.org/jira/browse/BEAM-6167 > Project: Beam > Issue Type: Improvement > Components: io-ideas >Affects Versions: 2.8.0 >Reporter: Lorenzo Caggioni >Assignee: Eugene Kirpichov >Priority: Minor > Fix For: Not applicable > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Add a class to read the content of a file, keeping track of the file path each > element comes from. > This is an improvement of the current python/apache_beam/io/textio.py -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-3912) Add batching support for HadoopOutputFormatIO
[ https://issues.apache.org/jira/browse/BEAM-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Romanenko resolved BEAM-3912. Resolution: Fixed Fix Version/s: 2.10.0 > Add batching support for HadoopOutputFormatIO > - > > Key: BEAM-3912 > URL: https://issues.apache.org/jira/browse/BEAM-3912 > Project: Beam > Issue Type: Sub-task > Components: io-java-hadoop >Reporter: Alexey Romanenko >Assignee: Alexey Romanenko >Priority: Minor > Fix For: 2.10.0 > > Time Spent: 10h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-1533) Switch split/splitBasedOnRegions to use Beam’s Byte classes for HBaseIO
[ https://issues.apache.org/jira/browse/BEAM-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-1533. Resolution: Fixed Fix Version/s: Not applicable > Switch split/splitBasedOnRegions to use Beam’s Byte classes for HBaseIO > --- > > Key: BEAM-1533 > URL: https://issues.apache.org/jira/browse/BEAM-1533 > Project: Beam > Issue Type: Improvement > Components: io-java-hbase >Reporter: Ismaël Mejía >Priority: Minor > Labels: newbie, starter > Fix For: Not applicable > > > The current implementation uses plain Java byte array manipulation, which is > error-prone. It could benefit from using the existing ByteKey, ByteKeyRange, > and ByteKeyRangeTracker classes from the SDK. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-1533) Switch split/splitBasedOnRegions to use Beam’s Byte classes for HBaseIO
[ https://issues.apache.org/jira/browse/BEAM-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711467#comment-16711467 ] Ismaël Mejía commented on BEAM-1533: Yes, you are right: split is now based on ByteKey. Closing it for now, thanks. > Switch split/splitBasedOnRegions to use Beam’s Byte classes for HBaseIO > --- > > Key: BEAM-1533 > URL: https://issues.apache.org/jira/browse/BEAM-1533 > Project: Beam > Issue Type: Improvement > Components: io-java-hbase >Reporter: Ismaël Mejía >Priority: Minor > Labels: newbie, starter > Fix For: Not applicable > > > The current implementation uses plain Java byte array manipulation, which is > error-prone. It could benefit from using the existing ByteKey, ByteKeyRange, > and ByteKeyRangeTracker classes from the SDK. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5817) Nexmark test of joining stream to files
[ https://issues.apache.org/jira/browse/BEAM-5817?focusedWorklogId=172639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172639 ] ASF GitHub Bot logged work on BEAM-5817: Author: ASF GitHub Bot Created on: 06/Dec/18 13:35 Start Date: 06/Dec/18 13:35 Worklog Time Spent: 10m Work Description: echauchot commented on issue #7205: [BEAM-5817] Add SQL bounded side input join to queries that are actually run URL: https://github.com/apache/beam/pull/7205#issuecomment-444872395 @kennknowles Sorry, I did not have time to take a look before the merge; I'm a bit busy on the Spark runner right now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172639) Time Spent: 8h 20m (was: 8h 10m) > Nexmark test of joining stream to files > --- > > Key: BEAM-5817 > URL: https://issues.apache.org/jira/browse/BEAM-5817 > Project: Beam > Issue Type: New Feature > Components: examples-nexmark >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.10.0 > > Time Spent: 8h 20m > Remaining Estimate: 0h > > Nexmark is a convenient framework for testing the use case of large scale > stream enrichment. One way is joining a stream to files, and it can be tested > via any source that Nexmark supports. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6184) PortableRunner dependency missed in wordcount example maven artifact
[ https://issues.apache.org/jira/browse/BEAM-6184?focusedWorklogId=172644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-172644 ] ASF GitHub Bot logged work on BEAM-6184: Author: ASF GitHub Bot Created on: 06/Dec/18 13:42 Start Date: 06/Dec/18 13:42 Worklog Time Spent: 10m Work Description: robertwb closed pull request #7213: [BEAM-6184] Add portable-runner dependency to example pom.xml URL: https://github.com/apache/beam/pull/7213 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml index dcd9748311a9..5f1ba8b3490e 100644 --- a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml +++ b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml @@ -167,6 +167,22 @@ + + portable-runner + +true + + + + + org.apache.beam + beam-runners-reference-java + ${beam.version} + runtime + + + + apex-runner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 172644) Time Spent: 0.5h (was: 20m) > PortableRunner dependency missed in wordcount example maven artifact > > > Key: BEAM-6184 > URL: https://issues.apache.org/jira/browse/BEAM-6184 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > > > more context: > https://lists.apache.org/thread.html/8dd60395424425f7502d62888c49014430d1d3b06c026606f3db28ab@%3Cuser.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)