[jira] [Commented] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake
    [ https://issues.apache.org/jira/browse/BEAM-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643939#comment-16643939 ]

Scott Wegner commented on BEAM-5683:
------------------------------------

This failed again today: https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1303/

> [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake
> ----------------------------------------------------------------------------------------------
>
>                 Key: BEAM-5683
>                 URL: https://issues.apache.org/jira/browse/BEAM-5683
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-harness, test-failures
>            Reporter: Scott Wegner
>            Assignee: Ankur Goenka
>            Priority: Major
>              Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/]
>  * [Gradle Build Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests]
>  * [Test source code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390]
>
> Initial investigation: Seems to be failing on pip download.
> ======================================================================
> ERROR: test_multiple_empty_outputs (apache_beam.transforms.ptransform_test.PTransformTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/ptransform_test.py", line 277, in test_multiple_empty_outputs
>     pipeline.run()
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py", line 104, in run
>     result = super(TestPipeline, self).run(test_runner_api)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py", line 403, in run
>     self.to_runner_api(), self.runner, self._options).run(False)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py", line 416, in run
>     return self.runner.run_pipeline(self)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 50, in run_pipeline
>     self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 389, in run_pipeline
>     self.dataflow_client.create_job(self.job), self)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/retry.py", line 184, in wrapper
>     return fun(*args, **kwargs)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", line 490, in create_job
>     self.create_job_description(job)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", line 519, in create_job_description
>     resources = self._stage_resources(job.options)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", line 452, in _stage_resources
>     staging_location=google_cloud_options.staging_location)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py", line 161, in stage_job_resources
>     requirements_cache_path)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py", line 411, in _populate_requirements_cache
>     processes.check_call(cmd_args)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/processes.py", line 46, in check_call
>     return subprocess.check_call(*args, **kwargs)
>   File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
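The stack trace bottoms out in processes.check_call running pip to populate the requirements cache, so a single transient network failure fails the whole job submission. A minimal sketch of one possible mitigation, bounded retries with exponential backoff around the command (check_call_with_retries is a hypothetical helper, not the stager's actual API):

```python
import subprocess
import time


def check_call_with_retries(cmd_args, max_attempts=4, base_delay=1.0):
    """Run a command, retrying transient failures with exponential backoff.

    Hypothetical wrapper: the real stager calls processes.check_call once,
    which is why a single pip/network hiccup fails job submission.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return subprocess.check_call(cmd_args)
        except subprocess.CalledProcessError:
            if attempt == max_attempts:
                raise
            # Sleep 1s, 2s, 4s, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Permanent failures (e.g. a nonexistent package) still surface as CalledProcessError after the last attempt, so genuine errors are not masked.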
[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar
    [ https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152848 ]

ASF GitHub Bot logged work on BEAM-5326:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Oct/18 18:37
            Start Date: 09/Oct/18 18:37
    Worklog Time Spent: 10m
      Work Description: herohde closed pull request #6615: [BEAM-5326] Shim main class and fix Go artifact naming mismatch for c…
URL: https://github.com/apache/beam/pull/6615

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance.

diff --git a/runners/google-cloud-dataflow-java/worker/src/main/java/com/google/cloud/dataflow/worker/DataflowRunnerHarness.java b/runners/google-cloud-dataflow-java/worker/src/main/java/com/google/cloud/dataflow/worker/DataflowRunnerHarness.java
new file mode 100644
index 000..4aca8dd567f
--- /dev/null
+++ b/runners/google-cloud-dataflow-java/worker/src/main/java/com/google/cloud/dataflow/worker/DataflowRunnerHarness.java
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.google.cloud.dataflow.worker;
+
+/** Temporary redirect for org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness. */
+public class DataflowRunnerHarness {
+  public static void main(String[] args) throws Exception {
+    org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(args);
+  }
+}
diff --git a/sdks/go/pkg/beam/runners/dataflow/dataflow.go b/sdks/go/pkg/beam/runners/dataflow/dataflow.go
index 293bab2b7a8..9c56dbc66c1 100644
--- a/sdks/go/pkg/beam/runners/dataflow/dataflow.go
+++ b/sdks/go/pkg/beam/runners/dataflow/dataflow.go
@@ -149,10 +149,12 @@ func Execute(ctx context.Context, p *beam.Pipeline) error {
 		return fmt.Errorf("failed to generate model pipeline: %v", err)
 	}
 
-	id := atomic.AddInt32(&unique, 1)
-	modelURL := gcsx.Join(*stagingLocation, fmt.Sprintf("model-%v-%v", id, time.Now().UnixNano()))
-	workerURL := gcsx.Join(*stagingLocation, fmt.Sprintf("worker-%v-%v", id, time.Now().UnixNano()))
-	jarURL := gcsx.Join(*stagingLocation, fmt.Sprintf("dataflow-worker-%v-%v.jar", id, time.Now().UnixNano()))
+	// NOTE(herohde) 10/8/2018: the last segment of the names must be "worker" and "dataflow-worker.jar".
+	id := fmt.Sprintf("go-%v-%v", atomic.AddInt32(&unique, 1), time.Now().UnixNano())
+
+	modelURL := gcsx.Join(*stagingLocation, id, "model")
+	workerURL := gcsx.Join(*stagingLocation, id, "worker")
+	jarURL := gcsx.Join(*stagingLocation, id, "dataflow-worker.jar")
 
 	if *dryRun {
 		log.Info(ctx, "Dry-run: not submitting job!")
diff --git a/sdks/go/pkg/beam/util/gcsx/gcs.go b/sdks/go/pkg/beam/util/gcsx/gcs.go
index dd9470fa7a4..c426988f633 100644
--- a/sdks/go/pkg/beam/util/gcsx/gcs.go
+++ b/sdks/go/pkg/beam/util/gcsx/gcs.go
@@ -142,10 +142,10 @@ func ParseObject(object string) (bucket, path string, err error) {
 
 // Join joins a GCS path with an element. Preserves
 // the gs:// prefix.
-func Join(object string, elm string) string {
+func Join(object string, elms ...string) string {
 	bucket, prefix, err := ParseObject(object)
 	if err != nil {
 		panic(err)
 	}
-	return MakeObject(bucket, path.Join(prefix, elm))
+	return MakeObject(bucket, path.Join(prefix, path.Join(elms...)))
 }

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 152848)
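The Go change above makes gcsx.Join variadic while preserving the gs:// scheme, so staged artifacts can be nested under a per-job prefix with fixed final names. For illustration only, the same behavior in Python (gcs_join is a hypothetical helper, not a Beam API):

```python
import posixpath


def gcs_join(object_path, *elems):
    """Join path elements onto a gs://bucket/prefix URL, keeping the scheme.

    Illustrative re-implementation of the variadic gcsx.Join from the Go SDK
    change; not part of Apache Beam.
    """
    prefix = "gs://"
    if not object_path.startswith(prefix):
        raise ValueError("not a GCS path: %r" % (object_path,))
    bucket, _, path = object_path[len(prefix):].partition("/")
    # posixpath.join skips the leading component when it is empty.
    joined = posixpath.join(path, *elems)
    return prefix + bucket + "/" + joined
```

For example, the runner can now build `gcs_join(staging, job_id, "dataflow-worker.jar")` so the last segment stays exactly "dataflow-worker.jar" as the note in the diff requires.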
[jira] [Work logged] (BEAM-5660) Add dataflow java worker unit tests into precommit
    [ https://issues.apache.org/jira/browse/BEAM-5660?focusedWorklogId=152847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152847 ]

ASF GitHub Bot logged work on BEAM-5660:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Oct/18 18:36
            Start Date: 09/Oct/18 18:36
    Worklog Time Spent: 10m
      Work Description: herohde closed pull request #6613: [BEAM-5660] Add both dataflow legacy worker and fn-api worker into JavaPreCommit
URL: https://github.com/apache/beam/pull/6613

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance.

diff --git a/build.gradle b/build.gradle
index 63ee9d4d044..cc98ac9110f 100644
--- a/build.gradle
+++ b/build.gradle
@@ -187,6 +187,8 @@ task javaPreCommit() {
   dependsOn ":beam-model-pipeline:build"
   dependsOn ":beam-model-job-management:build"
   dependsOn ":beam-model-fn-execution:build"
+  dependsOn ":beam-runners-google-cloud-dataflow-java-legacy-worker:build"
+  dependsOn ":beam-runners-google-cloud-dataflow-java-fn-api-worker:build"
   dependsOn ":beam-sdks-java-core:buildNeeded"
   dependsOn ":beam-sdks-java-core:buildDependents"
   dependsOn ":beam-examples-java:preCommit"
diff --git a/runners/google-cloud-dataflow-java/worker/build.gradle b/runners/google-cloud-dataflow-java/worker/build.gradle
index d28b162ce2c..3ec6df1e265 100644
--- a/runners/google-cloud-dataflow-java/worker/build.gradle
+++ b/runners/google-cloud-dataflow-java/worker/build.gradle
@@ -39,7 +39,7 @@ def DATAFLOW_VERSION = "dataflow.version"
 // Use -PisLegacyWorker in Gradle command if build legacy worker, otherwise,
 // FnAPI worker is considered as default.
 def is_legacy_worker = {
-  return project.hasProperty("isLegacyWorker")
+  return project.name.contains("dataflow-java-legacy-worker")
 }
 
 // Get full dependency of 'com.google.apis:google-api-services-dataflow'
@@ -99,9 +99,6 @@ applyJavaNature(validateShadowJar: false, shadowClosure: DEFAULT_SHADOW_CLOSURE
     }
   }
 
-  // Archive name pattern: ${name}-${appendix}-${version}-${classifier}.jar
-  appendix = is_legacy_worker() ? "legacy-bundled" : "fnapi-bundled"
-
   // Include original source files extracted under
   // '$buildDir/original_sources_to_package' to jar
   from "$buildDir/original_sources_to_package"
diff --git a/settings.gradle b/settings.gradle
index dfdc5577173..9f99894b34a 100644
--- a/settings.gradle
+++ b/settings.gradle
@@ -178,7 +178,9 @@ include "beam-vendor-sdks-java-extensions-protobuf"
 project(":beam-vendor-sdks-java-extensions-protobuf").dir = file("vendor/sdks-java-extensions-protobuf")
 include "beam-website"
 project(":beam-website").dir = file("website")
-include "beam-runners-google-cloud-dataflow-java-worker"
-project(":beam-runners-google-cloud-dataflow-java-worker").dir = file("runners/google-cloud-dataflow-java/worker")
+include "beam-runners-google-cloud-dataflow-java-legacy-worker"
+project(":beam-runners-google-cloud-dataflow-java-legacy-worker").dir = file("runners/google-cloud-dataflow-java/worker")
+include "beam-runners-google-cloud-dataflow-java-fn-api-worker"
+project(":beam-runners-google-cloud-dataflow-java-fn-api-worker").dir = file("runners/google-cloud-dataflow-java/worker")
 include "beam-runners-google-cloud-dataflow-java-windmill"
 project(":beam-runners-google-cloud-dataflow-java-windmill").dir = file("runners/google-cloud-dataflow-java/worker/windmill")
\ No newline at end of file

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 152847)
    Time Spent: 40m  (was: 0.5h)

> Add dataflow java worker unit tests into precommit
> --------------------------------------------------
>
>                 Key: BEAM-5660
>                 URL: https://issues.apache.org/jira/browse/BEAM-5660
>             Project: Beam
>          Issue Type: Task
>          Components: runner-dataflow
>            Reporter: Boyuan Zhang
>            Assignee: Boyuan Zhang
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes
    [ https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152845&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152845 ]

ASF GitHub Bot logged work on BEAM-5467:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Oct/18 18:31
            Start Date: 09/Oct/18 18:31
    Worklog Time Spent: 10m
      Work Description: tweise commented on issue #6532: [BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#issuecomment-428301218

   It is a separate environment factory: https://github.com/lyft/beam/blob/release-2.8.0-lyft/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDefaultExecutableStageContext.java#L47
   
   Alternatively, it would need to be part of the default process environment factory, with a parameter to control the Python-specific execution.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 152845)
    Time Spent: 6.5h  (was: 6h 20m)

> Python Flink ValidatesRunner job fixes
> --------------------------------------
>
>                 Key: BEAM-5467
>                 URL: https://issues.apache.org/jira/browse/BEAM-5467
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Thomas Weise
>            Assignee: Thomas Weise
>            Priority: Minor
>              Labels: portability-flink
>          Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly
    [ https://issues.apache.org/jira/browse/BEAM-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643919#comment-16643919 ]

Reuven Lax commented on BEAM-5514:
----------------------------------

Is the problem simply that ApiErrorExtractor doesn't see quotaExceeded as a rate-limit error? It appears that it currently looks for either rateLimitExceeded or userRateLimitExceeded.

> BigQueryIO doesn't handle quotaExceeded errors properly
> -------------------------------------------------------
>
>                 Key: BEAM-5514
>                 URL: https://issues.apache.org/jira/browse/BEAM-5514
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: Kevin Peterson
>            Assignee: Reuven Lax
>            Priority: Major
>
> When exceeding a streaming quota for BigQuery insertAll requests, BigQuery
> returns a 403 with reason "quotaExceeded".
> The current implementation of BigQueryIO does not consider this to be a
> rate-limited exception, and therefore does not perform exponential backoff
> properly, leading to repeated calls to BQ.
> The actual check is in the
> [ApiErrorExtractor|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263]
> class, which is called from
> [BigQueryServicesImpl|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739]
> to determine how to retry the failure.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
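The fix being discussed amounts to treating "quotaExceeded" like the existing rate-limit reasons when deciding whether to back off. A hedged sketch of that retry shape (the function names and the use of RuntimeError are assumptions for illustration, not Beam's actual BigQueryServicesImpl/ApiErrorExtractor code):

```python
import time

# Reasons treated as retryable. The reported bug is that "quotaExceeded"
# is missing from this set in the real extractor logic.
RETRYABLE_REASONS = {"rateLimitExceeded", "userRateLimitExceeded", "quotaExceeded"}


def insert_with_backoff(insert_fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call insert_fn(), retrying with exponential backoff when the raised
    error carries a retryable 'reason' attribute. Illustrative sketch only."""
    for attempt in range(1, max_attempts + 1):
        try:
            return insert_fn()
        except RuntimeError as e:
            reason = getattr(e, "reason", None)
            if reason not in RETRYABLE_REASONS or attempt == max_attempts:
                raise  # non-retryable reason, or retries exhausted
            sleep(base_delay * 2 ** (attempt - 1))
```

Without "quotaExceeded" in the set, the first branch re-raises immediately and the caller hammers BigQuery with unthrottled retries, which is the behavior the issue describes.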
[jira] [Work logged] (BEAM-5685) TopWikipediaSessionsIT is flaky
    [ https://issues.apache.org/jira/browse/BEAM-5685?focusedWorklogId=152844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152844 ]

ASF GitHub Bot logged work on BEAM-5685:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Oct/18 18:27
            Start Date: 09/Oct/18 18:27
    Worklog Time Spent: 10m
      Work Description: pabloem closed pull request #6611: [BEAM-5685] Improving comparator for TopWikipediaSessions
URL: https://github.com/apache/beam/pull/6611

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance.

diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java b/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java
index b33bbc00732..5eb747fe48f 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/complete/TopWikipediaSessions.java
@@ -18,6 +18,7 @@
 package org.apache.beam.examples.complete;
 
 import com.google.api.services.bigquery.model.TableRow;
+import com.google.common.collect.ComparisonChain;
 import java.io.IOException;
 import java.math.BigDecimal;
 import java.util.List;
@@ -112,7 +113,11 @@ public void processElement(ProcessContext c) {
   @Override
   public PCollection<List<KV<String, Long>>> expand(PCollection<KV<String, Long>> sessions) {
     SerializableComparator<KV<String, Long>> comparator =
-        (o1, o2) -> Long.compare(o1.getValue(), o2.getValue());
+        (o1, o2) ->
+            ComparisonChain.start()
+                .compare(o1.getValue(), o2.getValue())
+                .compare(o1.getKey(), o2.getKey())
+                .result();
     return sessions
         .apply(Window.into(CalendarWindows.months(1)))
         .apply(Top.of(1, comparator).withoutDefaults());
diff --git a/examples/java/src/test/java/org/apache/beam/examples/complete/TopWikipediaSessionsIT.java b/examples/java/src/test/java/org/apache/beam/examples/complete/TopWikipediaSessionsIT.java
index 1f278e7f7a7..aa32b485abe 100644
--- a/examples/java/src/test/java/org/apache/beam/examples/complete/TopWikipediaSessionsIT.java
+++ b/examples/java/src/test/java/org/apache/beam/examples/complete/TopWikipediaSessionsIT.java
@@ -36,7 +36,7 @@
   private static final String DEFAULT_INPUT_10_FILES =
       "gs://apache-beam-samples/wikipedia_edits/wiki_data-000*.json";
-  private static final String DEFAULT_OUTPUT_CHECKSUM = "a7f0c50b895d0a2e37b78c3f94eadcfb11a647a6";
+  private static final String DEFAULT_OUTPUT_CHECKSUM = "61262b08503338bfe4e36b0791958d65e6070933";
 
   /** PipelineOptions for the TopWikipediaSessions integration test. */
   public interface TopWikipediaSessionsITOptions

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 152844)
    Time Spent: 0.5h  (was: 20m)

> TopWikipediaSessionsIT is flaky
> -------------------------------
>
>                 Key: BEAM-5685
>                 URL: https://issues.apache.org/jira/browse/BEAM-5685
>             Project: Beam
>          Issue Type: Bug
>          Components: test-failures
>            Reporter: Pablo Estrada
>            Assignee: Pablo Estrada
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
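The flake came from Top.of(1, comparator) with a comparator that only compares session values: when two sessions tie, the winner (and hence the integration test's output checksum) depends on encounter order. Breaking ties on the key makes the result deterministic, which is what the ComparisonChain change does. A small Python illustration of the same idea (not the Beam code itself):

```python
def top_one(sessions):
    """Pick the top (key, value) session: largest value, ties broken by key.

    Mirrors the fixed Java comparator; comparing the value alone would make
    the winner among tied values depend on input order.
    """
    return max(sessions, key=lambda kv: (kv[1], kv[0]))
```

With only the value in the sort key, `[("alice", 5), ("bob", 5)]` and its reversal could return different winners; with the tie-break, both orderings agree.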
[jira] [Work logged] (BEAM-5685) TopWikipediaSessionsIT is flaky
    [ https://issues.apache.org/jira/browse/BEAM-5685?focusedWorklogId=152843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152843 ]

ASF GitHub Bot logged work on BEAM-5685:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Oct/18 18:26
            Start Date: 09/Oct/18 18:26
    Worklog Time Spent: 10m
      Work Description: chamikaramj commented on issue #6611: [BEAM-5685] Improving comparator for TopWikipediaSessions
URL: https://github.com/apache/beam/pull/6611#issuecomment-428299302

   LGTM. Thanks.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 152843)
    Time Spent: 20m  (was: 10m)

> TopWikipediaSessionsIT is flaky
> -------------------------------
>
>                 Key: BEAM-5685
>                 URL: https://issues.apache.org/jira/browse/BEAM-5685
>             Project: Beam
>          Issue Type: Bug
>          Components: test-failures
>            Reporter: Pablo Estrada
>            Assignee: Pablo Estrada
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-5630) supplement Bigquery Read IT test cases and blacklist them in post-commit
    [ https://issues.apache.org/jira/browse/BEAM-5630?focusedWorklogId=152842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152842 ]

ASF GitHub Bot logged work on BEAM-5630:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Oct/18 18:23
            Start Date: 09/Oct/18 18:23
    Worklog Time Spent: 10m
      Work Description: yifanzou commented on a change in pull request #6559: [BEAM-5630] add more tests into BigQueryIOReadIT
URL: https://github.com/apache/beam/pull/6559#discussion_r223812009

 ##########
 ## File path: sdks/java/io/google-cloud-platform/build.gradle
 ##########
 @@ -97,6 +97,7 @@ task integrationTest(type: Test) {
   outputs.upToDateWhen { false }
 
   include '**/*IT.class'
+  exclude '**/BigQueryIOReadIT.class'

 Review comment:
   To be more specific, this test will be skipped when running against the DirectRunner because of its limited JVM heap. The entire class will still run against the DataflowRunner in the job [:beam-runners-google-cloud-dataflow-java:googleCloudPlatformIntegrationTest](https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/build.gradle#L152).

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 152842)
    Time Spent: 3.5h  (was: 3h 20m)

> supplement Bigquery Read IT test cases and blacklist them in post-commit
> ------------------------------------------------------------------------
>
>                 Key: BEAM-5630
>                 URL: https://issues.apache.org/jira/browse/BEAM-5630
>             Project: Beam
>          Issue Type: Bug
>          Components: testing
>            Reporter: yifan zou
>            Assignee: yifan zou
>            Priority: Major
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-5630) supplement Bigquery Read IT test cases and blacklist them in post-commit
    [ https://issues.apache.org/jira/browse/BEAM-5630?focusedWorklogId=152841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152841 ]

ASF GitHub Bot logged work on BEAM-5630:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Oct/18 18:23
            Start Date: 09/Oct/18 18:23
    Worklog Time Spent: 10m
      Work Description: yifanzou commented on a change in pull request #6559: [BEAM-5630] add more tests into BigQueryIOReadIT
URL: https://github.com/apache/beam/pull/6559#discussion_r223812009

 ##########
 ## File path: sdks/java/io/google-cloud-platform/build.gradle
 ##########
 @@ -97,6 +97,7 @@ task integrationTest(type: Test) {
   outputs.upToDateWhen { false }
 
   include '**/*IT.class'
+  exclude '**/BigQueryIOReadIT.class'

 Review comment:
   To be more specific, this test will be skipped when running against the DirectRunner because of its limited JVM heap. The entire class will still run against the DataflowRunner in the job [:beam-runners-google-cloud-dataflow-java:googleCloudPlatformIntegrationTest](https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/build.gradle#L152).

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 152841)
    Time Spent: 3h 20m  (was: 3h 10m)

> supplement Bigquery Read IT test cases and blacklist them in post-commit
> ------------------------------------------------------------------------
>
>                 Key: BEAM-5630
>                 URL: https://issues.apache.org/jira/browse/BEAM-5630
>             Project: Beam
>          Issue Type: Bug
>          Components: testing
>            Reporter: yifan zou
>            Assignee: yifan zou
>            Priority: Major
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()
    [ https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643903#comment-16643903 ]

Tim Robertson commented on BEAM-5036:
-------------------------------------

{quote}Should this be marked as a blocker for 2.8.0? PR is still in review.
{quote}
I don't think this can block 2.8.0, given the time constraints. The [PR (now closed)|https://github.com/apache/beam/pull/6289] caused a lot of discussion but was not suitable for merging.

I suggest we modify {{HDFSFileSystem}} to always overwrite (i.e. move the necessary bits from the [existing PR|https://github.com/apache/beam/pull/6289] into the HDFS implementation only); then the solution will be simpler and the change can be made to {{rename()}}.

> Optimize FileBasedSink's WriteOperation.moveToOutput()
> ------------------------------------------------------
>
>                 Key: BEAM-5036
>                 URL: https://issues.apache.org/jira/browse/BEAM-5036
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-files
>    Affects Versions: 2.5.0
>            Reporter: Jozef Vilcek
>            Assignee: Tim Robertson
>            Priority: Major
>          Time Spent: 11h
>  Remaining Estimate: 0h
>
> The moveToOutput() methods in FileBasedSink.WriteOperation implement the move
> as copy+delete. It would be better to use a rename(), which can be much more
> efficient on some filesystems.
> The filesystem must support cross-directory rename. BEAM-4861 is related to
> this for the case of the HDFS filesystem.
> The feature was discussed here:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
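The rename-over-copy+delete idea under discussion can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions (move_to_output is a hypothetical helper, not Beam's FileBasedSink code): prefer an atomic rename and fall back to copy+delete only when the rename fails, e.g. across filesystem boundaries.

```python
import os
import shutil


def move_to_output(src, dst):
    """Move a temp file to its final output location.

    Hypothetical sketch of the proposed optimization: a rename is a single
    metadata operation where the filesystem supports it, whereas copy+delete
    rewrites every byte.
    """
    try:
        os.replace(src, dst)  # atomic rename, overwriting dst if present
    except OSError:
        # Fallback path, e.g. when src and dst are on different filesystems.
        shutil.copy2(src, dst)
        os.remove(src)
```

On HDFS the analogous win is that `rename()` avoids streaming the file through the client, which is why the comment proposes confining the overwrite semantics to the HDFS filesystem implementation.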
[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes
    [ https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152840&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152840 ]

ASF GitHub Bot logged work on BEAM-5467:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Oct/18 18:19
            Start Date: 09/Oct/18 18:19
    Worklog Time Spent: 10m
      Work Description: angoenka commented on issue #6532: [BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#issuecomment-428296935

   I was hoping to fit it via the process environment factory. Does it need a separate environment factory to be registered?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 152840)
    Time Spent: 6h 20m  (was: 6h 10m)

> Python Flink ValidatesRunner job fixes
> --------------------------------------
>
>                 Key: BEAM-5467
>                 URL: https://issues.apache.org/jira/browse/BEAM-5467
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Thomas Weise
>            Assignee: Thomas Weise
>            Priority: Minor
>              Labels: portability-flink
>          Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes
    [ https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152839&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152839 ]

ASF GitHub Bot logged work on BEAM-5467:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Oct/18 18:17
            Start Date: 09/Oct/18 18:17
    Worklog Time Spent: 10m
      Work Description: tweise commented on issue #6532: [BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#issuecomment-428296179

   > Reusing it would be better. Can we get that code to master?
   
   Of course. We just need to agree how we want to enable the environment provider/factory. Probably best as a separate PR.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 152839)
    Time Spent: 6h 10m  (was: 6h)

> Python Flink ValidatesRunner job fixes
> --------------------------------------
>
>                 Key: BEAM-5467
>                 URL: https://issues.apache.org/jira/browse/BEAM-5467
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Thomas Weise
>            Assignee: Thomas Weise
>            Priority: Minor
>              Labels: portability-flink
>          Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (BEAM-5456) Update google-api-client libraries to 1.25
    [ https://issues.apache.org/jira/browse/BEAM-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chamikara Jayalath updated BEAM-5456:
-------------------------------------
    Fix Version/s:     (was: 2.8.0)
                   2.9.0

> Update google-api-client libraries to 1.25
> ------------------------------------------
>
>                 Key: BEAM-5456
>                 URL: https://issues.apache.org/jira/browse/BEAM-5456
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>            Priority: Blocker
>             Fix For: 2.9.0
>
> This version updates authentication URLs
> ([https://github.com/googleapis/google-api-java-client/releases]), which is
> needed for certain features.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-5456) Update google-api-client libraries to 1.25
    [ https://issues.apache.org/jira/browse/BEAM-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643879#comment-16643879 ]

Chamikara Jayalath commented on BEAM-5456:
------------------------------------------

Moved to 2.9.0.

> Update google-api-client libraries to 1.25
> ------------------------------------------
>
>                 Key: BEAM-5456
>                 URL: https://issues.apache.org/jira/browse/BEAM-5456
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>            Priority: Blocker
>             Fix For: 2.9.0
>
> This version updates authentication URLs
> ([https://github.com/googleapis/google-api-java-client/releases]), which is
> needed for certain features.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Resolved] (BEAM-5427) Fix sample code (AverageFn) in Combine.java
    [ https://issues.apache.org/jira/browse/BEAM-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruoyun Huang resolved BEAM-5427.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.8.0

> Fix sample code (AverageFn) in Combine.java
> -------------------------------------------
>
>                 Key: BEAM-5427
>                 URL: https://issues.apache.org/jira/browse/BEAM-5427
>             Project: Beam
>          Issue Type: Improvement
>          Components: examples-java
>            Reporter: Ruoyun Huang
>            Assignee: Ruoyun Huang
>            Priority: Minor
>             Fix For: 2.8.0
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Sample code is missing a coder.
> In its current state, the job run fails with a "Coder missing" error.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes
    [ https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152835 ]

ASF GitHub Bot logged work on BEAM-5467:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Oct/18 18:11
            Start Date: 09/Oct/18 18:11
    Worklog Time Spent: 10m
      Work Description: angoenka commented on a change in pull request #6532: [BEAM-5467] Use process SDKHarness to run flink PVR tests.
URL: https://github.com/apache/beam/pull/6532#discussion_r223807159

 ##########
 ## File path: sdks/python/build.gradle
 ##########
 @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
     }
   }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
+  def workerScript = "${project(":beam-sdks-python-container:").buildDir.absolutePath}/target/launcher/linux_amd64/boot"
+  def text = "sh -c \". ${envdir}/bin/activate && ${workerScript} \$* \""

 Review comment:
   Makes sense

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 152835)
    Time Spent: 5h 40m  (was: 5.5h)

> Python Flink ValidatesRunner job fixes
> --------------------------------------
>
>                 Key: BEAM-5467
>                 URL: https://issues.apache.org/jira/browse/BEAM-5467
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Thomas Weise
>            Assignee: Thomas Weise
>            Priority: Minor
>              Labels: portability-flink
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes
[ https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152834&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152834 ] ASF GitHub Bot logged work on BEAM-5467: Author: ASF GitHub Bot Created on: 09/Oct/18 18:11 Start Date: 09/Oct/18 18:11 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #6532: [BEAM-5467] Use process SDKHarness to run flink PVR tests. URL: https://github.com/apache/beam/pull/6532#discussion_r223806288 ## File path: .test-infra/jenkins/job_PostCommit_Python_ValidatesRunner_Flink.groovy ## @@ -33,8 +33,8 @@ PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python_VR_Flink', steps { gradle { rootBuildScriptDir(commonJobProperties.checkoutDir) - tasks(':beam-sdks-python:flinkCompatibilityMatrixBatch') - tasks(':beam-sdks-python:flinkCompatibilityMatrixStreaming') + tasks(':beam-sdks-python:flinkCompatibilityMatrixBatchProcess') Review comment: Makes sense. And will also move the sequence to gradle task This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152834) Time Spent: 5.5h (was: 5h 20m) > Python Flink ValidatesRunner job fixes > -- > > Key: BEAM-5467 > URL: https://issues.apache.org/jira/browse/BEAM-5467 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Thomas Weise >Assignee: Thomas Weise >Priority: Minor > Labels: portability-flink > Time Spent: 5.5h > Remaining Estimate: 0h > > Add status to README > Rename script and job for consistency > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes
[ https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152836 ] ASF GitHub Bot logged work on BEAM-5467: Author: ASF GitHub Bot Created on: 09/Oct/18 18:11 Start Date: 09/Oct/18 18:11 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #6532: [BEAM-5467] Use process SDKHarness to run flink PVR tests. URL: https://github.com/apache/beam/pull/6532#discussion_r223806349 ## File path: sdks/python/apache_beam/runners/portability/flink_runner_test.py ## @@ -34,12 +36,26 @@ # Run as # # python -m apache_beam.runners.portability.flink_runner_test \ - # /path/to/job_server.jar \ + # --flink_job_server_jar=/path/to/job_server.jar \ + # --type=Batch \ + # --harness_type=docker \ # [FlinkRunnerTest.test_method, ...] - flinkJobServerJar = sys.argv.pop(1) - streaming = sys.argv.pop(1).lower() == 'streaming' - # This is defined here to only be run when we invoke this file explicitly. + parser = argparse.ArgumentParser(add_help=True) + parser.add_argument('--flink_job_server_jar', + help='Job server jar to submit jobs.') + parser.add_argument('--type', default='batch', + help='Job type. batch or streaming') Review comment: sure This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152836) > Python Flink ValidatesRunner job fixes > -- > > Key: BEAM-5467 > URL: https://issues.apache.org/jira/browse/BEAM-5467 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Thomas Weise >Assignee: Thomas Weise >Priority: Minor > Labels: portability-flink > Time Spent: 5h 40m > Remaining Estimate: 0h > > Add status to README > Rename script and job for consistency > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
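The flag-based invocation shown in the flink_runner_test.py diff above can be sketched with a minimal standalone parser. The flag names and defaults are taken from the diff; the sample argument list is a hypothetical illustration of how remaining test names pass through after known flags are consumed:

```python
import argparse

# Mirrors the parser in the diff: known flags are consumed, anything
# left over (e.g. unittest-style test names) ends up in `remaining`.
parser = argparse.ArgumentParser(add_help=True)
parser.add_argument('--flink_job_server_jar',
                    help='Job server jar to submit jobs.')
parser.add_argument('--type', default='batch',
                    help='Job type. batch or streaming')
parser.add_argument('--harness_type', default='docker',
                    help='SDK harness type. docker or process')

known_args, remaining = parser.parse_known_args(
    ['--type=streaming', 'FlinkRunnerTest.test_method'])
streaming = known_args.type.lower() == 'streaming'
```

Using `parse_known_args` (rather than positional `sys.argv.pop` as in the old code) lets the flags appear in any order while still forwarding unrecognized test identifiers to the test runner.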
[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes
[ https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152838&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152838 ] ASF GitHub Bot logged work on BEAM-5467: Author: ASF GitHub Bot Created on: 09/Oct/18 18:11 Start Date: 09/Oct/18 18:11 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #6532: [BEAM-5467] Use process SDKHarness to run flink PVR tests. URL: https://github.com/apache/beam/pull/6532#discussion_r223807072 ## File path: sdks/python/build.gradle ## @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') { } } } + +project.task('createProcessWorker') { + dependsOn ':beam-sdks-python-container:build' + dependsOn 'setupVirtualenv' + def outputFile = file("${project.buildDir}/sdk_worker.sh") Review comment: I will try it out This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152838) Time Spent: 6h (was: 5h 50m) > Python Flink ValidatesRunner job fixes > -- > > Key: BEAM-5467 > URL: https://issues.apache.org/jira/browse/BEAM-5467 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Thomas Weise >Assignee: Thomas Weise >Priority: Minor > Labels: portability-flink > Time Spent: 6h > Remaining Estimate: 0h > > Add status to README > Rename script and job for consistency > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes
[ https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152837 ] ASF GitHub Bot logged work on BEAM-5467: Author: ASF GitHub Bot Created on: 09/Oct/18 18:11 Start Date: 09/Oct/18 18:11 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #6532: [BEAM-5467] Use process SDKHarness to run flink PVR tests. URL: https://github.com/apache/beam/pull/6532#discussion_r223806539 ## File path: sdks/python/build.gradle ## @@ -327,24 +327,35 @@ task hdfsIntegrationTest(dependsOn: 'installGcpTest') { } } +class CompatibilityMatrixConfig { + String type Review comment: Will make it a boolean This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152837) Time Spent: 5h 50m (was: 5h 40m) > Python Flink ValidatesRunner job fixes > -- > > Key: BEAM-5467 > URL: https://issues.apache.org/jira/browse/BEAM-5467 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Thomas Weise >Assignee: Thomas Weise >Priority: Minor > Labels: portability-flink > Time Spent: 5h 50m > Remaining Estimate: 0h > > Add status to README > Rename script and job for consistency > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()
[ https://issues.apache.org/jira/browse/BEAM-5036?focusedWorklogId=152833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152833 ] ASF GitHub Bot logged work on BEAM-5036: Author: ASF GitHub Bot Created on: 09/Oct/18 18:09 Start Date: 09/Oct/18 18:09 Worklog Time Spent: 10m Work Description: timrobertson100 closed pull request #6289: [BEAM-5036] Optimize the FileBasedSink WriteOperation.moveToOutput() URL: https://github.com/apache/beam/pull/6289 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java index 92b2382e365..66a94ae087b 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java @@ -757,9 +757,14 @@ final void moveToOutputFiles( "Will copy temporary file {} to final location {}", entry.getKey(), entry.getValue()); } // During a failure case, files may have been deleted in an earlier step. Thus - // we ignore missing files here. - FileSystems.copy(srcFiles, dstFiles, StandardMoveOptions.IGNORE_MISSING_FILES); - removeTemporaryFiles(srcFiles); + // we ignore missing files here. It is possible that files already exist in the + // destination and we wish to replace them (e.g. 
a previous job run) + FileSystems.rename( + srcFiles, + dstFiles, + StandardMoveOptions.IGNORE_MISSING_FILES, + StandardMoveOptions.REPLACE_EXISTING); + removeTemporaryFiles(srcFiles); // removes temp folder if applicable } /** diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java index c2977d0527c..eb22a76f33e 100644 --- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java +++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java @@ -36,6 +36,7 @@ import java.io.IOException; import java.nio.channels.ReadableByteChannel; import java.nio.channels.WritableByteChannel; +import java.nio.file.FileAlreadyExistsException; import java.util.ArrayList; import java.util.Collection; import java.util.Collections; @@ -298,6 +299,12 @@ public static void copy( * * It doesn't support renaming globs. * + * If the underlying file system reports that a target file already exists and moveOptions + * contains {@code StandardMoveOptions.REPLACE_EXISTING} then all target files that existed prior + * to calling rename will be deleted and the rename retried. When a retry is attempted then + * missing files from the source will be ignored. Some filesystem implementations will always + * overwrite. 
+ * * @param srcResourceIds the references of the source resources * @param destResourceIds the references of the destination resources */ @@ -310,10 +317,11 @@ public static void rename( return; } +Set options = Sets.newHashSet(moveOptions); + List srcToRename = srcResourceIds; List destToRename = destResourceIds; -if (Sets.newHashSet(moveOptions) -.contains(MoveOptions.StandardMoveOptions.IGNORE_MISSING_FILES)) { +if (options.contains(MoveOptions.StandardMoveOptions.IGNORE_MISSING_FILES)) { KV, List> existings = filterMissingFiles(srcResourceIds, destResourceIds); srcToRename = existings.getKey(); @@ -322,8 +330,71 @@ public static void rename( if (srcToRename.isEmpty()) { return; } -getFileSystemInternal(srcToRename.iterator().next().getScheme()) -.rename(srcToRename, destToRename); + +boolean replaceExisting = +options.contains(MoveOptions.StandardMoveOptions.REPLACE_EXISTING) ? true : false; +rename( +getFileSystemInternal(srcToRename.iterator().next().getScheme()), +srcToRename, +destToRename, +replaceExisting); + } + + /** + * Executes a rename of the src which all must exist using the provided filesystem. + * + * If replaceExisting is enabled and filesystem throws {code FileAlreadyExistsException} then + * an attempt to delete the destination is made and the rename is retried. Some filesystem + * implementations may apply this automatically without throwing. + * + * @param fileSystem The filesystem in use + * @param srcResourceIds The source resources to move + * @param destResourceIds The destinations for the sources to move to (must be same length
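The IGNORE_MISSING_FILES / REPLACE_EXISTING semantics introduced in the Java diff above can be illustrated with a small Python sketch over a local filesystem. This is not Beam's implementation, just a model of the two move options (function and file names are hypothetical):

```python
import os
import tempfile

def rename_all(pairs, ignore_missing=True, replace_existing=True):
    """Rename (src, dst) pairs, modeling Beam's two StandardMoveOptions:
    skip sources that no longer exist, and overwrite existing targets."""
    for src, dst in pairs:
        if ignore_missing and not os.path.exists(src):
            continue  # e.g. file already moved during an earlier retry
        if replace_existing:
            os.replace(src, dst)  # overwrites dst; atomic on POSIX
        else:
            os.rename(src, dst)   # may fail if dst exists

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'part-0.tmp')
dst = os.path.join(tmp, 'part-0')
with open(src, 'w') as f:
    f.write('data')
with open(dst, 'w') as f:
    f.write('stale')  # stale output from a previous job run
rename_all([(src, dst), (os.path.join(tmp, 'missing.tmp'), dst + '-1')])
```

The key win motivating the change is that a rename is a metadata operation on filesystems like HDFS, whereas the previous copy+delete rewrote every byte of output.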
[jira] [Updated] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()
[ https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Robertson updated BEAM-5036: Fix Version/s: (was: 2.8.0) > Optimize FileBasedSink's WriteOperation.moveToOutput() > -- > > Key: BEAM-5036 > URL: https://issues.apache.org/jira/browse/BEAM-5036 > Project: Beam > Issue Type: Improvement > Components: io-java-files >Affects Versions: 2.5.0 >Reporter: Jozef Vilcek >Assignee: Tim Robertson >Priority: Major > Time Spent: 10h 50m > Remaining Estimate: 0h > > moveToOutput() methods in FileBasedSink.WriteOperation implements move by > copy+delete. It would be better to use a rename() which can be much more > effective for some filesystems. > Filesystem must support cross-directory rename. BEAM-4861 is related to this > for the case of HDFS filesystem. > Feature was discussed here: > http://mail-archives.apache.org/mod_mbox/beam-dev/201807.mbox/%3CCAF9t7_4Mp54pQ+vRrJrBh9Vx0=uaknupzd_qdh_qdm9vxll...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module
[ https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152831 ] ASF GitHub Bot logged work on BEAM-5315: Author: ASF GitHub Bot Created on: 09/Oct/18 18:07 Start Date: 09/Oct/18 18:07 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #6590: [BEAM-5315] Partially port io URL: https://github.com/apache/beam/pull/6590#issuecomment-428292725 Looks like precommit failed due to a flaky test. I'll ask Jenkins to retest and try to reproduce the failure locally. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152831) Time Spent: 3.5h (was: 3h 20m) > Finish Python 3 porting for io module > - > > Key: BEAM-5315 > URL: https://issues.apache.org/jira/browse/BEAM-5315 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Simon >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module
[ https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152832 ] ASF GitHub Bot logged work on BEAM-5315: Author: ASF GitHub Bot Created on: 09/Oct/18 18:07 Start Date: 09/Oct/18 18:07 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #6590: [BEAM-5315] Partially port io URL: https://github.com/apache/beam/pull/6590#issuecomment-428292778 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152832) Time Spent: 3h 40m (was: 3.5h) > Finish Python 3 porting for io module > - > > Key: BEAM-5315 > URL: https://issues.apache.org/jira/browse/BEAM-5315 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Simon >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5630) supplement Bigquery Read IT test cases and blacklist them in post-commit
[ https://issues.apache.org/jira/browse/BEAM-5630?focusedWorklogId=152830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152830 ] ASF GitHub Bot logged work on BEAM-5630: Author: ASF GitHub Bot Created on: 09/Oct/18 18:07 Start Date: 09/Oct/18 18:07 Worklog Time Spent: 10m Work Description: yifanzou commented on a change in pull request #6559: [BEAM-5630] add more tests into BigQueryIOReadIT URL: https://github.com/apache/beam/pull/6559#discussion_r223806946 ## File path: sdks/java/io/google-cloud-platform/build.gradle ## @@ -97,6 +97,7 @@ task integrationTest(type: Test) { outputs.upToDateWhen { false } include '**/*IT.class' + exclude '**/BigQueryIOReadIT.class' Review comment: https://discuss.gradle.org/t/how-to-exclude-a-single-test-method/20860 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152830) Time Spent: 3h 10m (was: 3h) > supplement Bigquery Read IT test cases and blacklist them in post-commit > > > Key: BEAM-5630 > URL: https://issues.apache.org/jira/browse/BEAM-5630 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: yifan zou >Assignee: yifan zou >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly
[ https://issues.apache.org/jira/browse/BEAM-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643834#comment-16643834 ] Chamikara Jayalath edited comment on BEAM-5514 at 10/9/18 6:06 PM: --- I'm trying to determine the priority at which this should be addressed. [~reuvenlax] any reason why we rely on workitems retries instead of retrying BQ streaming write requests with exponential backoff ? was (Author: chamikara): I'm trying to determine the priority at which this should be addressed. [~reuvenlax] any reason why do rely on workitems retries instead of retrying BQ streaming write requests with exponential backoff ? > BigQueryIO doesn't handle quotaExceeded errors properly > --- > > Key: BEAM-5514 > URL: https://issues.apache.org/jira/browse/BEAM-5514 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Kevin Peterson >Assignee: Reuven Lax >Priority: Major > > When exceeding a streaming quota for BigQuery insertAll requests, BigQuery > returns a 403 with reason "quotaExceeded". > The current implementation of BigQueryIO does not consider this to be a rate > limited exception, and therefore does not perform exponential backoff > properly, leading to repeated calls to BQ. > The actual error is in the > [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739] > class, which is called from > [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263] > to determine how to retry the failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5630) supplement Bigquery Read IT test cases and blacklist them in post-commit
[ https://issues.apache.org/jira/browse/BEAM-5630?focusedWorklogId=152829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152829 ] ASF GitHub Bot logged work on BEAM-5630: Author: ASF GitHub Bot Created on: 09/Oct/18 18:06 Start Date: 09/Oct/18 18:06 Worklog Time Spent: 10m Work Description: yifanzou commented on a change in pull request #6559: [BEAM-5630] add more tests into BigQueryIOReadIT URL: https://github.com/apache/beam/pull/6559#discussion_r223806777 ## File path: sdks/java/io/google-cloud-platform/build.gradle ## @@ -97,6 +97,7 @@ task integrationTest(type: Test) { outputs.upToDateWhen { false } include '**/*IT.class' + exclude '**/BigQueryIOReadIT.class' Review comment: Unfortunately no. The 'exclude' here only works on file pattern so that we could only exclude the entire class. I found an incubating Gradle [TestFilter](https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/testing/TestFilter.html) that is able to execute specific test methods. But it doesn't work with the 'exclude' to skip test methods. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152829) Time Spent: 3h (was: 2h 50m) > supplement Bigquery Read IT test cases and blacklist them in post-commit > > > Key: BEAM-5630 > URL: https://issues.apache.org/jira/browse/BEAM-5630 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: yifan zou >Assignee: yifan zou >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly
[ https://issues.apache.org/jira/browse/BEAM-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Jayalath reassigned BEAM-5514: Assignee: Reuven Lax (was: Chamikara Jayalath) > BigQueryIO doesn't handle quotaExceeded errors properly > --- > > Key: BEAM-5514 > URL: https://issues.apache.org/jira/browse/BEAM-5514 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Kevin Peterson >Assignee: Reuven Lax >Priority: Major > > When exceeding a streaming quota for BigQuery insertAll requests, BigQuery > returns a 403 with reason "quotaExceeded". > The current implementation of BigQueryIO does not consider this to be a rate > limited exception, and therefore does not perform exponential backoff > properly, leading to repeated calls to BQ. > The actual error is in the > [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739] > class, which is called from > [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263] > to determine how to retry the failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5514) BigQueryIO doesn't handle quotaExceeded errors properly
[ https://issues.apache.org/jira/browse/BEAM-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643834#comment-16643834 ] Chamikara Jayalath commented on BEAM-5514: -- I'm trying to determine the priority at which this should be addressed. [~reuvenlax] any reason why do rely on workitems retries instead of retrying BQ streaming write requests with exponential backoff ? > BigQueryIO doesn't handle quotaExceeded errors properly > --- > > Key: BEAM-5514 > URL: https://issues.apache.org/jira/browse/BEAM-5514 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Kevin Peterson >Assignee: Chamikara Jayalath >Priority: Major > > When exceeding a streaming quota for BigQuery insertAll requests, BigQuery > returns a 403 with reason "quotaExceeded". > The current implementation of BigQueryIO does not consider this to be a rate > limited exception, and therefore does not perform exponential backoff > properly, leading to repeated calls to BQ. > The actual error is in the > [ApiErrorExtractor|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L739] > class, which is called from > [BigQueryServicesImpl|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/util/src/main/java/com/google/cloud/hadoop/util/ApiErrorExtractor.java#L263] > to determine how to retry the failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
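The client-side alternative raised in the comment above — retrying rate-limited requests with exponential backoff instead of relying on work-item retries — can be sketched as follows. `QuotaExceededError` and `send_rows` are placeholders, not Beam or BigQuery API names:

```python
import random
import time

class QuotaExceededError(Exception):
    """Stand-in for a 403 response with reason 'quotaExceeded'."""

def insert_with_backoff(send_rows, rows, max_attempts=5, base=0.5):
    """Retry a rate-limited call with full-jitter exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send_rows(rows)
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise
            # Sleep a random duration in [0, base * 2**attempt).
            time.sleep(random.uniform(0, base * 2 ** attempt))

calls = []
def flaky(rows):
    # Hypothetical endpoint that rejects the first two attempts.
    calls.append(rows)
    if len(calls) < 3:
        raise QuotaExceededError('quotaExceeded')
    return len(rows)

result = insert_with_backoff(flaky, ['a', 'b'], base=0.01)
```

Backing off in the client avoids hammering the service from a tight retry loop, which is exactly the repeated-call behavior the issue describes.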
[jira] [Resolved] (BEAM-5294) [beam_Release_Gradle_NightlySnapshot] Failing due to website test.
[ https://issues.apache.org/jira/browse/BEAM-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner resolved BEAM-5294. Resolution: Fixed Fix Version/s: Not applicable > [beam_Release_Gradle_NightlySnapshot] Failing due to website test. > -- > > Key: BEAM-5294 > URL: https://issues.apache.org/jira/browse/BEAM-5294 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Alan Myrvold >Priority: Major > Fix For: Not applicable > > > Build link: > [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/] > [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/185/] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5522) beam_PostRelease_NightlySnapshot timed out
[ https://issues.apache.org/jira/browse/BEAM-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner resolved BEAM-5522. Resolution: Cannot Reproduce Fix Version/s: Not applicable > beam_PostRelease_NightlySnapshot timed out > -- > > Key: BEAM-5522 > URL: https://issues.apache.org/jira/browse/BEAM-5522 > Project: Beam > Issue Type: Bug > Components: test-failures, testing >Reporter: Ahmet Altay >Assignee: Alan Myrvold >Priority: Major > Fix For: Not applicable > > > https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/383/console -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5407) [beam_PostCommit_Go_GradleBuild][testE2ETopWikiPages][RolledBack] Breaks post commit
[ https://issues.apache.org/jira/browse/BEAM-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643833#comment-16643833 ] Scott Wegner commented on BEAM-5407: Is this resolved? Can this be closed? > [beam_PostCommit_Go_GradleBuild][testE2ETopWikiPages][RolledBack] Breaks post > commit > > > Key: BEAM-5407 > URL: https://issues.apache.org/jira/browse/BEAM-5407 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Batkhuyag Batsaikhan >Assignee: Pablo Estrada >Priority: Major > > Failing job url: > https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1482/testReport/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-5534) AutocompleteIT and WikiTopSessionsIT fail in google testing
[ https://issues.apache.org/jira/browse/BEAM-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner closed BEAM-5534. -- Resolution: Fixed Fix Version/s: Not applicable > AutocompleteIT and WikiTopSessionsIT fail in google testing > --- > > Key: BEAM-5534 > URL: https://issues.apache.org/jira/browse/BEAM-5534 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-5125) beam_PostCommit_Java_GradleBuild org.apache.beam.runners.flink PortableExecutionTest testExecution_1_ flaky
[ https://issues.apache.org/jira/browse/BEAM-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner closed BEAM-5125. -- Resolution: Cannot Reproduce Fix Version/s: Not applicable > beam_PostCommit_Java_GradleBuild org.apache.beam.runners.flink > PortableExecutionTest testExecution_1_ flaky > --- > > Key: BEAM-5125 > URL: https://issues.apache.org/jira/browse/BEAM-5125 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Test fails in both: post and precommit tests. Fails more often in pre-commits. > Pre-commit history: > [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/180/testReport/junit/org.apache.beam.runners.flink/PortableExecutionTest/testExecution_1_/history/] > Post-commit history: > [https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1223/testReport/junit/org.apache.beam.runners.flink/PortableExecutionTest/testExecution_1_/history/?start=75] > Sample job: > https://builds.apache.org/job/beam_PreCommit_Java_Phrase/180/testReport/junit/org.apache.beam.runners.flink/PortableExecutionTest/testExecution_1_/ > Log: > java.lang.AssertionError: job state expected: but was: at > org.junit.Assert.fail(Assert.java:88) at > org.junit.Assert.failNotEquals(Assert.java:834) at > org.junit.Assert.assertEquals(Assert.java:118) at > org.apache.beam.runners.flink.PortableExecutionTest.testExecution(PortableExecutionTest.java:177) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5367) [beam_Release_Gradle_NightlySnapshot] is broken due to apache/beam website test failure
[ https://issues.apache.org/jira/browse/BEAM-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner resolved BEAM-5367. Resolution: Fixed > [beam_Release_Gradle_NightlySnapshot] is broken due to apache/beam website > test failure > --- > > Key: BEAM-5367 > URL: https://issues.apache.org/jira/browse/BEAM-5367 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Batkhuyag Batsaikhan >Assignee: Batkhuyag Batsaikhan >Priority: Major > Fix For: 2.8.0 > > Time Spent: 1h > Remaining Estimate: 0h > > apache/website test is breaking nightly build. Since we don't have stable > apache/beam website, we should remove it from the build for now. Until we > have an actual plan to fully migrate from asf beam-site, we should disable > apache/website checks from Beam build. > Failing job url: [https://scans.gradle.com/s/uxb6mdigqj4n4/console-log#L23465] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5367) [beam_Release_Gradle_NightlySnapshot] is broken due to apache/beam website test failure
[ https://issues.apache.org/jira/browse/BEAM-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643830#comment-16643830 ] Scott Wegner commented on BEAM-5367: This is now fixed, latest runs have been successful: https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/395/ > [beam_Release_Gradle_NightlySnapshot] is broken due to apache/beam website > test failure > --- > > Key: BEAM-5367 > URL: https://issues.apache.org/jira/browse/BEAM-5367 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Batkhuyag Batsaikhan >Assignee: Batkhuyag Batsaikhan >Priority: Major > Fix For: 2.8.0 > > Time Spent: 1h > Remaining Estimate: 0h > > apache/website test is breaking nightly build. Since we don't have stable > apache/beam website, we should remove it from the build for now. Until we > have an actual plan to fully migrate from asf beam-site, we should disable > apache/website checks from Beam build. > Failing job url: [https://scans.gradle.com/s/uxb6mdigqj4n4/console-log#L23465] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5243) beam_Release_Gradle_NightlySnapshot InvocationError py27-cython/bin/python setup.py nosetests
[ https://issues.apache.org/jira/browse/BEAM-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner updated BEAM-5243: --- Labels: (was: currently-failing) > beam_Release_Gradle_NightlySnapshot InvocationError py27-cython/bin/python > setup.py nosetests > - > > Key: BEAM-5243 > URL: https://issues.apache.org/jira/browse/BEAM-5243 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Andrew Pilloud >Assignee: Ahmet Altay >Priority: Major > > It isn't clear to me what exactly failed, logs are full of stack traces. > [https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/151/] > [https://builds.apache.org/job/beam_PostCommit_Python_Verify/5844/] > > *01:00:38* ERROR: InvocationError for command > '/home/jenkins/jenkins-slave/workspace/beam_Release_Gradle_NightlySnapshot/src/sdks/python/target/.tox/py27-cython/bin/python > setup.py nosetests' (exited with code -11)*01:00:38* > ___ summary > *01:00:38* ERROR: py27-cython: commands > failed*01:00:38* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5665) [beam_PreCommit_Website_Commit] [:testWebsite] External link http://www.atrato.io/blog/2017/04/08/apache-apex-cli/ failed: response code 0 means something's wrong.
[ https://issues.apache.org/jira/browse/BEAM-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643826#comment-16643826 ] Scott Wegner commented on BEAM-5665: http://www.atrato.io/ still 404's. If this doesn't resolve by next week we should remove the dead link. > [beam_PreCommit_Website_Commit] [:testWebsite] External link > http://www.atrato.io/blog/2017/04/08/apache-apex-cli/ failed: response code 0 > means something's wrong. > > > Key: BEAM-5665 > URL: https://issues.apache.org/jira/browse/BEAM-5665 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Scott Wegner >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PreCommit_Website_Commit/249/] > * [Gradle Build > Scan|https://scans.gradle.com/s/2sqxa7hlvdeum/console-log?task=:beam-website:testWebsite#L12] > * [Test source > code|https://github.com/apache/beam-site/blob/asf-site/Rakefile] > Initial investigation: > It seems http://www.atrato.io/ is down or no longer available. We should > disable the test for now and perhaps remove the dead link if it no longer > works. > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5661) [beam_PostCommit_Py_ValCont] [:beam-sdks-python-container:docker] no such file or directory
[ https://issues.apache.org/jira/browse/BEAM-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner updated BEAM-5661: --- Labels: (was: currently-failing) > [beam_PostCommit_Py_ValCont] [:beam-sdks-python-container:docker] no such > file or directory > --- > > Key: BEAM-5661 > URL: https://issues.apache.org/jira/browse/BEAM-5661 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Scott Wegner >Assignee: Henning Rohde >Priority: Major > > _Use this form to file an issue for test failure:_ > * [Jenkins Job|https://builds.apache.org/job/beam_PostCommit_Py_ValCont/902/] > * [Gradle Build > Scan|https://scans.gradle.com/s/pmjbkhxaeryx4/console-log?task=:beam-sdks-python-container:docker#L2] > * [Test source > code|https://github.com/apache/beam/blob/845f8d0abcc5a8d7f93457c27aff0feeb1a867d5/sdks/python/container/build.gradle#L61] > Initial investigation: > {{failed to get digest > sha256:4ee4ea2f0113e98b49d8e376ce847feb374ddf2b8ea775502459d8a1b8a3eaed: open > /var/lib/docker/image/aufs/imagedb/content/sha256/4ee4ea2f0113e98b49d8e376ce847feb374ddf2b8ea775502459d8a1b8a3eaed: > no such file or directory}} > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5662) [beam_PostCommit_Website_Publish] [:testWebsite] External link http://wiki.apache.org/incubator/BeamProposal failed: got a time out
[ https://issues.apache.org/jira/browse/BEAM-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner updated BEAM-5662: --- Labels: (was: currently-failing) > [beam_PostCommit_Website_Publish] [:testWebsite] External link > http://wiki.apache.org/incubator/BeamProposal failed: got a time out > --- > > Key: BEAM-5662 > URL: https://issues.apache.org/jira/browse/BEAM-5662 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Website_Publish/94/] > * [Gradle Build > Scan|https://scans.gradle.com/s/h4mayefon7v7q/console-log?task=:beam-website:testWebsite#L12] > * [Test source > code|https://github.com/apache/beam/blob/845f8d0abcc5a8d7f93457c27aff0feeb1a867d5/website/Rakefile#L6] > Initial investigation: > The failed link is http://wiki.apache.org/incubator/BeamProposal > When I visit this link, it works for me. This is likely a flake. > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5665) [beam_PreCommit_Website_Commit] [:testWebsite] External link http://www.atrato.io/blog/2017/04/08/apache-apex-cli/ failed: response code 0 means something's wrong.
[ https://issues.apache.org/jira/browse/BEAM-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner updated BEAM-5665: --- Labels: (was: currently-failing) > [beam_PreCommit_Website_Commit] [:testWebsite] External link > http://www.atrato.io/blog/2017/04/08/apache-apex-cli/ failed: response code 0 > means something's wrong. > > > Key: BEAM-5665 > URL: https://issues.apache.org/jira/browse/BEAM-5665 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Scott Wegner >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PreCommit_Website_Commit/249/] > * [Gradle Build > Scan|https://scans.gradle.com/s/2sqxa7hlvdeum/console-log?task=:beam-website:testWebsite#L12] > * [Test source > code|https://github.com/apache/beam-site/blob/asf-site/Rakefile] > Initial investigation: > It seems http://www.atrato.io/ is down or no longer available. We should > disable the test for now and perhaps remove the dead link if it no longer > works. > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5662) [beam_PostCommit_Website_Publish] [:testWebsite] External link http://wiki.apache.org/incubator/BeamProposal failed: got a time out
[ https://issues.apache.org/jira/browse/BEAM-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643825#comment-16643825 ] Scott Wegner commented on BEAM-5662: I haven't seen this in a while, but I'll keep open for a bit just in case. > [beam_PostCommit_Website_Publish] [:testWebsite] External link > http://wiki.apache.org/incubator/BeamProposal failed: got a time out > --- > > Key: BEAM-5662 > URL: https://issues.apache.org/jira/browse/BEAM-5662 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Website_Publish/94/] > * [Gradle Build > Scan|https://scans.gradle.com/s/h4mayefon7v7q/console-log?task=:beam-website:testWebsite#L12] > * [Test source > code|https://github.com/apache/beam/blob/845f8d0abcc5a8d7f93457c27aff0feeb1a867d5/website/Rakefile#L6] > Initial investigation: > The failed link is http://wiki.apache.org/incubator/BeamProposal > When I visit this link, it works for me. This is likely a flake. > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5680) [beam_PostCommit_Website_Publish] [:publishWebsite] git push 'Authentication failed'
[ https://issues.apache.org/jira/browse/BEAM-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643824#comment-16643824 ] Scott Wegner commented on BEAM-5680: Latest run succeeded, this appears to be fixed: https://builds.apache.org/job/beam_PostCommit_Website_Publish/146/ > [beam_PostCommit_Website_Publish] [:publishWebsite] git push 'Authentication > failed' > > > Key: BEAM-5680 > URL: https://issues.apache.org/jira/browse/BEAM-5680 > Project: Beam > Issue Type: Bug > Components: test-failures, website >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > Labels: currently-failing > Fix For: Not applicable > > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Website_Publish/131/] > * [Gradle Build > Scan|https://scans.gradle.com/s/wkk4k3jp3l5ve/console-log?task=:beam-website:publishWebsite] > * [Test source > code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L221] > Initial investigation: > Failing consistently since 10/7 5am until now ([test > history|https://builds.apache.org/job/beam_PostCommit_Website_Publish/buildTimeTrend]). > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5680) [beam_PostCommit_Website_Publish] [:publishWebsite] git push 'Authentication failed'
[ https://issues.apache.org/jira/browse/BEAM-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner resolved BEAM-5680. Resolution: Fixed Fix Version/s: Not applicable > [beam_PostCommit_Website_Publish] [:publishWebsite] git push 'Authentication > failed' > > > Key: BEAM-5680 > URL: https://issues.apache.org/jira/browse/BEAM-5680 > Project: Beam > Issue Type: Bug > Components: test-failures, website >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Major > Labels: currently-failing > Fix For: Not applicable > > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Website_Publish/131/] > * [Gradle Build > Scan|https://scans.gradle.com/s/wkk4k3jp3l5ve/console-log?task=:beam-website:publishWebsite] > * [Test source > code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L221] > Initial investigation: > Failing consistently since 10/7 5am until now ([test > history|https://builds.apache.org/job/beam_PostCommit_Website_Publish/buildTimeTrend]). > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5634) Bring Dataflow Java Worker Code into Beam
[ https://issues.apache.org/jira/browse/BEAM-5634?focusedWorklogId=152828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152828 ] ASF GitHub Bot logged work on BEAM-5634: Author: ASF GitHub Bot Created on: 09/Oct/18 17:56 Start Date: 09/Oct/18 17:56 Worklog Time Spent: 10m Work Description: boyuanzz commented on a change in pull request #6561: [BEAM-5634] Bring dataflow java worker code into beam URL: https://github.com/apache/beam/pull/6561#discussion_r223803403 ## File path: runners/google-cloud-dataflow-java/worker/build.gradle ## @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/**/ +// Apply BeamModulePlugin + +// Reuse project_root/buildSrc in this build.gradle file to reduce the +// maintenance burden and simpily this file. See BeamModulePlugin for +// documentation on default build tasks and properties that are enabled in +// addition to natures that will be applied to worker. 
+apply plugin: org.apache.beam.gradle.BeamModulePlugin + +group = "org.apache.beam.runners.dataflow" + +/**/ +// Apply Java nature with customized configurations + +// Set a specific version of 'com.google.apis:google-api-services-dataflow' +// by adding -Pdataflow.version= in Gradle command. Otherwise, +// 'google_clients_version' defined in BeamModulePlugin will be used as default. +def DATAFLOW_VERSION = "dataflow.version" + +// To build FnAPI or legacy worker. +// Use -PisLegacyWorker in Gradle command if build legacy worker, otherwise, +// FnAPI worker is considered as default. +def is_legacy_worker = { + return project.hasProperty("isLegacyWorker") +} + +// Get full dependency of 'com.google.apis:google-api-services-dataflow' +def google_api_services_dataflow = project.hasProperty(DATAFLOW_VERSION) ? "com.google.apis:google-api-services-dataflow:" + getProperty(DATAFLOW_VERSION) : library.java.google_api_services_dataflow + +// Returns a string representing the relocated path to be used with the shadow +// plugin when given a suffix such as "com.". +def getWorkerRelocatedPath = { String suffix -> + return ("org.apache.beam.runners.dataflow.worker.repackaged." 
+ + suffix) +} + +// Following listed dependencies will be shaded only in fnapi worker, not legacy +// worker +def sdk_provided_dependencies = [ + "org.apache.beam:beam-runners-google-cloud-dataflow-java:$version", + "org.apache.beam:beam-sdks-java-core:$version", + "org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:$version", + "org.apache.beam:beam-sdks-java-io-google-cloud-platform:$version", + google_api_services_dataflow, + library.java.avro, + library.java.google_api_client, + library.java.google_http_client, + library.java.google_http_client_jackson, + library.java.jackson_annotations, + library.java.jackson_core, + library.java.jackson_databind, + library.java.joda_time, +] + +// Exclude unneeded dependencies when building jar +def excluded_dependencies = [ + "com.google.auto.service:auto-service", // Provided scope added from applyJavaNature + "com.google.auto.value:auto-value", // Provided scope added from applyJavaNature + "org.codehaus.jackson:jackson-core-asl", // Exclude an old version of jackson-core-asl introduced by google-http-client-jackson + "org.objenesis:objenesis", // Transitive dependency introduced from Beam + "org.tukaani:xz",// Transitive dependency introduced from Beam + library.java.commons_compress, // Transitive dependency introduced from Beam + library.java.error_prone_annotations,// Provided scope added in worker + library.java.hamcrest_core, // Test only + library.java.hamcre
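The relocation closure in the reviewed build.gradle can be modeled in plain Python. The repackaged prefix below is taken from the diff above; everything else is an illustrative sketch, not Beam's actual build code.

```python
# Shaded (relocated) package prefix from the build.gradle diff above.
RELOCATED_PREFIX = "org.apache.beam.runners.dataflow.worker.repackaged."

def worker_relocated_path(suffix):
    """Model of the getWorkerRelocatedPath closure: shaded dependency
    packages are moved under the worker's repackaged prefix so the worker's
    copies cannot collide with user code on the classpath."""
    return RELOCATED_PREFIX + suffix

print(worker_relocated_path("com."))
# org.apache.beam.runners.dataflow.worker.repackaged.com.
```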
[jira] [Commented] (BEAM-5688) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on githubPullRequestId assert
[ https://issues.apache.org/jira/browse/BEAM-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643817#comment-16643817 ] Scott Wegner commented on BEAM-5688: [~udim] fyi > [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on > githubPullRequestId assert > -- > > Key: BEAM-5688 > URL: https://issues.apache.org/jira/browse/BEAM-5688 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Scott Wegner >Assignee: Alan Myrvold >Priority: Major > Labels: currently-failing > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/20/] > * [Gradle Build Scan|https://gradle.com/s/7mqwgjegf5hge] > * [Test source > code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L234] > Initial investigation: > This is a problem with how the website gradle scripts are implemented to > accept an githubPullRequestId. The Cron job will not have an associated PR, > so this currently fails. > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5691) testWebsite failed owing to several url 404 not found
[ https://issues.apache.org/jira/browse/BEAM-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner resolved BEAM-5691. Resolution: Duplicate Fix Version/s: Not applicable > testWebsite failed owing to several url 404 not found > - > > Key: BEAM-5691 > URL: https://issues.apache.org/jira/browse/BEAM-5691 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Boyuan Zhang >Priority: Major > Fix For: Not applicable > > > test log: > https://builds.apache.org/job/beam_PreCommit_Website_Commit/262/console -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake
[ https://issues.apache.org/jira/browse/BEAM-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner updated BEAM-5683: --- Summary: [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to pip download flake (was: [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary) > [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Fails due to > pip download flake > -- > > Key: BEAM-5683 > URL: https://issues.apache.org/jira/browse/BEAM-5683 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Scott Wegner >Assignee: Ankur Goenka >Priority: Major > Labels: currently-failing > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/] > * [Gradle Build > Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests] > * [Test source > code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390] > Initial investigation: > Seems to be failing on pip download. 
> == > ERROR: test_multiple_empty_outputs > (apache_beam.transforms.ptransform_test.PTransformTest) > -- > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/ptransform_test.py", > line 277, in test_multiple_empty_outputs > pipeline.run() > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py", > line 104, in run > result = super(TestPipeline, self).run(test_runner_api) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py", > line 403, in run > self.to_runner_api(), self.runner, self._options).run(False) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py", > line 416, in run > return self.runner.run_pipeline(self) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 50, in run_pipeline > self.result = super(TestDataflowRunner, self).run_pipeline(pipeline) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", > line 389, in run_pipeline > self.dataflow_client.create_job(self.job), self) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/retry.py", > line 184, in wrapper > return fun(*args, **kwargs) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", > line 490, in create_job > self.create_job_description(job) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", > line 519, in create_job_description > resources = 
self._stage_resources(job.options) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", > line 452, in _stage_resources > staging_location=google_cloud_options.staging_location) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py", > line 161, in stage_job_resources > requirements_cache_path) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py", > line 411, in _populate_requirements_cache > processes.check_call(cmd_args) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/processes.py", > line 46, in check_call > return subprocess.check_call(*args, **kwargs) > File "/usr/lib/python2.7/subprocess.py"
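The trace bottoms out in subprocess.check_call running pip download, so a transient network failure there fails the whole test run. A minimal, hypothetical mitigation (not Beam's actual apache_beam.utils.retry module, though the trace shows a retry wrapper exists) is to bound the flake with retries and backoff:

```python
import subprocess
import time

def check_call_with_retries(cmd_args, max_attempts=3, base_delay=1.0):
    # Retry a flaky subprocess (e.g. `pip download ...`) with exponential
    # backoff. Illustrative names only; Beam's own retry decorator lives in
    # apache_beam/utils/retry.py.
    for attempt in range(1, max_attempts + 1):
        try:
            return subprocess.check_call(cmd_args)
        except subprocess.CalledProcessError:
            if attempt == max_attempts:
                raise  # exhausted retries: surface the original failure
            time.sleep(base_delay * 2 ** (attempt - 1))
```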
[jira] [Assigned] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-5637: --- Assignee: Ruoyun Huang > Python support for custom dataflow worker jar > - > > Key: BEAM-5637 > URL: https://issues.apache.org/jira/browse/BEAM-5637 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Ruoyun Huang >Priority: Major > > One of the slightly subtle aspects is that we would need to ignore one of the > staged jars for portable Python jobs. That requires a change to the Python > boot code: > https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
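The subtlety described above — the Python boot code must skip one of the staged jars for portable jobs — amounts to filtering the staged artifact list. A sketch with assumed names (the real logic lives in the Go boot code linked above):

```python
# Hypothetical name for the customized worker jar; the actual filename is
# decided by the staging logic, not fixed here.
CUSTOM_WORKER_JAR = "dataflow-worker.jar"

def python_artifacts(staged_files):
    # The custom worker jar is consumed by the runner harness, so the
    # Python SDK harness should not treat it as one of its own artifacts.
    return [f for f in staged_files if f != CUSTOM_WORKER_JAR]
```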
[jira] [Work logged] (BEAM-5114) Create example uber jars for supported runners
[ https://issues.apache.org/jira/browse/BEAM-5114?focusedWorklogId=152825&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152825 ] ASF GitHub Bot logged work on BEAM-5114: Author: ASF GitHub Bot Created on: 09/Oct/18 17:50 Start Date: 09/Oct/18 17:50 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #6191: [BEAM-5114] Create example uber jars URL: https://github.com/apache/beam/pull/6191#issuecomment-428286959 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152825) Time Spent: 1h 50m (was: 1h 40m) > Create example uber jars for supported runners > -- > > Key: BEAM-5114 > URL: https://issues.apache.org/jira/browse/BEAM-5114 > Project: Beam > Issue Type: New Feature > Components: examples-java >Reporter: Ben Sidhom >Assignee: Ben Sidhom >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Producing these artifacts results in several benefits > * Gives an example of how to package user code for different runners > * Enables ad-hoc testing of runner changes against real user pipelines easier > * Enables integration testing end-to-end pipelines against different runner > services -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes
[ https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152823&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152823 ] ASF GitHub Bot logged work on BEAM-5467: Author: ASF GitHub Bot Created on: 09/Oct/18 17:42 Start Date: 09/Oct/18 17:42 Worklog Time Spent: 10m Work Description: tweise commented on issue #6532: [BEAM-5467] Use process SDKHarness to run flink PVR tests. URL: https://github.com/apache/beam/pull/6532#issuecomment-428284446 @angoenka the current solution of using the container boot binary prevents this from running on another architecture. Can we come up with a solution that would also work locally on macOS and other platforms. We had already solved this for Lyft. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152823) Time Spent: 5h 20m (was: 5h 10m) > Python Flink ValidatesRunner job fixes > -- > > Key: BEAM-5467 > URL: https://issues.apache.org/jira/browse/BEAM-5467 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Thomas Weise >Assignee: Thomas Weise >Priority: Minor > Labels: portability-flink > Time Spent: 5h 20m > Remaining Estimate: 0h > > Add status to README > Rename script and job for consistency > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5660) Add dataflow java worker unit tests into precommit
[ https://issues.apache.org/jira/browse/BEAM-5660?focusedWorklogId=152822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152822 ] ASF GitHub Bot logged work on BEAM-5660: Author: ASF GitHub Bot Created on: 09/Oct/18 17:40 Start Date: 09/Oct/18 17:40 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #6613: [BEAM-5660] Add both dataflow legacy worker and fn-api worker into JavaPreCommit URL: https://github.com/apache/beam/pull/6613#issuecomment-428283701 Re @herohde, I don't think they are related. All failures are url 404 not found. Filed JIRA: https://issues.apache.org/jira/browse/BEAM-5691 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152822) Time Spent: 0.5h (was: 20m) > Add dataflow java worker unit tests into precommit > -- > > Key: BEAM-5660 > URL: https://issues.apache.org/jira/browse/BEAM-5660 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5691) testWebsite failed owing to several url 404 not found
Boyuan Zhang created BEAM-5691: -- Summary: testWebsite failed owing to several url 404 not found Key: BEAM-5691 URL: https://issues.apache.org/jira/browse/BEAM-5691 Project: Beam Issue Type: Bug Components: test-failures Reporter: Boyuan Zhang test log: https://builds.apache.org/job/beam_PreCommit_Website_Commit/262/console -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5684) Need a test that verifies Flattening / not-flattening of BQ nested records
[ https://issues.apache.org/jira/browse/BEAM-5684?focusedWorklogId=152821&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152821 ] ASF GitHub Bot logged work on BEAM-5684: Author: ASF GitHub Bot Created on: 09/Oct/18 17:38 Start Date: 09/Oct/18 17:38 Worklog Time Spent: 10m Work Description: pabloem commented on issue #6609: [BEAM-5684] Adding a BQNestedRecords Test URL: https://github.com/apache/beam/pull/6609#issuecomment-428283083 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152821) Time Spent: 1h 20m (was: 1h 10m) > Need a test that verifies Flattening / not-flattening of BQ nested records > -- > > Key: BEAM-5684 > URL: https://issues.apache.org/jira/browse/BEAM-5684 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner
[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643769#comment-16643769 ] Xu Mingmin commented on BEAM-5690: -- Is this the error specifically? Seems duplicated {{0}} counts here, {code:java} {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 +"} {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 +","WET":"2018-10-09 09-56-00 0}{code} > Issue with GroupByKey in BeamSql using SparkRunner > -- > > Key: BEAM-5690 > URL: https://issues.apache.org/jira/browse/BEAM-5690 > Project: Beam > Issue Type: Task > Components: runner-spark >Reporter: Kenneth Knowles >Priority: Major > > Reported on user@ > {quote}We are trying to setup a pipeline with using BeamSql and the trigger > used is default (AfterWatermark crosses the window). > Below is the pipeline: > >KafkaSource (KafkaIO) >---> Windowing (FixedWindow 1min) >---> BeamSql >---> KafkaSink (KafkaIO) > > We are using Spark Runner for this. > The BeamSql query is: > {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code} > We are grouping by Col3 which is a string. It can hold values string[0-9]. > > The records are getting emitted out at 1 min to kafka sink, but the output > record in kafka is not as expected. 
> Below is the output observed: (WST and WET are indicators for window start > time and window end time) > {code} > {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 0} > {code} > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
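For reference, the aggregation the reporter expects from the GROUP BY query within a single fixed window can be sketched in plain Python (illustrative only, not runner or BeamSql code; the dict-shaped records are hypothetical): each key that appears in the window yields exactly one output row, and no zero-count rows should appear at all.

```python
from collections import Counter

def group_by_count(window_records):
    # One output row per distinct key in the window; keys that are
    # absent from the window produce no row at all, and every emitted
    # count is at least 1 -- unlike the duplicated zero rows observed.
    counts = Counter(r["Col3"] for r in window_records)
    return [{"Col3": k, "count_col1": n} for k, n in sorted(counts.items())]

window = [{"Col3": "string5"}, {"Col3": "string7"},
          {"Col3": "string7"}, {"Col3": "string7"}]
rows = group_by_count(window)
```

By this semantics, a key such as "string6" would appear once per window with its true count, never as a run of extra zero-count rows.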
[jira] [Work logged] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')
[ https://issues.apache.org/jira/browse/BEAM-5624?focusedWorklogId=152815&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152815 ] ASF GitHub Bot logged work on BEAM-5624: Author: ASF GitHub Bot Created on: 09/Oct/18 17:19 Start Date: 09/Oct/18 17:19 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #6616: [BEAM-5624] Fix avro.schema parser for py3 URL: https://github.com/apache/beam/pull/6616#discussion_r223790260 ## File path: sdks/python/apache_beam/io/avroio_test.py ## @@ -25,10 +25,15 @@ from builtins import range import avro.datafile -import avro.schema from avro.datafile import DataFileWriter from avro.io import DatumWriter import hamcrest as hc +# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports +try: + from avro.schema import Parse Review comment: +1 to @aaltay's comment. Also we will need a similar change in other places in Beam where we use avro.schema.parse. See `apache_beam/io/avroio.py`, `apache_beam/examples/avro_bitcoin.py`. This can be done in another PR if you prefer. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152815) Time Spent: 0.5h (was: 20m) > Avro IO does not work with avro-python3 package out-of-the-box on Python 3, > several tests fail with AttributeError (module 'avro.schema' has no attribute > 'parse') > --- > > Key: BEAM-5624 > URL: https://issues.apache.org/jira/browse/BEAM-5624 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Simon >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > == > ERROR: Failure: AttributeError (module 'avro.schema' has no attribute 'parse') > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/failure.py", > line 39, in runTest > raise self.exc_val.with_traceback(self.tb) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/loader.py", > line 418, in loadTestsFromName > addr.filename, addr.module) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py", > line 47, in importFromPath > return self.importFromDir(dir_path, fqname) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py", > line 94, in importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py", > line 234, in load_module > return load_source(name, filename, file) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py", > line 172, in load_source > module = _load(spec) > File "", line 693, in _load > File "", 
line 673, in _load_unlocked > File "<frozen importlib._bootstrap_external>", line 673, in exec_module > File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py", > line 54, in <module> > class TestAvro(unittest.TestCase): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py", > line 89, in TestAvro > SCHEMA = avro.schema.parse(''' > AttributeError: module 'avro.schema' has no attribute 'parse' > Note that we use a different implementation of avro/avro-python3 package > depending on Python version. We are also evaluating potential replacement of > avro with fastavro. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')
[ https://issues.apache.org/jira/browse/BEAM-5624?focusedWorklogId=152813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152813 ] ASF GitHub Bot logged work on BEAM-5624: Author: ASF GitHub Bot Created on: 09/Oct/18 17:12 Start Date: 09/Oct/18 17:12 Worklog Time Spent: 10m Work Description: aaltay commented on a change in pull request #6616: [BEAM-5624] Fix avro.schema parser for py3 URL: https://github.com/apache/beam/pull/6616#discussion_r223788054 ## File path: sdks/python/apache_beam/io/avroio_test.py ## @@ -25,10 +25,15 @@ from builtins import range import avro.datafile -import avro.schema from avro.datafile import DataFileWriter from avro.io import DatumWriter import hamcrest as hc +# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports +try: + from avro.schema import Parse Review comment: Could you add a comment here about, in what versions of avro which version of parse is supported. (We can use this information to remove this block later on.) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152813) Time Spent: 20m (was: 10m) > Avro IO does not work with avro-python3 package out-of-the-box on Python 3, > several tests fail with AttributeError (module 'avro.schema' has no attribute > 'parse') > --- > > Key: BEAM-5624 > URL: https://issues.apache.org/jira/browse/BEAM-5624 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Simon >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > == > ERROR: Failure: AttributeError (module 'avro.schema' has no attribute 'parse') > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/failure.py", > line 39, in runTest > raise self.exc_val.with_traceback(self.tb) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/loader.py", > line 418, in loadTestsFromName > addr.filename, addr.module) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py", > line 47, in importFromPath > return self.importFromDir(dir_path, fqname) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py", > line 94, in importFromDir > mod = load_module(part_fqname, fh, filename, desc) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py", > line 234, in load_module > return load_source(name, filename, file) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py", > line 172, in load_source > module = _load(spec) > File "", line 693, in _load > File "", 
line 673, in _load_unlocked > File "<frozen importlib._bootstrap_external>", line 673, in exec_module > File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py", > line 54, in <module> > class TestAvro(unittest.TestCase): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py", > line 89, in TestAvro > SCHEMA = avro.schema.parse(''' > AttributeError: module 'avro.schema' has no attribute 'parse' > Note that we use a different implementation of avro/avro-python3 package > depending on Python version. We are also evaluating potential replacement of > avro with fastavro. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5254) Add Samza Runner translator registrar and refactor config generation
[ https://issues.apache.org/jira/browse/BEAM-5254?focusedWorklogId=152814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152814 ] ASF GitHub Bot logged work on BEAM-5254: Author: ASF GitHub Bot Created on: 09/Oct/18 17:12 Start Date: 09/Oct/18 17:12 Worklog Time Spent: 10m Work Description: xinyuiscool commented on issue #6292: [BEAM-5254] Add Samza Runner translator registrar and refactor config URL: https://github.com/apache/beam/pull/6292#issuecomment-428274726 Thanks for merging it! I might have a few more coming in the next few days for Samza Runner. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152814) Time Spent: 1.5h (was: 1h 20m) > Add Samza Runner translator registrar and refactor config generation > > > Key: BEAM-5254 > URL: https://issues.apache.org/jira/browse/BEAM-5254 > Project: Beam > Issue Type: Improvement > Components: runner-samza >Reporter: Xinyu Liu >Assignee: Xinyu Liu >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Add a registrar for transform translators in Samza Runner so we allow > customized translators. Also refactors the config generation part so it can > be extended outside open source beam. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-1081) annotations should support custom messages and classes
[ https://issues.apache.org/jira/browse/BEAM-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643748#comment-16643748 ] Ahmet Altay commented on BEAM-1081: --- Adding tests would be good. There is also the "1. ability to customize message" part of the original issue. > annotations should support custom messages and classes > -- > > Key: BEAM-1081 > URL: https://issues.apache.org/jira/browse/BEAM-1081 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > > Update > https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/utils/annotations.py > to add 2 new features: > 1. ability to customize message > 2. ability to tag classes (not only functions) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
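The two requested features can be sketched as follows. This is a hypothetical sketch only, not the actual annotations.py implementation; the `deprecated` name and the `since`/`custom_message` parameters are illustrative. The idea: accept an optional custom message (feature 1) and, when the decorated object is a class, emit the warning from a wrapped constructor so classes can be tagged too (feature 2).

```python
import functools
import warnings

def deprecated(since, custom_message=None):
    """Hypothetical sketch: deprecation annotation with custom messages
    that works on classes as well as functions."""
    def decorator(obj):
        message = custom_message or (
            "%s is deprecated since %s" % (obj.__name__, since))
        if isinstance(obj, type):
            # Tag a class by emitting the warning from its constructor.
            original_init = obj.__init__
            @functools.wraps(original_init)
            def new_init(self, *args, **kwargs):
                warnings.warn(message, DeprecationWarning, stacklevel=2)
                original_init(self, *args, **kwargs)
            obj.__init__ = new_init
            return obj
        # Plain function: wrap the call.
        @functools.wraps(obj)
        def wrapper(*args, **kwargs):
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return obj(*args, **kwargs)
        return wrapper
    return decorator
```

Wrapping `__init__` (rather than the class object itself) keeps `isinstance` checks and subclassing intact, which is one reasonable way to satisfy the "tag classes" request.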
[jira] [Commented] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')
[ https://issues.apache.org/jira/browse/BEAM-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643746#comment-16643746 ] Valentyn Tymofieiev commented on BEAM-5624: --- This issue is due to a small API change (see: https://github.com/apache/beam/pull/6616). However there are some troubling reports about bad experience with avro-python3, see [1,2]. That said, we may want to migrate to fastavro sooner than later, FYI [~udim] [~chamikara] [~altay]. [1] https://github.com/common-workflow-language/cwltool/issues/524 [2] https://issues.apache.org/jira/browse/AVRO-2046 > Avro IO does not work with avro-python3 package out-of-the-box on Python 3, > several tests fail with AttributeError (module 'avro.schema' has no attribute > 'parse') > --- > > Key: BEAM-5624 > URL: https://issues.apache.org/jira/browse/BEAM-5624 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Simon >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > == > ERROR: Failure: AttributeError (module 'avro.schema' has no attribute 'parse') > -- > Traceback (most recent call last): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/failure.py", > line 39, in runTest > raise self.exc_val.with_traceback(self.tb) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/loader.py", > line 418, in loadTestsFromName > addr.filename, addr.module) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py", > line 47, in importFromPath > return self.importFromDir(dir_path, fqname) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/nose/importer.py", > line 94, in importFromDir > mod = 
load_module(part_fqname, fh, filename, desc) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py", > line 234, in load_module > return load_source(name, filename, file) > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/imp.py", > line 172, in load_source > module = _load(spec) > File "<frozen importlib._bootstrap>", line 693, in _load > File "<frozen importlib._bootstrap>", line 673, in _load_unlocked > File "<frozen importlib._bootstrap_external>", line 673, in exec_module > File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py", > line 54, in <module> > class TestAvro(unittest.TestCase): > File > "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/avroio_test.py", > line 89, in TestAvro > SCHEMA = avro.schema.parse(''' > AttributeError: module 'avro.schema' has no attribute 'parse' > Note that we use a different implementation of avro/avro-python3 package > depending on Python version. We are also evaluating potential replacement of > avro with fastavro. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
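The "small API change" behind this failure is that avro-python3 renamed avro.schema.parse to avro.schema.Parse, and the fix in PR #6616 binds whichever name the installed package exposes, then uses the alias everywhere. Below is a self-contained illustration of that shim pattern; it uses json as a stand-in module so it runs without avro installed (json has loads but no Loads), and the avro import lines shown in the comment are a sketch of the PR's approach, not its exact text.

```python
import json

# The shim in the PR reads roughly:
#   try:
#       from avro.schema import Parse as parse_schema  # avro-python3 (Python 3)
#   except ImportError:
#       from avro.schema import parse as parse_schema  # avro (Python 2)
# Same pattern, demonstrated with a module that is always available:
try:
    parse = json.Loads   # the "new" spelling; json has no such attribute
except AttributeError:
    parse = json.loads   # fall back to the existing name

decoded = parse('{"a": 1}')
```

Binding one alias at import time keeps the version check in a single place, so call sites (and the other modules Valentyn mentions, like apache_beam/io/avroio.py) never need to branch on the avro version themselves.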
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152779 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 16:46 Start Date: 09/Oct/18 16:46 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on a change in pull request #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#discussion_r223779339 ## File path: runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java ## @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.beam.runners.extensions.metrics; + +import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; +import java.io.BufferedWriter; +import java.io.OutputStreamWriter; +import java.net.InetAddress; +import java.net.Socket; +import java.nio.charset.Charset; +import java.util.Locale; +import java.util.regex.Pattern; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.DistributionResult; +import org.apache.beam.sdk.metrics.GaugeResult; +import org.apache.beam.sdk.metrics.MetricQueryResults; +import org.apache.beam.sdk.metrics.MetricResult; +import org.apache.beam.sdk.metrics.MetricsSink; +import org.apache.beam.sdk.options.PipelineOptions; + +/** + * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics are reported with the + * timestamp (seconds from epoch) when the push to the sink was done (except with gauges that + * already have a timestamp value). The graphite metric name will be in the form of + * beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType For example: + * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code + * beam.distribution.throughput.nbRecordsPerSec.attempted.mean} + */ +public class MetricsGraphiteSink implements MetricsSink { + private static final Charset UTF_8 = Charset.forName("UTF-8"); + private static final Pattern WHITESPACE = Pattern.compile("[\\s]+"); + private static final String SPACE_REPLACEMENT = "_"; + private final String address; + private final int port; + private final Charset charset; + + public MetricsGraphiteSink(PipelineOptions pipelineOptions) { +this.address = pipelineOptions.getMetricsGraphiteHost(); +this.port = pipelineOptions.getMetricsGraphitePort(); +this.charset = UTF_8; + } + + @Experimental(Experimental.Kind.METRICS) + @Override + public void writeMetrics(MetricQueryResults metricQueryResults) throws Exception { +final long metricTimestamp = System.currentTimeMillis() / 1000L; +Socket 
socket = new Socket(InetAddress.getByName(address), port); +BufferedWriter writer = +new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), charset)); +StringBuilder messagePayload = new StringBuilder(); +Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters(); +Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges(); +Iterable<MetricResult<DistributionResult>> distributions = +metricQueryResults.getDistributions(); + +for (MetricResult<Long> counter : counters) { + // if committed metrics are not supported, exception is thrown and we don't append the message + try { +messagePayload.append(createCounterGraphiteMessage(metricTimestamp, counter, true)); + } catch (UnsupportedOperationException e) { +if (!e.getMessage().contains("committed metrics")) { + throw e; +} + } + messagePayload.append(createCounterGraphiteMessage(metricTimestamp, counter, false)); +} + +for (MetricResult<GaugeResult> gauge : gauges) { + try { +messagePayload.append(createGaugeGraphiteMessage(gauge, true)); + } catch (UnsupportedOperationException e) { +if (!e.getMessage().contains("committed metrics")) { + throw e; +} + } + messageP
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152777 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 16:42 Start Date: 09/Oct/18 16:42 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on a change in pull request #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#discussion_r223777983 ## File path: runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java ## @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.beam.runners.extensions.metrics; + +import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; +import java.io.BufferedWriter; +import java.io.OutputStreamWriter; +import java.net.InetAddress; +import java.net.Socket; +import java.nio.charset.Charset; +import java.util.Locale; +import java.util.regex.Pattern; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.DistributionResult; +import org.apache.beam.sdk.metrics.GaugeResult; +import org.apache.beam.sdk.metrics.MetricQueryResults; +import org.apache.beam.sdk.metrics.MetricResult; +import org.apache.beam.sdk.metrics.MetricsSink; +import org.apache.beam.sdk.options.PipelineOptions; + +/** + * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics are reported with the + * timestamp (seconds from epoch) when the push to the sink was done (except with gauges that + * already have a timestamp value). The graphite metric name will be in the form of + * beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType For example: + * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code + * beam.distribution.throughput.nbRecordsPerSec.attempted.mean} + */ +public class MetricsGraphiteSink implements MetricsSink { + private static final Charset UTF_8 = Charset.forName("UTF-8"); + private static final Pattern WHITESPACE = Pattern.compile("[\\s]+"); + private static final String SPACE_REPLACEMENT = "_"; + private final String address; + private final int port; + private final Charset charset; + + public MetricsGraphiteSink(PipelineOptions pipelineOptions) { +this.address = pipelineOptions.getMetricsGraphiteHost(); +this.port = pipelineOptions.getMetricsGraphitePort(); +this.charset = UTF_8; + } + + @Experimental(Experimental.Kind.METRICS) + @Override + public void writeMetrics(MetricQueryResults metricQueryResults) throws Exception { +final long metricTimestamp = System.currentTimeMillis() / 1000L; +Socket 
socket = new Socket(InetAddress.getByName(address), port); +BufferedWriter writer = +new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), charset)); +StringBuilder messagePayload = new StringBuilder(); +Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters(); +Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges(); +Iterable<MetricResult<DistributionResult>> distributions = +metricQueryResults.getDistributions(); + +for (MetricResult<Long> counter : counters) { + // if committed metrics are not supported, exception is thrown and we don't append the message + try { +messagePayload.append(createCounterGraphiteMessage(metricTimestamp, counter, true)); + } catch (UnsupportedOperationException e) { +if (!e.getMessage().contains("committed metrics")) { + throw e; +} + } + messagePayload.append(createCounterGraphiteMessage(metricTimestamp, counter, false)); +} + +for (MetricResult<GaugeResult> gauge : gauges) { + try { +messagePayload.append(createGaugeGraphiteMessage(gauge, true)); + } catch (UnsupportedOperationException e) { +if (!e.getMessage().contains("committed metrics")) { + throw e; +} + } + messageP
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152771 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 16:36 Start Date: 09/Oct/18 16:36 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on a change in pull request #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#discussion_r223776184 ## File path: runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java ## @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.beam.runners.extensions.metrics; + +import edu.umd.cs.findbugs.annotations.SuppressFBWarnings; +import java.io.BufferedWriter; +import java.io.OutputStreamWriter; +import java.net.InetAddress; +import java.net.Socket; +import java.nio.charset.Charset; +import java.util.Locale; +import java.util.regex.Pattern; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.metrics.DistributionResult; +import org.apache.beam.sdk.metrics.GaugeResult; +import org.apache.beam.sdk.metrics.MetricQueryResults; +import org.apache.beam.sdk.metrics.MetricResult; +import org.apache.beam.sdk.metrics.MetricsSink; +import org.apache.beam.sdk.options.PipelineOptions; + +/** + * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics are reported with the + * timestamp (seconds from epoch) when the push to the sink was done (except with gauges that + * already have a timestamp value). The graphite metric name will be in the form of + * beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType For example: + * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code + * beam.distribution.throughput.nbRecordsPerSec.attempted.mean} + */ +public class MetricsGraphiteSink implements MetricsSink { + private static final Charset UTF_8 = Charset.forName("UTF-8"); + private static final Pattern WHITESPACE = Pattern.compile("[\\s]+"); + private static final String SPACE_REPLACEMENT = "_"; Review comment: If it's used only once then we could just use a character, but ok with a constant This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152771) Time Spent: 3h (was: 2h 50m) > Implement a Graphite sink for the metrics pusher > > > Key: BEAM-4553 > URL: https://issues.apache.org/jira/browse/BEAM-4553 > Project: Beam > Issue Type: Sub-task > Components: runner-extensions-metrics >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > Today only a REST Http sink that sends raw json metrics using POST request to > a http server is available. It is more a POC sink. It would be good to code > the first real metrics sink. Some of the most popular is Graphite. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
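For context, Graphite's plaintext protocol accepts newline-terminated `<metric path> <value> <epoch-seconds>` records over TCP, which is what the sink's messagePayload accumulates before writing to the socket. A Python sketch of the naming scheme described in the Javadoc follows (the PR itself is Java; `graphite_counter_line` is an illustrative helper, not Beam API):

```python
import re

WHITESPACE = re.compile(r"\s+")

def graphite_counter_line(namespace, name, value, timestamp, committed=False):
    # Naming scheme from the sink's Javadoc:
    # beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
    # Whitespace in user-supplied names is replaced with "_", as in the PR.
    status = "committed" if committed else "attempted"
    path = "beam.counter.%s.%s.%s.value" % (
        WHITESPACE.sub("_", namespace), WHITESPACE.sub("_", name), status)
    # Graphite plaintext protocol record: "<path> <value> <epoch-seconds>\n"
    return "%s %s %d\n" % (path, value, timestamp)

line = graphite_counter_line("throughput", "nb records", 42, 1539100000)
```

The whitespace replacement matters because Graphite treats spaces as field separators, so an unescaped metric name would corrupt the record.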
[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module
[ https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152762 ] ASF GitHub Bot logged work on BEAM-5315: Author: ASF GitHub Bot Created on: 09/Oct/18 16:28 Start Date: 09/Oct/18 16:28 Worklog Time Spent: 10m Work Description: splovyt commented on issue #6590: [BEAM-5315] Partially port io URL: https://github.com/apache/beam/pull/6590#issuecomment-428259931 @tvalentyn I have rebased, although the checks seem to be hanging. PTAL and please merge if approved (I am at an event next two days). Thanks once again for the review. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152762) Time Spent: 3h 20m (was: 3h 10m) > Finish Python 3 porting for io module > - > > Key: BEAM-5315 > URL: https://issues.apache.org/jira/browse/BEAM-5315 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Simon >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5660) Add dataflow java worker unit tests into precommit
[ https://issues.apache.org/jira/browse/BEAM-5660?focusedWorklogId=152760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152760 ] ASF GitHub Bot logged work on BEAM-5660: Author: ASF GitHub Bot Created on: 09/Oct/18 16:27 Start Date: 09/Oct/18 16:27 Worklog Time Spent: 10m Work Description: herohde commented on issue #6613: [BEAM-5660] Add both dataflow legacy worker and fn-api worker into JavaPreCommit URL: https://github.com/apache/beam/pull/6613#issuecomment-428259602 @boyuanzz Is the website failure unrelated? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152760) Time Spent: 20m (was: 10m) > Add dataflow java worker unit tests into precommit > -- > > Key: BEAM-5660 > URL: https://issues.apache.org/jira/browse/BEAM-5660 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work
[ https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152755 ] ASF GitHub Bot logged work on BEAM-5687: Author: ASF GitHub Bot Created on: 09/Oct/18 16:20 Start Date: 09/Oct/18 16:20 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #6617: [BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines URL: https://github.com/apache/beam/pull/6617#discussion_r223770717 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java ## @@ -203,6 +203,9 @@ static StreamExecutionEnvironment createStreamExecutionEnvironment( .getCheckpointConfig() .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints); } +} else { + // checkpointing is disabled, we can allow shutting down sources when they're done Review comment: https://issues.apache.org/jira/browse/FLINK-2491 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152755) Time Spent: 0.5h (was: 20m) > Checkpointing in portable pipelines does not work > - > > Key: BEAM-5687 > URL: https://issues.apache.org/jira/browse/BEAM-5687 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Labels: portability-flink > Fix For: 2.9.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Checkpoints fail: > {noformat} > AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 > for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> > ToKeyedWorkItem (1/1).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.Exception: Could not materialize checkpoint 2 for > operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> > ToKeyedWorkItem (1/1). > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943) > ... 
6 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.NullPointerException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53) > at > org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854) > ... 5 more > Caused by: java.lang.NullPointerException > at > org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.<init>(CoderTypeSerializer.java:162) > at > org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136) > at > org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93) > at > org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394) > at > org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:352) > at > org.apache.flink.runtime.io.asyn
[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work
[ https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152754 ] ASF GitHub Bot logged work on BEAM-5687: Author: ASF GitHub Bot Created on: 09/Oct/18 16:19 Start Date: 09/Oct/18 16:19 Worklog Time Spent: 10m Work Description: tweise commented on a change in pull request #6617: [BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines URL: https://github.com/apache/beam/pull/6617#discussion_r223770313 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java ## @@ -203,6 +203,9 @@ static StreamExecutionEnvironment createStreamExecutionEnvironment( .getCheckpointConfig() .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints); } +} else { + // checkpointing is disabled, we can allow shutting down sources when they're done Review comment: Is there a Flink JIRA for this? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 152754) Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work
[ https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152759 ] ASF GitHub Bot logged work on BEAM-5687: Author: ASF GitHub Bot Created on: 09/Oct/18 16:25 Start Date: 09/Oct/18 16:25 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #6617: [BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines URL: https://github.com/apache/beam/pull/6617#discussion_r223772311 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java ## @@ -203,6 +203,9 @@ static StreamExecutionEnvironment createStreamExecutionEnvironment( .getCheckpointConfig() .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints); } +} else { + // checkpointing is disabled, we can allow shutting down sources when they're done Review comment: It's also in `ImpulseSourceFunction` but doesn't hurt to add it here as well. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 152759) Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work
[ https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152758 ] ASF GitHub Bot logged work on BEAM-5687: Author: ASF GitHub Bot Created on: 09/Oct/18 16:23 Start Date: 09/Oct/18 16:23 Worklog Time Spent: 10m Work Description: tweise commented on a change in pull request #6617: [BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines URL: https://github.com/apache/beam/pull/6617#discussion_r223771625 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkExecutionEnvironments.java ## @@ -203,6 +203,9 @@ static StreamExecutionEnvironment createStreamExecutionEnvironment( .getCheckpointConfig() .setMinPauseBetweenCheckpoints(minPauseBetweenCheckpoints); } +} else { + // checkpointing is disabled, we can allow shutting down sources when they're done Review comment: great, let's add that link as a comment This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 152758) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-5687) Checkpointing in portable pipelines does not work
[ https://issues.apache.org/jira/browse/BEAM-5687?focusedWorklogId=152749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152749 ] ASF GitHub Bot logged work on BEAM-5687: Author: ASF GitHub Bot Created on: 09/Oct/18 16:15 Start Date: 09/Oct/18 16:15 Worklog Time Spent: 10m Work Description: mxm opened a new pull request #6617: [BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines URL: https://github.com/apache/beam/pull/6617 ### [BEAM-5687] Fix checkpointing of FlinkRunner for portable pipelines This provides the input WindowValue Coder to ExecutableStageDoFnOperator which ensures that the buffered elements can be checkpointed correctly. ### [BEAM-3727] Do not shutdown Impulse sources to enable checkpointing Flink's checkpointing won't work properly after sources have finished. They need to be up and running for as long as checkpoints should be taken. This was already the case for the non-portable UnboundedSourceWrapper but it needs to be extended also for Impulse transforms. 
### [BEAM-3727] Allow sources to shutdown when checkpointing is disabled CC @tweise @angoenka Post-Commit Tests Status (on master branch): build-status badge table for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners. Issue Time Tracking --- Worklog Id: (was: 152749) Time Spent: 10m Remaining Estimate: 0h
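The PR description above explains that Flink cannot take checkpoints once a source has finished, so sources must stay running while checkpointing is enabled, and may only shut down when it is disabled. A minimal, runnable sketch of that keep-alive pattern — plain Python threading, not the actual Java `ImpulseSourceFunction`; the class name and `keep_alive_after_emit` flag are illustrative:

```python
import threading

class ImpulseLikeSource:
    """Illustrative stand-in for an impulse-style streaming source.

    When checkpointing is enabled, the source emits its single element
    and then blocks until cancelled (a finished source cannot be part
    of a checkpoint); when checkpointing is disabled, it simply returns.
    """

    def __init__(self, keep_alive_after_emit):
        self.keep_alive_after_emit = keep_alive_after_emit
        self.emitted = []
        self._cancelled = threading.Event()

    def run(self):
        # Emit the single impulse element (an empty byte string here).
        self.emitted.append(b"")
        if self.keep_alive_after_emit:
            # Stay alive so checkpoints can still be taken; only a
            # cancel() call lets the source exit.
            self._cancelled.wait()

    def cancel(self):
        self._cancelled.set()
```

With `keep_alive_after_emit=False` (checkpointing disabled), `run()` returns right after emitting — the shutdown behaviour the third commit re-enables for that case.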
[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner
[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643682#comment-16643682 ] Kenneth Knowles commented on BEAM-5690: --- CC [~kedin] [~apilloud] [~xumingming] [~mingmxu] [~amaliujia] Since it is not reproduced in the Flink runner or Direct runner, the SQL implementation of GROUP BY is probably triggering some latent bug in the Spark runner's streaming mode. > Issue with GroupByKey in BeamSql using SparkRunner > -- > > Key: BEAM-5690 > URL: https://issues.apache.org/jira/browse/BEAM-5690 > Project: Beam > Issue Type: Task > Components: runner-spark >Reporter: Kenneth Knowles >Priority: Major > > Reported on user@ > {quote}We are trying to setup a pipeline with using BeamSql and the trigger > used is default (AfterWatermark crosses the window). > Below is the pipeline: > >KafkaSource (KafkaIO) >---> Windowing (FixedWindow 1min) >---> BeamSql >---> KafkaSink (KafkaIO) > > We are using Spark Runner for this. > The BeamSql query is: > {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code} > We are grouping by Col3 which is a string. It can hold values string[0-9]. > > The records are getting emitted out at 1 min to kafka sink, but the output > record in kafka is not as expected. 
> Below is the output observed: (WST and WET are indicators for window start > time and window end time) > {code} > {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 0} > {code} > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
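For reference, the semantics the query should produce — exactly one result row per distinct `Col3` value per one-minute fixed window, each with a count of at least 1 — can be sketched outside Beam. This is a plain-Python illustration, not Beam or Spark runner code; the function name and `(timestamp, col3)` record shape are assumptions:

```python
from collections import Counter

def windowed_group_by_count(records, window_secs=60):
    """Count records per (fixed window, Col3) pair, mirroring
    `select Col3, count(*) from PCOLLECTION GROUP BY Col3` applied
    within one-minute fixed windows: each (window, key) pair yields
    exactly one row, and its count is always >= 1.
    """
    counts = Counter()
    for ts, col3 in records:
        window_start = ts - (ts % window_secs)  # fixed-window assignment
        counts[(window_start, col3)] += 1
    return dict(counts)
```

Under these semantics the duplicated `count_col1: 0` rows for `string6` seen above should be impossible, which is why the report points at the runner's streaming mode rather than at the query.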
[jira] [Assigned] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner
[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-5690: - Assignee: (was: Amit Sela)
[jira] [Work logged] (BEAM-5254) Add Samza Runner translator registrar and refactor config generation
[ https://issues.apache.org/jira/browse/BEAM-5254?focusedWorklogId=152737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152737 ] ASF GitHub Bot logged work on BEAM-5254: Author: ASF GitHub Bot Created on: 09/Oct/18 16:00 Start Date: 09/Oct/18 16:00 Worklog Time Spent: 10m Work Description: akedin closed pull request #6292: [BEAM-5254] Add Samza Runner translator registrar and refactor config URL: https://github.com/apache/beam/pull/6292 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java index 6e67e385756..bba10ddd962 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/SamzaRunner.java @@ -33,7 +33,6 @@ import org.apache.beam.sdk.values.PValue; import org.apache.samza.application.StreamApplication; import org.apache.samza.config.Config; -import org.apache.samza.config.MapConfig; import org.apache.samza.metrics.MetricsRegistryMap; import org.apache.samza.operators.ContextManager; import org.apache.samza.operators.StreamGraph; @@ -76,20 +75,18 @@ public SamzaPipelineResult run(Pipeline pipeline) { // Add a dummy source for use in special cases (TestStream, empty flatten) final PValue dummySource = pipeline.apply("Dummy Input Source", Create.of("dummy")); - final Map idMap = PViewToIdMapper.buildIdMap(pipeline); -final Map config = ConfigBuilder.buildConfig(pipeline, options, idMap); -final SamzaExecutionContext executionContext = new SamzaExecutionContext(); +final ConfigBuilder configBuilder = new ConfigBuilder(options); 
+SamzaPipelineTranslator.createConfig(pipeline, idMap, configBuilder); +final ApplicationRunner runner = ApplicationRunner.fromConfig(configBuilder.build()); -final ApplicationRunner runner = ApplicationRunner.fromConfig(new MapConfig(config)); +final SamzaExecutionContext executionContext = new SamzaExecutionContext(); final StreamApplication app = new StreamApplication() { @Override public void init(StreamGraph streamGraph, Config config) { -// TODO: we should probably not be creating the execution context this early since it needs -// to be shipped off to various tasks. streamGraph.withContextManager( new ContextManager() { @Override diff --git a/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java b/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java index 1bef011a34f..f653cfc934b 100644 --- a/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java +++ b/runners/samza/src/main/java/org/apache/beam/runners/samza/adapter/BoundedSourceSystem.java @@ -69,28 +69,6 @@ */ // TODO: instrumentation for the consumer public class BoundedSourceSystem { - /** - * Returns the configuration required to instantiate a consumer for the given {@link - * BoundedSource}. - * - * @param id a unique id for the source. Must use only valid characters for a system name in - * Samza. - * @param source the source - * @param coder a coder to deserialize messages received by the source's consumer - * @param the type of object produced by the source consumer - */ - public static Map createConfigFor( - String id, BoundedSource source, Coder> coder, String stepName) { -final Map config = new HashMap<>(); -final String streamPrefix = "systems." 
+ id; -config.put(streamPrefix + ".samza.factory", BoundedSourceSystem.Factory.class.getName()); -config.put(streamPrefix + ".source", Base64Serializer.serializeUnchecked(source)); -config.put(streamPrefix + ".coder", Base64Serializer.serializeUnchecked(coder)); -config.put(streamPrefix + ".stepName", stepName); -config.put("streams." + id + ".samza.system", id); -config.put("streams." + id + ".samza.bounded", "true"); -return config; - } private static List> split( BoundedSource source, SamzaPipelineOptions pipelineOptions) throws Exception { @@ -414,8 +392,7 @@ private void enqueueUninterruptibly(IncomingMessageEnvelope envelope) { /** * A {@link SystemFactory} that produces a {@link BoundedSourceSystem} for a particular {@lin
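For context on what the removed `createConfigFor` assembled: the key names below are copied from the diff above, while the sketch itself is plain Python rather than the real Java (which Base64-serializes the source and coder; the serialized values here are placeholder strings, and the factory class name is inferred from the file path in the diff):

```python
def bounded_source_config(system_id, serialized_source, serialized_coder, step_name):
    """Rebuild the Samza config map that the removed createConfigFor
    produced for a bounded source: per-system keys under `systems.<id>`
    plus stream-level keys marking the stream as bounded."""
    prefix = "systems." + system_id
    return {
        prefix + ".samza.factory":
            "org.apache.beam.runners.samza.adapter.BoundedSourceSystem$Factory",
        prefix + ".source": serialized_source,    # Base64-serialized BoundedSource
        prefix + ".coder": serialized_coder,      # Base64-serialized Coder
        prefix + ".stepName": step_name,
        "streams." + system_id + ".samza.system": system_id,
        "streams." + system_id + ".samza.bounded": "true",
    }
```

The refactor in this PR moves this key assembly out of the adapter and into a central `ConfigBuilder` driven by `SamzaPipelineTranslator.createConfig`.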
[jira] [Created] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner
Kenneth Knowles created BEAM-5690: - Summary: Issue with GroupByKey in BeamSql using SparkRunner Key: BEAM-5690 URL: https://issues.apache.org/jira/browse/BEAM-5690 Project: Beam Issue Type: Task Components: runner-spark Reporter: Kenneth Knowles Assignee: Amit Sela
[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152733 ] ASF GitHub Bot logged work on BEAM-5326: Author: ASF GitHub Bot Created on: 09/Oct/18 15:51 Start Date: 09/Oct/18 15:51 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #6615: [BEAM-5326] Shim main class and fix Go artifact naming mismatch for c… URL: https://github.com/apache/beam/pull/6615#discussion_r223760656 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/com/google/cloud/dataflow/worker/DataflowRunnerHarness.java ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.google.cloud.dataflow.worker; + +/** Temporary redirect for org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness. */ Review comment: Opened BEAM-5686 for when we can remove it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152733) Time Spent: 1h 10m (was: 1h) > SDK support for custom dataflow worker jar > -- > > Key: BEAM-5326 > URL: https://issues.apache.org/jira/browse/BEAM-5326 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Doc: > https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152734 ] ASF GitHub Bot logged work on BEAM-5326: Author: ASF GitHub Bot Created on: 09/Oct/18 15:51 Start Date: 09/Oct/18 15:51 Worklog Time Spent: 10m Work Description: herohde commented on issue #6615: [BEAM-5326] Shim main class and fix Go artifact naming mismatch for c… URL: https://github.com/apache/beam/pull/6615#issuecomment-428246844 Thanks @boyuanzz. PTAL This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152734) Time Spent: 1h 20m (was: 1h 10m) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5686) Remove DataflowRunnerHarness shim again
Henning Rohde created BEAM-5686: --- Summary: Remove DataflowRunnerHarness shim again Key: BEAM-5686 URL: https://issues.apache.org/jira/browse/BEAM-5686 Project: Beam Issue Type: Task Components: runner-dataflow Reporter: Henning Rohde Assignee: Henning Rohde -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152732 ] ASF GitHub Bot logged work on BEAM-5326: Author: ASF GitHub Bot Created on: 09/Oct/18 15:50 Start Date: 09/Oct/18 15:50 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #6615: [BEAM-5326] Shim main class and fix Go artifact naming mismatch for c… URL: https://github.com/apache/beam/pull/6615#discussion_r223760415 ## File path: sdks/go/pkg/beam/runners/dataflow/dataflow.go ## @@ -149,10 +149,12 @@ func Execute(ctx context.Context, p *beam.Pipeline) error { return fmt.Errorf("failed to generate model pipeline: %v", err) } - id := atomic.AddInt32(&unique, 1) - modelURL := gcsx.Join(*stagingLocation, fmt.Sprintf("model-%v-%v", id, time.Now().UnixNano())) - workerURL := gcsx.Join(*stagingLocation, fmt.Sprintf("worker-%v-%v", id, time.Now().UnixNano())) - jarURL := gcsx.Join(*stagingLocation, fmt.Sprintf("dataflow-worker-%v-%v.jar", id, time.Now().UnixNano())) + // NOTE(herohde) 10/8/2018: the last segment of the names must be "worker" and "dataflow-worker.jar". Review comment: Opened BEAM-5689 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152732) Time Spent: 1h (was: 50m) > SDK support for custom dataflow worker jar > -- > > Key: BEAM-5326 > URL: https://issues.apache.org/jira/browse/BEAM-5326 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Doc: > https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5688) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on githubPullRequestId assert
[ https://issues.apache.org/jira/browse/BEAM-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643659#comment-16643659 ] Scott Wegner commented on BEAM-5688: I have https://github.com/apache/beam/pull/6608 out to fix this, but the current version doesn't work. > [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on > githubPullRequestId assert > -- > > Key: BEAM-5688 > URL: https://issues.apache.org/jira/browse/BEAM-5688 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Scott Wegner >Assignee: Alan Myrvold >Priority: Major > Labels: currently-failing > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/20/] > * [Gradle Build Scan|https://gradle.com/s/7mqwgjegf5hge] > * [Test source > code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L234] > Initial investigation: > This is a problem with how the website gradle scripts are implemented to > accept an githubPullRequestId. The Cron job will not have an associated PR, > so this currently fails. > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5689) Remove artifact naming constraint for portable Dataflow job
Henning Rohde created BEAM-5689: --- Summary: Remove artifact naming constraint for portable Dataflow job Key: BEAM-5689 URL: https://issues.apache.org/jira/browse/BEAM-5689 Project: Beam Issue Type: Task Components: runner-dataflow Reporter: Henning Rohde Assignee: Henning Rohde Artifact names/keys are not preserved in Dataflow. Remove the below workarounds when they are. * Go Dataflow runner * Java and Python container boot code (probably) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
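The workaround this ticket tracks (see the Go diff in the BEAM-5326 worklog above) pins the last path segments of the staged objects. A rough stdlib-only sketch of that constraint, where `join` is a hypothetical stand-in for the Go runner's gcsx.Join:

```java
import java.util.Objects;

public class StagedNames {
    // Joins a staging location and an object name with a single '/',
    // similar in spirit to gcsx.Join in the Go Dataflow runner.
    static String join(String stagingLocation, String name) {
        Objects.requireNonNull(stagingLocation);
        Objects.requireNonNull(name);
        return stagingLocation.replaceAll("/+$", "") + "/" + name;
    }

    public static void main(String[] args) {
        String staging = "gs://my-bucket/staging"; // hypothetical staging location
        // While artifact names/keys are not preserved in Dataflow, the final
        // segments must be exactly "worker" and "dataflow-worker.jar" rather
        // than carrying unique ids or timestamps.
        System.out.println(join(staging, "worker"));             // gs://my-bucket/staging/worker
        System.out.println(join(staging, "dataflow-worker.jar")); // gs://my-bucket/staging/dataflow-worker.jar
    }
}
```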
[jira] [Commented] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary
[ https://issues.apache.org/jira/browse/BEAM-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643661#comment-16643661 ] Pablo Estrada commented on BEAM-5683: - I'll take a look in a bit > [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary > -- > > Key: BEAM-5683 > URL: https://issues.apache.org/jira/browse/BEAM-5683 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Scott Wegner >Assignee: Ankur Goenka >Priority: Major > Labels: currently-failing > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/] > * [Gradle Build > Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests] > * [Test source > code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390] > Initial investigation: > Seems to be failing on pip download. 
> ==
> ERROR: test_multiple_empty_outputs (apache_beam.transforms.ptransform_test.PTransformTest)
> --
> Traceback (most recent call last):
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/transforms/ptransform_test.py", line 277, in test_multiple_empty_outputs
>     pipeline.run()
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/testing/test_pipeline.py", line 104, in run
>     result = super(TestPipeline, self).run(test_runner_api)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py", line 403, in run
>     self.to_runner_api(), self.runner, self._options).run(False)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/pipeline.py", line 416, in run
>     return self.runner.run_pipeline(self)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 50, in run_pipeline
>     self.result = super(TestDataflowRunner, self).run_pipeline(pipeline)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 389, in run_pipeline
>     self.dataflow_client.create_job(self.job), self)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/retry.py", line 184, in wrapper
>     return fun(*args, **kwargs)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", line 490, in create_job
>     self.create_job_description(job)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", line 519, in create_job_description
>     resources = self._stage_resources(job.options)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py", line 452, in _stage_resources
>     staging_location=google_cloud_options.staging_location)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py", line 161, in stage_job_resources
>     requirements_cache_path)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/runners/portability/stager.py", line 411, in _populate_requirements_cache
>     processes.check_call(cmd_args)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_VR_Dataflow/src/sdks/python/apache_beam/utils/processes.py", line 46, in check_call
>     return subprocess.check_call(*args, **kwargs)
>   File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
>     raise CalledProcessError(retcode, cmd)
> CalledProcessError: Command '['/hom
[jira] [Created] (BEAM-5688) [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on githubPullRequestId assert
Scott Wegner created BEAM-5688: -- Summary: [beam_PreCommit_Website_Stage_GCS_Cron] [stageWebsite] Fails on githubPullRequestId assert Key: BEAM-5688 URL: https://issues.apache.org/jira/browse/BEAM-5688 Project: Beam Issue Type: Bug Components: test-failures Reporter: Scott Wegner Assignee: Alan Myrvold _Use this form to file an issue for test failure:_ * [Jenkins Job|https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Cron/20/] * [Gradle Build Scan|https://gradle.com/s/7mqwgjegf5hge] * [Test source code|https://github.com/apache/beam/blob/a19183b05f0271f0a927aafcd778235335b7d269/website/build.gradle#L234] Initial investigation: This is a problem with how the website gradle scripts are implemented to accept a githubPullRequestId. The Cron job will not have an associated PR, so this currently fails. _After you've filled out the above details, please [assign the issue to an individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. Assignee should [treat test failures as high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], helping to fix the issue or find a more appropriate owner. See [Apache Beam Post-Commit Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5687) Checkpointing in portable pipelines does not work
Maximilian Michels created BEAM-5687: Summary: Checkpointing in portable pipelines does not work Key: BEAM-5687 URL: https://issues.apache.org/jira/browse/BEAM-5687 Project: Beam Issue Type: Bug Components: runner-flink Reporter: Maximilian Michels Assignee: Maximilian Michels Fix For: 2.9.0 Checkpoints fail:
{noformat}
AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> ToKeyedWorkItem (1/1).}
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not materialize checkpoint 2 for operator Source: Custom Source -> 9TestInput.None/beam:env:docker:v1:0 -> ToKeyedWorkItem (1/1).
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
	... 6 more
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
	at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
	... 5 more
Caused by: java.lang.NullPointerException
	at org.apache.beam.runners.flink.translation.types.CoderTypeSerializer$CoderTypeSerializerConfigSnapshot.<init>(CoderTypeSerializer.java:162)
	at org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.snapshotConfiguration(CoderTypeSerializer.java:136)
	at org.apache.flink.runtime.state.RegisteredOperatorBackendStateMetaInfo.snapshot(RegisteredOperatorBackendStateMetaInfo.java:93)
	at org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:394)
	at org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:352)
	at org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
	... 7 more
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5281) There are several deadlinks in beam-site, please remove.
[ https://issues.apache.org/jira/browse/BEAM-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Wegner resolved BEAM-5281. Resolution: Duplicate Fix Version/s: Not applicable Dupe of BEAM-5681; there's something broken in the pre-commit scripts. > There are several deadlinks in beam-site, please remove. > - > > Key: BEAM-5281 > URL: https://issues.apache.org/jira/browse/BEAM-5281 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Boyuan Zhang >Assignee: Melissa Pashniak >Priority: Major > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > Deadlinks in beam-site cause the nightly build to fail: > https://scans.gradle.com/s/nzwfwj6iqlgrg/console-log?task=:beam-website:testWebsite#L13 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5326) SDK support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5326?focusedWorklogId=152728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152728 ] ASF GitHub Bot logged work on BEAM-5326: Author: ASF GitHub Bot Created on: 09/Oct/18 15:41 Start Date: 09/Oct/18 15:41 Worklog Time Spent: 10m Work Description: herohde commented on a change in pull request #6615: [BEAM-5326] Shim main class and fix Go artifact naming mismatch for c… URL: https://github.com/apache/beam/pull/6615#discussion_r223756962 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/com/google/cloud/dataflow/worker/DataflowRunnerHarness.java ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package com.google.cloud.dataflow.worker; + +/** Temporary redirect for org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness. */ Review comment: I'll track that separately. The condition is internal to Dataflow. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152728) Time Spent: 50m (was: 40m) > SDK support for custom dataflow worker jar > -- > > Key: BEAM-5326 > URL: https://issues.apache.org/jira/browse/BEAM-5326 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Henning Rohde >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Doc: > https://docs.google.com/document/d/1-m-GzkYWIODKOEl1ZSUNXYbcGRvRr3QkasfHsJxbuoA/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5624) Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse')
[ https://issues.apache.org/jira/browse/BEAM-5624?focusedWorklogId=152710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152710 ] ASF GitHub Bot logged work on BEAM-5624: Author: ASF GitHub Bot Created on: 09/Oct/18 15:20 Start Date: 09/Oct/18 15:20 Worklog Time Spent: 10m Work Description: splovyt opened a new pull request #6616: [BEAM-5624] Fix avro.schema parser for py3 URL: https://github.com/apache/beam/pull/6616 Fix for the following error mentioned in BEAM-5624: _AttributeError (module 'avro.schema' has no attribute 'parse')_ This is is part of a series of PRs with goal to make Apache Beam PY3 compatible. The proposal with the outlined approach has been documented [here](https://s.apache.org/beam-python-3). @tvalentyn @Fematich @charlesccychen @aaltay @Juta @manuzhang Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152710) Time Spent: 10m Remaining Estimate: 0h > Avro IO does not work with avro-python3 package out-of-the-box on Python 3, > several tests fail with AttributeError (module 'avro.schema' has no attribute > 'parse') > --- > > Key: BEAM-5624 > URL: https://issues.apache.org/jira/browse/BEAM-5624 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Rep
[jira] [Commented] (BEAM-5683) [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary
[ https://issues.apache.org/jira/browse/BEAM-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643606#comment-16643606 ] Scott Wegner commented on BEAM-5683: [~pabloem] / [~robertwb] can either of you help out? bq. Can we access pip subprocess logs? > [beam_PostCommit_Py_VR_Dataflow] [test_multiple_empty_outputs] Failure summary > -- > > Key: BEAM-5683 > URL: https://issues.apache.org/jira/browse/BEAM-5683 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Scott Wegner >Assignee: Ankur Goenka >Priority: Major > Labels: currently-failing > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/1289/] > * [Gradle Build > Scan|https://scans.gradle.com/s/hjmzvh4ylhs6y/console-log?task=:beam-sdks-python:validatesRunnerBatchTests] > * [Test source > code|https://github.com/apache/beam/blob/303a4275eb0a323761e1a4dec6a22fde9863acf8/sdks/python/apache_beam/runners/portability/stager.py#L390] > Initial investigation: > Seems to be failing on pip download. 
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152709 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 15:16 Start Date: 09/Oct/18 15:16 Worklog Time Spent: 10m Work Description: echauchot commented on issue #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#issuecomment-428233453 @aromanenko-dev thanks for the review ! I answered all your comments PTAL. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152709) Time Spent: 2h 50m (was: 2h 40m) > Implement a Graphite sink for the metrics pusher > > > Key: BEAM-4553 > URL: https://issues.apache.org/jira/browse/BEAM-4553 > Project: Beam > Issue Type: Sub-task > Components: runner-extensions-metrics >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > Today only a REST Http sink that sends raw json metrics using POST request to > a http server is available. It is more a POC sink. It would be good to code > the first real metrics sink. Some of the most popular is Graphite. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152708 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 15:15 Start Date: 09/Oct/18 15:15 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#discussion_r223745814
## File path: runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java ##
@@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except with gauges that
+ * already have a timestamp value). The graphite metric name will be in the form of
+ * beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+    this.address = pipelineOptions.getMetricsGraphiteHost();
+    this.port = pipelineOptions.getMetricsGraphitePort();
+    this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws Exception {
+    final long metricTimestamp = System.currentTimeMillis() / 1000L;
+    Socket socket = new Socket(InetAddress.getByName(address), port);
+    BufferedWriter writer =
+        new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), charset));
+    StringBuilder messagePayload = new StringBuilder();
+    Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+    Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+    Iterable<MetricResult<DistributionResult>> distributions =
+        metricQueryResults.getDistributions();
+
+    for (MetricResult<Long> counter : counters) {
+      // if committed metrics are not supported, exception is thrown and we don't append the message
+      try {
+        messagePayload.append(createCounterGraphiteMessage(metricTimestamp, counter, true));
+      } catch (UnsupportedOperationException e) {
+        if (!e.getMessage().contains("committed metrics")) {
+          throw e;
+        }
+      }
+      messagePayload.append(createCounterGraphiteMessage(metricTimestamp, counter, false));
+    }
+
+    for (MetricResult<GaugeResult>> gauge : gauges) {
+      try {
+        messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+      } catch (UnsupportedOperationException e) {
+        if (!e.getMessage().contains("committed metrics")) {
+          throw e;
+        }
+      }
+      messageP
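The javadoc above describes the metric-name scheme and the seconds-since-epoch timestamp the sink uses. For illustration, here is a minimal sketch of the Graphite plaintext-protocol line such a sink would emit; the helper name `formatGraphiteLine` and the sample metric are hypothetical, not Beam API:

```java
// Illustrative sketch of a Graphite plaintext-protocol line, assuming the
// naming scheme beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType
// described above. formatGraphiteLine is a hypothetical helper, not Beam code.
public class GraphiteLineDemo {

  // Graphite plaintext protocol: "<metric.path> <value> <timestamp-in-seconds>\n"
  static String formatGraphiteLine(
      String metricType, String namespace, String name,
      String commitState, String valueType, long value, long timestampSec) {
    String path =
        String.join(".", "beam", metricType, namespace, name, commitState, valueType);
    return path + " " + value + " " + timestampSec + "\n";
  }

  public static void main(String[] args) {
    // Timestamp in seconds since epoch, computed the same way as in writeMetrics.
    long ts = System.currentTimeMillis() / 1000L;
    System.out.print(
        formatGraphiteLine("counter", "throughput", "nbRecords", "attempted", "value", 10L, ts));
  }
}
```

Each line is then written to the TCP socket; Graphite parses one metric per newline-terminated line.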
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152705 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 15:11 Start Date: 09/Oct/18 15:11 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#discussion_r223744438
## File path: runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java ##
@@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except with gauges that
+ * already have a timestamp value). The graphite metric name will be in the form of
+ * beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+    this.address = pipelineOptions.getMetricsGraphiteHost();
+    this.port = pipelineOptions.getMetricsGraphitePort();
+    this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws Exception {
+    final long metricTimestamp = System.currentTimeMillis() / 1000L;
+    Socket socket = new Socket(InetAddress.getByName(address), port);
+    BufferedWriter writer =
+        new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), charset));
+    StringBuilder messagePayload = new StringBuilder();
+    Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+    Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+    Iterable<MetricResult<DistributionResult>> distributions =
+        metricQueryResults.getDistributions();
+
+    for (MetricResult<Long> counter : counters) {
+      // if committed metrics are not supported, exception is thrown and we don't append the message
+      try {
+        messagePayload.append(createCounterGraphiteMessage(metricTimestamp, counter, true));
+      } catch (UnsupportedOperationException e) {
+        if (!e.getMessage().contains("committed metrics")) {
+          throw e;
+        }
+      }
+      messagePayload.append(createCounterGraphiteMessage(metricTimestamp, counter, false));
+    }
+
+    for (MetricResult<GaugeResult> gauge : gauges) {
+      try {
+        messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+      } catch (UnsupportedOperationException e) {
+        if (!e.getMessage().contains("committed metrics")) {
+          throw e;
+        }
+      }
+      messageP
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152704 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 15:10 Start Date: 09/Oct/18 15:10 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#discussion_r223743972
## File path: runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSinkTest.java ##
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import static org.junit.Assert.assertTrue;
+
+import java.io.IOException;
+import java.net.ServerSocket;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+/** Test class for MetricsGraphiteSink. */
+public class MetricsGraphiteSinkTest {
+  private static NetworkMockServer graphiteServer;
+  private static int port;
+
+  @BeforeClass
+  public static void beforeClass() throws IOException, InterruptedException {
+    // get free local port
+    ServerSocket serverSocket = new ServerSocket(0);
+    port = serverSocket.getLocalPort();
+    serverSocket.close();
+    graphiteServer = new NetworkMockServer(port);
+    Thread.sleep(200);
+    graphiteServer.clear();
+    graphiteServer.start();
+  }
+
+  @Before
+  public void before() {
+    graphiteServer.clear();
+  }
+
+  @AfterClass
+  public static void afterClass() throws IOException {
+    graphiteServer.stop();
+  }
+
+  @Test
+  public void testWriteMetricsWithCommittedSupported() throws Exception {
+    MetricQueryResults metricQueryResults = new CustomMetricQueryResults(true);
+    PipelineOptions pipelineOptions = PipelineOptionsFactory.create();
+    pipelineOptions.setMetricsGraphitePort(port);
+    pipelineOptions.setMetricsGraphiteHost("127.0.0.1");
+    MetricsGraphiteSink metricsGraphiteSink = new MetricsGraphiteSink(pipelineOptions);
+    metricsGraphiteSink.writeMetrics(metricQueryResults);
+    Thread.sleep(2000L);

Review comment: Yes, because when we write messages to the socket, `NetworkMockServer` reads the socket on a different thread and adds the messages to an ArrayList so that they can be read in the assert. On a heavily loaded Jenkins server, the test might fail because the `NetworkMockServer` does not have enough time to add the messages to the ArrayList. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152704) Time Spent: 2h 10m (was: 2h) > Implement a Graphite sink for the metrics pusher > > > Key: BEAM-4553 > URL: https://issues.apache.org/jira/browse/BEAM-4553 > Project: Beam > Issue Type: Sub-task > Components: runner-extensions-metrics >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Today only a REST HTTP sink that sends raw JSON metrics using POST requests to > an HTTP server is available. It is more a POC sink. It would be good to code > the first real metrics sink. One of the most popular is Graphite. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
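The test above reserves a free local port by binding a `ServerSocket` to port 0 (letting the OS pick an ephemeral port) and closing it before handing the port to the mock server. A standalone sketch of that trick; the class and method names are illustrative, not from the PR:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch of the free-port trick used in MetricsGraphiteSinkTest.beforeClass().
// FreePortDemo/findFreePort are illustrative names, not Beam code.
public class FreePortDemo {

  // Binding to port 0 asks the OS for any free ephemeral port; we read the
  // assigned port and release the socket so the test server can bind it.
  // Note the small race: another process could grab the port between close()
  // and reuse, which is usually acceptable in tests.
  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("free port: " + findFreePort());
  }
}
```

The try-with-resources form avoids the explicit `close()` call the test uses, but the mechanism is the same.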
[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module
[ https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152696&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152696 ] ASF GitHub Bot logged work on BEAM-5315: Author: ASF GitHub Bot Created on: 09/Oct/18 15:01 Start Date: 09/Oct/18 15:01 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #6590: [BEAM-5315] Partially port io URL: https://github.com/apache/beam/pull/6590#discussion_r223733107
## File path: sdks/python/apache_beam/io/filebasedsink_test.py ##
@@ -75,6 +76,10 @@ def _create_temp_file(self, name='', suffix=''):
 class MyFileBasedSink(filebasedsink.FileBasedSink):
+  @unittest.skipIf(sys.version_info[0] == 3 and
+                   os.environ.get('RUN_SKIPPED_PY3_TESTS') != '1',
+                   'This test still needs to be fixed on Python 3.'
+                   'TODO: BEAM-5627')
Review comment: Let's add: TODO: BEAM-5627, BEAM-5618 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152696) Time Spent: 3h 10m (was: 3h) > Finish Python 3 porting for io module > - > > Key: BEAM-5315 > URL: https://issues.apache.org/jira/browse/BEAM-5315 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Simon >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module
[ https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=152697&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152697 ] ASF GitHub Bot logged work on BEAM-5315: Author: ASF GitHub Bot Created on: 09/Oct/18 15:01 Start Date: 09/Oct/18 15:01 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #6590: [BEAM-5315] Partially port io URL: https://github.com/apache/beam/pull/6590#discussion_r223731816
## File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py ##
@@ -21,7 +21,9 @@
 import io
 import logging
+import os
Review comment: We recently merged https://github.com/apache/beam/pull/6587. All tests in this file are now passing. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152697) Time Spent: 3h 10m (was: 3h) > Finish Python 3 porting for io module > - > > Key: BEAM-5315 > URL: https://issues.apache.org/jira/browse/BEAM-5315 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Simon >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152694 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 14:55 Start Date: 09/Oct/18 14:55 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#discussion_r223737331
## File path: runners/extensions-java/metrics/src/test/java/org/apache/beam/runners/extensions/metrics/CustomMetricQueryResults.java ##
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import java.util.Collections;
+import java.util.List;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricName;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.joda.time.Instant;
+
+/** Test class to be used as an input to {@link MetricsSink} implementations tests. */
+class CustomMetricQueryResults implements MetricQueryResults {
+
+  private final boolean isCommittedSupported;
+
+  CustomMetricQueryResults(boolean isCommittedSupported) {
+    this.isCommittedSupported = isCommittedSupported;
+  }
+
+  @Override
+  public List<MetricResult<Long>> getCounters() {
+    return Collections.singletonList(
+        new MetricResult<Long>() {
+
+          @Override
+          public MetricName getName() {
+            return MetricName.named("ns1", "n1");
+          }
+
+          @Override
+          public String getStep() {
+            return "s1";
+          }
+
+          @Override
+          public Long getCommitted() {
+            if (!isCommittedSupported) {
+              // This is what getCommitted code is like for AccumulatedMetricResult on runners
+              // that do not support committed metrics
+              throw new UnsupportedOperationException(
+                  "This runner does not currently support committed"
+                      + " metrics results. Please use 'attempted' instead.");
+            }
+            return 10L;

Review comment: yes, just a test value. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 152694) Time Spent: 2h (was: 1h 50m) > Implement a Graphite sink for the metrics pusher > > > Key: BEAM-4553 > URL: https://issues.apache.org/jira/browse/BEAM-4553 > Project: Beam > Issue Type: Sub-task > Components: runner-extensions-metrics >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Today only a REST HTTP sink that sends raw JSON metrics using POST requests to > an HTTP server is available. It is more a POC sink. It would be good to code > the first real metrics sink. One of the most popular is Graphite. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
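The `getCommitted()` override quoted above throws `UnsupportedOperationException` to mimic runners without committed-metrics support, and the sink swallows exactly that case and falls back to attempted values. A self-contained sketch of that probe-and-fall-back pattern; the `Result` interface and helper names are illustrative, not Beam API:

```java
// Illustrative sketch (not Beam code) of the pattern discussed above: try the
// committed value first, and fall back to the attempted value only when the
// specific "committed metrics" UnsupportedOperationException is thrown.
public class CommittedFallbackDemo {

  // Hypothetical stand-in for Beam's MetricResult committed/attempted accessors.
  interface Result {
    long getCommitted();
    long getAttempted();
  }

  static long committedOrAttempted(Result r) {
    try {
      return r.getCommitted();
    } catch (UnsupportedOperationException e) {
      // Only swallow the "committed metrics not supported" case; rethrow anything else.
      if (!e.getMessage().contains("committed metrics")) {
        throw e;
      }
      return r.getAttempted();
    }
  }

  static long demo() {
    Result unsupported = new Result() {
      @Override
      public long getCommitted() {
        throw new UnsupportedOperationException(
            "This runner does not currently support committed metrics results.");
      }

      @Override
      public long getAttempted() {
        return 10L;
      }
    };
    return committedOrAttempted(unsupported);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints 10
  }
}
```

Matching on the exception message is brittle (a reworded message would break the filter), which is presumably why the sink checks for the "committed metrics" substring rather than the full text.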
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152693 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 14:54 Start Date: 09/Oct/18 14:54 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#discussion_r223736905
## File path: runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java ##
@@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp. So metrics are reported with the
+ * timestamp (seconds from epoch) when the push to the sink was done (except with gauges that
+ * already have a timestamp value). The graphite metric name will be in the form of
+ * beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType For example:
+ * {@code beam.counter.throughput.nbRecords.attempted.value} Or {@code
+ * beam.distribution.throughput.nbRecordsPerSec.attempted.mean}
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+    this.address = pipelineOptions.getMetricsGraphiteHost();
+    this.port = pipelineOptions.getMetricsGraphitePort();
+    this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws Exception {
+    final long metricTimestamp = System.currentTimeMillis() / 1000L;
+    Socket socket = new Socket(InetAddress.getByName(address), port);
+    BufferedWriter writer =
+        new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), charset));
+    StringBuilder messagePayload = new StringBuilder();
+    Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+    Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+    Iterable<MetricResult<DistributionResult>> distributions =
+        metricQueryResults.getDistributions();
+
+    for (MetricResult<Long> counter : counters) {
+      // if committed metrics are not supported, exception is thrown and we don't append the message
+      try {
+        messagePayload.append(createCounterGraphiteMessage(metricTimestamp, counter, true));
+      } catch (UnsupportedOperationException e) {
+        if (!e.getMessage().contains("committed metrics")) {
+          throw e;
+        }
+      }
+      messagePayload.append(createCounterGraphiteMessage(metricTimestamp, counter, false));
+    }
+
+    for (MetricResult<GaugeResult> gauge : gauges) {
+      try {
+        messagePayload.append(createGaugeGraphiteMessage(gauge, true));
+      } catch (UnsupportedOperationException e) {
+        if (!e.getMessage().contains("committed metrics")) {
+          throw e;
+        }
+      }
+      messageP
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152692 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 14:48 Start Date: 09/Oct/18 14:48 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#discussion_r223714684 ## File path: runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java ## @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp, so metrics are reported with the
+ * timestamp (seconds from epoch) at which the push to the sink was done (except for gauges, which
+ * already have a timestamp value). The Graphite metric name has the form
+ * beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType, for example
+ * {@code beam.counter.throughput.nbRecords.attempted.value} or
+ * {@code beam.distribution.throughput.nbRecordsPerSec.attempted.mean}.
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+    this.address = pipelineOptions.getMetricsGraphiteHost();
+    this.port = pipelineOptions.getMetricsGraphitePort();
+    this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws Exception {
+    final long metricTimestamp = System.currentTimeMillis() / 1000L;
+    Socket socket = new Socket(InetAddress.getByName(address), port);
+    BufferedWriter writer =
+        new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), charset));
+    StringBuilder messagePayload = new StringBuilder();
+    Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+    Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+    Iterable<MetricResult<DistributionResult>> distributions =
+        metricQueryResults.getDistributions();
+
+    for (MetricResult<Long> counter : counters) {

Review comment: I thought about it, but the difference between the loops is the generic type, so with type erasure I would need to pass a class to the common method and switch on that class in the code. I prefer the three loops to that kind of code. WDYT, anything to suggest?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking --- Worklog Id: (was: 152692) Time Spent: 1h 40m (was: 1.5h)

> Implement a Graphite sink for the metrics pusher
> --
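The metric-name scheme described in the sink's javadoc can be sketched as follows. This is a hypothetical illustration, not Beam code: `GraphiteLineSketch` and `formatLine` are invented names. Graphite's plaintext protocol expects one `<path> <value> <timestamp>` line per metric, which is what the sink's payload builder ultimately emits.

```java
// Sketch (not Beam code): building one Graphite plaintext-protocol line using the
// beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType scheme.
public class GraphiteLineSketch {

  // Graphite's plaintext protocol is "<path> <value> <timestamp>\n" per metric.
  static String formatLine(
      String metricType, String namespace, String name,
      String commitState, String valueType, long value, long epochSeconds) {
    String path = String.join(".", "beam", metricType, namespace, name, commitState, valueType);
    return path + " " + value + " " + epochSeconds;
  }

  public static void main(String[] args) {
    // Example from the javadoc: an attempted counter value.
    System.out.println(
        formatLine("counter", "throughput", "nbRecords", "attempted", "value", 42, 1539093600L));
    // prints: beam.counter.throughput.nbRecords.attempted.value 42 1539093600
  }
}
```

The timestamp is in seconds since epoch, matching the `System.currentTimeMillis() / 1000L` in `writeMetrics` above.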
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152680 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 14:04 Start Date: 09/Oct/18 14:04 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#discussion_r223714684
## File path: runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
## @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp, so metrics are reported with the
+ * timestamp (seconds from epoch) at which the push to the sink was done (except for gauges, which
+ * already have a timestamp value). The Graphite metric name has the form
+ * beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType, for example
+ * {@code beam.counter.throughput.nbRecords.attempted.value} or
+ * {@code beam.distribution.throughput.nbRecordsPerSec.attempted.mean}.
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";
+  private final String address;
+  private final int port;
+  private final Charset charset;
+
+  public MetricsGraphiteSink(PipelineOptions pipelineOptions) {
+    this.address = pipelineOptions.getMetricsGraphiteHost();
+    this.port = pipelineOptions.getMetricsGraphitePort();
+    this.charset = UTF_8;
+  }
+
+  @Experimental(Experimental.Kind.METRICS)
+  @Override
+  public void writeMetrics(MetricQueryResults metricQueryResults) throws Exception {
+    final long metricTimestamp = System.currentTimeMillis() / 1000L;
+    Socket socket = new Socket(InetAddress.getByName(address), port);
+    BufferedWriter writer =
+        new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), charset));
+    StringBuilder messagePayload = new StringBuilder();
+    Iterable<MetricResult<Long>> counters = metricQueryResults.getCounters();
+    Iterable<MetricResult<GaugeResult>> gauges = metricQueryResults.getGauges();
+    Iterable<MetricResult<DistributionResult>> distributions =
+        metricQueryResults.getDistributions();
+
+    for (MetricResult<Long> counter : counters) {

Review comment: I thought about it, but the difference between the loops is the generic type, so with type erasure I would need to pass a class to the common method and switch on that class in the code. That would be even less maintainable.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking --- Worklog Id: (was: 152680) Time Spent: 1.5h (was: 1h 20m)

> Implement a Graphite sink for the metrics pusher
> ---
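The type-erasure point made in the review comment above can be illustrated with a sketch. This is hypothetical code, not the PR's: because `Iterable<MetricResult<Long>>` and `Iterable<MetricResult<GaugeResult>>` erase to the same raw type at runtime, a single shared serialization method would need an explicit `Class` token to decide how to handle each element, which is the "switch on the class" the author prefers to avoid.

```java
import java.util.List;

// Sketch (not the PR's code): why one shared loop needs a Class token under type erasure.
public class ErasureSketch {

  // At runtime both lists look like plain List; the element type must be passed explicitly
  // because the generic parameter is erased and cannot be inspected.
  static <T> String serialize(List<T> results, Class<T> type) {
    StringBuilder sb = new StringBuilder();
    for (T r : results) {
      if (type == Long.class) {
        sb.append("counter:").append(r);   // counter-style handling
      } else if (type == Double.class) {
        sb.append("gauge:").append(r);     // gauge-style handling (Double stands in for GaugeResult)
      } else {
        throw new IllegalArgumentException("Unsupported metric type: " + type);
      }
      sb.append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(serialize(List.of(1L, 2L), Long.class));
    System.out.print(serialize(List.of(3.5), Double.class));
  }
}
```

The three separate loops in the PR avoid this runtime dispatch entirely, at the cost of some repetition, which is the trade-off being discussed.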
[jira] [Work logged] (BEAM-5467) Python Flink ValidatesRunner job fixes
[ https://issues.apache.org/jira/browse/BEAM-5467?focusedWorklogId=152679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152679 ] ASF GitHub Bot logged work on BEAM-5467: Author: ASF GitHub Bot Created on: 09/Oct/18 14:04 Start Date: 09/Oct/18 14:04 Worklog Time Spent: 10m Work Description: tweise commented on a change in pull request #6532: [BEAM-5467] Use process SDKHarness to run flink PVR tests. URL: https://github.com/apache/beam/pull/6532#discussion_r223714650
## File path: sdks/python/build.gradle
## @@ -361,3 +372,21 @@ task dependencyUpdates(dependsOn: ':dependencyUpdates') {
 }
 }
 }
+
+project.task('createProcessWorker') {
+  dependsOn ':beam-sdks-python-container:build'
+  dependsOn 'setupVirtualenv'
+  def outputFile = file("${project.buildDir}/sdk_worker.sh")
+  def workerScript = "${project(":beam-sdks-python-container:").buildDir.absolutePath}/target/launcher/linux_amd64/boot"
+  def text = "sh -c \". ${envdir}/bin/activate && ${workerScript} \$* \""

Review comment: `sdkWorkerCommand`

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking --- Worklog Id: (was: 152679) Time Spent: 5h 10m (was: 5h)

> Python Flink ValidatesRunner job fixes
> ------
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Thomas Weise
> Assignee: Thomas Weise
> Priority: Minor
> Labels: portability-flink
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4553) Implement a Graphite sink for the metrics pusher
[ https://issues.apache.org/jira/browse/BEAM-4553?focusedWorklogId=152678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-152678 ] ASF GitHub Bot logged work on BEAM-4553: Author: ASF GitHub Bot Created on: 09/Oct/18 13:53 Start Date: 09/Oct/18 13:53 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #6569: [BEAM-4553] Implement graphite sink for MetricsPusher and refactor MetricsHttpSink test URL: https://github.com/apache/beam/pull/6569#discussion_r223709815
## File path: runners/extensions-java/metrics/src/main/java/org/apache/beam/runners/extensions/metrics/MetricsGraphiteSink.java
## @@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.extensions.metrics;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.net.InetAddress;
+import java.net.Socket;
+import java.nio.charset.Charset;
+import java.util.Locale;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricsSink;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Sink to push metrics to Graphite. Graphite requires a timestamp, so metrics are reported with the
+ * timestamp (seconds from epoch) at which the push to the sink was done (except for gauges, which
+ * already have a timestamp value). The Graphite metric name has the form
+ * beam.metricType.metricNamespace.metricName.[committed|attempted].metricValueType, for example
+ * {@code beam.counter.throughput.nbRecords.attempted.value} or
+ * {@code beam.distribution.throughput.nbRecordsPerSec.attempted.mean}.
+ */
+public class MetricsGraphiteSink implements MetricsSink {
+  private static final Charset UTF_8 = Charset.forName("UTF-8");
+  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
+  private static final String SPACE_REPLACEMENT = "_";

Review comment: It was just to give it a name, not for deduplication.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking --- Worklog Id: (was: 152678) Time Spent: 1h 20m (was: 1h 10m)

> Implement a Graphite sink for the metrics pusher
>
> Key: BEAM-4553
> URL: https://issues.apache.org/jira/browse/BEAM-4553
> Project: Beam
> Issue Type: Sub-task
> Components: runner-extensions-metrics
> Reporter: Etienne Chauchot
> Assignee: Etienne Chauchot
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Today, only a REST HTTP sink that sends raw JSON metrics via POST requests to an HTTP server is available; it is more of a POC sink. It would be good to code the first real metrics sink. One of the most popular is Graphite.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
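The `WHITESPACE` pattern and `SPACE_REPLACEMENT` constant discussed in the comment above suggest how metric names would be sanitized before being used as Graphite paths, since Graphite's plaintext protocol cannot contain whitespace in a path. The following is a sketch of that assumed usage (`NameSanitizerSketch` and `sanitize` are invented names; the excerpt does not show where the sink actually applies the pattern):

```java
import java.util.regex.Pattern;

// Sketch: sanitizing a metric name the way the constants above suggest,
// since whitespace is not allowed in a Graphite metric path.
public class NameSanitizerSketch {
  private static final Pattern WHITESPACE = Pattern.compile("[\\s]+");
  private static final String SPACE_REPLACEMENT = "_";

  static String sanitize(String name) {
    // Collapse each run of whitespace (spaces, tabs, newlines) into a single underscore.
    return WHITESPACE.matcher(name).replaceAll(SPACE_REPLACEMENT);
  }

  public static void main(String[] args) {
    System.out.println(sanitize("records  per second"));
    // prints: records_per_second
  }
}
```

Because the pattern uses `[\s]+`, consecutive whitespace characters map to one replacement rather than one underscore each, keeping the resulting path compact.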