[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=410854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410854 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 27/Mar/20 05:50 Start Date: 27/Mar/20 05:50 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #11246: [BEAM-9136]Add licenses for dependencies for Go URL: https://github.com/apache/beam/pull/11246#issuecomment-604822387 R: @alanmyrvold, @robertwb Cc: @tvalentyn This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410854) Time Spent: 9h (was: 8h 50m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 9h > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410850=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410850 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 27/Mar/20 05:33 Start Date: 27/Mar/20 05:33 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#issuecomment-604823688 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410850) Time Spent: 34h 10m (was: 34h) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 34h 10m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=410849=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410849 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 27/Mar/20 05:28 Start Date: 27/Mar/20 05:28 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #11246: [BEAM-9136]Add licenses for dependencies for Go URL: https://github.com/apache/beam/pull/11246#issuecomment-604822387 R: @alanmyrvold, @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410849) Time Spent: 8h 50m (was: 8h 40m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=410848=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410848 ]

ASF GitHub Bot logged work on BEAM-9136:

Author: ASF GitHub Bot
Created on: 27/Mar/20 05:28
Start Date: 27/Mar/20 05:28
Worklog Time Spent: 10m
Work Description: Hannah-Jiang commented on pull request #11246: [BEAM-9136]Add licenses for dependencies for Go
URL: https://github.com/apache/beam/pull/11246#discussion_r399042837

## File path: sdks/go/container/license_script.sh
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+output_dir=third_party_licenses
+# remove output_dir if existing
+if [ -d "$output_dir" ]; then rm -rf $output_dir; fi
+
+# get go-licenses and run
+go get github.com/google/go-licenses
+$GOPATH/bin/go-licenses save "github.com/apache/beam/sdks/go/pkg/beam/" --save_path="$output_dir"

Review comment:
This line returns a `not found` error when run on Jenkins. When I tested on my machine, `$GOPATH/bin/go-licenses` worked. I tried `go-licenses`, `$GOPATH/bin/go-licenses`, and `/usr/bin/go/bin/go-licenses`, but none of them worked; all returned a `not found` error.
[log](https://builds.apache.org/job/beam_PreCommit_Go_Commit/6020/console)
How do we run a Go package within Jenkins? Can we use the same script both locally and on Jenkins? In case users want to customize it, it should also be able to run on users' machines.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 410848)
Time Spent: 8h 40m (was: 8.5h)

> Add LICENSES and NOTICES to docker images
> ---
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
> Issue Type: Task
> Components: build-system
> Reporter: Hannah Jiang
> Assignee: Hannah Jiang
> Priority: Major
> Time Spent: 8h 40m
> Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK
> docker images.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
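One way to look into the `not found` error described in the review comment above is to resolve the `go-licenses` binary explicitly instead of relying on `$GOPATH` being set in the build shell. The following is only a sketch, not part of the pull request: it assumes the cause is an unset or different `GOPATH` on the Jenkins worker, which the thread does not confirm, and the helper names (`candidate_bin_dirs`, `effective_gopath`, `find_tool`) are hypothetical. It relies on the documented behaviour that Go installs binaries into `$GOBIN`, or into `bin` under the first `GOPATH` entry, and that `go env GOPATH` prints the effective value.

```python
import os
import shutil
import subprocess


def candidate_bin_dirs(env, go_env_gopath=None):
    """Return directories where Go may have installed a binary, in lookup order."""
    dirs = []
    if env.get("GOBIN"):
        dirs.append(env["GOBIN"])
    # GOPATH may list several entries; binaries go under the first one.
    gopath = env.get("GOPATH") or go_env_gopath
    if gopath:
        dirs.append(os.path.join(gopath.split(os.pathsep)[0], "bin"))
    return dirs


def effective_gopath():
    """Ask the go tool for its GOPATH; returns None if `go` is not on PATH."""
    if shutil.which("go") is None:
        return None
    result = subprocess.run(
        ["go", "env", "GOPATH"], capture_output=True, text=True)
    return result.stdout.strip() or None


def find_tool(name, env=None):
    """Locate an installed Go tool: PATH first, then the Go bin directories."""
    env = os.environ if env is None else env
    on_path = shutil.which(name, path=env.get("PATH"))
    if on_path:
        return on_path
    for directory in candidate_bin_dirs(env, effective_gopath()):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None
```

Calling `find_tool("go-licenses")` on both a developer machine and the Jenkins worker, and comparing the results, would show whether the two environments disagree about where Go binaries are installed.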
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=410839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410839 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 27/Mar/20 04:53 Start Date: 27/Mar/20 04:53 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #11199: [BEAM-9562] Update Timer encoding with respect of dynamic timers URL: https://github.com/apache/beam/pull/11199#issuecomment-604814424 Most of the Python and Java SDK parts are done. The remaining work is the Java runner hookup. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410839) Time Spent: 5h 40m (was: 5.5h) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=410831=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410831 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 27/Mar/20 04:32 Start Date: 27/Mar/20 04:32 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11246: [BEAM-9136]Add licenses for dependencies for Go URL: https://github.com/apache/beam/pull/11246 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410829 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 27/Mar/20 04:26 Start Date: 27/Mar/20 04:26 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#issuecomment-604808223 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410829) Time Spent: 34h (was: 33h 50m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 34h > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=410826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410826 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 27/Mar/20 04:21 Start Date: 27/Mar/20 04:21 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11216: [BEAM-9562] Remove TimerSpec from Proto URL: https://github.com/apache/beam/pull/11216 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410826) Time Spent: 5.5h (was: 5h 20m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=410825=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410825 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 27/Mar/20 04:20 Start Date: 27/Mar/20 04:20 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #11216: [BEAM-9562] Remove TimerSpec from Proto URL: https://github.com/apache/beam/pull/11216#issuecomment-604806948 All tests passed. Going to merge it. Thanks for your help! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410825) Time Spent: 5h 20m (was: 5h 10m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=410817=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410817 ]

ASF GitHub Bot logged work on BEAM-8603:

Author: ASF GitHub Bot
Created on: 27/Mar/20 03:30
Start Date: 27/Mar/20 03:30
Worklog Time Spent: 10m
Work Description: ihji commented on pull request #10055: [BEAM-8603] Add Python SqlTransform
URL: https://github.com/apache/beam/pull/10055#discussion_r399015980

## File path: sdks/python/apache_beam/transforms/sql_test.py
@@ -0,0 +1,109 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for transforms that use the SQL Expansion service."""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import logging
+import typing
+import unittest
+
+from nose.plugins.attrib import attr
+from past.builtins import unicode
+
+import apache_beam as beam
+from apache_beam import coders
+from apache_beam.options.pipeline_options import DebugOptions
+from apache_beam.options.pipeline_options import StandardOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
+from apache_beam.transforms.sql import SqlTransform
+from apache_beam.utils import subprocess_server
+
+SimpleRow = typing.NamedTuple(
+    "SimpleRow", [("int", int), ("str", unicode), ("flt", float)])
+coders.registry.register_coder(SimpleRow, coders.RowCoder)
+
+
+@attr('UsesSqlExpansionService')
+@unittest.skipIf(
+    TestPipeline().get_pipeline_options().view_as(StandardOptions).runner is
+    None,
+    "Must be run with a runner that supports cross-language transforms")
+class SqlTransformTest(unittest.TestCase):
+  """Tests that exercise the cross-language SqlTransform (implemented in java).
+
+  Note this test must be executed with pipeline options that run jobs on a local
+  job server. The easiest way to accomplish this is to run the
+  `validatesCrossLanguageRunnerPythonUsingSql` gradle target for a particular
+  job server, which will start the runner and job server for you. For example,
+  `:runners:flink:1.10:job-server:validatesCrossLanguageRunnerPythonUsingSql` to
+  test on Flink 1.10.
+
+  Alternatively, you may be able to iterate faster if you run the tests directly
+  using a runner like `FlinkRunner`, which starts its own job server, but you'll
+  need to spin up a local flink cluster:
+    $ pip install -e './sdks/python[gcp,test]'
+    $ python ./sdks/python/setup.py nosetests \\
+        --tests apache_beam.transforms.sql_test \\
+        --test-pipeline-options="--runner=FlinkRunner \\
+                                 --flink_version=1.10 \\
+                                 --flink_master=localhost:8081"
+  """
+  @staticmethod
+  def make_test_pipeline():
+    path_to_jar = subprocess_server.JavaJarServer.path_to_beam_jar(
+        ":sdks:java:extensions:sql:expansion-service:shadowJar")
+    test_pipeline = TestPipeline()
+    test_pipeline.get_pipeline_options().view_as(DebugOptions).experiments = [
+        'jar_packages=' + path_to_jar

Review comment:
We can remove `jar_packages` flag when BEAM-9238 is done.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 410817)
Time Spent: 5h 10m (was: 5h)

> Add Python SqlTransform MVP
> ---
>
> Key: BEAM-8603
> URL: https://issues.apache.org/jira/browse/BEAM-8603
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql, sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: Major
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8751) Beam Dependency Update Request: com.google.apis:google-api-services-cloudresourcemanager
[ https://issues.apache.org/jira/browse/BEAM-8751?focusedWorklogId=410816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410816 ] ASF GitHub Bot logged work on BEAM-8751: Author: ASF GitHub Bot Created on: 27/Mar/20 03:28 Start Date: 27/Mar/20 03:28 Worklog Time Spent: 10m Work Description: suztomo commented on issue #11208: [BEAM-8751] google-api-client 1.30.9 URL: https://github.com/apache/beam/pull/11208#issuecomment-604795796 R: @lukecwik 22 successful checks! @aaltay Thank you! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410816) Time Spent: 2h (was: 1h 50m) > Beam Dependency Update Request: > com.google.apis:google-api-services-cloudresourcemanager > > > Key: BEAM-8751 > URL: https://issues.apache.org/jira/browse/BEAM-8751 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > - 2019-11-19 21:04:41.938497 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191018-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:09:51.401493 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191115-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. 
> - 2019-12-09 12:09:00.761817 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191115-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-23 12:09:01.384571 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191206-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-30 14:04:31.850871 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191206-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-06 12:08:07.241510 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191206-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-13 12:08:00.916536 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191206-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above.
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=410815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410815 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 27/Mar/20 03:27 Start Date: 27/Mar/20 03:27 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #11216: [BEAM-9562] Remove TimerSpec from Proto URL: https://github.com/apache/beam/pull/11216#issuecomment-604795592 Run Java Flink PortableValidatesRunner Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410815) Time Spent: 5h 10m (was: 5h) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4150) Standardize use of PCollection coder proto attribute
[ https://issues.apache.org/jira/browse/BEAM-4150?focusedWorklogId=410796=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410796 ] ASF GitHub Bot logged work on BEAM-4150: Author: ASF GitHub Bot Created on: 27/Mar/20 02:54 Start Date: 27/Mar/20 02:54 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11222: [BEAM-4150] Don't window PCollection coders. URL: https://github.com/apache/beam/pull/11222#issuecomment-604788204 Run PythonDocker PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410796) Time Spent: 9h (was: 8h 50m) > Standardize use of PCollection coder proto attribute > > > Key: BEAM-4150 > URL: https://issues.apache.org/jira/browse/BEAM-4150 > Project: Beam > Issue Type: Task > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Luke Cwik >Priority: Major > Fix For: 2.20.0 > > Time Spent: 9h > Remaining Estimate: 0h > > In some places it's expected to be a WindowedCoder, in others the raw > ElementCoder. We should use the same convention (decided in discussion to be > the raw ElementCoder) everywhere. The WindowCoder can be pulled out of the > attached windowing strategy, and the input/output ports should specify the > encoding directly rather than read the adjacent PCollection coder fields. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9434) Performance improvements processing a large number of Avro files in S3+Spark
[ https://issues.apache.org/jira/browse/BEAM-9434?focusedWorklogId=410790=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410790 ] ASF GitHub Bot logged work on BEAM-9434: Author: ASF GitHub Bot Created on: 27/Mar/20 02:40 Start Date: 27/Mar/20 02:40 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11037: [BEAM-9434] performance improvements reading many Avro files in S3 URL: https://github.com/apache/beam/pull/11037#issuecomment-604785112 Sorry about the long delay but **Reshuffle** should produce as many partitions as the runner thinks is optimal. It is effectively a **redistribute** operation. It looks like the spark translation is copying the number of partitions from the upstream transform for the reshuffle translation and in your case this is likely 1. Translation: https://github.com/apache/beam/blob/f5a4a5afcd9425c0ddb9ec9c70067a5d5c0bc769/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L681 Copying partitions: https://github.com/apache/beam/blob/f5a4a5afcd9425c0ddb9ec9c70067a5d5c0bc769/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java#L191 @iemejia Shouldn't we be using a much larger value for partitions, e.g. the number of nodes? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410790) Time Spent: 2h 50m (was: 2h 40m) > Performance improvements processing a large number of Avro files in S3+Spark > > > Key: BEAM-9434 > URL: https://issues.apache.org/jira/browse/BEAM-9434 > Project: Beam > Issue Type: Improvement > Components: io-java-aws, sdk-java-core > Affects Versions: 2.19.0 > Reporter: Emiliano Capoccia > Assignee: Emiliano Capoccia > Priority: Minor > Time Spent: 2h 50m > Remaining Estimate: 0h > > There is a performance issue when processing a large number of small Avro > files in Spark on K8S (tens of thousands or more). > The recommended way of reading a pattern of Avro files in Beam is by means of: > > {code:java} > PCollection<AvroGenClass> records = p.apply(AvroIO.read(AvroGenClass.class) > .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles()); > {code} > However, in the case of many small files, the above results in the entire > read taking place in a single task/node, which is considerably slow and > has scalability issues. > The option of omitting the hint is not viable, as it results in too many > tasks being spawned, and the cluster being busy coordinating tiny > tasks with high overhead. > There are a few workarounds on the internet which mainly revolve around > compacting the input files before processing, so that a reduced number of > bulky files is processed in parallel. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
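The partitioning concern raised in the review comment above can be illustrated without Spark. The sketch below is a plain-Python model, not Beam or Spark runner code: `reshuffle_partitions` and `assign_round_robin` are hypothetical names, and taking the maximum of the upstream partition count and a cluster-level parallelism value is just one possible policy in the spirit of the "number of nodes" suggestion. It contrasts copying an upstream partition count of 1 with widening to the cluster's parallelism.

```python
def reshuffle_partitions(upstream_partitions, cluster_parallelism):
    """Pick a partition count for a redistribute step (hypothetical policy)."""
    return max(1, upstream_partitions, cluster_parallelism)


def assign_round_robin(items, num_partitions):
    """Spread items over partitions so each task gets a similar share."""
    partitions = [[] for _ in range(num_partitions)]
    for index, item in enumerate(items):
        partitions[index % num_partitions].append(item)
    return partitions


# Ten small input files, as in the "many small Avro files" scenario.
files = ["part-%05d.avro" % i for i in range(10)]

# Copying the upstream count (1) keeps every file in a single task.
copied = assign_round_robin(files, reshuffle_partitions(1, 1))

# Widening to a parallelism of 4 spreads the same files over four tasks.
widened = assign_round_robin(files, reshuffle_partitions(1, 4))
```

With a single partition all ten files are read by one task, which matches the single task/node behaviour described in the issue; with four partitions each task handles two or three files.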
[jira] [Work logged] (BEAM-9399) Possible deadlock between DataflowWorkerLoggingHandler and overridden System.err PrintStream
[ https://issues.apache.org/jira/browse/BEAM-9399?focusedWorklogId=410776=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410776 ] ASF GitHub Bot logged work on BEAM-9399: Author: ASF GitHub Bot Created on: 27/Mar/20 02:24 Start Date: 27/Mar/20 02:24 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11096: [BEAM-9399] Change the redirection of System.err to be a custom PrintStream URL: https://github.com/apache/beam/pull/11096#issuecomment-604781282 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410776) Time Spent: 3h 20m (was: 3h 10m) > Possible deadlock between DataflowWorkerLoggingHandler and overridden > System.err PrintStream > > > Key: BEAM-9399 > URL: https://issues.apache.org/jira/browse/BEAM-9399 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Minor > Time Spent: 3h 20m > Remaining Estimate: 0h > > When an exception is encountered in DataflowWorkerLoggingHandler the > ErrorManager is used to log the exception. ErrorManager uses System.err > which is overridden to be a PrintStream that writes back into > DataflowWorkerLoggingHandler. > This has the lock ordering DataflowWorkerLoggingHandler -> PrintStream. > Other logging of System.err has the inverse lock ordering > PrintStream->DataflowWorkerLoggingHandler so there is potential for deadlock. > This is one known cause of the inversion, but any other System.err logs from > inside DataflowWorkerLoggingHandler could cause the same issue. > Proposed fix is to address low-hanging fruit of having ErrorManager output to > the original System.err. 
A full fix would be to improve our override of > System.err to a PrintStream that can detect the locking inversion or possibly > we could use the PrintStream mutex in both cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
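The proposed "low-hanging fruit" fix for BEAM-9399 can be sketched with the standard `java.util.logging` classes. This is a minimal illustration under stated assumptions, not the code from the pull request: the class name `OriginalStderrErrorManager` is hypothetical, and a `ConsoleHandler` stands in for DataflowWorkerLoggingHandler. The idea is to capture the original `System.err` before it is overridden and have the handler's `ErrorManager` write there, so that error reporting from inside the handler never goes through the overriding PrintStream.

```java
import java.io.PrintStream;
import java.util.logging.ConsoleHandler;
import java.util.logging.ErrorManager;
import java.util.logging.Handler;

// Hypothetical class: an ErrorManager that reports to a stream captured
// before System.err was replaced, instead of to the current System.err.
class OriginalStderrErrorManager extends ErrorManager {
  private final PrintStream originalErr;

  OriginalStderrErrorManager(PrintStream originalErr) {
    this.originalErr = originalErr;
  }

  @Override
  public void error(String msg, Exception ex, int code) {
    // Writes only to the captured stream, so this path does not take the
    // lock of the overriding PrintStream.
    originalErr.println("logging handler error (code " + code + "): " + msg);
    if (ex != null) {
      ex.printStackTrace(originalErr);
    }
  }

  public static void main(String[] args) {
    // Capture the original stream before any System.setErr(...) call.
    PrintStream original = System.err;
    // ConsoleHandler is a stand-in for the real logging handler.
    Handler handler = new ConsoleHandler();
    handler.setErrorManager(new OriginalStderrErrorManager(original));
    handler.getErrorManager().error("example failure", null, ErrorManager.GENERIC_FAILURE);
  }
}
```

As the issue notes, this only removes one known source of the inversion; any other write to the overridden `System.err` from inside the handler could still take the locks in the opposite order.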
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410775=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410775 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 27/Mar/20 02:23 Start Date: 27/Mar/20 02:23 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#issuecomment-604780915 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410775) Time Spent: 33h 50m (was: 33h 40m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 33h 50m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API
[ https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=410774=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410774 ] ASF GitHub Bot logged work on BEAM-8932: Author: ASF GitHub Bot Created on: 27/Mar/20 02:22 Start Date: 27/Mar/20 02:22 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10478: [BEAM-8932][Cleanup] Extract PubsubBoundedWriter from PubsubIO URL: https://github.com/apache/beam/pull/10478#issuecomment-604780730 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410774) Time Spent: 17h 20m (was: 17h 10m) > Expose complete Cloud Pub/Sub messages through PubsubIO API > --- > > Key: BEAM-8932 > URL: https://issues.apache.org/jira/browse/BEAM-8932 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Daniel Collins >Assignee: Daniel Collins >Priority: Major > Time Spent: 17h 20m > Remaining Estimate: 0h > > The PubsubIO API only exposes a subset of the fields in the underlying > PubsubMessage protocol buffer. To accommodate future feature changes as well > as for greater compatibility with code using the Cloud Pub/Sub APIs, a method > to read and write these protocol messages should be exposed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8648) Euphoria: Deprecate OutputHints from public API
[ https://issues.apache.org/jira/browse/BEAM-8648?focusedWorklogId=410773=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410773 ] ASF GitHub Bot logged work on BEAM-8648: Author: ASF GitHub Bot Created on: 27/Mar/20 02:21 Start Date: 27/Mar/20 02:21 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10084: [BEAM-8648] Deprecate OutputHints from Euphoria API. URL: https://github.com/apache/beam/pull/10084#issuecomment-604780643 What should be the next action here? Should we close this PR? Is it ready to be merged? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410773) Time Spent: 1h 40m (was: 1.5h) > Euphoria: Deprecate OutputHints from public API > --- > > Key: BEAM-8648 > URL: https://issues.apache.org/jira/browse/BEAM-8648 > Project: Beam > Issue Type: Improvement > Components: dsl-euphoria >Reporter: David Morávek >Assignee: David Morávek >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Deprecate OutputHints as they are no longer used during translation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9620) textio (and fileio in general) takes too long to estimate sizes of large globs
[ https://issues.apache.org/jira/browse/BEAM-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068204#comment-17068204 ] Chamikara Madhusanka Jayalath commented on BEAM-9620: - Though it might make the source not work in the way it's implemented today. We rely on estimate_size() to perform initial splitting at workers which has to work for the source to work. If we time limit, we have to make sure that splitting/reading is not affected. > textio (and fileio in general) takes too long to estimate sizes of large globs > -- > > Key: BEAM-9620 > URL: https://issues.apache.org/jira/browse/BEAM-9620 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > > As a workaround we could introduce a way to not perform size estimation when > reading large globs. For example Java SDK has withHintMatchesManyFiles() > option. > > [https://github.com/apache/beam/blob/850e8469de798d45ec535fe90cb2dc5dbda4974a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L371] > > Additionally, seems like we are repeating the size estimation where the same > PCollection read from a file-based source is applied to multiple PTransforms. > > See following for more details. > [https://stackoverflow.com/questions/60874942/avoid-recomputing-size-of-all-cloud-storage-files-in-gcsio-beam-python-sdk] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9620) textio (and fileio in general) takes too long to estimate sizes of large globs
[ https://issues.apache.org/jira/browse/BEAM-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068203#comment-17068203 ] Chamikara Madhusanka Jayalath commented on BEAM-9620: - Yeah, that makes sense. > textio (and fileio in general) takes too long to estimate sizes of large globs > -- > > Key: BEAM-9620 > URL: https://issues.apache.org/jira/browse/BEAM-9620 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Priority: Major -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?
[ https://issues.apache.org/jira/browse/BEAM-9444?focusedWorklogId=410759=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410759 ] ASF GitHub Bot logged work on BEAM-9444: Author: ASF GitHub Bot Created on: 27/Mar/20 01:44 Start Date: 27/Mar/20 01:44 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11156: [BEAM-9444] Use GCP Libraries BOM for Google Cloud Dependencies URL: https://github.com/apache/beam/pull/11156#issuecomment-604772021 Run SQL Postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410759) Time Spent: 10h 20m (was: 10h 10m) > Shall we use GCP Libraries BOM to specify Google-related library versions? > -- > > Key: BEAM-9444 > URL: https://issues.apache.org/jira/browse/BEAM-9444 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Attachments: Screen Shot 2020-03-13 at 13.33.01.png, Screen Shot > 2020-03-17 at 16.01.16.png > > Time Spent: 10h 20m > Remaining Estimate: 0h > > Shall we use GCP Libraries BOM to specify Google-related library versions? > > I've been working on Beam's dependency upgrades over the past few months. I > think it's time to consider a long-term solution to keep the libraries > up-to-date with small maintenance effort. To achieve that, I propose that Beam > use the GCP Libraries BOM to set the Google-related library versions, rather than > trying to make changes in each of ~30 Google libraries. > > h1. Background > A BOM is a pom.xml that provides dependencyManagement to importing projects. > > GCP Libraries BOM is a BOM that includes many Google Cloud related libraries > + gRPC + protobuf. 
We (Google Cloud Java Diamond Dependency team) maintain > the BOM so that the libraries in the set are compatible with each other. > > h1. Implementation > Notes on obstacles. > h2. BeamModulePlugin's "force" does not take BOM into account (thus fails) > {{forcedModules}}, applied via the version resolution strategy, does not work well with the BOM. This causes > {noformat} > A problem occurred evaluating project ':sdks:java:extensions:sql'. > Could not resolve all dependencies for configuration > ':sdks:java:extensions:sql:fmppTemplates'. > Invalid format: 'com.google.cloud:google-cloud-core'. Group, name and version > cannot be empty. Correct example: 'org.gradle:gradle-core:1.0'{noformat} > !Screen Shot 2020-03-13 at 13.33.01.png|width=489,height=287! > > h2. :sdks:java:maven-archetypes:examples needs the version of > google-http-client > The task requires the version for the library: > {code:java} > 'google-http-client.version': > dependencies.create(project.library.java.google_http_client).getVersion(), > {code} > This would throw a NullPointerException. Running gradlew without the > subproject: > > {code:java} > ./gradlew -p sdks/java check -x :sdks:java:maven-archetypes:examples:check > {code} > h1. Problem in Gradle-generated pom files > The generated Maven artifact POM has invalid data due to the BOM change. For > example my locally installed > {{~/.m2/repository/org/apache/beam/beam-sdks-java-io-google-cloud-platform/2.21.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.pom}} > had the following problems. > h2. The GCP Libraries BOM shows up in the dependencies section: > {noformat} > > > com.google.cloud > libraries-bom > 4.2.0 > compile > > > com.google.guava > guava-jdk5 > ... > > > {noformat} > h2. An artifact that uses the BOM in Gradle is missing its version in the > dependency. > {noformat} > > com.google.api > gax > > compile > ... > > {noformat} > h1. 
DependencyManagement section in generated pom.xml > How can I check whether an entry in dependencies is "platform"? > !Screen Shot 2020-03-17 at 16.01.16.png|width=504,height=344! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?
[ https://issues.apache.org/jira/browse/BEAM-9444?focusedWorklogId=410754=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410754 ] ASF GitHub Bot logged work on BEAM-9444: Author: ASF GitHub Bot Created on: 27/Mar/20 01:43 Start Date: 27/Mar/20 01:43 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11156: [BEAM-9444] Use GCP Libraries BOM for Google Cloud Dependencies URL: https://github.com/apache/beam/pull/11156#issuecomment-604771821 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410754) Time Spent: 9.5h (was: 9h 20m) > Shall we use GCP Libraries BOM to specify Google-related library versions? > -- > > Key: BEAM-9444 > URL: https://issues.apache.org/jira/browse/BEAM-9444 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Attachments: Screen Shot 2020-03-13 at 13.33.01.png, Screen Shot > 2020-03-17 at 16.01.16.png > > Time Spent: 9.5h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?
[ https://issues.apache.org/jira/browse/BEAM-9444?focusedWorklogId=410755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410755 ] ASF GitHub Bot logged work on BEAM-9444: Author: ASF GitHub Bot Created on: 27/Mar/20 01:43 Start Date: 27/Mar/20 01:43 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11156: [BEAM-9444] Use GCP Libraries BOM for Google Cloud Dependencies URL: https://github.com/apache/beam/pull/11156#issuecomment-604771862 Run Java HadoopFormatIO Performance Test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410755) Time Spent: 9h 40m (was: 9.5h) > Shall we use GCP Libraries BOM to specify Google-related library versions? > -- > > Key: BEAM-9444 > URL: https://issues.apache.org/jira/browse/BEAM-9444 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Attachments: Screen Shot 2020-03-13 at 13.33.01.png, Screen Shot > 2020-03-17 at 16.01.16.png > > Time Spent: 9h 40m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?
[ https://issues.apache.org/jira/browse/BEAM-9444?focusedWorklogId=410756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410756 ] ASF GitHub Bot logged work on BEAM-9444: Author: ASF GitHub Bot Created on: 27/Mar/20 01:43 Start Date: 27/Mar/20 01:43 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11156: [BEAM-9444] Use GCP Libraries BOM for Google Cloud Dependencies URL: https://github.com/apache/beam/pull/11156#issuecomment-604771894 Run BigQueryIO Streaming Performance Test Java This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410756) Time Spent: 9h 50m (was: 9h 40m) > Shall we use GCP Libraries BOM to specify Google-related library versions? > -- > > Key: BEAM-9444 > URL: https://issues.apache.org/jira/browse/BEAM-9444 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Attachments: Screen Shot 2020-03-13 at 13.33.01.png, Screen Shot > 2020-03-17 at 16.01.16.png > > Time Spent: 9h 50m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?
[ https://issues.apache.org/jira/browse/BEAM-9444?focusedWorklogId=410757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410757 ] ASF GitHub Bot logged work on BEAM-9444: Author: ASF GitHub Bot Created on: 27/Mar/20 01:43 Start Date: 27/Mar/20 01:43 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11156: [BEAM-9444] Use GCP Libraries BOM for Google Cloud Dependencies URL: https://github.com/apache/beam/pull/11156#issuecomment-604771955 Run Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410757) Time Spent: 10h (was: 9h 50m) > Shall we use GCP Libraries BOM to specify Google-related library versions? > -- > > Key: BEAM-9444 > URL: https://issues.apache.org/jira/browse/BEAM-9444 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Attachments: Screen Shot 2020-03-13 at 13.33.01.png, Screen Shot > 2020-03-17 at 16.01.16.png > > Time Spent: 10h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?
[ https://issues.apache.org/jira/browse/BEAM-9444?focusedWorklogId=410758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410758 ] ASF GitHub Bot logged work on BEAM-9444: Author: ASF GitHub Bot Created on: 27/Mar/20 01:43 Start Date: 27/Mar/20 01:43 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11156: [BEAM-9444] Use GCP Libraries BOM for Google Cloud Dependencies URL: https://github.com/apache/beam/pull/11156#issuecomment-604771990 Run Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410758) Time Spent: 10h 10m (was: 10h) > Shall we use GCP Libraries BOM to specify Google-related library versions? > -- > > Key: BEAM-9444 > URL: https://issues.apache.org/jira/browse/BEAM-9444 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Attachments: Screen Shot 2020-03-13 at 13.33.01.png, Screen Shot > 2020-03-17 at 16.01.16.png > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Shall we use GCP Libraries BOM to specify Google-related library versions? > > I've been working on Beam's dependency upgrades in the past few months. I > think it's time to consider a long-term solution to keep the libraries > up-to-date with small maintenance effort. To achieve that, I propose Beam to > use GCP Libraries BOM to set the Google-related library versions, rather than > trying to make changes in each of ~30 Google libraries. > > h1. Background > A BOM is pom.xml that provides dependencyManagement to importing projects. > > GCP Libraries BOM is a BOM that includes many Google Cloud related libraries > + gRPC + protobuf. 
We (Google Cloud Java Diamond Dependency team) maintain > the BOM so that the set of the libraries are compatible with each other. > > h1. Implementation > Notes for obstacles. > h2. BeamModulePlugin's "force" does not take BOM into account (thus fails) > {{forcedModules}} via version resolution strategy is playing bad. This causes > {noformat} > A problem occurred evaluating project ':sdks:java:extensions:sql'. > Could not resolve all dependencies for configuration > ':sdks:java:extensions:sql:fmppTemplates'. > Invalid format: 'com.google.cloud:google-cloud-core'. Group, name and version > cannot be empty. Correct example: 'org.gradle:gradle-core:1.0'{noformat} > !Screen Shot 2020-03-13 at 13.33.01.png|width=489,height=287! > > h2. :sdks:java:maven-archetypes:examples needs the version of > google-http-client > The task requires the version for the library: > {code:java} > 'google-http-client.version': > dependencies.create(project.library.java.google_http_client).getVersion(), > {code} > This would generate NullPointerException. Running gradlew without the > subproject: > > {code:java} > ./gradlew -p sdks/java check -x :sdks:java:maven-archetypes:examples:check > {code} > h1. Problem in Gradle-generated pom files > The generated Maven artifact POM has invalid data due to the BOM change. For > example my locally installed > {{~/.m2/repository/org/apache/beam/beam-sdks-java-io-google-cloud-platform/2.21.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.pom}} > had the following problems. > h2. The GCP Libraries BOM showing up in dependencies section: > {noformat} > > > com.google.cloud > libraries-bom > 4.2.0 > compile > > > com.google.guava > guava-jdk5 > ... > > > {noformat} > h2. The artifact that use the BOM in Gradle is missing version in the > dependency. > {noformat} > > com.google.api > gax > > compile > ... > > {noformat} > h1. 
DependencyManagement section in generated pom.xml > How can I check whether an entry in dependencies is "platform"? > !Screen Shot 2020-03-17 at 16.01.16.png|width=504,height=344! -- This message was sent by Atlassian Jira (v8.3.4#803005)
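The role a BOM plays in BEAM-9444 can be sketched in a few lines. This is only an illustration of the idea, not how Gradle or Maven implement it: `apply_bom` is a hypothetical helper, the version numbers are placeholders rather than real releases, and the rule that an explicit version simply wins is an assumption of the sketch. It shows why a dependency declared without a version is fine as long as a BOM supplies one, and why a versionless coordinate fails once nothing fills it in (compare the "Group, name and version cannot be empty" error quoted in the issue).

```python
def apply_bom(dependencies, bom_versions):
    """Fill in missing versions from the BOM's managed-version table."""
    resolved = []
    for coordinate in dependencies:
        parts = coordinate.split(":")
        if len(parts) == 3:
            # Assumption of this sketch: an explicit version is kept as-is.
            resolved.append(coordinate)
        elif len(parts) == 2 and coordinate in bom_versions:
            # "group:name" only; take the version the BOM manages.
            resolved.append(coordinate + ":" + bom_versions[coordinate])
        else:
            # Nothing supplies a version, so the coordinate is unusable.
            raise ValueError("no version for %r" % coordinate)
    return resolved


# Placeholder versions, for illustration only.
bom = {"com.google.api:gax": "0.0.1"}
deps = ["com.google.api:gax", "org.example:lib:1.0"]
print(apply_bom(deps, bom))
```

A generated POM that lists `gax` without a version is in the same position as the last branch unless it also carries the BOM in its dependencyManagement section.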
[jira] [Work logged] (BEAM-8751) Beam Dependency Update Request: com.google.apis:google-api-services-cloudresourcemanager
[ https://issues.apache.org/jira/browse/BEAM-8751?focusedWorklogId=410753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410753 ] ASF GitHub Bot logged work on BEAM-8751: Author: ASF GitHub Bot Created on: 27/Mar/20 01:42 Start Date: 27/Mar/20 01:42 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11208: [BEAM-8751] google-api-client 1.30.9 URL: https://github.com/apache/beam/pull/11208#issuecomment-604771741 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410753) Time Spent: 1h 50m (was: 1h 40m) > Beam Dependency Update Request: > com.google.apis:google-api-services-cloudresourcemanager > > > Key: BEAM-8751 > URL: https://issues.apache.org/jira/browse/BEAM-8751 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > - 2019-11-19 21:04:41.938497 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191018-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:09:51.401493 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191115-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. 
> - 2019-12-09 12:09:00.761817 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191115-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-23 12:09:01.384571 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191206-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-30 14:04:31.850871 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191206-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-06 12:08:07.241510 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191206-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-13 12:08:00.916536 > - > Please consider upgrading the dependency > com.google.apis:google-api-services-cloudresourcemanager. > The current version is v1-rev20181015-1.28.0. The latest version is > v2-rev20191206-1.30.3 > cc: [~chamikara], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. > - 2020-01-20
[jira] [Commented] (BEAM-9620) textio (and fileio in general) takes too long to estimate sizes of large globs
[ https://issues.apache.org/jira/browse/BEAM-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068187#comment-17068187 ] Udi Meiri commented on BEAM-9620: - Since this is an estimation, perhaps there should be limits on how much it samples or a maximum amount of time it can spend sampling (overall). > textio (and fileio in general) takes too long to estimate sizes of large globs > -- > > Key: BEAM-9620 > URL: https://issues.apache.org/jira/browse/BEAM-9620 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > > As a workaround we could introduce a way to not perform size estimation when > reading large globs. For example Java SDK has withHintMatchesManyFiles() > option. > > [https://github.com/apache/beam/blob/850e8469de798d45ec535fe90cb2dc5dbda4974a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L371] > > Additionally, seems like we are repeating the size estimation where the same > PCollection read from a file-based source is applied to multiple PTransforms. > > See following for more details. > [https://stackoverflow.com/questions/60874942/avoid-recomputing-size-of-all-cloud-storage-files-in-gcsio-beam-python-sdk] -- This message was sent by Atlassian Jira (v8.3.4#803005)
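The suggestion in the comment above (cap the number of samples and the total time spent sampling) could look roughly like the sketch below. It is not Beam code: `estimate_total_size`, `size_of`, `max_samples` and `time_budget_s` are all hypothetical names, and extrapolating from the sample mean is an assumption about what "estimation" should mean here.

```python
import time


def estimate_total_size(paths, size_of, max_samples=100, time_budget_s=5.0):
    """Estimate the combined size of `paths` from a bounded sample."""
    if not paths:
        return 0
    max_samples = max(1, max_samples)
    deadline = time.monotonic() + time_budget_s
    sampled_bytes = 0
    sampled = 0
    for path in paths[:max_samples]:
        sampled_bytes += size_of(path)
        sampled += 1
        # Stop early once the overall time budget is used up; at least
        # one file has been sampled by this point.
        if time.monotonic() >= deadline:
            break
    # Extrapolate the sample mean to every matched file.
    return sampled_bytes * len(paths) // sampled


paths = ["file-%d" % i for i in range(1000)]
print(estimate_total_size(paths, lambda path: 10, max_samples=50))
```

With a very large glob only `max_samples` size lookups are issued, so the cost no longer grows with the number of matched files.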
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=410752=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410752 ] ASF GitHub Bot logged work on BEAM-8603: Author: ASF GitHub Bot Created on: 27/Mar/20 01:16 Start Date: 27/Mar/20 01:16 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10055: [BEAM-8603] Add Python SqlTransform URL: https://github.com/apache/beam/pull/10055#discussion_r398982127 ## File path: sdks/python/apache_beam/transforms/sql_test.py ## @@ -0,0 +1,109 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +"""Tests for transforms that use the SQL Expansion service.""" + +# pytype: skip-file + +from __future__ import absolute_import + +import logging +import typing +import unittest + +from nose.plugins.attrib import attr +from past.builtins import unicode + +import apache_beam as beam +from apache_beam import coders +from apache_beam.options.pipeline_options import DebugOptions +from apache_beam.options.pipeline_options import StandardOptions +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to +from apache_beam.transforms.sql import SqlTransform +from apache_beam.utils import subprocess_server + +SimpleRow = typing.NamedTuple( +"SimpleRow", [("int", int), ("str", unicode), ("flt", float)]) +coders.registry.register_coder(SimpleRow, coders.RowCoder) + + +@attr('UsesSqlExpansionService') +@unittest.skipIf( +TestPipeline().get_pipeline_options().view_as(StandardOptions).runner is +None, +"Must be run with a runner that supports cross-language transforms") Review comment: Ah actually I also ran into an issue when running this test with the default runner, that I'm having a hard time making sense of: ``` E ValueError: Missing requirement declaration: {'beam:requirement:pardo:splittable_dofn:v1'} sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py:651: ValueError ``` It looks like it's indicating my pipeline should contain a `splittable_dofn` declaration but doesn't - but I'm not clear on why it needs that declaration (I don't think anything in the pipeline needs to be splittable), or how to add it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410752) Time Spent: 5h (was: 4h 50m) > Add Python SqlTransform MVP > --- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9432) Create a separate expansion service package.
[ https://issues.apache.org/jira/browse/BEAM-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Bradshaw resolved BEAM-9432. --- Fix Version/s: 2.21.0 Resolution: Fixed > Create a separate expansion service package. > > > Key: BEAM-9432 > URL: https://issues.apache.org/jira/browse/BEAM-9432 > Project: Beam > Issue Type: New Feature > Components: beam-model, sdk-java-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Fix For: 2.21.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9339) Declare capabilities in SDK environments
[ https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068180#comment-17068180 ] Robert Bradshaw commented on BEAM-9339: --- This is now done. > Declare capabilities in SDK environments > > > Key: BEAM-9339 > URL: https://issues.apache.org/jira/browse/BEAM-9339 > Project: Beam > Issue Type: New Feature > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9339) Declare capabilities in SDK environments
[ https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Bradshaw resolved BEAM-9339. --- Fix Version/s: 2.21.0 Resolution: Fixed > Declare capabilities in SDK environments > > > Key: BEAM-9339 > URL: https://issues.apache.org/jira/browse/BEAM-9339 > Project: Beam > Issue Type: New Feature > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Fix For: 2.21.0 > > Time Spent: 7h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9433) Create an expansion service artifact for common IOs
[ https://issues.apache.org/jira/browse/BEAM-9433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Bradshaw resolved BEAM-9433. --- Fix Version/s: 2.21.0 Resolution: Fixed > Create an expansion service artifact for common IOs > --- > > Key: BEAM-9433 > URL: https://issues.apache.org/jira/browse/BEAM-9433 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka, sdk-java-core, sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Fix For: 2.21.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > This will allow users to easily leverage Java IOs from Python/Go/... > pipelines. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9618) Allow SDKs to pull process bundle descriptors.
[ https://issues.apache.org/jira/browse/BEAM-9618?focusedWorklogId=410747=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410747 ] ASF GitHub Bot logged work on BEAM-9618: Author: ASF GitHub Bot Created on: 27/Mar/20 01:03 Start Date: 27/Mar/20 01:03 Worklog Time Spent: 10m Work Description: robertwb commented on issue #11235: [BEAM-9618] Pull bundle descriptors. URL: https://github.com/apache/beam/pull/11235#issuecomment-604762162 R: @lukecwik this is rebased and should be ready for review. I will remove commit cebab89 before merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410747) Remaining Estimate: 0h Time Spent: 10m > Allow SDKs to pull process bundle descriptors. > -- > > Key: BEAM-9618 > URL: https://issues.apache.org/jira/browse/BEAM-9618 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9618) Allow SDKs to pull process bundle descriptors.
[ https://issues.apache.org/jira/browse/BEAM-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Bradshaw reassigned BEAM-9618: - Assignee: Robert Bradshaw > Allow SDKs to pull process bundle descriptors. > -- > > Key: BEAM-9618 > URL: https://issues.apache.org/jira/browse/BEAM-9618 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3097) Allow BigQuerySource to take a ValueProvider as a table input.
[ https://issues.apache.org/jira/browse/BEAM-3097?focusedWorklogId=410744=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410744 ] ASF GitHub Bot logged work on BEAM-3097: Author: ASF GitHub Bot Created on: 27/Mar/20 00:50 Start Date: 27/Mar/20 00:50 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11244: [BEAM-3097] _ReadFromBigQuery supports valueprovider for table URL: https://github.com/apache/beam/pull/11244 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-3097) Allow BigQuerySource to take a ValueProvider as a table input.
[ https://issues.apache.org/jira/browse/BEAM-3097?focusedWorklogId=410745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410745 ] ASF GitHub Bot logged work on BEAM-3097: Author: ASF GitHub Bot Created on: 27/Mar/20 00:50 Start Date: 27/Mar/20 00:50 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11244: [BEAM-3097] _ReadFromBigQuery supports valueprovider for table URL: https://github.com/apache/beam/pull/11244#issuecomment-604759269 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410745) Remaining Estimate: 1h 40m (was: 1h 50m) Time Spent: 20m (was: 10m) > Allow BigQuerySource to take a ValueProvider as a table input. > -- > > Key: BEAM-3097 > URL: https://issues.apache.org/jira/browse/BEAM-3097 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Ed Mothershaw >Priority: Minor > Original Estimate: 2h > Time Spent: 20m > Remaining Estimate: 1h 40m > > In file sdks/python/apache_beam/io/gcp/bigquery.py, class BigQuery, line 389. > When a ValueProvider is input as table the script will fail. -- This message was sent by Atlassian Jira (v8.3.4#803005)
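The failure described in BEAM-3097 is the usual one for deferred values: if the table argument is parsed when the source is constructed, an object whose value is only known at run time cannot be handled. The sketch below illustrates the general fix of deferring the parse. `DeferredValue`, `TableSource` and `parse_table` are hypothetical stand-ins written for this example; they are not Beam's ValueProvider or BigQuery classes.

```python
class DeferredValue:
    """Stand-in for a value provider: the value is only set at run time."""

    def __init__(self):
        self._value = None
        self._ready = False

    def set(self, value):
        self._value = value
        self._ready = True

    def get(self):
        if not self._ready:
            raise RuntimeError("value is not available until run time")
        return self._value


def parse_table(spec):
    """Split 'project:dataset.table' into its three parts."""
    project, rest = spec.split(":", 1)
    dataset, table = rest.split(".", 1)
    return project, dataset, table


class TableSource:
    def __init__(self, table):
        # Keep the argument untouched; parsing here would fail for a
        # DeferredValue, which has no string value yet.
        self._table = table

    def resolve(self):
        # Called at execution time, when a deferred value can be read.
        spec = self._table
        if isinstance(spec, DeferredValue):
            spec = spec.get()
        return parse_table(spec)


table = DeferredValue()
source = TableSource(table)      # construction succeeds without a value
table.set("proj:dataset.table")  # the value arrives later
print(source.resolve())
```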
[jira] [Comment Edited] (BEAM-9620) textio (and fileio in general) takes too long to estimate sizes of large globs
[ https://issues.apache.org/jira/browse/BEAM-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068168#comment-17068168 ] Chamikara Madhusanka Jayalath edited comment on BEAM-9620 at 3/27/20, 12:48 AM: Actually, I think we do have a workaround. ReadAllFromText (and other various ReadAll transforms), should not run into this issue. [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py#L438] was (Author: chamikara): Actually, I think we do have a workaround. ReadAllFromText (and other various ReadAll transforms, should not run into this issue). [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py#L438] > textio (and fileio in general) takes too long to estimate sizes of large globs > -- > > Key: BEAM-9620 > URL: https://issues.apache.org/jira/browse/BEAM-9620 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > > As a workaround we could introduce a way to not perform size estimation when > reading large globs. For example Java SDK has withHintMatchesManyFiles() > option. > > [https://github.com/apache/beam/blob/850e8469de798d45ec535fe90cb2dc5dbda4974a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L371] > > Additionally, seems like we are repeating the size estimation where the same > PCollection read from a file-based source is applied to multiple PTransforms. > > See following for more details. > [https://stackoverflow.com/questions/60874942/avoid-recomputing-size-of-all-cloud-storage-files-in-gcsio-beam-python-sdk] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=410743=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410743 ] ASF GitHub Bot logged work on BEAM-8603: Author: ASF GitHub Bot Created on: 27/Mar/20 00:48 Start Date: 27/Mar/20 00:48 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10055: [BEAM-8603] Add Python SqlTransform URL: https://github.com/apache/beam/pull/10055#issuecomment-604758757 > I reviewed everything but the groovy files, which I would like another set of eyes on. R: @ihji could you take a look at the groovy changes? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410743) Time Spent: 4h 50m (was: 4h 40m) > Add Python SqlTransform MVP > --- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9620) textio (and fileio in general) takes too long to estimate sizes of large globs
[ https://issues.apache.org/jira/browse/BEAM-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068168#comment-17068168 ] Chamikara Madhusanka Jayalath commented on BEAM-9620: - Actually, I think we do have a workaround. ReadAllFromText (and other various ReadAll transforms, should not run into this issue). [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py#L438] > textio (and fileio in general) takes too long to estimate sizes of large globs > -- > > Key: BEAM-9620 > URL: https://issues.apache.org/jira/browse/BEAM-9620 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > > As a workaround we could introduce a way to not perform size estimation when > reading large globs. For example Java SDK has withHintMatchesManyFiles() > option. > > [https://github.com/apache/beam/blob/850e8469de798d45ec535fe90cb2dc5dbda4974a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L371] > > Additionally, seems like we are repeating the size estimation where the same > PCollection read from a file-based source is applied to multiple PTransforms. > > See following for more details. > [https://stackoverflow.com/questions/60874942/avoid-recomputing-size-of-all-cloud-storage-files-in-gcsio-beam-python-sdk] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9623) Add support for TableProviders in Python SqlTransform
Brian Hulette created BEAM-9623:
---
Summary: Add support for TableProviders in Python SqlTransform
Key: BEAM-9623
URL: https://issues.apache.org/jira/browse/BEAM-9623
Project: Beam
Issue Type: Improvement
Components: dsl-sql, sdk-py-core
Reporter: Brian Hulette

It should be possible to use e.g. DataCatalogTableProvider and access BigQuery, PubSub, and GCS in queries.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9622) Support for consuming tagged PCollections in Python SqlTransform
Brian Hulette created BEAM-9622:
---
Summary: Support for consuming tagged PCollections in Python SqlTransform
Key: BEAM-9622
URL: https://issues.apache.org/jira/browse/BEAM-9622
Project: Beam
Issue Type: Improvement
Components: dsl-sql, sdk-py-core
Reporter: Brian Hulette

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-8603: Component/s: dsl-sql > Add Python SqlTransform MVP > --- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9621) Python SqlTransform follow-ups
Brian Hulette created BEAM-9621:
---
Summary: Python SqlTransform follow-ups
Key: BEAM-9621
URL: https://issues.apache.org/jira/browse/BEAM-9621
Project: Beam
Issue Type: Improvement
Components: dsl-sql, sdk-py-core
Reporter: Brian Hulette
Assignee: Brian Hulette

Tracking JIRA for follow-up work to improve SqlTransform in Python

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8603) Add Python SqlTransform MVP
[ https://issues.apache.org/jira/browse/BEAM-8603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-8603: Summary: Add Python SqlTransform MVP (was: Add Python SqlTransform example script) > Add Python SqlTransform MVP > --- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9574) NamedTuple instances generated from schemas cannot be pickled
[ https://issues.apache.org/jira/browse/BEAM-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette resolved BEAM-9574. - Fix Version/s: 2.21.0 Resolution: Fixed > NamedTuple instances generated from schemas cannot be pickled > - > > Key: BEAM-9574 > URL: https://issues.apache.org/jira/browse/BEAM-9574 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.21.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Attempting to pickle an instance of a generated NamedTuple class results in > the following: > {code} > _pickle.PicklingError: Can't pickle <class 'apache_beam.typehints.schemas.BeamSchema_a7de91e0_ae11_4c52_a041_0b58ada35ac1'>: > attribute lookup BeamSchema_a7de91e0_ae11_4c52_a041_0b58ada35ac1 on > apache_beam.typehints.schemas failed > {code} > In general, we shouldn't be pickling these instances, but occasionally it may > be necessary, and we should just do it rather than failing hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9574) NamedTuple instances generated from schemas cannot be pickled
[ https://issues.apache.org/jira/browse/BEAM-9574?focusedWorklogId=410742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410742 ] ASF GitHub Bot logged work on BEAM-9574: Author: ASF GitHub Bot Created on: 27/Mar/20 00:35 Start Date: 27/Mar/20 00:35 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #11196: [BEAM-9574] Ensure that instances of generated NamedTuple classes can be pickled URL: https://github.com/apache/beam/pull/11196#discussion_r398971677 ## File path: sdks/python/apache_beam/typehints/schemas.py ## @@ -205,6 +218,11 @@ def typing_from_runner_api(fieldtype_proto): pass # TODO +def _hydrate_namedtuple_instance(encoded_schema, values): + return named_tuple_from_schema( + proto_utils.parse_Bytes(encoded_schema, schema_pb2.Schema))(*values) + + def named_tuple_from_schema(schema): Review comment: I went ahead and merged this, let me know if you think this should be tweaked and I can do it separately. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410742) Time Spent: 50m (was: 40m) > NamedTuple instances generated from schemas cannot be pickled > - > > Key: BEAM-9574 > URL: https://issues.apache.org/jira/browse/BEAM-9574 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Attempting to pickle an instance of a generated NamedTuple class results in > the following: > {code} > _pickle.PicklingError: Can't pickle <class 'apache_beam.typehints.schemas.BeamSchema_a7de91e0_ae11_4c52_a041_0b58ada35ac1'>: > attribute lookup BeamSchema_a7de91e0_ae11_4c52_a041_0b58ada35ac1 on > apache_beam.typehints.schemas failed > {code} > In general, we shouldn't be pickling these instances, but occasionally it may > be necessary, and we should just do it rather than failing hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
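The approach visible in the diff above (a module-level `_hydrate_namedtuple_instance` that rebuilds the class and then the instance) can be illustrated with the standard library alone. This is a simplified sketch, not the code from the pull request: it assumes a plain list of field names in place of an encoded schema proto, and `make_row_type` and `_hydrate_row` are hypothetical names. The point is that pickle cannot look up a class that was generated at run time, but it can look up a module-level function and call it with the data needed to regenerate the class.

```python
import pickle
from collections import namedtuple


def make_row_type(field_names):
    """Build a namedtuple class at run time (not importable by name)."""
    base = namedtuple("GeneratedRow", field_names)

    class Row(base):
        __slots__ = ()

        def __reduce__(self):
            # Pickle the instance as "call _hydrate_row(names, values)"
            # instead of relying on a by-name lookup of the class.
            return _hydrate_row, (tuple(field_names), tuple(self))

    return Row


def _hydrate_row(field_names, values):
    # Module-level function, so pickle can find it by name when loading.
    return make_row_type(field_names)(*values)


row = make_row_type(["id", "name"])(1, "a")
restored = pickle.loads(pickle.dumps(row))
print(restored == row, restored.name)
```

The restored object is an instance of a freshly generated class rather than the original one, which is acceptable here because equality and field access behave the same.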
[jira] [Work logged] (BEAM-9574) NamedTuple instances generated from schemas cannot be pickled
[ https://issues.apache.org/jira/browse/BEAM-9574?focusedWorklogId=410740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410740 ] ASF GitHub Bot logged work on BEAM-9574: Author: ASF GitHub Bot Created on: 27/Mar/20 00:34 Start Date: 27/Mar/20 00:34 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #11196: [BEAM-9574] Ensure that instances of generated NamedTuple classes can be pickled URL: https://github.com/apache/beam/pull/11196 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410740) Time Spent: 40m (was: 0.5h) > NamedTuple instances generated from schemas cannot be pickled > - > > Key: BEAM-9574 > URL: https://issues.apache.org/jira/browse/BEAM-9574 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Attempting to pickle an instance of a generated NamedTuple class results in > the following: > {code} > _pickle.PicklingError: Can't pickle 'apache_beam.typehints.schemas.BeamSchema_a7de91e0_ae11_4c52_a041_0b58ada35ac1'>: > attribute lookup BeamSchema_a7de91e0_ae11_4c52_a041_0b58ada35ac1 on > apache_beam.typehints.schemas failed > {code} > In general, we shouldn't be pickling these instances, but occasionally it may > be necessary, and we should just do it rather than failing hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9620) textio (and fileio in general) takes too long to estimate sizes of large globs
[ https://issues.apache.org/jira/browse/BEAM-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068161#comment-17068161 ] Chamikara Madhusanka Jayalath commented on BEAM-9620: - cc: [~pabloem] [~udim] > textio (and fileio in general) takes too long to estimate sizes of large globs > -- > > Key: BEAM-9620 > URL: https://issues.apache.org/jira/browse/BEAM-9620 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > > As a workaround, we could introduce a way to skip size estimation when > reading large globs. For example, the Java SDK has the withHintMatchesManyFiles() > option. > > [https://github.com/apache/beam/blob/850e8469de798d45ec535fe90cb2dc5dbda4974a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L371] > > Additionally, it seems that we repeat the size estimation when the same > PCollection read from a file-based source is applied to multiple PTransforms. > > See the following for more details. > [https://stackoverflow.com/questions/60874942/avoid-recomputing-size-of-all-cloud-storage-files-in-gcsio-beam-python-sdk] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410739=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410739 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 27/Mar/20 00:29 Start Date: 27/Mar/20 00:29 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#issuecomment-604754371 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410739) Time Spent: 33h 40m (was: 33.5h) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 33h 40m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9620) textio (and fileio in general) takes too long to estimate sizes of large globs
Chamikara Madhusanka Jayalath created BEAM-9620: --- Summary: textio (and fileio in general) takes too long to estimate sizes of large globs Key: BEAM-9620 URL: https://issues.apache.org/jira/browse/BEAM-9620 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Chamikara Madhusanka Jayalath As a workaround, we could introduce a way to skip size estimation when reading large globs. For example, the Java SDK has the withHintMatchesManyFiles() option. [https://github.com/apache/beam/blob/850e8469de798d45ec535fe90cb2dc5dbda4974a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L371] Additionally, it seems that we repeat the size estimation when the same PCollection read from a file-based source is applied to multiple PTransforms. See the following for more details. [https://stackoverflow.com/questions/60874942/avoid-recomputing-size-of-all-cloud-storage-files-in-gcsio-beam-python-sdk] -- This message was sent by Atlassian Jira (v8.3.4#803005)
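The two ideas in this issue (an opt-out for size estimation, and not repeating the estimate for the same glob) could look roughly like the sketch below. This is not the Beam Python SDK API: `GlobSizeEstimator`, its `skip_estimation` flag, and the `list_files` callable are all hypothetical names introduced only for illustration.

```python
class GlobSizeEstimator:
    """Hypothetical helper, not part of apache_beam."""

    def __init__(self, list_files, skip_estimation=False):
        # list_files: callable taking a glob pattern and returning an
        # iterable of (path, size_in_bytes) pairs (assumed interface).
        self._list_files = list_files
        self._skip_estimation = skip_estimation
        self._cache = {}

    def estimate_size(self, pattern):
        if self._skip_estimation:
            # Analogous in spirit to a "matches many files" hint: the caller
            # accepts an unknown size instead of paying for the listing.
            return None
        if pattern not in self._cache:
            # List the glob once; later calls for the same pattern reuse it.
            self._cache[pattern] = sum(
                size for _, size in self._list_files(pattern))
        return self._cache[pattern]


calls = []


def fake_lister(pattern):
    # Stand-in for an expensive file-system listing.
    calls.append(pattern)
    return [("a.txt", 10), ("b.txt", 32)]


estimator = GlobSizeEstimator(fake_lister)
first = estimator.estimate_size("gs://bucket/*.txt")
second = estimator.estimate_size("gs://bucket/*.txt")
skipped = GlobSizeEstimator(fake_lister, skip_estimation=True).estimate_size("x")
```

Whether a cached estimate is acceptable depends on how quickly the matched files change, which this sketch does not address.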
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=410730=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410730 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 27/Mar/20 00:00 Start Date: 27/Mar/20 00:00 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11243: [BEAM-9136]Add licenses for dependencies for Java URL: https://github.com/apache/beam/pull/11243 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-4150) Standardize use of PCollection coder proto attribute
[ https://issues.apache.org/jira/browse/BEAM-4150?focusedWorklogId=410726=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410726 ] ASF GitHub Bot logged work on BEAM-4150: Author: ASF GitHub Bot Created on: 26/Mar/20 23:53 Start Date: 26/Mar/20 23:53 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11222: [BEAM-4150] Don't window PCollection coders. URL: https://github.com/apache/beam/pull/11222#issuecomment-604744648 may need to rebase to get passing Docker PreCommit? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410726) Time Spent: 8h 50m (was: 8h 40m) > Standardize use of PCollection coder proto attribute > > > Key: BEAM-4150 > URL: https://issues.apache.org/jira/browse/BEAM-4150 > Project: Beam > Issue Type: Task > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Luke Cwik >Priority: Major > Fix For: 2.20.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > In some places it's expected to be a WindowedCoder, in others the raw > ElementCoder. We should use the same convention (decided in discussion to be > the raw ElementCoder) everywhere. The WindowCoder can be pulled out of the > attached windowing strategy, and the input/output ports should specify the > encoding directly rather than read the adjacent PCollection coder fields. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410717 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 23:48 Start Date: 26/Mar/20 23:48 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#discussion_r398957888 ## File path: model/pipeline/src/main/proto/metrics.proto ## @@ -52,38 +61,160 @@ message Annotation { string value = 2; } -// Populated MonitoringInfoSpecs for specific URNs. -// Indicating the required fields to be set. -// SDKs and RunnerHarnesses can load these instances into memory and write a -// validator or code generator to assist with populating and validating -// MonitoringInfo protos. +// A set of well known MonitoringInfo specifications. message MonitoringInfoSpecs { enum Enum { -// TODO(BEAM-6926): Add the PTRANSFORM name as a required label after -// upgrading the python SDK. -USER_COUNTER = 0 [(monitoring_info_spec) = { - urn: "beam:metric:user", - type_urn: "beam:metrics:sum_int_64", +// Represents an integer counter where values are summed across bundles. +USER_SUM_INT64 = 0 [(monitoring_info_spec) = { + urn: "beam:metric:user:v1", + type: "beam:metrics:sum_int64:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." + }] +}]; + +// Represents a double counter where values are summed across bundles. +USER_SUM_DOUBLE = 1 [(monitoring_info_spec) = { + urn: "beam:metric:user:v1", + type: "beam:metrics:sum_double:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." 
+ }] +}]; + +// Represents a distribution of an integer value where: +// - count: represents the number of values seen across all bundles Review comment: I chatted with Alex about this and the TypeUrns describing the encoding was enough. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410717) Time Spent: 33.5h (was: 33h 20m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 33.5h > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410716=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410716 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 23:47 Start Date: 26/Mar/20 23:47 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#discussion_r398957458 ## File path: model/pipeline/src/main/proto/metrics.proto ## @@ -52,38 +62,160 @@ message Annotation { string value = 2; } -// Populated MonitoringInfoSpecs for specific URNs. -// Indicating the required fields to be set. -// SDKs and RunnerHarnesses can load these instances into memory and write a -// validator or code generator to assist with populating and validating -// MonitoringInfo protos. +// A set of well known MonitoringInfo specifications. message MonitoringInfoSpecs { enum Enum { -// TODO(BEAM-6926): Add the PTRANSFORM name as a required label after -// upgrading the python SDK. -USER_COUNTER = 0 [(monitoring_info_spec) = { - urn: "beam:metric:user", - type_urn: "beam:metrics:sum_int_64", +// Represents an integer counter where values are summed across bundles. +USER_SUM_INT64 = 0 [(monitoring_info_spec) = { + urn: "beam:metric:user:sum_int64:v1", + type: "beam:metrics:sum_int64:v1", required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], annotations: [{ key: "description", -value: "URN utilized to report user numeric counters." +value: "URN utilized to report user metric." }] }]; -ELEMENT_COUNT = 1 [(monitoring_info_spec) = { +// Represents a double counter where values are summed across bundles. +USER_SUM_DOUBLE = 1 [(monitoring_info_spec) = { + urn: "beam:metric:user:sum_double:v1", + type: "beam:metrics:sum_double:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." 
+ }] +}]; + +// Represents a distribution of an integer value where: +// - count: represents the number of values seen across all bundles +// - sum: represents the total of the value across all bundles +// - min: represents the smallest value seen across all bundles +// - max: represents the largest value seen across all bundles +USER_DISTRIBUTION_INT64 = 2 [(monitoring_info_spec) = { + urn: "beam:metric:user:distribution_int64:v1", + type: "beam:metrics:distribution_int64:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." + }] +}]; + +// Represents a distribution of a double value where: +// - count: represents the number of values seen across all bundles +// - sum: represents the total of the value across all bundles +// - min: represents the smallest value seen across all bundles +// - max: represents the largest value seen across all bundles +USER_DISTRIBUTION_DOUBLE = 3 [(monitoring_info_spec) = { + urn: "beam:metric:user:distribution_double:v1", + type: "beam:metrics:distribution_double:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." + }] +}]; + +// Represents the latest seen integer value. The timestamp is used to +// provide an "ordering" over multiple values to determine which is the +// latest. +USER_LATEST_INT64 = 4 [(monitoring_info_spec) = { + urn: "beam:metric:user:latest_int64:v1", + type: "beam:metrics:latest_int64:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." + }] +}]; + +// Represents the latest seen double value. The timestamp is used to +// provide an "ordering" over multiple values to determine which is the +// latest. 
+USER_LATEST_DOUBLE = 5 [(monitoring_info_spec) = { + urn: "beam:metric:user:latest_double:v1", + type: "beam:metrics:latest_double:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." + }] +}]; + +// Represents the largest set of integer values seen across bundles. +USER_TOP_N_INT64 = 6 [(monitoring_info_spec) = { + urn: "beam:metric:user:top_n_int64:v1", + type: "beam:metrics:top_n_int64:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410714 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 23:44 Start Date: 26/Mar/20 23:44 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#discussion_r398953040 ## File path: model/pipeline/src/main/proto/metrics.proto ## @@ -52,38 +62,160 @@ message Annotation { string value = 2; } -// Populated MonitoringInfoSpecs for specific URNs. -// Indicating the required fields to be set. -// SDKs and RunnerHarnesses can load these instances into memory and write a -// validator or code generator to assist with populating and validating -// MonitoringInfo protos. +// A set of well known MonitoringInfo specifications. message MonitoringInfoSpecs { enum Enum { -// TODO(BEAM-6926): Add the PTRANSFORM name as a required label after -// upgrading the python SDK. -USER_COUNTER = 0 [(monitoring_info_spec) = { - urn: "beam:metric:user", - type_urn: "beam:metrics:sum_int_64", +// Represents an integer counter where values are summed across bundles. +USER_SUM_INT64 = 0 [(monitoring_info_spec) = { + urn: "beam:metric:user:sum_int64:v1", + type: "beam:metrics:sum_int64:v1", required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], annotations: [{ key: "description", -value: "URN utilized to report user numeric counters." +value: "URN utilized to report user metric." }] }]; -ELEMENT_COUNT = 1 [(monitoring_info_spec) = { +// Represents a double counter where values are summed across bundles. +USER_SUM_DOUBLE = 1 [(monitoring_info_spec) = { + urn: "beam:metric:user:sum_double:v1", + type: "beam:metrics:sum_double:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." 
+ }] +}]; + +// Represents a distribution of an integer value where: +// - count: represents the number of values seen across all bundles +// - sum: represents the total of the value across all bundles +// - min: represents the smallest value seen across all bundles +// - max: represents the largest value seen across all bundles +USER_DISTRIBUTION_INT64 = 2 [(monitoring_info_spec) = { + urn: "beam:metric:user:distribution_int64:v1", + type: "beam:metrics:distribution_int64:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." + }] +}]; + +// Represents a distribution of a double value where: +// - count: represents the number of values seen across all bundles +// - sum: represents the total of the value across all bundles +// - min: represents the smallest value seen across all bundles +// - max: represents the largest value seen across all bundles +USER_DISTRIBUTION_DOUBLE = 3 [(monitoring_info_spec) = { + urn: "beam:metric:user:distribution_double:v1", + type: "beam:metrics:distribution_double:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." + }] +}]; + +// Represents the latest seen integer value. The timestamp is used to +// provide an "ordering" over multiple values to determine which is the +// latest. +USER_LATEST_INT64 = 4 [(monitoring_info_spec) = { + urn: "beam:metric:user:latest_int64:v1", + type: "beam:metrics:latest_int64:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." + }] +}]; + +// Represents the latest seen double value. The timestamp is used to +// provide an "ordering" over multiple values to determine which is the +// latest. 
+USER_LATEST_DOUBLE = 5 [(monitoring_info_spec) = { + urn: "beam:metric:user:latest_double:v1", + type: "beam:metrics:latest_double:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." + }] +}]; + +// Represents the largest set of integer values seen across bundles. +USER_TOP_N_INT64 = 6 [(monitoring_info_spec) = { + urn: "beam:metric:user:top_n_int64:v1", + type: "beam:metrics:top_n_int64:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410715 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 23:44 Start Date: 26/Mar/20 23:44 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#discussion_r398952488 ## File path: model/pipeline/src/main/proto/metrics.proto ## @@ -52,38 +61,160 @@ message Annotation { string value = 2; } -// Populated MonitoringInfoSpecs for specific URNs. -// Indicating the required fields to be set. -// SDKs and RunnerHarnesses can load these instances into memory and write a -// validator or code generator to assist with populating and validating -// MonitoringInfo protos. +// A set of well known MonitoringInfo specifications. message MonitoringInfoSpecs { enum Enum { -// TODO(BEAM-6926): Add the PTRANSFORM name as a required label after -// upgrading the python SDK. -USER_COUNTER = 0 [(monitoring_info_spec) = { - urn: "beam:metric:user", - type_urn: "beam:metrics:sum_int_64", +// Represents an integer counter where values are summed across bundles. +USER_SUM_INT64 = 0 [(monitoring_info_spec) = { + urn: "beam:metric:user:v1", + type: "beam:metrics:sum_int64:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." + }] +}]; + +// Represents a double counter where values are summed across bundles. +USER_SUM_DOUBLE = 1 [(monitoring_info_spec) = { + urn: "beam:metric:user:v1", + type: "beam:metrics:sum_double:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." 
+ }] +}]; + +// Represents a distribution of an integer value where: +// - count: represents the number of values seen across all bundles Review comment: It is explicit in the type field which is a URN denoting exactly how the values are encoded? Did we need more? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410715) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 33h 10m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9331) The Row object needs better builders
[ https://issues.apache.org/jira/browse/BEAM-9331?focusedWorklogId=410711=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410711 ] ASF GitHub Bot logged work on BEAM-9331: Author: ASF GitHub Bot Created on: 26/Mar/20 23:30 Start Date: 26/Mar/20 23:30 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #10883: [BEAM-9331] Add better Row builders URL: https://github.com/apache/beam/pull/10883#issuecomment-604738743 run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410711) Time Spent: 4h 20m (was: 4h 10m) > The Row object needs better builders > > > Key: BEAM-9331 > URL: https://issues.apache.org/jira/browse/BEAM-9331 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Users should be able to build a Row object by specifying field names. Desired > syntax: > > Row.withSchema(schema) > .withFieldName("field1", "value") > .withFieldName("field2.field3", value) > .build() > > Users should also have a builder that allows taking an existing row and > changing specific fields. -- This message was sent by Atlassian Jira (v8.3.4#803005)
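The desired builder syntax above, including dotted paths into nested rows and a builder seeded from an existing row, can be sketched in a language-neutral way. The code below is a Python analog written only for illustration; it is not Beam's Java `Row` API, and `RowBuilder`, `with_field_name`, and `from_row` are hypothetical names. Rows and schemas are modelled as plain nested dicts.

```python
import copy


class RowBuilder:
    """Hypothetical builder over nested dicts; not Beam's Row class."""

    def __init__(self, schema, values=None):
        # schema maps a field name to a type, or to a nested schema dict.
        self._schema = schema
        self._values = copy.deepcopy(values) if values else {}

    @classmethod
    def from_row(cls, schema, row):
        # Start from an existing row so that specific fields can be changed.
        return cls(schema, row)

    def with_field_name(self, path, value):
        schema, target = self._schema, self._values
        *parents, leaf = path.split(".")
        for name in parents:
            if not isinstance(schema.get(name), dict):
                raise KeyError(f"{name!r} is not a nested row in {path!r}")
            schema = schema[name]
            target = target.setdefault(name, {})
        if leaf not in schema:
            raise KeyError(f"unknown field {path!r}")
        target[leaf] = value
        return self  # returning self allows call chaining

    def build(self):
        return copy.deepcopy(self._values)


schema = {"field1": str, "field2": {"field3": int}}
row = (RowBuilder(schema)
       .with_field_name("field1", "value")
       .with_field_name("field2.field3", 7)
       .build())
changed = RowBuilder.from_row(schema, row).with_field_name("field1", "other").build()
```

The deep copies keep the original row untouched when a modified copy is built from it.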
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410703=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410703 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 23:23 Start Date: 26/Mar/20 23:23 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11231: [BEAM-4374] Shortids for the Go SDK URL: https://github.com/apache/beam/pull/11231#discussion_r398948198 ## File path: sdks/go/pkg/beam/core/runtime/harness/monitoring.go ## @@ -16,20 +16,165 @@ package harness import ( + "bytes" + "strconv" + "sync" + "sync/atomic" "time" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" ppb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/ptypes" ) -func monitoring(p *exec.Plan) (*fnpb.Metrics, []*ppb.MonitoringInfo) { +type mUrn uint32 +type mType uint32 + +// TODO: Pull these from the protos. +var sUrns = []string{ + "beam:metric:user:v1", Review comment: heads up that this has now been exploded so that each MonitoringInfoSpec has a unique urn meaning that you'll see: beam:metric:user:sum_int64:v1, beam:metric:user:sum_double:v1, ... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410703) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 32h 50m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410702 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 23:23 Start Date: 26/Mar/20 23:23 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11231: [BEAM-4374] Shortids for the Go SDK URL: https://github.com/apache/beam/pull/11231#discussion_r398949690 ## File path: sdks/go/pkg/beam/core/runtime/harness/monitoring_test.go ## @@ -0,0 +1,122 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package harness + +import ( + "testing" + + "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" +) + +func TestGetShortID(t *testing.T) { + tests := []struct { + id string + urn mUrn + typ mType + expectedUrn string + expectedType string + }{ + { + id: "1", + urn: urnUser, Review comment: Can you add the case where the same urn but unique labels are used gets a different short id? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410702) Time Spent: 32h 50m (was: 32h 40m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 32h 50m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410704 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 23:23 Start Date: 26/Mar/20 23:23 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11231: [BEAM-4374] Shortids for the Go SDK URL: https://github.com/apache/beam/pull/11231#discussion_r398948604 ## File path: sdks/go/pkg/beam/core/runtime/harness/monitoring.go ## @@ -16,20 +16,165 @@ package harness import ( + "bytes" + "strconv" + "sync" + "sync/atomic" "time" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" ppb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/ptypes" ) -func monitoring(p *exec.Plan) (*fnpb.Metrics, []*ppb.MonitoringInfo) { +type mUrn uint32 +type mType uint32 + +// TODO: Pull these from the protos. +var sUrns = []string{ + "beam:metric:user:v1", + "beam:metric:element_count:v1", + "beam:metric:pardo_execution_time:start_bundle_msecs:v1", + "beam:metric:pardo_execution_time:process_bundle_msecs:v1", + "beam:metric:pardo_execution_time:finish_bundle_msecs:v1", + "beam:metric:ptransform_progress:remaining:v1", + "beam:metric:ptransform_progress:completed:v1", + + "TestingSentinelUrn", // Must remain last. +} + +const ( + urnUser mUrn = iota + urnElementCount + urnStartBundle + urnProcessBundle + urnFinishBundle + urnProgressRemaining + urnProgressCompleted + + urnTestSentinel // Must remain last. 
+) + +var sTypes = []string{ + "beam:metrics:sum_int64:v1", + "beam:metrics:sum_double:v1", + "beam:metrics:distribution_int64:v1", + "beam:metrics:distribution_double:v1", + "beam:metrics:latest_int64:v1", + "beam:metrics:latest_double:v1", + "beam:metrics:top_n_int64:v1", + "beam:metrics:top_n_double:v1", + "beam:metrics:bottom_n_int64:v1", + "beam:metrics:bottom_n_double:v1", + "beam:metrics:monitoring_table:v1", + "beam:metrics:progress:v1", + + "TestingSentinelType", // Must remain last. +} + +const ( Review comment: Since the urns uniquely identify the type now, you don't need this anymore and a monitoring info is uniquely described by urn + labels. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410704) Time Spent: 33h (was: 32h 50m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 33h > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
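The review point above, that a monitoring info is uniquely described by its URN plus its labels, can be sketched roughly as follows. This is a minimal Python illustration, not the Go SDK's code: `ShortIdCache`, `get_short_id`, and `lookup` are hypothetical names, and handing out short IDs from a decimal counter is an assumption made only for the example.

```python
import threading


class ShortIdCache:
    """Hypothetical cache: one short ID per unique (urn, labels) pair."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = {}    # (urn, frozen labels) -> short id
        self._infos = {}  # short id -> (urn, labels)
        self._next = 0

    def get_short_id(self, urn, labels):
        # Freeze the labels into a sorted tuple so that the order in which
        # labels were written does not affect identity.
        key = (urn, tuple(sorted(labels.items())))
        with self._lock:
            if key not in self._ids:
                self._next += 1
                short_id = str(self._next)
                self._ids[key] = short_id
                self._infos[short_id] = (urn, dict(labels))
            return self._ids[key]

    def lookup(self, short_id):
        # Recover the full description that a short ID stands for.
        return self._infos[short_id]


cache = ShortIdCache()
urn = "beam:metric:user:v1"
a = cache.get_short_id(urn, {"PTRANSFORM": "t1", "NAME": "n"})
b = cache.get_short_id(urn, {"PTRANSFORM": "t2", "NAME": "n"})
c = cache.get_short_id(urn, {"NAME": "n", "PTRANSFORM": "t1"})
```

Here `a` and `b` share a URN but differ in labels, which is the case the other review comment on this pull request asks to have tested, while `c` repeats the labels of `a` in a different order and so reuses its ID.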
[jira] [Work logged] (BEAM-1894) Race conditions in python direct runner eager mode
[ https://issues.apache.org/jira/browse/BEAM-1894?focusedWorklogId=410700=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410700 ] ASF GitHub Bot logged work on BEAM-1894: Author: ASF GitHub Bot Created on: 26/Mar/20 23:22 Start Date: 26/Mar/20 23:22 Worklog Time Spent: 10m Work Description: udim commented on issue #11242: [BEAM-1894] Remove obsolete EagerRunner test URL: https://github.com/apache/beam/pull/11242#issuecomment-604736500 R: @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410700) Time Spent: 20m (was: 10m) > Race conditions in python direct runner eager mode > -- > > Key: BEAM-1894 > URL: https://issues.apache.org/jira/browse/BEAM-1894 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Vikas Kedigehalli >Assignee: Udi Meiri >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > test_eager_pipeline > (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline_test.py#L283) > fails with the following error: > ERROR: test_eager_pipeline (apache_beam.pipeline_test.PipelineTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline_test.py", > line 285, in test_eager_pipeline > self.assertEqual([1, 4, 9], p | Create([1, 2, 3]) | Map(lambda x: x*x)) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/transforms/ptransform.py", > line 387, in __ror__ > p.run().wait_until_finish() > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline.py", > line 160, in run > self.to_runner_api(), self.runner, self.options).run(False) > File > 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline.py", > line 169, in run > return self.runner.run(self) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 99, in run > result.wait_until_finish() > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 166, in wait_until_finish > self._executor.await_completion() > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/executor.py", > line 336, in await_completion > self._executor.await_completion() > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/executor.py", > line 308, in __call__ > uncommitted_bundle.get_elements_iterable()) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/evaluation_context.py", > line 176, in append_to_cache > self._cache.append(applied_ptransform, tag, elements) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 138, in append > self._cache[(applied_ptransform, tag)].extend(elements) > TypeError: 'NoneType' object has no attribute '__getitem__' > This is triggered when Create is changed to a custom source. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=410701=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410701 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 26/Mar/20 23:22 Start Date: 26/Mar/20 23:22 Worklog Time Spent: 10m Work Description: jaketf commented on pull request #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#discussion_r398949505 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java ## @@ -0,0 +1,636 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.healthcare; + +import com.google.api.services.healthcare.v1alpha2.model.Message; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.text.ParseException; +import java.util.Collection; +import java.util.List; +import java.util.Map; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.gcp.datastore.AdaptiveThrottler; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.PInput; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.PValue; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link HL7v2IO} provides an API for reading from and writing to <a href="https://cloud.google.com/healthcare/docs/concepts/hl7v2">Google Cloud Healthcare HL7v2 API</a>. + * + * + * Read Review comment: @brianlucier PTAL at this updated doc string. It describes my latest change in ba9d023 to avoid the double get whenever we are reading a whole HL7v2Store with the Messages.List API by adding the view=FULL param. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410701) Time Spent: 6h 50m (was: 6h 40m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 6h 50m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-1894) Race conditions in python direct runner eager mode
[ https://issues.apache.org/jira/browse/BEAM-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri reassigned BEAM-1894: --- Assignee: Udi Meiri > Race conditions in python direct runner eager mode > -- > > Key: BEAM-1894 > URL: https://issues.apache.org/jira/browse/BEAM-1894 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Vikas Kedigehalli >Assignee: Udi Meiri >Priority: Major > > test_eager_pipeline > (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline_test.py#L283) > fails with the following error: > ERROR: test_eager_pipeline (apache_beam.pipeline_test.PipelineTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline_test.py", > line 285, in test_eager_pipeline > self.assertEqual([1, 4, 9], p | Create([1, 2, 3]) | Map(lambda x: x*x)) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/transforms/ptransform.py", > line 387, in __ror__ > p.run().wait_until_finish() > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline.py", > line 160, in run > self.to_runner_api(), self.runner, self.options).run(False) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline.py", > line 169, in run > return self.runner.run(self) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 99, in run > result.wait_until_finish() > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 166, in wait_until_finish > self._executor.await_completion() > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/executor.py", > line 336, in await_completion > self._executor.await_completion() > File > 
"/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/executor.py", > line 308, in __call__ > uncommitted_bundle.get_elements_iterable()) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/evaluation_context.py", > line 176, in append_to_cache > self._cache.append(applied_ptransform, tag, elements) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 138, in append > self._cache[(applied_ptransform, tag)].extend(elements) > TypeError: 'NoneType' object has no attribute '__getitem__' > This is triggered when Create is changed to a custom source. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1894) Race conditions in python direct runner eager mode
[ https://issues.apache.org/jira/browse/BEAM-1894?focusedWorklogId=410699=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410699 ] ASF GitHub Bot logged work on BEAM-1894: Author: ASF GitHub Bot Created on: 26/Mar/20 23:21 Start Date: 26/Mar/20 23:21 Worklog Time Spent: 10m Work Description: udim commented on pull request #11242: [BEAM-1894] Remove obsolete EagerRunner test URL: https://github.com/apache/beam/pull/11242 EagerRunner was removed in #4492.
[jira] [Resolved] (BEAM-9377) Python typehints: Map wrapper prevents Optional stripping
[ https://issues.apache.org/jira/browse/BEAM-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udi Meiri resolved BEAM-9377.
----------------------------
    Fix Version/s: Not applicable
    Resolution: Won't Fix

> Python typehints: Map wrapper prevents Optional stripping
> ---------------------------------------------------------
>
> Key: BEAM-9377
> URL: https://issues.apache.org/jira/browse/BEAM-9377
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Udi Meiri
> Assignee: Udi Meiri
> Priority: Major
> Fix For: Not applicable
>
> This existing test is wrong:
> {code}
> def test_map_wrapper_optional_output(self):
>   # Optional does affect output type (Nones are NOT ignored).
>   def map_fn(unused_element: int) -> typehints.Optional[int]:
>     return 1
>   th = beam.Map(map_fn).get_type_hints()
>   self.assertEqual(th.input_types, ((int, ), {}))
>   self.assertEqual(th.output_types, ((typehints.Optional[int], ), {}))
> {code}
> The resulting output type should be int.
> {code}
> initial output hint:
>   Optional[int]
> with wrapper:
>   Iterable[Optional[int]]
> with DoFn.default_type_hints:
>   Optional[int]
> {code}
> However any Nones returned by a DoFn's process method are dropped, so the
> actual element_type returned is plain int.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9377) Python typehints: Map wrapper prevents Optional stripping
[ https://issues.apache.org/jira/browse/BEAM-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udi Meiri reassigned BEAM-9377:
------------------------------
    Assignee: Udi Meiri

> Python typehints: Map wrapper prevents Optional stripping
> ---------------------------------------------------------
>
> Key: BEAM-9377
> URL: https://issues.apache.org/jira/browse/BEAM-9377
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Udi Meiri
> Assignee: Udi Meiri
> Priority: Major
>
> This existing test is wrong:
> {code}
> def test_map_wrapper_optional_output(self):
>   # Optional does affect output type (Nones are NOT ignored).
>   def map_fn(unused_element: int) -> typehints.Optional[int]:
>     return 1
>   th = beam.Map(map_fn).get_type_hints()
>   self.assertEqual(th.input_types, ((int, ), {}))
>   self.assertEqual(th.output_types, ((typehints.Optional[int], ), {}))
> {code}
> The resulting output type should be int.
> {code}
> initial output hint:
>   Optional[int]
> with wrapper:
>   Iterable[Optional[int]]
> with DoFn.default_type_hints:
>   Optional[int]
> {code}
> However any Nones returned by a DoFn's process method are dropped, so the
> actual element_type returned is plain int.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9377) Python typehints: Map wrapper prevents Optional stripping
[ https://issues.apache.org/jira/browse/BEAM-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068131#comment-17068131 ]

Udi Meiri commented on BEAM-9377:
---------------------------------

Verified for myself that Nones returned from map_fn indeed appear in the PCollection:
{code}
def test_typed_map_optional(self):
  # Optional does affect output type (Nones are NOT ignored).
  def map_fn(element: int) -> typehints.Optional[int]:
    if element == 1:
      return None
    else:
      return element

  result = [1, 2, 3] | beam.Map(map_fn)
  self.assertCountEqual([None, 2, 3], result)
{code}

> Python typehints: Map wrapper prevents Optional stripping
> ---------------------------------------------------------
>
> Key: BEAM-9377
> URL: https://issues.apache.org/jira/browse/BEAM-9377
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Udi Meiri
> Priority: Major
>
> This existing test is wrong:
> {code}
> def test_map_wrapper_optional_output(self):
>   # Optional does affect output type (Nones are NOT ignored).
>   def map_fn(unused_element: int) -> typehints.Optional[int]:
>     return 1
>   th = beam.Map(map_fn).get_type_hints()
>   self.assertEqual(th.input_types, ((int, ), {}))
>   self.assertEqual(th.output_types, ((typehints.Optional[int], ), {}))
> {code}
> The resulting output type should be int.
> {code}
> initial output hint:
>   Optional[int]
> with wrapper:
>   Iterable[Optional[int]]
> with DoFn.default_type_hints:
>   Optional[int]
> {code}
> However any Nones returned by a DoFn's process method are dropped, so the
> actual element_type returned is plain int.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
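One way to read the result in the comment above: a Map-style wrapper turns each return value into a one-element list, so a returned None is itself emitted, whereas a process function that returns None emits nothing. The following is a simplified pure-Python model of that difference, not Beam's implementation; `map_wrapper` and `run_process` are hypothetical helpers introduced only for illustration.

```python
def map_wrapper(fn):
    """Model of a Map-style wrapper: every return value becomes one output."""
    return lambda element: [fn(element)]


def run_process(process, elements):
    """Model of a runner calling a process function once per element.

    A result of None is treated as "no outputs"; any other result is
    iterated and each item is emitted.
    """
    outputs = []
    for element in elements:
        result = process(element)
        if result is None:
            continue
        outputs.extend(result)
    return outputs


def map_fn(element):
    return None if element == 1 else element


def process(element):
    # A process function used directly: returns an iterable, or None.
    return None if element == 1 else [element]


# Wrapped as a Map: None is wrapped in a list, so it is emitted as an element.
mapped = run_process(map_wrapper(map_fn), [1, 2, 3])

# Used directly: the None result is dropped before anything is emitted.
processed = run_process(process, [1, 2, 3])
```

Under this model `mapped` still contains the None while `processed` does not, which matches the observation that `Optional[int]` is the right output hint for the Map case.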
[jira] [Commented] (BEAM-9377) Python typehints: Map wrapper prevents Optional stripping
[ https://issues.apache.org/jira/browse/BEAM-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068132#comment-17068132 ]

Udi Meiri commented on BEAM-9377:
---------------------------------

Nothing to do, closing

> Python typehints: Map wrapper prevents Optional stripping
> ---------------------------------------------------------
>
> Key: BEAM-9377
> URL: https://issues.apache.org/jira/browse/BEAM-9377
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Udi Meiri
> Priority: Major
>
> This existing test is wrong:
> {code}
> def test_map_wrapper_optional_output(self):
>   # Optional does affect output type (Nones are NOT ignored).
>   def map_fn(unused_element: int) -> typehints.Optional[int]:
>     return 1
>   th = beam.Map(map_fn).get_type_hints()
>   self.assertEqual(th.input_types, ((int, ), {}))
>   self.assertEqual(th.output_types, ((typehints.Optional[int], ), {}))
> {code}
> The resulting output type should be int.
> {code}
> initial output hint:
>   Optional[int]
> with wrapper:
>   Iterable[Optional[int]]
> with DoFn.default_type_hints:
>   Optional[int]
> {code}
> However any Nones returned by a DoFn's process method are dropped, so the
> actual element_type returned is plain int.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-1894) Race conditions in python direct runner eager mode
[ https://issues.apache.org/jira/browse/BEAM-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068130#comment-17068130 ] Udi Meiri commented on BEAM-1894: - EagerRunner was removed in https://github.com/apache/beam/pull/4492 > Race conditions in python direct runner eager mode > -- > > Key: BEAM-1894 > URL: https://issues.apache.org/jira/browse/BEAM-1894 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Vikas Kedigehalli >Priority: Major > > test_eager_pipeline > (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline_test.py#L283) > fails with the following error: > ERROR: test_eager_pipeline (apache_beam.pipeline_test.PipelineTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline_test.py", > line 285, in test_eager_pipeline > self.assertEqual([1, 4, 9], p | Create([1, 2, 3]) | Map(lambda x: x*x)) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/transforms/ptransform.py", > line 387, in __ror__ > p.run().wait_until_finish() > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline.py", > line 160, in run > self.to_runner_api(), self.runner, self.options).run(False) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/pipeline.py", > line 169, in run > return self.runner.run(self) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 99, in run > result.wait_until_finish() > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 166, in wait_until_finish > self._executor.await_completion() > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/executor.py", > line 336, in await_completion > self._executor.await_completion() > 
File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/executor.py", > line 308, in __call__ > uncommitted_bundle.get_elements_iterable()) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/evaluation_context.py", > line 176, in append_to_cache > self._cache.append(applied_ptransform, tag, elements) > File > "/usr/local/google/home/vikasrk/work/incubator-beam/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 138, in append > self._cache[(applied_ptransform, tag)].extend(elements) > TypeError: 'NoneType' object has no attribute '__getitem__' > This is triggered when Create is changed to a custom source. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=410694=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410694 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 26/Mar/20 23:15 Start Date: 26/Mar/20 23:15 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #11216: [BEAM-9562] Remove TimerSpec from Proto URL: https://github.com/apache/beam/pull/11216#issuecomment-604734389 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410694) Time Spent: 5h (was: 4h 50m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410692 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 23:07 Start Date: 26/Mar/20 23:07 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#issuecomment-604732007 This is ready for the next review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410692) Time Spent: 32h 40m (was: 32.5h) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 32h 40m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4150) Standardize use of PCollection coder proto attribute
[ https://issues.apache.org/jira/browse/BEAM-4150?focusedWorklogId=410693=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410693 ] ASF GitHub Bot logged work on BEAM-4150: Author: ASF GitHub Bot Created on: 26/Mar/20 23:07 Start Date: 26/Mar/20 23:07 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11222: [BEAM-4150] Don't window PCollection coders. URL: https://github.com/apache/beam/pull/11222#issuecomment-604732125 Run PythonDocker PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410693) Time Spent: 8h 40m (was: 8.5h) > Standardize use of PCollection coder proto attribute > > > Key: BEAM-4150 > URL: https://issues.apache.org/jira/browse/BEAM-4150 > Project: Beam > Issue Type: Task > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Luke Cwik >Priority: Major > Fix For: 2.20.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > In some places it's expected to be a WindowedCoder, in others the raw > ElementCoder. We should use the same convention (decided in discussion to be > the raw ElementCoder) everywhere. The WindowCoder can be pulled out of the > attached windowing strategy, and the input/output ports should specify the > encoding directly rather than read the adjacent PCollection coder fields. -- This message was sent by Atlassian Jira (v8.3.4#803005)
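The convention described in the issue summary above (the PCollection records only the raw element coder, and the window coder comes from the attached windowing strategy) might be modelled as below. This is a rough sketch under stated assumptions: the dataclasses are simplified stand-ins rather than the real protos, and `windowed_coder_for` and the coder names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowingStrategy:
    window_coder_id: str


@dataclass(frozen=True)
class PCollection:
    coder_id: str  # raw element coder, per the agreed convention
    windowing_strategy_id: str


def windowed_coder_for(pcoll, strategies):
    """Describe a windowed-value coder for one PCollection.

    The element coder is read from the PCollection and the window coder
    from its windowing strategy; neither is stored pre-combined.
    """
    strategy = strategies[pcoll.windowing_strategy_id]
    return ("windowed_value", pcoll.coder_id, strategy.window_coder_id)


strategies = {"global": WindowingStrategy(window_coder_id="global_window_coder")}
pcoll = PCollection(coder_id="varint_coder", windowing_strategy_id="global")
spec = windowed_coder_for(pcoll, strategies)
```

The point of the design is that a consumer needing a windowed encoding composes it on demand, so the PCollection field has a single meaning everywhere.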
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410690=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410690 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 23:02 Start Date: 26/Mar/20 23:02 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#discussion_r398942909 ## File path: model/pipeline/src/main/proto/metrics.proto ## @@ -52,38 +55,157 @@ message Annotation { string value = 2; } -// Populated MonitoringInfoSpecs for specific URNs. -// Indicating the required fields to be set. -// SDKs and RunnerHarnesses can load these instances into memory and write a -// validator or code generator to assist with populating and validating -// MonitoringInfo protos. +// A set of well known MonitoringInfo specifications. message MonitoringInfoSpecs { enum Enum { -// TODO(BEAM-6926): Add the PTRANSFORM name as a required label after -// upgrading the python SDK. -USER_COUNTER = 0 [(monitoring_info_spec) = { - urn: "beam:metric:user", - type_urn: "beam:metrics:sum_int_64", +// Represents an integer counter where values are summed across bundles. +USER_SUM_INT64 = 0 [(monitoring_info_spec) = { + urn: "beam:metric:user:v1", + type: "beam:metrics:sum_int64:v1", required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], annotations: [{ key: "description", -value: "URN utilized to report user numeric counters." +value: "URN utilized to report user metric." }] }]; -ELEMENT_COUNT = 1 [(monitoring_info_spec) = { +// Represents a double counter where values are summed across bundles. +USER_SUM_DOUBLE = 1 [(monitoring_info_spec) = { + urn: "beam:metric:user:v1", Review comment: Made all the URNs unique and added a test to make sure that they remain unique. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410690) Time Spent: 32.5h (was: 32h 20m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 32.5h > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
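The review comment above mentions a test that keeps the spec URNs unique. A minimal sketch of such a check follows; the `SPECS` list, its URN strings, and the `duplicate_urns` helper are illustrative assumptions, not the catalog or the test added in the pull request.

```python
from collections import Counter

# Hypothetical spec entries. The names and URNs are placeholders chosen for
# illustration; they are not copied from metrics.proto.
SPECS = [
    {"name": "USER_SUM_INT64", "urn": "beam:metric:user:sum_int64:v1"},
    {"name": "USER_SUM_DOUBLE", "urn": "beam:metric:user:sum_double:v1"},
]


def duplicate_urns(specs):
    """Return the sorted list of URNs that appear in more than one spec."""
    counts = Counter(spec["urn"] for spec in specs)
    return sorted(urn for urn, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    # An empty list means every spec has its own URN.
    print(duplicate_urns(SPECS))
```

A uniqueness test would then simply assert that `duplicate_urns` returns an empty list for the full catalog.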
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=410687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410687 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 26/Mar/20 22:59 Start Date: 26/Mar/20 22:59 Worklog Time Spent: 10m Work Description: jaketf commented on issue #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#issuecomment-604655037 OK, an update here from an internal thread with the API team. 1. Message.List returning message contents is available in the beta API with the view parameter. 1. Schematized data should be in the next beta release, in roughly 2 weeks. 1. Right now the sink is outputting schematized data JSON wrapped in "{data=}". In light of these I will do the following refactors: 1. [x] Refactor how we batch read to always avoid the double get. This will make it a completely parallel code path to the real-time path, but I think that's OK. 1. [ ] Refactor to use the beta client library (once it includes schematizedData). 1. [x] I'll strip out that `{data=}` wrapper to make this easier for users. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410687) Time Spent: 6h 40m (was: 6.5h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 6h 40m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9573) Watermark hold for timer output timestamp is not computed correctly
[ https://issues.apache.org/jira/browse/BEAM-9573?focusedWorklogId=410685=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410685 ] ASF GitHub Bot logged work on BEAM-9573: Author: ASF GitHub Bot Created on: 26/Mar/20 22:58 Start Date: 26/Mar/20 22:58 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11220: [BEAM-9573][release-2.20] Correct computing of watermark hold for timer output timestamp URL: https://github.com/apache/beam/pull/11220#issuecomment-604729513 Great! all tests have passed! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410685) Time Spent: 8h (was: 7h 50m) > Watermark hold for timer output timestamp is not computed correctly > --- > > Key: BEAM-9573 > URL: https://issues.apache.org/jira/browse/BEAM-9573 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.20.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Blocker > Fix For: 2.20.0 > > Time Spent: 8h > Remaining Estimate: 0h > > With the introduction of timer output timestamp, a new watermark hold had > been added to the Flink Runner. The watermark computation works on the keyed > state backend which computes a key-scoped watermark hold and not the desired > operator-wide watermark hold. > Computation: > https://github.com/apache/beam/blob/b564239081e9351c56fb0e7d263495b95dd3f8f3/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L1140 > Key-scoped state: > https://github.com/apache/beam/blob/b564239081e9351c56fb0e7d263495b95dd3f8f3/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L1130 > We need to change this to operate on all keys. 
This has to be done before > fixing BEAM-9566. -- This message was sent by Atlassian Jira (v8.3.4#803005)
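The difference between a key-scoped and an operator-wide watermark hold described in this issue can be sketched as follows. This is a toy model only: the `holds_by_key` dictionary and both helper functions are assumptions made for illustration and do not reflect how the Flink Runner stores its state.

```python
def key_scoped_hold(holds_by_key, current_key):
    """Earliest pending output timestamp for one key only."""
    timestamps = holds_by_key.get(current_key, [])
    return min(timestamps) if timestamps else None


def operator_wide_hold(holds_by_key):
    """Earliest pending output timestamp across every key of the operator."""
    timestamps = [ts for per_key in holds_by_key.values() for ts in per_key]
    return min(timestamps) if timestamps else None


if __name__ == "__main__":
    # Hypothetical pending timer output timestamps, grouped by key.
    holds = {"a": [50, 70], "b": [10]}
    # Looking only at key "a" misses the earlier timestamp held by key "b".
    print(key_scoped_hold(holds, "a"), operator_wide_hold(holds))
```

In this model, advancing the output watermark based on the hold of key "a" alone would let it pass the timestamp still pending for key "b", which is the kind of problem the issue describes.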
[jira] [Work logged] (BEAM-9573) Watermark hold for timer output timestamp is not computed correctly
[ https://issues.apache.org/jira/browse/BEAM-9573?focusedWorklogId=410686=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410686 ] ASF GitHub Bot logged work on BEAM-9573: Author: ASF GitHub Bot Created on: 26/Mar/20 22:58 Start Date: 26/Mar/20 22:58 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11220: [BEAM-9573][release-2.20] Correct computing of watermark hold for timer output timestamp URL: https://github.com/apache/beam/pull/11220#issuecomment-604729513 Great! all tests have passed! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410686) Time Spent: 8h 10m (was: 8h) > Watermark hold for timer output timestamp is not computed correctly > --- > > Key: BEAM-9573 > URL: https://issues.apache.org/jira/browse/BEAM-9573 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.20.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Blocker > Fix For: 2.20.0 > > Time Spent: 8h 10m > Remaining Estimate: 0h > > With the introduction of timer output timestamp, a new watermark hold had > been added to the Flink Runner. The watermark computation works on the keyed > state backend which computes a key-scoped watermark hold and not the desired > operator-wide watermark hold. > Computation: > https://github.com/apache/beam/blob/b564239081e9351c56fb0e7d263495b95dd3f8f3/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L1140 > Key-scoped state: > https://github.com/apache/beam/blob/b564239081e9351c56fb0e7d263495b95dd3f8f3/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L1130 > We need to change this to operate on all keys. 
This has to be done before > fixing BEAM-9566. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9557) Error setting processing time timers near end-of-window
[ https://issues.apache.org/jira/browse/BEAM-9557?focusedWorklogId=410684=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410684 ] ASF GitHub Bot logged work on BEAM-9557: Author: ASF GitHub Bot Created on: 26/Mar/20 22:56 Start Date: 26/Mar/20 22:56 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11226: [BEAM-9557] Fix timer window boundary checking URL: https://github.com/apache/beam/pull/11226#issuecomment-604729044 @amaliujia appears that test looks for specific strings in the exception (always a recipe for a brittle test). I changed "event time timer" to "event-time timer" which broke that test. Will fix. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410684) Time Spent: 1h (was: 50m) > Error setting processing time timers near end-of-window > --- > > Key: BEAM-9557 > URL: https://issues.apache.org/jira/browse/BEAM-9557 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Steve Niemitz >Assignee: Reuven Lax >Priority: Critical > Fix For: 2.20.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Previously, it was possible to set a processing time timer past the end of a > window, and it would simply not fire. 
> However, now, this results in an error: > {code:java} > java.lang.IllegalArgumentException: Attempted to set event time timer that > outputs for 2020-03-19T18:01:35.000Z but that is after the expiration of > window 2020-03-19T17:59:59.999Z > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) > > org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$TimerInternalsTimer.setAndVerifyOutputTimestamp(SimpleDoFnRunner.java:1011) > > org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$TimerInternalsTimer.setRelative(SimpleDoFnRunner.java:934) > .processElement(???.scala:187) > {code} > > I think the regression was introduced in commit > a005fd765a762183ca88df90f261f6d4a20cf3e0. Also notice that the error message > is wrong, it says that "event time timer" but the timer is in the processing > time domain. -- This message was sent by Atlassian Jira (v8.3.4#803005)
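One way to picture the reported behaviour is a boundary check that applies only to event-time timers and names the timer's actual domain in its message. The sketch below is an assumption made for illustration; it is not the change made in the pull request, and `verify_timer_output` and the domain constants are hypothetical names.

```python
from datetime import datetime, timezone

EVENT_TIME = "event-time"
PROCESSING_TIME = "processing-time"


def verify_timer_output(domain, output_timestamp, window_expiration):
    """Reject event-time timers that would output after the window expires.

    Timers in other domains pass through unchanged in this sketch.
    """
    if domain == EVENT_TIME and output_timestamp > window_expiration:
        raise ValueError(
            f"Attempted to set {domain} timer that outputs for "
            f"{output_timestamp.isoformat()} but that is after the "
            f"expiration of window {window_expiration.isoformat()}"
        )
    return output_timestamp


if __name__ == "__main__":
    # Timestamps taken from the stack trace quoted in the issue.
    expiration = datetime(2020, 3, 19, 17, 59, 59, 999000, tzinfo=timezone.utc)
    late = datetime(2020, 3, 19, 18, 1, 35, tzinfo=timezone.utc)
    # A processing-time timer past the window end is accepted here.
    print(verify_timer_output(PROCESSING_TIME, late, expiration))
```

Tests that assert on the exact wording of such a message are brittle, so checking the exception type is usually the safer assertion.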
[jira] [Work logged] (BEAM-9331) The Row object needs better builders
[ https://issues.apache.org/jira/browse/BEAM-9331?focusedWorklogId=410683=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410683 ] ASF GitHub Bot logged work on BEAM-9331: Author: ASF GitHub Bot Created on: 26/Mar/20 22:47 Start Date: 26/Mar/20 22:47 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #10883: [BEAM-9331] Add better Row builders URL: https://github.com/apache/beam/pull/10883#issuecomment-604726387 @alexvanboxel rebased and fixed bugs. Previously I was blocked on getting logical types to work, but now that we natively store logical types in Row, it's become much easier. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410683) Time Spent: 4h 10m (was: 4h) > The Row object needs better builders > > > Key: BEAM-9331 > URL: https://issues.apache.org/jira/browse/BEAM-9331 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > Users should be able to build a Row object by specifying field names. Desired > syntax: > > Row.withSchema(schema) > .withFieldName("field1", "value") > .withFieldName("field2.field3", value) > .build() > > Users should also have a builder that allows taking an existing row and > changing specific fields. -- This message was sent by Atlassian Jira (v8.3.4#803005)
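The builder described in this issue, with fields set by name (including dotted nested paths) and the option to start from an existing row, can be sketched in a language-neutral way. The `RowBuilder` class below is hypothetical and uses plain dictionaries; it is not the Beam `Row` API, and the schema representation is an assumption made for this example.

```python
import copy


class RowBuilder:
    """Hypothetical builder over nested dicts, for illustration only."""

    def __init__(self, schema, base=None):
        # The schema maps field names to a type, or to a nested schema dict.
        self._schema = schema
        # Copy the base row so the original is never modified.
        self._values = copy.deepcopy(base) if base is not None else {}

    def with_field_value(self, path, value):
        names = path.split(".")
        schema, target = self._schema, self._values
        # Walk every path component except the last, descending into nesting.
        for name in names[:-1]:
            if not isinstance(schema.get(name), dict):
                raise KeyError(f"{name!r} is not a nested field in {path!r}")
            schema = schema[name]
            target = target.setdefault(name, {})
        if names[-1] not in schema:
            raise KeyError(f"unknown field {path!r}")
        target[names[-1]] = value
        return self

    def build(self):
        return copy.deepcopy(self._values)


if __name__ == "__main__":
    schema = {"field1": str, "field2": {"field3": int}}
    row = (
        RowBuilder(schema)
        .with_field_value("field1", "value")
        .with_field_value("field2.field3", 7)
        .build()
    )
    # Start from an existing row and change one field.
    changed = RowBuilder(schema, base=row).with_field_value("field1", "other").build()
    print(row, changed)
```

Copying on construction and on `build()` keeps the source row immutable, which matches the idea of deriving a new row from an existing one rather than editing it in place.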
[jira] [Work logged] (BEAM-5422) Update BigQueryIO DynamicDestinations documentation to clarify usage of getDestination() and getTable()
[ https://issues.apache.org/jira/browse/BEAM-5422?focusedWorklogId=410682=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410682 ] ASF GitHub Bot logged work on BEAM-5422: Author: ASF GitHub Bot Created on: 26/Mar/20 22:46 Start Date: 26/Mar/20 22:46 Worklog Time Spent: 10m Work Description: udim commented on issue #11241: [BEAM-5422] Document DynamicDestinations.getTable uniqueness requirement URL: https://github.com/apache/beam/pull/11241#issuecomment-604725835 R: @chamikaramj This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410682) Time Spent: 20m (was: 10m) > Update BigQueryIO DynamicDestinations documentation to clarify usage of > getDestination() and getTable() > --- > > Key: BEAM-5422 > URL: https://issues.apache.org/jira/browse/BEAM-5422 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Currently, there are some details related to these methods that should be > further clarified. For example, getTable() is expected to return a unique > value for each destination. -- This message was sent by Atlassian Jira (v8.3.4#803005)
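The uniqueness requirement on getTable() can be illustrated with a toy model in which a load job ID is derived from the table and not from the destination. Everything in this sketch (`load_job_id`, `get_table`, `submit_all`, the table naming, and the hashing scheme) is an assumption for illustration and not BigQueryIO's actual job ID logic.

```python
import hashlib


def load_job_id(job_prefix, table):
    """Derive a job ID from the table name only (hypothetical scheme)."""
    digest = hashlib.sha256(table.encode("utf-8")).hexdigest()[:12]
    return f"{job_prefix}_{digest}"


def get_table(destination):
    """A faulty mapping: it ignores the region part of the destination."""
    year, _region = destination
    return f"project:dataset.events_{year}"


def submit_all(destinations):
    """Submit one load job per destination, treating repeated IDs as retries."""
    submitted, skipped = {}, []
    for destination in destinations:
        job_id = load_job_id("beam_load", get_table(destination))
        if job_id in submitted:
            # Same job ID as an earlier submission: looks like a retry, no-op.
            skipped.append(destination)
        else:
            submitted[job_id] = destination
    return submitted, skipped


if __name__ == "__main__":
    submitted, skipped = submit_all([("2020", "us"), ("2020", "eu")])
    print(len(submitted), skipped)
```

In this model the second destination is silently skipped because it maps to the same table as the first, which is why each destination needs its own table.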
[jira] [Work logged] (BEAM-5422) Update BigQueryIO DynamicDestinations documentation to clarify usage of getDestination() and getTable()
[ https://issues.apache.org/jira/browse/BEAM-5422?focusedWorklogId=410681=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410681 ] ASF GitHub Bot logged work on BEAM-5422: Author: ASF GitHub Bot Created on: 26/Mar/20 22:44 Start Date: 26/Mar/20 22:44 Worklog Time Spent: 10m Work Description: udim commented on pull request #11241: [BEAM-5422] Document DynamicDestinations.getTable uniqueness requirement URL: https://github.com/apache/beam/pull/11241 Load job IDs are keyed by table (among other things), but not the destination. Thus multiple DestinationTs mapping to the same table will have the same BQ job ID. The first one will succeed, and the rest will seem to Beam as retries (no-ops because the job has already started/completed). Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410665 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 22:10 Start Date: 26/Mar/20 22:10 Worklog Time Spent: 10m Work Description: lostluck commented on issue #11231: [BEAM-4374] Shortids for the Go SDK URL: https://github.com/apache/beam/pull/11231#issuecomment-604713667 Just as a note, I'll wait until https://github.com/apache/beam/pull/11184 is in, and resolve the merge conflicts on my end before we merge this one. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410665) Time Spent: 32h 20m (was: 32h 10m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 32h 20m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=410662=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410662 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 26/Mar/20 22:07 Start Date: 26/Mar/20 22:07 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #11067: [BEAM-9136]Add licenses for dependencies for Python URL: https://github.com/apache/beam/pull/11067#issuecomment-604712009 I changed this PR to Python only. New PRs will be created for Java and Go. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410662) Time Spent: 8h 10m (was: 8h) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 8h 10m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=410659=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410659 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 26/Mar/20 21:57 Start Date: 26/Mar/20 21:57 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #11067: [BEAM-9136]Add licenses for dependencies URL: https://github.com/apache/beam/pull/11067#issuecomment-604707757 Run Python DockerBuild PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410659) Time Spent: 8h (was: 7h 50m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410658=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410658 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 21:55 Start Date: 26/Mar/20 21:55 Worklog Time Spent: 10m Work Description: ajamato commented on pull request #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#discussion_r398916506 ## File path: model/pipeline/src/main/proto/metrics.proto ## @@ -52,38 +61,160 @@ message Annotation { string value = 2; } -// Populated MonitoringInfoSpecs for specific URNs. -// Indicating the required fields to be set. -// SDKs and RunnerHarnesses can load these instances into memory and write a -// validator or code generator to assist with populating and validating -// MonitoringInfo protos. +// A set of well known MonitoringInfo specifications. message MonitoringInfoSpecs { enum Enum { -// TODO(BEAM-6926): Add the PTRANSFORM name as a required label after -// upgrading the python SDK. -USER_COUNTER = 0 [(monitoring_info_spec) = { - urn: "beam:metric:user", - type_urn: "beam:metrics:sum_int_64", +// Represents an integer counter where values are summed across bundles. +USER_SUM_INT64 = 0 [(monitoring_info_spec) = { + urn: "beam:metric:user:v1", + type: "beam:metrics:sum_int64:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." + }] +}]; + +// Represents a double counter where values are summed across bundles. +USER_SUM_DOUBLE = 1 [(monitoring_info_spec) = { + urn: "beam:metric:user:v1", + type: "beam:metrics:sum_double:v1", + required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"], + annotations: [{ +key: "description", +value: "URN utilized to report user metric." 
+ }] +}]; + +// Represents a distribution of an integer value where: +// - count: represents the number of values seen across all bundles Review comment: Now it seems like there technically aren't any fields named "count", "sum", "min", "max". Just 4 encoded varints in that specific order. There is no longer a proto or anything which defines this format. If we are going to keep type urns, I think that there should be somewhere in this file where you could define a "TypeSpec", which describes how to encode each opaque bytes payload, i.e. the coders used for each value and the order in which they must be encoded, or a proto that should be serialized into that bytes field, etc. A description that will work for all languages. Right now you can only know that from looking at your encoding code. I think it would be best if SDK implementers could look at a reference file like this and know how to populate the MonitoringInfo. That was the original intention behind MonitoringInfoSpec, and I believe that is a bit lost now with this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410658) Time Spent: 32h 10m (was: 32h) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 32h 10m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
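The "4 encoded varints in that specific order" layout discussed in the review comment can be sketched as below. This assumes a plain base-128 varint and non-negative values; the exact coder Beam uses for this payload is not confirmed by the thread, and all function names here are hypothetical.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (low bits first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_distribution(count, total, minimum, maximum):
    """Concatenate count, sum, min, max as varints, in that fixed order."""
    return b"".join(encode_varint(v) for v in (count, total, minimum, maximum))


def decode_distribution(payload):
    """Read back the four values in the same fixed order."""
    values, pos = [], 0
    for _ in range(4):
        value, pos = decode_varint(payload, pos)
        values.append(value)
    return tuple(values)


if __name__ == "__main__":
    payload = encode_distribution(3, 600, 100, 300)
    print(decode_distribution(payload))
```

Because the payload carries no field names or tags, a reader can only decode it by knowing the order in advance, which is the documentation gap the reviewer points out.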
[jira] [Work logged] (BEAM-9557) Error setting processing time timers near end-of-window
[ https://issues.apache.org/jira/browse/BEAM-9557?focusedWorklogId=410657=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410657 ] ASF GitHub Bot logged work on BEAM-9557: Author: ASF GitHub Bot Created on: 26/Mar/20 21:52 Start Date: 26/Mar/20 21:52 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11226: [BEAM-9557] Fix timer window boundary checking URL: https://github.com/apache/beam/pull/11226#issuecomment-604620053 This failed test might be relevant to this PR: org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutOfBoundsEventTimeTimer (link https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1930/) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410657) Time Spent: 50m (was: 40m) > Error setting processing time timers near end-of-window > --- > > Key: BEAM-9557 > URL: https://issues.apache.org/jira/browse/BEAM-9557 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Steve Niemitz >Assignee: Reuven Lax >Priority: Critical > Fix For: 2.20.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Previously, it was possible to set a processing time timer past the end of a > window, and it would simply not fire. 
> However, now, this results in an error: > {code:java} > java.lang.IllegalArgumentException: Attempted to set event time timer that > outputs for 2020-03-19T18:01:35.000Z but that is after the expiration of > window 2020-03-19T17:59:59.999Z > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) > > org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$TimerInternalsTimer.setAndVerifyOutputTimestamp(SimpleDoFnRunner.java:1011) > > org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$TimerInternalsTimer.setRelative(SimpleDoFnRunner.java:934) > .processElement(???.scala:187) > {code} > > I think the regression was introduced in commit > a005fd765a762183ca88df90f261f6d4a20cf3e0. Also notice that the error message > is wrong, it says that "event time timer" but the timer is in the processing > time domain. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9619) Install Python 3.8 on Jenkins workers
Valentyn Tymofieiev created BEAM-9619: - Summary: Install Python 3.8 on Jenkins workers Key: BEAM-9619 URL: https://issues.apache.org/jira/browse/BEAM-9619 Project: Beam Issue Type: Sub-task Components: testing Reporter: Valentyn Tymofieiev -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9550) beam_PostCommit_Python_Chicago_Taxi_Flink OOM
[ https://issues.apache.org/jira/browse/BEAM-9550?focusedWorklogId=410653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410653 ] ASF GitHub Bot logged work on BEAM-9550: Author: ASF GitHub Bot Created on: 26/Mar/20 21:43 Start Date: 26/Mar/20 21:43 Worklog Time Spent: 10m Work Description: kamilwu commented on issue #11193: [BEAM-9550] Increase JVM Metaspace size for the TaskExecutors. URL: https://github.com/apache/beam/pull/11193#issuecomment-604702445 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410653) Time Spent: 3h (was: 2h 50m) > beam_PostCommit_Python_Chicago_Taxi_Flink OOM > - > > Key: BEAM-9550 > URL: https://issues.apache.org/jira/browse/BEAM-9550 > Project: Beam > Issue Type: Bug > Components: runner-flink, test-failures >Reporter: Kyle Weaver >Assignee: Kamil Wasilewski >Priority: Major > Labels: currently-failing > Time Spent: 3h > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_Python_Chicago_Taxi_Flink/ > The following error has been occurring consistently for several days: > 07:57:26 ERROR:root:java.lang.OutOfMemoryError: Metaspace > 07:57:27 Traceback (most recent call last): > 07:57:27 File "tfdv_analyze_and_validate.py", line 227, in > 07:57:27 main() > 07:57:27 File "tfdv_analyze_and_validate.py", line 212, in main > 07:57:27 project=known_args.metric_reporting_project) > 07:57:27 File "tfdv_analyze_and_validate.py", line 132, in compute_stats > 07:57:27 result.wait_until_finish() > 07:57:27 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Chicago_Taxi_Flink/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/apache_beam/runners/portability/portable_runner.py", > line 545, in wait_until_finish > 07:57:27 
(self._job_id, self._state, self._last_error_message())) > 07:57:27 RuntimeError: Pipeline > chicago-taxi-tfdv-20200317-144954-eval_9742ac2b-26bf-4d1d-835e-572d4efacfcb > failed in state FAILED: java.lang.OutOfMemoryError: Metaspace -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=410650=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410650 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 26/Mar/20 21:37 Start Date: 26/Mar/20 21:37 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11067: [BEAM-9136]Add licenses for dependencies URL: https://github.com/apache/beam/pull/11067#discussion_r398907891 ## File path: .test-infra/jenkins/job_PreCommit_Python_DockerBuild.groovy ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import PrecommitJobBuilder + +PrecommitJobBuilder builder = new PrecommitJobBuilder( +scope: this, +nameBase: 'Python', Review comment: They are not for Python Precommit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410650) Time Spent: 7h 50m (was: 7h 40m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 7h 50m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410637=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410637 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 21:20 Start Date: 26/Mar/20 21:20 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#discussion_r398899707 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformer.java ## @@ -95,7 +99,12 @@ public CounterUpdate transform(MonitoringInfo monitoringInfo) { return null; } -long value = monitoringInfo.getMetric().getCounterData().getInt64Value(); +long value; +try { + value = VARINT_CODER.decode(monitoringInfo.getPayload().newInput()); Review comment: Done here and elsewhere. I introduced a MonitoringInfoEncodings class with the convenience methods for the currently used encodings. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410637) Time Spent: 31h 50m (was: 31h 40m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 31h 50m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=410638=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410638 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 26/Mar/20 21:20 Start Date: 26/Mar/20 21:20 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #11067: [BEAM-9136]Add licenses for dependencies URL: https://github.com/apache/beam/pull/11067#issuecomment-604692858 > I wonder if it would make sense to have separate, more focused PRs for each of Python, Go, and Java. yep, will create separate PRs for each language. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410638) Time Spent: 7h 40m (was: 7.5h) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 7h 40m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410639 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 21:20 Start Date: 26/Mar/20 21:20 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11184: [BEAM-4374] Update protos related to MonitoringInfo. URL: https://github.com/apache/beam/pull/11184#discussion_r398899707 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ElementCountMonitoringInfoToCounterUpdateTransformer.java ## @@ -95,7 +99,12 @@ public CounterUpdate transform(MonitoringInfo monitoringInfo) { return null; } -long value = monitoringInfo.getMetric().getCounterData().getInt64Value(); +long value; +try { + value = VARINT_CODER.decode(monitoringInfo.getPayload().newInput()); Review comment: Done here and elsewhere. I introduced a MonitoringInfoEncodings class with the convenience methods for the currently used encodings. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410639) Time Spent: 32h (was: 31h 50m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 32h > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
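The diff quoted in the review comment above stops reading the counter from a typed proto field and instead decodes it from the MonitoringInfo payload bytes with a varint coder. As a rough illustration of what such a decode involves, here is a minimal sketch. It assumes the common base-128 layout (7 data bits per byte, least-significant group first, high bit as a continuation flag, negatives as 64-bit two's complement). The names `encode_varint64` and `decode_varint64` are hypothetical and are not Beam APIs.

```python
import io


def encode_varint64(value: int) -> bytes:
    """Encode a signed 64-bit integer as a base-128 varint (assumed layout)."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value out of signed 64-bit range")
    value &= (1 << 64) - 1  # view negatives as unsigned two's complement
    out = bytearray()
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            out.append(bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(bits)
            return bytes(out)


def decode_varint64(stream) -> int:
    """Read one varint from a binary stream and return it as a signed int."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break  # last byte of this varint
        if shift >= 70:
            raise ValueError("varint is longer than 10 bytes")
    result &= (1 << 64) - 1
    if result >= 1 << 63:
        result -= 1 << 64  # map back to the signed range
    return result
```

A payload would be decoded with something like `decode_varint64(io.BytesIO(payload))`, which mirrors the `decode(...newInput())` shape in the quoted Java line. The decoder reads from a stream rather than a byte string so that several values can be read back to back.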
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform example script
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=410632=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410632 ] ASF GitHub Bot logged work on BEAM-8603: Author: ASF GitHub Bot Created on: 26/Mar/20 21:16 Start Date: 26/Mar/20 21:16 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10055: [BEAM-8603] Add Python SqlTransform URL: https://github.com/apache/beam/pull/10055#discussion_r398897788 ## File path: sdks/python/apache_beam/transforms/sql_test.py ## @@ -0,0 +1,109 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +"""Tests for transforms that use the SQL Expansion service.""" + +# pytype: skip-file + +from __future__ import absolute_import + +import logging +import typing +import unittest + +from nose.plugins.attrib import attr +from past.builtins import unicode + +import apache_beam as beam +from apache_beam import coders +from apache_beam.options.pipeline_options import DebugOptions +from apache_beam.options.pipeline_options import StandardOptions +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to +from apache_beam.transforms.sql import SqlTransform +from apache_beam.utils import subprocess_server + +SimpleRow = typing.NamedTuple( +"SimpleRow", [("int", int), ("str", unicode), ("flt", float)]) +coders.registry.register_coder(SimpleRow, coders.RowCoder) + + +@attr('UsesSqlExpansionService') +@unittest.skipIf( +TestPipeline().get_pipeline_options().view_as(StandardOptions).runner is +None, +"Must be run with a runner that supports cross-language transforms") Review comment: Oh ok I didn't realize that. Really I just needed a way to prevent this test from running in the Python PreCommit, since the SQL expansion service isn't built in that context. The other xlang test suite handles that by checking if the `EXPANSION_PORT` env var is set. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410632) Time Spent: 4h 40m (was: 4.5h) > Add Python SqlTransform example script > -- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
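The review comment above mentions that the other cross-language test suite avoids running in the Python PreCommit by checking whether the `EXPANSION_PORT` environment variable is set. A minimal sketch of that kind of guard is shown below. The helper `expansion_service_available` and the test class are hypothetical and only illustrate the pattern; they are not taken from the Beam test suite.

```python
import os
import unittest


def expansion_service_available(environ=None) -> bool:
    """Return True when EXPANSION_PORT is set to a non-empty value."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("EXPANSION_PORT"))


@unittest.skipUnless(
    expansion_service_available(),
    "EXPANSION_PORT is not set; no expansion service to test against")
class CrossLanguageSmokeTest(unittest.TestCase):
    """Placeholder suite that only runs when an expansion service is configured."""

    def test_port_is_configured(self):
        self.assertTrue(expansion_service_available())
```

The decorator argument is evaluated when the module is imported, so the variable has to be set before the test module is loaded. Taking the environment as a parameter makes the check itself easy to exercise without touching the real process environment.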
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=410629=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410629 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 26/Mar/20 21:11 Start Date: 26/Mar/20 21:11 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #11216: [BEAM-9562] Remove TimerSpec from Proto URL: https://github.com/apache/beam/pull/11216#issuecomment-604688942 Run Java_Examples_Dataflow PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410629) Time Spent: 4h 50m (was: 4h 40m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=410628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410628 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 26/Mar/20 21:11 Start Date: 26/Mar/20 21:11 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #11216: [BEAM-9562] Remove TimerSpec from Proto URL: https://github.com/apache/beam/pull/11216#issuecomment-604688875 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410628) Time Spent: 4h 40m (was: 4.5h) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9574) NamedTuple instances generated from schemas cannot be pickled
[ https://issues.apache.org/jira/browse/BEAM-9574?focusedWorklogId=410627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410627 ] ASF GitHub Bot logged work on BEAM-9574: Author: ASF GitHub Bot Created on: 26/Mar/20 21:10 Start Date: 26/Mar/20 21:10 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #11196: [BEAM-9574] Ensure that instances of generated NamedTuple classes can be pickled URL: https://github.com/apache/beam/pull/11196#discussion_r398895069 ## File path: sdks/python/apache_beam/typehints/schemas.py ## @@ -205,6 +218,11 @@ def typing_from_runner_api(fieldtype_proto): pass # TODO +def _hydrate_namedtuple_instance(encoded_schema, values): + return named_tuple_from_schema( + proto_utils.parse_Bytes(encoded_schema, schema_pb2.Schema))(*values) + + def named_tuple_from_schema(schema): Review comment: It's effectively memoized with SCHEMA_REGISTRY inside `typing_from_runner_api`. We could short-circuit it here as well though This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410627) Time Spent: 0.5h (was: 20m) > NamedTuple instances generated from schemas cannot be pickled > - > > Key: BEAM-9574 > URL: https://issues.apache.org/jira/browse/BEAM-9574 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Attempting to pickle an instance of a generated NamedTuple class results in > the following: > {code} > _pickle.PicklingError: Can't pickle <class 'apache_beam.typehints.schemas.BeamSchema_a7de91e0_ae11_4c52_a041_0b58ada35ac1'>: > attribute lookup BeamSchema_a7de91e0_ae11_4c52_a041_0b58ada35ac1 on > apache_beam.typehints.schemas failed > {code} > In general, we shouldn't be pickling these instances, but occasionally it may > be necessary, and we should just do it rather than failing hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
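The error in the issue description arises because pickle stores a class by module and name, and a class generated at runtime is not an attribute of its module. The diff quoted above adds a `_hydrate_namedtuple_instance` helper, which suggests that instances are rebuilt from their schema plus values. The following is a minimal sketch of that general technique using `__reduce__`. It substitutes a plain `(type_name, field_names)` pair for the encoded schema proto, and `named_tuple_from_fields`, `_hydrate_instance`, and `_CLASS_CACHE` are hypothetical names, not the Beam implementation.

```python
import collections

# Cache of generated classes, keyed by what defines them. This loosely mirrors
# the memoization mentioned in the review comment; the key shape is an assumption.
_CLASS_CACHE = {}


def named_tuple_from_fields(type_name, field_names):
    """Return a generated NamedTuple class whose instances can be pickled."""
    key = (type_name, tuple(field_names))
    cls = _CLASS_CACHE.get(key)
    if cls is None:
        base = collections.namedtuple(type_name, field_names)

        def __reduce__(self):
            # Tell pickle to rebuild the instance by calling a module-level
            # function with plain data, instead of looking the class up by name.
            return (_hydrate_instance, (type_name, tuple(field_names), tuple(self)))

        cls = type(type_name, (base,), {"__slots__": (), "__reduce__": __reduce__})
        _CLASS_CACHE[key] = cls
    return cls


def _hydrate_instance(type_name, field_names, values):
    """Recreate (or fetch) the generated class, then rebuild the instance."""
    return named_tuple_from_fields(type_name, field_names)(*values)
```

With this in place, pickling an instance records a reference to `_hydrate_instance` together with the field definitions and values, so unpickling works as long as that function is importable from its module. The cache means repeated hydration returns instances of the same class object rather than creating a new class each time.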
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=410623=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410623 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 26/Mar/20 21:04 Start Date: 26/Mar/20 21:04 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11067: [BEAM-9136]Add licenses for dependencies URL: https://github.com/apache/beam/pull/11067#discussion_r398891416 ## File path: .test-infra/jenkins/job_PreCommit_Python_DockerBuild.groovy ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import PrecommitJobBuilder + +PrecommitJobBuilder builder = new PrecommitJobBuilder( +scope: this, +nameBase: 'Python', Review comment: Isn't docker built as part of the precommit tests already? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410623) Time Spent: 7h 20m (was: 7h 10m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 7h 20m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=410624=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410624 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 26/Mar/20 21:04 Start Date: 26/Mar/20 21:04 Worklog Time Spent: 10m Work Description: robertwb commented on issue #11067: [BEAM-9136]Add licenses for dependencies URL: https://github.com/apache/beam/pull/11067#issuecomment-604685294 I wonder if it would make sense to have separate, more focused PRs for each of Python, Go, and Java. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410624) Time Spent: 7.5h (was: 7h 20m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410617=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410617 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 20:57 Start Date: 26/Mar/20 20:57 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11231: [BEAM-4374] Shortids for the Go SDK URL: https://github.com/apache/beam/pull/11231#discussion_r398886664 ## File path: sdks/go/pkg/beam/core/runtime/harness/monitoring.go ## @@ -16,20 +16,71 @@ package harness import ( + "bytes" + "strconv" + "sync" + "sync/atomic" "time" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime" "github.com/apache/beam/sdks/go/pkg/beam/core/metrics" "github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec" fnpb "github.com/apache/beam/sdks/go/pkg/beam/model/fnexecution_v1" ppb "github.com/apache/beam/sdks/go/pkg/beam/model/pipeline_v1" "github.com/golang/protobuf/ptypes" ) -func monitoring(p *exec.Plan) (*fnpb.Metrics, []*ppb.MonitoringInfo) { +// TODO: 2020/03/26 - measure mutex overhead vs sync.Map for this case. +// sync.Map might have lower contention for this read heavy load. +var ( + shortMu sync.Mutex + labels2ShortIds map[metrics.Labels]string Review comment: Ah good point. Can't use protos as Go Map keys, because of all the magic fields they have, but I can use other things. I've put in aligned constants, types, and string arrays for the proto specified strings, so these lookups don't end up hashing the strings every time (and instead use a uint32, which is very fast for go maps to deal with.) There's still the hashing of the fields in metrics.Labels, but we can do the same hashing in the metrics code at a later time, to allow for faster lookups for those instead. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410617) Time Spent: 31.5h (was: 31h 20m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 31.5h > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)
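The quoted Go diff guards a labels-to-short-ID map with a mutex, and the comment notes that map keys need to be cheap to hash. A rough Python rendering of that idea is sketched below. The class name `ShortIdRegistry` and the ID format (a hexadecimal counter) are assumptions made for illustration; the source does not state how the Go SDK formats its short IDs.

```python
import threading


class ShortIdRegistry:
    """Assigns a stable short string ID to each distinct hashable key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = {}

    def short_id(self, key):
        """Return the ID for key, allocating the next one on first sight."""
        with self._lock:
            sid = self._ids.get(key)
            if sid is None:
                # Assumed format: the allocation index rendered in hex.
                sid = format(len(self._ids), "x")
                self._ids[key] = sid
            return sid
```

Keys here would be tuples of plain strings or integers, for example `("urn", "type", "transform")`, which plays the same role as avoiding protos as map keys in the Go code. Whether a single lock or a concurrent map performs better for a read-heavy load is the open question raised in the TODO in the diff, and this sketch does not attempt to answer it.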
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=410618=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-410618 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 26/Mar/20 20:57 Start Date: 26/Mar/20 20:57 Worklog Time Spent: 10m Work Description: lostluck commented on issue #11231: [BEAM-4374] Shortids for the Go SDK URL: https://github.com/apache/beam/pull/11231#issuecomment-604681904 Run Go Postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 410618) Time Spent: 31h 40m (was: 31.5h) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 31h 40m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian Jira (v8.3.4#803005)