[jira] [Updated] (BEAM-7069) Emit a warning when a pipeline option with a hyphen is encountered by BeamArgumentParser
[ https://issues.apache.org/jira/browse/BEAM-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-7069: -- Description: CC: [~altay] [~udim]] (was: CC: [~altay] [~ehudm]) > Emit a warning when a pipeline option with a hyphen is encountered by > BeamArgumentParser > > > Key: BEAM-7069 > URL: https://issues.apache.org/jira/browse/BEAM-7069 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Minor > > CC: [~altay] [~udim]] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.
[ https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=227100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227100 ] ASF GitHub Bot logged work on BEAM-3713: Author: ASF GitHub Bot Created on: 13/Apr/19 02:38 Start Date: 13/Apr/19 02:38 Worklog Time Spent: 10m Work Description: udim commented on issue #7949: [BEAM-3713] Add pytest testing infrastructure URL: https://github.com/apache/beam/pull/7949#issuecomment-482769612 run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227100) Time Spent: 2.5h (was: 2h 20m) > Consider moving away from nose to nose2 or pytest. > -- > > Key: BEAM-3713 > URL: https://issues.apache.org/jira/browse/BEAM-3713 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: Robert Bradshaw >Assignee: Udi Meiri >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > Per > [https://nose.readthedocs.io/en/latest/|https://nose.readthedocs.io/en/latest/,] > , nose is in maintenance mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7067) Flink job server: can't disable cleanArtifactsPerJob
[ https://issues.apache.org/jira/browse/BEAM-7067?focusedWorklogId=227096=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227096 ] ASF GitHub Bot logged work on BEAM-7067: Author: ASF GitHub Bot Created on: 13/Apr/19 02:01 Start Date: 13/Apr/19 02:01 Worklog Time Spent: 10m Work Description: angoenka commented on issue #8293: [BEAM-7067] make cleanArtifactsPerJob configurable for Flink job serv… URL: https://github.com/apache/beam/pull/8293#issuecomment-482767137 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227096) Time Spent: 0.5h (was: 20m) > Flink job server: can't disable cleanArtifactsPerJob > > > Key: BEAM-7067 > URL: https://issues.apache.org/jira/browse/BEAM-7067 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > After [https://github.com/apache/beam/pull/8210], cleanArtifactsPerJob is now > an explicit boolean defaulting to true, so > [flink_job_server.gradle|https://github.com/apache/beam/blob/531cdf57c4f308482b2137f35ca8c77c96c64ffe/runners/flink/job-server/flink_job_server.gradle#L98-L99] > needs to be adjusted so it can be configured. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7052) portableWordCountTask raises IOError
[ https://issues.apache.org/jira/browse/BEAM-7052?focusedWorklogId=227095=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227095 ] ASF GitHub Bot logged work on BEAM-7052: Author: ASF GitHub Bot Created on: 13/Apr/19 02:00 Start Date: 13/Apr/19 02:00 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #8282: [BEAM-7052] add optional environmentType property to python wordcount task URL: https://github.com/apache/beam/pull/8282#discussion_r275100950 ## File path: sdks/python/build.gradle ## @@ -182,6 +182,9 @@ def portableWordCountTask(name, streaming) { options += ["--environment_cache_millis=1"] if (project.hasProperty("jobEndpoint")) options += ["--job_endpoint=${project.property('jobEndpoint')}"] + if (project.hasProperty("environmentType")) { Review comment: Lets add environmentConfig as well without it, environmentType will not always be useful. ``` if (project.hasProperty("environmentConfig")) { options += ["--environment_config=${project.property('environmentConfig')}"] } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227095) Time Spent: 1h 10m (was: 1h) > portableWordCountTask raises IOError > > > Key: BEAM-7052 > URL: https://issues.apache.org/jira/browse/BEAM-7052 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Got this error when trying to run (Python) portableWordCountTask on the Spark > portable runner: > {{RuntimeError: IOError: [Errno 2] No such file or directory: > '/tmp/beam-temp-py-wordcount-direct-a9e7beba888211e8843d0251/44ba0386-4221-4abc-ae8d-5a2aa4610c4f.py-wordcount-direct' > [while running 'write/Write/WriteImpl/WriteBundles']}} > This does not seem to be an issue with Flink (maybe because of > environment_cache_millis), but in any case the problem is fixed when I set > --environment_type=LOOPBACK. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7039) Spark portable runner: run validatesRunner tests
[ https://issues.apache.org/jira/browse/BEAM-7039?focusedWorklogId=227093=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227093 ] ASF GitHub Bot logged work on BEAM-7039: Author: ASF GitHub Bot Created on: 13/Apr/19 01:57 Start Date: 13/Apr/19 01:57 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #8285: [BEAM-7039] set up validatesPortableRunner tests for Spark URL: https://github.com/apache/beam/pull/8285#discussion_r275100838 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1559,6 +1561,7 @@ class BeamModulePlugin implements Plugin { project.tasks.create(name: name, type: Test) { group = "Verification" description = "Validates the PortableRunner with JobServer ${config.jobServerDriver}" +systemProperties config.systemProperties systemProperty "beamTestPipelineOptions", JsonOutput.toJson(beamTestPipelineOptions) Review comment: Shall we also merge this system property to systemProperties set above, This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227093) Time Spent: 1h 50m (was: 1h 40m) > Spark portable runner: run validatesRunner tests > > > Key: BEAM-7039 > URL: https://issues.apache.org/jira/browse/BEAM-7039 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7067) Flink job server: can't disable cleanArtifactsPerJob
[ https://issues.apache.org/jira/browse/BEAM-7067?focusedWorklogId=227091=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227091 ] ASF GitHub Bot logged work on BEAM-7067: Author: ASF GitHub Bot Created on: 13/Apr/19 01:55 Start Date: 13/Apr/19 01:55 Worklog Time Spent: 10m Work Description: angoenka commented on issue #8293: [BEAM-7067] make cleanArtifactsPerJob configurable for Flink job serv… URL: https://github.com/apache/beam/pull/8293#issuecomment-482766707 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227091) Time Spent: 20m (was: 10m) > Flink job server: can't disable cleanArtifactsPerJob > > > Key: BEAM-7067 > URL: https://issues.apache.org/jira/browse/BEAM-7067 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > After [https://github.com/apache/beam/pull/8210], cleanArtifactsPerJob is now > an explicit boolean defaulting to true, so > [flink_job_server.gradle|https://github.com/apache/beam/blob/531cdf57c4f308482b2137f35ca8c77c96c64ffe/runners/flink/job-server/flink_job_server.gradle#L98-L99] > needs to be adjusted so it can be configured. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6975) Merge portability status into capability matrix
[ https://issues.apache.org/jira/browse/BEAM-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816790#comment-16816790 ] Melissa Pashniak commented on BEAM-6975: Unassigning as I'm not actively working on this > Merge portability status into capability matrix > --- > > Key: BEAM-6975 > URL: https://issues.apache.org/jira/browse/BEAM-6975 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Ahmet Altay >Priority: Major > > Should the portability status: > https://s.apache.org/apache-beam-portability-support-table > be merged into capability matrix > https://beam.apache.org/documentation/runners/capability-matrix/ ? > (That is add portable runners to the list of runners as columns in the > capability matrix.) > cc: [~kenn] [~lcwik] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6975) Merge portability status into capability matrix
[ https://issues.apache.org/jira/browse/BEAM-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Melissa Pashniak reassigned BEAM-6975: -- Assignee: (was: Melissa Pashniak) > Merge portability status into capability matrix > --- > > Key: BEAM-6975 > URL: https://issues.apache.org/jira/browse/BEAM-6975 > Project: Beam > Issue Type: Sub-task > Components: website >Reporter: Ahmet Altay >Priority: Major > > Should the portability status: > https://s.apache.org/apache-beam-portability-support-table > be merged into capability matrix > https://beam.apache.org/documentation/runners/capability-matrix/ ? > (That is add portable runners to the list of runners as columns in the > capability matrix.) > cc: [~kenn] [~lcwik] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6853) Make Java and python portable options same
[ https://issues.apache.org/jira/browse/BEAM-6853?focusedWorklogId=227080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227080 ] ASF GitHub Bot logged work on BEAM-6853: Author: ASF GitHub Bot Created on: 13/Apr/19 01:30 Start Date: 13/Apr/19 01:30 Worklog Time Spent: 10m Work Description: angoenka commented on issue #8286: [BEAM-6853] Make sdkWorkerParallelism option consistent URL: https://github.com/apache/beam/pull/8286#issuecomment-482764982 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227080) Time Spent: 1h 40m (was: 1.5h) > Make Java and python portable options same > -- > > Key: BEAM-6853 > URL: https://issues.apache.org/jira/browse/BEAM-6853 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Java > [PortableRunnerOptions|https://github.com/apache/beam/blob/f21cfaefd54afb798103dc90ab57290739e81e81/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L80] > and [Python Portable > options|https://github.com/apache/beam/blob/f21cfaefd54afb798103dc90ab57290739e81e81/sdks/python/apache_beam/options/pipeline_options.py#L719] > don't have the same values limiting the use of sdk-worker-parallelism and > environment-cache-millis in python sdk. > > Add these options to the python sdk. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6853) Make Java and python portable options same
[ https://issues.apache.org/jira/browse/BEAM-6853?focusedWorklogId=227077=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227077 ] ASF GitHub Bot logged work on BEAM-6853: Author: ASF GitHub Bot Created on: 13/Apr/19 01:20 Start Date: 13/Apr/19 01:20 Worklog Time Spent: 10m Work Description: angoenka commented on issue #8286: [BEAM-6853] Make sdkWorkerParallelism option consistent URL: https://github.com/apache/beam/pull/8286#issuecomment-482764219 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227077) Time Spent: 1.5h (was: 1h 20m) > Make Java and python portable options same > -- > > Key: BEAM-6853 > URL: https://issues.apache.org/jira/browse/BEAM-6853 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Java > [PortableRunnerOptions|https://github.com/apache/beam/blob/f21cfaefd54afb798103dc90ab57290739e81e81/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L80] > and [Python Portable > options|https://github.com/apache/beam/blob/f21cfaefd54afb798103dc90ab57290739e81e81/sdks/python/apache_beam/options/pipeline_options.py#L719] > don't have the same values limiting the use of sdk-worker-parallelism and > environment-cache-millis in python sdk. > > Add these options to the python sdk. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6668) use add experiment methods (Java and Python)
[ https://issues.apache.org/jira/browse/BEAM-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816779#comment-16816779 ] Valentyn Tymofieiev commented on BEAM-6668: --- Python portion is completed by https://github.com/apache/beam/pull/8225. > use add experiment methods (Java and Python) > > > Key: BEAM-6668 > URL: https://issues.apache.org/jira/browse/BEAM-6668 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Priority: Minor > Labels: beginner, easyfix, newbie > > Python: > Convert instances of experiments.append(...) > to debug_options.add_experiment(...) > Java: > Use ExperimentalOptions.addExperiment(...) > instead of getExperiments(), modify, setExperiments() pattern. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7069) Emit a warning when a pipeline option with a hyphen is encountered by BeamArgumentParser
Valentyn Tymofieiev created BEAM-7069: - Summary: Emit a warning when a pipeline option with a hyphen is encountered by BeamArgumentParser Key: BEAM-7069 URL: https://issues.apache.org/jira/browse/BEAM-7069 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Valentyn Tymofieiev Assignee: Valentyn Tymofieiev CC: [~altay] [~ehudm] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=227072=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227072 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 13/Apr/19 01:10 Start Date: 13/Apr/19 01:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227072) Time Spent: 11h 40m (was: 11.5h) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 11h 40m > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=227071=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227071 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 13/Apr/19 01:10 Start Date: 13/Apr/19 01:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225#discussion_r275098905 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -738,13 +801,13 @@ def _add_argparse_args(cls, parser): '""} }. All fields in the json are optional except ' 'command.')) parser.add_argument( -'--sdk-worker-parallelism', default=None, +'--sdk_worker_parallelism', default=None, Review comment: Awesome, sounds like we are safe. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227071) Time Spent: 11.5h (was: 11h 20m) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 11.5h > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7035) Clear() method of OutputTimer is inconsistent
[ https://issues.apache.org/jira/browse/BEAM-7035?focusedWorklogId=227068=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227068 ] ASF GitHub Bot logged work on BEAM-7035: Author: ASF GitHub Bot Created on: 13/Apr/19 01:02 Start Date: 13/Apr/19 01:02 Worklog Time Spent: 10m Work Description: tweise commented on pull request #8300: [BEAM-7035] Compatible wire representation for timers in Python SDK URL: https://github.com/apache/beam/pull/8300 This change is just to fix the timer encoding. Changes to support cancellation of timers on the Flink runner side and test coverage to follow. Adding @lukecwik who will be familiar with the timestamp encoding workarounds :) Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/) | --- | --- | --- Pre-Commit Tests Status (on master branch)
[jira] [Work logged] (BEAM-6138) Add User Metric Support to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6138?focusedWorklogId=227062=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227062 ] ASF GitHub Bot logged work on BEAM-6138: Author: ASF GitHub Bot Created on: 13/Apr/19 00:39 Start Date: 13/Apr/19 00:39 Worklog Time Spent: 10m Work Description: ajamato commented on pull request #8299: [DO NOT MERGE] [BEAM-6138] Add the Sampled Byte Count counters to the Java SDK URL: https://github.com/apache/beam/pull/8299 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/) | --- | --- | --- Pre-Commit Tests Status (on master branch) --- |Java | Python | Go | Website --- | --- | --- | --- | ---
[jira] [Work logged] (BEAM-7068) Improve error messages when binding functions
[ https://issues.apache.org/jira/browse/BEAM-7068?focusedWorklogId=227032=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227032 ] ASF GitHub Bot logged work on BEAM-7068: Author: ASF GitHub Bot Created on: 13/Apr/19 00:15 Start Date: 13/Apr/19 00:15 Worklog Time Spent: 10m Work Description: youngoli commented on issue #8298: [BEAM-7068] Improving error messages when binding functions in Go SDK URL: https://github.com/apache/beam/pull/8298#issuecomment-482758365 R: @lostluck @ibzib This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227032) Time Spent: 20m (was: 10m) > Improve error messages when binding functions > - > > Key: BEAM-7068 > URL: https://issues.apache.org/jira/browse/BEAM-7068 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > This bug covers error messages around the file core/graph/bind.go which binds > the inputs and outputs of functions to make sure all the types match up > properly. When DoFns are created with the wrong number of inputs/outputs or > wrong input/output types, this is where errors will usually originate from. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7068) Improve error messages when binding functions
[ https://issues.apache.org/jira/browse/BEAM-7068?focusedWorklogId=227031=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227031 ] ASF GitHub Bot logged work on BEAM-7068: Author: ASF GitHub Bot Created on: 13/Apr/19 00:13 Start Date: 13/Apr/19 00:13 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #8298: [BEAM-7068] Improving error messages when binding functions in Go SDK URL: https://github.com/apache/beam/pull/8298 I improve the error messages by adding useful context wherever it comes in handy, plus making the error messages more human-readable, for example by printing out a function name instead of address. To show an example, this is the current error message when a DoFn has the wrong type as input: `failed to bind {Fn:0xc000aa4aa0 Param:[{Kind:Value T:int} {Kind:Emit T:func(string)}] Ret:[]} to input [string]: string is not assignable to int` New error message: `creating new DoFn in scope root/CountWords: binding fn main.extractWordsFn: binding params [{Value int}] to input string: string is not assignable to int` Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build
[jira] [Work logged] (BEAM-6747) Adding ExternalTransform in JavaSDK
[ https://issues.apache.org/jira/browse/BEAM-6747?focusedWorklogId=227029=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227029 ] ASF GitHub Bot logged work on BEAM-6747: Author: ASF GitHub Bot Created on: 13/Apr/19 00:08 Start Date: 13/Apr/19 00:08 Worklog Time Spent: 10m Work Description: ihji commented on pull request #7954: [BEAM-6747] Adding ExternalTransform in JavaSDK URL: https://github.com/apache/beam/pull/7954#discussion_r275094777 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java ## @@ -43,7 +43,12 @@ import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap; import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterables; -/** Cross-language external transform. */ +/** + * Cross-language external transform. + * + * {@link External} provides a cross-language transform via expansion services in non-Java SDKs. + * This is a low-level API and mainly for internal use. + */ Review comment: Thanks. Added more comments on requirements. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227029) Time Spent: 6h 10m (was: 6h) > Adding ExternalTransform in JavaSDK > --- > > Key: BEAM-6747 > URL: https://issues.apache.org/jira/browse/BEAM-6747 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > Adding Java counterpart of Python ExternalTransform for testing Python > transforms from pipelines in Java SDK. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6747) Adding ExternalTransform in JavaSDK
[ https://issues.apache.org/jira/browse/BEAM-6747?focusedWorklogId=227028=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227028 ] ASF GitHub Bot logged work on BEAM-6747: Author: ASF GitHub Bot Created on: 13/Apr/19 00:04 Start Date: 13/Apr/19 00:04 Worklog Time Spent: 10m Work Description: ihji commented on pull request #7954: [BEAM-6747] Adding ExternalTransform in JavaSDK URL: https://github.com/apache/beam/pull/7954#discussion_r275094474 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExternalTest.java ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction; + +import com.google.auto.service.AutoService; +import java.io.IOException; +import java.io.Serializable; +import java.nio.charset.StandardCharsets; +import java.util.Map; +import org.apache.beam.runners.core.construction.expansion.ExpansionService; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.UsesCrossLanguageTransforms; +import org.apache.beam.sdk.testing.ValidatesRunner; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.Filter; +import org.apache.beam.sdk.transforms.MapElements; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.apache.beam.sdk.values.TypeDescriptors; +import org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ConnectivityState; +import org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ManagedChannel; +import org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ManagedChannelBuilder; +import org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Server; +import org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ServerBuilder; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Test External transforms. */ +@RunWith(JUnit4.class) +public class ExternalTest implements Serializable { + @Rule public transient TestPipeline testPipeline = TestPipeline.create(); + + private static final String TEST_URN_SIMPLE = "simple"; + private static final String TEST_URN_LE = "le"; + private static final String TEST_URN_MULTI = "multi"; + + private static String pythonServerCommand; + private static Integer expansionPort; + private static String localExpansionAddr; + private static Server localExpansionServer; + + @BeforeClass + public static void setUp() throws IOException { +pythonServerCommand = System.getProperty("pythonTestExpansionCommand"); +expansionPort = Integer.valueOf(System.getProperty("expansionPort")); +int localExpansionPort = expansionPort + 100; +localExpansionAddr = String.format("localhost:%s", localExpansionPort); + +localExpansionServer = +ServerBuilder.forPort(localExpansionPort).addService(new ExpansionService()).build(); +localExpansionServer.start(); + } + + @AfterClass + public static void tearDown() { +localExpansionServer.shutdownNow(); + } + + @Test + @Category({ValidatesRunner.class, UsesCrossLanguageTransforms.class}) + public void expandSingleTest() { +PCollection col = +testPipeline +.apply(Create.of(1, 2, 3)) +.apply(External.of(TEST_URN_SIMPLE, new byte[] {}, localExpansionAddr)); +PAssert.that(col).containsInAnyOrder(2, 3, 4); +testPipeline.run(); + } + + @Test + @Category({ValidatesRunner.class, UsesCrossLanguageTransforms.class}) + public void expandMultipleTest() { +PCollection pcol = +testPipeline +.apply(Create.of(1,
[jira] [Updated] (BEAM-7068) Improve error messages when binding functions
[ https://issues.apache.org/jira/browse/BEAM-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-7068: -- Summary: Improve error messages when binding functions (was: Improve error messages for errors binding functions) > Improve error messages when binding functions > - > > Key: BEAM-7068 > URL: https://issues.apache.org/jira/browse/BEAM-7068 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Minor > > This bug covers error messages around the file core/graph/bind.go which binds > the inputs and outputs of functions to make sure all the types match up > properly. When DoFns are created with the wrong number of inputs/outputs or > wrong input/output types, this is where errors will usually originate from. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7068) Improve error messages for errors binding functions
Daniel Oliveira created BEAM-7068: - Summary: Improve error messages for errors binding functions Key: BEAM-7068 URL: https://issues.apache.org/jira/browse/BEAM-7068 Project: Beam Issue Type: Sub-task Components: sdk-go Reporter: Daniel Oliveira Assignee: Daniel Oliveira This bug covers error messages around the file core/graph/bind.go which binds the inputs and outputs of functions to make sure all the types match up properly. When DoFns are created with the wrong number of inputs/outputs or wrong input/output types, this is where errors will usually originate from. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=227020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227020 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 23:36 Start Date: 12/Apr/19 23:36 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#issuecomment-482753276 Not user friendly at all! An attempt to limit data loss. The real solution will be nano precision, a logical type at the Beam schema level. We will have to reimplement date time functions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227020) Time Spent: 2h 20m (was: 2h 10m) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4543) Remove dependency on googledatastore in favor of google-cloud-datastore.
[ https://issues.apache.org/jira/browse/BEAM-4543?focusedWorklogId=227014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227014 ] ASF GitHub Bot logged work on BEAM-4543: Author: ASF GitHub Bot Created on: 12/Apr/19 23:15 Start Date: 12/Apr/19 23:15 Worklog Time Spent: 10m Work Description: udim commented on issue #8262: [BEAM-4543] Python Datastore IO using google-cloud-datastore URL: https://github.com/apache/beam/pull/8262#issuecomment-482750289 run python postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227014) Time Spent: 20m (was: 10m) > Remove dependency on googledatastore in favor of google-cloud-datastore. > > > Key: BEAM-4543 > URL: https://issues.apache.org/jira/browse/BEAM-4543 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Minor > Labels: triaged > Time Spent: 20m > Remaining Estimate: 0h > > apache-beam[gcp] package depends [1] on googledatastore package [2]. We > should replace this dependency with google-cloud-datastore [3] which is > officially supported, has better release cadence and also has Python 3 > support. > [1] > https://github.com/apache/beam/blob/fad655462f8fadfdfaab0b7a09cab538f076f94e/sdks/python/setup.py#L126 > [2] [https://pypi.org/project/googledatastore/] > [3] [https://pypi.org/project/google-cloud-datastore/] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6669) revert service_default_cmek_config experiment flag
[ https://issues.apache.org/jira/browse/BEAM-6669?focusedWorklogId=227012=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227012 ] ASF GitHub Bot logged work on BEAM-6669: Author: ASF GitHub Bot Created on: 12/Apr/19 23:10 Start Date: 12/Apr/19 23:10 Worklog Time Spent: 10m Work Description: udim commented on issue #8296: [BEAM-6669] Set Dataflow KMS key name URL: https://github.com/apache/beam/pull/8296#issuecomment-482749302 R: @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227012) Time Spent: 50m (was: 40m) > revert service_default_cmek_config experiment flag > --- > > Key: BEAM-6669 > URL: https://issues.apache.org/jira/browse/BEAM-6669 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Do this when --dataflowKmsKey is supported on Dataflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6669) revert service_default_cmek_config experiment flag
[ https://issues.apache.org/jira/browse/BEAM-6669?focusedWorklogId=227013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227013 ] ASF GitHub Bot logged work on BEAM-6669: Author: ASF GitHub Bot Created on: 12/Apr/19 23:10 Start Date: 12/Apr/19 23:10 Worklog Time Spent: 10m Work Description: udim commented on issue #8297: [BEAM-6669] Set Dataflow KMS key name (Python) URL: https://github.com/apache/beam/pull/8297#issuecomment-482749309 R: @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227013) Time Spent: 1h (was: 50m) > revert service_default_cmek_config experiment flag > --- > > Key: BEAM-6669 > URL: https://issues.apache.org/jira/browse/BEAM-6669 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > Do this when --dataflowKmsKey is supported on Dataflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6669) revert service_default_cmek_config experiment flag
[ https://issues.apache.org/jira/browse/BEAM-6669?focusedWorklogId=227010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227010 ] ASF GitHub Bot logged work on BEAM-6669: Author: ASF GitHub Bot Created on: 12/Apr/19 23:06 Start Date: 12/Apr/19 23:06 Worklog Time Spent: 10m Work Description: udim commented on issue #8297: [BEAM-6669] Set Dataflow KMS key name (Python) URL: https://github.com/apache/beam/pull/8297#issuecomment-482748612 run python postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227010) Time Spent: 40m (was: 0.5h) > revert service_default_cmek_config experiment flag > --- > > Key: BEAM-6669 > URL: https://issues.apache.org/jira/browse/BEAM-6669 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > Do this when --dataflowKmsKey is supported on Dataflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6669) revert service_default_cmek_config experiment flag
[ https://issues.apache.org/jira/browse/BEAM-6669?focusedWorklogId=227008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227008 ] ASF GitHub Bot logged work on BEAM-6669: Author: ASF GitHub Bot Created on: 12/Apr/19 23:05 Start Date: 12/Apr/19 23:05 Worklog Time Spent: 10m Work Description: udim commented on pull request #8297: [BEAM-6669] Set Dataflow KMS key name (Python) URL: https://github.com/apache/beam/pull/8297 Use the official Environment.service_kms_key_name proto field to pass --dataflow_kms_key to the runner. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/) | --- | --- | --- Pre-Commit Tests Status (on master branch) --- |Java | Python | Go | Website --- | --- | --- | --- |
[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas
[ https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=226997=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226997 ] ASF GitHub Bot logged work on BEAM-4461: Author: ASF GitHub Bot Created on: 12/Apr/19 23:02 Start Date: 12/Apr/19 23:02 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #8273: [BEAM-4461] A transform to perform binary joins of PCollections with schemas URL: https://github.com/apache/beam/pull/8273#discussion_r274686707 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/CoGroup.java ## @@ -244,7 +244,7 @@ public static By fieldAccessDescriptor(FieldAccessDescriptor fieldAccessDescript * * This only affects the results of expandCrossProduct. */ -public By withOuterJoinParticipation() { +public By withOptionalParticipation() { return toBuilder().setOuterJoinParticipation(true).build(); Review comment: I think this should be changed to get/setOptionalParticipation() This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226997) Time Spent: 25h 10m (was: 25h) > Create a library of useful transforms that use schemas > -- > > Key: BEAM-4461 > URL: https://issues.apache.org/jira/browse/BEAM-4461 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Labels: triaged > Time Spent: 25h 10m > Remaining Estimate: 0h > > e.g. JoinBy(fields). Project, Filter, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas
[ https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=226998=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226998 ] ASF GitHub Bot logged work on BEAM-4461: Author: ASF GitHub Bot Created on: 12/Apr/19 23:02 Start Date: 12/Apr/19 23:02 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #8273: [BEAM-4461] A transform to perform binary joins of PCollections with schemas URL: https://github.com/apache/beam/pull/8273#discussion_r274585649 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/CoGroup.java ## @@ -159,7 +159,7 @@ * * {@code * PCollection joined = PCollectionTuple.of("input1", input1, "input2", input2) - * .apply(CoGroup.join("input1", By.fieldNames("user").withOuterJoinParticipation()) + * .apply(CoGroup.join("input1", By.fieldNames("user").withOptionalParticipation()) Review comment: Nit: update doc at L152 as well? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226998) Time Spent: 25h 20m (was: 25h 10m) > Create a library of useful transforms that use schemas > -- > > Key: BEAM-4461 > URL: https://issues.apache.org/jira/browse/BEAM-4461 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Labels: triaged > Time Spent: 25h 20m > Remaining Estimate: 0h > > e.g. JoinBy(fields). Project, Filter, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas
[ https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=227000=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227000 ] ASF GitHub Bot logged work on BEAM-4461: Author: ASF GitHub Bot Created on: 12/Apr/19 23:02 Start Date: 12/Apr/19 23:02 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #8273: [BEAM-4461] A transform to perform binary joins of PCollections with schemas URL: https://github.com/apache/beam/pull/8273#discussion_r275006903 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/JoinTest.java ## @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.transforms; + +import static junit.framework.TestCase.assertEquals; +import static org.apache.beam.sdk.schemas.transforms.JoinTestUtils.innerJoin; + +import java.util.List; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.transforms.Join.FieldsEqual; +import org.apache.beam.sdk.testing.NeedsRunner; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +/** Tests for {@link org.apache.beam.sdk.schemas.transforms.Join}. */ +public class JoinTest { + @Rule public final transient TestPipeline pipeline = TestPipeline.create(); + + private static final Schema CG_SCHEMA_1 = + Schema.builder() + .addStringField("user") + .addInt32Field("count") + .addStringField("country") + .build(); + private static final Schema CG_SCHEMA_2 = + Schema.builder() + .addStringField("user2") + .addInt32Field("count2") + .addStringField("country2") + .build(); + + @Test + @Category(NeedsRunner.class) + public void testInnerJoinSameKeys() { +List pc1Rows = +Lists.newArrayList( +Row.withSchema(CG_SCHEMA_1).addValues("user1", 1, "us").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 2, "us").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 3, "il").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 4, "il").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 5, "fr").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 6, "fr").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 7, "ar").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 8, "ar").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user3", 8, "ar").build()); +List pc2Rows = +Lists.newArrayList( +Row.withSchema(CG_SCHEMA_1).addValues("user1", 9, "us").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 10, "us").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 11, "il").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 12, "il").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 13, "fr").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 14, "fr").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 15, "ar").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 16, "ar").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user4", 8, "ar").build()); + +PCollection pc1 = pipeline.apply("Create1", Create.of(pc1Rows)).setRowSchema(CG_SCHEMA_1); +PCollection pc2 = pipeline.apply("Create2", Create.of(pc2Rows)).setRowSchema(CG_SCHEMA_1); + +Schema expectedSchema = +Schema.builder() +.addRowField(Join.LHS_TAG, CG_SCHEMA_1) +.addRowField(Join.RHS_TAG, CG_SCHEMA_1) +.build(); + +PCollection joined = pc1.apply(Join.innerJoin(pc2).using("user", "country")); + +
[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas
[ https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=227001=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227001 ] ASF GitHub Bot logged work on BEAM-4461: Author: ASF GitHub Bot Created on: 12/Apr/19 23:02 Start Date: 12/Apr/19 23:02 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #8273: [BEAM-4461] A transform to perform binary joins of PCollections with schemas URL: https://github.com/apache/beam/pull/8273#discussion_r275068137 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Join.java ## @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.transforms; + +import javax.annotation.Nullable; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.Row; + +/** + * A transform that performs equijoins across two schema {@link PCollection}s. + * + * This transform allows joins between two input PCollections simply by specifying the fields to + * join on. The resulting {@code PCollection} will have two fields named "lhs" and "rhs" + * respectively, each with the schema of the corresponding input PCollection. + * + * For example, the following demonstrates joining two PCollections using a natural join on the + * "user" and "country" fields, where both the left-hand and the right-hand PCollections have fields + * with these names. + * + * + * {@code PCollection joined = pCollection1.apply(Join.innerJoin(pCollection2).using("user", "country")); + * } + * + * If the right-hand PCollection contains fields with different names to join against, you can + * specify them as follows: + * + * {@code PCollection joined = pCollection1.apply(Join.innerJoin(pCollection2) + * .on(FieldsEqual.left("user", "country").right("otherUser", "otherCountry"))); + * } + * + * Full outer joins, left outer joins, and right outer joins are also supported. + */ +public class Join { + public static final String LHS_TAG = "lhs"; + public static final String RHS_TAG = "rhs"; + + /** Predicate object to specify fields to compare when doing an equi-join. */ + public static class FieldsEqual { +public static Inner left(String... fieldNames) { + return new Inner( + FieldAccessDescriptor.withFieldNames(fieldNames), FieldAccessDescriptor.create()); +} + +public static Inner left(Integer... fieldIds) { + return new Inner( + FieldAccessDescriptor.withFieldIds(fieldIds), FieldAccessDescriptor.create()); +} + +public static Inner left(FieldAccessDescriptor fieldAccessDescriptor) { + return new Inner(fieldAccessDescriptor, FieldAccessDescriptor.create()); +} + +public Inner right(String... fieldNames) { + return new Inner( + FieldAccessDescriptor.create(), FieldAccessDescriptor.withFieldNames(fieldNames)); +} + +public Inner right(Integer... fieldIds) { + return new Inner( + FieldAccessDescriptor.create(), FieldAccessDescriptor.withFieldIds(fieldIds)); +} + +public Inner right(FieldAccessDescriptor fieldAccessDescriptor) { + return new Inner(FieldAccessDescriptor.create(), fieldAccessDescriptor); +} + +/** Implementation class for FieldsEqual. */ +public static class Inner { + private FieldAccessDescriptor lhs; + private FieldAccessDescriptor rhs; + + private Inner(FieldAccessDescriptor lhs, FieldAccessDescriptor rhs) { +this.lhs = lhs; +this.rhs = rhs; + } + + public Inner left(String... fieldNames) { +return new Inner(FieldAccessDescriptor.withFieldNames(fieldNames), rhs); + } + + public Inner left(Integer... fieldIds) { +return new Inner(FieldAccessDescriptor.withFieldIds(fieldIds), rhs); + } + + public Inner left(FieldAccessDescriptor fieldAccessDescriptor) { +
[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas
[ https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=226999=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226999 ] ASF GitHub Bot logged work on BEAM-4461: Author: ASF GitHub Bot Created on: 12/Apr/19 23:02 Start Date: 12/Apr/19 23:02 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #8273: [BEAM-4461] A transform to perform binary joins of PCollections with schemas URL: https://github.com/apache/beam/pull/8273#discussion_r275007051 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/transforms/JoinTest.java ## @@ -0,0 +1,390 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.transforms; + +import static junit.framework.TestCase.assertEquals; +import static org.apache.beam.sdk.schemas.transforms.JoinTestUtils.innerJoin; + +import java.util.List; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.transforms.Join.FieldsEqual; +import org.apache.beam.sdk.testing.NeedsRunner; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Lists; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +/** Tests for {@link org.apache.beam.sdk.schemas.transforms.Join}. */ +public class JoinTest { + @Rule public final transient TestPipeline pipeline = TestPipeline.create(); + + private static final Schema CG_SCHEMA_1 = + Schema.builder() + .addStringField("user") + .addInt32Field("count") + .addStringField("country") + .build(); + private static final Schema CG_SCHEMA_2 = + Schema.builder() + .addStringField("user2") + .addInt32Field("count2") + .addStringField("country2") + .build(); + + @Test + @Category(NeedsRunner.class) + public void testInnerJoinSameKeys() { +List pc1Rows = +Lists.newArrayList( +Row.withSchema(CG_SCHEMA_1).addValues("user1", 1, "us").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 2, "us").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 3, "il").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 4, "il").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 5, "fr").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 6, "fr").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 7, "ar").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 8, "ar").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user3", 8, "ar").build()); +List pc2Rows = +Lists.newArrayList( +Row.withSchema(CG_SCHEMA_1).addValues("user1", 9, "us").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 10, "us").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 11, "il").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user1", 12, "il").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 13, "fr").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 14, "fr").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 15, "ar").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user2", 16, "ar").build(), +Row.withSchema(CG_SCHEMA_1).addValues("user4", 8, "ar").build()); + +PCollection pc1 = pipeline.apply("Create1", Create.of(pc1Rows)).setRowSchema(CG_SCHEMA_1); +PCollection pc2 = pipeline.apply("Create2", Create.of(pc2Rows)).setRowSchema(CG_SCHEMA_1); + +Schema expectedSchema = +Schema.builder() +.addRowField(Join.LHS_TAG, CG_SCHEMA_1) +.addRowField(Join.RHS_TAG, CG_SCHEMA_1) +.build(); + +PCollection joined = pc1.apply(Join.innerJoin(pc2).using("user", "country")); + +
[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas
[ https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=227002=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227002 ] ASF GitHub Bot logged work on BEAM-4461: Author: ASF GitHub Bot Created on: 12/Apr/19 23:02 Start Date: 12/Apr/19 23:02 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #8273: [BEAM-4461] A transform to perform binary joins of PCollections with schemas URL: https://github.com/apache/beam/pull/8273#discussion_r274714548 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Join.java ## @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.transforms; + +import javax.annotation.Nullable; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.Row; + +/** + * A transform that performs equijoins across two schema {@link PCollection}s. + * + * This transform allows joins between two input PCollections simply by specifying the fields to + * join on. The resulting {@code PCollection} will have two fields named "lhs" and "rhs" + * respectively, each with the schema of the corresponding input PCollection. + * + * For example, the following demonstrates joining two PCollections using a natural join on the + * "user" and "country" fields, where both the left-hand and the right-hand PCollections have fields + * with these names. + * + * + * {@code PCollection joined = pCollection1.apply(Join.innerJoin(pCollection2).using("user", "country")); + * } + * + * If the right-hand PCollection contains fields with different names to join against, you can + * specify them as follows: + * + * {@code PCollection joined = pCollection1.apply(Join.innerJoin(pCollection2) + * .on(FieldsEqual.left("user", "country").right("otherUser", "otherCountry"))); + * } + * + * Full outer joins, left outer joins, and right outer joins are also supported. + */ +public class Join { Review comment: Shouldn't we mark this class and CoGroup with `@Experimental(Kind.SCHEMAS)`? I think we may be able to make some improvement on the API design later. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 227002) > Create a library of useful transforms that use schemas > -- > > Key: BEAM-4461 > URL: https://issues.apache.org/jira/browse/BEAM-4461 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Labels: triaged > Time Spent: 25.5h > Remaining Estimate: 0h > > e.g. JoinBy(fields). Project, Filter, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas
[ https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=227003=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227003 ] ASF GitHub Bot logged work on BEAM-4461: Author: ASF GitHub Bot Created on: 12/Apr/19 23:02 Start Date: 12/Apr/19 23:02 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #8273: [BEAM-4461] A transform to perform binary joins of PCollections with schemas URL: https://github.com/apache/beam/pull/8273#discussion_r275078312 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Join.java ## @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.transforms; + +import javax.annotation.Nullable; +import org.apache.beam.sdk.schemas.FieldAccessDescriptor; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.Row; + +/** + * A transform that performs equijoins across two schema {@link PCollection}s. + * + * This transform allows joins between two input PCollections simply by specifying the fields to + * join on. The resulting {@code PCollection} will have two fields named "lhs" and "rhs" + * respectively, each with the schema of the corresponding input PCollection. + * + * For example, the following demonstrates joining two PCollections using a natural join on the + * "user" and "country" fields, where both the left-hand and the right-hand PCollections have fields + * with these names. + * + * + * {@code PCollection joined = pCollection1.apply(Join.innerJoin(pCollection2).using("user", "country")); + * } + * + * If the right-hand PCollection contains fields with different names to join against, you can + * specify them as follows: + * + * {@code PCollection joined = pCollection1.apply(Join.innerJoin(pCollection2) + * .on(FieldsEqual.left("user", "country").right("otherUser", "otherCountry"))); + * } + * + * Full outer joins, left outer joins, and right outer joins are also supported. + */ +public class Join { + public static final String LHS_TAG = "lhs"; + public static final String RHS_TAG = "rhs"; + + /** Predicate object to specify fields to compare when doing an equi-join. */ + public static class FieldsEqual { +public static Inner left(String... fieldNames) { + return new Inner( + FieldAccessDescriptor.withFieldNames(fieldNames), FieldAccessDescriptor.create()); +} + +public static Inner left(Integer... fieldIds) { + return new Inner( + FieldAccessDescriptor.withFieldIds(fieldIds), FieldAccessDescriptor.create()); +} + +public static Inner left(FieldAccessDescriptor fieldAccessDescriptor) { + return new Inner(fieldAccessDescriptor, FieldAccessDescriptor.create()); +} + +public Inner right(String... fieldNames) { + return new Inner( + FieldAccessDescriptor.create(), FieldAccessDescriptor.withFieldNames(fieldNames)); +} + +public Inner right(Integer... fieldIds) { + return new Inner( + FieldAccessDescriptor.create(), FieldAccessDescriptor.withFieldIds(fieldIds)); +} + +public Inner right(FieldAccessDescriptor fieldAccessDescriptor) { + return new Inner(FieldAccessDescriptor.create(), fieldAccessDescriptor); +} + +/** Implementation class for FieldsEqual. */ +public static class Inner { + private FieldAccessDescriptor lhs; + private FieldAccessDescriptor rhs; + + private Inner(FieldAccessDescriptor lhs, FieldAccessDescriptor rhs) { +this.lhs = lhs; +this.rhs = rhs; + } + + public Inner left(String... fieldNames) { +return new Inner(FieldAccessDescriptor.withFieldNames(fieldNames), rhs); + } + + public Inner left(Integer... fieldIds) { +return new Inner(FieldAccessDescriptor.withFieldIds(fieldIds), rhs); + } + + public Inner left(FieldAccessDescriptor fieldAccessDescriptor) { +
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=226995=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226995 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 12/Apr/19 23:00 Start Date: 12/Apr/19 23:00 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-482747782 Run Flink ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226995) Time Spent: 2h (was: 1h 50m) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=226992=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226992 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 12/Apr/19 23:00 Start Date: 12/Apr/19 23:00 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-482747618 Run Java Flink PortableValidatesRunner Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226992) Time Spent: 1.5h (was: 1h 20m) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=226993=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226993 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 12/Apr/19 23:00 Start Date: 12/Apr/19 23:00 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-482747669 Run Apex ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226993) Time Spent: 1h 40m (was: 1.5h) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=226994=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226994 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 12/Apr/19 23:00 Start Date: 12/Apr/19 23:00 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-482747741 Run Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226994) Time Spent: 1h 50m (was: 1h 40m) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6669) revert service_default_cmek_config experiment flag
[ https://issues.apache.org/jira/browse/BEAM-6669?focusedWorklogId=226982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226982 ] ASF GitHub Bot logged work on BEAM-6669: Author: ASF GitHub Bot Created on: 12/Apr/19 22:46 Start Date: 12/Apr/19 22:46 Worklog Time Spent: 10m Work Description: udim commented on issue #8296: [BEAM-6669] Set Dataflow KMS key name URL: https://github.com/apache/beam/pull/8296#issuecomment-482745450 run java postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226982) Time Spent: 20m (was: 10m) > revert service_default_cmek_config experiment flag > --- > > Key: BEAM-6669 > URL: https://issues.apache.org/jira/browse/BEAM-6669 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Do this when --dataflowKmsKey is supported on Dataflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6669) revert service_default_cmek_config experiment flag
[ https://issues.apache.org/jira/browse/BEAM-6669?focusedWorklogId=226981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226981 ] ASF GitHub Bot logged work on BEAM-6669: Author: ASF GitHub Bot Created on: 12/Apr/19 22:45 Start Date: 12/Apr/19 22:45 Worklog Time Spent: 10m Work Description: udim commented on pull request #8296: [BEAM-6669] Set Dataflow KMS key name URL: https://github.com/apache/beam/pull/8296 Use the official Environment.service_kms_key_name proto field to pass --dataflowKmsKey to the runner. Stop setting `service_default_cmek_config` experimental flag. It doesn't work anyway. The `service_kms_key_name` experimental flag replaced it a while back. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/) | --- | --- | --- Pre-Commit Tests Status (on master branch)
[jira] [Work logged] (BEAM-6138) Add User Metric Support to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6138?focusedWorklogId=226979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226979 ] ASF GitHub Bot logged work on BEAM-6138: Author: ASF GitHub Bot Created on: 12/Apr/19 22:44 Start Date: 12/Apr/19 22:44 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #8280: [BEAM-6138] Update java SDK to report user distribution tuple metrics over the FN API URL: https://github.com/apache/beam/pull/8280#discussion_r274980022 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricUrns.java ## @@ -31,6 +31,9 @@ public static MetricName parseUrn(String urn) { if (urn.startsWith(MonitoringInfoConstants.Urns.USER_COUNTER_PREFIX)) { urn = urn.substring(MonitoringInfoConstants.Urns.USER_COUNTER_PREFIX.length()); } +if (urn.startsWith(MonitoringInfoConstants.Urns.USER_DISTRIBUTION_PREFIX)) { Review comment: you can invert order and use else if This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226979) Time Spent: 8h 40m (was: 8.5h) > Add User Metric Support to Java SDK > --- > > Key: BEAM-6138 > URL: https://issues.apache.org/jira/browse/BEAM-6138 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Labels: triaged > Time Spent: 8h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6138) Add User Metric Support to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6138?focusedWorklogId=226978=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226978 ] ASF GitHub Bot logged work on BEAM-6138: Author: ASF GitHub Bot Created on: 12/Apr/19 22:44 Start Date: 12/Apr/19 22:44 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #8280: [BEAM-6138] Update java SDK to report user distribution tuple metrics over the FN API URL: https://github.com/apache/beam/pull/8280#discussion_r275084090 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java ## @@ -41,7 +41,7 @@ public static final String TOTAL_MSECS = extractUrn(MonitoringInfoSpecs.Enum.TOTAL_MSECS); public static final String USER_COUNTER_PREFIX = extractUrn(MonitoringInfoSpecs.Enum.USER_COUNTER); -public static final String USER_DISTRIBUTION_COUNTER_PREFIX = +public static final String USER_DISTRIBUTION_PREFIX = Review comment: Better keep original name. Otherwise it is inconsistent with USER_COUNTER_PREFIX. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226978) Time Spent: 8.5h (was: 8h 20m) > Add User Metric Support to Java SDK > --- > > Key: BEAM-6138 > URL: https://issues.apache.org/jira/browse/BEAM-6138 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Labels: triaged > Time Spent: 8.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6138) Add User Metric Support to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6138?focusedWorklogId=226980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226980 ] ASF GitHub Bot logged work on BEAM-6138: Author: ASF GitHub Bot Created on: 12/Apr/19 22:44 Start Date: 12/Apr/19 22:44 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #8280: [BEAM-6138] Update java SDK to report user distribution tuple metrics over the FN API URL: https://github.com/apache/beam/pull/8280#discussion_r275084353 ## File path: runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/MetricsContainerImplTest.java ## @@ -178,6 +178,34 @@ public void testMonitoringInfosArePopulatedForUserCounters() { assertThat(actualMonitoringInfos, containsInAnyOrder(builder1.build(), builder2.build())); } + @Test + public void testMonitoringInfosArePopulatedForUserDistributions() { +MetricsContainerImpl testObject = new MetricsContainerImpl("step1"); +DistributionCell c1 = testObject.getDistribution(MetricName.named("ns", "name1")); +DistributionCell c2 = testObject.getDistribution(MetricName.named("ns", "name2")); +c1.update(5L); +c2.update(4L); + +SimpleMonitoringInfoBuilder builder1 = new SimpleMonitoringInfoBuilder(); +builder1.setUrnForUserDistribution("ns", "name1"); +builder1.setInt64DistributionValue(DistributionData.create(5, 1, 5, 5)); +builder1.setPTransformLabel("step1"); +builder1.build(); Review comment: No need to call build() here and next initialization. You do it at L206. Also it is not required since you're ignoring return value. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226980) Time Spent: 8h 50m (was: 8h 40m) > Add User Metric Support to Java SDK > --- > > Key: BEAM-6138 > URL: https://issues.apache.org/jira/browse/BEAM-6138 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Alex Amato >Priority: Major > Labels: triaged > Time Spent: 8h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=226974=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226974 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 12/Apr/19 22:38 Start Date: 12/Apr/19 22:38 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #8062: [BEAM-4374] Emit SampledByteCount distribution tuple system metric from Python SDK (@Ardagan helped contribute) URL: https://github.com/apache/beam/pull/8062#discussion_r275083279 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -611,35 +691,99 @@ def process(self, element): result_metrics = res.monitoring_metrics() -def assert_contains_metric(src, urn, pcollection, value): - for item in src: -if item.urn == urn: - if item.labels['PCOLLECTION'] == pcollection: -self.assertEqual(item.metric.counter_data.int64_value, value, - str(("Metric has incorrect value", value, item))) -return - self.fail(str(("Metric not found", urn, pcollection, src))) - counters = result_metrics.monitoring_infos() +# All element count and byte count metrics must have a PCOLLECTION_LABEL. self.assertFalse([x for x in counters if - x.urn == monitoring_infos.ELEMENT_COUNT_URN + x.urn in [monitoring_infos.ELEMENT_COUNT_URN, +monitoring_infos.SAMPLED_BYTE_SIZE_URN] and monitoring_infos.PCOLLECTION_LABEL not in x.labels]) +try: + labels = {'PCOLLECTION' : 'Impulse'} + self.assert_has_counter( + counters, monitoring_infos.ELEMENT_COUNT_URN, labels, 1) -assert_contains_metric(counters, monitoring_infos.ELEMENT_COUNT_URN, - 'Impulse', 1) -assert_contains_metric(counters, monitoring_infos.ELEMENT_COUNT_URN, - 'ref_PCollection_PCollection_1', 2) - -# Skipping other pcollections due to non-deterministic naming for multiple -# outputs. - -assert_contains_metric(counters, monitoring_infos.ELEMENT_COUNT_URN, - 'ref_PCollection_PCollection_5', 8) -assert_contains_metric(counters, monitoring_infos.ELEMENT_COUNT_URN, - 'ref_PCollection_PCollection_6', 8) -assert_contains_metric(counters, monitoring_infos.ELEMENT_COUNT_URN, - 'ref_PCollection_PCollection_7', 2) + # Create/Read, "out" output. + labels = {'PCOLLECTION' : 'ref_PCollection_PCollection_1'} Review comment: use constant for monitoring_infos.PCOLLECTION_LABEL This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226974) Time Spent: 14h 50m (was: 14h 40m) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 14h 50m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=226957=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226957 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 21:58 Start Date: 12/Apr/19 21:58 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#issuecomment-482736066 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226957) Time Spent: 2h 10m (was: 2h) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226953=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226953 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 12/Apr/19 21:45 Start Date: 12/Apr/19 21:45 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225#issuecomment-482732938 Thanks for detailed review, PTAL, @aaltay . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226953) Time Spent: 11h 20m (was: 11h 10m) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 11h 20m > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226952=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226952 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 12/Apr/19 21:44 Start Date: 12/Apr/19 21:44 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225#discussion_r275073307 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -738,13 +801,13 @@ def _add_argparse_args(cls, parser): '""} }. All fields in the json are optional except ' 'command.')) parser.add_argument( -'--sdk-worker-parallelism', default=None, +'--sdk_worker_parallelism', default=None, Review comment: This may help clarify argparse behavior: ``` In [1]: import argparse In [2]: parser=argparse.ArgumentParser() In [3]: parser.add_argument('--flag-with-dash') Out[3]: _StoreAction(option_strings=['--flag-with-dash'], dest='flag_with_dash', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None) In [5]: parser.parse_args(['--flag-with-dash=abc']) Out[5]: Namespace(flag_with_dash='abc') In [6]: parser.parse_args(['--flag_with_dash=I am using underscores instead of dashes in flag name!']) usage: ipython [-h] [--flag-with-dash FLAG_WITH_DASH] ipython: error: unrecognized arguments: --flag_with_dash=I am using underscores instead of dashes in flag name! An exception has occurred, use %tb to see the full traceback. In [7]: parser.add_argument('--flag_with_dash') Out[7]: _StoreAction(option_strings=['--flag_with_dash'], dest='flag_with_dash', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None) In [8]: parser.parse_args(['--flag_with_dash=Now I can use both dashes and underscores']) Out[8]: Namespace(flag_with_dash='Now I can use both dashes and underscores') In [9]: parser.parse_args(['--flag-with-dash=Same dest is used for flags with dashes and underscores']) Out[9]: Namespace(flag_with_dash='Same dest is used for flags with dashes and underscores') ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226952) Time Spent: 11h 10m (was: 11h) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 11h 10m > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226951 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 12/Apr/19 21:44 Start Date: 12/Apr/19 21:44 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225#discussion_r275073307 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -738,13 +801,13 @@ def _add_argparse_args(cls, parser): '""} }. All fields in the json are optional except ' 'command.')) parser.add_argument( -'--sdk-worker-parallelism', default=None, +'--sdk_worker_parallelism', default=None, Review comment: This may help clarify argparse behavior: ``` In [1]: import argparse In [2]: parser=argparse.ArgumentParser() In [3]: parser.add_argument('--flag-with-dash') Out[3]: _StoreAction(option_strings=['--flag-with-dash'], dest='flag_with_dash', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None) In [5]: parser.parse_args(['--flag-with-dash=abc']) Out[5]: Namespace(flag_with_dash='abc') In [6]: parser.parse_args(['--flag_with_dash=I am using underscores instead of dashes in flag name!']) usage: ipython [-h] [--flag-with-dash FLAG_WITH_DASH] ipython: error: unrecognized arguments: --flag_with_dash=I am using underscores instead of dashes in flag name! An exception has occurred, use %tb to see the full traceback. In [7]: parser.add_argument('--flag_with_dash') Out[7]: _StoreAction(option_strings=['--flag_with_dash'], dest='flag_with_dash', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None) In [8]: parser.parse_args(['--flag_with_dash=Now I can use both dashes and underscores']) Out[8]: Namespace(flag_with_dash='Now I can use both dashes and underscores') In [9]: parser.parse_args(['--flag-with-dash=Same store is used for flags with dashes and underscores']) Out[9]: Namespace(flag_with_dash='Same store is used for flags with dashes and underscores') ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226951) Time Spent: 11h (was: 10h 50m) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 11h > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226947=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226947 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 12/Apr/19 21:32 Start Date: 12/Apr/19 21:32 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225#discussion_r275070821 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -738,13 +801,13 @@ def _add_argparse_args(cls, parser): '""} }. All fields in the json are optional except ' 'command.')) parser.add_argument( -'--sdk-worker-parallelism', default=None, +'--sdk_worker_parallelism', default=None, Review comment: The two options with dashes were added in https://github.com/apache/beam/pull/8082 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226947) Time Spent: 10h 50m (was: 10h 40m) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 10h 50m > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226946 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 12/Apr/19 21:31 Start Date: 12/Apr/19 21:31 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225#discussion_r275070618 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -738,13 +801,13 @@ def _add_argparse_args(cls, parser): '""} }. All fields in the json are optional except ' 'command.')) parser.add_argument( -'--sdk-worker-parallelism', default=None, +'--sdk_worker_parallelism', default=None, Review comment: Yes, this would have been a backwards-incompatible change. Luckily, looks like sdk-worker-parallelism didn't make it into 2.12.0: https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/options/pipeline_options.py#L719. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226946) Time Spent: 10h 40m (was: 10.5h) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 10h 40m > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7066) Finish reimaging the remaining half of Beam Jenkins workers
[ https://issues.apache.org/jira/browse/BEAM-7066?focusedWorklogId=226930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226930 ] ASF GitHub Bot logged work on BEAM-7066: Author: ASF GitHub Bot Created on: 12/Apr/19 21:06 Start Date: 12/Apr/19 21:06 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #8292: [BEAM-7066] Disables Py37 tox suite to reduce test flakiness. URL: https://github.com/apache/beam/pull/8292 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226930) Time Spent: 0.5h (was: 20m) > Finish reimaging the remaining half of Beam Jenkins workers > --- > > Key: BEAM-7066 > URL: https://issues.apache.org/jira/browse/BEAM-7066 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: yifan zou >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Hi [~yifanzou] filing this to track the remaining work that you identified in > https://lists.apache.org/thread.html/4fe2ec304529216d082a860b912b090b958d39474c8040e79ade46d2@%3Cdev.beam.apache.org%3E, > since I'll have some changes waiting on the completion of this work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7051) Support LIMIT OFFSET
[ https://issues.apache.org/jira/browse/BEAM-7051?focusedWorklogId=226922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226922 ] ASF GitHub Bot logged work on BEAM-7051: Author: ASF GitHub Bot Created on: 12/Apr/19 20:48 Start Date: 12/Apr/19 20:48 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #8290: [BEAM-7051] Support LIMIT OFFSET URL: https://github.com/apache/beam/pull/8290#discussion_r275059366 ## File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSortRelTest.java ## @@ -212,6 +217,20 @@ public void testOrderBy_nullsLast() throws Exception { pipeline.run().waitUntilFinish(); } + @Test + public void testOrderBy_with_offset2() throws Exception { +Schema schema = Schema.builder().addField("count_star", Schema.FieldType.INT64).build(); + +String sql = +"INSERT INTO COUNT_TABLE(count_star) " ++ "SELECT COUNT(*) FROM (SELECT * FROM " ++ "(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) LIMIT 3 OFFSET 1)"; + +PCollection rows = compilePipeline(sql, pipeline); + PAssert.that(rows).containsInAnyOrder(TestUtils.RowsBuilder.of(schema).addRows(2L).getRows()); Review comment: Yes. 2L here is COUNT(*)'s result, which is two rows. I should have made the test case more clear by using STRING in SELECT. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226922) Time Spent: 2.5h (was: 2h 20m) > Support LIMIT OFFSET > > > Key: BEAM-7051 > URL: https://issues.apache.org/jira/browse/BEAM-7051 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7011) Replace "urn:beam:sideinput:materialization:multimap:0.1" with standard side input type materialization
[ https://issues.apache.org/jira/browse/BEAM-7011?focusedWorklogId=226923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226923 ] ASF GitHub Bot logged work on BEAM-7011: Author: ASF GitHub Bot Created on: 12/Apr/19 20:48 Start Date: 12/Apr/19 20:48 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8295: [BEAM-7011] Clean-up additional references to removed URN. URL: https://github.com/apache/beam/pull/8295#issuecomment-482717748 CC: @angoenka This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226923) Time Spent: 3.5h (was: 3h 20m) > Replace "urn:beam:sideinput:materialization:multimap:0.1" with standard side > input type materialization > --- > > Key: BEAM-7011 > URL: https://issues.apache.org/jira/browse/BEAM-7011 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-go, sdk-java-core, sdk-py-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Fix For: 2.13.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Use the StandardSideInput defined within beam_runner_api.proto: > https://github.com/apache/beam/blob/206d98b0765ac662730edd28d669b3db24dd851d/model/pipeline/src/main/proto/beam_runner_api.proto#L311 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7011) Replace "urn:beam:sideinput:materialization:multimap:0.1" with standard side input type materialization
[ https://issues.apache.org/jira/browse/BEAM-7011?focusedWorklogId=226921=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226921 ] ASF GitHub Bot logged work on BEAM-7011: Author: ASF GitHub Bot Created on: 12/Apr/19 20:48 Start Date: 12/Apr/19 20:48 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8295: [BEAM-7011] Clean-up additional references to removed URN. URL: https://github.com/apache/beam/pull/8295#issuecomment-482717696 R: @aljoscha @tweise This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226921) Time Spent: 3h 20m (was: 3h 10m) > Replace "urn:beam:sideinput:materialization:multimap:0.1" with standard side > input type materialization > --- > > Key: BEAM-7011 > URL: https://issues.apache.org/jira/browse/BEAM-7011 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-go, sdk-java-core, sdk-py-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Fix For: 2.13.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Use the StandardSideInput defined within beam_runner_api.proto: > https://github.com/apache/beam/blob/206d98b0765ac662730edd28d669b3db24dd851d/model/pipeline/src/main/proto/beam_runner_api.proto#L311 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7011) Replace "urn:beam:sideinput:materialization:multimap:0.1" with standard side input type materialization
[ https://issues.apache.org/jira/browse/BEAM-7011?focusedWorklogId=226920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226920 ] ASF GitHub Bot logged work on BEAM-7011: Author: ASF GitHub Bot Created on: 12/Apr/19 20:48 Start Date: 12/Apr/19 20:48 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8295: [BEAM-7011] Clean-up additional references to removed URN. URL: https://github.com/apache/beam/pull/8295 This cleans up a few places which were missed in #8232 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/) | --- | --- | --- Pre-Commit Tests Status (on master branch) --- |Java | Python | Go | Website --- | --- | --- | --- | --- Non-portable | [![Build
[jira] [Commented] (BEAM-7058) Python SDK metric process_bundle_msecs reported as zero
[ https://issues.apache.org/jira/browse/BEAM-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816628#comment-16816628 ] Thomas Weise commented on BEAM-7058: The test that I pointed to in the email thread should be sufficient to reproduce by adding a sleep time into the user code: [https://github.com/apache/beam/blob/d38645ae8758d834c3e819b715a66dd82c78f6d4/sdks/python/apache_beam/runners/portability/flink_runner_test.py#L181] It would be nice to assert that the metrics are indeed reported then. In the Flink runner bundles in streaming mode are by default 1000 elements or 1s. > Python SDK metric process_bundle_msecs reported as zero > --- > > Key: BEAM-7058 > URL: https://issues.apache.org/jira/browse/BEAM-7058 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Thomas Weise >Assignee: Alex Amato >Priority: Major > Labels: portability-flink > > With the portable Flink runner, the metric is reported as 0, while the count > metric works as expected. > [https://lists.apache.org/thread.html/25eec8104bda6e4c71cc6c5e9864c335833c3ae2afe225d372479f30@%3Cdev.beam.apache.org%3E] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7051) Support LIMIT OFFSET
[ https://issues.apache.org/jira/browse/BEAM-7051?focusedWorklogId=226912=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226912 ] ASF GitHub Bot logged work on BEAM-7051: Author: ASF GitHub Bot Created on: 12/Apr/19 20:34 Start Date: 12/Apr/19 20:34 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #8290: [BEAM-7051] Support LIMIT OFFSET URL: https://github.com/apache/beam/pull/8290#issuecomment-482713554 Thanks for valuable suggestions on my mess code. Comments are addressed now. PTAL. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226912) Time Spent: 2h 20m (was: 2h 10m) > Support LIMIT OFFSET > > > Key: BEAM-7051 > URL: https://issues.apache.org/jira/browse/BEAM-7051 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7058) Python SDK metric process_bundle_msecs reported as zero
[ https://issues.apache.org/jira/browse/BEAM-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816622#comment-16816622 ] Alex Amato commented on BEAM-7058: -- We have an end to end integration test successfully collecting these metrics in Dataflow python and several unit tests, These tests do force some sleep times though. So the the python SDK is emitting the metrics in that case. One theory is that small bundles may not trigger the state sampler code properly. Or the particular test is too small, and executes too fast to exercise the sampling code at all (and it really isn't running much). We should test this on a test with a high element count. If that's the case, the state sampler should be setup to trigger intervals periodically, and not reset the interval on new bundles. It could be related to the behaviour of this specific test. We would need someone to debug this test, is there a repro? > Python SDK metric process_bundle_msecs reported as zero > --- > > Key: BEAM-7058 > URL: https://issues.apache.org/jira/browse/BEAM-7058 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Thomas Weise >Assignee: Alex Amato >Priority: Major > Labels: portability-flink > > With the portable Flink runner, the metric is reported as 0, while the count > metric works as expected. > [https://lists.apache.org/thread.html/25eec8104bda6e4c71cc6c5e9864c335833c3ae2afe225d372479f30@%3Cdev.beam.apache.org%3E] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=226911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226911 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 12/Apr/19 20:29 Start Date: 12/Apr/19 20:29 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-482712066 Run Java Flink PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226911) Time Spent: 1h 20m (was: 1h 10m) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=226910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226910 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 12/Apr/19 20:28 Start Date: 12/Apr/19 20:28 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-482711924 Run Go PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226910) Time Spent: 1h 10m (was: 1h) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=226908=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226908 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 12/Apr/19 20:28 Start Date: 12/Apr/19 20:28 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-482711837 Run Java PortabilityApi PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226908) Time Spent: 1h (was: 50m) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=226907=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226907 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 12/Apr/19 20:27 Start Date: 12/Apr/19 20:27 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-482711609 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226907) Time Spent: 50m (was: 40m) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=226906=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226906 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 12/Apr/19 20:26 Start Date: 12/Apr/19 20:26 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-482711478 Run Go PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226906) Time Spent: 40m (was: 0.5h) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226903=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226903 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 12/Apr/19 20:23 Start Date: 12/Apr/19 20:23 Worklog Time Spent: 10m Work Description: aaltay commented on issue #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225#issuecomment-482710445 Left one last question related to backward compatibility of 'sdk-worker-parallelism' change. Otherwise this change looks good and we do not need to block on open questions. Thank you! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226903) Time Spent: 10.5h (was: 10h 20m) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 10.5h > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226902 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 12/Apr/19 20:22 Start Date: 12/Apr/19 20:22 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225#discussion_r275051771 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -738,13 +801,13 @@ def _add_argparse_args(cls, parser): '""} }. All fields in the json are optional except ' 'command.')) parser.add_argument( -'--sdk-worker-parallelism', default=None, +'--sdk_worker_parallelism', default=None, Review comment: This is a bit confusing. Would a user's pipeline break if they have been using the 'sdk-worker-parallelism' version in command line or by directly passing to pipeline options in kwargs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226902) Time Spent: 10h 20m (was: 10h 10m) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 10h 20m > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226900=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226900 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 12/Apr/19 20:21 Start Date: 12/Apr/19 20:21 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225#discussion_r275051421 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -246,7 +276,40 @@ def display_data(self): return self.get_all_options(True) def view_as(self, cls): +"""Returns a view of current object as provided PipelineOption subclass. + +Example Usage:: + + options = PipelineOptions(['--runner', 'Direct', '--streaming']) + standard_options = options.view_as(StandardOptions) + if standard_options.streaming: +# ... start a streaming job ... + +Note that options objects may have multiple views, and modifications +of values in any view-object will apply to current object and other +view-objects. + +Args: + cls: PipelineOptions class or any of its subclasses. + +Returns: + An instance of cls that is intitialized using options contained in current + object. + +""" view = cls(self._flags) +for option_name in view._visible_option_list(): + # Initialize values of keys defined by a cls. + # + # Note that we do initialization only once per key to make sure that + # values in _all_options dict are not-recreated with each new view. + # This is important to make sure that values of multi-options keys are + # backed by the same list across multiple views, and that any overrides of + # pipeline options already stored in _all_options are preserved. + if option_name not in self._all_options: +self._all_options[option_name] = getattr(view._visible_options, + option_name) +# Note that views will still store _all_options of the source object. Review comment: Great, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226900) Time Spent: 10h 10m (was: 10h) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 10h 10m > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226899 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 12/Apr/19 20:21 Start Date: 12/Apr/19 20:21 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225#discussion_r275051339 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -150,28 +160,48 @@ def __init__(self, flags=None, **kwargs): arguments and then parse the command line specified by flags or by default the one obtained from sys.argv. -The subclasses are not expected to require a redefinition of __init__. +The subclasses of PipelineOptions should not redefine __init__. Review comment: New versions looks good, thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226899) Time Spent: 10h (was: 9h 50m) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 10h > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6919) [beam_Release_NightlySnapshot] Cannot publish artifacts to Apache Maven repo
[ https://issues.apache.org/jira/browse/BEAM-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816616#comment-16816616 ] yifan zou commented on BEAM-6919: - Our beam* nodes got updated Thu Jan 24 13:50:03 2019 +. This change specifically removed the nexus credentials from the filesystem because they are not supposed to be on external build nodes (nodes Infra does not manage.) Working with Infra to have a new role account on the JNLP nodes. > [beam_Release_NightlySnapshot] Cannot publish artifacts to Apache Maven repo > > > Key: BEAM-6919 > URL: https://issues.apache.org/jira/browse/BEAM-6919 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Daniel Oliveira >Assignee: yifan zou >Priority: Blocker > Labels: currently-failing, triaged > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_Release_NightlySnapshot/384/] > * [Gradle Build Scan|https://scans.gradle.com/s/osuzhgqygga2o] > Initial investigation: > Nightly snapshots are failing in beam:publish due to being unable to publish > artifacts to Apache's Maven repo: > {noformat} > 12:30:57 * What went wrong: > 12:30:57 Execution failed for task > ':beam-examples-java:publishMavenJavaPublicationToMavenRepository'. > 12:30:57 > Failed to publish publication 'mavenJava' to repository 'maven' > 12:30:57> Could not write to resource > 'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-examples-java/2.12.0-SNAPSHOT/beam-examples-java-2.12.0-20190326.191838-10.jar'. > 12:30:57 > Could not PUT > 'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-examples-java/2.12.0-SNAPSHOT/beam-examples-java-2.12.0-20190326.191838-10.jar'. > Received status code 401 from server: Unauthorized > {noformat} > This happens for many modules, not just beam-examples-java as in the example > above. > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6942) Pipeline options to experiment propagation is not working in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-6942?focusedWorklogId=226896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226896 ] ASF GitHub Bot logged work on BEAM-6942: Author: ASF GitHub Bot Created on: 12/Apr/19 20:17 Start Date: 12/Apr/19 20:17 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8225: [BEAM-6942] Make modifications to pipeline options to be visible to all views. URL: https://github.com/apache/beam/pull/8225#discussion_r275050282 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -70,9 +70,9 @@ class TemplateUserOptions(PipelineOptions): @classmethod def _add_argparse_args(cls, parser): -parser.add_value_provider_argument('--vp-arg1', default='start') -parser.add_value_provider_argument('--vp-arg2') -parser.add_argument('--non-vp-arg') +parser.add_value_provider_argument('--vp_arg1', default='start') Review comment: A warning would be good in that case. (We can resolve this comment.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226896) Time Spent: 9h 50m (was: 9h 40m) > Pipeline options to experiment propagation is not working in Dataflow runner. > - > > Key: BEAM-6942 > URL: https://issues.apache.org/jira/browse/BEAM-6942 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 9h 50m > Remaining Estimate: 0h > > Relevant code: > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L356-L388] > 3 experiments/options are affected. We need to fix it in 2.12.0 > cc: [~altay] [~apilloud] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5120) Support Complex type in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang reassigned BEAM-5120: -- Assignee: (was: Rui Wang) > Support Complex type in BeamSQL > --- > > Key: BEAM-5120 > URL: https://issues.apache.org/jira/browse/BEAM-5120 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > Labels: triaged > > We want have a smooth experience of complex type in BeamSQL. > > For example, BeamSQL could support nested row in arbitrary levels > (Row>>) in both read and write from/to arbitrary sources. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5505) Disable Row flattening in Apache Calcite
[ https://issues.apache.org/jira/browse/BEAM-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang reassigned BEAM-5505: -- Assignee: (was: Rui Wang) > Disable Row flattening in Apache Calcite > > > Key: BEAM-5505 > URL: https://issues.apache.org/jira/browse/BEAM-5505 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > Labels: triaged > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-7049) BeamUnionRel should work on mutiple input
[ https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang reassigned BEAM-7049: -- Assignee: (was: Rui Wang) > BeamUnionRel should work on mutiple input > -- > > Key: BEAM-7049 > URL: https://issues.apache.org/jira/browse/BEAM-7049 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > > BeamUnionRel assumes inputs are two and rejects more. So `a UNION b UNION c` > will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If > BeamUnionRel can handle multiple shuffles, we will have only one shuffle -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3061) BigtableIO should support emitting a sentinel "done" value when a bundle completes
[ https://issues.apache.org/jira/browse/BEAM-3061?focusedWorklogId=226876=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226876 ] ASF GitHub Bot logged work on BEAM-3061: Author: ASF GitHub Bot Created on: 12/Apr/19 19:36 Start Date: 12/Apr/19 19:36 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #7805: [BEAM-3061] Done notifications for BigtableIO.Write URL: https://github.com/apache/beam/pull/7805#issuecomment-482697247 I think @chamikaramj is probably best for review, not me. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226876) Time Spent: 1h 10m (was: 1h) > BigtableIO should support emitting a sentinel "done" value when a bundle > completes > -- > > Key: BEAM-3061 > URL: https://issues.apache.org/jira/browse/BEAM-3061 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Steve Niemitz >Assignee: Steve Niemitz >Priority: Major > Labels: triaged > Time Spent: 1h 10m > Remaining Estimate: 0h > > There was some discussion of this on the dev@ mailing list [1]. This > approach was taken based on discussion there. > [1] > https://lists.apache.org/thread.html/949b33782f722a9000c9bf9e37042739c6fd0927589b99752b78d7bd@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6957) Spark Portable Runner: Support metrics
[ https://issues.apache.org/jira/browse/BEAM-6957?focusedWorklogId=226875=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226875 ] ASF GitHub Bot logged work on BEAM-6957: Author: ASF GitHub Bot Created on: 12/Apr/19 19:32 Start Date: 12/Apr/19 19:32 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #8294: [BEAM-6957] Spark portable runner: support metrics URL: https://github.com/apache/beam/pull/8294 This was quite easy to achieve by plugging existing classes. Note that this continues to use Spark's deprecated Accumulator, not AccumulatorV2. However, when we do get around to upgrading the [underlying classes](https://github.com/apache/beam/blob/247e93ef7fbe87dbb1cd82a0ce6b2e6aa5254f7c/runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/MetricsAccumulator.java#L48), it shouldn't affect this code much. R: @angoenka @iemejia Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/) | --- | --- | --- Pre-Commit Tests Status (on master branch) --- |Java | Python | Go | Website --- | --- | --- | --- | --- Non-portable | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build
[jira] [Work logged] (BEAM-5806) Allow to change the PubsubClientFactory when using PubsubIO
[ https://issues.apache.org/jira/browse/BEAM-5806?focusedWorklogId=226873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226873 ] ASF GitHub Bot logged work on BEAM-5806: Author: ASF GitHub Bot Created on: 12/Apr/19 19:28 Start Date: 12/Apr/19 19:28 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #6769: WIP: [BEAM-5806] Update PubsubIO to be able to change the PubsubClientFactory URL: https://github.com/apache/beam/pull/6769#issuecomment-482695037 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226873) Time Spent: 2h 10m (was: 2h) Remaining Estimate: 21h 50m (was: 22h) > Allow to change the PubsubClientFactory when using PubsubIO > --- > > Key: BEAM-5806 > URL: https://issues.apache.org/jira/browse/BEAM-5806 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Logan HAUSPIE >Priority: Minor > Original Estimate: 24h > Time Spent: 2h 10m > Remaining Estimate: 21h 50m > > By using the PubsubIO to read from or write to Pub/Sub we are obliged to use > the PubsubJsonClient to interact with the Pub/Sub API. > This PubsubJsonClient encode the message in base 64 and increase the size of > this one by 30% and there is no way to change the PubsubClient used by > PubsubIO. > > What I suggest is to allow developper to change the PubsubClientFactory by > specifying it at the definition-time like the following: > {{^PubsubIO.Read read = PubsubIO.readStrings() > .fromTopic(StaticValueProvider.of(topic))^}} > .withTimestampAttribute("myTimestamp")^}} > .withIdAttribute("myId")^}} > *.withClientFactory(PubsubGrpcClient.FACTORY)*;^}} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6943) DataInputOperation object has no attribute 'needs_finalization' in Jupyter
[ https://issues.apache.org/jira/browse/BEAM-6943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-6943: -- Affects Version/s: (was: 2.12.0) > DataInputOperation object has no attribute 'needs_finalization' in Jupyter > --- > > Key: BEAM-6943 > URL: https://issues.apache.org/jira/browse/BEAM-6943 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Blocker > Fix For: 2.12.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=226868=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226868 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 19:22 Start Date: 12/Apr/19 19:22 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#issuecomment-482693285 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226868) Time Spent: 2h (was: 1h 50m) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 2h > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=226867=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226867 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 19:21 Start Date: 12/Apr/19 19:21 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#discussion_r275031566 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AvroUtils.java ## @@ -73,8 +69,19 @@ public static Object convertAvroFormat(Field beamField, Object value) { default: throw new RuntimeException("Does not support converting unknown type value"); } + } -return ret; + private static ReadableInstant safeToMillis(Object value) { +long subMilliPrecision = ((long) value) % 1000; Review comment: I was also wondering where long is generated. If BQ timestamp converts to long which contains micros, will that lose precision? I did some math: BigQuery doc says TIMESTAMP type range is `0001-01-01 00:00:00 to -12-31 23:59:59.99 UTC` so it should ranges from `-6.127488×10¹⁶ to 2.4976512×10¹⁷` in micros, calculated by `([0001-01-01 | -12-31] - 1970-01-01) * 12 * 30 * 24 *3600 * 1000 * 1000`, which falls into INT64's range. So no precision loss. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226867) Time Spent: 1h 50m (was: 1h 40m) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=226863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226863 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 19:13 Start Date: 12/Apr/19 19:13 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#discussion_r275031566 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AvroUtils.java ## @@ -73,8 +69,19 @@ public static Object convertAvroFormat(Field beamField, Object value) { default: throw new RuntimeException("Does not support converting unknown type value"); } + } -return ret; + private static ReadableInstant safeToMillis(Object value) { +long subMilliPrecision = ((long) value) % 1000; Review comment: I was also wondering where long is generated. If BQ timestamp converts to long which contains micros, will that lose precision? I did some math: BigQuery doc says TIMESTAMP type range is `0001-01-01 00:00:00 to -12-31 23:59:59.99 UTC` so it should ranges from `-6.127488×10¹⁶ to 2.4976512×10¹⁷` in micros, calculated by `([0001-01-01 | -12-31] - 1970) * 12 * 30 * 24 *3600 * 1000 * 1000`, which falls into INT64's range. So no precision loss. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226863) Time Spent: 1h 40m (was: 1.5h) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=226862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226862 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 19:13 Start Date: 12/Apr/19 19:13 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#discussion_r275031566 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AvroUtils.java ## @@ -73,8 +69,19 @@ public static Object convertAvroFormat(Field beamField, Object value) { default: throw new RuntimeException("Does not support converting unknown type value"); } + } -return ret; + private static ReadableInstant safeToMillis(Object value) { +long subMilliPrecision = ((long) value) % 1000; Review comment: I was also wondering where long is generated. If BQ timestamp converts to long which contains micros, will that lose precision? I did some math: BigQuery doc says TIMESTAMP type range is `0001-01-01 00:00:00 to -12-31 23:59:59.99 UTC` so it should ranges from `-6.127488×10¹⁶ to 2.4976512×10¹⁷` in micros, calculated by `((0001-01-01 / -12-31) - 1970) * 12 * 30 * 24 *3600 * 1000 * 1000`, which falls into INT64's range. So no precision loss. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226862) Time Spent: 1.5h (was: 1h 20m) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=226861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226861 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 19:12 Start Date: 12/Apr/19 19:12 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#discussion_r275031566 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AvroUtils.java ## @@ -73,8 +69,19 @@ public static Object convertAvroFormat(Field beamField, Object value) { default: throw new RuntimeException("Does not support converting unknown type value"); } + } -return ret; + private static ReadableInstant safeToMillis(Object value) { +long subMilliPrecision = ((long) value) % 1000; Review comment: I was also wondering where long is generated. If BQ timestamp converts to long which contains micros, will that lose precision? I did some math: BigQuery doc says TIMESTAMP type range is `0001-01-01 00:00:00 to -12-31 23:59:59.99 UTC` so it should ranges from `-6.127488×10¹⁶ to 2.4976512×10¹⁷` in micros, calculated by `((0001-01-01 / -12-31) -/+ 1970) * 12 * 30 * 24 *3600 * 1000 * 1000`, which falls into INT64's range. So no precision loss. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226861) Time Spent: 1h 20m (was: 1h 10m) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7039) Spark portable runner: run validatesRunner tests
[ https://issues.apache.org/jira/browse/BEAM-7039?focusedWorklogId=226858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226858 ] ASF GitHub Bot logged work on BEAM-7039: Author: ASF GitHub Bot Created on: 12/Apr/19 19:06 Start Date: 12/Apr/19 19:06 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8285: [BEAM-7039] set up validatesPortableRunner tests for Spark URL: https://github.com/apache/beam/pull/8285#issuecomment-482689060 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226858) Time Spent: 1h 40m (was: 1.5h) > Spark portable runner: run validatesRunner tests > > > Key: BEAM-7039 > URL: https://issues.apache.org/jira/browse/BEAM-7039 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=226857=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226857 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 19:04 Start Date: 12/Apr/19 19:04 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#discussion_r275029235 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AvroUtils.java ## @@ -73,8 +69,19 @@ public static Object convertAvroFormat(Field beamField, Object value) { default: throw new RuntimeException("Does not support converting unknown type value"); } + } -return ret; + private static ReadableInstant safeToMillis(Object value) { +long subMilliPrecision = ((long) value) % 1000; Review comment: Yes. I was asking because input is an object. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226857) Time Spent: 1h 10m (was: 1h) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=226855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226855 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 19:00 Start Date: 12/Apr/19 19:00 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#discussion_r275027986 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AvroUtils.java ## @@ -73,8 +69,19 @@ public static Object convertAvroFormat(Field beamField, Object value) { default: throw new RuntimeException("Does not support converting unknown type value"); } + } -return ret; + private static ReadableInstant safeToMillis(Object value) { +long subMilliPrecision = ((long) value) % 1000; Review comment: It is because this `AvroUtils` is part of the BigQuery connector, not general. And we know BigQuery is always micros. Is that your question? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226855) Time Spent: 1h (was: 50m) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 1h > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6493) examples in Kotlin
[ https://issues.apache.org/jira/browse/BEAM-6493?focusedWorklogId=226851=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226851 ] ASF GitHub Bot logged work on BEAM-6493: Author: ASF GitHub Bot Created on: 12/Apr/19 18:50 Start Date: 12/Apr/19 18:50 Worklog Time Spent: 10m Work Description: harshithdwivedi commented on issue #8291: [BEAM-6493] Convert the WordCount samples to Kotlin URL: https://github.com/apache/beam/pull/8291#issuecomment-482684189 Certainly, that would be great! I'll ping you on slack to discuss more on this, LMK if that sounds good? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226851) Time Spent: 10h 10m (was: 10h) Remaining Estimate: 494h 50m (was: 495h) > examples in Kotlin > -- > > Key: BEAM-6493 > URL: https://issues.apache.org/jira/browse/BEAM-6493 > Project: Beam > Issue Type: Task > Components: examples-java >Affects Versions: Not applicable >Reporter: Harshit Dwivedi >Assignee: Harshit Dwivedi >Priority: Minor > Labels: documentation, triaged > Fix For: Not applicable > > Original Estimate: 504h > Time Spent: 10h 10m > Remaining Estimate: 494h 50m > > I have been using Apache Beam for few of my projects in production since the > past 6 months and apart from Java, [Kotlin|https://kotlinlang.org/] also > seems to work as well with no issues whatsoever. > But currently, the Github Repository of Apache Beam contains examples only in > Java which might be an issue for other developers who want to use Apache Beam > SDK with kotlin as there are no sample resources available. > That said, I would love to go ahead and add kotlin examples alongside the > current java examples in the [Beam > repository|https://github.com/apache/beam/tree/master/examples/java]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6493) examples in Kotlin
[ https://issues.apache.org/jira/browse/BEAM-6493?focusedWorklogId=226848=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226848 ] ASF GitHub Bot logged work on BEAM-6493: Author: ASF GitHub Bot Created on: 12/Apr/19 18:46 Start Date: 12/Apr/19 18:46 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8291: [BEAM-6493] Convert the WordCount samples to Kotlin URL: https://github.com/apache/beam/pull/8291 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226848) Time Spent: 9h 50m (was: 9h 40m) Remaining Estimate: 495h 10m (was: 495h 20m) > examples in Kotlin > -- > > Key: BEAM-6493 > URL: https://issues.apache.org/jira/browse/BEAM-6493 > Project: Beam > Issue Type: Task > Components: examples-java >Affects Versions: Not applicable >Reporter: Harshit Dwivedi >Assignee: Harshit Dwivedi >Priority: Minor > Labels: documentation, triaged > Fix For: Not applicable > > Original Estimate: 504h > Time Spent: 9h 50m > Remaining Estimate: 495h 10m > > I have been using Apache Beam for few of my projects in production since the > past 6 months and apart from Java, [Kotlin|https://kotlinlang.org/] also > seems to work as well with no issues whatsoever. > But currently, the Github Repository of Apache Beam contains examples only in > Java which might be an issue for other developers who want to use Apache Beam > SDK with kotlin as there are no sample resources available. > That said, I would love to go ahead and add kotlin examples alongside the > current java examples in the [Beam > repository|https://github.com/apache/beam/tree/master/examples/java]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6493) examples in Kotlin
[ https://issues.apache.org/jira/browse/BEAM-6493?focusedWorklogId=226849=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226849 ] ASF GitHub Bot logged work on BEAM-6493: Author: ASF GitHub Bot Created on: 12/Apr/19 18:46 Start Date: 12/Apr/19 18:46 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8291: [BEAM-6493] Convert the WordCount samples to Kotlin URL: https://github.com/apache/beam/pull/8291#issuecomment-482682889 Done. Thanks so much for your contribution and your patience. if you feel up to it, we can add a little blog post to let users check it out : ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226849) Time Spent: 10h (was: 9h 50m) Remaining Estimate: 495h (was: 495h 10m) > examples in Kotlin > -- > > Key: BEAM-6493 > URL: https://issues.apache.org/jira/browse/BEAM-6493 > Project: Beam > Issue Type: Task > Components: examples-java >Affects Versions: Not applicable >Reporter: Harshit Dwivedi >Assignee: Harshit Dwivedi >Priority: Minor > Labels: documentation, triaged > Fix For: Not applicable > > Original Estimate: 504h > Time Spent: 10h > Remaining Estimate: 495h > > I have been using Apache Beam for few of my projects in production since the > past 6 months and apart from Java, [Kotlin|https://kotlinlang.org/] also > seems to work as well with no issues whatsoever. > But currently, the Github Repository of Apache Beam contains examples only in > Java which might be an issue for other developers who want to use Apache Beam > SDK with kotlin as there are no sample resources available. > That said, I would love to go ahead and add kotlin examples alongside the > current java examples in the [Beam > repository|https://github.com/apache/beam/tree/master/examples/java]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=226842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226842 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 12/Apr/19 18:38 Start Date: 12/Apr/19 18:38 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194#issuecomment-482680290 Run JavaPortabilityApi PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226842) Time Spent: 0.5h (was: 20m) > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-7058) Python SDK metric process_bundle_msecs reported as zero
[ https://issues.apache.org/jira/browse/BEAM-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada reassigned BEAM-7058: --- Assignee: Pablo Estrada > Python SDK metric process_bundle_msecs reported as zero > --- > > Key: BEAM-7058 > URL: https://issues.apache.org/jira/browse/BEAM-7058 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Thomas Weise >Assignee: Pablo Estrada >Priority: Major > Labels: portability-flink > > With the portable Flink runner, the metric is reported as 0, while the count > metric works as expected. > [https://lists.apache.org/thread.html/25eec8104bda6e4c71cc6c5e9864c335833c3ae2afe225d372479f30@%3Cdev.beam.apache.org%3E] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-7058) Python SDK metric process_bundle_msecs reported as zero
[ https://issues.apache.org/jira/browse/BEAM-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada reassigned BEAM-7058: --- Assignee: Alex Amato (was: Pablo Estrada) > Python SDK metric process_bundle_msecs reported as zero > --- > > Key: BEAM-7058 > URL: https://issues.apache.org/jira/browse/BEAM-7058 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-py-harness >Reporter: Thomas Weise >Assignee: Alex Amato >Priority: Major > Labels: portability-flink > > With the portable Flink runner, the metric is reported as 0, while the count > metric works as expected. > [https://lists.apache.org/thread.html/25eec8104bda6e4c71cc6c5e9864c335833c3ae2afe225d372479f30@%3Cdev.beam.apache.org%3E] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7039) Spark portable runner: run validatesRunner tests
[ https://issues.apache.org/jira/browse/BEAM-7039?focusedWorklogId=226840=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226840 ] ASF GitHub Bot logged work on BEAM-7039: Author: ASF GitHub Bot Created on: 12/Apr/19 18:35 Start Date: 12/Apr/19 18:35 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8285: [BEAM-7039] set up validatesPortableRunner tests for Spark URL: https://github.com/apache/beam/pull/8285#issuecomment-482679460 Run Website PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226840) Time Spent: 1.5h (was: 1h 20m) > Spark portable runner: run validatesRunner tests > > > Key: BEAM-7039 > URL: https://issues.apache.org/jira/browse/BEAM-7039 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort
[ https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=226839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226839 ] ASF GitHub Bot logged work on BEAM-5519: Author: ASF GitHub Bot Created on: 12/Apr/19 18:33 Start Date: 12/Apr/19 18:33 Worklog Time Spent: 10m Work Description: kyle-winkelman commented on issue #6511: [BEAM-5519] Remove call to groupByKey in Spark Streaming. URL: https://github.com/apache/beam/pull/6511#issuecomment-482667518 I have spent a little bit of time trying to understand the Nexmark performance tests. I believe I have narrowed down the issue a little bit. The Nexmark BoundedEventSource splits based on the numEventsGenerator (which is 100). Before this PR, the first time there is a GroupByKey in these queries the RDD would be partitioned to the default parallelism (not 100). With this PR, the first time there is a GroupByKey in these queries the RDD would stay partitioned at 100. I believe this is what is affecting the performance. I would say we could accept this degradation in performance because in real use when we ask a Source to split itself we expect it will do so in the way we asked. Unless there are a large number of sources that tend not to split well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226839) Time Spent: 4h 40m (was: 4.5h) > Spark Streaming Duplicated Encoding/Decoding Effort > --- > > Key: BEAM-5519 > URL: https://issues.apache.org/jira/browse/BEAM-5519 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Labels: spark, spark-streaming, triaged > Fix For: 2.13.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > When using the SparkRunner in streaming mode. There is a call to groupByKey > followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this > used to cause 2 shuffles but it still causes 2 encode/decode cycles. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=226838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226838 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 18:31 Start Date: 12/Apr/19 18:31 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#issuecomment-482677983 Run SQL PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226838) Time Spent: 50m (was: 40m) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 50m > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=226837=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226837 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 18:29 Start Date: 12/Apr/19 18:29 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#discussion_r275017781 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AvroUtils.java ## @@ -73,8 +69,19 @@ public static Object convertAvroFormat(Field beamField, Object value) { default: throw new RuntimeException("Does not support converting unknown type value"); } + } -return ret; + private static ReadableInstant safeToMillis(Object value) { +long subMilliPrecision = ((long) value) % 1000; Review comment: How do we know that that the value returned is not millis but micros? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226837) Time Spent: 40m (was: 0.5h) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 40m > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7064) Conversion of timestamp from BigQuery row to Beam row loses precision
[ https://issues.apache.org/jira/browse/BEAM-7064?focusedWorklogId=226836=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226836 ] ASF GitHub Bot logged work on BEAM-7064: Author: ASF GitHub Bot Created on: 12/Apr/19 18:28 Start Date: 12/Apr/19 18:28 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #8289: [BEAM-7064] Reject BigQuery data with sub-millisecond precision, instead of losing data URL: https://github.com/apache/beam/pull/8289#discussion_r275017781 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AvroUtils.java ## @@ -73,8 +69,19 @@ public static Object convertAvroFormat(Field beamField, Object value) { default: throw new RuntimeException("Does not support converting unknown type value"); } + } -return ret; + private static ReadableInstant safeToMillis(Object value) { +long subMilliPrecision = ((long) value) % 1000; Review comment: How do we know that that the value returned is not millis but micro? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226836) Time Spent: 0.5h (was: 20m) > Conversion of timestamp from BigQuery row to Beam row loses precision > - > > Key: BEAM-7064 > URL: https://issues.apache.org/jira/browse/BEAM-7064 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently, the utilities to convert from BigQuery row to Beam row simply > truncate timestamps at millisecond precision. This is unacceptable. Instead, > an error should be raised indicating that it is not supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6865) Refactor common portable runner infrastructure for better reuse
[ https://issues.apache.org/jira/browse/BEAM-6865?focusedWorklogId=226834=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226834 ] ASF GitHub Bot logged work on BEAM-6865: Author: ASF GitHub Bot Created on: 12/Apr/19 18:23 Start Date: 12/Apr/19 18:23 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #8176: [BEAM-6865] Move MetricsApi updates from flink.metrics to core.metrics URL: https://github.com/apache/beam/pull/8176 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226834) Time Spent: 3h 10m (was: 3h) > Refactor common portable runner infrastructure for better reuse > --- > > Key: BEAM-6865 > URL: https://issues.apache.org/jira/browse/BEAM-6865 > Project: Beam > Issue Type: Improvement > Components: runner-flink, runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > The Flink runner is currently Beam's most mature portable OSS runner. Much of > the Flink portable runner's implementation details are not unique to Flink, > and yet are confined to the Flink runner code. In order to ease development > on other portable runners such as the Spark runner, this reusable code should > be moved into some common location. > I've set this up to track my progress on these ongoing improvements. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7039) Spark portable runner: run validatesRunner tests
[ https://issues.apache.org/jira/browse/BEAM-7039?focusedWorklogId=226831=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226831 ] ASF GitHub Bot logged work on BEAM-7039: Author: ASF GitHub Bot Created on: 12/Apr/19 18:04 Start Date: 12/Apr/19 18:04 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #8285: [BEAM-7039] set up validatesPortableRunner tests for Spark URL: https://github.com/apache/beam/pull/8285#discussion_r275010043 ## File path: runners/spark/job-server/build.gradle ## @@ -0,0 +1,122 @@ +import org.apache.beam.gradle.BeamModulePlugin + +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Spark Runner JobServer build file + */ + +apply plugin: 'org.apache.beam.module' +apply plugin: 'application' +// we need to set mainClassName before applying shadow plugin +mainClassName = "org.apache.beam.runners.spark.SparkJobServerDriver" + +applyJavaNature( + validateShadowJar: false, + exportJavadoc: false, + shadowClosure: { +append "reference.conf" + }, +) + +def sparkRunnerProject = ":${project.name.replace("-job-server", "")}" + +description = project(sparkRunnerProject).description + " :: Job Server" + +configurations { + validatesPortableRunner +} + +configurations.all { + exclude group: "org.slf4j", module: "slf4j-jdk14" +} + +dependencies { + compile project(path: sparkRunnerProject, configuration: "shadow") + compile project(path: sparkRunnerProject, configuration: "provided") + validatesPortableRunner project(path: sparkRunnerProject, configuration: "shadowTest") + validatesPortableRunner project(path: sparkRunnerProject, configuration: "provided") + validatesPortableRunner project(path: ":beam-sdks-java-core", configuration: "shadowTest") + validatesPortableRunner project(path: ":beam-runners-core-java", configuration: "shadowTest") + validatesPortableRunner project(path: ":beam-runners-reference-java", configuration: "shadowTest") + compile project(path: ":beam-sdks-java-extensions-google-cloud-platform-core", configuration: "shadow") +// TODO: Enable AWS and HDFS file system. +} + +// NOTE: runShadow must be used in order to run the job server. The standard run +// task will not work because the Spark runner classes only exist in the shadow +// jar. +runShadow { + args = [] + if (project.hasProperty('jobHost')) +args += ["--job-host=${project.property('jobHost')}"] + if (project.hasProperty('artifactsDir')) +args += ["--artifacts-dir=${project.property('artifactsDir')}"] + if (project.hasProperty('cleanArtifactsPerJob')) +args += ["--clean-artifacts-per-job"] Review comment: I made a PR to fix this for Flink as well. #8293 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226831) Time Spent: 1h 20m (was: 1h 10m) > Spark portable runner: run validatesRunner tests > > > Key: BEAM-7039 > URL: https://issues.apache.org/jira/browse/BEAM-7039 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7067) Flink job server: can't disable cleanArtifactsPerJob
[ https://issues.apache.org/jira/browse/BEAM-7067?focusedWorklogId=226830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226830 ] ASF GitHub Bot logged work on BEAM-7067: Author: ASF GitHub Bot Created on: 12/Apr/19 18:03 Start Date: 12/Apr/19 18:03 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #8293: [BEAM-7067] make cleanArtifactsPerJob configurable for Flink job serv… URL: https://github.com/apache/beam/pull/8293 …er task R: @mxm Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/) | --- | --- | --- Pre-Commit Tests Status (on master branch) --- |Java | Python | Go | Website --- | --- | --- | --- | --- Non-portable | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build
[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort
[ https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=226829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226829 ] ASF GitHub Bot logged work on BEAM-5519: Author: ASF GitHub Bot Created on: 12/Apr/19 18:00 Start Date: 12/Apr/19 18:00 Worklog Time Spent: 10m Work Description: kyle-winkelman commented on issue #6511: [BEAM-5519] Remove call to groupByKey in Spark Streaming. URL: https://github.com/apache/beam/pull/6511#issuecomment-482667518 I have spent a little bit of time trying to understand the Nexmark performance tests. I believe I have narrowed down the issue a little bit. The Nexmark BoundedEventSource splits based on the numEventsGenerator (which is 100). Before this PR, the first time there is a GroupByKey in these queries the RDD would be partitioned to the default parallelism (not 100). With this PR, the first time there is a GroupByKey in these queries the RDD would stay partitioned at 100. I believe this is what is affecting the performance. I would say we could accept this degradation in performance because in real use when we ask a Source to split itself in the way we asked. Unless there are a large number of sources that tend not to split well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 226829) Time Spent: 4.5h (was: 4h 20m) > Spark Streaming Duplicated Encoding/Decoding Effort > --- > > Key: BEAM-5519 > URL: https://issues.apache.org/jira/browse/BEAM-5519 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Labels: spark, spark-streaming, triaged > Fix For: 2.13.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > When using the SparkRunner in streaming mode. There is a call to groupByKey > followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this > used to cause 2 shuffles but it still causes 2 encode/decode cycles. -- This message was sent by Atlassian JIRA (v7.6.3#76005)