[jira] [Work logged] (BEAM-6895) Upgrade joda-time to 2.10.1
[ https://issues.apache.org/jira/browse/BEAM-6895?focusedWorklogId=221601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221601 ] ASF GitHub Bot logged work on BEAM-6895: Author: ASF GitHub Bot Created on: 02/Apr/19 03:36 Start Date: 02/Apr/19 03:36 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8125: [BEAM-6895] Update joda-time to 2.10.1 URL: https://github.com/apache/beam/pull/8125#issuecomment-478831524 Looks like it is going to be just fine. The Dataflow tests hit a quota issue but that should be resolved. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221601) Time Spent: 2h 40m (was: 2.5h) > Upgrade joda-time to 2.10.1 > --- > > Key: BEAM-6895 > URL: https://issues.apache.org/jira/browse/BEAM-6895 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Affects Versions: 2.11.0 >Reporter: David Moravek >Assignee: David Moravek >Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > > Upgrade joda-time dependency from 2.4 to 2.10.1. > We have reports that 2.4 version (from 2014), causes incompatibilities with > Spark's bundled version (2.9). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6895) Upgrade joda-time to 2.10.1
[ https://issues.apache.org/jira/browse/BEAM-6895?focusedWorklogId=221600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221600 ] ASF GitHub Bot logged work on BEAM-6895: Author: ASF GitHub Bot Created on: 02/Apr/19 03:35 Start Date: 02/Apr/19 03:35 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8125: [BEAM-6895] Update joda-time to 2.10.1 URL: https://github.com/apache/beam/pull/8125#issuecomment-478831417 run dataflow validatesrunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221600) Time Spent: 2.5h (was: 2h 20m) > Upgrade joda-time to 2.10.1 > --- > > Key: BEAM-6895 > URL: https://issues.apache.org/jira/browse/BEAM-6895 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Affects Versions: 2.11.0 >Reporter: David Moravek >Assignee: David Moravek >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > Upgrade joda-time dependency from 2.4 to 2.10.1. > We have reports that 2.4 version (from 2014), causes incompatibilities with > Spark's bundled version (2.9). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=221559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221559 ] ASF GitHub Bot logged work on BEAM-4046: Author: ASF GitHub Bot Created on: 02/Apr/19 01:21 Start Date: 02/Apr/19 01:21 Worklog Time Spent: 10m Work Description: adude3141 commented on pull request #8194: [DO NOT MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids URL: https://github.com/apache/beam/pull/8194 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
[jira] [Work logged] (BEAM-6948) Remove superseded ant.xml for javadoc creation
[ https://issues.apache.org/jira/browse/BEAM-6948?focusedWorklogId=221553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221553 ] ASF GitHub Bot logged work on BEAM-6948: Author: ASF GitHub Bot Created on: 02/Apr/19 00:36 Start Date: 02/Apr/19 00:36 Worklog Time Spent: 10m Work Description: adude3141 commented on issue #8181: [BEAM-6948] remove superseded ant.xml for javadoc creation URL: https://github.com/apache/beam/pull/8181#issuecomment-478797265 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221553) Time Spent: 40m (was: 0.5h) > Remove superseded ant.xml for javadoc creation > -- > > Key: BEAM-6948 > URL: https://issues.apache.org/jira/browse/BEAM-6948 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Michael Luckey >Assignee: Michael Luckey >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Javadoc previously was created by maven delegating to ant. After migration to > Gradle, the corresponding ant build file is superseded by a build.gradle [1], > [2] > The pom.xml was deleted later after gradle migration was accepted. ant.xml > seems to be forgotten. > [1] https://issues.apache.org/jira/browse/BEAM-4108 > [2] https://github.com/apache/beam/pull/5121 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807313#comment-16807313 ] Michael Luckey commented on BEAM-4046: -- As Luke pointed out, we hit a confirmed gradle issue with duplicate project names here, see https://github.com/gradle/gradle/issues/847 {noformat} container dataflow examples fn-execution java jdbc job-server job-server-container py3 {noformat} Maybe one of the proposed workarounds is viable. > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Priority: Major > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
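The naming tension described in this issue can be sketched in a few lines of Java. This is only an illustration, not Beam's or Gradle's actual code: the class `ProjectNames`, its methods, and the sample project paths are hypothetical, and the `beam-` prefix is an assumption inferred from the `:beam-sdks-java-core` example quoted above. The sketch derives a flat artifact id from a hierarchical project path and lists leaf names shared by more than one project, which is the kind of clash the comment reports for names such as `container` and `job-server`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; all names here are hypothetical.
class ProjectNames {

  // Derive a flat artifact id from a hierarchical Gradle project path,
  // e.g. ":sdks:java:core" -> "beam-sdks-java-core" (prefix is an assumption).
  static String artifactId(String projectPath) {
    String trimmed = projectPath.startsWith(":") ? projectPath.substring(1) : projectPath;
    return "beam-" + trimmed.replace(':', '-');
  }

  // The leaf name is the last segment of the project path.
  static String leafName(String projectPath) {
    return projectPath.substring(projectPath.lastIndexOf(':') + 1);
  }

  // Group paths by leaf name and keep only leaves used by two or more projects.
  static Map<String, List<String>> duplicateLeafNames(List<String> projectPaths) {
    Map<String, List<String>> byLeaf = new TreeMap<>();
    for (String path : projectPaths) {
      byLeaf.computeIfAbsent(leafName(path), k -> new ArrayList<>()).add(path);
    }
    byLeaf.values().removeIf(paths -> paths.size() < 2);
    return byLeaf;
  }

  public static void main(String[] args) {
    // Hypothetical project layout, chosen only to show one clash.
    List<String> paths = Arrays.asList(
        ":sdks:java:core", ":sdks:java:container", ":sdks:python:container");
    for (String path : paths) {
      System.out.println(path + " -> " + artifactId(path));
    }
    System.out.println("Shared leaf names: " + duplicateLeafNames(paths));
  }
}
```

Because the artifact id is computed from the full path, it stays unique even when two projects share a leaf name, which is why decoupling the two identifiers looks attractive here.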
[jira] [Assigned] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Luckey reassigned BEAM-4046: Assignee: Michael Luckey > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Major > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6919) [beam_Release_NightlySnapshot] Cannot publish artifacts to Apache Maven repo
[ https://issues.apache.org/jira/browse/BEAM-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807293#comment-16807293 ] yifan zou commented on BEAM-6919: - Nothing weird on our side. File https://issues.apache.org/jira/browse/INFRA-18148 to the ASF Infra, also sent email to [reposit...@apache.org|mailto:reposit...@apache.org] for further investigation. > [beam_Release_NightlySnapshot] Cannot publish artifacts to Apache Maven repo > > > Key: BEAM-6919 > URL: https://issues.apache.org/jira/browse/BEAM-6919 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Daniel Oliveira >Assignee: yifan zou >Priority: Blocker > Labels: currently-failing, triaged > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_Release_NightlySnapshot/384/] > * [Gradle Build Scan|https://scans.gradle.com/s/osuzhgqygga2o] > Initial investigation: > Nightly snapshots are failing in beam:publish due to being unable to publish > artifacts to Apache's Maven repo: > {noformat} > 12:30:57 * What went wrong: > 12:30:57 Execution failed for task > ':beam-examples-java:publishMavenJavaPublicationToMavenRepository'. > 12:30:57 > Failed to publish publication 'mavenJava' to repository 'maven' > 12:30:57> Could not write to resource > 'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-examples-java/2.12.0-SNAPSHOT/beam-examples-java-2.12.0-20190326.191838-10.jar'. > 12:30:57 > Could not PUT > 'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-examples-java/2.12.0-SNAPSHOT/beam-examples-java-2.12.0-20190326.191838-10.jar'. > Received status code 401 from server: Unauthorized > {noformat} > This happens for many modules, not just beam-examples-java as in the example > above. 
> > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=221544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221544 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 01/Apr/19 23:38 Start Date: 01/Apr/19 23:38 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #8178: [BEAM-5723] Do not use the default shadow closure when building CassandraIO URL: https://github.com/apache/beam/pull/8178#discussion_r271088692
## File path: sdks/java/io/cassandra/build.gradle ##
@@ -18,11 +18,23 @@
 plugins { id 'org.apache.beam.module' }
 // Do not relocate guava to avoid issues with Cassandra's version.
-applyJavaNature(shadowClosure: DEFAULT_SHADOW_CLOSURE << {
-  dependencies {
-    exclude(dependency(project.library.java.guava))
-  }
-})
+applyJavaNature(
+    shadowClosure: {
+      dependencies {
+        // is default, but when omitted the default action is to include all runtime deps
+        include(dependency(project.library.java.guava))
+
+        // hack: now exclude the only thing that was included
+        exclude(dependency(project.library.java.guava))
+
Review comment:
The default is the `runtime` configuration and it has to be overridden. We never did that because we use the dependency filters. Something that I think should work instead is this:
```groovy
shadowJar {
  configurations = [project.configurations.compile]
}
```
I tried the above and it did not work. I think maybe it is the order of the concatenation of the `shadowClosure` with the parts of the configuration that are hardcoded into the main plugin. I am not sure. And I'm not sure if there is a var I can just set `dependencies = []`. Basically we want to turn off the shadow plugin here but I don't want to make such a big change. I manually confirmed the contents of the jar.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221544) Time Spent: 9.5h (was: 9h 20m) > CassandraIO is broken because of use of bad relocation of guava > --- > > Key: BEAM-5723 > URL: https://issues.apache.org/jira/browse/BEAM-5723 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0 >Reporter: Arun sethia >Assignee: Ismaël Mejía >Priority: Major > Fix For: 2.12.0 > > Time Spent: 9.5h > Remaining Estimate: 0h > > While using Apache Beam to run a Dataflow job that reads data from BigQuery and > writes it to Cassandra with the following libraries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client - 1.25.0 > > I get the following error when inserting/saving data to Cassandra. 
> {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
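The `NoSuchMethodError` above names a return type under `org/apache/beam/repackaged/...`, which suggests why relocating Guava breaks this call: the JVM looks a method up by its name plus its full descriptor, return type included. The sketch below is a hedged illustration of that mismatch, not the shadow plugin's implementation. The class `RelocationDemo` and its `relocate` helper are hypothetical, the simple string rewrite stands in for real bytecode relocation, and it is an assumption (consistent with the issue title) that the Cassandra driver declares `saveAsync` with the unrelocated Guava type.

```java
// Illustrative sketch only; RelocationDemo and relocate are hypothetical names.
class RelocationDemo {

  // Package prefix of the original Guava classes, in JVM internal form.
  static final String ORIGINAL_PREFIX = "com/google/common/";

  // Relocated prefix, copied from the error message in the issue.
  static final String RELOCATED_PREFIX =
      "org/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/";

  // Rewrite object types in a method descriptor the way a relocation would.
  // Simplified: a plain string replace on the "L<package>" type markers.
  static String relocate(String descriptor) {
    return descriptor.replace("L" + ORIGINAL_PREFIX, "L" + RELOCATED_PREFIX);
  }

  public static void main(String[] args) {
    // Descriptor assumed to be declared by the unshaded driver for saveAsync.
    String declared =
        "(Ljava/lang/Object;)Lcom/google/common/util/concurrent/ListenableFuture;";
    // Descriptor the relocated caller ends up asking for.
    String requested = relocate(declared);

    System.out.println("declared : " + declared);
    System.out.println("requested: " + requested);
    System.out.println("descriptors match: " + declared.equals(requested));
  }
}
```

Under these assumptions the requested descriptor is exactly the one printed in the stack trace, and it no longer equals the declared one, so no matching method is found at run time.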
[jira] [Closed] (BEAM-6914) Use BigQuerySink as default for 2.12.
[ https://issues.apache.org/jira/browse/BEAM-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada closed BEAM-6914. --- Resolution: Fixed > Use BigQuerySink as default for 2.12. > - > > Key: BEAM-6914 > URL: https://issues.apache.org/jira/browse/BEAM-6914 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6914) Use BigQuerySink as default for 2.12.
[ https://issues.apache.org/jira/browse/BEAM-6914?focusedWorklogId=221541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221541 ] ASF GitHub Bot logged work on BEAM-6914: Author: ASF GitHub Bot Created on: 01/Apr/19 23:28 Start Date: 01/Apr/19 23:28 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8170: [release-2.12.0][BEAM-6914] Reverting behavior of Native BQ sink in Python (#8143) URL: https://github.com/apache/beam/pull/8170 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221541) Time Spent: 6h 10m (was: 6h) > Use BigQuerySink as default for 2.12. > - > > Key: BEAM-6914 > URL: https://issues.apache.org/jira/browse/BEAM-6914 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6914) Use BigQuerySink as default for 2.12.
[ https://issues.apache.org/jira/browse/BEAM-6914?focusedWorklogId=221542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221542 ] ASF GitHub Bot logged work on BEAM-6914: Author: ASF GitHub Bot Created on: 01/Apr/19 23:28 Start Date: 01/Apr/19 23:28 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8170: [release-2.12.0][BEAM-6914] Reverting behavior of Native BQ sink in Python (#8143) URL: https://github.com/apache/beam/pull/8170#issuecomment-478784520 Thanks all : ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221542) Time Spent: 6h 20m (was: 6h 10m) > Use BigQuerySink as default for 2.12. > - > > Key: BEAM-6914 > URL: https://issues.apache.org/jira/browse/BEAM-6914 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6956) --experiments=worker_threads=100 issue
[ https://issues.apache.org/jira/browse/BEAM-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Zhao resolved BEAM-6956. -- Resolution: Fixed Fix Version/s: Not applicable > --experiments=worker_threads=100 issue > -- > > Key: BEAM-6956 > URL: https://issues.apache.org/jira/browse/BEAM-6956 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Jiayi Zhao >Priority: Major > Fix For: Not applicable > > > I noticed that without "--experiments=worker_threads=100" the pipeline gets > stuck. > The weird thing is that when I ran a complex pipeline using the in-thread Flink > method (./gradlew :beam-runners-flink_2.11-job-server:runShadow), > "--experiments=worker_threads=100" did not work, but > "--experiments=worker_threads=1000" worked fine. > Then I tried the same pipeline using a separate local Flink cluster > (./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081); the Flink version is 1.5.6 (other versions > do not work, see BEAM-6915). > Neither "--experiments=worker_threads=1000" nor > "--experiments=worker_threads=1" worked; the pipeline got stuck at a certain stage > (it shows as running in the Flink UI but never finishes). > Is there a real fix for that? Thanks! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6956) --experiments=worker_threads=100 issue
[ https://issues.apache.org/jira/browse/BEAM-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807249#comment-16807249 ] Jiayi Zhao commented on BEAM-6956: -- *Setting the following parameter in my Flink conf/flink-conf.yaml file solved my issue:* *taskmanager.heap.mb: 10240* > --experiments=worker_threads=100 issue > -- > > Key: BEAM-6956 > URL: https://issues.apache.org/jira/browse/BEAM-6956 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Jiayi Zhao >Priority: Major > > I noticed that without "--experiments=worker_threads=100" the pipeline gets > stuck. > The weird thing is that when I ran a complex pipeline using the in-thread Flink > method (./gradlew :beam-runners-flink_2.11-job-server:runShadow), > "--experiments=worker_threads=100" did not work, but > "--experiments=worker_threads=1000" worked fine. > Then I tried the same pipeline using a separate local Flink cluster > (./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081); the Flink version is 1.5.6 (other versions > do not work, see BEAM-6915). > Neither "--experiments=worker_threads=1000" nor > "--experiments=worker_threads=1" worked; the pipeline got stuck at a certain stage > (it shows as running in the Flink UI but never finishes). > Is there a real fix for that? Thanks! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=221514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221514 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 01/Apr/19 22:28 Start Date: 01/Apr/19 22:28 Worklog Time Spent: 10m Work Description: ajamato commented on issue #8062: [BEAM-4374] Emit MeanByteCount distribution tuple system metric from Python SDK URL: https://github.com/apache/beam/pull/8062#issuecomment-478770322 @Ardagan I reset my branch and pushed it back to this PR, I'll continue iterating on this and fixing it up. Thanks for the contributions This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221514) Time Spent: 13h 10m (was: 13h) > Update existing metrics in the FN API to use new Metric Schema > -- > > Key: BEAM-4374 > URL: https://issues.apache.org/jira/browse/BEAM-4374 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Alex Amato >Priority: Major > Time Spent: 13h 10m > Remaining Estimate: 0h > > Update existing metrics to use the new proto and cataloging schema defined in: > [_https://s.apache.org/beam-fn-api-metrics_] > * Check in new protos > * Define catalog file for metrics > * Port existing metrics to use this new format, based on catalog > names+metadata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6876) User state cleanup in portable Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6876?focusedWorklogId=221513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221513 ] ASF GitHub Bot logged work on BEAM-6876: Author: ASF GitHub Bot Created on: 01/Apr/19 22:23 Start Date: 01/Apr/19 22:23 Worklog Time Spent: 10m Work Description: apilloud commented on issue #8173: [release-2.12] [BEAM-6876] Cleanup user state in portable Flink Runner URL: https://github.com/apache/beam/pull/8173#issuecomment-478769017 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221513) Time Spent: 5h 50m (was: 5h 40m) > User state cleanup in portable Flink runner > --- > > Key: BEAM-6876 > URL: https://issues.apache.org/jira/browse/BEAM-6876 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.11.0 >Reporter: Thomas Weise >Assignee: Maximilian Michels >Priority: Major > Labels: portability-flink, triaged > Fix For: 2.12.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > State is currently not being cleaned up by the runner. > [https://lists.apache.org/thread.html/86f0809fbfa3da873051287b9ff249d6dd5a896b45409db1e484cf38@%3Cdev.beam.apache.org%3E] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6914) Use BigQuerySink as default for 2.12.
[ https://issues.apache.org/jira/browse/BEAM-6914?focusedWorklogId=221511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221511 ] ASF GitHub Bot logged work on BEAM-6914: Author: ASF GitHub Bot Created on: 01/Apr/19 22:16 Start Date: 01/Apr/19 22:16 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8170: [release-2.12.0][BEAM-6914] Reverting behavior of Native BQ sink in Python (#8143) URL: https://github.com/apache/beam/pull/8170#issuecomment-478767496 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221511) Time Spent: 6h (was: 5h 50m) > Use BigQuerySink as default for 2.12. > - > > Key: BEAM-6914 > URL: https://issues.apache.org/jira/browse/BEAM-6914 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 6h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner
[ https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221500 ] ASF GitHub Bot logged work on BEAM-6945: Author: ASF GitHub Bot Created on: 01/Apr/19 22:07 Start Date: 01/Apr/19 22:07 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #8175: [BEAM-6945] Add single entry point for metrics.proto constants in java DF worker URL: https://github.com/apache/beam/pull/8175#issuecomment-478765304 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221500) Time Spent: 1h (was: 50m) > Utilize label and urn values from metrics.proto in Java DF Runner > - > > Key: BEAM-6945 > URL: https://issues.apache.org/jira/browse/BEAM-6945 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > We have some of values from metrics.proto hardcoded in java code. > Generalize access to those values and refactor Java DF Runner to utilize new > approach. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner
[ https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221501&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221501 ] ASF GitHub Bot logged work on BEAM-6945: Author: ASF GitHub Bot Created on: 01/Apr/19 22:07 Start Date: 01/Apr/19 22:07 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #8175: [BEAM-6945] Add single entry point for metrics.proto constants in java DF worker URL: https://github.com/apache/beam/pull/8175#issuecomment-478765304 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221501) Time Spent: 1h 10m (was: 1h) > Utilize label and urn values from metrics.proto in Java DF Runner > - > > Key: BEAM-6945 > URL: https://issues.apache.org/jira/browse/BEAM-6945 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > We have some of values from metrics.proto hardcoded in java code. > Generalize access to those values and refactor Java DF Runner to utilize new > approach. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221499 ] ASF GitHub Bot logged work on BEAM-6953: Author: ASF GitHub Bot Created on: 01/Apr/19 22:06 Start Date: 01/Apr/19 22:06 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #8188: [BEAM-6953] Make bq constants args URL: https://github.com/apache/beam/pull/8188 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221499) Time Spent: 1h 20m (was: 1h 10m) > BigQueryIO has constants that should be PipelineOptions > --- > > Key: BEAM-6953 > URL: https://issues.apache.org/jira/browse/BEAM-6953 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Reuven Lax >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface
[ https://issues.apache.org/jira/browse/BEAM-6502?focusedWorklogId=221497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221497 ] ASF GitHub Bot logged work on BEAM-6502: Author: ASF GitHub Bot Created on: 01/Apr/19 21:56 Start Date: 01/Apr/19 21:56 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8146: [BEAM-6502] Re-Remove runner time execution information from public API surface (now including Watch) URL: https://github.com/apache/beam/pull/8146#issuecomment-478762573 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221497) Time Spent: 2h 20m (was: 2h 10m) > SplittableDoFn: Re-Remove runner time execution information from public API > surface > --- > > Key: BEAM-6502 > URL: https://issues.apache.org/jira/browse/BEAM-6502 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Labels: triaged > Time Spent: 2h 20m > Remaining Estimate: 0h > > Move the setting of "claim observers" within RestrictionTracker to another > location to clean up the RestrictionTracker interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6966) Spark portable runner: get PAssert working
[ https://issues.apache.org/jira/browse/BEAM-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807220#comment-16807220 ] Kyle Weaver commented on BEAM-6966: --- Note to self: READ isn't actually needed; it can be replaced with impulse. > Spark portable runner: get PAssert working > -- > > Key: BEAM-6966 > URL: https://issues.apache.org/jira/browse/BEAM-6966 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > > This would help a lot with testing, such as validatesRunner tests and others. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner
[ https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221493 ] ASF GitHub Bot logged work on BEAM-6945: Author: ASF GitHub Bot Created on: 01/Apr/19 21:48 Start Date: 01/Apr/19 21:48 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #8175: [BEAM-6945] Add single entry point for metrics.proto constants in java DF worker URL: https://github.com/apache/beam/pull/8175#discussion_r271061889 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.metrics; + +import static org.apache.beam.model.pipeline.v1.MetricsApi.labelProps; +import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec; + +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo.MonitoringInfoLabels; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoTypeUrns; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +/** This static class fetches MonitoringInfo related values from metrics.proto. */ +public final class MonitoringInfoConstants { + + /** Supported MonitoringInfo Urns. */ Review comment: I have $0.02: group things according to functionality not what type of thing they are. So, do not collocate these things because they are constants - collocate them because they are for working with `MonitoringInfo` objects. It is somewhat traditional at this point to use a plural like `MonitoringInfos` for utility methods and constants. It is not necessary, and IMO not useful, to distinguish constants from other static utility bits. I think these being collocated does make sense, but that is up to style. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221493) Time Spent: 50m (was: 40m) > Utilize label and urn values from metrics.proto in Java DF Runner > - > > Key: BEAM-6945 > URL: https://issues.apache.org/jira/browse/BEAM-6945 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > We have some of values from metrics.proto hardcoded in java code. > Generalize access to those values and refactor Java DF Runner to utilize new > approach. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner
[ https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221492 ] ASF GitHub Bot logged work on BEAM-6945: Author: ASF GitHub Bot Created on: 01/Apr/19 21:46 Start Date: 01/Apr/19 21:46 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8175: [BEAM-6945] Add single entry point for metrics.proto constants in java DF worker URL: https://github.com/apache/beam/pull/8175#discussion_r271063358 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.metrics; + +import static org.apache.beam.model.pipeline.v1.MetricsApi.labelProps; +import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec; + +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo.MonitoringInfoLabels; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoTypeUrns; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +/** This static class fetches MonitoringInfo related values from metrics.proto. */ +public final class MonitoringInfoConstants { + + /** Supported MonitoringInfo Urns. */ Review comment: I think it makes sense to have all of these in one file. They're not so many that it's hard to track. I think either approach is acceptable though, so I wouldn't block the change either way. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221492) Time Spent: 40m (was: 0.5h) > Utilize label and urn values from metrics.proto in Java DF Runner > - > > Key: BEAM-6945 > URL: https://issues.apache.org/jira/browse/BEAM-6945 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > We have some of values from metrics.proto hardcoded in java code. > Generalize access to those values and refactor Java DF Runner to utilize new > approach. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-6956) --experiments=worker_threads=100 issue
[ https://issues.apache.org/jira/browse/BEAM-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807065#comment-16807065 ] Jiayi Zhao edited comment on BEAM-6956 at 4/1/19 9:41 PM: -- It might be due to something random. I tried in-thread Flink again with 100 worker_threads, and this time it works (I also changed to --parallelism=1 from the earlier --parallelism=2; not sure if that's related). But when I tried a separate local Flink cluster, it works for some simple pipelines but gets stuck for some complex TensorFlow Transform pipelines. was (Author: 1025kb): It might be due to something random. I tried in-thread Flink again with 100 worker_threads, and this time it works (I also changed to --parallelism=1 from the earlier --parallelism=2; not sure if that's related). When I tried a separate local Flink cluster, the pipeline got stuck at a different place each time. I will try to confirm this and add more information about where it gets stuck. > --experiments=worker_threads=100 issue > -- > > Key: BEAM-6956 > URL: https://issues.apache.org/jira/browse/BEAM-6956 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Jiayi Zhao >Priority: Major > > I noticed that without "--experiments=worker_threads=100", the pipeline gets > stuck. > The weird thing is that when I tried a complex pipeline using the in-thread Flink > method (./gradlew :beam-runners-flink_2.11-job-server:runShadow), > "--experiments=worker_threads=100" doesn't work, but > "--experiments=worker_threads=1000" works fine. > Then I tried the same pipeline using a separate local Flink cluster > (./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081); the Flink version is 1.5.6 (other versions > don't work, see BEAM-6915). > Neither "--experiments=worker_threads=1000" nor > "--experiments=worker_threads=1" worked; the pipeline gets stuck at a certain stage > (it shows as running in the Flink UI but never finishes). > Is there any real fix for that? Thanks! 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6602) Support schemas in BigQueryIO.Write
[ https://issues.apache.org/jira/browse/BEAM-6602?focusedWorklogId=221481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221481 ] ASF GitHub Bot logged work on BEAM-6602: Author: ASF GitHub Bot Created on: 01/Apr/19 21:11 Start Date: 01/Apr/19 21:11 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #7840: [BEAM-6602] BigQueryIO.write natively understands Beam schemas URL: https://github.com/apache/beam/pull/7840#issuecomment-478749573 Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221481) Time Spent: 2h 50m (was: 2h 40m) > Support schemas in BigQueryIO.Write > --- > > Key: BEAM-6602 > URL: https://issues.apache.org/jira/browse/BEAM-6602 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Labels: triaged > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6906) Mutating accumulators in fused stages is generally unsafe - need to provide a single mutable accumulator
[ https://issues.apache.org/jira/browse/BEAM-6906?focusedWorklogId=221480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221480 ] ASF GitHub Bot logged work on BEAM-6906: Author: ASF GitHub Bot Created on: 01/Apr/19 21:10 Start Date: 01/Apr/19 21:10 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #8134: [BEAM-6906] Update spec on CombineFn and DoFn to clarify mutability of parameters URL: https://github.com/apache/beam/pull/8134 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221480) Time Spent: 40m (was: 0.5h) > Mutating accumulators in fused stages is generally unsafe - need to provide a > single mutable accumulator > > > Key: BEAM-6906 > URL: https://issues.apache.org/jira/browse/BEAM-6906 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Kenneth Knowles >Assignee: Yueyang Qiu >Priority: Critical > Time Spent: 40m > Remaining Estimate: 0h > > Our current docs encourage a CombineFn author to mutate accumulators for > efficiency. This is important, but cannot be done generally without losing > efficiency - it is not safe to share accumulators within a stage or across > sliding windows. The ownership story needs to be clear. Any accumulator that > is mutable is from that point on owned by the CombineFn, not the runner and > cannot be given to other steps. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
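The ownership concern described in this issue can be illustrated without Beam at all. The following is a minimal sketch in plain Python, assuming a list as the accumulator; `add_input_in_place` and `add_input_copying` are hypothetical stand-ins for a CombineFn's add-input step and are not Beam API. It shows what goes wrong when one mutable accumulator object is handed to two consumers (for example, two sliding windows).

```python
import copy


def add_input_in_place(accumulator, value):
    """Mutates the accumulator it was given; cheap, but assumes sole ownership."""
    accumulator.append(value)
    return accumulator


def add_input_copying(accumulator, value):
    """Leaves the caller's accumulator untouched; safe, but copies on every call."""
    result = copy.copy(accumulator)
    result.append(value)
    return result


# Hypothetical runner behaviour: the same accumulator object is shared by
# two sliding windows, and each window adds its own input in place.
shared = [1, 2]
window_a = add_input_in_place(shared, 10)
window_b = add_input_in_place(shared, 20)
# Both names refer to one list, so each window sees the other's input.
aliased = window_a is window_b

# With copying, each window gets an independent accumulator.
base = [1, 2]
safe_a = add_input_copying(base, 10)
safe_b = add_input_copying(base, 20)
```

The in-place variant is only safe if the accumulator is owned exclusively by the function that mutates it, which is the ownership rule the issue asks the spec to state clearly.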
[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner
[ https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221479 ] ASF GitHub Bot logged work on BEAM-6945: Author: ASF GitHub Bot Created on: 01/Apr/19 21:08 Start Date: 01/Apr/19 21:08 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #8175: [BEAM-6945] Add single entry point for metrics.proto constants in java DF worker URL: https://github.com/apache/beam/pull/8175#discussion_r271051938 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.metrics; + +import static org.apache.beam.model.pipeline.v1.MetricsApi.labelProps; +import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec; + +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo.MonitoringInfoLabels; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoTypeUrns; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +/** This static class fetches MonitoringInfo related values from metrics.proto. */ +public final class MonitoringInfoConstants { + + /** Supported MonitoringInfo Urns. */ Review comment: I believe all of these are tightly related through MonitoringInfo. Having them in a single file/namespace is a convenient way to highlight the use case. Also, I don't feel we gain much from the shorter usage. To make things even shorter, you can import the nested class directly, and then it becomes Urns.ELEMENT_COUNT, etc. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221479) Time Spent: 0.5h (was: 20m) > Utilize label and urn values from metrics.proto in Java DF Runner > - > > Key: BEAM-6945 > URL: https://issues.apache.org/jira/browse/BEAM-6945 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > We have some of values from metrics.proto hardcoded in java code. > Generalize access to those values and refactor Java DF Runner to utilize new > approach. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6821) FileBasedSink is not creating file paths according to target filesystem
[ https://issues.apache.org/jira/browse/BEAM-6821?focusedWorklogId=221478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221478 ] ASF GitHub Bot logged work on BEAM-6821: Author: ASF GitHub Bot Created on: 01/Apr/19 21:07 Start Date: 01/Apr/19 21:07 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #8054: [BEAM-6821] FileBasedSink improper paths URL: https://github.com/apache/beam/pull/8054#issuecomment-478748435 Please take a look at test failures: Click on Details -> Gradle Build Scan. There is a lint error and a test failure. To run an individual test locally see: https://cwiki.apache.org/confluence/display/BEAM/Contribution+Testing+Guide#ContributionTestingGuide-HowtorunPythonunittests Perhaps that's a flake. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221478) Time Spent: 2h (was: 1h 50m) > FileBasedSink is not creating file paths according to target filesystem > --- > > Key: BEAM-6821 > URL: https://issues.apache.org/jira/browse/BEAM-6821 > Project: Beam > Issue Type: Bug > Components: io-java-text >Affects Versions: 2.11.0 > Environment: Windows 10 >Reporter: Gregory Kovelman >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > File path generated in _open_writer_ method is not according to target > filesystem, because > os.path.join is used and not FileSystems.join. > apache_beam\io\filebasedsink.py extract: > > {code:java} > def _create_temp_dir(self, file_path_prefix): > base_path, last_component = FileSystems.split(file_path_prefix) > if not last_component: ># Trying to re-split the base_path to check if it's a root. 
>new_base_path, _ = FileSystems.split(base_path) >if base_path == new_base_path: > raise ValueError('Cannot create a temporary directory for root path ' > 'prefix %s. Please specify a file path prefix with ' > 'at least two components.' % file_path_prefix) > path_components = [base_path, > 'beam-temp-' + last_component + '-' + uuid.uuid1().hex] > return FileSystems.join(*path_components) > @check_accessible(['file_path_prefix', 'file_name_suffix']) > def open_writer(self, init_result, uid): > # A proper suffix is needed for AUTO compression detection. > # We also ensure there will be no collisions with uid and a > # (possibly unsharded) file_path_prefix and a (possibly empty) > # file_name_suffix. > file_path_prefix = self.file_path_prefix.get() > file_name_suffix = self.file_name_suffix.get() > suffix = ( > '.' + os.path.basename(file_path_prefix) + file_name_suffix) > return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix) > {code} > > > This creates incompatibilities between, for example, Windows and GCS. > Expected: gs://bucket/beam-temp-result-uuid/uid.result > Actual: gs://bucket/beam-temp-result-uuid\\uid.result > Replacing os.path.join with FileSystems.join fixes the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
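The separator problem reported here can be reproduced on any platform with the standard library alone. The sketch below uses `ntpath`, which is the module `os.path` resolves to on Windows, to show what `os.path.join` does to a `gs://` prefix there. The helper `join_with_forward_slash` is a hypothetical stand-in for a filesystem-aware join; it is not the implementation of Beam's `FileSystems.join`.

```python
import ntpath  # the implementation behind os.path on Windows


def join_like_windows_os_path(base, name):
    """Joins the way os.path.join would when the pipeline runs on Windows."""
    return ntpath.join(base, name)


def join_with_forward_slash(base, name):
    """Hypothetical filesystem-aware join for object-store paths such as gs://."""
    return base.rstrip("/") + "/" + name


temp_dir = "gs://bucket/beam-temp-result-uuid"

# The local OS separator (a backslash) leaks into the remote path.
windows_result = join_like_windows_os_path(temp_dir, "uid")

# The target filesystem's separator is used regardless of the local OS.
gcs_result = join_with_forward_slash(temp_dir, "uid")
```

The point of the design is that the separator should be chosen by the scheme of the destination path, not by the operating system the worker happens to run on.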
[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner
[ https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221477 ] ASF GitHub Bot logged work on BEAM-6945: Author: ASF GitHub Bot Created on: 01/Apr/19 21:05 Start Date: 01/Apr/19 21:05 Worklog Time Spent: 10m Work Description: ajamato commented on pull request #8175: [BEAM-6945] Add single entry point for metrics.proto constants in java DF worker URL: https://github.com/apache/beam/pull/8175#discussion_r271050465 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.runners.core.metrics; + +import static org.apache.beam.model.pipeline.v1.MetricsApi.labelProps; +import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec; + +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo.MonitoringInfoLabels; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs; +import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoTypeUrns; +import org.apache.beam.model.pipeline.v1.RunnerApi; + +/** This static class fetches MonitoringInfo related values from metrics.proto. */ +public final class MonitoringInfoConstants { + + /** Supported MonitoringInfo Urns. */ Review comment: Can we make this multiple classes instead, which will shorten the use a bit MonitoringInfoConstants.Urns.ELEMENT_COUNT becomes MonitoringInfoUrns.ELEMENT_COUNT MonitoringInfoConstants.Labels.PTRANSFORM becomes MonitoringInfoLabels.PTRANSFORM MonitoringInfoConstants.TypeUrns.SUM_INT64 becomes MonitoringInfoTypeUrns.SUM_INT64 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221477) Time Spent: 20m (was: 10m) > Utilize label and urn values from metrics.proto in Java DF Runner > - > > Key: BEAM-6945 > URL: https://issues.apache.org/jira/browse/BEAM-6945 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > We have some of values from metrics.proto hardcoded in java code. > Generalize access to those values and refactor Java DF Runner to utilize new > approach. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3312) Add convenient "with" to MqttIO.ConnectionConfiguration
[ https://issues.apache.org/jira/browse/BEAM-3312?focusedWorklogId=221475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221475 ] ASF GitHub Bot logged work on BEAM-3312: Author: ASF GitHub Bot Created on: 01/Apr/19 20:58 Start Date: 01/Apr/19 20:58 Worklog Time Spent: 10m Work Description: iemejia commented on issue #8167: [BEAM-3312] Improve the builder to MqttIO connection URL: https://github.com/apache/beam/pull/8167#issuecomment-478745144 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221475) Time Spent: 50m (was: 40m) > Add convenient "with" to MqttIO.ConnectionConfiguration > --- > > Key: BEAM-3312 > URL: https://issues.apache.org/jira/browse/BEAM-3312 > Project: Beam > Issue Type: Improvement > Components: io-java-mqtt >Reporter: Jean-Baptiste Onofré >Assignee: LI Guobao >Priority: Major > Labels: triaged > Time Spent: 50m > Remaining Estimate: 0h > > Now, for instance, {{MqttIO}} requires a {{ConnectionConfiguration}} object > to pass the URL and topic name. This means the user has to do something like: > {code} > MqttIO.read() > .withConnectionConfiguration(MqttIO.ConnectionConfiguration.create("tcp://localhost:1883", > "CAR")) > {code} > This is verbose and long. I think it makes sense to provide a convenient > "direct" method that allows: > {code} > MqttIO.read().withUrl().withTopic() > {code} > or even: > {code} > MqttIO.read().withConnection("url", "topic") > {code} > The same applies to some other IOs (JMS, ...). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner
[ https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221474 ] ASF GitHub Bot logged work on BEAM-6945: Author: ASF GitHub Bot Created on: 01/Apr/19 20:55 Start Date: 01/Apr/19 20:55 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #8175: [BEAM-6945] Add single entry point for metrics.proto constants in java DF worker URL: https://github.com/apache/beam/pull/8175#issuecomment-478744263 R: @ajamato C: @pabloem, @kennknowles This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221474) Time Spent: 10m Remaining Estimate: 0h > Utilize label and urn values from metrics.proto in Java DF Runner > - > > Key: BEAM-6945 > URL: https://issues.apache.org/jira/browse/BEAM-6945 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We have some of values from metrics.proto hardcoded in java code. > Generalize access to those values and refactor Java DF Runner to utilize new > approach. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6966) Spark portable runner: get PAssert working
Kyle Weaver created BEAM-6966: - Summary: Spark portable runner: get PAssert working Key: BEAM-6966 URL: https://issues.apache.org/jira/browse/BEAM-6966 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Kyle Weaver Assignee: Kyle Weaver This would help a lot with testing, such as validatesRunner tests and others. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6753) Create proto representation for schemas
[ https://issues.apache.org/jira/browse/BEAM-6753?focusedWorklogId=221465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221465 ] ASF GitHub Bot logged work on BEAM-6753: Author: ASF GitHub Bot Created on: 01/Apr/19 20:38 Start Date: 01/Apr/19 20:38 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #7952: [BEAM-6753] Set the stage to make schema coder update compatible URL: https://github.com/apache/beam/pull/7952#issuecomment-478738280 Run Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221465) Time Spent: 1.5h (was: 1h 20m) > Create proto representation for schemas > --- > > Key: BEAM-6753 > URL: https://issues.apache.org/jira/browse/BEAM-6753 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221464&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221464 ] ASF GitHub Bot logged work on BEAM-6953: Author: ASF GitHub Bot Created on: 01/Apr/19 20:38 Start Date: 01/Apr/19 20:38 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #8188: [BEAM-6953] Make bq constants args URL: https://github.com/apache/beam/pull/8188#issuecomment-478738133 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221464) Time Spent: 1h 10m (was: 1h) > BigQueryIO has constants that should be PipelineOptions > --- > > Key: BEAM-6953 > URL: https://issues.apache.org/jira/browse/BEAM-6953 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Reuven Lax >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6965) Spark portable translator: translate READ
Kyle Weaver created BEAM-6965: - Summary: Spark portable translator: translate READ Key: BEAM-6965 URL: https://issues.apache.org/jira/browse/BEAM-6965 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Kyle Weaver Assignee: Kyle Weaver -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6876) User state cleanup in portable Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6876?focusedWorklogId=221458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221458 ] ASF GitHub Bot logged work on BEAM-6876: Author: ASF GitHub Bot Created on: 01/Apr/19 20:13 Start Date: 01/Apr/19 20:13 Worklog Time Spent: 10m Work Description: apilloud commented on issue #8173: [release-2.12] [BEAM-6876] Cleanup user state in portable Flink Runner URL: https://github.com/apache/beam/pull/8173#issuecomment-478729355 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221458) Time Spent: 5h 40m (was: 5.5h) > User state cleanup in portable Flink runner > --- > > Key: BEAM-6876 > URL: https://issues.apache.org/jira/browse/BEAM-6876 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.11.0 >Reporter: Thomas Weise >Assignee: Maximilian Michels >Priority: Major > Labels: portability-flink, triaged > Fix For: 2.12.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > State is currently not being cleaned up by the runner. > [https://lists.apache.org/thread.html/86f0809fbfa3da873051287b9ff249d6dd5a896b45409db1e484cf38@%3Cdev.beam.apache.org%3E] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6944) Add support for MeanByteCount metric for Python Streaming to Java DF Runner
[ https://issues.apache.org/jira/browse/BEAM-6944?focusedWorklogId=221453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221453 ] ASF GitHub Bot logged work on BEAM-6944: Author: ASF GitHub Bot Created on: 01/Apr/19 20:10 Start Date: 01/Apr/19 20:10 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #8171: [BEAM-6944] Add MeanByteCountMonitoringInfoToCounterUpdateTransformer URL: https://github.com/apache/beam/pull/8171 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221453) Time Spent: 1h (was: 50m) > Add support for MeanByteCount metric for Python Streaming to Java DF Runner > --- > > Key: BEAM-6944 > URL: https://issues.apache.org/jira/browse/BEAM-6944 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-harness >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6922) Remove LGPL test library dependency in cassandraio-test
[ https://issues.apache.org/jira/browse/BEAM-6922?focusedWorklogId=221452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221452 ] ASF GitHub Bot logged work on BEAM-6922: Author: ASF GitHub Bot Created on: 01/Apr/19 20:08 Start Date: 01/Apr/19 20:08 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8182: [BEAM-6922] Replace cassandra-unit with achilles URL: https://github.com/apache/beam/pull/8182#issuecomment-478727534 I trust also that CassandraIO was working before, so it is still working. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221452) Time Spent: 1.5h (was: 1h 20m) > Remove LGPL test library dependency in cassandraio-test > --- > > Key: BEAM-6922 > URL: https://issues.apache.org/jira/browse/BEAM-6922 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > cassandra-io tests use cassandra-unit test library that has LGPLV3 ASF > category X licence , we cannot deliver test jars that depend on LGPL licence. > A similar discussion at > https://issues.apache.org/jira/browse/LEGAL-153?focusedCommentId=13548819 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6922) Remove LGPL test library dependency in cassandraio-test
[ https://issues.apache.org/jira/browse/BEAM-6922?focusedWorklogId=221451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221451 ] ASF GitHub Bot logged work on BEAM-6922: Author: ASF GitHub Bot Created on: 01/Apr/19 20:07 Start Date: 01/Apr/19 20:07 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #8182: [BEAM-6922] Replace cassandra-unit with achilles URL: https://github.com/apache/beam/pull/8182#discussion_r271029223 ## File path: sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOTest.java ## @@ -88,63 +92,64 @@ private static final long NUM_ROWS = 20L; private static final String CASSANDRA_KEYSPACE = "beam_ks"; private static final String CASSANDRA_HOST = "127.0.0.1"; - private static final int CASSANDRA_PORT = 9142; + private static final int CASSANDRA_PORT = 9042; Review comment: So this still cannot have more than one on a host? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221451) Time Spent: 1h 20m (was: 1h 10m) > Remove LGPL test library dependency in cassandraio-test > --- > > Key: BEAM-6922 > URL: https://issues.apache.org/jira/browse/BEAM-6922 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > cassandra-io tests use cassandra-unit test library that has LGPLV3 ASF > category X licence , we cannot deliver test jars that depend on LGPL licence. > A similar discussion at > https://issues.apache.org/jira/browse/LEGAL-153?focusedCommentId=13548819 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6906) Mutating accumulators in fused stages is generally unsafe - need to provide a single mutable accumulator
[ https://issues.apache.org/jira/browse/BEAM-6906?focusedWorklogId=221449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221449 ] ASF GitHub Bot logged work on BEAM-6906: Author: ASF GitHub Bot Created on: 01/Apr/19 20:01 Start Date: 01/Apr/19 20:01 Worklog Time Spent: 10m Work Description: robinyqiu commented on issue #8134: [BEAM-6906] Update spec on CombineFn and DoFn to clarify mutability of parameters URL: https://github.com/apache/beam/pull/8134#issuecomment-478724935 Ping @kennknowles This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221449) Time Spent: 0.5h (was: 20m) > Mutating accumulators in fused stages is generally unsafe - need to provide a > single mutable accumulator > > > Key: BEAM-6906 > URL: https://issues.apache.org/jira/browse/BEAM-6906 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Kenneth Knowles >Assignee: Yueyang Qiu >Priority: Critical > Time Spent: 0.5h > Remaining Estimate: 0h > > Our current docs encourage a CombineFn author to mutate accumulators for > efficiency. This is important, but cannot be done generally without losing > efficiency - it is not safe to share accumulators within a stage or across > sliding windows. The ownership story needs to be clear. Any accumulator that > is mutable is from that point on owned by the CombineFn, not the runner and > cannot be given to other steps. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
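The ownership concern described in BEAM-6906 can be illustrated with a minimal sketch in plain Java. It does not use Beam's CombineFn API: SumAccumulator and AccumulatorOwnershipSketch are hypothetical names, and the two "windows" only stand in for overlapping sliding windows or fused steps that a runner might hand the same accumulator to.

```java
/** Hypothetical stand-in for a mutable CombineFn accumulator; not a Beam class. */
final class SumAccumulator {
  long sum;

  SumAccumulator(long sum) {
    this.sum = sum;
  }

  /** Mutates in place and returns itself, as an efficiency-minded author might. */
  SumAccumulator addInput(long value) {
    sum += value;
    return this;
  }

  /** Independent copy, so the caller owns what it mutates. */
  SumAccumulator copy() {
    return new SumAccumulator(sum);
  }
}

final class AccumulatorOwnershipSketch {
  /** Unsafe: both windows are handed the same accumulator instance. */
  static long[] sharedAccumulator(long inputForWindow1, long inputForWindow2) {
    SumAccumulator shared = new SumAccumulator(0);
    SumAccumulator window1 = shared;
    SumAccumulator window2 = shared;
    window1.addInput(inputForWindow1);
    window2.addInput(inputForWindow2); // also changes what window1 sees
    return new long[] {window1.sum, window2.sum};
  }

  /** Safe: each window mutates only an accumulator it owns. */
  static long[] ownedAccumulators(long inputForWindow1, long inputForWindow2) {
    SumAccumulator initial = new SumAccumulator(0);
    SumAccumulator window1 = initial.copy().addInput(inputForWindow1);
    SumAccumulator window2 = initial.copy().addInput(inputForWindow2);
    return new long[] {window1.sum, window2.sum};
  }
}
```

With inputs 1 and 2, the shared variant reports 3 for both windows, while the owned variant keeps 1 and 2 apart. The copy is the cost the issue alludes to: in-place mutation is only safe once ownership of the accumulator is unambiguous.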
[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221445 ] ASF GitHub Bot logged work on BEAM-6953: Author: ASF GitHub Bot Created on: 01/Apr/19 19:50 Start Date: 01/Apr/19 19:50 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #8188: [BEAM-6953] Make bq constants args URL: https://github.com/apache/beam/pull/8188#issuecomment-478720944 Run RAT PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221445) Time Spent: 1h (was: 50m) > BigQueryIO has constants that should be PipelineOptions > --- > > Key: BEAM-6953 > URL: https://issues.apache.org/jira/browse/BEAM-6953 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Reuven Lax >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221444 ] ASF GitHub Bot logged work on BEAM-6953: Author: ASF GitHub Bot Created on: 01/Apr/19 19:50 Start Date: 01/Apr/19 19:50 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #8188: [BEAM-6953] Make bq constants args URL: https://github.com/apache/beam/pull/8188#issuecomment-478720856 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221444) Time Spent: 50m (was: 40m) > BigQueryIO has constants that should be PipelineOptions > --- > > Key: BEAM-6953 > URL: https://issues.apache.org/jira/browse/BEAM-6953 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Reuven Lax >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6753) Create proto representation for schemas
[ https://issues.apache.org/jira/browse/BEAM-6753?focusedWorklogId=221443&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221443 ] ASF GitHub Bot logged work on BEAM-6753: Author: ASF GitHub Bot Created on: 01/Apr/19 19:50 Start Date: 01/Apr/19 19:50 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #7952: [BEAM-6753] Set the stage to make schema coder update compatible URL: https://github.com/apache/beam/pull/7952#issuecomment-478720719 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221443) Time Spent: 1h 20m (was: 1h 10m) > Create proto representation for schemas > --- > > Key: BEAM-6753 > URL: https://issues.apache.org/jira/browse/BEAM-6753 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property
[ https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balázs Németh updated BEAM-6954: Description: When pipeline options are deserialized from JSON with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760], a map is created, and every property present in the JSON - even one with a null value - is added to it. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ), it becomes the backing jsonOptions map. Later on, when a getter is called on the options, it reaches this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] There, containsKey returns true even for NullNodes, so the getDefault() method is never executed and the default value is never calculated. I'm not sure about the expected behaviour, but either: - the containsKey check should be extended with an !isNull check OR - when we serialize the JSON, null values shouldn't be serialized at [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655] Instinctively I would have expected the @Default.* annotations to produce a value whenever the value is null - so a property with a @Default.* annotation can never be null - but I was unable to find anything explicit about this in the documentation. So I'm not sure which of the suggested changes should be made.
--- Okay, I have investigated further, and it seems the default value is indeed calculated before the JSON serialization by calling the mentioned method. The problem is that it returns a RuntimeValueProvider, which gets serialized as null ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L275-L279] ), because isAccessible returns false ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L248-L250] + [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L214-L220] ) ... and then during deserialization it is found in the jsonOptions at ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] ), so getValueFromJson is executed, which uses an ObjectMapper to create a ValueProvider from a NullNode ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L498] ). The problem here is that, according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes.
-> [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)] For this part of the issue see BEAM-6963 (that still doesn't solve this issue btw, but might be required for it)
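The containsKey behaviour reported in BEAM-6954 can be reproduced with a minimal sketch in plain Java. This is not the ProxyInvocationHandler code: NullOptionLookupSketch and its method names are hypothetical, and a Java null value stands in for Jackson's NullNode (where the equivalent test would be isNull()).

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical model of an options lookup backed by a deserialized JSON map. */
final class NullOptionLookupSketch {
  /** Reported behaviour: the mere presence of the key suppresses the default. */
  static Object lookupWithContainsKey(
      Map<String, Object> jsonOptions, String name, Supplier<Object> defaultValue) {
    if (jsonOptions.containsKey(name)) {
      return jsonOptions.get(name); // may be null: key present, value null
    }
    return defaultValue.get();
  }

  /** First suggested change: a key mapped to null is treated as absent. */
  static Object lookupTreatingNullAsAbsent(
      Map<String, Object> jsonOptions, String name, Supplier<Object> defaultValue) {
    Object value = jsonOptions.get(name);
    return value != null ? value : defaultValue.get();
  }
}
```

For a map that holds an explicit null under the requested key, the first lookup returns null and never consults the default, while the second falls back to it. The second suggestion in the issue (not serializing null values at all) would instead keep such keys out of the map, so the unmodified containsKey check would be enough.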
[jira] [Work logged] (BEAM-6753) Create proto representation for schemas
[ https://issues.apache.org/jira/browse/BEAM-6753?focusedWorklogId=221440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221440 ] ASF GitHub Bot logged work on BEAM-6753: Author: ASF GitHub Bot Created on: 01/Apr/19 19:38 Start Date: 01/Apr/19 19:38 Worklog Time Spent: 10m Work Description: dpmills commented on issue #7952: [BEAM-6753] Set the stage to make schema coder update compatible URL: https://github.com/apache/beam/pull/7952#issuecomment-478716551 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221440) Time Spent: 1h 10m (was: 1h) > Create proto representation for schemas > --- > > Key: BEAM-6753 > URL: https://issues.apache.org/jira/browse/BEAM-6753 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6964) Simplify key context handling in ExecutableStageDoFnOperator
[ https://issues.apache.org/jira/browse/BEAM-6964?focusedWorklogId=221436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221436 ] ASF GitHub Bot logged work on BEAM-6964: Author: ASF GitHub Bot Created on: 01/Apr/19 19:30 Start Date: 01/Apr/19 19:30 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8190: [BEAM-6964] Simplify key context handling in ExecutableStageDoFnOperator URL: https://github.com/apache/beam/pull/8190 The key handling code in ExecutableStageDoFnOperator was a bit hard to understand because of the different contexts in which the current key is set. This simplifies the key setting by always using the state backend to set the key. An additional guard has been added to ensure that no concurrent access is performed during key access. This also adds an additional class for consistent use of key coders.
[jira] [Work logged] (BEAM-6753) Create proto representation for schemas
[ https://issues.apache.org/jira/browse/BEAM-6753?focusedWorklogId=221434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221434 ] ASF GitHub Bot logged work on BEAM-6753: Author: ASF GitHub Bot Created on: 01/Apr/19 19:24 Start Date: 01/Apr/19 19:24 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7952: [BEAM-6753] Set the stage to make schema coder update compatible URL: https://github.com/apache/beam/pull/7952#discussion_r271015156 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java ## @@ -285,15 +289,24 @@ public InstrumentedType prepare(InstrumentedType instrumentedType) { // per-field Coders. static Row decodeDelegate(Schema schema, Coder[] coders, InputStream inputStream) throws IOException { + int fieldCount = VAR_INT_CODER.decode(inputStream); Review comment: as discussed, we don't want to have to reorder fields in the encoding. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221434) Time Spent: 1h (was: 50m) > Create proto representation for schemas > --- > > Key: BEAM-6753 > URL: https://issues.apache.org/jira/browse/BEAM-6753 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6963) Bug in RuntimeValueProvider JSON serialization
[ https://issues.apache.org/jira/browse/BEAM-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balázs Németh updated BEAM-6963: Description: Classes affected: org.apache.beam.sdk.options.ValueProvider.Serializer org.apache.beam.sdk.options.ValueProvider.Deserializer The problem is that, according to the JsonDeserializer documentation ( [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)] ), the deserialize method isn't executed for null nodes: "Note that this method is never called for JSON null literal, and thus deserializers need (and should) not check for it." If we serialize a RuntimeValueProvider, isAccessible() returns false, so we call writeNull(). During deserialization this isn't handled properly, as mentioned, and our deserialization returns null. The end result is that getters whose return type is ValueProvider return null. AFAIK ValueProvider getters should never return null. My guess is that either we should omit RuntimeValueProviders from serialization entirely, or the proper runtime value provider should be recreated during deserialization - which requires more than just a simple "null" being present in the JSON.
> Bug in RuntimeValueProvider JSON serialization > -- > > Key: BEAM-6963 > URL: https://issues.apache.org/jira/browse/BEAM-6963 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.11.0 >Reporter: Balázs Németh >Priority: Critical > > Classes affected: > org.apache.beam.sdk.options.ValueProvider.Serializer > org.apache.beam.sdk.options.ValueProvider.Deserializer > The problem is that according to the JsonDeserializer documentation, the > deserialize method isn't executed for null nodes: > ( > [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext] > ) > ) > "Note that this method is never called for JSON null literal, and thus > deserializers need (and should) not check for it." > If we serialize a RuntimeValueProvider, the isAccessible() will return false, > so we call a writeNull(). During deserialization this isn't handled properly > as mentioned and our deserialization will return null. > The end result is that getters with ValueProvider return values will return > "null". AFAIK ValueProvider getters should be never null. > My guess is that either we should completely omit serializing > RuntimeValueProviders, or during deserialization the proper runtime value > provider should be created again - which requires more than just a simple > "null" being
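The null handling described in BEAM-6963 can be modelled with a minimal sketch in plain Java. It is not Jackson and not Beam's ValueProvider.Deserializer: NullBypassSketch is a hypothetical miniature binder that only imitates the documented rule that the deserialize callback is skipped for a JSON null literal. To my knowledge Jackson provides a separate hook for that case (JsonDeserializer.getNullValue); whether overriding it is the right fix for Beam is an assumption here, not something the issue states.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical miniature of a JSON binder; it only imitates the null rule. */
final class NullBypassSketch<T> {
  private final Function<String, T> deserialize; // custom deserializer logic
  private final Supplier<T> nullValue; // what the binder yields for a null literal

  NullBypassSketch(Function<String, T> deserialize, Supplier<T> nullValue) {
    this.deserialize = deserialize;
    this.nullValue = nullValue;
  }

  /** Reads one raw JSON token; the deserialize callback never sees "null". */
  T read(String jsonToken) {
    if ("null".equals(jsonToken)) {
      return nullValue.get();
    }
    return deserialize.apply(jsonToken);
  }
}
```

With a null hook that returns null, a value written via writeNull() comes back as a plain null, which matches the getters returning null in the report. A hook that builds a placeholder provider would avoid the null, but as the issue notes, recreating the proper runtime provider needs more information than a bare "null" carries.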
[jira] [Created] (BEAM-6964) Simplify key context handling in ExecutableStageDoFnOperator
Maximilian Michels created BEAM-6964: Summary: Simplify key context handling in ExecutableStageDoFnOperator Key: BEAM-6964 URL: https://issues.apache.org/jira/browse/BEAM-6964 Project: Beam Issue Type: Improvement Components: runner-flink Reporter: Maximilian Michels Assignee: Maximilian Michels The code is a bit hard to understand because of multiple keys being set in different contexts, for 1) state requests 2) timers setting 3) timer firing. This should be simplified. We can also introduce a helper class for key encoding/decoding to achieve consistency across all places which use a key serializer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221423&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221423 ] ASF GitHub Bot logged work on BEAM-6953: Author: ASF GitHub Bot Created on: 01/Apr/19 19:06 Start Date: 01/Apr/19 19:06 Worklog Time Spent: 10m Work Description: dpmills commented on issue #8188: [BEAM-6953] Make bq constants args URL: https://github.com/apache/beam/pull/8188#issuecomment-478705542 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221423) Time Spent: 40m (was: 0.5h) > BigQueryIO has constants that should be PipelineOptions > --- > > Key: BEAM-6953 > URL: https://issues.apache.org/jira/browse/BEAM-6953 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Reuven Lax >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6963) Bug in RuntimeValueProvider JSON serialization
[ https://issues.apache.org/jira/browse/BEAM-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balázs Németh updated BEAM-6963: Description: Classes affected: org.apache.beam.sdk.options.ValueProvider.Serializer org.apache.beam.sdk.options.ValueProvider.Deserializer The problem is that according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes: ( [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext] ) ) "Note that this method is never called for JSON null literal, and thus deserializers need (and should) not check for it." If we serialize a RuntimeValueProvider, the isAccessible() will return false, so we call a writeNull(). During deserialization this isn't handled properly as mentioned and our deserialization will return null. The end result is that getters with ValueProvider return values will return "null". AFAIK ValueProvider getters should be never null. My guess is that either we should completely omit serializing RuntimeValueProviders, or during deserialization the proper runtime value provider should be created again - which requires more than a "null" being present in the json. 
was: Classes affected: org.apache.beam.sdk.options.ValueProvider.Serializer org.apache.beam.sdk.options.ValueProvider.Deserializer The problem is that according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes: ( [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext] ) ) "Note that this method is never called for JSON null literal, and thus deserializers need (and should) not check for it." If we serialize a RuntimeValueProvider, the isAccessible() will return false, so we call a writeNull(). During deserialization this isn't handled properly as mentioned and our deserialization will return null. The end result is that getters with ValueProvider return values will return "null". AFAIK ValueProvider getters should be never null. 
> Bug in RuntimeValueProvider JSON serialization > -- > > Key: BEAM-6963 > URL: https://issues.apache.org/jira/browse/BEAM-6963 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.11.0 >Reporter: Balázs Németh >Priority: Critical > > Classes affected: > org.apache.beam.sdk.options.ValueProvider.Serializer > org.apache.beam.sdk.options.ValueProvider.Deserializer > The problem is that according to the JsonDeserializer documentation, the > deserialize method isn't executed for null nodes: > ( > [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext] > ) > ) > "Note that this method is never called for JSON null literal, and thus > deserializers need (and should) not check for it." > If we serialize a RuntimeValueProvider, the isAccessible() will return false, > so we call a writeNull(). During deserialization this isn't handled properly > as mentioned and our deserialization will return null. > The end result is that getters with ValueProvider return values will return > "null". AFAIK ValueProvider getters should be never null. > My guess is that either we should completely omit serializing > RuntimeValueProviders, or during deserialization the proper runtime value > provider should be created again - which requires more than a "null" being > present in the json. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
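The null-handling contract described in BEAM-6963 can be illustrated with a small, dependency-free model. This is only a sketch under stated assumptions: `Provider`, `StaticProvider`, `RuntimeProvider`, `Deserializer`, `write` and `read` are hypothetical names invented here, not Beam or Jackson classes. In Jackson itself the corresponding hook is `JsonDeserializer.getNullValue(DeserializationContext)`; the model only mimics the rule that `deserialize` is never invoked for a JSON null literal.

```java
// Dependency-free model of the contract; every type here is hypothetical.
class NullValueHookSketch {
    /** Stand-in for a value provider; get() fails when the value is not accessible. */
    interface Provider {
        boolean isAccessible();
        String get();
    }

    static final class StaticProvider implements Provider {
        private final String value;
        StaticProvider(String value) { this.value = value; }
        public boolean isAccessible() { return true; }
        public String get() { return value; }
    }

    static final class RuntimeProvider implements Provider {
        public boolean isAccessible() { return false; }
        public String get() { throw new IllegalStateException("not accessible yet"); }
    }

    /** Models the deserializer contract: deserialize() never sees a null literal. */
    interface Deserializer<T> {
        T deserialize(String rawValue);
        default T getNullValue() { return null; }
    }

    /** Models the serializer in the report: inaccessible providers are written as null. */
    static String write(Provider p) {
        return p.isAccessible() ? p.get() : "null";
    }

    /** Models the framework dispatch: null literals bypass deserialize(). */
    static <T> T read(String json, Deserializer<T> d) {
        return "null".equals(json.trim()) ? d.getNullValue() : d.deserialize(json);
    }

    /** Relies on the default null handling, so a null literal becomes a Java null. */
    static final Deserializer<Provider> NAIVE = StaticProvider::new;

    /** Overrides the null hook so that a provider object is always returned. */
    static final Deserializer<Provider> NULL_AWARE = new Deserializer<Provider>() {
        @Override public Provider deserialize(String rawValue) { return new StaticProvider(rawValue); }
        @Override public Provider getNullValue() { return new RuntimeProvider(); }
    };

    public static void main(String[] args) {
        String json = write(new RuntimeProvider());
        System.out.println(read(json, NAIVE));
        System.out.println(read(json, NULL_AWARE) instanceof RuntimeProvider);
    }
}
```

Note that the model sidesteps the harder part raised in the ticket: a real runtime provider would need more information than a bare null carries, which is why the reporter also suggests not serializing such providers at all.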
[jira] [Updated] (BEAM-6963) Bug in RuntimeValueProvider JSON serialization
[ https://issues.apache.org/jira/browse/BEAM-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balázs Németh updated BEAM-6963: Description: Classes affected: org.apache.beam.sdk.options.ValueProvider.Serializer org.apache.beam.sdk.options.ValueProvider.Deserializer The problem is that according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes: ( [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext] ) ) "Note that this method is never called for JSON null literal, and thus deserializers need (and should) not check for it." If we serialize a RuntimeValueProvider, the isAccessible() will return false, so we call a writeNull(). During deserialization this isn't handled properly as mentioned and our deserialization will return null. The end result is that getters with ValueProvider return values will return "null". AFAIK ValueProvider getters should be never null. was: Classes affected: org.apache.beam.sdk.options.ValueProvider.Serializer org.apache.beam.sdk.options.ValueProvider.Deserializer The problem is that according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes: ( https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext) ) "Note that this method is never called for JSON null literal, and thus deserializers need (and should) not check for it." 
If we serialize a RuntimeValueProvider, the isAccessible() will return false, so we call a writeNull(). During deserialization this isn't handled properly as mentioned and our deserialization will return null. The end result is that getters with ValueProvider return values will return "null". AFAIK ValueProvider getters should be never null. > Bug in RuntimeValueProvider JSON serialization > -- > > Key: BEAM-6963 > URL: https://issues.apache.org/jira/browse/BEAM-6963 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.11.0 >Reporter: Balázs Németh >Priority: Critical > > Classes affected: > org.apache.beam.sdk.options.ValueProvider.Serializer > org.apache.beam.sdk.options.ValueProvider.Deserializer > The problem is that according to the JsonDeserializer documentation, the > deserialize method isn't executed for null nodes: > ( > [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext] > ) > ) > "Note that this method is never called for JSON null literal, and thus > deserializers need (and should) not check for it." > If we serialize a RuntimeValueProvider, the isAccessible() will return false, > so we call a writeNull(). During deserialization this isn't handled properly > as mentioned and our deserialization will return null. > The end result is that getters with ValueProvider return values will return > "null". AFAIK ValueProvider getters should be never null. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6963) Bug in RuntimeValueProvider JSON serialization
[ https://issues.apache.org/jira/browse/BEAM-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balázs Németh updated BEAM-6963: Description: Classes affected: org.apache.beam.sdk.options.ValueProvider.Serializer org.apache.beam.sdk.options.ValueProvider.Deserializer The problem is that according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes: ( https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext) ) "Note that this method is never called for JSON null literal, and thus deserializers need (and should) not check for it." If we serialize a RuntimeValueProvider, the isAccessible() will return false, so we call a writeNull(). During deserialization this isn't handled properly as mentioned and our deserialization will return null. The end result is that getters with ValueProvider return values will return "null". AFAIK ValueProvider getters should be never null. was: Classes affected: org.apache.beam.sdk.options.ValueProvider.Serializer org.apache.beam.sdk.options.ValueProvider.Deserializer The problem is that according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes: [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext) ]"Note that this method is never called for JSON null literal, and thus deserializers need (and should) not check for it." If we serialize a RuntimeValueProvider, the isAccessible() will return false, so we call a writeNull(). During deserialization this isn't handled properly as mentioned and our deserialization will return null. 
The end result is that getters with ValueProvider return values will return "null". AFAIK ValueProvider getters should be never null. > Bug in RuntimeValueProvider JSON serialization > -- > > Key: BEAM-6963 > URL: https://issues.apache.org/jira/browse/BEAM-6963 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.11.0 >Reporter: Balázs Németh >Priority: Critical > > Classes affected: > org.apache.beam.sdk.options.ValueProvider.Serializer > org.apache.beam.sdk.options.ValueProvider.Deserializer > The problem is that according to the JsonDeserializer documentation, the > deserialize method isn't executed for null nodes: > ( > https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext) > ) > "Note that this method is never called for JSON null literal, and thus > deserializers need (and should) not check for it." > If we serialize a RuntimeValueProvider, the isAccessible() will return false, > so we call a writeNull(). During deserialization this isn't handled properly > as mentioned and our deserialization will return null. > The end result is that getters with ValueProvider return values will return > "null". AFAIK ValueProvider getters should be never null. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property
[ https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balázs Németh updated BEAM-6954: Description: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - when we serialize the json, it shouldn't serialize null values at [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655] Instinctively I would have expected the @Default.* annotations producing values every single time, when the value is null - so a property with a @Default.* annotation can't be null - but I was unable to find anything explicit regarding this in the documentation. So I'm not sure which of the suggested change has to be made. 
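The difference between the current `containsKey` check and the first suggested change can be sketched with plain `java.util` types. This is a simplified model, not the code in `ProxyInvocationHandler`: `JSON_NULL` is a hypothetical sentinel standing in for Jackson's `NullNode`, and `getWithContainsKey`, `getNullAware` and the option name `myOption` are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified model of the lookup; all names are hypothetical.
class NullAwareOptionLookup {
    /** Stand-in for a NullNode: a non-null object that represents JSON null. */
    static final Object JSON_NULL = new Object();

    /** Reported behaviour: any present key wins, even when it maps to JSON null. */
    static Object getWithContainsKey(Map<String, Object> jsonOptions, String name,
                                     Supplier<Object> defaultValue) {
        if (jsonOptions.containsKey(name)) {
            Object value = jsonOptions.get(name);
            return value == JSON_NULL ? null : value;
        }
        return defaultValue.get();
    }

    /** First suggested change: treat a JSON null entry like a missing entry. */
    static Object getNullAware(Map<String, Object> jsonOptions, String name,
                               Supplier<Object> defaultValue) {
        Object value = jsonOptions.get(name);
        if (value != null && value != JSON_NULL) {
            return value;
        }
        return defaultValue.get();
    }

    public static void main(String[] args) {
        Map<String, Object> jsonOptions = new HashMap<>();
        jsonOptions.put("myOption", JSON_NULL);
        System.out.println(getWithContainsKey(jsonOptions, "myOption", () -> "default-value"));
        System.out.println(getNullAware(jsonOptions, "myOption", () -> "default-value"));
    }
}
```

With the null-aware variant the default supplier runs whenever the stored entry is JSON null, which matches the expectation stated above that a property with a `@Default.*` annotation is never null. Whether that is the intended semantics is exactly the open question in the ticket.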
--- Okay, I have investigated further, and it seems the default value is indeed calculated before the json serialization by calling the mentioned method. The problem is that it returns a RuntimeValueProvider, which gets serialized as null ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L275-L279] ), because the isAccessible returns false ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L248-L250] + [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L214-L220] ) ... and then during deserialization it is found in the jsonOptions at ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] ), so it executes the getValueFromJson which uses an ObjectMapper to create a ValueProvider from a NullNode ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L498] ) . The problem here is that according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes. 
-> [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)] For this part of the issue see BEAM-6963 (that still doesn't solve this issue btw) was: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - when we seri
[jira] [Created] (BEAM-6963) Bug in RuntimeValueProvider JSON serialization
Balázs Németh created BEAM-6963: --- Summary: Bug in RuntimeValueProvider JSON serialization Key: BEAM-6963 URL: https://issues.apache.org/jira/browse/BEAM-6963 Project: Beam Issue Type: Bug Components: sdk-java-core Affects Versions: 2.11.0 Reporter: Balázs Németh Classes affected: org.apache.beam.sdk.options.ValueProvider.Serializer org.apache.beam.sdk.options.ValueProvider.Deserializer The problem is that according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes: [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext) ]"Note that this method is never called for JSON null literal, and thus deserializers need (and should) not check for it." If we serialize a RuntimeValueProvider, the isAccessible() will return false, so we call a writeNull(). During deserialization this isn't handled properly as mentioned and our deserialization will return null. The end result is that getters with ValueProvider return values will return "null". AFAIK ValueProvider getters should be never null. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6960) Run Go PostCommit tests against a ULR
[ https://issues.apache.org/jira/browse/BEAM-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-6960: --- Description: See parent task :https://issues.apache.org/jira/browse/BEAM-6958 [Instructions on running against the Java ULR|https://cwiki.apache.org/confluence/display/BEAM/Usage+Guide] was:See parent task :https://issues.apache.org/jira/browse/BEAM-6958 > Run Go PostCommit tests against a ULR > - > > Key: BEAM-6960 > URL: https://issues.apache.org/jira/browse/BEAM-6960 > Project: Beam > Issue Type: Sub-task > Components: sdk-go, testing >Reporter: Robert Burke >Priority: Minor > > See parent task :https://issues.apache.org/jira/browse/BEAM-6958 > > [Instructions on running against the Java > ULR|https://cwiki.apache.org/confluence/display/BEAM/Usage+Guide] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6962) Add wordcount_xlang to Python post-commit
Chamikara Jayalath created BEAM-6962: Summary: Add wordcount_xlang to Python post-commit Key: BEAM-6962 URL: https://issues.apache.org/jira/browse/BEAM-6962 Project: Beam Issue Type: Improvement Components: runner-flink, sdk-py-core, sdk-py-harness Reporter: Chamikara Jayalath Assignee: Heejong Lee This example works great but we should add it to Python post-commit to prevent bit-rot. [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_xlang.py] Also, please consider adding a step to the pipeline that performs a checksum on the output to make sure that the output is valid. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6961) Run Go SDK Postcommits Dataflow as their own jenkins target
Robert Burke created BEAM-6961: -- Summary: Run Go SDK Postcommits Dataflow as their own jenkins target Key: BEAM-6961 URL: https://issues.apache.org/jira/browse/BEAM-6961 Project: Beam Issue Type: Sub-task Components: runner-dataflow, sdk-go, testing Reporter: Robert Burke See parent task: https://issues.apache.org/jira/browse/BEAM-6958 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6960) Run Go PostCommit tests against a ULR
Robert Burke created BEAM-6960: -- Summary: Run Go PostCommit tests against a ULR Key: BEAM-6960 URL: https://issues.apache.org/jira/browse/BEAM-6960 Project: Beam Issue Type: Sub-task Components: sdk-go, testing Reporter: Robert Burke See parent task: https://issues.apache.org/jira/browse/BEAM-6958 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6959) Run Go SDK Post Commit tests against the Flink Runner.
Robert Burke created BEAM-6959: -- Summary: Run Go SDK Post Commit tests against the Flink Runner. Key: BEAM-6959 URL: https://issues.apache.org/jira/browse/BEAM-6959 Project: Beam Issue Type: Sub-task Components: runner-flink, sdk-go, testing Reporter: Robert Burke See parent task BEAM-6958 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4152) Support Go session windowing
[ https://issues.apache.org/jira/browse/BEAM-4152?focusedWorklogId=221406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221406 ] ASF GitHub Bot logged work on BEAM-4152: Author: ASF GitHub Bot Created on: 01/Apr/19 18:40 Start Date: 01/Apr/19 18:40 Worklog Time Spent: 10m Work Description: lostluck commented on issue #8111: [BEAM-4152] Kinds & URNs for Session Windows URL: https://github.com/apache/beam/pull/8111#issuecomment-478696170 @ptomasroos At present the "testing and experimenting" path that works for the Go SDK is based on the direct runner, as @robertwb mentions. Right now the direct runner is a constraint, since work hasn't been done to use the Go SDK with the (new?) Python ULR, let alone a free runner such as Flink, though I do know it was working with the Java ULR at some point. Otherwise, the only thing that is tested via the post commits is Dataflow, but that costs money, which isn't ideal for users who just want to try things. https://issues.apache.org/jira/browse/BEAM-6958 is a task to split out the integration tests in the post commit rubric, which would be a natural starting point for demonstrating how to use the SDK against other runners, and for ensuring that the SDK works on said runners to boot. That should help a touch, once the Jenkins machine apocalypse is over. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221406) Time Spent: 2.5h (was: 2h 20m) > Support Go session windowing > > > Key: BEAM-4152 > URL: https://issues.apache.org/jira/browse/BEAM-4152 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Support session windowing and how to handle merging windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6956) --experiments=worker_threads=100 issue
[ https://issues.apache.org/jira/browse/BEAM-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807065#comment-16807065 ] Jiayi Zhao commented on BEAM-6956: -- It might be due to something random. I tried the in-thread Flink method again with 100 worker_threads, and this time it works (I also used --parallelism=1 instead of the --parallelism=2 I used before; not sure if that's related). When I tried the separate local Flink cluster, the pipeline got stuck at a different place each time. I will try to confirm this and add more information about where it gets stuck. > --experiments=worker_threads=100 issue > -- > > Key: BEAM-6956 > URL: https://issues.apache.org/jira/browse/BEAM-6956 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Jiayi Zhao >Priority: Major > > I noticed that without "--experiments=worker_threads=100", the pipeline gets > stuck. > The weird thing is, when I tried a complex pipeline using the in-thread Flink > method (./gradlew :beam-runners-flink_2.11-job-server:runShadow), > "--experiments=worker_threads=100" doesn't work, but > "--experiments=worker_threads=1000" works fine. > Then I tried the same pipeline using the separate local Flink cluster > (./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081); the Flink version is 1.5.6 (other versions > don't work, see BEAM-6915). > Neither "--experiments=worker_threads=1000" nor > "--experiments=worker_threads=1" worked; the pipeline gets stuck at a certain stage > (it shows as running in the Flink UI but never finishes). > Is there any real fix for that? Thanks! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6958) Split Go PostCommit Test results by Runner
Robert Burke created BEAM-6958: -- Summary: Split Go PostCommit Test results by Runner Key: BEAM-6958 URL: https://issues.apache.org/jira/browse/BEAM-6958 Project: Beam Issue Type: Improvement Components: sdk-go, testing Reporter: Robert Burke At present the Go SDK only has a single column filled in on the master branch Post-Commit Tests Status testing rubric, which is unclear and not ideal. Right now the Jenkins [Go PostCommit tests|https://github.com/apache/beam/blob/ec3f79214e9ef204fa32b744051a291fe4b61e23/.test-infra/jenkins/job_PostCommit_Go.groovy#L24] trigger the [go integration test task|https://github.com/apache/beam/blob/58a70b273367c22fd7c8562c42bc10a07dbe7156/build.gradle#L178], which only runs the [tests on Dataflow via a shell script|https://github.com/apache/beam/blob/master/sdks/go/test/run_integration_tests.sh#L78]. It doesn't even run the unit tests as per the pre-commit. The end goal for this task is to: * Have the Go SDK column represent the Go SDK Unit Tests as a post commit. * Or better, to avoid pre-commit-run duplication, run the integration tests against the ULR if other runners are doing so. * Have the integration tests that run against Dataflow be represented in the column. This will set the basis for adding the integration tests against other portable runners (Flink, Spark, Python ULR, future portable runners...). It looks like there are three bits of work to accomplish here: * Adjust the Gradle tasks/task names to accurately represent what they're running against. * Add the new Jenkins tasks for each of the runners. (The other languages call these ValidatesRunner_ tasks.) * Add the cool "badges" for the new Jenkins tasks to the Post Commit rubric. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property
[ https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balázs Németh updated BEAM-6954: Description: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - when we serialize the json, it shouldn't serialize null values at [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655] Instinctively I would have expected the @Default.* annotations producing values every single time, when the value is null - so a property with a @Default.* annotation can't be null - but I was unable to find anything explicit regarding this in the documentation. So I'm not sure which of the suggested change has to be made. 
--- Okay, I have investigated further, and it seems the default value is indeed calculated before the json serialization by calling the mentioned method. The problem is that it returns a RuntimeValueProvider, which gets serialized as null ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L275-L279] ), because the isAccessible returns false ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L248-L250] + [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L214-L220] ) ... and then during deserialization it is found in the jsonOptions at ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] ), so it executes the getValueFromJson which uses an ObjectMapper to create a ValueProvider from a NullNode ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L498] ) . The problem here is that according to the JsonDeserializer documentation, the deserialize method isn't executed for null nodes. -> [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)] That requires overriding the getNullValue(DeserializationContext ctxt) method. 
was: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - when we serialize
[jira] [Work logged] (BEAM-6914) Use BigQuerySink as default for 2.12.
[ https://issues.apache.org/jira/browse/BEAM-6914?focusedWorklogId=221399&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221399 ] ASF GitHub Bot logged work on BEAM-6914: Author: ASF GitHub Bot Created on: 01/Apr/19 18:16 Start Date: 01/Apr/19 18:16 Worklog Time Spent: 10m Work Description: apilloud commented on issue #8170: [release-2.12.0][BEAM-6914] Reverting behavior of Native BQ sink in Python (#8143) URL: https://github.com/apache/beam/pull/8170#issuecomment-478687364 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221399) Time Spent: 5h 50m (was: 5h 40m) > Use BigQuerySink as default for 2.12. > - > > Key: BEAM-6914 > URL: https://issues.apache.org/jira/browse/BEAM-6914 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Blocker > Fix For: 2.12.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6876) User state cleanup in portable Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6876?focusedWorklogId=221398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221398 ] ASF GitHub Bot logged work on BEAM-6876: Author: ASF GitHub Bot Created on: 01/Apr/19 18:15 Start Date: 01/Apr/19 18:15 Worklog Time Spent: 10m Work Description: apilloud commented on issue #8173: [release-2.12] [BEAM-6876] Cleanup user state in portable Flink Runner URL: https://github.com/apache/beam/pull/8173#issuecomment-478687325 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221398) Time Spent: 5.5h (was: 5h 20m) > User state cleanup in portable Flink runner > --- > > Key: BEAM-6876 > URL: https://issues.apache.org/jira/browse/BEAM-6876 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.11.0 >Reporter: Thomas Weise >Assignee: Maximilian Michels >Priority: Major > Labels: portability-flink, triaged > Fix For: 2.12.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > State is currently not being cleaned up by the runner. > [https://lists.apache.org/thread.html/86f0809fbfa3da873051287b9ff249d6dd5a896b45409db1e484cf38@%3Cdev.beam.apache.org%3E] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6738) Reduce Combine overhead
[ https://issues.apache.org/jira/browse/BEAM-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke resolved BEAM-6738. Resolution: Fixed Fix Version/s: Not applicable > Reduce Combine overhead > --- > > Key: BEAM-6738 > URL: https://issues.apache.org/jira/browse/BEAM-6738 > Project: Beam > Issue Type: Sub-task > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Minor > Fix For: Not applicable > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Sibling to BEAM-4726 for ParDos, this should be to add the invoker caching to > the exec/combine.go units, since for example AddInput would be done for every > single element, and for large key spaces, the same applies for the other > CombineFn components. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-6745) Cannot run pipeline on Dataflow (GO SDK)
[ https://issues.apache.org/jira/browse/BEAM-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779662#comment-16779662 ] Robert Burke edited comment on BEAM-6745 at 4/1/19 6:10 PM: There's no *dataflow side* documentation of the SDK, or any statements of support that I'm aware of. At present, if you use the Go SDK on Dataflow, you do so at your own risk. While Google does fund some work on Apache Beam, the Go SDK is not currently among the set of things that are being maintained at production quality on Dataflow. Ergo, things will break and documentation will become stale. It's a quirk of portability that it enables "unofficial" language SDK support on any compatible runner. However, there's nothing guaranteeing this, and there's no effort to maintain anything around it (as demonstrated by this bug). Official support would include the version of the dataflow library providing a compatible, versioned SDK container without users ever needing to specify anything, tests for certain versions of the SDK running successfully against the service, and similar. In short, it's a matter of it can run on Dataflow, but not necessarily that folks should use it. I'm hoping to be able to change that, but I can't speak to any timelines at present. Edit: I think the point I'm trying to make here is that the Go SDK tries to support Dataflow, but that Dataflow, as a paid service, doesn't support the Go SDK, as there are certain expectations once money gets involved. was (Author: lostluck): There's no *dataflow side* documentation of the SDK, or any statements of support that I'm aware of. At present, if you use the Go SDK on Dataflow, you do so at your own risk. While Google does fund some work on Apache Beam. It's a quirk of portability that it enables "unofficial" language SDK support on any compatible runner. However, there's nothing guaranteeing this, and there's no effort to maintain anything around it, (as determined by this bug). 
Official support would include be the version of the dataflow library to provide a compatible, versioned SDK container, without users ever needing to specify anything, and that tests for certain versions of the SDK run successfully against the service and similar. In short, it's a matter of it Can run on dataflow, but not necessarily that folks should use it. I'm hoping to be able to change that, but I can't speak to any timelines at present. Edit: I think the point I'm trying to make here is that the Go SDK tries to support Dataflow, but that Dataflow, as a paid service, doesn't support the Go SDK, as there are certain expectations once money gets involved. > Cannot run pipeline on Dataflow (GO SDK) > > > Key: BEAM-6745 > URL: https://issues.apache.org/jira/browse/BEAM-6745 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-go >Reporter: Michael Chemani >Priority: Major > > I got > ``` > {{Failed to retrieve staged files: failed to retrieve worker in 3 attempts: > bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ; > bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ; > bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ; > bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want}} > ``` > > When trying to run > ``` > {{dataflow \ --runner dataflow \ --index gs://\{BUCKET}/data_100k.csv \ > --output gs://\{BUCKET}/ \ --project {PROJECT} \ --temp_location > gs://\{BUCKET}/tmp/ \ --staging_location gs://\{BUCKET}/binaries/ \ > --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515}} > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface
[ https://issues.apache.org/jira/browse/BEAM-6502?focusedWorklogId=221390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221390 ] ASF GitHub Bot logged work on BEAM-6502: Author: ASF GitHub Bot Created on: 01/Apr/19 18:08 Start Date: 01/Apr/19 18:08 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8146: [BEAM-6502] Re-Remove runner time execution information from public API surface (now including Watch) URL: https://github.com/apache/beam/pull/8146#discussion_r270987570
## File path: sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackers.java ##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.fn.splittabledofn;
+
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.sdk.transforms.splittabledofn.Backlog;
+import org.apache.beam.sdk.transforms.splittabledofn.Backlogs;
+import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
+
+/** Support utilities for interacting with {@link RestrictionTracker RestrictionTrackers}. */
+public class RestrictionTrackers {
+
+  /** Interface allowing a runner to observe the calls to {@link RestrictionTracker#tryClaim}. */
+  public interface ClaimObserver<PositionT> {
+    /** Called when {@link RestrictionTracker#tryClaim} returns true. */
+    void onClaimed(PositionT position);
+
+    /** Called when {@link RestrictionTracker#tryClaim} returns false. */
+    void onClaimFailed(PositionT position);
+  }
+
+  /**
+   * A {@link RestrictionTracker} which forwards all calls to the delegate {@link
+   * RestrictionTracker}.
+   */
+  @ThreadSafe
+  private static class RestrictionTrackerObserver<RestrictionT, PositionT>
+      extends RestrictionTracker<RestrictionT, PositionT> {
+    protected final RestrictionTracker<RestrictionT, PositionT> delegate;
+    private final ClaimObserver<PositionT> claimObserver;
+
+    protected RestrictionTrackerObserver(
+        RestrictionTracker<RestrictionT, PositionT> delegate,
+        ClaimObserver<PositionT> claimObserver) {
+      this.delegate = delegate;
+      this.claimObserver = claimObserver;
+    }
+
+    @Override
+    public synchronized boolean tryClaim(PositionT position) {
+      if (delegate.tryClaim(position)) {
+        claimObserver.onClaimed(position);
+        return true;
+      } else {
+        claimObserver.onClaimFailed(position);
+        return false;
+      }
+    }
+
+    @Override
+    public synchronized RestrictionT currentRestriction() {
+      return delegate.currentRestriction();
+    }
+
+    @Override
+    public synchronized RestrictionT checkpoint() {
+      return delegate.checkpoint();
+    }
+
+    @Override
+    public synchronized void checkDone() throws IllegalStateException {
+      delegate.checkDone();
+    }
+  }
+
+  /**
+   * A {@link RestrictionTracker} which forwards all calls to the delegate backlog reporting {@link
+   * RestrictionTracker}.
+   */
+  @ThreadSafe
+  private static class RestrictionTrackerObserverWithBacklog<RestrictionT, PositionT>
+      extends RestrictionTrackerObserver<RestrictionT, PositionT> implements Backlogs.HasBacklog {
+
+    protected RestrictionTrackerObserverWithBacklog(
+        RestrictionTracker<RestrictionT, PositionT> delegate,
+        ClaimObserver<PositionT> claimObserver) {
+      super(delegate, claimObserver);
+    }
+
+    @Override
+    public synchronized Backlog getBacklog() {
+      return ((Backlogs.HasBacklog) delegate).getBacklog();
+    }
+  }
+
+  /**
+   * A {@link RestrictionTracker} which forwards all calls to the delegate partitioned backlog
+   * reporting {@link RestrictionTracker}.
+   */
+  @ThreadSafe
+  private static class RestrictionTrackerObserverWithPartitionedBacklog<RestrictionT, PositionT>
+      extends RestrictionTrackerObserverWithBacklog<RestrictionT, PositionT>
+      implements Backlogs.HasPartitionedBacklog {
+
+    protected RestrictionTrackerObserverWithPartitionedBacklog(
+        RestrictionTracker<RestrictionT, PositionT> delegate,
+        ClaimObserver<PositionT> claimObserver) {
+      super(delegate, claimObserver);
+    }
+
+    @Override
+    public synchronized byte[] getBacklogPartition() {
+      return ((Backlogs.HasPartitionedBacklog) delegate).getBacklogPartition();
+    }
+  }
+
+  /**
+   * Returns a thread safe {@
[jira] [Work logged] (BEAM-6753) Create proto representation for schemas
[ https://issues.apache.org/jira/browse/BEAM-6753?focusedWorklogId=221391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221391 ] ASF GitHub Bot logged work on BEAM-6753: Author: ASF GitHub Bot Created on: 01/Apr/19 18:08 Start Date: 01/Apr/19 18:08 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #7952: [BEAM-6753] Set the stage to make schema coder update compatible URL: https://github.com/apache/beam/pull/7952#discussion_r270987733 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java ## @@ -285,15 +289,24 @@ public InstrumentedType prepare(InstrumentedType instrumentedType) { // per-field Coders. static Row decodeDelegate(Schema schema, Coder[] coders, InputStream inputStream) throws IOException { + int fieldCount = VAR_INT_CODER.decode(inputStream); Review comment: Unfortunately no. BitSet.size() returns the total space used, which rounds up to the next word size. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221391) Time Spent: 50m (was: 40m) > Create proto representation for schemas > --- > > Key: BEAM-6753 > URL: https://issues.apache.org/jira/browse/BEAM-6753 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
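The distinction made in the review comment above can be checked with the JDK alone. This is a minimal sketch, assuming a hypothetical row with 5 fields of which field 1 is null; the class and method names are illustrative and are not part of RowCoderGenerator.

```java
import java.util.BitSet;

public class BitSetSizeSketch {
    /** Returns {size, length, cardinality} for a null-field bitmap of a row with fieldCount fields. */
    static int[] measure(int fieldCount, int... nullFieldIndexes) {
        BitSet nullFields = new BitSet(fieldCount);
        for (int index : nullFieldIndexes) {
            nullFields.set(index);
        }
        return new int[] {nullFields.size(), nullFields.length(), nullFields.cardinality()};
    }

    public static void main(String[] args) {
        int[] m = measure(5, 1);
        // size() is the allocated storage in bits, length() is the highest set bit plus one,
        // cardinality() is the number of set bits.
        System.out.println("size=" + m[0] + " length=" + m[1] + " cardinality=" + m[2]);
    }
}
```

None of the three values recovers the logical field count of 5: size() reports allocated storage rounded up to whole words, length() depends on which bits happen to be set, and cardinality() only counts the set bits. That is consistent with the diff reading the field count as a separate varint rather than deriving it from the BitSet.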
[jira] [Work logged] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface
[ https://issues.apache.org/jira/browse/BEAM-6502?focusedWorklogId=221387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221387 ] ASF GitHub Bot logged work on BEAM-6502: Author: ASF GitHub Bot Created on: 01/Apr/19 18:04 Start Date: 01/Apr/19 18:04 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8146: [BEAM-6502] Re-Remove runner time execution information from public API surface (now including Watch) URL: https://github.com/apache/beam/pull/8146#discussion_r270979431 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroIOTest.java ## @@ -512,6 +513,7 @@ public GenericClass apply(Long i) { } @Test +@Ignore("https://issues.apache.org/jira/browse/BEAM-6352") Review comment: I missed it, thought only WatchTest was impacted. Fixed here and elsewhere as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221387) Time Spent: 2h (was: 1h 50m) > SplittableDoFn: Re-Remove runner time execution information from public API > surface > --- > > Key: BEAM-6502 > URL: https://issues.apache.org/jira/browse/BEAM-6502 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Labels: triaged > Time Spent: 2h > Remaining Estimate: 0h > > Move the setting of "claim observers" within RestrictionTracker to another > location to clean up the RestrictionTracker interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
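The idea in the BEAM-6502 description, observing claims from outside the tracker instead of through the tracker's own interface, can be sketched with a plain delegating wrapper. This is a simplified illustration and not the Beam API: `Tracker`, `RangeTracker` and `ObservingTracker` are hypothetical types introduced here, and only `tryClaim` is modelled.

```java
import java.util.ArrayList;
import java.util.List;

public class ClaimObserverSketch {
    /** Hypothetical, reduced tracker interface; the real RestrictionTracker has more methods. */
    interface Tracker<P> {
        boolean tryClaim(P position);
    }

    /** Callback a runner could register to see the outcome of every claim. */
    interface ClaimObserver<P> {
        void onClaimed(P position);

        void onClaimFailed(P position);
    }

    /** Wraps any tracker, forwards tryClaim and reports the result to the observer. */
    static final class ObservingTracker<P> implements Tracker<P> {
        private final Tracker<P> delegate;
        private final ClaimObserver<P> observer;

        ObservingTracker(Tracker<P> delegate, ClaimObserver<P> observer) {
            this.delegate = delegate;
            this.observer = observer;
        }

        @Override
        public synchronized boolean tryClaim(P position) {
            boolean claimed = delegate.tryClaim(position);
            if (claimed) {
                observer.onClaimed(position);
            } else {
                observer.onClaimFailed(position);
            }
            return claimed;
        }
    }

    /** Toy tracker that accepts positions in [0, end). */
    static final class RangeTracker implements Tracker<Long> {
        private final long end;

        RangeTracker(long end) {
            this.end = end;
        }

        @Override
        public boolean tryClaim(Long position) {
            return position >= 0 && position < end;
        }
    }

    /** Claims positions 0, 1 and 2 against the range [0, 2) and records what the observer saw. */
    static List<String> run() {
        List<String> events = new ArrayList<>();
        Tracker<Long> tracker =
            new ObservingTracker<>(
                new RangeTracker(2),
                new ClaimObserver<Long>() {
                    @Override
                    public void onClaimed(Long position) {
                        events.add("claimed:" + position);
                    }

                    @Override
                    public void onClaimFailed(Long position) {
                        events.add("failed:" + position);
                    }
                });
        tracker.tryClaim(0L);
        tracker.tryClaim(1L);
        tracker.tryClaim(2L);
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the observer lives in the wrapper, the wrapped tracker needs no knowledge of who is watching, which is the kind of interface clean-up the issue asks for.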
[jira] [Assigned] (BEAM-6830) Direct runner throws error "http 404 not found" when reading Bigquery tables from any other region than "US"
[ https://issues.apache.org/jira/browse/BEAM-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aizhamal Nurmamat kyzy reassigned BEAM-6830: Assignee: Pablo Estrada (was: Aizhamal Nurmamat kyzy) > Direct runner throws error "http 404 not found" when reading Bigquery tables > from any other region than "US" > > > Key: BEAM-6830 > URL: https://issues.apache.org/jira/browse/BEAM-6830 > Project: Beam > Issue Type: Bug > Components: beam-events >Reporter: Suraj >Assignee: Pablo Estrada >Priority: Major > > When trying to read bigquery table located in region "asia-southeast-1" using > DirectRunner ,it throws error "http 404 not found" but same code works when > run using DataflowRunner -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=221379&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221379 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 01/Apr/19 17:52 Start Date: 01/Apr/19 17:52 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8178: [BEAM-5723] Do not use the default shadow closure when building CassandraIO URL: https://github.com/apache/beam/pull/8178#issuecomment-478679044
Hmm, I am not exactly sure why, but the failure is listing essentially every class. It may have to do with not having this block:
```
dependencies {
  include(dependency(project.library.java.guava))
}
```
Everything else in the `DEFAULT_SHADOW_CLOSURE` is just relocations of Guava. Almost everything causing the relocation error should be OK, and should not be in the shaded jar.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221379) Time Spent: 9h 20m (was: 9h 10m) > CassandraIO is broken because of use of bad relocation of guava > --- > > Key: BEAM-5723 > URL: https://issues.apache.org/jira/browse/BEAM-5723 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0 >Reporter: Arun sethia >Assignee: Ismaël Mejía >Priority: Major > Fix For: 2.12.0 > > Time Spent: 9h 20m > Remaining Estimate: 0h > > While using Apache Beam to run a Dataflow job that reads data from BigQuery and > stores/writes it to Cassandra with the following libraries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client -1.25.0 > > I am getting the following error when inserting/saving data to Cassandra. 
> {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
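The NoSuchMethodError above follows from how the JVM resolves a call: the method name and its full descriptor, return type included, must match what the target class declares. The sketch below only manipulates descriptor strings to show the mismatch; it is an assumption that the unshaded driver's `Mapper.saveAsync` declares Guava's unrelocated `ListenableFuture` as its return type, and `relocate` is a hypothetical helper imitating what a shading relocation rule does to references in the shaded module's bytecode.

```java
public class RelocationDescriptorSketch {
    static final String GUAVA_PREFIX = "com/google/common/";
    static final String RELOCATED_PREFIX =
        "org/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/";

    // Descriptor assumed to be declared by the driver jar, which is not relocated.
    static final String DECLARED_BY_DRIVER =
        "(Ljava/lang/Object;)Lcom/google/common/util/concurrent/ListenableFuture;";

    /** Imitates a relocation rule rewriting Guava references at a call site in the shaded module. */
    static String relocate(String descriptor) {
        return descriptor.replace(GUAVA_PREFIX, RELOCATED_PREFIX);
    }

    /** Resolution succeeds only if the descriptor requested by the caller equals the declared one. */
    static boolean resolves(String declared, String requested) {
        return declared.equals(requested);
    }

    public static void main(String[] args) {
        String requestedByShadedCaller = relocate(DECLARED_BY_DRIVER);
        System.out.println("requested: saveAsync" + requestedByShadedCaller);
        System.out.println("resolves:  " + resolves(DECLARED_BY_DRIVER, requestedByShadedCaller));
    }
}
```

The relocated descriptor produced here is the one printed in the stack trace, which is why a relocation applied to the IO module but not to the driver it calls leaves the call site pointing at a method that does not exist.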
[jira] [Work logged] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface
[ https://issues.apache.org/jira/browse/BEAM-6502?focusedWorklogId=221378&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221378 ] ASF GitHub Bot logged work on BEAM-6502: Author: ASF GitHub Bot Created on: 01/Apr/19 17:47 Start Date: 01/Apr/19 17:47 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8146: [BEAM-6502] Re-Remove runner time execution information from public API surface (now including Watch) URL: https://github.com/apache/beam/pull/8146#discussion_r270979431 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroIOTest.java ## @@ -512,6 +513,7 @@ public GenericClass apply(Long i) { } @Test +@Ignore("https://issues.apache.org/jira/browse/BEAM-6352") Review comment: I missed it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221378) Time Spent: 1h 50m (was: 1h 40m) > SplittableDoFn: Re-Remove runner time execution information from public API > surface > --- > > Key: BEAM-6502 > URL: https://issues.apache.org/jira/browse/BEAM-6502 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Labels: triaged > Time Spent: 1h 50m > Remaining Estimate: 0h > > Move the setting of "claim observers" within RestrictionTracker to another > location to clean up the RestrictionTracker interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6957) Spark Portable Runner: Support metrics
Kyle Weaver created BEAM-6957: - Summary: Spark Portable Runner: Support metrics Key: BEAM-6957 URL: https://issues.apache.org/jira/browse/BEAM-6957 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Kyle Weaver Assignee: Kyle Weaver -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=221376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221376 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 01/Apr/19 17:45 Start Date: 01/Apr/19 17:45 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8178: [BEAM-5723] Do not use the default shadow closure when building CassandraIO URL: https://github.com/apache/beam/pull/8178#issuecomment-478676487 Ah, no it is just that excludes are not additive to the default set. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221376) Time Spent: 9h 10m (was: 9h)
[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221375 ] ASF GitHub Bot logged work on BEAM-6953: Author: ASF GitHub Bot Created on: 01/Apr/19 17:44 Start Date: 01/Apr/19 17:44 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #8188: [BEAM-6953] Make bq constants args URL: https://github.com/apache/beam/pull/8188#discussion_r270978470 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java ## @@ -53,4 +53,16 @@ Integer getInsertBundleParallelism(); void setInsertBundleParallelism(Integer parallelism); + + @Description("The number of buckets used per table when doing streaming inserts to BigQuery.") + @Default.Integer(50) + Integer getNumStreamingBuckets(); Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221375) Time Spent: 0.5h (was: 20m) > BigQueryIO has constants that should be PipelineOptions > --- > > Key: BEAM-6953 > URL: https://issues.apache.org/jira/browse/BEAM-6953 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Reuven Lax >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface
[ https://issues.apache.org/jira/browse/BEAM-6502?focusedWorklogId=221374&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221374 ] ASF GitHub Bot logged work on BEAM-6502: Author: ASF GitHub Bot Created on: 01/Apr/19 17:44 Start Date: 01/Apr/19 17:44 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8146: [BEAM-6502] Re-Remove runner time execution information from public API surface (now including Watch) URL: https://github.com/apache/beam/pull/8146#discussion_r270978398 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java ## @@ -228,6 +253,23 @@ Instant getWatermark() { } return res; } + + @Override + public boolean equals(Object o) { Review comment: There was no difference, so I swapped to use AutoValue. It's just a remnant of the old way this was written. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221374) Time Spent: 1h 40m (was: 1.5h) > SplittableDoFn: Re-Remove runner time execution information from public API > surface > --- > > Key: BEAM-6502 > URL: https://issues.apache.org/jira/browse/BEAM-6502 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Labels: triaged > Time Spent: 1h 40m > Remaining Estimate: 0h > > Move the setting of "claim observers" within RestrictionTracker to another > location to clean up the RestrictionTracker interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=221373&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221373 ] ASF GitHub Bot logged work on BEAM-5723: Author: ASF GitHub Bot Created on: 01/Apr/19 17:43 Start Date: 01/Apr/19 17:43 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8178: [BEAM-5723] Do not use the default shadow closure when building CassandraIO URL: https://github.com/apache/beam/pull/8178#issuecomment-478675846 This actually exposed a huge bug in our build, if I understand it. It isn't just the Guava classes used here, but all of the ones relocated by any other module. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221373) Time Spent: 9h (was: 8h 50m)
[jira] [Created] (BEAM-6956) --experiments=worker_threads=100 issue
Jiayi Zhao created BEAM-6956: Summary: --experiments=worker_threads=100 issue Key: BEAM-6956 URL: https://issues.apache.org/jira/browse/BEAM-6956 Project: Beam Issue Type: New Feature Components: runner-flink Reporter: Jiayi Zhao I noticed that without "--experiments=worker_threads=100" the pipeline gets stuck. The weird thing is that when I tried a more complex pipeline using the in-thread Flink method (./gradlew :beam-runners-flink_2.11-job-server:runShadow), "--experiments=worker_threads=100" did not work, but "--experiments=worker_threads=1000" worked fine. Then I tried the same pipeline using a separate local Flink cluster (./gradlew :beam-runners-flink_2.11-job-server:runShadow -PflinkMasterUrl=localhost:8081); the Flink version is 1.5.6 (other versions don't work, see BEAM-6915). Neither "--experiments=worker_threads=1000" nor "--experiments=worker_threads=1" worked: the pipeline gets stuck at a certain stage (it shows as running in the Flink UI but never finishes). Is there any real fix for this? Thanks! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6865) Refactor common portable runner infrastructure for better reuse
[ https://issues.apache.org/jira/browse/BEAM-6865?focusedWorklogId=221370&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221370 ] ASF GitHub Bot logged work on BEAM-6865: Author: ASF GitHub Bot Created on: 01/Apr/19 17:35 Start Date: 01/Apr/19 17:35 Worklog Time Spent: 10m Work Description: ibzib commented on issue #8176: [BEAM-6865] Move MetricsApi updates from flink.metrics to core.metrics URL: https://github.com/apache/beam/pull/8176#issuecomment-478673312 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221370) Time Spent: 2h (was: 1h 50m) > Refactor common portable runner infrastructure for better reuse > --- > > Key: BEAM-6865 > URL: https://issues.apache.org/jira/browse/BEAM-6865 > Project: Beam > Issue Type: Improvement > Components: runner-flink, runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > The Flink runner is currently Beam's most mature portable OSS runner. Much of > the Flink portable runner's implementation details are not unique to Flink, > and yet are confined to the Flink runner code. In order to ease development > on other portable runners such as the Spark runner, this reusable code should > be moved into some common location. > I've set this up to track my progress on these ongoing improvements. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6915) Issue when run pipeline on a separate Flink cluster
[ https://issues.apache.org/jira/browse/BEAM-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807015#comment-16807015 ] Jiayi Zhao commented on BEAM-6915: -- I git clone the head beam, Flink 1.7.2 doesn't work (above error), Flink 1.5.4 works except pipeline.wait_until_finish() call has a exception, Flink 1.5.6 works fine for a simple word count example. > Issue when run pipeline on a separate Flink cluster > --- > > Key: BEAM-6915 > URL: https://issues.apache.org/jira/browse/BEAM-6915 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Jiayi Zhao >Priority: Major > > First I tried a simple pipeline on the JobService endpoint created by: > ./gradlew :beam-runners-flink_2.11-job-server:runShadow > it works, then I tried the following examples: > _To run on a separate [Flink > cluster|https://ci.apache.org/projects/flink/flink-docs-release-1.5/quickstart/setup_quickstart.html]:_ > _1. Start a Flink cluster which exposes the Rest interface on > {{localhost:8081}} by default._ > _2. Start JobService with Flink Rest endpoint: {{./gradlew > :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081}}._ > _3. Submit the pipeline as above._ > when I run the pipeline in another console, the jobService console shows > following errors, any ideas? 
> > _$ ./gradlew :beam-runners-flink_2.11-job-server:runShadow > -PflinkMasterUrl=localhost:8081_ > _Configuration on demand is an incubating feature._ > _> Task :beam-runners-flink_2.11-job-server:runShadow_ > _Listening for transport dt_socket at address: 5005_ > _[main] INFO > org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - > ArtifactStagingService started on localhost:8098_ > _[main] INFO > org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - Java > ExpansionService started on localhost:8097_ > _[main] INFO > org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - > JobService started on localhost:8099_ > _[grpc-default-executor-0] ERROR > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService - > Encountered Unexpected Exception for Invocation > job_e3ca1015-d683-47df-beb5-104ccbb5a457_ > _org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusException: NOT_FOUND_ > _at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asException(Status.java:534)_ > _at > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getInvocation(InMemoryJobService.java:341)_ > _at > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getStateStream(InMemoryJobService.java:262)_ > _at > org.apache.beam.model.jobmanagement.v1.JobServiceGrpc$MethodHandlers.invoke(JobServiceGrpc.java:770)_ > _at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)_ > _at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)_ > _at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)_ > _at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)_ > _at > 
org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)_ > _at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)_ > _at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)_ > _at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)_ > _at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)_ > _at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)_ > _at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)_ > _at java.lang.Thread.run(Thread.java:748)_ > _[grpc-default-executor-0] INFO org.apache.beam.runners.flink.FlinkJobInvoker > - Invoking job > BeamApp-jyzhao-0326181339-95e448eb_3b2a57a8-c8bc-463e-8be6-47890deb48b4_ > _[grpc-default-executor-0] INFO > org.apache.beam.runners.fnexecution.jobsubmission.JobInvocation - Starting > job invocation > BeamApp-jyzhao-0326181339-95e448eb_3b2a57a8-c8bc-463e-8be6-47890deb48b4_ > _[flink-runner-job-invoker] INFO > org.apache.beam.runners.flink.FlinkPipelineRunner - Translating pipeline to > Flink program._ > _[flink-runner-job-invoker] INFO > org.apache.beam.r
[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221365 ] ASF GitHub Bot logged work on BEAM-6953: Author: ASF GitHub Bot Created on: 01/Apr/19 17:32 Start Date: 01/Apr/19 17:32 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #8188: [BEAM-6953] Make bq constants args URL: https://github.com/apache/beam/pull/8188#discussion_r270974111 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java ## @@ -53,4 +53,16 @@ Integer getInsertBundleParallelism(); void setInsertBundleParallelism(Integer parallelism); + + @Description("The number of buckets used per table when doing streaming inserts to BigQuery.") + @Default.Integer(50) + Integer getNumStreamingBuckets(); Review comment: I think it is failing because you have to write the setter boilerplate. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221365) Time Spent: 20m (was: 10m) > BigQueryIO has constants that should be PipelineOptions > --- > > Key: BEAM-6953 > URL: https://issues.apache.org/jira/browse/BEAM-6953 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Reuven Lax >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6952) concatenated compressed files bug with python sdk
[ https://issues.apache.org/jira/browse/BEAM-6952?focusedWorklogId=221364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221364 ] ASF GitHub Bot logged work on BEAM-6952: Author: ASF GitHub Bot Created on: 01/Apr/19 17:32 Start Date: 01/Apr/19 17:32 Worklog Time Spent: 10m Work Description: dlesco commented on issue #8187: [BEAM-6952] concatenated compressed files bug with python sdk URL: https://github.com/apache/beam/pull/8187#issuecomment-478672026 Updated branch with commit to replace in the unit test the use of xrange with six.moves.range, so that the unit test will run under Python 3. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221364) Time Spent: 20m (was: 10m) > concatenated compressed files bug with python sdk > - > > Key: BEAM-6952 > URL: https://issues.apache.org/jira/browse/BEAM-6952 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 >Reporter: Daniel Lescohier >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > The Python apache_beam.io.filesystem module has a bug handling concatenated > compressed files. > The PR I will create has two commits: > # a new unit test that shows the problem > # a fix to the problem. > The unit test is added to the apache_beam.io.filesystem_test module. It was > added to this module because the test: > apache_beam.io.textio_test.test_read_gzip_concat does not encounter the > problem in the Beam 2.11 and earlier code base because the test data is too > small: the data is smaller than read_size, so it goes through logic in the > code that avoids the problem in the code. So, this test sets read_size > smaller and test data bigger, in order to encounter the problem. 
It would be > difficult to test in the textio_test module, because you'd need very large > test data because default read_size is 16MiB, and the ReadFromText interface > does not allow you to modify the read_size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
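The failure mode described above (a read size smaller than the data, with several compressed members back to back) can be illustrated outside of Beam. The following is a minimal standard-library sketch, not the code from the PR: the helper name `read_concatenated_gzip` and the tiny `read_size` values are assumptions made for illustration. The point is that bytes left over after one gzip member ends (`unused_data`) must be fed to a fresh decompressor rather than dropped.

```python
import gzip
import io
import zlib


def read_concatenated_gzip(raw, read_size=16):
    """Decompress a stream of back-to-back gzip members in small chunks.

    Hypothetical helper for illustration; it is not Beam's implementation.
    """
    out = []
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    buf = b""
    while True:
        if not buf:
            buf = raw.read(read_size)
            if not buf:
                break
        out.append(decomp.decompress(buf))
        if decomp.eof:
            # Bytes after the end of this member belong to the next member.
            buf = decomp.unused_data
            decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        else:
            buf = b""
    return b"".join(out)


# Three gzip members concatenated, each larger than the chosen read sizes.
expected = b"".join(("line %d\n" % i).encode() * 50 for i in range(3))
payload = b"".join(
    gzip.compress(("line %d\n" % i).encode() * 50) for i in range(3)
)
```

As in the unit test described in the issue, the read size is deliberately set far below the data size so that member boundaries fall in the middle of a chunk.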
[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property
[ https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balázs Németh updated BEAM-6954: Description: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - when we serialize the json, it shouldn't serialize null values at [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655] Instinctively I would have expected the @Default.* annotations producing values every single time, when the value is null - so a property with a @Default.* annotation can't be null - but I was unable to find anything explicit regarding this in the documentation. So I'm not sure which of the suggested change has to be made. 
--- Okay, I have investigated further, and it seems the default value is indeed calculated before the json serialization by calling the mentioned method. The problem is that it returns a RuntimeValueProvider, which gets serialized as null ( https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L275-L279 ), because the isAccessible returns false ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L248-L250] + [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L214-L220] ) was: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. 
I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - when we serialize the json, it shouldn't serialize null values at https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655 Instinctively I would have expected the @Default.* annotations producing values every single time, when the value is null - so a property with a @Default.* annotation can't be null - but I was unable to find anything explicit regarding this in the documentation. So I'm not sure which of the suggested change has to be made. > @Default not called if the options json has null value for a property > - > > Key: BEAM-6954 > URL: https://issues.apache.org/jira/browse/BEAM-6954 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.11.0 >Reporter: Balázs Németh >Priority:
[jira] [Commented] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807009#comment-16807009 ] Michael Luckey commented on BEAM-4046: -- Thanks, Luke, for pointing that out. Indeed, I have not yet tested this deeply, so those issues may turn out to be difficult to solve. Regarding backwards compatibility, the 'best' I came up with, apart from shipping our own Gradle distribution with hacked task resolution or rewriting Gradle-wrapper.jar, is to hook into the parameter resolution of gradlew and gradlew.bat. This would mean (temporarily!) replacing the current scripts with an enhanced implementation that maps any parameter starting with ':beam-' to the corresponding target project. This could possibly work, but feels *really* hacky :( > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Priority: Major > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6872) Add hook for user-defined JVM initialization in workers
[ https://issues.apache.org/jira/browse/BEAM-6872?focusedWorklogId=221363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221363 ] ASF GitHub Bot logged work on BEAM-6872: Author: ASF GitHub Bot Created on: 01/Apr/19 17:14 Start Date: 01/Apr/19 17:14 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8104: [BEAM-6872] Add hook for user-defined JVM initialization in workers URL: https://github.com/apache/beam/pull/8104#discussion_r270967393 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkerHarnessHelper.java ## @@ -95,6 +97,17 @@ public static void configureLogging(DataflowWorkerHarnessOptions pipelineOptions DataflowWorkerLoggingInitializer.configure(pipelineOptions); } + public static void runUserDefinedInitialization() { +ServiceLoader loader = Review comment: Sounds worthwhile. Since it has been copied a few times now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 221363) Time Spent: 1h 10m (was: 1h) > Add hook for user-defined JVM initialization in workers > --- > > Key: BEAM-6872 > URL: https://issues.apache.org/jira/browse/BEAM-6872 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Expose an interface for users to run some one-time initialization code when a > worker starts up. > This can be useful for things like overriding the Default ZoneRulesProvider, > or setting up custom SSL providers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806991#comment-16806991 ] Luke Cwik edited comment on BEAM-4046 at 4/1/19 5:06 PM: - Michael, what you're suggesting makes sense to me. We also ran into two other problems: * we had more than one Gradle project with the same directory name even though they were under different parent folders (I think it was "core"), and that was leading to some strange build-time behavior. * we use the project names during [javadoc generation|https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle]. was (Author: lcwik): Michael, what you're suggesting makes sense to me. We also ran into two other problems: During the gradle migration, we used to have something like: * we had more than one Gradle project with the same directory name even though they were under different parent folders (I think it was "core"), and that was leading to some strange build-time behavior. * we use the project names during [javadoc generation|https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle]. > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Priority: Major > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. 
> Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4046) Decouple gradle project names and maven artifact ids
[ https://issues.apache.org/jira/browse/BEAM-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806991#comment-16806991 ] Luke Cwik commented on BEAM-4046: - Michael, what you're suggesting makes sense to me. We also ran into two other problems: During the gradle migration, we used to have something like: * we had more than one Gradle project with the same directory name even though they were under different parent folders (I think it was "core"), and that was leading to some strange build-time behavior. * we use the project names during [javadoc generation|https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle]. > Decouple gradle project names and maven artifact ids > > > Key: BEAM-4046 > URL: https://issues.apache.org/jira/browse/BEAM-4046 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Kenneth Knowles >Priority: Major > > In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. > It is clumsy and requires a hacky settings.gradle that is not idiomatic. > In our second draft, we changed them to names that work well with Gradle, > like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky. > In our third draft, we regressed to the first draft to get the Maven artifact > ids right. > These should be able to be decoupled. It seems there are many StackOverflow > questions on the subject. > Since it is unidiomatic and a poor user experience, if it does turn out to be > mandatory then it needs to be documented inline everywhere - the > settings.gradle should say why it is so bizarre, and each build.gradle should > indicate what its project id is. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6955) Support Dataflow --sdk_location with modified version number
[ https://issues.apache.org/jira/browse/BEAM-6955?focusedWorklogId=221356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221356 ] ASF GitHub Bot logged work on BEAM-6955: Author: ASF GitHub Bot Created on: 01/Apr/19 17:00 Start Date: 01/Apr/19 17:00 Worklog Time Spent: 10m Work Description: dlesco commented on pull request #8189: [BEAM-6955] Support Dataflow --sdk_location with modified version number URL: https://github.com/apache/beam/pull/8189 Determine the version tag to use for the Google Container Registry, for the service image versions to use on the Dataflow worker nodes. Users of Dataflow may be using a locally-modified version of Apache Beam, which they submit to Dataflow with the --sdk_location option. Those users would most likely modify the version number of Apache Beam, so they can distinguish it from the public distribution of Apache Beam. However, the remote nodes in Dataflow still need to bootstrap the worker service with a Docker image for which a version tag exists. The most appropriate way for system integrators to modify the Apache Beam version number would be to add a Local Version Identifier: https://www.python.org/dev/peps/pep-0440/#local-version-identifiers If people only use Local Version Identifiers, then we could use the "public" attribute of the pkg_resources version object. If people instead use a post-release version identifier: https://www.python.org/dev/peps/pep-0440/#post-releases then only the "base_version" attribute would work for both of these version number changes. Since the Dataflow documentation does not specify how to modify version numbers, I am choosing to use the "base_version" attribute. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). 
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https:/
[jira] [Created] (BEAM-6955) Support Dataflow --sdk_location with modified version number
Daniel Lescohier created BEAM-6955: -- Summary: Support Dataflow --sdk_location with modified version number Key: BEAM-6955 URL: https://issues.apache.org/jira/browse/BEAM-6955 Project: Beam Issue Type: Bug Components: runner-dataflow Affects Versions: 2.11.0 Reporter: Daniel Lescohier Support Dataflow --sdk_location with modified version number Determine the version tag to use for the Google Container Registry, for the service image versions to use on the Dataflow worker nodes. Users of Dataflow may be using a locally-modified version of Apache Beam, which they submit to Dataflow with the --sdk_location option. Those users would most likely modify the version number of Apache Beam, so they can distinguish it from the public distribution of Apache Beam. However, the remote nodes in Dataflow still need to bootstrap the worker service with a Docker image for which a version tag exists. The most appropriate way for system integrators to modify the Apache Beam version number would be to add a Local Version Identifier: https://www.python.org/dev/peps/pep-0440/#local-version-identifiers If people only use Local Version Identifiers, then we could use the "public" attribute of the pkg_resources version object. If people instead use a post-release version identifier: https://www.python.org/dev/peps/pep-0440/#post-releases then only the "base_version" attribute would work for both of these version number changes. Since the Dataflow documentation does not specify how to modify version numbers, I am choosing to use the "base_version" attribute. Will shortly submit a PR with the change. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
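To make the difference between the "public" and "base_version" forms concrete, here is a small sketch. It is an assumption-laden stand-in, not the PR's code: rather than using the pkg_resources version object mentioned above, it uses a simplified standard-library regular expression (no PEP 440 normalization), and the helper name `split_version` and the sample version strings are hypothetical.

```python
import re

# Simplified PEP 440 shape: optional epoch, release segment, any pre/post/dev
# suffix, then an optional "+local" part. Real parsers also normalize input.
_VERSION_RE = re.compile(
    r"^(?:(?P<epoch>\d+)!)?"
    r"(?P<release>\d+(?:\.\d+)*)"
    r"(?P<rest>[^+]*)"
    r"(?:\+(?P<local>.+))?$"
)


def split_version(version):
    """Return (public, base_version) for a version string.

    Hypothetical helper: "public" drops only the local identifier, while
    "base_version" keeps just the epoch and the release segment.
    """
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError("unrecognized version: %r" % version)
    epoch = match.group("epoch")
    prefix = epoch + "!" if epoch else ""
    base_version = prefix + match.group("release")
    public = base_version + match.group("rest")
    return public, base_version


# A local version identifier: both forms give the upstream release.
local_only = split_version("2.11.0+mycompany.1")
# A post-release: only base_version maps back to the upstream release.
post_release = split_version("2.11.0.post1")
```

With a local identifier both values are the upstream release, while with a post-release only the base version is, which matches the reasoning given for choosing "base_version".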
[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property
[ https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balázs Németh updated BEAM-6954: Component/s: (was: runner-dataflow) > @Default not called if the options json has null value for a property > - > > Key: BEAM-6954 > URL: https://issues.apache.org/jira/browse/BEAM-6954 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.11.0 >Reporter: Balázs Németh >Priority: Major > > When a pipeline options get deserialized from a json with > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] > it creates a map, where properties present in the json - even if with a null > value - will be added to the map. > So we can have String->NullNode mappings. > When you create a ProxyInvocationHandler with this Map ( > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] > ) this map will be the backing jsonOptions map. > Later on when a getter is called on the options it will reach this code: > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] > > Then the containsKey will return true, even for NullNodes. So we won't > execute the getDefault() method hence not calculating the default value. 
> > I'm not sure about the expected behaviour, but either: > - the containsKey check should be expanded with an !isNull check > OR > - when we serialize the json, it shouldn't serialize null values at > https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655 > > Instinctively I would have expected the @Default.* annotations producing > values every single time, when the value is null - so a property with a > @Default.* annotation can't be null - but I was unable to find anything > explicit regarding this in the documentation. So I'm not sure which of the > suggested change has to be made. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
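The first suggested change (extending the containsKey check with a null check) can be sketched by analogy. The snippet below is Python rather than the Java of ProxyInvocationHandler, and every name in it (`get_option`, `get_option_null_aware`, the `numWorkers` property, the default of 3) is a hypothetical stand-in; it only shows why a key that is present with a null value suppresses the default.

```python
import json


def get_option(json_options, name, default_factory):
    """Analogue of the reported behaviour: key presence alone wins."""
    if name in json_options:
        # A JSON null is still "present", so the default is never computed.
        return json_options[name]
    return default_factory()


def get_option_null_aware(json_options, name, default_factory):
    """Analogue of the proposed fix: treat an explicit null as absent."""
    value = json_options.get(name)
    if value is not None:
        return value
    return default_factory()


# Hypothetical serialized options in which a property was written as null.
options = json.loads('{"numWorkers": null}')

without_fix = get_option(options, "numWorkers", lambda: 3)
with_fix = get_option_null_aware(options, "numWorkers", lambda: 3)
```

This covers only the lookup side; the alternative suggestion, not serializing null values in the first place, would leave the lookup unchanged.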
[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property
[ https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balázs Németh updated BEAM-6954: Description: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - when we serialize the json, it shouldn't serialize null values at https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655 Instinctively I would have expected the @Default.* annotations producing values every single time, when the value is null - so a property with a @Default.* annotation can't be null - but I was unable to find anything explicit regarding this in the documentation. So I'm not sure which of the suggested change has to be made. 
was: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - the dataflow runner have to be modified so it doesn't persist null values at options here: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L216-L222] Instinctively I would have expected the @Default.* annotations producing values every single time, when the value is null - so a property with a @Default.* annotation can't be null - but I was unable to find anything explicit regarding this in the documentation. So I'm not sure which of the suggested change has to be made. 
> @Default not called if the options json has null value for a property > - > > Key: BEAM-6954 > URL: https://issues.apache.org/jira/browse/BEAM-6954 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-java-core >Affects Versions: 2.11.0 >Reporter: Balázs Németh >Priority: Major > > When a pipeline options get deserialized from a json with > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] > it creates a map, where properties present in the json - even if with a null > value - will be added to the map. > So we can have String->NullNode mappings. > When you create a ProxyInvocationHandler with this Map ( > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] > ) this map will be the backing jsonOptions map. > Later on when a getter is
[jira] [Work logged] (BEAM-5959) Add Cloud KMS support to GCS copies
[ https://issues.apache.org/jira/browse/BEAM-5959?focusedWorklogId=221354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221354 ]

ASF GitHub Bot logged work on BEAM-5959:

Author: ASF GitHub Bot
Created on: 01/Apr/19 16:54
Start Date: 01/Apr/19 16:54
Worklog Time Spent: 10m
Work Description: udim commented on pull request #7744: [BEAM-5959] KMS support for BigQuery
URL: https://github.com/apache/beam/pull/7744#discussion_r270960080

##
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
##
@@ -1534,6 +1550,10 @@ static String getExtractDestinationUri(String extractDestinationDir) {
 return toBuilder().setIgnoreUnknownValues(true).build();
 }

+Write withKmsKey(String kmsKey) {

Review comment:
@mayansalama I fixed this here: https://github.com/apache/beam/pull/8145. I should have updated you.
@kennknowles KMS support is too new to be in 2.7. :)

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 221354)
Time Spent: 31.5h (was: 31h 20m)

> Add Cloud KMS support to GCS copies
> ---
>
> Key: BEAM-5959
> URL: https://issues.apache.org/jira/browse/BEAM-5959
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp, sdk-py-core
> Reporter: Udi Meiri
> Assignee: Udi Meiri
> Priority: Major
> Labels: triaged
> Fix For: 2.11.0
>
> Time Spent: 31.5h
> Remaining Estimate: 0h
>
> Beam SDK currently uses the CopyTo GCS API call, which doesn't support
> copying objects that use Customer Managed Encryption Keys (CMEK).
> CMEKs are managed in Cloud KMS.
> Items (for Java and Python SDKs):
> - Update clients to versions that support KMS keys.
> - Change copyTo API calls to use rewriteTo (Python - directly, Java -
> possibly convert copyTo API call to use client library)
> - Add unit tests.
> - Add basic tests (DirectRunner and GCS buckets with CMEK).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property
[ https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balázs Németh updated BEAM-6954: Description: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - the dataflow runner have to be modified so it doesn't persist null values at options here: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L216-L222] Instinctively I would have expected the @Default.* annotations producing values every single time, when the value is null - so a property with a @Default.* annotation can't be null - but I was unable to find anything explicit regarding this in the documentation. So I'm not sure which of the suggested change has to be made. 
was: When a pipeline options get deserialized from a json with [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] it creates a map, where properties present in the json - even if with a null value - will be added to the map. So we can have String->NullNode mappings. When you create a ProxyInvocationHandler with this Map ( [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] ) this map will be the backing jsonOptions map. Later on when a getter is called on the options it will reach this code: [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] Then the containsKey will return true, even for NullNodes. So we won't execute the getDefault() method hence not calculating the default value. 
I'm not sure about the expected behaviour, but either: - the containsKey check should be expanded with an !isNull check OR - the dataflow runner have to be modified so it doesn't persist null values at options here: https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L216-L222 > @Default not called if the options json has null value for a property > - > > Key: BEAM-6954 > URL: https://issues.apache.org/jira/browse/BEAM-6954 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-java-core >Affects Versions: 2.11.0 >Reporter: Balázs Németh >Priority: Major > > When a pipeline options get deserialized from a json with > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760] > it creates a map, where properties present in the json - even if with a null > value - will be added to the map. > So we can have String->NullNode mappings. > When you create a ProxyInvocationHandler with this Map ( > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125] > ) this map will be the backing jsonOptions map. > Later on when a getter is called on the options it will reach this code: > [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158] > > Then the containsKey will return true, even for NullN
[jira] [Assigned] (BEAM-3489) Expose the message id of received messages within PubsubMessage
[ https://issues.apache.org/jira/browse/BEAM-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed El.Hussaini reassigned BEAM-3489:
---
Assignee: Ahmed El.Hussaini

> Expose the message id of received messages within PubsubMessage
> ---
>
> Key: BEAM-3489
> URL: https://issues.apache.org/jira/browse/BEAM-3489
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Luke Cwik
> Assignee: Ahmed El.Hussaini
> Priority: Minor
> Labels: newbie, starter
>
> This task is about passing forward the message id from the pubsub proto to
> the java PubsubMessage.
> Add a message id field to PubsubMessage.
> Update the coder for PubsubMessage to encode the message id.
> Update the translation from the Pubsub proto message to the Dataflow message:
> https://github.com/apache/beam/blob/2e275264b21db45787833502e5e42907b05e28b8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L976

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
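The first two steps of the task above (add a message id field, then encode it in the coder) might look roughly like the sketch below. This is only an illustration under stated assumptions: `MessageWithIdSketch`, its fields, and the wire format (length-prefixed payload followed by the id) are hypothetical and are not Beam's PubsubMessage or its coder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a message type that carries its id.
final class MessageWithIdSketch {
    final byte[] payload;
    final String messageId; // the new field the task asks for

    MessageWithIdSketch(byte[] payload, String messageId) {
        this.payload = payload;
        this.messageId = messageId;
    }

    // Assumed wire format: int length, payload bytes, then the id via writeUTF.
    static byte[] encode(MessageWithIdSketch message) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(message.payload.length);
            out.write(message.payload);
            out.writeUTF(message.messageId);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the fields back in the same order they were written.
    static MessageWithIdSketch decode(byte[] encoded) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return new MessageWithIdSketch(payload, in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MessageWithIdSketch original =
            new MessageWithIdSketch("hello".getBytes(StandardCharsets.UTF_8), "id-1");
        MessageWithIdSketch roundTripped = decode(encode(original));
        System.out.println(roundTripped.messageId);
    }
}
```

Appending the id after the payload changes the encoded form, so any real change along these lines would also have to consider compatibility with already-encoded messages.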
[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property
[ https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balázs Németh updated BEAM-6954:
Component/s: runner-dataflow

> @Default not called if the options json has null value for a property
> ---
>
> Key: BEAM-6954
> URL: https://issues.apache.org/jira/browse/BEAM-6954
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, sdk-java-core
> Affects Versions: 2.11.0
> Reporter: Balázs Németh
> Priority: Major
>
> When a pipeline options get deserialized from a json with
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
> it creates a map, where properties present in the json - even if with a null
> value - will be added to the map.
> So we can have String->NullNode mappings.
> When you create a ProxyInvocationHandler with this Map (
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
> ) this map will be the backing jsonOptions map.
> Later on when a getter is called on the options it will reach this code:
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
>
> Then the containsKey will return true, even for NullNodes. So we won't
> execute the getDefault() method hence not calculating the default value.
>
> I'm not sure about the expected behaviour, but either:
> - the containsKey check should be expanded with an !isNull check
> OR
> - the dataflow runner have to be modified so it doesn't persist null values
> at options here:
> https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L216-L222

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6954) @Default not called if the options json has null value for a property
Balázs Németh created BEAM-6954:
---

Summary: @Default not called if the options json has null value for a property
Key: BEAM-6954
URL: https://issues.apache.org/jira/browse/BEAM-6954
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Affects Versions: 2.11.0
Reporter: Balázs Németh

When a pipeline options get deserialized from a json with
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
it creates a map, where properties present in the json - even if with a null
value - will be added to the map.
So we can have String->NullNode mappings.
When you create a ProxyInvocationHandler with this Map (
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
) this map will be the backing jsonOptions map.
Later on when a getter is called on the options it will reach this code:
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]

Then the containsKey will return true, even for NullNodes. So we won't
execute the getDefault() method hence not calculating the default value.

I'm not sure about the expected behaviour, but either:
- the containsKey check should be expanded with an !isNull check
OR
- the dataflow runner have to be modified so it doesn't persist null values
at options here:
https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L216-L222

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
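The behaviour described in the report above, and the two suggested remedies, can be sketched without any Beam or Jackson dependency. Everything here is an assumption made for illustration: `NULL_NODE` is a plain sentinel standing in for Jackson's NullNode, the `jsonOptions` map is modelled as `Map<String, Object>`, and the method names are hypothetical; this is not the ProxyInvocationHandler code itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class NullDefaultSketch {
    // Assumption: a sentinel standing in for Jackson's NullNode.
    static final Object NULL_NODE = new Object();

    // Reported behaviour: key presence alone decides, so an explicit JSON null
    // suppresses the default (assuming a null node deserializes to null).
    static Object getReported(Map<String, Object> jsonOptions, String name,
                              Supplier<Object> getDefault) {
        if (jsonOptions.containsKey(name)) {
            Object value = jsonOptions.get(name);
            return value == NULL_NODE ? null : value;
        }
        return getDefault.get();
    }

    // First suggestion: extend the containsKey check with a not-null check.
    static Object getWithNullCheck(Map<String, Object> jsonOptions, String name,
                                   Supplier<Object> getDefault) {
        if (jsonOptions.containsKey(name) && jsonOptions.get(name) != NULL_NODE) {
            return jsonOptions.get(name);
        }
        return getDefault.get();
    }

    // Second suggestion: drop null-valued options before they are persisted.
    static Map<String, Object> withoutNulls(Map<String, Object> jsonOptions) {
        Map<String, Object> copy = new HashMap<>(jsonOptions);
        copy.values().removeIf(value -> value == NULL_NODE);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> jsonOptions = new HashMap<>();
        jsonOptions.put("tempLocation", NULL_NODE);
        Supplier<Object> getDefault = () -> "computed-default";

        System.out.println(getReported(jsonOptions, "tempLocation", getDefault));
        System.out.println(getWithNullCheck(jsonOptions, "tempLocation", getDefault));
        System.out.println(getReported(withoutNulls(jsonOptions), "tempLocation", getDefault));
    }
}
```

In this sketch the first lookup yields null while the other two fall through to the default. The two remedies differ in where the decision is made: the null check changes the meaning of an explicit null at read time, while dropping nulls keeps the lookup unchanged and alters what gets stored.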
[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221347 ]

ASF GitHub Bot logged work on BEAM-6953:

Author: ASF GitHub Bot
Created on: 01/Apr/19 16:37
Start Date: 01/Apr/19 16:37
Worklog Time Spent: 10m
Work Description: reuvenlax commented on pull request #8188: [BEAM-6953] Make bq constants args
URL: https://github.com/apache/beam/pull/8188

Make several hard-coded constants pipeline options.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 221347)
Time Spent: 10m
Remaining Estimate: 0h

> BigQueryIO has constants that should be PipelineOptions
> ---
>
> Key: BEAM-6953
> URL: https://issues.apache.org/jira/browse/BEAM-6953
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Reporter: Reuven Lax
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions
Reuven Lax created BEAM-6953:

Summary: BigQueryIO has constants that should be PipelineOptions
Key: BEAM-6953
URL: https://issues.apache.org/jira/browse/BEAM-6953
Project: Beam
Issue Type: Bug
Components: io-java-gcp
Reporter: Reuven Lax

--
This message was sent by Atlassian JIRA (v7.6.3#76005)