[jira] [Work logged] (BEAM-6895) Upgrade joda-time to 2.10.1

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6895?focusedWorklogId=221601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221601
 ]

ASF GitHub Bot logged work on BEAM-6895:


Author: ASF GitHub Bot
Created on: 02/Apr/19 03:36
Start Date: 02/Apr/19 03:36
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8125: [BEAM-6895] 
Update joda-time to 2.10.1
URL: https://github.com/apache/beam/pull/8125#issuecomment-478831524
 
 
   Looks like it is going to be just fine. The Dataflow tests hit a quota issue 
but that should be resolved.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221601)
Time Spent: 2h 40m  (was: 2.5h)

> Upgrade joda-time to 2.10.1
> ---
>
> Key: BEAM-6895
> URL: https://issues.apache.org/jira/browse/BEAM-6895
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: David Moravek
>Assignee: David Moravek
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Upgrade joda-time dependency from 2.4 to 2.10.1.
> We have reports that 2.4 version (from 2014), causes incompatibilities with 
> Spark's bundled version (2.9).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6895) Upgrade joda-time to 2.10.1

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6895?focusedWorklogId=221600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221600
 ]

ASF GitHub Bot logged work on BEAM-6895:


Author: ASF GitHub Bot
Created on: 02/Apr/19 03:35
Start Date: 02/Apr/19 03:35
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8125: [BEAM-6895] 
Update joda-time to 2.10.1
URL: https://github.com/apache/beam/pull/8125#issuecomment-478831417
 
 
   run dataflow validatesrunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221600)
Time Spent: 2.5h  (was: 2h 20m)

> Upgrade joda-time to 2.10.1
> ---
>
> Key: BEAM-6895
> URL: https://issues.apache.org/jira/browse/BEAM-6895
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: David Moravek
>Assignee: David Moravek
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Upgrade joda-time dependency from 2.4 to 2.10.1.
> We have reports that 2.4 version (from 2014), causes incompatibilities with 
> Spark's bundled version (2.9).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=221559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221559
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 02/Apr/19 01:21
Start Date: 02/Apr/19 01:21
Worklog Time Spent: 10m 
  Work Description: adude3141 commented on pull request #8194: [DO NOT 
MERGE] [BEAM-4046] decouple gradle project names and maven artifact ids
URL: https://github.com/apache/beam/pull/8194
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
 | --- | --- | ---
   
   Pre-Commit Tests Status (on master branch)
   

   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
 

[jira] [Work logged] (BEAM-6948) Remove superseded ant.xml for javadoc creation

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6948?focusedWorklogId=221553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221553
 ]

ASF GitHub Bot logged work on BEAM-6948:


Author: ASF GitHub Bot
Created on: 02/Apr/19 00:36
Start Date: 02/Apr/19 00:36
Worklog Time Spent: 10m 
  Work Description: adude3141 commented on issue #8181: [BEAM-6948] remove 
superseded ant.xml for javadoc creation
URL: https://github.com/apache/beam/pull/8181#issuecomment-478797265
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221553)
Time Spent: 40m  (was: 0.5h)

> Remove superseded ant.xml for javadoc creation
> --
>
> Key: BEAM-6948
> URL: https://issues.apache.org/jira/browse/BEAM-6948
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Michael Luckey
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Javadoc previously was created by maven delegating to ant. After migration to 
> Gradle, the corresponding ant build file is superseded by a build.gradle [1], 
> [2]
> The pom.xml was deleted later after gradle migration was accepted. ant.xml 
> seems to be forgotten.
> [1] https://issues.apache.org/jira/browse/BEAM-4108
> [2] https://github.com/apache/beam/pull/5121



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-04-01 Thread Michael Luckey (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807313#comment-16807313
 ] 

Michael Luckey commented on BEAM-4046:
--

As Luke pointed out, we hit a confirmed gradle issue with duplicate project 
names here, see https://github.com/gradle/gradle/issues/847

{noformat}
container
dataflow
examples
fn-execution
java
jdbc
job-server
job-server-container
py3
{noformat}
Maybe one of the proposed workarounds is viable.

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Priority: Major
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-04-01 Thread Michael Luckey (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Luckey reassigned BEAM-4046:


Assignee: Michael Luckey

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6919) [beam_Release_NightlySnapshot] Cannot publish artifacts to Apache Maven repo

2019-04-01 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807293#comment-16807293
 ] 

yifan zou commented on BEAM-6919:
-

Nothing weird on our side. File 
https://issues.apache.org/jira/browse/INFRA-18148 to the ASF Infra, also sent 
email to [reposit...@apache.org|mailto:reposit...@apache.org] for further 
investigation.

> [beam_Release_NightlySnapshot] Cannot publish artifacts to Apache Maven repo
> 
>
> Key: BEAM-6919
> URL: https://issues.apache.org/jira/browse/BEAM-6919
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Daniel Oliveira
>Assignee: yifan zou
>Priority: Blocker
>  Labels: currently-failing, triaged
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_Release_NightlySnapshot/384/]
>  * [Gradle Build Scan|https://scans.gradle.com/s/osuzhgqygga2o]
> Initial investigation:
> Nightly snapshots are failing in beam:publish due to being unable to publish 
> artifacts to Apache's Maven repo:
> {noformat}
> 12:30:57 * What went wrong:
> 12:30:57 Execution failed for task 
> ':beam-examples-java:publishMavenJavaPublicationToMavenRepository'.
> 12:30:57 > Failed to publish publication 'mavenJava' to repository 'maven'
> 12:30:57> Could not write to resource 
> 'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-examples-java/2.12.0-SNAPSHOT/beam-examples-java-2.12.0-20190326.191838-10.jar'.
> 12:30:57   > Could not PUT 
> 'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-examples-java/2.12.0-SNAPSHOT/beam-examples-java-2.12.0-20190326.191838-10.jar'.
>  Received status code 401 from server: Unauthorized
> {noformat}
> This happens for many modules, not just beam-examples-java as in the example 
> above.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=221544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221544
 ]

ASF GitHub Bot logged work on BEAM-5723:


Author: ASF GitHub Bot
Created on: 01/Apr/19 23:38
Start Date: 01/Apr/19 23:38
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #8178: 
[BEAM-5723] Do not use the default shadow closure when building CassandraIO
URL: https://github.com/apache/beam/pull/8178#discussion_r271088692
 
 

 ##
 File path: sdks/java/io/cassandra/build.gradle
 ##
 @@ -18,11 +18,23 @@
 
 plugins { id 'org.apache.beam.module' }
 // Do not relocate guava to avoid issues with Cassandra's version.
-applyJavaNature(shadowClosure: DEFAULT_SHADOW_CLOSURE << {
-  dependencies {
-exclude(dependency(project.library.java.guava))
-  }
-})
+applyJavaNature(
+shadowClosure: {
+  dependencies {
+// is default, but when omitted the default action is to include all 
runtime deps
+include(dependency(project.library.java.guava))
+
+// hack: now exclude the only thing that was included
+exclude(dependency(project.library.java.guava))
+
 
 Review comment:
   The default is the `runtime` configuration and it has to be overridden. We 
never did that because we use the dependency filters.
   
   Something that I think should work instead is this:
   
   ```groovy
   shadowJar {
 configurations = [project.configurations.compile]
   }
   ```
   
   I tried the above and it did not work. I think maybe it is the order of the 
concatenation of the `shadowClosure` with the parts of the configuration that 
are hardcoded into the main plugin. I am not sure. And I'm not sure if there is 
a var I can just set `dependencies = []`. Basically we want to turn off the 
shadow plugin here but I don't want to make such a big change.
   
   I manually confirmed the contents of the jar.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221544)
Time Spent: 9.5h  (was: 9h 20m)

> CassandraIO is broken because of use of bad relocation of guava
> ---
>
> Key: BEAM-5723
> URL: https://issues.apache.org/jira/browse/BEAM-5723
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0
>Reporter: Arun sethia
>Assignee: Ismaël Mejía
>Priority: Major
> Fix For: 2.12.0
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> While using apache beam to run dataflow job to read data from BigQuery and 
> Store/Write to Cassandra with following libaries:
>  # beam-sdks-java-io-cassandra - 2.6.0
>  # beam-sdks-java-io-jdbc - 2.6.0
>  # beam-sdks-java-io-google-cloud-platform - 2.6.0
>  # beam-sdks-java-core - 2.6.0
>  # google-cloud-dataflow-java-sdk-all - 2.5.0
>  # google-api-client -1.25.0
>  
> I am getting following error at the time insert/save data to Cassandra.
> {code:java}
> [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-6914) Use BigQuerySink as default for 2.12.

2019-04-01 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada closed BEAM-6914.
---
Resolution: Fixed

> Use BigQuerySink as default for 2.12.
> -
>
> Key: BEAM-6914
> URL: https://issues.apache.org/jira/browse/BEAM-6914
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6914) Use BigQuerySink as default for 2.12.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6914?focusedWorklogId=221541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221541
 ]

ASF GitHub Bot logged work on BEAM-6914:


Author: ASF GitHub Bot
Created on: 01/Apr/19 23:28
Start Date: 01/Apr/19 23:28
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8170: 
[release-2.12.0][BEAM-6914] Reverting behavior of Native BQ sink in Python 
(#8143)
URL: https://github.com/apache/beam/pull/8170
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221541)
Time Spent: 6h 10m  (was: 6h)

> Use BigQuerySink as default for 2.12.
> -
>
> Key: BEAM-6914
> URL: https://issues.apache.org/jira/browse/BEAM-6914
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6914) Use BigQuerySink as default for 2.12.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6914?focusedWorklogId=221542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221542
 ]

ASF GitHub Bot logged work on BEAM-6914:


Author: ASF GitHub Bot
Created on: 01/Apr/19 23:28
Start Date: 01/Apr/19 23:28
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8170: 
[release-2.12.0][BEAM-6914] Reverting behavior of Native BQ sink in Python 
(#8143)
URL: https://github.com/apache/beam/pull/8170#issuecomment-478784520
 
 
   Thanks all : )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221542)
Time Spent: 6h 20m  (was: 6h 10m)

> Use BigQuerySink as default for 2.12.
> -
>
> Key: BEAM-6914
> URL: https://issues.apache.org/jira/browse/BEAM-6914
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6956) --experiments=worker_threads=100 issue

2019-04-01 Thread Jiayi Zhao (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Zhao resolved BEAM-6956.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> --experiments=worker_threads=100 issue
> --
>
> Key: BEAM-6956
> URL: https://issues.apache.org/jira/browse/BEAM-6956
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Jiayi Zhao
>Priority: Major
> Fix For: Not applicable
>
>
> I noticed that without this "–experiments=worker_threads=100", pipeline will 
> stuck,
> The weird thing is, I tried some complex pipeline using the in thread flink 
> method (./gradlew :beam-runners-flink_2.11-job-server:runShadow)
> "–experiments=worker_threads=100" doesn't work, but 
> "–experiments=worker_threads=1000" works fine
> Then I tried the same pipeline using the separate local flink cluster 
> (./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081), flink version is 1.5.6 (other version 
> doesn't work, see BEAM-6915)
> Neither did "–experiments=worker_threads=1000" or 
> "–experiments=worker_threads=1" work, pipeline stuck at certain stage 
> (shows running in flink UI but won't finish forever)
> any real fix to that? Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6956) --experiments=worker_threads=100 issue

2019-04-01 Thread Jiayi Zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807249#comment-16807249
 ] 

Jiayi Zhao commented on BEAM-6956:
--

*set the following parameter to my flink in conf/flink-conf.yaml file solved my 
issue*

*taskmanager.heap.mb: 10240*

> --experiments=worker_threads=100 issue
> --
>
> Key: BEAM-6956
> URL: https://issues.apache.org/jira/browse/BEAM-6956
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Jiayi Zhao
>Priority: Major
>
> I noticed that without this "–experiments=worker_threads=100", pipeline will 
> stuck,
> The weird thing is, I tried some complex pipeline using the in thread flink 
> method (./gradlew :beam-runners-flink_2.11-job-server:runShadow)
> "–experiments=worker_threads=100" doesn't work, but 
> "–experiments=worker_threads=1000" works fine
> Then I tried the same pipeline using the separate local flink cluster 
> (./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081), flink version is 1.5.6 (other version 
> doesn't work, see BEAM-6915)
> Neither did "–experiments=worker_threads=1000" or 
> "–experiments=worker_threads=1" work, pipeline stuck at certain stage 
> (shows running in flink UI but won't finish forever)
> any real fix to that? Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=221514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221514
 ]

ASF GitHub Bot logged work on BEAM-4374:


Author: ASF GitHub Bot
Created on: 01/Apr/19 22:28
Start Date: 01/Apr/19 22:28
Worklog Time Spent: 10m 
  Work Description: ajamato commented on issue #8062: [BEAM-4374] Emit 
MeanByteCount distribution tuple system metric from Python SDK
URL: https://github.com/apache/beam/pull/8062#issuecomment-478770322
 
 
   @Ardagan I reset my branch and pushed it back to this PR, I'll continue 
iterating on this and fixing it up. Thanks for the contributions
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221514)
Time Spent: 13h 10m  (was: 13h)

> Update existing metrics in the FN API to use new Metric Schema
> --
>
> Key: BEAM-4374
> URL: https://issues.apache.org/jira/browse/BEAM-4374
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Alex Amato
>Priority: Major
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Update existing metrics to use the new proto and cataloging schema defined in:
> [_https://s.apache.org/beam-fn-api-metrics_]
>  * Check in new protos
>  * Define catalog file for metrics
>  * Port existing metrics to use this new format, based on catalog 
> names+metadata



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6876) User state cleanup in portable Flink runner

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6876?focusedWorklogId=221513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221513
 ]

ASF GitHub Bot logged work on BEAM-6876:


Author: ASF GitHub Bot
Created on: 01/Apr/19 22:23
Start Date: 01/Apr/19 22:23
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #8173: [release-2.12] 
[BEAM-6876] Cleanup user state in portable Flink Runner
URL: https://github.com/apache/beam/pull/8173#issuecomment-478769017
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221513)
Time Spent: 5h 50m  (was: 5h 40m)

> User state cleanup in portable Flink runner
> ---
>
> Key: BEAM-6876
> URL: https://issues.apache.org/jira/browse/BEAM-6876
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink, triaged
> Fix For: 2.12.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> State is currently not being cleaned up by the runner.
> [https://lists.apache.org/thread.html/86f0809fbfa3da873051287b9ff249d6dd5a896b45409db1e484cf38@%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6914) Use BigQuerySink as default for 2.12.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6914?focusedWorklogId=221511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221511
 ]

ASF GitHub Bot logged work on BEAM-6914:


Author: ASF GitHub Bot
Created on: 01/Apr/19 22:16
Start Date: 01/Apr/19 22:16
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8170: 
[release-2.12.0][BEAM-6914] Reverting behavior of Native BQ sink in Python 
(#8143)
URL: https://github.com/apache/beam/pull/8170#issuecomment-478767496
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221511)
Time Spent: 6h  (was: 5h 50m)

> Use BigQuerySink as default for 2.12.
> -
>
> Key: BEAM-6914
> URL: https://issues.apache.org/jira/browse/BEAM-6914
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221500
 ]

ASF GitHub Bot logged work on BEAM-6945:


Author: ASF GitHub Bot
Created on: 01/Apr/19 22:07
Start Date: 01/Apr/19 22:07
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8175: [BEAM-6945] Add 
single entry point for metrics.proto constants in java DF worker
URL: https://github.com/apache/beam/pull/8175#issuecomment-478765304
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221500)
Time Spent: 1h  (was: 50m)

> Utilize label and urn values from metrics.proto in Java DF Runner
> -
>
> Key: BEAM-6945
> URL: https://issues.apache.org/jira/browse/BEAM-6945
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We have some of values from metrics.proto hardcoded in java code. 
> Generalize access to those values and refactor Java DF Runner to utilize new 
> approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221501&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221501
 ]

ASF GitHub Bot logged work on BEAM-6945:


Author: ASF GitHub Bot
Created on: 01/Apr/19 22:07
Start Date: 01/Apr/19 22:07
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8175: [BEAM-6945] Add 
single entry point for metrics.proto constants in java DF worker
URL: https://github.com/apache/beam/pull/8175#issuecomment-478765304
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221501)
Time Spent: 1h 10m  (was: 1h)

> Utilize label and urn values from metrics.proto in Java DF Runner
> -
>
> Key: BEAM-6945
> URL: https://issues.apache.org/jira/browse/BEAM-6945
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We have some of values from metrics.proto hardcoded in java code. 
> Generalize access to those values and refactor Java DF Runner to utilize new 
> approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221499
 ]

ASF GitHub Bot logged work on BEAM-6953:


Author: ASF GitHub Bot
Created on: 01/Apr/19 22:06
Start Date: 01/Apr/19 22:06
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #8188: [BEAM-6953] 
Make bq constants args
URL: https://github.com/apache/beam/pull/8188
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221499)
Time Spent: 1h 20m  (was: 1h 10m)

> BigQueryIO has constants that should be PipelineOptions
> ---
>
> Key: BEAM-6953
> URL: https://issues.apache.org/jira/browse/BEAM-6953
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6502?focusedWorklogId=221497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221497
 ]

ASF GitHub Bot logged work on BEAM-6502:


Author: ASF GitHub Bot
Created on: 01/Apr/19 21:56
Start Date: 01/Apr/19 21:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8146: [BEAM-6502] 
Re-Remove runner time execution information from public API surface (now 
including Watch)
URL: https://github.com/apache/beam/pull/8146#issuecomment-478762573
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221497)
Time Spent: 2h 20m  (was: 2h 10m)

> SplittableDoFn: Re-Remove runner time execution information from public API 
> surface
> ---
>
> Key: BEAM-6502
> URL: https://issues.apache.org/jira/browse/BEAM-6502
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Labels: triaged
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Move the setting of "claim observers" within RestrictionTracker to another 
> location to clean up the RestrictionTracker interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6966) Spark portable runner: get PAssert working

2019-04-01 Thread Kyle Weaver (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807220#comment-16807220
 ] 

Kyle Weaver commented on BEAM-6966:
---

Note to self: READ isn't actually needed; it can be replaced with impulse.

> Spark portable runner: get PAssert working
> --
>
> Key: BEAM-6966
> URL: https://issues.apache.org/jira/browse/BEAM-6966
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> This would help a lot with testing, such as validatesRunner tests and others.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221493
 ]

ASF GitHub Bot logged work on BEAM-6945:


Author: ASF GitHub Bot
Created on: 01/Apr/19 21:48
Start Date: 01/Apr/19 21:48
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #8175: 
[BEAM-6945] Add single entry point for metrics.proto constants in java DF worker
URL: https://github.com/apache/beam/pull/8175#discussion_r271061889
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.metrics;
+
+import static org.apache.beam.model.pipeline.v1.MetricsApi.labelProps;
+import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec;
+
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import 
org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo.MonitoringInfoLabels;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoTypeUrns;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+/** This static class fetches MonitoringInfo related values from 
metrics.proto. */
+public final class MonitoringInfoConstants {
+
+  /** Supported MonitoringInfo Urns. */
 
 Review comment:
   I have $0.02: group things according to functionality not what type of thing 
they are.
   
   So, do not collocate these things because they are constants - collocate 
them because they are for working with `MonitoringInfo` objects. It is somewhat 
traditional at this point to use a plural like `MonitoringInfos` for utility 
methods and constants. It is not necessary, and IMO not useful, to distinguish 
constants from other static utility bits. I think these being collocated does 
make sense, but that is up to style.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221493)
Time Spent: 50m  (was: 40m)

> Utilize label and urn values from metrics.proto in Java DF Runner
> -
>
> Key: BEAM-6945
> URL: https://issues.apache.org/jira/browse/BEAM-6945
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We have some of values from metrics.proto hardcoded in java code. 
> Generalize access to those values and refactor Java DF Runner to utilize new 
> approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221492
 ]

ASF GitHub Bot logged work on BEAM-6945:


Author: ASF GitHub Bot
Created on: 01/Apr/19 21:46
Start Date: 01/Apr/19 21:46
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8175: [BEAM-6945] 
Add single entry point for metrics.proto constants in java DF worker
URL: https://github.com/apache/beam/pull/8175#discussion_r271063358
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.metrics;
+
+import static org.apache.beam.model.pipeline.v1.MetricsApi.labelProps;
+import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec;
+
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import 
org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo.MonitoringInfoLabels;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoTypeUrns;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+/** This static class fetches MonitoringInfo related values from 
metrics.proto. */
+public final class MonitoringInfoConstants {
+
+  /** Supported MonitoringInfo Urns. */
 
 Review comment:
   I think it makes sense to have all of these in one file. They're not so many 
that it's hard to track. I think either approach is acceptable though, so I 
wouldn't block the change either way.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221492)
Time Spent: 40m  (was: 0.5h)

> Utilize label and urn values from metrics.proto in Java DF Runner
> -
>
> Key: BEAM-6945
> URL: https://issues.apache.org/jira/browse/BEAM-6945
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We have some of values from metrics.proto hardcoded in java code. 
> Generalize access to those values and refactor Java DF Runner to utilize new 
> approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-6956) --experiments=worker_threads=100 issue

2019-04-01 Thread Jiayi Zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807065#comment-16807065
 ] 

Jiayi Zhao edited comment on BEAM-6956 at 4/1/19 9:41 PM:
--

it might due to some random stuff, I tried in thread flink again with 100 
worker_threads, and this time it works(I also changed -parallelism=1 instead of 
-parallelism=2 before, not sure if it's related),

but when I tried separate local flink cluster, it works for some simple 
pipeline but got stuck for some complex tensorflow transform pipelines

 


was (Author: 1025kb):
it might due to some random stuff, I tried in thread flink again with 100 
worker_threads, and this time it works(I also changed --parallelism=1 instead 
of --parallelism=2 before, not sure if it's related),

when I tried separate local flink cluster, each time pipeline stuck at 
different places, I will try confirm it and put more information about where it 
stucks 

> --experiments=worker_threads=100 issue
> --
>
> Key: BEAM-6956
> URL: https://issues.apache.org/jira/browse/BEAM-6956
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Jiayi Zhao
>Priority: Major
>
> I noticed that without this "–experiments=worker_threads=100", pipeline will 
> stuck,
> The weird thing is, I tried some complex pipeline using the in thread flink 
> method (./gradlew :beam-runners-flink_2.11-job-server:runShadow)
> "–experiments=worker_threads=100" doesn't work, but 
> "–experiments=worker_threads=1000" works fine
> Then I tried the same pipeline using the separate local flink cluster 
> (./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081), flink version is 1.5.6 (other version 
> doesn't work, see BEAM-6915)
> Neither did "–experiments=worker_threads=1000" or 
> "–experiments=worker_threads=1" work, pipeline stuck at certain stage 
> (shows running in flink UI but won't finish forever)
> any real fix to that? Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6602) Support schemas in BigQueryIO.Write

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6602?focusedWorklogId=221481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221481
 ]

ASF GitHub Bot logged work on BEAM-6602:


Author: ASF GitHub Bot
Created on: 01/Apr/19 21:11
Start Date: 01/Apr/19 21:11
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7840:  [BEAM-6602] 
BigQueryIO.write natively understands Beam schemas
URL: https://github.com/apache/beam/pull/7840#issuecomment-478749573
 
 
   Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221481)
Time Spent: 2h 50m  (was: 2h 40m)

> Support schemas in BigQueryIO.Write
> ---
>
> Key: BEAM-6602
> URL: https://issues.apache.org/jira/browse/BEAM-6602
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Labels: triaged
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6906) Mutating accumulators in fused stages is generally unsafe - need to provide a single mutable accumulator

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6906?focusedWorklogId=221480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221480
 ]

ASF GitHub Bot logged work on BEAM-6906:


Author: ASF GitHub Bot
Created on: 01/Apr/19 21:10
Start Date: 01/Apr/19 21:10
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #8134: 
[BEAM-6906] Update spec on CombineFn and DoFn to clarify mutability of 
parameters
URL: https://github.com/apache/beam/pull/8134
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221480)
Time Spent: 40m  (was: 0.5h)

> Mutating accumulators in fused stages is generally unsafe - need to provide a 
> single mutable accumulator
> 
>
> Key: BEAM-6906
> URL: https://issues.apache.org/jira/browse/BEAM-6906
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Yueyang Qiu
>Priority: Critical
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Our current docs encourage a CombineFn author to mutate accumulators for 
> efficiency. This is important, but cannot be done generally without losing 
> efficiency - it is not safe to share accumulators within a stage or across 
> sliding windows. The ownership story needs to be clear. Any accumulator that 
> is mutable is from that point on owned by the CombineFn, not the runner and 
> cannot be given to other steps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221479
 ]

ASF GitHub Bot logged work on BEAM-6945:


Author: ASF GitHub Bot
Created on: 01/Apr/19 21:08
Start Date: 01/Apr/19 21:08
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on pull request #8175: [BEAM-6945] 
Add single entry point for metrics.proto constants in java DF worker
URL: https://github.com/apache/beam/pull/8175#discussion_r271051938
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.metrics;
+
+import static org.apache.beam.model.pipeline.v1.MetricsApi.labelProps;
+import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec;
+
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import 
org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo.MonitoringInfoLabels;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoTypeUrns;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+/** This static class fetches MonitoringInfo related values from 
metrics.proto. */
+public final class MonitoringInfoConstants {
+
+  /** Supported MonitoringInfo Urns. */
 
 Review comment:
   I believe all of these have tight relation through MonitoringInfo.
   Having them in single file/namespace is convenient to highlight the use case.
   
   Also, I don't feel we win much of shorter usage. To make thinks even 
shorter, you can import sub-class directly and then it will be: 
Urls.ELEMENT_COUNT, etc.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221479)
Time Spent: 0.5h  (was: 20m)

> Utilize label and urn values from metrics.proto in Java DF Runner
> -
>
> Key: BEAM-6945
> URL: https://issues.apache.org/jira/browse/BEAM-6945
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We have some of values from metrics.proto hardcoded in java code. 
> Generalize access to those values and refactor Java DF Runner to utilize new 
> approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6821) FileBasedSink is not creating file paths according to target filesystem

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6821?focusedWorklogId=221478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221478
 ]

ASF GitHub Bot logged work on BEAM-6821:


Author: ASF GitHub Bot
Created on: 01/Apr/19 21:07
Start Date: 01/Apr/19 21:07
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8054: [BEAM-6821] 
FileBasedSink improper paths
URL: https://github.com/apache/beam/pull/8054#issuecomment-478748435
 
 
   Please take a look at test failures: Click on Details -> Gradle Build Scan.
   
   There is a lint error and a test failure. 
   
   To run an individual test locally see: 
https://cwiki.apache.org/confluence/display/BEAM/Contribution+Testing+Guide#ContributionTestingGuide-HowtorunPythonunittests
   
   Perhaps that's a flake.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221478)
Time Spent: 2h  (was: 1h 50m)

> FileBasedSink is not creating file paths according to target filesystem
> ---
>
> Key: BEAM-6821
> URL: https://issues.apache.org/jira/browse/BEAM-6821
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-text
>Affects Versions: 2.11.0
> Environment: Windows 10
>Reporter: Gregory Kovelman
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> File path generated in _open_writer_ method is not according to target 
> filesystem, because
> os.path.join is used and not FileSystems.join.
> apache_beam\io\filebasedsink.py extract:
>  
> {code:java}
> def _create_temp_dir(self, file_path_prefix):
>  base_path, last_component = FileSystems.split(file_path_prefix)
>  if not last_component:
># Trying to re-split the base_path to check if it's a root.
>new_base_path, _ = FileSystems.split(base_path)
>if base_path == new_base_path:
>  raise ValueError('Cannot create a temporary directory for root path '
>   'prefix %s. Please specify a file path prefix with '
>   'at least two components.' % file_path_prefix)
>  path_components = [base_path,
> 'beam-temp-' + last_component + '-' + uuid.uuid1().hex]
>  return FileSystems.join(*path_components)
> @check_accessible(['file_path_prefix', 'file_name_suffix'])
>  def open_writer(self, init_result, uid):
>  # A proper suffix is needed for AUTO compression detection.
>  # We also ensure there will be no collisions with uid and a
>  # (possibly unsharded) file_path_prefix and a (possibly empty)
>  # file_name_suffix.
>  file_path_prefix = self.file_path_prefix.get()
>  file_name_suffix = self.file_name_suffix.get()
>  suffix = (
> '.' + os.path.basename(file_path_prefix) + file_name_suffix)
>  return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
> {code}
>  
>  
> This created incompatibilities between, for example, Windows and GCS.
> Expected: gs://bucket/beam-temp-result-uuid\\uid.result
> Actual: gs://bucket/beam-temp-result-uuid/uid.result
> Replacing os.path.join with FileSystems.join fixes the issue



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221477
 ]

ASF GitHub Bot logged work on BEAM-6945:


Author: ASF GitHub Bot
Created on: 01/Apr/19 21:05
Start Date: 01/Apr/19 21:05
Worklog Time Spent: 10m 
  Work Description: ajamato commented on pull request #8175: [BEAM-6945] 
Add single entry point for metrics.proto constants in java DF worker
URL: https://github.com/apache/beam/pull/8175#discussion_r271050465
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MonitoringInfoConstants.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.metrics;
+
+import static org.apache.beam.model.pipeline.v1.MetricsApi.labelProps;
+import static org.apache.beam.model.pipeline.v1.MetricsApi.monitoringInfoSpec;
+
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo;
+import 
org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfo.MonitoringInfoLabels;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoSpecs;
+import org.apache.beam.model.pipeline.v1.MetricsApi.MonitoringInfoTypeUrns;
+import org.apache.beam.model.pipeline.v1.RunnerApi;
+
+/** This static class fetches MonitoringInfo related values from 
metrics.proto. */
+public final class MonitoringInfoConstants {
+
+  /** Supported MonitoringInfo Urns. */
 
 Review comment:
   Can we make this multiple classes instead, which will shorten the use a bit
   MonitoringInfoConstants.Urns.ELEMENT_COUNT
   becomes
   MonitoringInfoUrns.ELEMENT_COUNT
   
   MonitoringInfoConstants.Labels.PTRANSFORM
   becomes
   MonitoringInfoLabels.PTRANSFORM
   
   MonitoringInfoConstants.TypeUrns.SUM_INT64
   becomes
   MonitoringInfoTypeUrns.SUM_INT64
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221477)
Time Spent: 20m  (was: 10m)

> Utilize label and urn values from metrics.proto in Java DF Runner
> -
>
> Key: BEAM-6945
> URL: https://issues.apache.org/jira/browse/BEAM-6945
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have some of values from metrics.proto hardcoded in java code. 
> Generalize access to those values and refactor Java DF Runner to utilize new 
> approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3312) Add convenient "with" to MqttIO.ConnectionConfiguration

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3312?focusedWorklogId=221475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221475
 ]

ASF GitHub Bot logged work on BEAM-3312:


Author: ASF GitHub Bot
Created on: 01/Apr/19 20:58
Start Date: 01/Apr/19 20:58
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8167: [BEAM-3312] Improve 
the builder to MqttIO connection
URL: https://github.com/apache/beam/pull/8167#issuecomment-478745144
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221475)
Time Spent: 50m  (was: 40m)

> Add convenient "with" to MqttIO.ConnectionConfiguration
> ---
>
> Key: BEAM-3312
> URL: https://issues.apache.org/jira/browse/BEAM-3312
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-mqtt
>Reporter: Jean-Baptiste Onofré
>Assignee: LI Guobao
>Priority: Major
>  Labels: triaged
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Now, for instance, {{MqttIO}} requires a {{ConnectionConfiguration}} object 
> to pass the URL and topic name. It means, the user has to do something like:
> {code}
> MqttIO.read() 
> .withConnectionConfiguration(MqttIO.ConnectionConfiguration.create("tcp://localhost:1883",
>  "CAR"))
> {code}
> It's pretty verbose and long. I think it makes sense to provide convenient 
> "direct" method allowing to do:
> {code}
> MqttIO.read().withUrl().withTopic()
> {code}
> or even:
> {code}
> MqttIO.read().withConnection("url",  "topic")
> {code}
> The same apply for some other IOs (JMS, ...).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6945) Utilize label and urn values from metrics.proto in Java DF Runner

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6945?focusedWorklogId=221474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221474
 ]

ASF GitHub Bot logged work on BEAM-6945:


Author: ASF GitHub Bot
Created on: 01/Apr/19 20:55
Start Date: 01/Apr/19 20:55
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8175: [BEAM-6945] Add 
single entry point for metrics.proto constants in java DF worker
URL: https://github.com/apache/beam/pull/8175#issuecomment-478744263
 
 
   R: @ajamato
   C: @pabloem, @kennknowles 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221474)
Time Spent: 10m
Remaining Estimate: 0h

> Utilize label and urn values from metrics.proto in Java DF Runner
> -
>
> Key: BEAM-6945
> URL: https://issues.apache.org/jira/browse/BEAM-6945
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have some of values from metrics.proto hardcoded in java code. 
> Generalize access to those values and refactor Java DF Runner to utilize new 
> approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6966) Spark portable runner: get PAssert working

2019-04-01 Thread Kyle Weaver (JIRA)
Kyle Weaver created BEAM-6966:
-

 Summary: Spark portable runner: get PAssert working
 Key: BEAM-6966
 URL: https://issues.apache.org/jira/browse/BEAM-6966
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Kyle Weaver
Assignee: Kyle Weaver


This would help a lot with testing, such as validatesRunner tests and others.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6753) Create proto representation for schemas

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6753?focusedWorklogId=221465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221465
 ]

ASF GitHub Bot logged work on BEAM-6753:


Author: ASF GitHub Bot
Created on: 01/Apr/19 20:38
Start Date: 01/Apr/19 20:38
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #7952: [BEAM-6753] Set the 
stage to make schema coder update compatible
URL: https://github.com/apache/beam/pull/7952#issuecomment-478738280
 
 
   Run Dataflow ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221465)
Time Spent: 1.5h  (was: 1h 20m)

> Create proto representation for schemas
> ---
>
> Key: BEAM-6753
> URL: https://issues.apache.org/jira/browse/BEAM-6753
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221464&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221464
 ]

ASF GitHub Bot logged work on BEAM-6953:


Author: ASF GitHub Bot
Created on: 01/Apr/19 20:38
Start Date: 01/Apr/19 20:38
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #8188: [BEAM-6953] Make bq 
constants args
URL: https://github.com/apache/beam/pull/8188#issuecomment-478738133
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221464)
Time Spent: 1h 10m  (was: 1h)

> BigQueryIO has constants that should be PipelineOptions
> ---
>
> Key: BEAM-6953
> URL: https://issues.apache.org/jira/browse/BEAM-6953
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6965) Spark portable translator: translate READ

2019-04-01 Thread Kyle Weaver (JIRA)
Kyle Weaver created BEAM-6965:
-

 Summary: Spark portable translator: translate READ
 Key: BEAM-6965
 URL: https://issues.apache.org/jira/browse/BEAM-6965
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Kyle Weaver
Assignee: Kyle Weaver






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6876) User state cleanup in portable Flink runner

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6876?focusedWorklogId=221458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221458
 ]

ASF GitHub Bot logged work on BEAM-6876:


Author: ASF GitHub Bot
Created on: 01/Apr/19 20:13
Start Date: 01/Apr/19 20:13
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #8173: [release-2.12] 
[BEAM-6876] Cleanup user state in portable Flink Runner
URL: https://github.com/apache/beam/pull/8173#issuecomment-478729355
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221458)
Time Spent: 5h 40m  (was: 5.5h)

> User state cleanup in portable Flink runner
> ---
>
> Key: BEAM-6876
> URL: https://issues.apache.org/jira/browse/BEAM-6876
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink, triaged
> Fix For: 2.12.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> State is currently not being cleaned up by the runner.
> [https://lists.apache.org/thread.html/86f0809fbfa3da873051287b9ff249d6dd5a896b45409db1e484cf38@%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6944) Add support for MeanByteCount metric for Python Streaming to Java DF Runner

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6944?focusedWorklogId=221453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221453
 ]

ASF GitHub Bot logged work on BEAM-6944:


Author: ASF GitHub Bot
Created on: 01/Apr/19 20:10
Start Date: 01/Apr/19 20:10
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #8171: 
[BEAM-6944] Add MeanByteCountMonitoringInfoToCounterUpdateTransformer
URL: https://github.com/apache/beam/pull/8171
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221453)
Time Spent: 1h  (was: 50m)

> Add support for MeanByteCount metric for Python Streaming to Java DF Runner
> ---
>
> Key: BEAM-6944
> URL: https://issues.apache.org/jira/browse/BEAM-6944
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-harness
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6922) Remove LGPL test library dependency in cassandraio-test

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6922?focusedWorklogId=221452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221452
 ]

ASF GitHub Bot logged work on BEAM-6922:


Author: ASF GitHub Bot
Created on: 01/Apr/19 20:08
Start Date: 01/Apr/19 20:08
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8182: [BEAM-6922] 
Replace cassandra-unit with achilles
URL: https://github.com/apache/beam/pull/8182#issuecomment-478727534
 
 
   I trust also that CassandraIO was working before, so it is still working.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221452)
Time Spent: 1.5h  (was: 1h 20m)

> Remove LGPL test library dependency in cassandraio-test
> ---
>
> Key: BEAM-6922
> URL: https://issues.apache.org/jira/browse/BEAM-6922
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> cassandra-io tests use cassandra-unit test library that has LGPLV3 ASF 
> category X licence , we cannot deliver test jars that depend on LGPL licence.
> A similar discussion at 
> https://issues.apache.org/jira/browse/LEGAL-153?focusedCommentId=13548819



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6922) Remove LGPL test library dependency in cassandraio-test

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6922?focusedWorklogId=221451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221451
 ]

ASF GitHub Bot logged work on BEAM-6922:


Author: ASF GitHub Bot
Created on: 01/Apr/19 20:07
Start Date: 01/Apr/19 20:07
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #8182: 
[BEAM-6922] Replace cassandra-unit with achilles
URL: https://github.com/apache/beam/pull/8182#discussion_r271029223
 
 

 ##
 File path: 
sdks/java/io/cassandra/src/test/java/org/apache/beam/sdk/io/cassandra/CassandraIOTest.java
 ##
 @@ -88,63 +92,64 @@
   private static final long NUM_ROWS = 20L;
   private static final String CASSANDRA_KEYSPACE = "beam_ks";
   private static final String CASSANDRA_HOST = "127.0.0.1";
-  private static final int CASSANDRA_PORT = 9142;
+  private static final int CASSANDRA_PORT = 9042;
 
 Review comment:
   So this still cannot have more than one on a host?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221451)
Time Spent: 1h 20m  (was: 1h 10m)

> Remove LGPL test library dependency in cassandraio-test
> ---
>
> Key: BEAM-6922
> URL: https://issues.apache.org/jira/browse/BEAM-6922
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> cassandra-io tests use cassandra-unit test library that has LGPLV3 ASF 
> category X licence , we cannot deliver test jars that depend on LGPL licence.
> A similar discussion at 
> https://issues.apache.org/jira/browse/LEGAL-153?focusedCommentId=13548819



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6906) Mutating accumulators in fused stages is generally unsafe - need to provide a single mutable accumulator

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6906?focusedWorklogId=221449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221449
 ]

ASF GitHub Bot logged work on BEAM-6906:


Author: ASF GitHub Bot
Created on: 01/Apr/19 20:01
Start Date: 01/Apr/19 20:01
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on issue #8134: [BEAM-6906] Update 
spec on CombineFn and DoFn to clarify mutability of parameters
URL: https://github.com/apache/beam/pull/8134#issuecomment-478724935
 
 
   Ping @kennknowles 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221449)
Time Spent: 0.5h  (was: 20m)

> Mutating accumulators in fused stages is generally unsafe - need to provide a 
> single mutable accumulator
> 
>
> Key: BEAM-6906
> URL: https://issues.apache.org/jira/browse/BEAM-6906
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Yueyang Qiu
>Priority: Critical
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Our current docs encourage a CombineFn author to mutate accumulators for 
> efficiency. This is important, but cannot be done generally without losing 
> efficiency - it is not safe to share accumulators within a stage or across 
> sliding windows. The ownership story needs to be clear. Any accumulator that 
> is mutable is from that point on owned by the CombineFn, not the runner and 
> cannot be given to other steps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221445
 ]

ASF GitHub Bot logged work on BEAM-6953:


Author: ASF GitHub Bot
Created on: 01/Apr/19 19:50
Start Date: 01/Apr/19 19:50
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #8188: [BEAM-6953] Make bq 
constants args
URL: https://github.com/apache/beam/pull/8188#issuecomment-478720944
 
 
   Run RAT PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221445)
Time Spent: 1h  (was: 50m)

> BigQueryIO has constants that should be PipelineOptions
> ---
>
> Key: BEAM-6953
> URL: https://issues.apache.org/jira/browse/BEAM-6953
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221444
 ]

ASF GitHub Bot logged work on BEAM-6953:


Author: ASF GitHub Bot
Created on: 01/Apr/19 19:50
Start Date: 01/Apr/19 19:50
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #8188: [BEAM-6953] Make bq 
constants args
URL: https://github.com/apache/beam/pull/8188#issuecomment-478720856
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221444)
Time Spent: 50m  (was: 40m)

> BigQueryIO has constants that should be PipelineOptions
> ---
>
> Key: BEAM-6953
> URL: https://issues.apache.org/jira/browse/BEAM-6953
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6753) Create proto representation for schemas

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6753?focusedWorklogId=221443&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221443
 ]

ASF GitHub Bot logged work on BEAM-6753:


Author: ASF GitHub Bot
Created on: 01/Apr/19 19:50
Start Date: 01/Apr/19 19:50
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #7952: [BEAM-6753] Set the 
stage to make schema coder update compatible
URL: https://github.com/apache/beam/pull/7952#issuecomment-478720719
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221443)
Time Spent: 1h 20m  (was: 1h 10m)

> Create proto representation for schemas
> ---
>
> Key: BEAM-6753
> URL: https://issues.apache.org/jira/browse/BEAM-6753
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6954:

Description: 
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:
 - the containsKey check should be expanded with an !isNull check
 OR
 - when we serialize the json, it shouldn't serialize null values at 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655]

 

Instinctively I would have expected the @Default.* annotations producing values 
every single time, when the value is null - so a property with a @Default.* 
annotation can't be null - but I was unable to find anything explicit regarding 
this in the documentation. So I'm not sure which of the suggested change has to 
be made.

---

Okay, I have investigated further, and it seems the default value is indeed 
calculated before the json serialization by calling the mentioned method. The 
problem is that it returns a RuntimeValueProvider, which gets serialized as 
null ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L275-L279]
 ), because the isAccessible returns false ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L248-L250]
 + 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L214-L220]
 )

... and then during deserialization it is found in the jsonOptions at ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 ), so it executes the getValueFromJson which uses an ObjectMapper to create a 
ValueProvider from a NullNode ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L498]
 ) . 

The problem here is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes. -> 
[https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)]

For this part of the issue see BEAM-6963 (that still doesn't solve this issue 
btw, but might be required for it)

  was:
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:
 - the containsKey check should be expanded with an !is

[jira] [Work logged] (BEAM-6753) Create proto representation for schemas

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6753?focusedWorklogId=221440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221440
 ]

ASF GitHub Bot logged work on BEAM-6753:


Author: ASF GitHub Bot
Created on: 01/Apr/19 19:38
Start Date: 01/Apr/19 19:38
Worklog Time Spent: 10m 
  Work Description: dpmills commented on issue #7952: [BEAM-6753] Set the 
stage to make schema coder update compatible
URL: https://github.com/apache/beam/pull/7952#issuecomment-478716551
 
 
   LGTM
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221440)
Time Spent: 1h 10m  (was: 1h)

> Create proto representation for schemas
> ---
>
> Key: BEAM-6753
> URL: https://issues.apache.org/jira/browse/BEAM-6753
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6964) Simplify key context handling in ExecutableStageDoFnOperator

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6964?focusedWorklogId=221436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221436
 ]

ASF GitHub Bot logged work on BEAM-6964:


Author: ASF GitHub Bot
Created on: 01/Apr/19 19:30
Start Date: 01/Apr/19 19:30
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #8190: [BEAM-6964] 
Simplify key context handling in ExecutableStageDoFnOperator
URL: https://github.com/apache/beam/pull/8190
 
 
   The key handling code in ExecutableStageDoFnOperator was a bit hard to
   understand because of the different contexts in which the current key is
   set. This simplifies the key setting by always using the state backend to set
   the key. An additional guard has been added to ensure that no concurrent 
access
   is performed during key access.
   
   This also adds an additional class for consistent use of key coders.
   
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
 | --- | --- | ---
   
   Pre-Commit Tests Status (on master branch)
   

   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache

[jira] [Work logged] (BEAM-6753) Create proto representation for schemas

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6753?focusedWorklogId=221434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221434
 ]

ASF GitHub Bot logged work on BEAM-6753:


Author: ASF GitHub Bot
Created on: 01/Apr/19 19:24
Start Date: 01/Apr/19 19:24
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7952: [BEAM-6753] 
Set the stage to make schema coder update compatible
URL: https://github.com/apache/beam/pull/7952#discussion_r271015156
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java
 ##
 @@ -285,15 +289,24 @@ public InstrumentedType prepare(InstrumentedType 
instrumentedType) {
 // per-field Coders.
 static Row decodeDelegate(Schema schema, Coder[] coders, InputStream 
inputStream)
 throws IOException {
+  int fieldCount = VAR_INT_CODER.decode(inputStream);
 
 Review comment:
   as discussed, we don't want to have to reorder fields in the encoding.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221434)
Time Spent: 1h  (was: 50m)

> Create proto representation for schemas
> ---
>
> Key: BEAM-6753
> URL: https://issues.apache.org/jira/browse/BEAM-6753
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6963) Bug in RuntimeValueProvider JSON serialization

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6963:

Description: 
Classes affected:
 org.apache.beam.sdk.options.ValueProvider.Serializer
 org.apache.beam.sdk.options.ValueProvider.Deserializer

The problem is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes:
 ( 
[https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext]
 )
 )
 "Note that this method is never called for JSON null literal, and thus 
deserializers need (and should) not check for it."

If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
so we call a writeNull(). During deserialization this isn't handled properly as 
mentioned and our deserialization will return null.

The end result is that getters with ValueProvider return values will return 
"null". AFAIK ValueProvider getters should be never null.

My guess is that either we should completely omit serializing 
RuntimeValueProviders, or during deserialization the proper runtime value 
provider should be created again - which requires more than just a simple 
"null" being present in the json.

  was:
Classes affected:
 org.apache.beam.sdk.options.ValueProvider.Serializer
 org.apache.beam.sdk.options.ValueProvider.Deserializer

The problem is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes:
 ( 
[https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext]
 )
 )
 "Note that this method is never called for JSON null literal, and thus 
deserializers need (and should) not check for it."

If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
so we call a writeNull(). During deserialization this isn't handled properly as 
mentioned and our deserialization will return null.

The end result is that getters with ValueProvider return values will return 
"null". AFAIK ValueProvider getters should be never null.

My guess is that either we should completely omit serializing 
RuntimeValueProviders, or during deserialization the proper runtime value 
provider should be created again - which requires more than a "null" being 
present in the json.


> Bug in RuntimeValueProvider JSON serialization
> --
>
> Key: BEAM-6963
> URL: https://issues.apache.org/jira/browse/BEAM-6963
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Balázs Németh
>Priority: Critical
>
> Classes affected:
>  org.apache.beam.sdk.options.ValueProvider.Serializer
>  org.apache.beam.sdk.options.ValueProvider.Deserializer
> The problem is that according to the JsonDeserializer documentation, the 
> deserialize method isn't executed for null nodes:
>  ( 
> [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext]
>  )
>  )
>  "Note that this method is never called for JSON null literal, and thus 
> deserializers need (and should) not check for it."
> If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
> so we call a writeNull(). During deserialization this isn't handled properly 
> as mentioned and our deserialization will return null.
> The end result is that getters with ValueProvider return values will return 
> "null". AFAIK ValueProvider getters should be never null.
> My guess is that either we should completely omit serializing 
> RuntimeValueProviders, or during deserialization the proper runtime value 
> provider should be created again - which requires more than just a simple 
> "null" being 

[jira] [Created] (BEAM-6964) Simplify key context handling in ExecutableStageDoFnOperator

2019-04-01 Thread Maximilian Michels (JIRA)
Maximilian Michels created BEAM-6964:


 Summary: Simplify key context handling in 
ExecutableStageDoFnOperator
 Key: BEAM-6964
 URL: https://issues.apache.org/jira/browse/BEAM-6964
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Maximilian Michels
Assignee: Maximilian Michels


The code is a bit hard to understand because of multiple keys being set in 
different contexts, for 1) state requests 2) timers setting 3) timer firing. 
This should be simplified.

We can also introduce a helper class for key encoding/decoding to achieve 
consistency across all places which use a key serializer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221423&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221423
 ]

ASF GitHub Bot logged work on BEAM-6953:


Author: ASF GitHub Bot
Created on: 01/Apr/19 19:06
Start Date: 01/Apr/19 19:06
Worklog Time Spent: 10m 
  Work Description: dpmills commented on issue #8188: [BEAM-6953] Make bq 
constants args
URL: https://github.com/apache/beam/pull/8188#issuecomment-478705542
 
 
   LGTM
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221423)
Time Spent: 40m  (was: 0.5h)

> BigQueryIO has constants that should be PipelineOptions
> ---
>
> Key: BEAM-6953
> URL: https://issues.apache.org/jira/browse/BEAM-6953
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6963) Bug in RuntimeValueProvider JSON serialization

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6963:

Description: 
Classes affected:
 org.apache.beam.sdk.options.ValueProvider.Serializer
 org.apache.beam.sdk.options.ValueProvider.Deserializer

The problem is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes:
 ( 
[https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext]
 )
 )
 "Note that this method is never called for JSON null literal, and thus 
deserializers need (and should) not check for it."

If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
so we call a writeNull(). During deserialization this isn't handled properly as 
mentioned and our deserialization will return null.

The end result is that getters with ValueProvider return values will return 
"null". AFAIK ValueProvider getters should be never null.

My guess is that either we should completely omit serializing 
RuntimeValueProviders, or during deserialization the proper runtime value 
provider should be created again - which requires more than a "null" being 
present in the json.

  was:
Classes affected:
 org.apache.beam.sdk.options.ValueProvider.Serializer
 org.apache.beam.sdk.options.ValueProvider.Deserializer

The problem is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes:
 ( 
[https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext]
 )
 )
 "Note that this method is never called for JSON null literal, and thus 
deserializers need (and should) not check for it."

If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
so we call a writeNull(). During deserialization this isn't handled properly as 
mentioned and our deserialization will return null.

The end result is that getters with ValueProvider return values will return 
"null". AFAIK ValueProvider getters should be never null.


> Bug in RuntimeValueProvider JSON serialization
> --
>
> Key: BEAM-6963
> URL: https://issues.apache.org/jira/browse/BEAM-6963
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Balázs Németh
>Priority: Critical
>
> Classes affected:
>  org.apache.beam.sdk.options.ValueProvider.Serializer
>  org.apache.beam.sdk.options.ValueProvider.Deserializer
> The problem is that according to the JsonDeserializer documentation, the 
> deserialize method isn't executed for null nodes:
>  ( 
> [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext]
>  )
>  )
>  "Note that this method is never called for JSON null literal, and thus 
> deserializers need (and should) not check for it."
> If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
> so we call a writeNull(). During deserialization this isn't handled properly 
> as mentioned and our deserialization will return null.
> The end result is that getters with ValueProvider return values will return 
> "null". AFAIK ValueProvider getters should be never null.
> My guess is that either we should completely omit serializing 
> RuntimeValueProviders, or during deserialization the proper runtime value 
> provider should be created again - which requires more than a "null" being 
> present in the json.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6963) Bug in RuntimeValueProvider JSON serialization

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6963:

Description: 
Classes affected:
 org.apache.beam.sdk.options.ValueProvider.Serializer
 org.apache.beam.sdk.options.ValueProvider.Deserializer

The problem is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes:
 ( 
[https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext]
 )
 )
 "Note that this method is never called for JSON null literal, and thus 
deserializers need (and should) not check for it."

If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
so we call a writeNull(). During deserialization this isn't handled properly as 
mentioned and our deserialization will return null.

The end result is that getters with ValueProvider return values will return 
"null". AFAIK ValueProvider getters should be never null.

  was:
Classes affected:
 org.apache.beam.sdk.options.ValueProvider.Serializer
 org.apache.beam.sdk.options.ValueProvider.Deserializer

The problem is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes:
( 
https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)
)
"Note that this method is never called for JSON null literal, and thus 
deserializers need (and should) not check for it."

If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
so we call a writeNull(). During deserialization this isn't handled properly as 
mentioned and our deserialization will return null.

The end result is that getters with ValueProvider return values will return 
"null". AFAIK ValueProvider getters should be never null.


> Bug in RuntimeValueProvider JSON serialization
> --
>
> Key: BEAM-6963
> URL: https://issues.apache.org/jira/browse/BEAM-6963
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Balázs Németh
>Priority: Critical
>
> Classes affected:
>  org.apache.beam.sdk.options.ValueProvider.Serializer
>  org.apache.beam.sdk.options.ValueProvider.Deserializer
> The problem is that according to the JsonDeserializer documentation, the 
> deserialize method isn't executed for null nodes:
>  ( 
> [https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)|https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext]
>  )
>  )
>  "Note that this method is never called for JSON null literal, and thus 
> deserializers need (and should) not check for it."
> If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
> so we call a writeNull(). During deserialization this isn't handled properly 
> as mentioned and our deserialization will return null.
> The end result is that getters with ValueProvider return values will return 
> "null". AFAIK ValueProvider getters should be never null.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6963) Bug in RuntimeValueProvider JSON serialization

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6963:

Description: 
Classes affected:
 org.apache.beam.sdk.options.ValueProvider.Serializer
 org.apache.beam.sdk.options.ValueProvider.Deserializer

The problem is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes:
( 
https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)
)
"Note that this method is never called for JSON null literal, and thus 
deserializers need (and should) not check for it."

If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
so we call a writeNull(). During deserialization this isn't handled properly as 
mentioned and our deserialization will return null.

The end result is that getters with ValueProvider return values will return 
"null". AFAIK ValueProvider getters should be never null.

  was:
Classes affected:
org.apache.beam.sdk.options.ValueProvider.Serializer
org.apache.beam.sdk.options.ValueProvider.Deserializer

The problem is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes:
[https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)
]"Note that this method is never called for JSON null literal, and thus 
deserializers need (and should) not check for it."

If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
so we call a writeNull(). During deserialization this isn't handled properly as 
mentioned and our deserialization will return null.

The end result is that getters with ValueProvider return values will return 
"null". AFAIK ValueProvider getters should be never null.


> Bug in RuntimeValueProvider JSON serialization
> --
>
> Key: BEAM-6963
> URL: https://issues.apache.org/jira/browse/BEAM-6963
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Balázs Németh
>Priority: Critical
>
> Classes affected:
>  org.apache.beam.sdk.options.ValueProvider.Serializer
>  org.apache.beam.sdk.options.ValueProvider.Deserializer
> The problem is that according to the JsonDeserializer documentation, the 
> deserialize method isn't executed for null nodes:
> ( 
> https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)
> )
> "Note that this method is never called for JSON null literal, and thus 
> deserializers need (and should) not check for it."
> If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
> so we call a writeNull(). During deserialization this isn't handled properly 
> as mentioned and our deserialization will return null.
> The end result is that getters with ValueProvider return values will return 
> "null". AFAIK ValueProvider getters should be never null.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6954:

Description: 
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:
 - the containsKey check should be expanded with an !isNull check
 OR
 - when we serialize the json, it shouldn't serialize null values at 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655]

 

Instinctively I would have expected the @Default.* annotations producing values 
every single time, when the value is null - so a property with a @Default.* 
annotation can't be null - but I was unable to find anything explicit regarding 
this in the documentation. So I'm not sure which of the suggested change has to 
be made.

---

Okay, I have investigated further, and it seems the default value is indeed 
calculated before the json serialization by calling the mentioned method. The 
problem is that it returns a RuntimeValueProvider, which gets serialized as 
null ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L275-L279]
 ), because the isAccessible returns false ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L248-L250]
 + 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L214-L220]
 )

... and then during deserialization it is found in the jsonOptions at ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 ), so it executes the getValueFromJson which uses an ObjectMapper to create a 
ValueProvider from a NullNode ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L498]
 ) . 

The problem here is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes. -> 
[https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)]

For this part of the issue see BEAM-6963 (that still doesn't solve this issue 
btw)

  was:
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:
 - the containsKey check should be expanded with an !isNull check
 OR
 - when we seri

[jira] [Created] (BEAM-6963) Bug in RuntimeValueProvider JSON serialization

2019-04-01 Thread JIRA
Balázs Németh created BEAM-6963:
---

 Summary: Bug in RuntimeValueProvider JSON serialization
 Key: BEAM-6963
 URL: https://issues.apache.org/jira/browse/BEAM-6963
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Affects Versions: 2.11.0
Reporter: Balázs Németh


Classes affected:
org.apache.beam.sdk.options.ValueProvider.Serializer
org.apache.beam.sdk.options.ValueProvider.Deserializer

The problem is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes:
[https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)
]"Note that this method is never called for JSON null literal, and thus 
deserializers need (and should) not check for it."

If we serialize a RuntimeValueProvider, the isAccessible() will return false, 
so we call a writeNull(). During deserialization this isn't handled properly as 
mentioned and our deserialization will return null.

The end result is that getters with ValueProvider return values will return 
"null". AFAIK ValueProvider getters should be never null.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6960) Run Go PostCommit tests against a ULR

2019-04-01 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-6960:
---
Description: 
See parent task :https://issues.apache.org/jira/browse/BEAM-6958

 

[Instructions on running against the Java 
ULR|https://cwiki.apache.org/confluence/display/BEAM/Usage+Guide]

  was:See parent task :https://issues.apache.org/jira/browse/BEAM-6958


> Run Go PostCommit tests against a ULR
> -
>
> Key: BEAM-6960
> URL: https://issues.apache.org/jira/browse/BEAM-6960
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go, testing
>Reporter: Robert Burke
>Priority: Minor
>
> See parent task :https://issues.apache.org/jira/browse/BEAM-6958
>  
> [Instructions on running against the Java 
> ULR|https://cwiki.apache.org/confluence/display/BEAM/Usage+Guide]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6962) Add wordcount_xlang to Python post-commit

2019-04-01 Thread Chamikara Jayalath (JIRA)
Chamikara Jayalath created BEAM-6962:


 Summary: Add wordcount_xlang to Python post-commit
 Key: BEAM-6962
 URL: https://issues.apache.org/jira/browse/BEAM-6962
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink, sdk-py-core, sdk-py-harness
Reporter: Chamikara Jayalath
Assignee: Heejong Lee


This example works great but we should add it to Python post-commit to prevent 
bit-rot. 

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_xlang.py]

Also, please consider adding a step to the pipeline that performs a checksum on 
the output to make sure that the output is valid.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6961) Run Go SDK Postcommits Dataflow as their own jenkins target

2019-04-01 Thread Robert Burke (JIRA)
Robert Burke created BEAM-6961:
--

 Summary: Run Go SDK Postcommits Dataflow as their own jenkins 
target
 Key: BEAM-6961
 URL: https://issues.apache.org/jira/browse/BEAM-6961
 Project: Beam
  Issue Type: Sub-task
  Components: runner-dataflow, sdk-go, testing
Reporter: Robert Burke


See parent task: https://issues.apache.org/jira/browse/BEAM-6958



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6960) Run Go PostCommit tests against a ULR

2019-04-01 Thread Robert Burke (JIRA)
Robert Burke created BEAM-6960:
--

 Summary: Run Go PostCommit tests against a ULR
 Key: BEAM-6960
 URL: https://issues.apache.org/jira/browse/BEAM-6960
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-go, testing
Reporter: Robert Burke


See parent task :https://issues.apache.org/jira/browse/BEAM-6958



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6959) Run Go SDK Post Commit tests against the Flink Runner.

2019-04-01 Thread Robert Burke (JIRA)
Robert Burke created BEAM-6959:
--

 Summary: Run Go SDK  Post Commit tests against the Flink Runner.
 Key: BEAM-6959
 URL: https://issues.apache.org/jira/browse/BEAM-6959
 Project: Beam
  Issue Type: Sub-task
  Components: runner-flink, sdk-go, testing
Reporter: Robert Burke


See parent task BEAM-6958



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4152) Support Go session windowing

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4152?focusedWorklogId=221406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221406
 ]

ASF GitHub Bot logged work on BEAM-4152:


Author: ASF GitHub Bot
Created on: 01/Apr/19 18:40
Start Date: 01/Apr/19 18:40
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #8111: [BEAM-4152] Kinds & 
URNs for Session Windows
URL: https://github.com/apache/beam/pull/8111#issuecomment-478696170
 
 
   @ptomasroos At present the "testing and experimenting" path that works for 
the Go SDK is based on the direct runner, as @robertwb mentions. 
   
   Right now the direct runner is a constraint since work hasn't been done to 
use the Go SDK with the (new?) Python ULR, though I do know it was working with 
the Java ULR at some point, let alone a free runner such as Flink. Otherwise, 
the only thing that is tested via the post commits is Dataflow, but that's not 
great for users experimenting, at the end of the day since it costs money, 
which isn't ideal for users just wanting to try things.
   
   https://issues.apache.org/jira/browse/BEAM-6958 is a task to split out the 
integration tests in the post commit rubric. Which would be natural starting 
point for demonstrating how to use the SDK against other runners ensuring that 
the SDK works on said runners to boot. That should help a touch, once the 
jenkin's machine apocalypse is over.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221406)
Time Spent: 2.5h  (was: 2h 20m)

> Support Go session windowing
> 
>
> Key: BEAM-4152
> URL: https://issues.apache.org/jira/browse/BEAM-4152
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Support session windowing and how to handle merging windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6956) --experiments=worker_threads=100 issue

2019-04-01 Thread Jiayi Zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807065#comment-16807065
 ] 

Jiayi Zhao commented on BEAM-6956:
--

it might due to some random stuff, I tried in thread flink again with 100 
worker_threads, and this time it works(I also changed --parallelism=1 instead 
of --parallelism=2 before, not sure if it's related),

when I tried separate local flink cluster, each time pipeline stuck at 
different places, I will try confirm it and put more information about where it 
stucks 

> --experiments=worker_threads=100 issue
> --
>
> Key: BEAM-6956
> URL: https://issues.apache.org/jira/browse/BEAM-6956
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Jiayi Zhao
>Priority: Major
>
> I noticed that without this "–experiments=worker_threads=100", pipeline will 
> stuck,
> The weird thing is, I tried some complex pipeline using the in thread flink 
> method (./gradlew :beam-runners-flink_2.11-job-server:runShadow)
> "–experiments=worker_threads=100" doesn't work, but 
> "–experiments=worker_threads=1000" works fine
> Then I tried the same pipeline using the separate local flink cluster 
> (./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081), flink version is 1.5.6 (other version 
> doesn't work, see BEAM-6915)
> Neither did "–experiments=worker_threads=1000" or 
> "–experiments=worker_threads=1" work, pipeline stuck at certain stage 
> (shows running in flink UI but won't finish forever)
> any real fix to that? Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6958) Split Go PostCommit Test results by Runner

2019-04-01 Thread Robert Burke (JIRA)
Robert Burke created BEAM-6958:
--

 Summary: Split Go PostCommit Test results by Runner
 Key: BEAM-6958
 URL: https://issues.apache.org/jira/browse/BEAM-6958
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go, testing
Reporter: Robert Burke


At present the Go SDK only has a single column filled in on the master branch 
Post-Commit Tests Status testing rubric, which is unclear, and not-ideal.

Right now the jenkin's [Go PostCommit 
tests|https://github.com/apache/beam/blob/ec3f79214e9ef204fa32b744051a291fe4b61e23/.test-infra/jenkins/job_PostCommit_Go.groovy#L24]
 trigger the [go integration test 
task|https://github.com/apache/beam/blob/58a70b273367c22fd7c8562c42bc10a07dbe7156/build.gradle#L178],
 which only runs the [tests on Dataflow via a shell 
script|https://github.com/apache/beam/blob/master/sdks/go/test/run_integration_tests.sh#L78].
 It doesn't even run the unit tests as per the pre-commit.



The end goal for this task is to:
* Have the Go SDK column represent the Go SDK Unit Tests as a post commit.
  * Or better, to avoid pre-commit-run duplication, run the integration tests 
against the ULR if other runners are doing so.
* Have the integration tests run against the Dataflow, be represented in the 
column.

This will set the basis for adding and the integration tests against other 
portable runners (Flink, Spark, Python ULR, future portable runners...)

It looks like there are three bits of work to accomplish here:

* Adjust the Gradle tasks/task names to accurately represent what they're 
running against.
* Add the new Jenkins tasks for each of the runners. (The other languages call 
these ValidateRunner_ tasks), 
* Add the cool "badges" to the new Jenkins tasks to the Post Commit rubric.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6954:

Description: 
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:
 - the containsKey check should be expanded with an !isNull check
 OR
 - when we serialize the json, it shouldn't serialize null values at 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655]

 

Instinctively I would have expected the @Default.* annotations producing values 
every single time, when the value is null - so a property with a @Default.* 
annotation can't be null - but I was unable to find anything explicit regarding 
this in the documentation. So I'm not sure which of the suggested change has to 
be made.

---

Okay, I have investigated further, and it seems the default value is indeed 
calculated before the json serialization by calling the mentioned method. The 
problem is that it returns a RuntimeValueProvider, which gets serialized as 
null ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L275-L279]
 ), because the isAccessible returns false ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L248-L250]
 + 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L214-L220]
 )

... and then during deserialization it is found in the jsonOptions at ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 ), so it executes the getValueFromJson which uses an ObjectMapper to create a 
ValueProvider from a NullNode ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L498]
 ) . 

The problem here is that according to the JsonDeserializer documentation, the 
deserialize method isn't executed for null nodes. -> 
[https://static.javadoc.io/com.fasterxml.jackson.core/jackson-databind/2.9.6/com/fasterxml/jackson/databind/JsonDeserializer.html#deserialize(com.fasterxml.jackson.core.JsonParser,%20com.fasterxml.jackson.databind.DeserializationContext)]

That requires overriding the getNullValue(DeserializationContext ctxt) method.

  was:
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:
 - the containsKey check should be expanded with an !isNull check
 OR
 - when we serialize

[jira] [Work logged] (BEAM-6914) Use BigQuerySink as default for 2.12.

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6914?focusedWorklogId=221399&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221399
 ]

ASF GitHub Bot logged work on BEAM-6914:


Author: ASF GitHub Bot
Created on: 01/Apr/19 18:16
Start Date: 01/Apr/19 18:16
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #8170: 
[release-2.12.0][BEAM-6914] Reverting behavior of Native BQ sink in Python 
(#8143)
URL: https://github.com/apache/beam/pull/8170#issuecomment-478687364
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221399)
Time Spent: 5h 50m  (was: 5h 40m)

> Use BigQuerySink as default for 2.12.
> -
>
> Key: BEAM-6914
> URL: https://issues.apache.org/jira/browse/BEAM-6914
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.12.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6876) User state cleanup in portable Flink runner

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6876?focusedWorklogId=221398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221398
 ]

ASF GitHub Bot logged work on BEAM-6876:


Author: ASF GitHub Bot
Created on: 01/Apr/19 18:15
Start Date: 01/Apr/19 18:15
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #8173: [release-2.12] 
[BEAM-6876] Cleanup user state in portable Flink Runner
URL: https://github.com/apache/beam/pull/8173#issuecomment-478687325
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221398)
Time Spent: 5.5h  (was: 5h 20m)

> User state cleanup in portable Flink runner
> ---
>
> Key: BEAM-6876
> URL: https://issues.apache.org/jira/browse/BEAM-6876
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability-flink, triaged
> Fix For: 2.12.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> State is currently not being cleaned up by the runner.
> [https://lists.apache.org/thread.html/86f0809fbfa3da873051287b9ff249d6dd5a896b45409db1e484cf38@%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6738) Reduce Combine overhead

2019-04-01 Thread Robert Burke (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke resolved BEAM-6738.

   Resolution: Fixed
Fix Version/s: Not applicable

> Reduce Combine overhead
> ---
>
> Key: BEAM-6738
> URL: https://issues.apache.org/jira/browse/BEAM-6738
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Sibling to BEAM-4726 for ParDos, this should be to add the invoker caching to 
> the exec/combine.go units, since for example AddInput would be done for every 
> single element, and for large key spaces, the same applies for the other 
> CombineFn components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-6745) Cannot run pipeline on Dataflow (GO SDK)

2019-04-01 Thread Robert Burke (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779662#comment-16779662
 ] 

Robert Burke edited comment on BEAM-6745 at 4/1/19 6:10 PM:


There's no *dataflow side* documentation of the SDK, or any statements of 
support that I'm aware of. At present, if you use the Go SDK on Dataflow, you 
do so at your own risk. While Google does fund some work on Apache Beam, the Go 
SDK not currently among the set of things that are being maintained at 
production quality on Dataflow. ergo, things will break, documentation will 
become stale.

It's a quirk of portability that it enables "unofficial" language SDK support 
on any compatible runner. However, there's nothing guaranteeing this, and 
there's no effort to maintain anything around it, (as determined by this bug).

Official support would include be the version of the dataflow library to 
provide a compatible, versioned SDK container, without users ever needing to 
specify anything, and that tests for certain versions of the SDK run 
successfully against the service and similar. 

In short, it's a matter of it Can run on dataflow, but not necessarily that 
folks should use it.

I'm hoping to be able to change that, but I can't speak to any timelines at 
present.

Edit: I think the point I'm trying to make here is that the Go SDK tries to 
support Dataflow, but that Dataflow, as a paid service, doesn't support the Go 
SDK, as there are certain expectations once money gets involved. 

 


was (Author: lostluck):
There's no *dataflow side* documentation of the SDK, or any statements of 
support that I'm aware of. At present, if you use the Go SDK on Dataflow, you 
do so at your own risk. While Google does fund some work on Apache Beam.

It's a quirk of portability that it enables "unofficial" language SDK support 
on any compatible runner. However, there's nothing guaranteeing this, and 
there's no effort to maintain anything around it, (as determined by this bug).

Official support would include be the version of the dataflow library to 
provide a compatible, versioned SDK container, without users ever needing to 
specify anything, and that tests for certain versions of the SDK run 
successfully against the service and similar. 

In short, it's a matter of it Can run on dataflow, but not necessarily that 
folks should use it.

I'm hoping to be able to change that, but I can't speak to any timelines at 
present.

Edit: I think the point I'm trying to make here is that the Go SDK tries to 
support Dataflow, but that Dataflow, as a paid service, doesn't support the Go 
SDK, as there are certain expectations once money gets involved. 

 

> Cannot run pipeline on Dataflow (GO SDK)
> 
>
> Key: BEAM-6745
> URL: https://issues.apache.org/jira/browse/BEAM-6745
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Michael Chemani
>Priority: Major
>
> I got 
> ```
> {{Failed to retrieve staged files: failed to retrieve worker in 3 attempts: 
> bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ; 
> bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ; 
> bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ; 
> bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want}}
> ```
>  
> When trying to run 
> ```
> {{dataflow \ --runner dataflow \ --index gs://\{BUCKET}/data_100k.csv \ 
> --output gs://\{BUCKET}/ \ --project {PROJECT} \ --temp_location 
> gs://\{BUCKET}/tmp/ \ --staging_location gs://\{BUCKET}/binaries/ \ 
> --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515}}
> ```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6502?focusedWorklogId=221390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221390
 ]

ASF GitHub Bot logged work on BEAM-6502:


Author: ASF GitHub Bot
Created on: 01/Apr/19 18:08
Start Date: 01/Apr/19 18:08
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8146: [BEAM-6502] 
Re-Remove runner time execution information from public API surface (now 
including Watch)
URL: https://github.com/apache/beam/pull/8146#discussion_r270987570
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackers.java
 ##
 @@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.fn.splittabledofn;
+
+import javax.annotation.concurrent.ThreadSafe;
+import org.apache.beam.sdk.transforms.splittabledofn.Backlog;
+import org.apache.beam.sdk.transforms.splittabledofn.Backlogs;
+import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
+
+/** Support utilities for interacting with {@link RestrictionTracker 
RestrictionTrackers}. */
+public class RestrictionTrackers {
+
+  /** Interface allowing a runner to observe the calls to {@link 
RestrictionTracker#tryClaim}. */
+  public interface ClaimObserver {
+/** Called when {@link RestrictionTracker#tryClaim} returns true. */
+void onClaimed(PositionT position);
+
+/** Called when {@link RestrictionTracker#tryClaim} returns false. */
+void onClaimFailed(PositionT position);
+  }
+
+  /**
+   * A {@link RestrictionTracker} which forwards all calls to the delegate 
{@link
+   * RestrictionTracker}.
+   */
+  @ThreadSafe
+  private static class RestrictionTrackerObserver
+  extends RestrictionTracker {
+protected final RestrictionTracker delegate;
+private final ClaimObserver claimObserver;
+
+protected RestrictionTrackerObserver(
+RestrictionTracker delegate,
+ClaimObserver claimObserver) {
+  this.delegate = delegate;
+  this.claimObserver = claimObserver;
+}
+
+@Override
+public synchronized boolean tryClaim(PositionT position) {
+  if (delegate.tryClaim(position)) {
+claimObserver.onClaimed(position);
+return true;
+  } else {
+claimObserver.onClaimFailed(position);
+return false;
+  }
+}
+
+@Override
+public synchronized RestrictionT currentRestriction() {
+  return delegate.currentRestriction();
+}
+
+@Override
+public synchronized RestrictionT checkpoint() {
+  return delegate.checkpoint();
+}
+
+@Override
+public synchronized void checkDone() throws IllegalStateException {
+  delegate.checkDone();
+}
+  }
+
+  /**
+   * A {@link RestrictionTracker} which forwards all calls to the delegate 
backlog reporting {@link
+   * RestrictionTracker}.
+   */
+  @ThreadSafe
+  private static class RestrictionTrackerObserverWithBacklog
+  extends RestrictionTrackerObserver implements 
Backlogs.HasBacklog {
+
+protected RestrictionTrackerObserverWithBacklog(
+RestrictionTracker delegate,
+ClaimObserver claimObserver) {
+  super(delegate, claimObserver);
+}
+
+@Override
+public synchronized Backlog getBacklog() {
+  return ((Backlogs.HasBacklog) delegate).getBacklog();
+}
+  }
+
+  /**
+   * A {@link RestrictionTracker} which forwards all calls to the delegate 
partitioned backlog
+   * reporting {@link RestrictionTracker}.
+   */
+  @ThreadSafe
+  private static class 
RestrictionTrackerObserverWithPartitionedBacklog
+  extends RestrictionTrackerObserverWithBacklog
+  implements Backlogs.HasPartitionedBacklog {
+
+protected RestrictionTrackerObserverWithPartitionedBacklog(
+RestrictionTracker delegate,
+ClaimObserver claimObserver) {
+  super(delegate, claimObserver);
+}
+
+@Override
+public synchronized byte[] getBacklogPartition() {
+  return ((Backlogs.HasPartitionedBacklog) delegate).getBacklogPartition();
+}
+  }
+
+  /**
+   * Returns a thread safe {@

[jira] [Work logged] (BEAM-6753) Create proto representation for schemas

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6753?focusedWorklogId=221391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221391
 ]

ASF GitHub Bot logged work on BEAM-6753:


Author: ASF GitHub Bot
Created on: 01/Apr/19 18:08
Start Date: 01/Apr/19 18:08
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7952: [BEAM-6753] 
Set the stage to make schema coder update compatible
URL: https://github.com/apache/beam/pull/7952#discussion_r270987733
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java
 ##
 @@ -285,15 +289,24 @@ public InstrumentedType prepare(InstrumentedType 
instrumentedType) {
 // per-field Coders.
 static Row decodeDelegate(Schema schema, Coder[] coders, InputStream 
inputStream)
 throws IOException {
+  int fieldCount = VAR_INT_CODER.decode(inputStream);
 
 Review comment:
   Unfortunately no. BitSet.size() returns the total space used, which rounds 
up to the next word size.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221391)
Time Spent: 50m  (was: 40m)

> Create proto representation for schemas
> ---
>
> Key: BEAM-6753
> URL: https://issues.apache.org/jira/browse/BEAM-6753
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6502?focusedWorklogId=221387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221387
 ]

ASF GitHub Bot logged work on BEAM-6502:


Author: ASF GitHub Bot
Created on: 01/Apr/19 18:04
Start Date: 01/Apr/19 18:04
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8146: [BEAM-6502] 
Re-Remove runner time execution information from public API surface (now 
including Watch)
URL: https://github.com/apache/beam/pull/8146#discussion_r270979431
 
 

 ##
 File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroIOTest.java
 ##
 @@ -512,6 +513,7 @@ public GenericClass apply(Long i) {
 }
 
 @Test
+@Ignore("https://issues.apache.org/jira/browse/BEAM-6352";)
 
 Review comment:
   I missed it, thought only WatchTest was impacted. Fixed here and elsewhere 
as well.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221387)
Time Spent: 2h  (was: 1h 50m)

> SplittableDoFn: Re-Remove runner time execution information from public API 
> surface
> ---
>
> Key: BEAM-6502
> URL: https://issues.apache.org/jira/browse/BEAM-6502
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Labels: triaged
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Move the setting of "claim observers" within RestrictionTracker to another 
> location to clean up the RestrictionTracker interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6830) Direct runner throws error "http 404 not found" when reading Bigquery tables from any other region than "US"

2019-04-01 Thread Aizhamal Nurmamat kyzy (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aizhamal Nurmamat kyzy reassigned BEAM-6830:


Assignee: Pablo Estrada  (was: Aizhamal Nurmamat kyzy)

> Direct runner throws error "http 404 not found" when reading Bigquery tables 
> from any other region than "US"
> 
>
> Key: BEAM-6830
> URL: https://issues.apache.org/jira/browse/BEAM-6830
> Project: Beam
>  Issue Type: Bug
>  Components: beam-events
>Reporter: Suraj
>Assignee: Pablo Estrada
>Priority: Major
>
> When trying to read bigquery table located in region "asia-southeast-1" using 
> DirectRunner ,it throws error "http 404 not found" but same code works when 
> run using DataflowRunner



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=221379&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221379
 ]

ASF GitHub Bot logged work on BEAM-5723:


Author: ASF GitHub Bot
Created on: 01/Apr/19 17:52
Start Date: 01/Apr/19 17:52
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8178: [BEAM-5723] Do 
not use the default shadow closure when building CassandraIO
URL: https://github.com/apache/beam/pull/8178#issuecomment-478679044
 
 
   Hmm, I am not exactly sure why, but the failure is listing essentially every 
class. It may have to do with not having this block:
   
   ```
   dependencies {
 include(dependency(project.library.java.guava))
   }
   ```
   
   Everything else in the `DEFAULT_SHADOW_CLOSURE` is just relocations of Guava.
   
   Almost everything causing the relocation error should be OK, and should not 
be in the shaded jar.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221379)
Time Spent: 9h 20m  (was: 9h 10m)

> CassandraIO is broken because of use of bad relocation of guava
> ---
>
> Key: BEAM-5723
> URL: https://issues.apache.org/jira/browse/BEAM-5723
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0
>Reporter: Arun sethia
>Assignee: Ismaël Mejía
>Priority: Major
> Fix For: 2.12.0
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> While using apache beam to run dataflow job to read data from BigQuery and 
> Store/Write to Cassandra with following libaries:
>  # beam-sdks-java-io-cassandra - 2.6.0
>  # beam-sdks-java-io-jdbc - 2.6.0
>  # beam-sdks-java-io-google-cloud-platform - 2.6.0
>  # beam-sdks-java-core - 2.6.0
>  # google-cloud-dataflow-java-sdk-all - 2.5.0
>  # google-api-client -1.25.0
>  
> I am getting following error at the time insert/save data to Cassandra.
> {code:java}
> [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6502?focusedWorklogId=221378&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221378
 ]

ASF GitHub Bot logged work on BEAM-6502:


Author: ASF GitHub Bot
Created on: 01/Apr/19 17:47
Start Date: 01/Apr/19 17:47
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8146: [BEAM-6502] 
Re-Remove runner time execution information from public API surface (now 
including Watch)
URL: https://github.com/apache/beam/pull/8146#discussion_r270979431
 
 

 ##
 File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/AvroIOTest.java
 ##
 @@ -512,6 +513,7 @@ public GenericClass apply(Long i) {
 }
 
 @Test
+@Ignore("https://issues.apache.org/jira/browse/BEAM-6352";)
 
 Review comment:
   I missed it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221378)
Time Spent: 1h 50m  (was: 1h 40m)

> SplittableDoFn: Re-Remove runner time execution information from public API 
> surface
> ---
>
> Key: BEAM-6502
> URL: https://issues.apache.org/jira/browse/BEAM-6502
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Labels: triaged
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Move the setting of "claim observers" within RestrictionTracker to another 
> location to clean up the RestrictionTracker interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6957) Spark Portable Runner: Support metrics

2019-04-01 Thread Kyle Weaver (JIRA)
Kyle Weaver created BEAM-6957:
-

 Summary: Spark Portable Runner: Support metrics
 Key: BEAM-6957
 URL: https://issues.apache.org/jira/browse/BEAM-6957
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Kyle Weaver
Assignee: Kyle Weaver






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=221376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221376
 ]

ASF GitHub Bot logged work on BEAM-5723:


Author: ASF GitHub Bot
Created on: 01/Apr/19 17:45
Start Date: 01/Apr/19 17:45
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8178: [BEAM-5723] Do 
not use the default shadow closure when building CassandraIO
URL: https://github.com/apache/beam/pull/8178#issuecomment-478676487
 
 
   Ah, no it is just that excludes are not additive to the default set.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221376)
Time Spent: 9h 10m  (was: 9h)

> CassandraIO is broken because of use of bad relocation of guava
> ---
>
> Key: BEAM-5723
> URL: https://issues.apache.org/jira/browse/BEAM-5723
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0
>Reporter: Arun sethia
>Assignee: Ismaël Mejía
>Priority: Major
> Fix For: 2.12.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> While using apache beam to run dataflow job to read data from BigQuery and 
> Store/Write to Cassandra with following libaries:
>  # beam-sdks-java-io-cassandra - 2.6.0
>  # beam-sdks-java-io-jdbc - 2.6.0
>  # beam-sdks-java-io-google-cloud-platform - 2.6.0
>  # beam-sdks-java-core - 2.6.0
>  # google-cloud-dataflow-java-sdk-all - 2.5.0
>  # google-api-client -1.25.0
>  
> I am getting following error at the time insert/save data to Cassandra.
> {code:java}
> [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221375
 ]

ASF GitHub Bot logged work on BEAM-6953:


Author: ASF GitHub Bot
Created on: 01/Apr/19 17:44
Start Date: 01/Apr/19 17:44
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #8188: [BEAM-6953] 
Make bq constants args
URL: https://github.com/apache/beam/pull/8188#discussion_r270978470
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
 ##
 @@ -53,4 +53,16 @@
   Integer getInsertBundleParallelism();
 
   void setInsertBundleParallelism(Integer parallelism);
+
+  @Description("The number of buckets used per table when doing streaming 
inserts to BigQuery.")
+  @Default.Integer(50)
+  Integer getNumStreamingBuckets();
 
 Review comment:
   fixed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221375)
Time Spent: 0.5h  (was: 20m)

> BigQueryIO has constants that should be PipelineOptions
> ---
>
> Key: BEAM-6953
> URL: https://issues.apache.org/jira/browse/BEAM-6953
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6502) SplittableDoFn: Re-Remove runner time execution information from public API surface

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6502?focusedWorklogId=221374&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221374
 ]

ASF GitHub Bot logged work on BEAM-6502:


Author: ASF GitHub Bot
Created on: 01/Apr/19 17:44
Start Date: 01/Apr/19 17:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8146: [BEAM-6502] 
Re-Remove runner time execution information from public API surface (now 
including Watch)
URL: https://github.com/apache/beam/pull/8146#discussion_r270978398
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java
 ##
 @@ -228,6 +253,23 @@ Instant getWatermark() {
 }
 return res;
   }
+
+  @Override
+  public boolean equals(Object o) {
 
 Review comment:
   There was no difference, swapped to use autovalue. Its just a remnant of the 
old way this was written.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221374)
Time Spent: 1h 40m  (was: 1.5h)

> SplittableDoFn: Re-Remove runner time execution information from public API 
> surface
> ---
>
> Key: BEAM-6502
> URL: https://issues.apache.org/jira/browse/BEAM-6502
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Labels: triaged
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Move the setting of "claim observers" within RestrictionTracker to another 
> location to clean up the RestrictionTracker interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5723) CassandraIO is broken because of use of bad relocation of guava

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5723?focusedWorklogId=221373&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221373
 ]

ASF GitHub Bot logged work on BEAM-5723:


Author: ASF GitHub Bot
Created on: 01/Apr/19 17:43
Start Date: 01/Apr/19 17:43
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8178: [BEAM-5723] Do 
not use the default shadow closure when building CassandraIO
URL: https://github.com/apache/beam/pull/8178#issuecomment-478675846
 
 
   This actually exposed a huge bug in our build, if I understand it. It isn't 
just the Guava classes used here, but all of the ones relocated by any other 
module.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221373)
Time Spent: 9h  (was: 8h 50m)

> CassandraIO is broken because of use of bad relocation of guava
> ---
>
> Key: BEAM-5723
> URL: https://issues.apache.org/jira/browse/BEAM-5723
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0
>Reporter: Arun sethia
>Assignee: Ismaël Mejía
>Priority: Major
> Fix For: 2.12.0
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> While using apache beam to run dataflow job to read data from BigQuery and 
> Store/Write to Cassandra with following libaries:
>  # beam-sdks-java-io-cassandra - 2.6.0
>  # beam-sdks-java-io-jdbc - 2.6.0
>  # beam-sdks-java-io-google-cloud-platform - 2.6.0
>  # beam-sdks-java-core - 2.6.0
>  # google-cloud-dataflow-java-sdk-all - 2.5.0
>  # google-api-client -1.25.0
>  
> I am getting following error at the time insert/save data to Cassandra.
> {code:java}
> [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6956) --experiments=worker_threads=100 issue

2019-04-01 Thread Jiayi Zhao (JIRA)
Jiayi Zhao created BEAM-6956:


 Summary: --experiments=worker_threads=100 issue
 Key: BEAM-6956
 URL: https://issues.apache.org/jira/browse/BEAM-6956
 Project: Beam
  Issue Type: New Feature
  Components: runner-flink
Reporter: Jiayi Zhao


I noticed that without this "–experiments=worker_threads=100", pipeline will 
stuck,

The weird thing is, I tried some complex pipeline using the in thread flink 
method (./gradlew :beam-runners-flink_2.11-job-server:runShadow)

"–experiments=worker_threads=100" doesn't work, but 
"–experiments=worker_threads=1000" works fine

Then I tried the same pipeline using the separate local flink cluster 
(./gradlew :beam-runners-flink_2.11-job-server:runShadow 
-PflinkMasterUrl=localhost:8081), flink version is 1.5.6 (other version doesn't 
work, see BEAM-6915)

Neither did "–experiments=worker_threads=1000" or 
"–experiments=worker_threads=1" work, pipeline stuck at certain stage 
(shows running in flink UI but won't finish forever)

any real fix to that? Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6865) Refactor common portable runner infrastructure for better reuse

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6865?focusedWorklogId=221370&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221370
 ]

ASF GitHub Bot logged work on BEAM-6865:


Author: ASF GitHub Bot
Created on: 01/Apr/19 17:35
Start Date: 01/Apr/19 17:35
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #8176: [BEAM-6865] Move 
MetricsApi updates from flink.metrics to core.metrics
URL: https://github.com/apache/beam/pull/8176#issuecomment-478673312
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221370)
Time Spent: 2h  (was: 1h 50m)

> Refactor common portable runner infrastructure for better reuse
> ---
>
> Key: BEAM-6865
> URL: https://issues.apache.org/jira/browse/BEAM-6865
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The Flink runner is currently Beam's most mature portable OSS runner. Much of 
> the Flink portable runner's implementation details are not unique to Flink, 
> and yet are confined to the Flink runner code. In order to ease development 
> on other portable runners such as the Spark runner, this reusable code should 
> be moved into some common location.
> I've set this up to track my progress on these ongoing improvements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6915) Issue when run pipeline on a separate Flink cluster

2019-04-01 Thread Jiayi Zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807015#comment-16807015
 ] 

Jiayi Zhao commented on BEAM-6915:
--

I git clone the head beam, Flink 1.7.2 doesn't work (above error), Flink 1.5.4 
works except pipeline.wait_until_finish() call has a exception, Flink 1.5.6 
works fine for a simple word count example. 

> Issue when run pipeline on a separate Flink cluster
> ---
>
> Key: BEAM-6915
> URL: https://issues.apache.org/jira/browse/BEAM-6915
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Jiayi Zhao
>Priority: Major
>
> First I tried a simple pipeline on the JobService endpoint created by:
>   ./gradlew :beam-runners-flink_2.11-job-server:runShadow
> it works, then I tried the following examples:
>   _To run on a separate [Flink 
> cluster|https://ci.apache.org/projects/flink/flink-docs-release-1.5/quickstart/setup_quickstart.html]:_
>   _1. Start a Flink cluster which exposes the Rest interface on 
> {{localhost:8081}} by default._
>   _2. Start JobService with Flink Rest endpoint: {{./gradlew 
> :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081}}._
>   _3. Submit the pipeline as above._
> when I run the pipeline in another console, the jobService console shows 
> following errors, any ideas?
>  
> _$ ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081_
> _Configuration on demand is an incubating feature._
> _> Task :beam-runners-flink_2.11-job-server:runShadow_
> _Listening for transport dt_socket at address: 5005_
> _[main] INFO 
> org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - 
> ArtifactStagingService started on localhost:8098_
> _[main] INFO 
> org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - Java 
> ExpansionService started on localhost:8097_
> _[main] INFO 
> org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - 
> JobService started on localhost:8099_
> _[grpc-default-executor-0] ERROR 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService - 
> Encountered Unexpected Exception for Invocation 
> job_e3ca1015-d683-47df-beb5-104ccbb5a457_
> _org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusException: NOT_FOUND_
>  _at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asException(Status.java:534)_
>  _at 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getInvocation(InMemoryJobService.java:341)_
>  _at 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getStateStream(InMemoryJobService.java:262)_
>  _at 
> org.apache.beam.model.jobmanagement.v1.JobServiceGrpc$MethodHandlers.invoke(JobServiceGrpc.java:770)_
>  _at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)_
>  _at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)_
>  _at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)_
>  _at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)_
>  _at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)_
>  _at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)_
>  _at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)_
>  _at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)_
>  _at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)_
>  _at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)_
>  _at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)_
>  _at java.lang.Thread.run(Thread.java:748)_
> _[grpc-default-executor-0] INFO org.apache.beam.runners.flink.FlinkJobInvoker 
> - Invoking job 
> BeamApp-jyzhao-0326181339-95e448eb_3b2a57a8-c8bc-463e-8be6-47890deb48b4_
> _[grpc-default-executor-0] INFO 
> org.apache.beam.runners.fnexecution.jobsubmission.JobInvocation - Starting 
> job invocation 
> BeamApp-jyzhao-0326181339-95e448eb_3b2a57a8-c8bc-463e-8be6-47890deb48b4_
> _[flink-runner-job-invoker] INFO 
> org.apache.beam.runners.flink.FlinkPipelineRunner - Translating pipeline to 
> Flink program._
> _[flink-runner-job-invoker] INFO 
> org.apache.beam.r

[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221365
 ]

ASF GitHub Bot logged work on BEAM-6953:


Author: ASF GitHub Bot
Created on: 01/Apr/19 17:32
Start Date: 01/Apr/19 17:32
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #8188: 
[BEAM-6953] Make bq constants args
URL: https://github.com/apache/beam/pull/8188#discussion_r270974111
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java
 ##
 @@ -53,4 +53,16 @@
   Integer getInsertBundleParallelism();
 
   void setInsertBundleParallelism(Integer parallelism);
+
+  @Description("The number of buckets used per table when doing streaming 
inserts to BigQuery.")
+  @Default.Integer(50)
+  Integer getNumStreamingBuckets();
 
 Review comment:
   I think it is failing because you have to write the setter boilerplate.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221365)
Time Spent: 20m  (was: 10m)

> BigQueryIO has constants that should be PipelineOptions
> ---
>
> Key: BEAM-6953
> URL: https://issues.apache.org/jira/browse/BEAM-6953
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6952) concatenated compressed files bug with python sdk

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6952?focusedWorklogId=221364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221364
 ]

ASF GitHub Bot logged work on BEAM-6952:


Author: ASF GitHub Bot
Created on: 01/Apr/19 17:32
Start Date: 01/Apr/19 17:32
Worklog Time Spent: 10m 
  Work Description: dlesco commented on issue #8187: [BEAM-6952] 
concatenated compressed files bug with python sdk
URL: https://github.com/apache/beam/pull/8187#issuecomment-478672026
 
 
   Updated branch with commit to replace in the unit test the use of xrange 
with six.moves.range, so that the unit test will run under Python 3.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221364)
Time Spent: 20m  (was: 10m)

> concatenated compressed files bug with python sdk
> -
>
> Key: BEAM-6952
> URL: https://issues.apache.org/jira/browse/BEAM-6952
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.11.0
>Reporter: Daniel Lescohier
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Python apache_beam.io.filesystem module has a bug handling concatenated 
> compressed files.
> The PR I will create has two commits:
>  # a new unit test that shows the problem 
>  # a fix to the problem.
> The unit test is added to the apache_beam.io.filesystem_test module. It was 
> added to this module because the test: 
> apache_beam.io.textio_test.test_read_gzip_concat does not encounter the 
> problem in the Beam 2.11 and earlier code base because the test data is too 
> small: the data is smaller than read_size, so it goes through logic in the 
> code that avoids the problem in the code. So, this test sets read_size 
> smaller and test data bigger, in order to encounter the problem. It would be 
> difficult to test in the textio_test module, because you'd need very large 
> test data because default read_size is 16MiB, and the ReadFromText interface 
> does not allow you to modify the read_size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6954:

Description: 
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:
 - the containsKey check should be expanded with an !isNull check
 OR
 - when we serialize the json, it shouldn't serialize null values at 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655]

 

Instinctively I would have expected the @Default.* annotations producing values 
every single time, when the value is null - so a property with a @Default.* 
annotation can't be null - but I was unable to find anything explicit regarding 
this in the documentation. So I'm not sure which of the suggested change has to 
be made.

---

Okay, I have investigated further, and it seems the default value is indeed 
calculated before the json serialization by calling the mentioned method. The 
problem is that it returns a RuntimeValueProvider, which gets serialized as 
null ( 
https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L275-L279
 ), because the isAccessible returns false ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L248-L250]
 + 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L214-L220]
 )

  was:
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:
 - the containsKey check should be expanded with an !isNull check
 OR
 - when we serialize the json, it shouldn't serialize null values at 
https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655

 

Instinctively I would have expected the @Default.* annotations producing values 
every single time, when the value is null - so a property with a @Default.* 
annotation can't be null - but I was unable to find anything explicit regarding 
this in the documentation. So I'm not sure which of the suggested change has to 
be made.

 


> @Default not called if the options json has null value for a property
> -
>
> Key: BEAM-6954
> URL: https://issues.apache.org/jira/browse/BEAM-6954
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Balázs Németh
>Priority:

[jira] [Commented] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-04-01 Thread Michael Luckey (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807009#comment-16807009
 ] 

Michael Luckey commented on BEAM-4046:
--

Thanks, Luke, for pointing that out. Indeed, I have not yet tested deeply, so 
it might happen that those issues are difficult to solve.

Regarding the backwards compatibility, the 'best' I came up with, apart from 
having own gradle distribution with hacked task resolution or rewriting 
Gradle-wrapper.jar is to hook into parameter resolution of gradlew resp 
gradlew.bat.

This would imply to (temporarily!) replace the current scripts by some enhanced 
implementation which would map any param starting with ':beam-' to the 
corresponding target project. This could possibly work, but feels *really* 
hacky :(

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Priority: Major
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6872) Add hook for user-defined JVM initialization in workers

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6872?focusedWorklogId=221363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221363
 ]

ASF GitHub Bot logged work on BEAM-6872:


Author: ASF GitHub Bot
Created on: 01/Apr/19 17:14
Start Date: 01/Apr/19 17:14
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8104: [BEAM-6872] 
Add hook for user-defined JVM initialization in workers
URL: https://github.com/apache/beam/pull/8104#discussion_r270967393
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowWorkerHarnessHelper.java
 ##
 @@ -95,6 +97,17 @@ public static void 
configureLogging(DataflowWorkerHarnessOptions pipelineOptions
 DataflowWorkerLoggingInitializer.configure(pipelineOptions);
   }
 
+  public static void runUserDefinedInitialization() {
+ServiceLoader loader =
 
 Review comment:
   Sounds worthwhile. Since it has been copied a few times now.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221363)
Time Spent: 1h 10m  (was: 1h)

> Add hook for user-defined JVM initialization in workers
> ---
>
> Key: BEAM-6872
> URL: https://issues.apache.org/jira/browse/BEAM-6872
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Expose an interface for users to run some one-time initialization code when a 
> worker starts up.
> This can be useful for things like overriding the Default ZoneRulesProvider, 
> or setting up custom SSL providers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-04-01 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806991#comment-16806991
 ] 

Luke Cwik edited comment on BEAM-4046 at 4/1/19 5:06 PM:
-

Michael, what your suggesting makes sense to me. We also ran into two other 
problems:
* we had more then one Gradle project with the same directory name even though 
they were under a different parent folder (I think it was "core") and that was 
leading to some strange build time behavior.
* we use the project names during [javadoc 
generation|https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle].


was (Author: lcwik):
Michael, what your suggesting makes sense to me. We also ran into two other 
problems:

During the gradle migration, we used to have something like:
* we had more then one Gradle project with the same directory name even though 
they were under a different parent folder (I think it was "core") and that was 
leading to some strange build time behavior.
* we use the project names during [javadoc 
generation|https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle].

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Priority: Major
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-04-01 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806991#comment-16806991
 ] 

Luke Cwik commented on BEAM-4046:
-

Michael, what your suggesting makes sense to me. We also ran into two other 
problems:

During the gradle migration, we used to have something like:
* we had more then one Gradle project with the same directory name even though 
they were under a different parent folder (I think it was "core") and that was 
leading to some strange build time behavior.
* we use the project names during [javadoc 
generation|https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle].

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Priority: Major
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6955) Support Dataflow --sdk_location with modified version number

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6955?focusedWorklogId=221356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221356
 ]

ASF GitHub Bot logged work on BEAM-6955:


Author: ASF GitHub Bot
Created on: 01/Apr/19 17:00
Start Date: 01/Apr/19 17:00
Worklog Time Spent: 10m 
  Work Description: dlesco commented on pull request #8189: [BEAM-6955] 
Support Dataflow --sdk_location with modified version number
URL: https://github.com/apache/beam/pull/8189
 
 
   Determine the version tag to use for the Google Container Registry,
   for the service image versions to use on the Dataflow worker nodes.
   Users of Dataflow may be using a locally-modified version of
   Apache Beam, which they submit to Dataflow with the
   --sdk_location option. Those users would most likely modify the
   version number of Apache Beam, so they can distinguish it from the
   public distribution of Apache Beam.
   However, the remote nodes in Dataflow still need to bootsrap the
   worker service with a Docker image that a version tag exists for.
   
   The most appropriate way for system integrators to modify the
   Apache Beam version number would be to add a Local Version Identifier:
   https://www.python.org/dev/peps/pep-0440/#local-version-identifiers
   
   If people only use Local Version Identifiers, then we could use
   the "public" attribute of the pkg_resources version object.
   
   If people instead use a post-release version identifier:
   https://www.python.org/dev/peps/pep-0440/#post-releases
   then only the "base_version" attribute would work both of these
   version number changes.
   
   Since Dataflow documentation does not specify how to modify
   version numbers, I am choosing to use "base_version" attribute.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https:/

[jira] [Created] (BEAM-6955) Support Dataflow --sdk_location with modified version number

2019-04-01 Thread Daniel Lescohier (JIRA)
Daniel Lescohier created BEAM-6955:
--

 Summary: Support Dataflow --sdk_location with modified version 
number
 Key: BEAM-6955
 URL: https://issues.apache.org/jira/browse/BEAM-6955
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Affects Versions: 2.11.0
Reporter: Daniel Lescohier


Support Dataflow --sdk_location with modified version number

Determine the version tag to use for the Google Container Registry, for the 
service image versions to use on the Dataflow worker nodes. Users of Dataflow 
may be using a locally-modified version of Apache Beam, which they submit to 
Dataflow with the --sdk_location option. Those users would most likely modify 
the version number of Apache Beam, so they can distinguish it from the public 
distribution of Apache Beam. However, the remote nodes in Dataflow still need 
to bootsrap the worker service with a Docker image that a version tag exists 
for. 

The most appropriate way for system integrators to modify the Apache Beam 
version number would be to add a Local Version Identifier: 
https://www.python.org/dev/peps/pep-0440/#local-version-identifiers

If people only use Local Version Identifiers, then we could use the "public" 
attribute of the pkg_resources version object.

If people instead use a post-release version identifier: 
https://www.python.org/dev/peps/pep-0440/#post-releases then only the 
"base_version" attribute would work both of these version number changes. 

Since Dataflow documentation does not specify how to modify version numbers, I 
am choosing to use "base_version" attribute.

Will shortly submit a PR with the change.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6954:

Component/s: (was: runner-dataflow)

> @Default not called if the options json has null value for a property
> -
>
> Key: BEAM-6954
> URL: https://issues.apache.org/jira/browse/BEAM-6954
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Balázs Németh
>Priority: Major
>
> When a pipeline options get deserialized from a json with 
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
>  it creates a map, where properties present in the json - even if with a null 
> value - will be added to the map.
> So we can have String->NullNode mappings.
> When you create a ProxyInvocationHandler with this Map ( 
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
>  ) this map will be the backing jsonOptions map.
> Later on when a getter is called on the options it will reach this code: 
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
>  
> Then the containsKey will return true, even for NullNodes. So we won't 
> execute the getDefault() method hence not calculating the default value.
>  
> I'm not sure about the expected behaviour, but either:
>  - the containsKey check should be expanded with an !isNull check
>  OR
>  - when we serialize the json, it shouldn't serialize null values at 
> https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655
>  
> Instinctively I would have expected the @Default.* annotations producing 
> values every single time, when the value is null - so a property with a 
> @Default.* annotation can't be null - but I was unable to find anything 
> explicit regarding this in the documentation. So I'm not sure which of the 
> suggested change has to be made.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6954:

Description: 
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:
 - the containsKey check should be expanded with an !isNull check
 OR
 - when we serialize the json, it shouldn't serialize null values at 
https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L653-L655

 

Instinctively I would have expected the @Default.* annotations producing values 
every single time, when the value is null - so a property with a @Default.* 
annotation can't be null - but I was unable to find anything explicit regarding 
this in the documentation. So I'm not sure which of the suggested change has to 
be made.

 

  was:
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:
 - the containsKey check should be expanded with an !isNull check
 OR
 - the dataflow runner have to be modified so it doesn't persist null values at 
options here: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L216-L222]

 

Instinctively I would have expected the @Default.* annotations producing values 
every single time, when the value is null - so a property with a @Default.* 
annotation can't be null - but I was unable to find anything explicit regarding 
this in the documentation. So I'm not sure which of the suggested change has to 
be made.

 


> @Default not called if the options json has null value for a property
> -
>
> Key: BEAM-6954
> URL: https://issues.apache.org/jira/browse/BEAM-6954
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Balázs Németh
>Priority: Major
>
> When a pipeline options get deserialized from a json with 
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
>  it creates a map, where properties present in the json - even if with a null 
> value - will be added to the map.
> So we can have String->NullNode mappings.
> When you create a ProxyInvocationHandler with this Map ( 
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
>  ) this map will be the backing jsonOptions map.
> Later on when a getter is 

[jira] [Work logged] (BEAM-5959) Add Cloud KMS support to GCS copies

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5959?focusedWorklogId=221354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221354
 ]

ASF GitHub Bot logged work on BEAM-5959:


Author: ASF GitHub Bot
Created on: 01/Apr/19 16:54
Start Date: 01/Apr/19 16:54
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #7744: [BEAM-5959] KMS 
support for BigQuery
URL: https://github.com/apache/beam/pull/7744#discussion_r270960080
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
 ##
 @@ -1534,6 +1550,10 @@ static String getExtractDestinationUri(String 
extractDestinationDir) {
   return toBuilder().setIgnoreUnknownValues(true).build();
 }
 
+Write withKmsKey(String kmsKey) {
 
 Review comment:
   @mayansalama I fixed this here: https://github.com/apache/beam/pull/8145. I 
should have updated you.
   @kennknowles KMS support is too new to be in 2.7. :)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221354)
Time Spent: 31.5h  (was: 31h 20m)

> Add Cloud KMS support to GCS copies
> ---
>
> Key: BEAM-5959
> URL: https://issues.apache.org/jira/browse/BEAM-5959
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Labels: triaged
> Fix For: 2.11.0
>
>  Time Spent: 31.5h
>  Remaining Estimate: 0h
>
> Beam SDK currently uses the CopyTo GCS API call, which doesn't support 
> copying objects that Customer Managed Encryption Keys (CMEK).
> CMEKs are managed in Cloud KMS.
> Items (for Java and Python SDKs):
> - Update clients to versions that support KMS keys.
> - Change copyTo API calls to use rewriteTo (Python - directly, Java - 
> possibly convert copyTo API call to use client library)
> - Add unit tests.
> - Add basic tests (DirectRunner and GCS buckets with CMEK).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6954:

Description: 
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:
 - the containsKey check should be expanded with an !isNull check
 OR
 - the dataflow runner have to be modified so it doesn't persist null values at 
options here: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L216-L222]

 

Instinctively I would have expected the @Default.* annotations producing values 
every single time, when the value is null - so a property with a @Default.* 
annotation can't be null - but I was unable to find anything explicit regarding 
this in the documentation. So I'm not sure which of the suggested change has to 
be made.

 

  was:
When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:

- the containsKey check should be expanded with an !isNull check
OR
- the dataflow runner have to be modified so it doesn't persist null values at 
options here: 
https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L216-L222


> @Default not called if the options json has null value for a property
> -
>
> Key: BEAM-6954
> URL: https://issues.apache.org/jira/browse/BEAM-6954
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Balázs Németh
>Priority: Major
>
> When a pipeline options get deserialized from a json with 
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
>  it creates a map, where properties present in the json - even if with a null 
> value - will be added to the map.
> So we can have String->NullNode mappings.
> When you create a ProxyInvocationHandler with this Map ( 
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
>  ) this map will be the backing jsonOptions map.
> Later on when a getter is called on the options it will reach this code: 
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
>  
> Then the containsKey will return true, even for NullN

[jira] [Assigned] (BEAM-3489) Expose the message id of received messages within PubsubMessage

2019-04-01 Thread Ahmed El.Hussaini (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed El.Hussaini reassigned BEAM-3489:
---

Assignee: Ahmed El.Hussaini

> Expose the message id of received messages within PubsubMessage
> ---
>
> Key: BEAM-3489
> URL: https://issues.apache.org/jira/browse/BEAM-3489
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Luke Cwik
>Assignee: Ahmed El.Hussaini
>Priority: Minor
>  Labels: newbie, starter
>
> This task is about passing forward the message id from the pubsub proto to 
> the java PubsubMessage.
> Add a message id field to PubsubMessage.
> Update the coder for PubsubMessage to encode the message id.
> Update the translation from the Pubsub proto message to the Dataflow message:
> https://github.com/apache/beam/blob/2e275264b21db45787833502e5e42907b05e28b8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L976



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6954) @Default not called if the options json has null value for a property

2019-04-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balázs Németh updated BEAM-6954:

Component/s: runner-dataflow

> @Default not called if the options json has null value for a property
> -
>
> Key: BEAM-6954
> URL: https://issues.apache.org/jira/browse/BEAM-6954
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Balázs Németh
>Priority: Major
>
> When a pipeline options get deserialized from a json with 
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
>  it creates a map, where properties present in the json - even if with a null 
> value - will be added to the map.
> So we can have String->NullNode mappings.
> When you create a ProxyInvocationHandler with this Map ( 
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
>  ) this map will be the backing jsonOptions map.
> Later on when a getter is called on the options it will reach this code: 
> [https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
>  
> Then the containsKey will return true, even for NullNodes. So we won't 
> execute the getDefault() method hence not calculating the default value.
>  
> I'm not sure about the expected behaviour, but either:
> - the containsKey check should be expanded with an !isNull check
> OR
> - the dataflow runner have to be modified so it doesn't persist null values 
> at options here: 
> https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L216-L222



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6954) @Default not called if the options json has null value for a property

2019-04-01 Thread JIRA
Balázs Németh created BEAM-6954:
---

 Summary: @Default not called if the options json has null value 
for a property
 Key: BEAM-6954
 URL: https://issues.apache.org/jira/browse/BEAM-6954
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Affects Versions: 2.11.0
Reporter: Balázs Németh


When a pipeline options get deserialized from a json with 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L738-L760]
 it creates a map, where properties present in the json - even if with a null 
value - will be added to the map.

So we can have String->NullNode mappings.

When you create a ProxyInvocationHandler with this Map ( 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L117-L125]
 ) this map will be the backing jsonOptions map.

Later on when a getter is called on the options it will reach this code: 
[https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ProxyInvocationHandler.java#L156-L158]
 

Then the containsKey will return true, even for NullNodes. So we won't execute 
the getDefault() method hence not calculating the default value.

 

I'm not sure about the expected behaviour, but either:

- the containsKey check should be expanded with an !isNull check
OR
- the dataflow runner have to be modified so it doesn't persist null values at 
options here: 
https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L216-L222



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions

2019-04-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6953?focusedWorklogId=221347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221347
 ]

ASF GitHub Bot logged work on BEAM-6953:


Author: ASF GitHub Bot
Created on: 01/Apr/19 16:37
Start Date: 01/Apr/19 16:37
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #8188: [BEAM-6953] 
Make bq constants args
URL: https://github.com/apache/beam/pull/8188
 
 
   Make several hard-coded constants pipeline options.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 221347)
Time Spent: 10m
Remaining Estimate: 0h

> BigQueryIO has constants that should be PipelineOptions
> ---
>
> Key: BEAM-6953
> URL: https://issues.apache.org/jira/browse/BEAM-6953
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6953) BigQueryIO has constants that should be PipelineOptions

2019-04-01 Thread Reuven Lax (JIRA)
Reuven Lax created BEAM-6953:


 Summary: BigQueryIO has constants that should be PipelineOptions
 Key: BEAM-6953
 URL: https://issues.apache.org/jira/browse/BEAM-6953
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Reporter: Reuven Lax






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   >