[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360129
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 09:07
Start Date: 16/Dec/19 09:07
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-565970172
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360129)
Time Spent: 8h 50m  (was: 8h 40m)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate them to Python 3.7.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8967:
---
Status: Open  (was: Triage Needed)

> Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"
> --
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a follow-up to [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].
> The other 4 dependencies are not found in the snapshot pom either:
> {code:groovy}
>   compile library.java.antlr_runtime
>   compile library.java.protobuf_java
>   compile library.java.commons_compress
>   compile library.java.commons_lang3
> {code}
> h1. Compile-declared dependencies needed at runtime?
> h2. protobuf-java
> They are shaded. For example, Beam's TextBasedReader uses 
> {{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
> class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
> org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
> {noformat}
> h2. commons-compress
> They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
> uses 
> {{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
> The shaded class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
> {noformat}
> h2. commons-lang3
> They are shaded. For example, Beam's 
> {{org.apache.beam.sdk.io.LocalFileSystem}} uses 
> {{org.apache.commons.lang3.SystemUtils}}. The shaded class is in the 
> published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep SystemUtils
> org/apache/beam/repackaged/core/org/apache/commons/lang3/SystemUtils.class
> {noformat}
> h2. antlr-runtime
> Same.
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep org.antlr.v4 |head
> org/apache/beam/repackaged/core/org/antlr/v4/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/ANTLRErrorListener.class
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8967:
---
Fix Version/s: 2.18.0
   2.17.0

> Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"
> --
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: 2.17.0, 2.18.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a follow-up to [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].
> The other 4 dependencies are not found in the snapshot pom either:
> {code:groovy}
>   compile library.java.antlr_runtime
>   compile library.java.protobuf_java
>   compile library.java.commons_compress
>   compile library.java.commons_lang3
> {code}
> h1. Compile-declared dependencies needed at runtime?
> h2. protobuf-java
> They are shaded. For example, Beam's TextBasedReader uses 
> {{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
> class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
> org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
> {noformat}
> h2. commons-compress
> They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
> uses 
> {{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
> The shaded class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
> {noformat}
> h2. commons-lang3
> They are shaded. For example, Beam's 
> {{org.apache.beam.sdk.io.LocalFileSystem}} uses 
> {{org.apache.commons.lang3.SystemUtils}}. The shaded class is in the 
> published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep SystemUtils
> org/apache/beam/repackaged/core/org/apache/commons/lang3/SystemUtils.class
> {noformat}
> h2. antlr-runtime
> Same.
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep org.antlr.v4 |head
> org/apache/beam/repackaged/core/org/antlr/v4/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/ANTLRErrorListener.class
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?focusedWorklogId=360133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360133
 ]

ASF GitHub Bot logged work on BEAM-8967:


Author: ASF GitHub Bot
Created on: 16/Dec/19 09:25
Start Date: 16/Dec/19 09:25
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10382: [BEAM-8967] 
Declare JSR305 dependency as 'shadow'
URL: https://github.com/apache/beam/pull/10382
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360133)
Time Spent: 1h  (was: 50m)

> Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"
> --
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: 2.17.0, 2.18.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a follow-up to [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].
> The other 4 dependencies are not found in the snapshot pom either:
> {code:groovy}
>   compile library.java.antlr_runtime
>   compile library.java.protobuf_java
>   compile library.java.commons_compress
>   compile library.java.commons_lang3
> {code}
> h1. Compile-declared dependencies needed at runtime?
> h2. protobuf-java
> They are shaded. For example, Beam's TextBasedReader uses 
> {{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
> class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
> org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
> {noformat}
> h2. commons-compress
> They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
> uses 
> {{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
> The shaded class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
> {noformat}
> h2. commons-lang3
> They are shaded. For example, Beam's 
> {{org.apache.beam.sdk.io.LocalFileSystem}} uses 
> {{org.apache.commons.lang3.SystemUtils}}. The shaded class is in the 
> published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep SystemUtils
> org/apache/beam/repackaged/core/org/apache/commons/lang3/SystemUtils.class
> {noformat}
> h2. antlr-runtime
> Same.
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep org.antlr.v4 |head
> org/apache/beam/repackaged/core/org/antlr/v4/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/ANTLRErrorListener.class
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8917) javax.annotation.Nullable is missing for org.apache.beam.sdk.schemas.FieldValueTypeInformation

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8917?focusedWorklogId=360134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360134
 ]

ASF GitHub Bot logged work on BEAM-8917:


Author: ASF GitHub Bot
Created on: 16/Dec/19 09:35
Start Date: 16/Dec/19 09:35
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10370: 
[release-2.17.0][BEAM-8917] jsr305 dependency declaration for Nullable class 
(#10324)
URL: https://github.com/apache/beam/pull/10370#issuecomment-565980371
 
 
   Oops, we discovered one minor issue. I will cherry-pick an extra commit here 
and then it is OK to go. For more details see 
https://issues.apache.org/jira/browse/BEAM-8967
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360134)
Time Spent: 11h 20m  (was: 11h 10m)

> javax.annotation.Nullable is missing for 
> org.apache.beam.sdk.schemas.FieldValueTypeInformation
> --
>
> Key: BEAM-8917
> URL: https://issues.apache.org/jira/browse/BEAM-8917
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> This ticket comes from the result of static analysis by the Linkage Checker 
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1045]).
> h1. Example Project
> Example project to reproduce the issue: 
> https://github.com/suztomo/beam-java-sdk-missing-nullable .
> I think the Maven artifact {{org.apache.beam:beam-sdks-java-core}}, which 
> contains {{org.apache.beam.sdk.schemas.FieldValueTypeInformation}}, should 
> declare a dependency on {{com.google.code.findbugs:jsr305}}.
> h1. Why is there no problem in the compilation and tests of sdks/java/core?
> The compilation succeeds because the {{Nullable}} annotation is provided 
> transitively by the compileOnly {{spotbugs-annotations}} dependency:
> {noformat}
> compileOnly - Compile only dependencies for source set 'main'.
> ...
> +--- com.github.spotbugs:spotbugs-annotations:3.1.12
> |\--- com.google.code.findbugs:jsr305:3.0.2
> ...
> {noformat}
> The tests succeed because the {{Nullable}} annotation is provided by a 
> transitive dependency of {{guava-testlib}}.
> {noformat}
> testRuntime - Runtime dependencies for source set 'test' (deprecated, use 
> 'testRuntimeOnly' instead).
> ...
> +--- com.google.guava:guava-testlib:20.0
> |+--- com.google.code.findbugs:jsr305:1.3.9
> {noformat}
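To make the missing reference concrete, here is a minimal consumer-side sketch (an illustration, not code from the ticket): a downstream project that only depends on beam-sdks-java-core and resolves classes referencing the same annotation needs {{com.google.code.findbugs:jsr305}} on its classpath.

{code:java}
// Hypothetical downstream project that only depends on beam-sdks-java-core.
import javax.annotation.Nullable; // provided by com.google.code.findbugs:jsr305

public class NullableProbe {
  // FieldValueTypeInformation's class file references javax.annotation.Nullable
  // in the same way (this is what the Linkage Checker flags); resolving it needs
  // jsr305 on the classpath, which is why the published pom should declare it.
  public static @Nullable String maybe() {
    return null;
  }

  public static void main(String[] args) {
    System.out.println(maybe());
  }
}
{code}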



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360141
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 09:45
Start Date: 16/Dec/19 09:45
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-565983841
 
 
   Run Python 3.7 Load Tests ParDo Dataflow Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360141)
Time Spent: 9h  (was: 8h 50m)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate them to Python 3.7.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?focusedWorklogId=360143&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360143
 ]

ASF GitHub Bot logged work on BEAM-8967:


Author: ASF GitHub Bot
Created on: 16/Dec/19 09:46
Start Date: 16/Dec/19 09:46
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10385: 
[release-2.18.0][BEAM-8967] Maven artifact beam-sdks-java-core does not have 
JSR305 specified as "compile"
URL: https://github.com/apache/beam/pull/10385
 
 
   A tiny error we found when validating the generated pom files from the 
previous cherry-pick. We should be good to go after this one.
   
   R: @udim 
   CC: @suztomo 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360143)
Time Spent: 1h 10m  (was: 1h)

> Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"
> --
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: 2.17.0, 2.18.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a follow-up to [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].
> The other 4 dependencies are not found in the snapshot pom either:
> {code:groovy}
>   compile library.java.antlr_runtime
>   compile library.java.protobuf_java
>   compile library.java.commons_compress
>   compile library.java.commons_lang3
> {code}
> h1. Compile-declared dependencies needed at runtime?
> h2. protobuf-java
> They are shaded. For example, Beam's TextBasedReader uses 
> {{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
> class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
> org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
> {noformat}
> h2. commons-compress
> They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
> uses 
> {{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
> The shaded class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
> {noformat}
> h2. commons-lang3
> They are shaded. For example, Beam's 
> {{org.apache.beam.sdk.io.LocalFileSystem}} uses 
> {{org.apache.commons.lang3.SystemUtils}}. The shaded class is in the 
> published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep SystemUtils
> org/apache/beam/repackaged/core/org/apache/commons/lang3/SystemUtils.class
> {noformat}
> h2. antlr-runtime
> Same.
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep org.antlr.v4 |head
> org/apache/beam/repackaged/core/org/antlr/v4/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/ANTLRErrorListener.class
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8917) javax.annotation.Nullable is missing for org.apache.beam.sdk.schemas.FieldValueTypeInformation

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8917?focusedWorklogId=360144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360144
 ]

ASF GitHub Bot logged work on BEAM-8917:


Author: ASF GitHub Bot
Created on: 16/Dec/19 09:48
Start Date: 16/Dec/19 09:48
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10370: 
[release-2.17.0][BEAM-8917][BEAM-8967] jsr305 dependency declaration for 
Nullable class (#10324)
URL: https://github.com/apache/beam/pull/10370#issuecomment-565984823
 
 
   Run SQL postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360144)
Time Spent: 11.5h  (was: 11h 20m)

> javax.annotation.Nullable is missing for 
> org.apache.beam.sdk.schemas.FieldValueTypeInformation
> --
>
> Key: BEAM-8917
> URL: https://issues.apache.org/jira/browse/BEAM-8917
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> This ticket comes from the result of static analysis by the Linkage Checker 
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1045]).
> h1. Example Project
> Example project to reproduce the issue: 
> https://github.com/suztomo/beam-java-sdk-missing-nullable .
> I think the Maven artifact {{org.apache.beam:beam-sdks-java-core}}, which 
> contains {{org.apache.beam.sdk.schemas.FieldValueTypeInformation}}, should 
> declare a dependency on {{com.google.code.findbugs:jsr305}}.
> h1. Why is there no problem in the compilation and tests of sdks/java/core?
> The compilation succeeds because the {{Nullable}} annotation is provided 
> transitively by the compileOnly {{spotbugs-annotations}} dependency:
> {noformat}
> compileOnly - Compile only dependencies for source set 'main'.
> ...
> +--- com.github.spotbugs:spotbugs-annotations:3.1.12
> |\--- com.google.code.findbugs:jsr305:3.0.2
> ...
> {noformat}
> The tests succeed because the {{Nullable}} annotation is provided by a 
> transitive dependency of {{guava-testlib}}.
> {noformat}
> testRuntime - Runtime dependencies for source set 'test' (deprecated, use 
> 'testRuntimeOnly' instead).
> ...
> +--- com.google.guava:guava-testlib:20.0
> |+--- com.google.code.findbugs:jsr305:1.3.9
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8511) Support for enhanced fan-out in KinesisIO.Read

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8511?focusedWorklogId=360147&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360147
 ]

ASF GitHub Bot logged work on BEAM-8511:


Author: ASF GitHub Bot
Created on: 16/Dec/19 09:55
Start Date: 16/Dec/19 09:55
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #9899: [BEAM-8511] 
[WIP] KinesisIO.Read enhanced fanout
URL: https://github.com/apache/beam/pull/9899#issuecomment-565987603
 
 
   Hi @jfarr, do you have any updates on this PR? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360147)
Time Spent: 3.5h  (was: 3h 20m)

> Support for enhanced fan-out in KinesisIO.Read
> --
>
> Key: BEAM-8511
> URL: https://issues.apache.org/jira/browse/BEAM-8511
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kinesis
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Add support for reading from an enhanced fan-out consumer using KinesisIO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8933:
---
Status: Open  (was: Triage Needed)

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use (with 
> Avro as default).
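For orientation, a rough sketch of where such a config would plug in on the read side (an illustration only; {{withFormat}} is the ticket's proposal, not an existing method, and the table name is a placeholder):

{code:java}
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class BigQueryReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    PCollection<TableRow> rows =
        p.apply(
            "ReadFromBigQuery",
            BigQueryIO.readTableRows()
                .from("my-project:my_dataset.my_table") // placeholder table
                // Storage API read path, which today decodes Avro.
                .withMethod(BigQueryIO.TypedRead.Method.DIRECT_READ));
    // The proposed knob would sit alongside withMethod, roughly:
    //   .withFormat(DataFormat.ARROW)  // hypothetical; Avro stays the default
    p.run().waitUntilFinish();
  }
}
{code}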



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8542) Add async write to AWS SNS IO & remove retry logic

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8542?focusedWorklogId=360149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360149
 ]

ASF GitHub Bot logged work on BEAM-8542:


Author: ASF GitHub Bot
Created on: 16/Dec/19 09:56
Start Date: 16/Dec/19 09:56
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10078: [BEAM-8542] 
Change write to async in AWS SNS IO & remove retry logic
URL: https://github.com/apache/beam/pull/10078#issuecomment-565988036
 
 
   Hi @ajothomas, do you have any updates on this PR?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360149)
Time Spent: 4.5h  (was: 4h 20m)

> Add async write to AWS SNS IO & remove retry logic
> --
>
> Key: BEAM-8542
> URL: https://issues.apache.org/jira/browse/BEAM-8542
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Ajo Thomas
>Assignee: Ajo Thomas
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> - While working with SNS IO for one of my work-related projects, I found that 
> the IO uses synchronous publishes during writes. I had a simple mock pipeline 
> where I was reading from a Kinesis stream and publishing it to SNS using 
> Beam's SNS IO. For comparison, I also had a lambda which did the same using 
> asynchronous publishes but was about 5x faster. Changing the SNS IO to use 
> async publishes would improve publish latencies.
>  - SNS IO also has some retry logic which isn't required, as SNS clients can 
> handle retries. The retry logic in the SNS client is user-configurable and, 
> therefore, explicit retry logic in SNS IO is not required.
> I have a working version of the IO with these changes and will create a PR 
> linking this ticket to it once I get some feedback here.
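For reference, a minimal sketch of the async-publish idea using the AWS SDK v2 client (an illustration of the approach described above, not the actual SnsIO change; the topic ARN is a placeholder):

{code:java}
import java.util.concurrent.CompletableFuture;
import software.amazon.awssdk.services.sns.SnsAsyncClient;
import software.amazon.awssdk.services.sns.model.PublishRequest;
import software.amazon.awssdk.services.sns.model.PublishResponse;

public class AsyncSnsPublish {
  static CompletableFuture<PublishResponse> publishAsync(
      SnsAsyncClient sns, String topicArn, String message) {
    // publish() returns immediately with a future; the SDK's own (configurable)
    // retry policy handles retries, so the IO needs no extra retry loop.
    return sns.publish(
        PublishRequest.builder().topicArn(topicArn).message(message).build());
  }

  public static void main(String[] args) {
    try (SnsAsyncClient sns = SnsAsyncClient.create()) {
      publishAsync(sns, "arn:aws:sns:us-east-1:123456789012:my-topic", "hello")
          .whenComplete((resp, err) -> {
            if (err != null) {
              err.printStackTrace(); // in the IO this would surface as a bundle failure
            }
          })
          .join();
    }
  }
}
{code}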



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997124#comment-16997124
 ] 

Ismaël Mejía commented on BEAM-8933:


Do you mean Avro in the title of this issue?

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use (with 
> Avro as default).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8864) BigQueryQueryToTableIT.test_big_query_legacy_sql - fails in post commit tests

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8864:
---
Status: Open  (was: Triage Needed)

> BigQueryQueryToTableIT.test_big_query_legacy_sql - fails in post commit tests
> -
>
> Key: BEAM-8864
> URL: https://issues.apache.org/jira/browse/BEAM-8864
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp, test-failures
>Reporter: Ahmet Altay
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
> Fix For: Not applicable
>
>
> Logs: 
> [https://builds.apache.org/job/beam_PostCommit_Python35/1123/testReport/junit/apache_beam.io.gcp.big_query_query_to_table_it_test/BigQueryQueryToTableIT/test_big_query_legacy_sql/]
> Error Message
> Expected: (Test pipeline expected terminated in state: DONE and Expected 
> checksum is 158a8ea1c254fcf40d4ed3e7c0242c3ea0a29e72)
>  but: Expected checksum is 158a8ea1c254fcf40d4ed3e7c0242c3ea0a29e72 Actual 
> checksum is da39a3ee5e6b4b0d3255bfef95601890afd80709
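Worth noting (an observation, not something stated in the ticket): da39a3ee5e6b4b0d3255bfef95601890afd80709 is the SHA-1 of empty input, so the failing assertion points at the query producing no output rows rather than wrong ones. A quick way to confirm the constant:

{code:java}
import java.security.MessageDigest;

public class EmptySha1 {
  public static void main(String[] args) throws Exception {
    byte[] digest = MessageDigest.getInstance("SHA-1").digest(new byte[0]);
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
      hex.append(String.format("%02x", b));
    }
    // Prints da39a3ee5e6b4b0d3255bfef95601890afd80709, the same value as the
    // "Actual checksum" in the assertion above.
    System.out.println(hex);
  }
}
{code}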



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8963) Apache Beam Java GCP dependencies to catch up with GCP libraries-bom

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8963:
---
Status: Open  (was: Triage Needed)

> Apache Beam Java GCP dependencies to catch up with GCP libraries-bom
> 
>
> Key: BEAM-8963
> URL: https://issues.apache.org/jira/browse/BEAM-8963
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>
> || ||org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.16.0||com.google.cloud:libraries-bom:3.1.0||
> |avalon-framework:avalon-framework|4.1.5| |
> |com.fasterxml.jackson.core:jackson-annotations|2.9.10| |
> |com.fasterxml.jackson.core:jackson-core|2.9.10| |
> |com.fasterxml.jackson.core:jackson-databind|2.9.10| |
> |com.github.jponge:lzma-java|1.3| |
> |com.google.android:android|1.5_r4| |
> |com.google.api-client:google-api-client|1.27.0| |
> |com.google.api-client:google-api-client-jackson2|1.27.0| |
> |com.google.api-client:google-api-client-java6|1.27.0| |
> |com.google.api.grpc:grpc-google-cloud-asset-v1| |0.80.0|
> |com.google.api.grpc:grpc-google-cloud-asset-v1beta1| |0.80.0|
> |com.google.api.grpc:grpc-google-cloud-asset-v1p2beta1| |0.80.0|
> |com.google.api.grpc:grpc-google-cloud-automl-v1| |0.79.1|
> |com.google.api.grpc:grpc-google-cloud-automl-v1beta1| |0.79.1|
> |com.google.api.grpc:grpc-google-cloud-bigquerydatatransfer-v1| |0.84.0|
> |com.google.api.grpc:grpc-google-cloud-bigquerystorage-v1beta1|0.44.0|0.84.0|
> |com.google.api.grpc:grpc-google-cloud-bigtable-admin-v2|0.38.0|1.7.1|
> |com.google.api.grpc:grpc-google-cloud-bigtable-v2|0.38.0|1.7.1|
> |com.google.api.grpc:grpc-google-cloud-billingbudgets-v1beta1| |0.1.1|
> |com.google.api.grpc:grpc-google-cloud-build-v1| |0.1.0|
> |com.google.api.grpc:grpc-google-cloud-container-v1| |0.83.0|
> |com.google.api.grpc:grpc-google-cloud-containeranalysis-v1| |0.83.0|
> |com.google.api.grpc:grpc-google-cloud-containeranalysis-v1beta1| |0.83.0|
> |com.google.api.grpc:grpc-google-cloud-datacatalog-v1beta1| |0.29.0-alpha|
> |com.google.api.grpc:grpc-google-cloud-datalabeling-v1beta1| |0.81.1|
> |com.google.api.grpc:grpc-google-cloud-dataproc-v1| |0.83.0|
> |com.google.api.grpc:grpc-google-cloud-dataproc-v1beta2| |0.83.0|
> |com.google.api.grpc:grpc-google-cloud-dialogflow-v2| |0.83.0|
> |com.google.api.grpc:grpc-google-cloud-dialogflow-v2beta1| |0.83.0|
> |com.google.api.grpc:grpc-google-cloud-dlp-v2| |0.81.0|
> |com.google.api.grpc:grpc-google-cloud-error-reporting-v1beta1| |0.84.1|
> |com.google.api.grpc:grpc-google-cloud-firestore-admin-v1| |1.32.0|
> |com.google.api.grpc:grpc-google-cloud-firestore-v1| |1.32.0|
> |com.google.api.grpc:grpc-google-cloud-firestore-v1beta1| |0.85.0|
> |com.google.api.grpc:grpc-google-cloud-gameservices-v1alpha| |0.18.0|
> |com.google.api.grpc:grpc-google-cloud-iamcredentials-v1| |0.43.0-alpha|
> |com.google.api.grpc:grpc-google-cloud-iot-v1| |0.81.0|
> |com.google.api.grpc:grpc-google-cloud-kms-v1| |0.82.1|
> |com.google.api.grpc:grpc-google-cloud-language-v1| |1.81.0|
> |com.google.api.grpc:grpc-google-cloud-language-v1beta2| |0.82.0|
> |com.google.api.grpc:grpc-google-cloud-logging-v2| |0.82.0|
> |com.google.api.grpc:grpc-google-cloud-monitoring-v3| |1.81.0|
> |com.google.api.grpc:grpc-google-cloud-os-login-v1| |0.82.0|
> |com.google.api.grpc:grpc-google-cloud-phishingprotection-v1beta1| |0.28.0|
> |com.google.api.grpc:grpc-google-cloud-pubsub-v1|1.43.0|1.84.0|
> |com.google.api.grpc:grpc-google-cloud-recaptchaenterprise-v1beta1| |0.28.0|
> |com.google.api.grpc:grpc-google-cloud-recommender-v1beta1| |0.2.0|
> |com.google.api.grpc:grpc-google-cloud-redis-v1| |0.82.0|
> |com.google.api.grpc:grpc-google-cloud-redis-v1beta1| |0.82.0|
> |com.google.api.grpc:grpc-google-cloud-scheduler-v1| |1.22.0|
> |com.google.api.grpc:grpc-google-cloud-scheduler-v1beta1| |0.82.0|
> |com.google.api.grpc:grpc-google-cloud-securitycenter-v1| |0.82.0|
> |com.google.api.grpc:grpc-google-cloud-securitycenter-v1beta1| |0.82.0|
> |com.google.api.grpc:grpc-google-cloud-spanner-admin-database-v1|1.6.0|1.46.0|
> |com.google.api.grpc:grpc-google-cloud-spanner-admin-instance-v1|1.6.0|1.46.0|
> |com.google.api.grpc:grpc-google-cloud-spanner-v1|1.6.0|1.46.0|
> |com.google.api.grpc:grpc-google-cloud-speech-v1| |1.22.1|
> |com.google.api.grpc:grpc-google-cloud-speech-v1beta1| |0.75.1|
> |com.google.api.grpc:grpc-google-cloud-speech-v1p1beta1| |0.75.1|
> |com.google.api.grpc:grpc-google-cloud-talent-v4beta1| |0.34.0-beta|
> |com.google.api.grpc:grpc-google-cloud-tasks-v2| |1.27.0|
> |com.google.api.grpc:grpc-google-cloud-tasks-v2beta2| |0.83.0|
> |com.google.api.grpc:grpc-google-cloud-tasks-v2beta3| |0.83.0|
> |com.google.api.grpc:grpc-google-cloud-texttospeech-v1| |0.82.0|

[jira] [Assigned] (BEAM-8956) Unify Contributor Docs

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-8956:
--

Assignee: elharo

> Unify Contributor Docs
> --
>
> Key: BEAM-8956
> URL: https://issues.apache.org/jira/browse/BEAM-8956
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.18.0
>Reporter: elharo
>Assignee: elharo
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Right now we have overlapping and sometimes contradictory docs on how to 
> set up and build Beam in the four different places I've found:
>  
>  README.md
>  CONTRIBUTING.md
>  [https://cwiki.apache.org/confluence/display/BEAM/Contributor+FAQ]
>  [https://beam.apache.org/contribute/]
>  
>  We should probably pick one as the source of truth and rewrite the
>  other three to simply point to it. I propose putting all checkout,
>  build, test, commit, and push instructions in CONTRIBUTING.md in the
>  repo.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8956) Unify Contributor Docs

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8956:
---
Status: Open  (was: Triage Needed)

> Unify Contributor Docs
> --
>
> Key: BEAM-8956
> URL: https://issues.apache.org/jira/browse/BEAM-8956
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Affects Versions: 2.18.0
>Reporter: elharo
>Assignee: elharo
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Right now we have overlapping and sometimes contradictory docs on how to 
>  set up and build Beam in the four different places I've found:
>  
>  README.md
>  CONTRIBUTING.md
>  [https://cwiki.apache.org/confluence/display/BEAM/Contributor+FAQ]
>  [https://beam.apache.org/contribute/]
>  
>  We should probably pick one as the source of truth and rewrite the
>  other three to simply point to it. I propose putting all checkout,
>  build, test, commit, and push instructions in CONTRIBUTING.md in the
>  repo.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8946) Report collection size from MongoDBIOIT

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8946:
---
Status: Open  (was: Triage Needed)

> Report collection size from MongoDBIOIT
> ---
>
> Key: BEAM-8946
> URL: https://issues.apache.org/jira/browse/BEAM-8946
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Pawel Pasterz
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In order to calculate the throughput of the IO test, it would be good to use 
> MongoDB utils to gather the collection size before and after writing to it, 
> and to report the number of bytes written to the collection using the Metrics 
> publisher.
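A minimal sketch of one way to gather the collection size with the MongoDB Java driver (an illustration, not the test's actual code; it assumes the {{collStats}} command and its {{size}} field, and the database/collection names are placeholders):

{code:java}
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class CollectionSizeProbe {
  /** Returns the uncompressed data size of a collection, in bytes. */
  static long collectionSizeBytes(MongoClient client, String db, String collection) {
    MongoDatabase database = client.getDatabase(db);
    Document stats = database.runCommand(new Document("collStats", collection));
    return ((Number) stats.get("size")).longValue();
  }

  public static void main(String[] args) {
    try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
      long before = collectionSizeBytes(client, "beam_test", "test_collection");
      // ... run the write part of the IO test here ...
      long after = collectionSizeBytes(client, "beam_test", "test_collection");
      // This delta is the value that would be published via the Metrics publisher.
      System.out.println("bytes written: " + (after - before));
    }
  }
}
{code}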



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8946) Report collection size from MongoDBIOIT

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-8946.

Fix Version/s: 2.19.0
   Resolution: Fixed

> Report collection size from MongoDBIOIT
> ---
>
> Key: BEAM-8946
> URL: https://issues.apache.org/jira/browse/BEAM-8946
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Pawel Pasterz
>Priority: Minor
> Fix For: 2.19.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In order to calculate the throughput of the IO test, it would be good to use 
> MongoDB utils to gather the collection size before and after writing to it, 
> and to report the number of bytes written to the collection using the Metrics 
> publisher.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8786) .test-infra/jenkins/README.md link is 404

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-8786:
--

Assignee: Oleg Bonar

> .test-infra/jenkins/README.md link is 404
> -
>
> Key: BEAM-8786
> URL: https://issues.apache.org/jira/browse/BEAM-8786
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Elliotte Rusty Harold
>Assignee: Oleg Bonar
>Priority: Trivial
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> On .test-infra/jenkins/README.md, the "Beam Jenkins overview page" link 
> points to https://builds.apache.org/view/A-D/view/Beam/view, which is a 404.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8786) .test-infra/jenkins/README.md link is 404

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-8786.

Fix Version/s: Not applicable
   Resolution: Fixed

> .test-infra/jenkins/README.md link is 404
> -
>
> Key: BEAM-8786
> URL: https://issues.apache.org/jira/browse/BEAM-8786
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Elliotte Rusty Harold
>Priority: Trivial
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> On .test-infra/jenkins/README.md, the "Beam Jenkins overview page" link 
> points to https://builds.apache.org/view/A-D/view/Beam/view, which is a 404.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8786) .test-infra/jenkins/README.md link is 404

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8786:
---
Status: Open  (was: Triage Needed)

> .test-infra/jenkins/README.md link is 404
> -
>
> Key: BEAM-8786
> URL: https://issues.apache.org/jira/browse/BEAM-8786
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Elliotte Rusty Harold
>Priority: Trivial
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> On .test-infra/jenkins/README.md, the "Beam Jenkins overview page" link 
> points to https://builds.apache.org/view/A-D/view/Beam/view, which is a 404.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8952) Bigqueryio read from query temp dataset and GCP IAM issue

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8952:
---
Component/s: (was: beam-community)
 io-java-gcp

> Bigqueryio read from query temp dataset and GCP IAM issue
> -
>
> Key: BEAM-8952
> URL: https://issues.apache.org/jira/browse/BEAM-8952
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Pierig Le Saux
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>
> The need to give project-level BigQuery permissions for the creation and 
> deletion of a temporary dataset is, in my opinion, a security concern.
> We should be able to pass an existing temp_dataset for this purpose, to which 
> we have previously applied proper IAM permissions, thus not needing to give 
> project-level permissions to a Dataflow job.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8952) Bigqueryio read from query temp dataset and GCP IAM issue

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-8952:
--

Assignee: (was: Aizhamal Nurmamat kyzy)

> Bigqueryio read from query temp dataset and GCP IAM issue
> -
>
> Key: BEAM-8952
> URL: https://issues.apache.org/jira/browse/BEAM-8952
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Pierig Le Saux
>Priority: Major
>
> The need to give project-level BigQuery permissions for the creation and 
> deletion of a temporary dataset is, in my opinion, a security concern.
> We should be able to pass an existing temp_dataset for this purpose, to which 
> we have previously applied proper IAM permissions, thus not needing to give 
> project-level permissions to a Dataflow job.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8949) Add Spanner IO Integration Test for Python

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8949:
---
Status: Open  (was: Triage Needed)

> Add Spanner IO Integration Test for Python
> --
>
> Key: BEAM-8949
> URL: https://issues.apache.org/jira/browse/BEAM-8949
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shoaib Zafar
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Spanner IO (Python SDK) contains a PTransform which uses the Batch API to 
> read from Spanner. Currently, it only has direct runner unit tests. In order 
> to make this functionality available to users, integration tests also need to 
> be added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8950) Apache Beam Python Portable runner is not working

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-8950:
--

Assignee: (was: Aizhamal Nurmamat kyzy)

> Apache Beam Python Portable runner is not working 
> -
>
> Key: BEAM-8950
> URL: https://issues.apache.org/jira/browse/BEAM-8950
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0
>Reporter: dhiren
>Priority: Major
>
> When we try to run a Beam job with Python and set the PipelineOptions 
> --runner=PortableRunner,  --job_endpoint=localhost:8089, 
> --environment_type=LOOPBACK 
> to run the job in portable mode, the code does not work on the local system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8948) ValueProvider support for read method of XML IO connector

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8948:
---
Environment: (was: Windows/Linux)

> ValueProvider support for read method of XML IO connector
> -
>
> Key: BEAM-8948
> URL: https://issues.apache.org/jira/browse/BEAM-8948
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Vikram
>Priority: Blocker
> Fix For: 2.11.0
>
>
> We are using XML IO for reading XML from a GCP bucket for our requirement, 
> which also includes creating templates.
> When we try to create a template for our Dataflow job that uses XML IO, we 
> have to make the input parameters ValueProviders. However, the XML IO read 
> method does not have this option, so we are not able to create a template. 
> Below is the relevant XML IO documentation.
> {color:#FF}*public XmlIO.Read from(java.lang.String 
> fileOrPatternSpec)*{color}
> {color:#FF}*Reads a single XML file or a set of XML files defined by a 
> Java "glob" file pattern. Each XML file should be of the form defined in 
> XmlIO.read().*{color}
>  
>  
> There is no read method with a ValueProvider.
> Can this issue be fixed at the earliest, as we are blocked on this 
> requirement? Please contact me with any updates.
> Thanks
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8950) Apache Beam Python Portable runner is not working

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8950:
---
Component/s: (was: examples-python)
 (was: beam-community)
 sdk-py-core

> Apache Beam Python Portable runner is not working 
> -
>
> Key: BEAM-8950
> URL: https://issues.apache.org/jira/browse/BEAM-8950
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0
>Reporter: dhiren
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>
> When we try to run a Beam job with Python and set the PipelineOptions 
> --runner=PortableRunner,  --job_endpoint=localhost:8089, 
> --environment_type=LOOPBACK 
> to run the job in portable mode, the code does not work on the local system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=360154&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360154
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 16/Dec/19 10:08
Start Date: 16/Dec/19 10:08
Worklog Time Spent: 10m 
  Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO 
compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-565992871
 
 
   > @amoght I don't have enough context to make the call on that, as I am very 
new to Beam. I have reached out to some others at Twitter to also review this 
change, as they will have more context.
   
   Thanks Gary :) appreciate your help! 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360154)
Time Spent: 4h 50m  (was: 4h 40m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speed.
> This will enable the Apache Beam SDK to compress/decompress files using the 
> LZO compression algorithm.
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate input and output streams will also be added to enable working 
> with LZO files.
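For context, this is how compression currently plugs into file-based reads. The snippet below uses the existing {{Compression.GZIP}} constant; the ticket would add an analogous {{Compression.LZO}} constant usable in the same spot (the LZO constant and the input path are assumptions based on the proposal, not existing API):

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.Compression;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class CompressionSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Today this works with the existing constants (GZIP shown here); with the
    // proposed support, Compression.LZO would be passed the same way.
    PCollection<String> lines =
        p.apply(
            "ReadCompressed",
            TextIO.read()
                .from("gs://my-bucket/input/*.gz") // placeholder path
                .withCompression(Compression.GZIP));
    p.run().waitUntilFinish();
  }
}
{code}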



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8948) ValueProvider support for read method of XML IO connector

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8948:
---
Status: Open  (was: Triage Needed)

> ValueProvider support for read method of XML IO connector
> -
>
> Key: BEAM-8948
> URL: https://issues.apache.org/jira/browse/BEAM-8948
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Vikram
>Priority: Blocker
>
> We are using XML IO for reading XML from a GCP bucket for our requirement, 
> which also includes creating templates.
> When we try to create a template for our Dataflow job that uses XML IO, we 
> have to make the input parameters ValueProviders. However, the XML IO read 
> method does not have this option, so we are not able to create a template. 
> Below is the relevant XML IO documentation.
> {color:#FF}*public XmlIO.Read from(java.lang.String 
> fileOrPatternSpec)*{color}
> {color:#FF}*Reads a single XML file or a set of XML files defined by a 
> Java "glob" file pattern. Each XML file should be of the form defined in 
> XmlIO.read().*{color}
>  
>  
> There is no read method with a ValueProvider.
> Can this issue be fixed at the earliest, as we are blocked on this 
> requirement? Please contact me with any updates.
> Thanks
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8948) ValueProvider support for read method of XML IO connector

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8948:
---
Fix Version/s: (was: 2.11.0)

> ValueProvider support for read method of XML IO connector
> -
>
> Key: BEAM-8948
> URL: https://issues.apache.org/jira/browse/BEAM-8948
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Vikram
>Priority: Blocker
>
> We are using XML IO for reading XML from a GCP bucket for our requirement, 
> which also includes creating templates.
> When we try to create a template for our Dataflow job that uses XML IO, we 
> have to make the input parameters ValueProviders. However, the XML IO read 
> method does not have this option, so we are not able to create a template. 
> Below is the relevant XML IO documentation.
> {color:#FF}*public XmlIO.Read from(java.lang.String 
> fileOrPatternSpec)*{color}
> {color:#FF}*Reads a single XML file or a set of XML files defined by a 
> Java "glob" file pattern. Each XML file should be of the form defined in 
> XmlIO.read().*{color}
>  
>  
> There is no read method with a ValueProvider.
> Can this issue be fixed at the earliest, as we are blocked on this 
> requirement? Please contact me with any updates.
> Thanks
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8948) ValueProvider support for read method of XML IO connector

2019-12-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997130#comment-16997130
 ] 

Ismaël Mejía commented on BEAM-8948:


A version of Read that accepts templates was added in 2.12.0. I am not 100% 
sure, but if you need to stay on 2.11.0 you may use the FileIO.match (with 
the ValueProvider) + XmlIO.readFiles pattern.
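A rough sketch of that workaround (assuming a JAXB-annotated record class and a pipeline option returning {{ValueProvider<String>}}; the element names, option name, and Record class are illustrative, not from the comment):

{code:java}
import javax.xml.bind.annotation.XmlRootElement;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.xml.XmlIO;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.values.PCollection;

public class TemplatedXmlRead {

  /** Illustrative JAXB record type; "record" elements nested under "records". */
  @XmlRootElement(name = "record")
  public static class Record {
    public String name;
  }

  public interface Options extends PipelineOptions {
    @Description("XML file pattern, supplied at template execution time")
    ValueProvider<String> getInputPattern();

    void setInputPattern(ValueProvider<String> value);
  }

  public static void main(String[] args) {
    Options options = PipelineOptionsFactory.fromArgs(args).as(Options.class);
    Pipeline p = Pipeline.create(options);

    // FileIO.match accepts a ValueProvider, so the pattern can stay a template
    // parameter; XmlIO.readFiles then parses the matched files.
    PCollection<Record> records =
        p.apply("MatchInput", FileIO.match().filepattern(options.getInputPattern()))
            .apply("ReadMatches", FileIO.readMatches())
            .apply(
                "ParseXml",
                XmlIO.<Record>readFiles()
                    .withRootElement("records")
                    .withRecordElement("record")
                    .withRecordClass(Record.class));

    p.run().waitUntilFinish();
  }
}
{code}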

> ValueProvider support for read method of XML IO connector
> -
>
> Key: BEAM-8948
> URL: https://issues.apache.org/jira/browse/BEAM-8948
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Vikram
>Priority: Blocker
>
> We are using XML IO for reading XML from a GCP bucket for our requirement, 
> which also includes creating templates.
> When we try to create a template for our Dataflow job that uses XML IO, we 
> have to make the input parameters ValueProviders. However, the XML IO read 
> method does not have this option, so we are not able to create a template. 
> Below is the relevant XML IO documentation.
> {color:#FF}*public XmlIO.Read from(java.lang.String 
> fileOrPatternSpec)*{color}
> {color:#FF}*Reads a single XML file or a set of XML files defined by a 
> Java "glob" file pattern. Each XML file should be of the form defined in 
> XmlIO.read().*{color}
>  
>  
> There is no read method with a ValueProvider.
> Can this issue be fixed at the earliest, as we are blocked on this 
> requirement? Please contact me with any updates.
> Thanks
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8945) DirectStreamObserver race condition

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8945:
---
Status: Open  (was: Triage Needed)

> DirectStreamObserver race condition
> ---
>
> Key: BEAM-8945
> URL: https://issues.apache.org/jira/browse/BEAM-8945
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Affects Versions: 2.16.0
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>
> The DirectStreamObserver can get into a deadlock if the channel becomes 
> unhealthy or is not ready. An extended period of unhealthiness should result 
> in a failure.
> This is supported by the following thread dumps, where we see that one thread 
> is hanging on getting the lock on the actual stream observer while the 
> remaining worker threads are waiting on the lock on the stream observer.
>  The thread which holds the lock on the stream observer is probably in the 
> while loop because the outboundObserver is not ready.
>  There is also one thread which is waiting to execute onError, which means 
> that the stream observer has become unhealthy and is probably never going to 
> get ready.
> 100s of threads are blocked on:
>  
>  
> org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver.onNext(SynchronizedStreamObserver.java:46)
>  
> org.apache.beam.runners.fnexecution.control.FnApiControlClient.handle(FnApiControlClient.java:84)
>  
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.getProcessBundleProgress(RegisterAndProcessBundleOperation.java:393)
>  
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.updateProgress(BeamFnMapTaskExecutor.java:347)
>  
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.periodicProgressUpdate(BeamFnMapTaskExecutor.java:334)
>  
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker$$Lambda$107/1297335196.run(Unknown
>  Source)
>  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>  
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  java.lang.Thread.run(Thread.java:745)
>  
>  
> One thread having the lock:
> State: TIMED_WAITING stack: ---
>  sun.misc.Unsafe.park(Native Method)
>  java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>  java.util.concurrent.Phaser$QNode.block(Phaser.java:1142)
>  java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
>  java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:796)
>  
> org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:70)
>  
> org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver.onNext(SynchronizedStreamObserver.java:46)
>  
> org.apache.beam.runners.fnexecution.control.FnApiControlClient.handle(FnApiControlClient.java:84)
>  
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.getProcessBundleProgress(RegisterAndProcessBundleOperation.java:393)
>  
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.updateProgress(BeamFnMapTaskExecutor.java:347)
>  
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.periodicProgressUpdate(BeamFnMapTaskExecutor.java:334)
>  
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker$$Lambda$107/1297335196.run(Unknown
>  Source)
>  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>  
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  java.lang.Thread.run(Thread.java:745)
>  
>  
> One thread waiting to execute onError
> State: BLOCKED stack: ---
>  
> org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver.onError(Sync

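One way to make such an extended period of unreadiness surface as a failure rather than an indefinite block is to bound the wait. A minimal, illustrative sketch of that idea using a timed {{Phaser}} wait (not the actual DirectStreamObserver code; the timeout is an assumed value):

{code:java}
// Sketch only: bounds the wait on a Phaser so that a stream that never becomes
// ready eventually fails instead of blocking forever. Names and the timeout value
// are illustrative, not the Beam implementation.
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class BoundedReadyWait {

  private static final long MAX_WAIT_SECONDS = 60; // illustrative limit

  /** Waits for the "ready" phase to advance, failing after MAX_WAIT_SECONDS. */
  public static void awaitReady(Phaser phaser) {
    int phase = phaser.getPhase();
    try {
      phaser.awaitAdvanceInterruptibly(phase, MAX_WAIT_SECONDS, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      throw new IllegalStateException(
          "Stream did not become ready within " + MAX_WAIT_SECONDS + " seconds", e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for stream readiness", e);
    }
  }
}
{code}
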
[jira] [Resolved] (BEAM-8948) ValueProvider support for read method of XML IO connector

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-8948.

Fix Version/s: 2.12.0
   Resolution: Fixed

> ValueProvider support for read method of XML IO connector
> -
>
> Key: BEAM-8948
> URL: https://issues.apache.org/jira/browse/BEAM-8948
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
>Reporter: Vikram
>Priority: Blocker
> Fix For: 2.12.0
>
>
> We are using XML IO for reading XML from a GCP bucket for our requirement. The 
> requirement also includes creating templates.
> When we try to create a template for our Dataflow job which uses XML IO, the 
> input parameters have to be ValueProviders. However, the XML IO read method 
> does not have this option and we are not able to create a template. Below is 
> the relevant XML IO documentation.
> {color:#FF}*public XmlIO.Read from(java.lang.String 
> fileOrPatternSpec)*{color}
> {color:#FF}*Reads a single XML file or a set of XML files defined by a 
> Java "glob" file pattern. Each XML file should be of the form defined in 
> XmlIO.read().*{color}
>  
>  
> There is no read method with a ValueProvider.
> Can this issue be fixed at the earliest? We have been blocked by this 
> requirement. For any updates, please contact me.
> Thanks
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8936) BigQuery related ITs are failing in PostCommit: quota exceeded

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8936:
---
Status: Open  (was: Triage Needed)

> BigQuery related ITs are failing in PostCommit: quota exceeded
> --
>
> Key: BEAM-8936
> URL: https://issues.apache.org/jira/browse/BEAM-8936
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Yueyang Qiu
>Assignee: Mark Liu
>Priority: Major
>  Labels: currently-failing
>
> beam_PostCommit_Java: 
> [https://builds.apache.org/job/beam_PostCommit_Java/4852/]
> beam_PostCommit_Python2: 
> [https://builds.apache.org/job/beam_PostCommit_Python2/1178|https://builds.apache.org/job/beam_PostCommit_Python2/1178/#showFailuresLink]
> beam_PostCommit_Python35: 
> [https://builds.apache.org/job/beam_PostCommit_Python35/1185]
> ...
>  
> This seems to be a GCP quota issue. Mark, could you help take a look or find 
> an owner of this bug?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7403) BigQueryIO.Write does not autoscale correctly (idle workers)

2019-12-16 Thread Pavlo Pohrrebnyi (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997156#comment-16997156
 ] 

Pavlo Pohrrebnyi commented on BEAM-7403:


Looks like that was a Dataflow Runner issue, and Google has resolved that

> BigQueryIO.Write does not autoscale correctly (idle workers)
> 
>
> Key: BEAM-7403
> URL: https://issues.apache.org/jira/browse/BEAM-7403
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Pavlo Pohrrebnyi
>Priority: Major
>
> Apache Beam version:
> 2.10
> JAVA SDK
> Dataflow GCP Staged
> Details:
> We have a streaming dataflow which ingests data into BigQuery (Streaming 
> Inserts).
> We deploy a job with max number of workers = 40 and
> there is a huge backlog already (high watermark).
> When the dataflow starts, it scales 0 -> 3 (from 0 to 3 workers)
> and starts ingesting at a rate of 12000 messages/sec.
> After 2 mins it scales 3 -> 40 to keep up with the backlog.
> After scaling up, the rate never goes higher than it was with 3 nodes (12000 
> messages/sec).
> We have memory consumption metrics in Stackdriver; from them
> we see that the first 3 workers consume about 5GB of RAM and the remaining 37 
> workers consume about 0.2GB of RAM. It appears that these autoscaled nodes are 
> idle. Importantly, they don't contribute to the Streaming Inserts process for 
> BigQuery.
> Autoscaling in the other streaming pipelines we have works fine.
> It appears that this is related to BigQuery streaming inserts.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-7403) BigQueryIO.Write does not autoscale correctly (idle workers)

2019-12-16 Thread Pavlo Pohrrebnyi (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavlo Pohrrebnyi closed BEAM-7403.
--
Fix Version/s: 2.11.0
   Resolution: Won't Fix

Dataflow Runner was fixed by Google

> BigQueryIO.Write does not autoscale correctly (idle workers)
> 
>
> Key: BEAM-7403
> URL: https://issues.apache.org/jira/browse/BEAM-7403
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Pavlo Pohrrebnyi
>Priority: Major
> Fix For: 2.11.0
>
>
> Apache Beam version:
> 2.10
> JAVA SDK
> Dataflow GCP Staged
> Details:
> We have a streaming dataflow which ingests data into BigQuery (Streaming 
> Inserts).
> We deploy a job with max number of workers = 40 and
> there is a huge backlog already (high watermark).
> When the dataflow starts, it scales 0 -> 3 (from 0 to 3 workers)
> and starts ingesting at a rate of 12000 messages/sec.
> After 2 mins it scales 3 -> 40 to keep up with the backlog.
> After scaling up, the rate never goes higher than it was with 3 nodes (12000 
> messages/sec).
> We have memory consumption metrics in Stackdriver; from them
> we see that the first 3 workers consume about 5GB of RAM and the remaining 37 
> workers consume about 0.2GB of RAM. It appears that these autoscaled nodes are 
> idle. Importantly, they don't contribute to the Streaming Inserts process for 
> BigQuery.
> Autoscaling in the other streaming pipelines we have works fine.
> It appears that this is related to BigQuery streaming inserts.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8971) BigQueryIO.Write sometimes throws errors

2019-12-16 Thread Pavlo Pohrrebnyi (Jira)
Pavlo Pohrrebnyi created BEAM-8971:
--

 Summary: BigQueryIO.Write sometimes throws errors 
 Key: BEAM-8971
 URL: https://issues.apache.org/jira/browse/BEAM-8971
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Affects Versions: 2.15.0
Reporter: Pavlo Pohrrebnyi


The following error happens from time to time. After that, Beam retries the 
entire batch and it gets processed fine. There are 2 concerns:
 * the retry may produce duplicates (however, I am not sure)
 * these might be false-positive errors which clutter the log and produce false 
alerts

Stacktrace:
java.lang.RuntimeException: java.io.IOException: Insert failed: 
[\{"errors":[{"debugInfo":"","location":"","message":"","reason":"timeout"}],"index":0}]
at 
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:151)
at 
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:112)
Caused by: java.io.IOException: Insert failed: 
[\{"errors":[{"debugInfo":"","location":"","message":"","reason":"timeout"}],"index":0}]
at 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:854)
at 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:871)
at 
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:140)
at 
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:112)
at 
org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn$DoFnInvoker.invokeFinishBundle(Unknown
 Source)
at 
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.finishBundle(SimpleDoFnRunner.java:224)
at 
org.apache.beam.runners.dataflow.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:412)
at 
org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:56)
at 
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
at 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1295)
at 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
at 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1028)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
 
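
For context on the retry/duplicate concern, here is a hedged sketch of where the streaming-insert retry behaviour is configured on {{BigQueryIO.Write}} (the table spec and dispositions are illustrative choices, not necessarily what the affected pipeline uses):

{code:java}
// Sketch only: shows the knobs that control retries of failed streaming inserts.
// InsertRetryPolicy.retryTransientErrors() retries errors BigQuery marks as transient
// (such as timeouts); rows that keep failing are surfaced via the WriteResult instead
// of being retried indefinitely.
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

public class WriteWithRetryPolicy {

  static WriteResult applyWrite(PCollection<TableRow> rows) {
    return rows.apply(
        "WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table") // illustrative table spec
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
  }
}
{code}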



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360213
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 12:29
Start Date: 16/Dec/19 12:29
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566041061
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360213)
Time Spent: 9h 10m  (was: 9h)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360216&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360216
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 12:38
Start Date: 16/Dec/19 12:38
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566043925
 
 
   Run Python 3.7 Load Tests ParDo Dataflow Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360216)
Time Spent: 9h 20m  (was: 9h 10m)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8849) Flink: Support multiple translations for a single URN in batch runner.

2019-12-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8849:
---
Description: We need to be able to have multiple translations for a single 
URN (right now we want to have a special translation for GBK with non-merging 
windowing). We'll introduce a new `canTranslate` method for all translators, 
that can be used by runner to decide, which translation to use based on 
`PTransform` properties.  (was: We need to be able to have multiple 
translations for a single URN (right now we want to have a special translation 
for GBK with non-merging widowing). We'll introduce a new `canTranslate` method 
for all translators, that can be used by runner to decide, which translation to 
use based on `PTransform` properties.)

> Flink: Support multiple translations for a single URN in batch runner.
> 
>
> Key: BEAM-8849
> URL: https://issues.apache.org/jira/browse/BEAM-8849
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: Major
>
> We need to be able to have multiple translations for a single URN (right now 
> we want to have a special translation for GBK with non-merging windowing). 
> We'll introduce a new `canTranslate` method for all translators, which the 
> runner can use to decide which translation to apply based on `PTransform` 
> properties.
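
A minimal sketch of the shape such a {{canTranslate}} hook could take, with the runner picking the first matching translator for a URN (all class names below are placeholders, not the actual Flink runner types):

{code:java}
// Sketch only: illustrates letting the runner choose among several translators
// registered for the same URN via a canTranslate predicate.
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import org.apache.beam.model.pipeline.v1.RunnerApi;

interface BatchTransformTranslator<ContextT> {

  /** Whether this translator applies to the given transform (e.g. non-merging windowing). */
  default boolean canTranslate(RunnerApi.PTransform transform) {
    return true;
  }

  void translate(String transformId, RunnerApi.PTransform transform, ContextT context);
}

/** Picks the first translator registered for a URN whose canTranslate() accepts the transform. */
class TranslatorRegistry<ContextT> {
  private final Map<String, List<BatchTransformTranslator<ContextT>>> byUrn = new HashMap<>();

  void register(String urn, BatchTransformTranslator<ContextT> translator) {
    byUrn.computeIfAbsent(urn, k -> new ArrayList<>()).add(translator);
  }

  Optional<BatchTransformTranslator<ContextT>> lookup(String urn, RunnerApi.PTransform transform) {
    return byUrn.getOrDefault(urn, Collections.emptyList()).stream()
        .filter(t -> t.canTranslate(transform))
        .findFirst();
  }
}
{code}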



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360239
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 13:18
Start Date: 16/Dec/19 13:18
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566056904
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360239)
Time Spent: 9.5h  (was: 9h 20m)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360243&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360243
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 13:25
Start Date: 16/Dec/19 13:25
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566059482
 
 
   Run Python 3.7 Load Tests ParDo Dataflow Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360243)
Time Spent: 9h 40m  (was: 9.5h)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360271&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360271
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:06
Start Date: 16/Dec/19 14:06
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r358249787
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResources.java
 ##
 @@ -49,7 +50,19 @@
   ClassLoader classLoader, PipelineOptions options) {
 
 PipelineResourcesOptions artifactsRelatedOptions = 
options.as(PipelineResourcesOptions.class);
-return 
artifactsRelatedOptions.getPipelineResourcesDetector().detect(classLoader);
+return artifactsRelatedOptions
+.getPipelineResourcesDetector()
+.detect(classLoader)
+.filter(isStageable())
+.collect(Collectors.toList());
+  }
+
+  /**
+   * Returns a predicate for filtering all resources that are impossible to 
stage (like gradle
+   * wrapper jars).
+   */
+  private static Predicate<String> isStageable() {
+return resourcePath -> !resourcePath.contains("gradle/wrapper");
 
 Review comment:
   Is there a more programmatic way to filter these out, e.g. to stop at the 
Gradle wrapper's classloader?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360271)
Time Spent: 12.5h  (was: 12h 20m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to set 
> the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360269&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360269
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:06
Start Date: 16/Dec/19 14:06
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r358247892
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/ClasspathScanningResourcesDetector.java
 ##
 @@ -43,9 +43,9 @@ public ClasspathScanningResourcesDetector(ClassGraph 
classGraph) {
* @return A list of absolute paths to the resources the class loader uses.
*/
   @Override
-  public List<String> detect(ClassLoader classLoader) {
-List collect = 
classGraph.addClassLoader(classLoader).getClasspathFiles();
+  public Stream<String> detect(ClassLoader classLoader) {
 
 Review comment:
   Could we leave this `List`?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360269)
Time Spent: 12h 20m  (was: 12h 10m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to set 
> the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360270
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:06
Start Date: 16/Dec/19 14:06
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r358248161
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesDetector.java
 ##
 @@ -18,10 +18,10 @@
 package org.apache.beam.runners.core.construction.resources;
 
 import java.io.Serializable;
-import java.util.List;
+import java.util.stream.Stream;
 
 /** Interface for an algorithm detecting classpath resources for pipelines. */
 public interface PipelineResourcesDetector extends Serializable {
 
-  List<String> detect(ClassLoader classLoader);
+  Stream<String> detect(ClassLoader classLoader);
 
 Review comment:
   Would prefer `List`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360270)
Time Spent: 12h 20m  (was: 12h 10m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to set 
> the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360276
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:14
Start Date: 16/Dec/19 14:14
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r358254181
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesDetector.java
 ##
 @@ -18,10 +18,10 @@
 package org.apache.beam.runners.core.construction.resources;
 
 import java.io.Serializable;
-import java.util.List;
+import java.util.stream.Stream;
 
 /** Interface for an algorithm detecting classpath resources for pipelines. */
 public interface PipelineResourcesDetector extends Serializable {
 
-  List<String> detect(ClassLoader classLoader);
+  Stream<String> detect(ClassLoader classLoader);
 
 Review comment:
   Stream was a better fit in case we want to do filtering outside a 
`PipelineResourcesDetector` implementation class; this is already done in 
`PipelineResources.java`. This way, when a detector (even a 3rd-party one) 
provides paths that we know we should not accept, we can easily filter them out 
using the Stream API and then materialize the result into a collection of our 
choice (e.g. List or Set - it's up to Beam's developers at this point). 
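
   As a small, illustrative sketch of that filtering step outside the detector (the gradle/wrapper predicate mirrors the diff above; everything else is just an example):

{code:java}
// Sketch only: a detector returning Stream<String> lets the caller apply additional
// filtering before materializing the list, regardless of which detector
// implementation (including a third-party one) produced the paths.
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ResourceFiltering {

  // Example of a path we know cannot be staged.
  private static final Predicate<String> IS_STAGEABLE =
      path -> !path.contains("gradle/wrapper");

  static List<String> stageableResources(Stream<String> detectedResources) {
    return detectedResources.filter(IS_STAGEABLE).collect(Collectors.toList());
  }
}
{code}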
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360276)
Time Spent: 12h 40m  (was: 12.5h)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to set 
> the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360289
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:18
Start Date: 16/Dec/19 14:18
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r358256436
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResources.java
 ##
 @@ -49,7 +50,19 @@
   ClassLoader classLoader, PipelineOptions options) {
 
 PipelineResourcesOptions artifactsRelatedOptions = 
options.as(PipelineResourcesOptions.class);
-return 
artifactsRelatedOptions.getPipelineResourcesDetector().detect(classLoader);
+return artifactsRelatedOptions
+.getPipelineResourcesDetector()
+.detect(classLoader)
+.filter(isStageable())
+.collect(Collectors.toList());
+  }
+
+  /**
+   * Returns a predicate for filtering all resources that are impossible to 
stage (like gradle
+   * wrapper jars).
+   */
+  private static Predicate<String> isStageable() {
+return resourcePath -> !resourcePath.contains("gradle/wrapper");
 
 Review comment:
   I tried to use classgraph's `blacklist*()` methods to filter these. However, 
I couldn't get any of them working for this particular directory. 
   
   I still think that, even despite that, filtering should also be possible (and 
easy; that's why I used streams) outside the detector's implementation.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360289)
Time Spent: 12h 50m  (was: 12h 40m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to set 
> the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8962) FlinkMetricContainer causes churn in the JobManager and lets the web frontend malfunction

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8962?focusedWorklogId=360298&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360298
 ]

ASF GitHub Bot logged work on BEAM-8962:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:31
Start Date: 16/Dec/19 14:31
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #10381: [BEAM-8962] 
Add option to disable the metric container accumulator
URL: https://github.com/apache/beam/pull/10381#discussion_r358263908
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
 ##
 @@ -159,6 +159,15 @@
 
   void setEnableMetrics(Boolean enableMetrics);
 
+  @Description(
+  "By default, uses Flink accumulators to store the metrics which allows 
to query metrics from the PipelineResult. "
+  + "If set to true, metrics will still be reported but can't be 
queried via PipelineResult. "
+  + "This saves network and memory.")
+  @Default.Boolean(false)
 
 Review comment:
   I find it important that flags are named to express their semantics. In this 
case, the flag's purpose is to turn something off, and there are many similar 
examples where we would use "skip", "no" etc.
   
   There should not be a case where `disabled=false` needs to be specified by 
the user. 
   But maybe flags should not have default values, since they are fully 
described by name.
   
   That would also avoid the options being included with their defaults during 
job submission, even when the user has not specified them.
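
   Purely as an illustration of that naming suggestion - a hypothetical option whose name already expresses the "off" semantics and that carries no default (not the final API):

{code:java}
// Sketch only: the flag name says what "true" does, so a user never needs to pass
// "false" and no @Default annotation is required. Hypothetical interface, not the
// actual FlinkPipelineOptions change.
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

public interface MetricsOptionsSketch extends PipelineOptions {

  @Description(
      "Disables storing metrics in Flink accumulators. Metrics are still reported, "
          + "but can no longer be queried through PipelineResult.")
  Boolean getDisableMetricAccumulators();

  void setDisableMetricAccumulators(Boolean value);
}
{code}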
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360298)
Time Spent: 1h 10m  (was: 1h)

> FlinkMetricContainer causes churn in the JobManager and lets the web frontend 
> malfunction
> -
>
> Key: BEAM-8962
> URL: https://issues.apache.org/jira/browse/BEAM-8962
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The {{FlinkMetricContainer}} wraps the Beam metric container for reporting 
> metrics, but also stores them as Flink accumulators. For high-parallelism 
> jobs with over a thousand tasks and many built-in Beam metrics for every Beam 
> step, this can accumulate to over 100MB of serialized data which is stored in 
> the JobManager's ExecutionGraph. This then fails to even be sent over the wire, 
> due to the akka.framesize limit (10MB by default), and manifests in {{500 
> Internal Server Error}}s in the web frontend.
> We need to introduce an option to disable the reporting via accumulators. The 
> accumulator-based reporting is mostly useful for batch workloads, where you can 
> retrieve the final accumulator values at the end of the job, but it adds a lot 
> of memory and network overhead.
> Perhaps we could even turn off the accumulators for streaming jobs, or 
> entirely and make them opt-in.
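
For context, this is the capability the accumulator-backed path provides - querying metrics from the PipelineResult after the run - and what a pipeline gives up when it is disabled (a generic, runner-agnostic sketch):

{code:java}
// Sketch only: queries counters from the PipelineResult after the run finishes.
// Committed values may not be supported by every runner.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricsFilter;

class QueryMetricsSketch {

  static void printCounters(Pipeline pipeline) {
    PipelineResult result = pipeline.run();
    result.waitUntilFinish();
    // An empty filter returns all metrics reported by the pipeline.
    MetricQueryResults metrics = result.metrics().queryMetrics(MetricsFilter.builder().build());
    metrics.getCounters()
        .forEach(counter -> System.out.println(counter.getName() + " = " + counter.getCommitted()));
  }
}
{code}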



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360300
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:31
Start Date: 16/Dec/19 14:31
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r358264060
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/ClasspathScanningResourcesDetector.java
 ##
 @@ -43,9 +43,9 @@ public ClasspathScanningResourcesDetector(ClassGraph 
classGraph) {
* @return A list of absolute paths to the resources the class loader uses.
*/
   @Override
-  public List<String> detect(ClassLoader classLoader) {
-List collect = 
classGraph.addClassLoader(classLoader).getClasspathFiles();
+  public Stream<String> detect(ClassLoader classLoader) {
 
 Review comment:
   (I responded in a suggestion below)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360300)
Time Spent: 13h 10m  (was: 13h)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to set 
> the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360299
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:31
Start Date: 16/Dec/19 14:31
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r358264060
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/ClasspathScanningResourcesDetector.java
 ##
 @@ -43,9 +43,9 @@ public ClasspathScanningResourcesDetector(ClassGraph 
classGraph) {
* @return A list of absolute paths to the resources the class loader uses.
*/
   @Override
-  public List<String> detect(ClassLoader classLoader) {
-List collect = 
classGraph.addClassLoader(classLoader).getClasspathFiles();
+  public Stream<String> detect(ClassLoader classLoader) {
 
 Review comment:
   (I responded in a below suggestion)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360299)
Time Spent: 13h  (was: 12h 50m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to set 
> the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360304
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:43
Start Date: 16/Dec/19 14:43
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566090693
 
 
   Run Python 3.7 Load Tests ParDo Dataflow Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360304)
Time Spent: 9h 50m  (was: 9h 40m)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360305
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:44
Start Date: 16/Dec/19 14:44
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r358254181
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesDetector.java
 ##
 @@ -18,10 +18,10 @@
 package org.apache.beam.runners.core.construction.resources;
 
 import java.io.Serializable;
-import java.util.List;
+import java.util.stream.Stream;
 
 /** Interface for an algorithm detecting classpath resources for pipelines. */
 public interface PipelineResourcesDetector extends Serializable {
 
-  List<String> detect(ClassLoader classLoader);
+  Stream<String> detect(ClassLoader classLoader);
 
 Review comment:
   Stream was a better fit in case we want to do filtering outside a 
`PipelineResourcesDetector` implementation class; this is already done in 
`PipelineResources.java`. This way, when a detector (even a 3rd-party one, 
provided via pipeline options) provides paths that we know we should not 
accept, we can easily filter them out using the Stream API and then materialize 
the result into a collection of our choice (e.g. List or Set - it's up to Beam's 
developers at this point). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360305)
Time Spent: 13h 20m  (was: 13h 10m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to set 
> the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360309
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:56
Start Date: 16/Dec/19 14:56
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566095979
 
 
   Run Python 3.7 Load Tests ParDo Dataflow Batch
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360309)
Time Spent: 10h  (was: 9h 50m)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360310
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:57
Start Date: 16/Dec/19 14:57
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566096146
 
 
   Run Python 3.7 Load Tests ParDo Dataflow Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360310)
Time Spent: 10h 10m  (was: 10h)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360311
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 14:57
Start Date: 16/Dec/19 14:57
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566095979
 
 
   Run Python 3.7 Load Tests ParDo Dataflow Batch
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360311)
Time Spent: 10h 20m  (was: 10h 10m)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360314&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360314
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 15:04
Start Date: 16/Dec/19 15:04
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566099230
 
 
   Run Python 3.7 Load Tests ParDo Dataflow Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360314)
Time Spent: 10.5h  (was: 10h 20m)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360320&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360320
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 15:20
Start Date: 16/Dec/19 15:20
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566106245
 
 
   Run Python 3.7 Load Tests ParDo Dataflow Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360320)
Time Spent: 10h 40m  (was: 10.5h)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8801) PubsubMessageToRow should not check useFlatSchema() in processElement

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8801?focusedWorklogId=360322&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360322
 ]

ASF GitHub Bot logged work on BEAM-8801:


Author: ASF GitHub Bot
Created on: 16/Dec/19 15:29
Start Date: 16/Dec/19 15:29
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10359: 
[BEAM-8801] PubsubMessageToRow should not check useFlatSchema() in pr…
URL: https://github.com/apache/beam/pull/10359#discussion_r358295100
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubMessageToRowTest.java
 ##
 @@ -275,6 +268,68 @@ public void testSendsFlatRowInvalidToDLQ() {
 pipeline.run();
   }
 
+  @Test
+  public void testFlatSchemaMessageInvalidElement() {
+Schema messageSchema =
+Schema.builder()
+.addDateTimeField("event_timestamp")
+.addInt32Field("id")
+.addStringField("name")
+.build();
+
+PCollectionTuple rows =
+pipeline
+.apply(
+"create",
+Create.timestamped(
+message(1, map("attr", "val"), "{ \"id\" : 3, \"name\" : 
\"foo\" }"),
+message(2, map("attr1", "val1"), "{ \"invalid1\" : 
\"sdfsd\" }")))
+.apply(
+"convert",
+PubsubMessageToRow.builder()
+.messageSchema(messageSchema)
+.useDlq(false)
+.useFlatSchema(true)
+.build());
+
+Exception exception = Assert.assertThrows(RuntimeException.class, () -> 
pipeline.run());
+Assert.assertTrue(exception.getMessage().contains("Error parsing 
message"));
+  }
+
+  @Test
+  public void testNestedSchemaMessageInvalidElement() {
+Schema payloadSchema =
+Schema.builder()
+.addNullableField("id", FieldType.INT32)
+.addNullableField("name", FieldType.STRING)
+.build();
+
+Schema messageSchema =
+Schema.builder()
+.addDateTimeField("event_timestamp")
+.addMapField("attributes", VARCHAR, VARCHAR)
+.addRowField("payload", payloadSchema)
+.build();
+
+PCollectionTuple rows =
+pipeline
+.apply(
+"create",
+Create.timestamped(
+message(1, map("attr", "val"), "{ \"id\" : 3, \"name\" : 
\"foo\" }"),
+message(2, map("attr1", "val1"), "{ \"invalid1\" : 
\"sdfsd\" }")))
+.apply(
+"convert",
+PubsubMessageToRow.builder()
+.messageSchema(messageSchema)
+.useDlq(false)
+.useFlatSchema(false)
+.build());
+
+Exception exception = Assert.assertThrows(RuntimeException.class, () -> 
pipeline.run());
+Assert.assertTrue(exception.getMessage().contains("Error parsing 
message"));
 
 Review comment:
   nice! thanks for adding these tests :)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360322)
Time Spent: 2h 40m  (was: 2.5h)

> PubsubMessageToRow should not check useFlatSchema() in processElement
> -
>
> Key: BEAM-8801
> URL: https://issues.apache.org/jira/browse/BEAM-8801
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Jing Chen
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently we check useFlatSchema() for every element that's processed. 
> Instead, we should check it once at pipeline construction time. See 
> [comment|https://github.com/apache/beam/pull/10158#discussion_r348805530].
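
A minimal sketch of the general pattern being asked for - branch on the flag once, when the transform expands, by choosing the DoFn at construction time (class names and the String payloads are illustrative, not the actual PubsubMessageToRow code):

{code:java}
// Sketch only: the flag is consulted once, in expand(), by picking which DoFn to
// apply; processElement itself no longer branches on it.
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

class ParseMessages extends PTransform<PCollection<String>, PCollection<String>> {

  private final boolean useFlatSchema;

  ParseMessages(boolean useFlatSchema) {
    this.useFlatSchema = useFlatSchema;
  }

  @Override
  public PCollection<String> expand(PCollection<String> input) {
    // Decision happens here, at pipeline construction time, not per element.
    DoFn<String, String> parseFn = useFlatSchema ? new ParseFlat() : new ParseNested();
    return input.apply(ParDo.of(parseFn));
  }

  private static class ParseFlat extends DoFn<String, String> {
    @ProcessElement
    public void processElement(@Element String message, OutputReceiver<String> out) {
      out.output("flat:" + message); // placeholder for flat-schema parsing
    }
  }

  private static class ParseNested extends DoFn<String, String> {
    @ProcessElement
    public void processElement(@Element String message, OutputReceiver<String> out) {
      out.output("nested:" + message); // placeholder for nested-schema parsing
    }
  }
}
{code}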



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8801) PubsubMessageToRow should not check useFlatSchema() in processElement

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8801?focusedWorklogId=360323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360323
 ]

ASF GitHub Bot logged work on BEAM-8801:


Author: ASF GitHub Bot
Created on: 16/Dec/19 15:29
Start Date: 16/Dec/19 15:29
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10359: 
[BEAM-8801] PubsubMessageToRow should not check useFlatSchema() in pr…
URL: https://github.com/apache/beam/pull/10359#discussion_r358294729
 
 

 ##
 File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubMessageToRowTest.java
 ##
 @@ -36,23 +36,23 @@
 import org.apache.beam.sdk.testing.PAssert;
 import org.apache.beam.sdk.testing.TestPipeline;
 import org.apache.beam.sdk.transforms.Create;
-import org.apache.beam.sdk.transforms.ParDo;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PCollectionTuple;
 import org.apache.beam.sdk.values.Row;
 import org.apache.beam.sdk.values.TimestampedValue;
-import org.apache.beam.sdk.values.TupleTagList;
 import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableMap;
 import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableSet;
 import org.joda.time.DateTime;
 import org.joda.time.Instant;
+import org.junit.Assert;
 import org.junit.Rule;
 import org.junit.Test;
 
 /** Unit tests for {@link PubsubMessageToRow}. */
 public class PubsubMessageToRowTest implements Serializable {
 
   @Rule public transient TestPipeline pipeline = TestPipeline.create();
+  private static final String DEAD_FILE_QUEUE = "projects/a12345z/topics/test";
 
 Review comment:
   nit: can you remove this? I don't think it's used anymore
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360323)
Time Spent: 2h 40m  (was: 2.5h)

> PubsubMessageToRow should not check useFlatSchema() in processElement
> -
>
> Key: BEAM-8801
> URL: https://issues.apache.org/jira/browse/BEAM-8801
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Jing Chen
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently we check useFlatSchema() for every element that's processed. 
> Instead, we should check it once at pipeline construction time. See 
> [comment|https://github.com/apache/beam/pull/10158#discussion_r348805530].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8972) Add a Jenkins job running Combine load test on Java with Flink in Portability mode

2019-12-16 Thread Michal Walenia (Jira)
Michal Walenia created BEAM-8972:


 Summary: Add a Jenkins job running Combine load test on Java with 
Flink in Portability mode
 Key: BEAM-8972
 URL: https://issues.apache.org/jira/browse/BEAM-8972
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Michal Walenia
Assignee: Michal Walenia






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=360326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360326
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 16/Dec/19 15:35
Start Date: 16/Dec/19 15:35
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566112567
 
 
   Run Python 3.7 Load Tests ParDo Dataflow Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360326)
Time Spent: 10h 50m  (was: 10h 40m)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360332
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 15:46
Start Date: 16/Dec/19 15:46
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#issuecomment-566117095
 
 
   Could you squash the relevant commits?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360332)
Time Spent: 13.5h  (was: 13h 20m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.
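
For context, a minimal sketch of the fragile pattern described in points 1 and 2, next to a classloader-independent fallback. This is not Beam's actual PipelineResources code (the PR discussed in this thread ends up using the classgraph library instead), and the class and method names below are illustrative only:

{code:java}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClasspathDetectionSketch {

  /** Fragile: on Java 9+ the application class loader is no longer a URLClassLoader. */
  static List<String> viaUrlClassLoader(ClassLoader loader) {
    if (!(loader instanceof URLClassLoader)) {
      throw new IllegalArgumentException("Unable to detect classpath from " + loader);
    }
    List<String> entries = new ArrayList<>();
    for (URL url : ((URLClassLoader) loader).getURLs()) {
      entries.add(url.getFile());
    }
    return entries;
  }

  /** Fallback that does not depend on the class loader implementation at all. */
  static List<String> viaSystemProperty() {
    return Arrays.asList(System.getProperty("java.class.path").split(File.pathSeparator));
  }
}
{code}

Neither variant filters out entries the user does not actually want to stage, which is why the ticket also floats an SPI or an explicit list of staged files.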



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?focusedWorklogId=360333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360333
 ]

ASF GitHub Bot logged work on BEAM-8967:


Author: ASF GitHub Bot
Created on: 16/Dec/19 15:48
Start Date: 16/Dec/19 15:48
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10385: 
[release-2.18.0][BEAM-8967] Maven artifact beam-sdks-java-core does not have 
JSR305 specified as "compile"
URL: https://github.com/apache/beam/pull/10385#issuecomment-566118311
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360333)
Time Spent: 1h 20m  (was: 1h 10m)

> Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"
> --
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: 2.17.0, 2.18.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a followup of [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].
> Other 4 dependencies are not found in the snapshot pom either:
> {code:groovy}
>   compile library.java.antlr_runtime
>   compile library.java.protobuf_java
>   compile library.java.commons_compress
>   compile library.java.commons_lang3
> {code}
> h1. Compile-declared dependencies needed at runtime?
> h2. protobuf-java
> They are shaded. For example, Beam's TextBasedReader uses 
> {{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
> class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
> org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
> {noformat}
> h2. commons-compress
> They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
> uses 
> {{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
> The shaded class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
> {noformat}
> h2. commons-lang3
> They are shaded. For example, Beam's 
> {{org.apache.beam.sdk.io.LocalFileSystem}} uses 
> {{org.apache.commons.lang3.SystemUtils}}. The shaded class is in the 
> published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep SystemUtils
> org/apache/beam/repackaged/core/org/apache/commons/lang3/SystemUtils.class
> {noformat}
> h2. antlr-runtime
> Same.
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep org.antlr.v4 |head
> org/apache/beam/repackaged/core/org/antlr/v4/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/ANTLRErrorListener.class
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8972) Add a Jenkins job running Combine load test on Java with Flink in Portability mode

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8972?focusedWorklogId=360335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360335
 ]

ASF GitHub Bot logged work on BEAM-8972:


Author: ASF GitHub Bot
Created on: 16/Dec/19 15:51
Start Date: 16/Dec/19 15:51
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10386: [BEAM-8972] Add 
Jenkins job with Combine test for portable Java
URL: https://github.com/apache/beam/pull/10386#issuecomment-566119478
 
 
   Run seed job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360335)
Time Spent: 20m  (was: 10m)

> Add a Jenkins job running Combine load test on Java with Flink in Portability 
> mode
> --
>
> Key: BEAM-8972
> URL: https://issues.apache.org/jira/browse/BEAM-8972
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8972) Add a Jenkins job running Combine load test on Java with Flink in Portability mode

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8972?focusedWorklogId=360334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360334
 ]

ASF GitHub Bot logged work on BEAM-8972:


Author: ASF GitHub Bot
Created on: 16/Dec/19 15:51
Start Date: 16/Dec/19 15:51
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #10386: [BEAM-8972] 
Add Jenkins job with Combine test for portable Java
URL: https://github.com/apache/beam/pull/10386
 
 
   
   
   
   

[jira] [Work logged] (BEAM-8972) Add a Jenkins job running Combine load test on Java with Flink in Portability mode

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8972?focusedWorklogId=360338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360338
 ]

ASF GitHub Bot logged work on BEAM-8972:


Author: ASF GitHub Bot
Created on: 16/Dec/19 16:03
Start Date: 16/Dec/19 16:03
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10386: [BEAM-8972] Add 
Jenkins job with Combine test for portable Java
URL: https://github.com/apache/beam/pull/10386#issuecomment-566125111
 
 
   Run Load Tests Java Combine Portable Flink Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360338)
Time Spent: 0.5h  (was: 20m)

> Add a Jenkins job running Combine load test on Java with Flink in Portability 
> mode
> --
>
> Key: BEAM-8972
> URL: https://issues.apache.org/jira/browse/BEAM-8972
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8917) javax.annotation.Nullable is missing for org.apache.beam.sdk.schemas.FieldValueTypeInformation

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8917?focusedWorklogId=360340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360340
 ]

ASF GitHub Bot logged work on BEAM-8917:


Author: ASF GitHub Bot
Created on: 16/Dec/19 16:07
Start Date: 16/Dec/19 16:07
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #10370: 
[release-2.17.0][BEAM-8917][BEAM-8967] jsr305 dependency declaration for 
Nullable class (#10324)
URL: https://github.com/apache/beam/pull/10370#issuecomment-566126783
 
 
   run python precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360340)
Time Spent: 11h 40m  (was: 11.5h)

> javax.annotation.Nullable is missing for 
> org.apache.beam.sdk.schemas.FieldValueTypeInformation
> --
>
> Key: BEAM-8917
> URL: https://issues.apache.org/jira/browse/BEAM-8917
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> This ticket is from the result of static analysis by Linkage Checker 
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1045])
> h1. Example Project
> Example project to produce an issue: 
> https://github.com/suztomo/beam-java-sdk-missing-nullable .
> I think the Maven artifact {{org.apache.beam:beam-sdks-java-core}}, which 
> contains {{org.apache.beam.sdk.schemas.FieldValueTypeInformation}}, should 
> declare the dependency to {{com.google.code.findbugs:jsr305}}.
> h1. Why there's no problem in compilation and tests of sdks/java/core?
> The compilation succeeds because the {{Nullable}} annotation is in the 
> transitive dependency of compileOnly {{spotbugs-annotations}} dependency:
> {noformat}
> compileOnly - Compile only dependencies for source set 'main'.
> ...
> +--- com.github.spotbugs:spotbugs-annotations:3.1.12
> |\--- com.google.code.findbugs:jsr305:3.0.2
> ...
> {noformat}
> The tests succeed because the {{Nullable}} annotation is in the transitive 
> dependency of {{guava-testlib}}.
> {noformat}
> testRuntime - Runtime dependencies for source set 'test' (deprecated, use 
> 'testRuntimeOnly' instead).
> ...
> +--- com.google.guava:guava-testlib:20.0
> |+--- com.google.code.findbugs:jsr305:1.3.9
> {noformat}
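
As a purely illustrative aside, javax.annotation.Nullable lives in the com.google.code.findbugs:jsr305 artifact, so any code that compiles against it needs that artifact on the compile classpath. A minimal, made-up snippet of the kind of usage in question:

{code:java}
import java.util.List;
import javax.annotation.Nullable; // provided by com.google.code.findbugs:jsr305

/** Made-up example: compiles only when jsr305 is on the compile classpath. */
class NullableUsageSketch {

  /** Returns the first element, or null when the list is empty. */
  @Nullable
  static String firstOrNull(List<String> values) {
    return values.isEmpty() ? null : values.get(0);
  }
}
{code}

This is why the proposed fix is to declare jsr305 as a compile-scope dependency of the published artifact instead of relying on compileOnly or test-time transitives.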



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8966) failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest

2019-12-16 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997429#comment-16997429
 ] 

Kamil Wasilewski commented on BEAM-8966:


A similar error happens in 
[https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/] 
and 
[https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/].

I have no idea why not all Python tests are affected. Also, I'm not able to 
reproduce this error locally.
{code:java}
16:52:40 > Task :sdks:python:sdist FAILED
16:52:40 setup.py:232: UserWarning: You are using Apache Beam with Python 2. 
New releases of Apache Beam will soon support Python 3 only.
16:52:40   'You are using Apache Beam with Python 2. '
16:52:40 
/home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/src/build/gradleenv/192237/local/lib/python2.7/site-packages/setuptools/dist.py:476:
 UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
16:52:40   normalized_version,
16:52:40 DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 
2020. Please upgrade your Python as Python 2.7 won't be maintained after that 
date. A future version of pip will drop support for Python 2.7. More details 
about Python 2 support in pip, can be found at 
https://pip.pypa.io/en/latest/development/release-process/#python-2-support
16:52:40 Requirement already satisfied: mypy-protobuf==1.12 in 
/home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/src/build/gradleenv/192237/lib/python2.7/site-packages
 (1.12)
16:52:40 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto 
but not used.
16:52:40 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but 
not used.
16:52:40 protoc-gen-mypy: program not found or is not executable
16:52:40 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
16:52:40 Traceback (most recent call last):
16:52:40   File "setup.py", line 295, in 
16:52:40 'mypy': generate_protos_first(mypy),
16:52:40   File 
"/home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/src/build/gradleenv/192237/local/lib/python2.7/site-packages/setuptools/__init__.py",
 line 145, in setup
16:52:40 return distutils.core.setup(**attrs)
16:52:40   File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
16:52:40 dist.run_commands()
16:52:40   File "/usr/lib/python2.7/distutils/dist.py", line 953, in 
run_commands
16:52:40 self.run_command(cmd)
16:52:40   File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
16:52:40 cmd_obj.run()
16:52:40   File 
"/home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/src/build/gradleenv/192237/local/lib/python2.7/site-packages/setuptools/command/sdist.py",
 line 44, in run
16:52:40 self.run_command('egg_info')
16:52:40   File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
16:52:40 self.distribution.run_command(command)
16:52:40   File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
16:52:40 cmd_obj.run()
16:52:40   File "setup.py", line 220, in run
16:52:40 gen_protos.generate_proto_files(log=log)
16:52:40   File 
"/home/jenkins/jenkins-slave/workspace/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/src/sdks/python/gen_protos.py",
 line 144, in generate_proto_files
16:52:40 '%s' % ret_code)
16:52:40 RuntimeError: Protoc returned non-zero status (see logs for details): 1
{code}

> failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
> ---
>
> Key: BEAM-8966
> URL: https://issues.apache.org/jira/browse/BEAM-8966
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Chad Dombrova
>Priority: Major
>
> I believe this is due to https://github.com/apache/beam/pull/9915
> {code}
> Collecting mypy-protobuf==1.12
>   Using cached 
> https://files.pythonhosted.org/packages/b6/28/041dea47c93564bfc0ece050362894292ec4f173caa92fa82994a6d061d1/mypy_protobuf-1.12-py3-none-any.whl
> Installing collected packages: mypy-protobuf
> Successfully installed mypy-protobuf-1.12
> beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but 
> not used.
> beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not 
> used.
> Traceback (most recent call last):
>   File "/usr/local/bin/protoc-gen-mypy", line 13, in 
> import google.protobuf.descriptor_pb2 as d
> ModuleNotFoundError: No module named 'google'
> --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> Process Process-1:
> Traceb

[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360344&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360344
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 16:19
Start Date: 16/Dec/19 16:19
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#issuecomment-566132048
 
 
   (rebased & squashed)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360344)
Time Spent: 13h 40m  (was: 13.5h)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360346&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360346
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 16:20
Start Date: 16/Dec/19 16:20
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#issuecomment-566132048
 
 
   (squashed)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360346)
Time Spent: 13h 50m  (was: 13h 40m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8962) FlinkMetricContainer causes churn in the JobManager and lets the web frontend malfunction

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8962?focusedWorklogId=360353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360353
 ]

ASF GitHub Bot logged work on BEAM-8962:


Author: ASF GitHub Bot
Created on: 16/Dec/19 16:29
Start Date: 16/Dec/19 16:29
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10381: [BEAM-8962] Add 
option to disable the metric container accumulator
URL: https://github.com/apache/beam/pull/10381
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360353)
Time Spent: 1h 20m  (was: 1h 10m)

> FlinkMetricContainer causes churn in the JobManager and lets the web frontend 
> malfunction
> -
>
> Key: BEAM-8962
> URL: https://issues.apache.org/jira/browse/BEAM-8962
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The {{FlinkMetricContainer}} wraps the Beam metric container for reporting 
> metrics, but also stores them as Flink accumulators. With high parallelism 
> jobs with over a thousand tasks and many built-in Beam metrics for every Beam 
> step, this can accumulate to over 100MB of serialized data which is stored in 
> the JobManager's ExecutionGraph. This then fails to even be sent over the wire, 
> due to the akka.framesize limit (10MB by default), and manifests in {{500 
> Internal Server Error}}s in the web frontend.
> We need to introduce an option to disable the reporting via accumulators. It 
> is mostly useful for batch workloads where you can retrieve the final 
> accumulator values at the end of the job. It adds a lot of memory and network 
> overhead.
> Perhaps we could even turn off the accumulators for streaming jobs, or 
> entirely and make them opt-in.
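
For illustration, a minimal sketch of how an opt-in switch could be declared with Beam's PipelineOptions machinery. The interface name, option name, and description are assumptions for this sketch, not necessarily what the PR adds:

{code:java}
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

/** Hypothetical opt-in flag for the accumulator-based metrics reporting. */
public interface MetricsAccumulatorOptions extends PipelineOptions {

  @Description(
      "If true, also store metric values in a Flink accumulator so that final values "
          + "can be read from the JobManager after the job completes. Disabled by "
          + "default because it adds memory and network overhead.")
  @Default.Boolean(false)
  boolean getEnableMetricsAccumulator();

  void setEnableMetricsAccumulator(boolean value);
}
{code}

Declared this way, the flag would surface on the Java command line as --enableMetricsAccumulator=true via PipelineOptionsFactory.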



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8962) FlinkMetricContainer causes churn in the JobManager and lets the web frontend malfunction

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8962?focusedWorklogId=360358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360358
 ]

ASF GitHub Bot logged work on BEAM-8962:


Author: ASF GitHub Bot
Created on: 16/Dec/19 16:35
Start Date: 16/Dec/19 16:35
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10381: [BEAM-8962] Add option 
to disable the metric container accumulator
URL: https://github.com/apache/beam/pull/10381#issuecomment-566138675
 
 
   I think it is worth thinking about disabling the metric accumulator by 
default and only enabling it via a `--enable_metric_accumulator` flag. The 
reason is that it provides very little value. The accumulator is used to 
aggregate the final metric values so they can be written to the configured 
Beam MetricSink. However, this only happens on job completion, which makes the 
feature useless for streaming applications. Even for batch, you probably want 
to see metrics during job execution, which the accumulator does not provide. 
I'm inclined to remove the feature entirely.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360358)
Time Spent: 1.5h  (was: 1h 20m)

> FlinkMetricContainer causes churn in the JobManager and lets the web frontend 
> malfunction
> -
>
> Key: BEAM-8962
> URL: https://issues.apache.org/jira/browse/BEAM-8962
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The {{FlinkMetricContainer}} wraps the Beam metric container for reporting 
> metrics, but also stores them as Flink accumulators. With high parallelism 
> jobs with over a thousand tasks and many built-in Beam metrics for every Beam 
> step, this can accumulate to over 100MB of serialized data which is stored in 
> the JobManager's ExecutionGraph. This then fails to even be sent over the wire, 
> due to the akka.framesize limit (10MB by default), and manifests in {{500 
> Internal Server Error}}s in the web frontend.
> We need to introduce an option to disable the reporting via accumulators. It 
> is mostly useful for batch workloads where you can retrieve the final 
> accumulator values at the end of the job. It adds a lot of memory and network 
> overhead.
> Perhaps we could even turn off the accumulators for streaming jobs, or 
> entirely and make them opt-in.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8966) failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest

2019-12-16 Thread Chad Dombrova (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997438#comment-16997438
 ] 

Chad Dombrova commented on BEAM-8966:
-

Hmm... the errors show two different problems.

In Udi's, mypy-protobuf is newly installed and protoc-gen-mypy is found by 
protoc, but there's a failure to import google.protobuf. In Kamil's, 
mypy-protobuf is already installed, but protoc-gen-mypy is not found by protoc.

Does anyone have a way to reproduce this reliably? What about locally?

I have a hunch this is the kind of problem that would be solved by the PEP 517 
efforts I've been working on. In the meantime, the mypy tests are not required 
(they don't actually pass yet; they're simply there for reference), so we 
could remove the mypy-protobuf change and disable the tests until after the 
PEP 517 changes. 

 

> failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
> ---
>
> Key: BEAM-8966
> URL: https://issues.apache.org/jira/browse/BEAM-8966
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Chad Dombrova
>Priority: Major
>
> I believe this is due to https://github.com/apache/beam/pull/9915
> {code}
> Collecting mypy-protobuf==1.12
>   Using cached 
> https://files.pythonhosted.org/packages/b6/28/041dea47c93564bfc0ece050362894292ec4f173caa92fa82994a6d061d1/mypy_protobuf-1.12-py3-none-any.whl
> Installing collected packages: mypy-protobuf
> Successfully installed mypy-protobuf-1.12
> beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but 
> not used.
> beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not 
> used.
> Traceback (most recent call last):
>   File "/usr/local/bin/protoc-gen-mypy", line 13, in 
> import google.protobuf.descriptor_pb2 as d
> ModuleNotFoundError: No module named 'google'
> --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> Process Process-1:
> Traceback (most recent call last):
>   File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
> from grpc_tools import protoc
> ModuleNotFoundError: No module named 'grpc_tools'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in 
> _bootstrap
> self.run()
>   File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
> self._target(*self._args, **self._kwargs)
>   File "/app/sdks/python/gen_protos.py", line 189, in 
> _install_grpcio_tools_and_generate_proto_files
> generate_proto_files()
>   File "/app/sdks/python/gen_protos.py", line 144, in generate_proto_files
> '%s' % ret_code)
> RuntimeError: Protoc returned non-zero status (see logs for details): 1
> Traceback (most recent call last):
>   File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
> from grpc_tools import protoc
> ModuleNotFoundError: No module named 'grpc_tools'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "setup.py", line 295, in 
> 'mypy': generate_protos_first(mypy),
>   File "/usr/local/lib/python3.7/site-packages/setuptools/__init__.py", line 
> 145, in setup
> return distutils.core.setup(**attrs)
>   File "/usr/local/lib/python3.7/distutils/core.py", line 148, in setup
> dist.run_commands()
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 966, in run_commands
> self.run_command(cmd)
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
> cmd_obj.run()
>   File "/usr/local/lib/python3.7/site-packages/setuptools/command/sdist.py", 
> line 44, in run
> self.run_command('egg_info')
>   File "/usr/local/lib/python3.7/distutils/cmd.py", line 313, in run_command
> self.distribution.run_command(command)
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
> cmd_obj.run()
>   File "setup.py", line 220, in run
> gen_protos.generate_proto_files(log=log)
>   File "/app/sdks/python/gen_protos.py", line 121, in generate_proto_files
> raise ValueError("Proto generation failed (see log for details).")
> ValueError: Proto generation failed (see log for details).
> Service 'test' failed to build: The command '/bin/sh -c cd sdks/python && 
> python setup.py sdist && pip install --no-cache-dir $(ls 
> dist/apache-beam-*.tar.gz | tail -n1)[gcp]' returned a non-zero code: 1
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python37/1114/consoleText



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360361
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 16:39
Start Date: 16/Dec/19 16:39
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#issuecomment-566140540
 
 
   Could you squash this commit? 
https://github.com/apache/beam/pull/10268/commits/74f20d82f51b802947314943108d8af4969fe0f2
   
   Why would you keep an earlier version of this PR?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360361)
Time Spent: 14h  (was: 13h 50m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 14h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8962) FlinkMetricContainer causes churn in the JobManager and lets the web frontend malfunction

2019-12-16 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997437#comment-16997437
 ] 

Maximilian Michels commented on BEAM-8962:
--

I think it is worth thinking about disabling the metrics accumulator by default 
and only enabling it via a {{--enable_metrics_accumulator}} flag. The reason is 
that it provides very little value. The accumulator is used to aggregate the 
final metric values so they can be written to the configured Beam MetricSink. 
However, this only happens on job completion, which makes the feature useless 
for streaming applications. Even for batch, you probably want to see metrics 
during job execution, which the accumulator does not provide. I'm inclined to 
remove the feature entirely.

> FlinkMetricContainer causes churn in the JobManager and lets the web frontend 
> malfunction
> -
>
> Key: BEAM-8962
> URL: https://issues.apache.org/jira/browse/BEAM-8962
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The {{FlinkMetricContainer}} wraps the Beam metric container for reporting 
> metrics, but also stores them as Flink accumulators. With high parallelism 
> jobs with over a thousand tasks and many built-in Beam metrics for every Beam 
> step, this can accumulate to over 100MB of serialized data which is stored in 
> the JobManager's ExecutionGraph. This then fails to even be sent over the wire, 
> due to the akka.framesize limit (10MB by default), and manifests in {{500 
> Internal Server Error}}s in the web frontend.
> We need to introduce an option to disable the reporting via accumulators. It 
> is mostly useful for batch workloads where you can retrieve the final 
> accumulator values at the end of the job. It adds a lot of memory and network 
> overhead.
> Perhaps we could even turn off the accumulators for streaming jobs, or 
> entirely and make them opt-in.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360384
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 17:18
Start Date: 16/Dec/19 17:18
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#issuecomment-566156575
 
 
   I believe it may be worth keeping the PR in history. It introduces another 
(valid) way to do classpath scanning on Java >= 9 that I've seen in other 
projects. In case we have any problems with classgraph, we can revert to that 
version easily (as easy as `git revert 7e5a885`, the commit that introduces 
classgraph).

   I could still squash that - it will be a little bit tricky because another 
commit relies on the changes, but I think I can deal with the conflicts. Should 
I do this?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360384)
Time Spent: 14h 10m  (was: 14h)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-16 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997475#comment-16997475
 ] 

Kirill Kozlov commented on BEAM-8933:
-

[~iemejia], I meant to say Arrow [1] in the title; BigQuery currently supports 
Avro.

As of right now, reading from BQ in Arrow does not support all the features the 
Avro format does, but there are people working on matching the functionality.

[1] [https://github.com/apache/arrow]

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use (with 
> Avro as default).
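
For orientation, a minimal sketch of the Storage API read path this config would attach to; the project, dataset, and table names are placeholders, the exact combination of read method and parse function is simplified, and the Avro/Arrow switch itself is deliberately not shown since it is exactly what this ticket proposes to add:

{code:java}
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead.Method;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class BigQueryStorageReadSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Storage API (DIRECT_READ) path; today the rows come back encoded as Avro.
    // The proposed config would let callers choose Arrow here, with Avro as default.
    PCollection<TableRow> rows =
        pipeline.apply(
            "ReadFromBigQuery",
            BigQueryIO.readTableRows()
                .from("my-project:my_dataset.my_table")
                .withMethod(Method.DIRECT_READ));

    pipeline.run().waitUntilFinish();
  }
}
{code}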



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360388
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 17:27
Start Date: 16/Dec/19 17:27
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#issuecomment-566160168
 
 
   Sounds good. Let's keep the alternative approach in the Git history then. 
Merging, tests were passing before and the force-push did not result in any 
changed files.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360388)
Time Spent: 14h 20m  (was: 14h 10m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8590) Python typehints: native types: consider bare container types as containing Any

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8590?focusedWorklogId=360389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360389
 ]

ASF GitHub Bot logged work on BEAM-8590:


Author: ASF GitHub Bot
Created on: 16/Dec/19 17:28
Start Date: 16/Dec/19 17:28
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10042: [BEAM-8590] Support 
unsubscripted native types
URL: https://github.com/apache/beam/pull/10042#issuecomment-566160641
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360389)
Time Spent: 40m  (was: 0.5h)

> Python typehints: native types: consider bare container types as containing 
> Any
> ---
>
> Key: BEAM-8590
> URL: https://issues.apache.org/jira/browse/BEAM-8590
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is for convert_to_beam_type:
> For example, process(element: List) is the same as process(element: 
> List[Any]).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=360390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360390
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 16/Dec/19 17:28
Start Date: 16/Dec/19 17:28
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360390)
Time Spent: 14.5h  (was: 14h 20m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved BEAM-5495.
--
Fix Version/s: 2.19.0
   Resolution: Fixed

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to 
> set the staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8966) failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest

2019-12-16 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997486#comment-16997486
 ] 

Udi Meiri commented on BEAM-8966:
-

I get the same "ModuleNotFoundError" error if I run this on my local machine:
{code}
./gradlew :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
{code}

> failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
> ---
>
> Key: BEAM-8966
> URL: https://issues.apache.org/jira/browse/BEAM-8966
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Chad Dombrova
>Priority: Major
>
> I believe this is due to https://github.com/apache/beam/pull/9915
> {code}
> Collecting mypy-protobuf==1.12
>   Using cached 
> https://files.pythonhosted.org/packages/b6/28/041dea47c93564bfc0ece050362894292ec4f173caa92fa82994a6d061d1/mypy_protobuf-1.12-py3-none-any.whl
> Installing collected packages: mypy-protobuf
> Successfully installed mypy-protobuf-1.12
> beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but 
> not used.
> beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not 
> used.
> Traceback (most recent call last):
>   File "/usr/local/bin/protoc-gen-mypy", line 13, in 
> import google.protobuf.descriptor_pb2 as d
> ModuleNotFoundError: No module named 'google'
> --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> Process Process-1:
> Traceback (most recent call last):
>   File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
> from grpc_tools import protoc
> ModuleNotFoundError: No module named 'grpc_tools'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in 
> _bootstrap
> self.run()
>   File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
> self._target(*self._args, **self._kwargs)
>   File "/app/sdks/python/gen_protos.py", line 189, in 
> _install_grpcio_tools_and_generate_proto_files
> generate_proto_files()
>   File "/app/sdks/python/gen_protos.py", line 144, in generate_proto_files
> '%s' % ret_code)
> RuntimeError: Protoc returned non-zero status (see logs for details): 1
> Traceback (most recent call last):
>   File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
> from grpc_tools import protoc
> ModuleNotFoundError: No module named 'grpc_tools'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "setup.py", line 295, in <module>
> 'mypy': generate_protos_first(mypy),
>   File "/usr/local/lib/python3.7/site-packages/setuptools/__init__.py", line 
> 145, in setup
> return distutils.core.setup(**attrs)
>   File "/usr/local/lib/python3.7/distutils/core.py", line 148, in setup
> dist.run_commands()
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 966, in run_commands
> self.run_command(cmd)
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
> cmd_obj.run()
>   File "/usr/local/lib/python3.7/site-packages/setuptools/command/sdist.py", 
> line 44, in run
> self.run_command('egg_info')
>   File "/usr/local/lib/python3.7/distutils/cmd.py", line 313, in run_command
> self.distribution.run_command(command)
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
> cmd_obj.run()
>   File "setup.py", line 220, in run
> gen_protos.generate_proto_files(log=log)
>   File "/app/sdks/python/gen_protos.py", line 121, in generate_proto_files
> raise ValueError("Proto generation failed (see log for details).")
> ValueError: Proto generation failed (see log for details).
> Service 'test' failed to build: The command '/bin/sh -c cd sdks/python && 
> python setup.py sdist && pip install --no-cache-dir $(ls 
> dist/apache-beam-*.tar.gz | tail -n1)[gcp]' returned a non-zero code: 1
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python37/1114/consoleText



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8837) PCollectionVisualizationTest: possible bug

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8837?focusedWorklogId=360395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360395
 ]

ASF GitHub Bot logged work on BEAM-8837:


Author: ASF GitHub Bot
Created on: 16/Dec/19 17:40
Start Date: 16/Dec/19 17:40
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10321: [BEAM-8837] Fix 
pcoll_visualization tests
URL: https://github.com/apache/beam/pull/10321#issuecomment-566165104
 
 
   Looks good!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360395)
Time Spent: 2h 20m  (was: 2h 10m)

> PCollectionVisualizationTest: possible bug
> --
>
> Key: BEAM-8837
> URL: https://issues.apache.org/jira/browse/BEAM-8837
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This seems like a bug, even though the test passes:
> {code}
> test_display_plain_text_when_kernel_has_no_frontend 
> (apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest)
>  ... Exception in thread Thread-4405:
> Traceback (most recent call last):
>   File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
> self.run()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/.eggs/timeloop-1.0.2-py3.7.egg/timeloop/job.py",
>  line 19, in run
> self.execute(*self.args, **self.kwargs)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 132, in continuous_update_display
> updated_pv.display_facets(updating_pv=pv)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 209, in display_facets
> data = self._to_dataframe()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 278, in _to_dataframe
> for el in self._to_element_list():
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 266, in _to_element_list
> if ie.current_env().cache_manager().exists('full', self._cache_key):
> AttributeError: 'NoneType' object has no attribute 'exists'
> ok
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8837) PCollectionVisualizationTest: possible bug

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8837?focusedWorklogId=360396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360396
 ]

ASF GitHub Bot logged work on BEAM-8837:


Author: ASF GitHub Bot
Created on: 16/Dec/19 17:40
Start Date: 16/Dec/19 17:40
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10321: [BEAM-8837] Fix 
pcoll_visualization tests
URL: https://github.com/apache/beam/pull/10321
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360396)
Time Spent: 2.5h  (was: 2h 20m)

> PCollectionVisualizationTest: possible bug
> --
>
> Key: BEAM-8837
> URL: https://issues.apache.org/jira/browse/BEAM-8837
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This seems like a bug, even though the test passes:
> {code}
> test_display_plain_text_when_kernel_has_no_frontend 
> (apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest)
>  ... Exception in thread Thread-4405:
> Traceback (most recent call last):
>   File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
> self.run()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/.eggs/timeloop-1.0.2-py3.7.egg/timeloop/job.py",
>  line 19, in run
> self.execute(*self.args, **self.kwargs)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 132, in continuous_update_display
> updated_pv.display_facets(updating_pv=pv)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 209, in display_facets
> data = self._to_dataframe()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 278, in _to_dataframe
> for el in self._to_element_list():
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 266, in _to_element_list
> if ie.current_env().cache_manager().exists('full', self._cache_key):
> AttributeError: 'NoneType' object has no attribute 'exists'
> ok
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8825) OOM when writing large numbers of 'narrow' rows

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8825?focusedWorklogId=360398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360398
 ]

ASF GitHub Bot logged work on BEAM-8825:


Author: ASF GitHub Bot
Created on: 16/Dec/19 17:44
Start Date: 16/Dec/19 17:44
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10380: [BEAM-8825] Add limit 
on number of mutated rows to batching/sorting stages.
URL: https://github.com/apache/beam/pull/10380#issuecomment-566166634
 
 
   Will ignore Java_Examples_Dataflow (WindowedWordCountIT) test failure since 
this is a release branch 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360398)
Time Spent: 1h 40m  (was: 1.5h)

> OOM when writing large numbers of 'narrow' rows
> ---
>
> Key: BEAM-8825
> URL: https://issues.apache.org/jira/browse/BEAM-8825
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, 
> 2.16.0, 2.17.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> SpannerIO can OOM when writing large numbers of 'narrow' rows.
>  
> SpannerIO puts input mutation elements into batches for efficient writing.
> These batches are limited by the number of cells mutated and the size of data 
> written (5000 cells, 1 MB of data). SpannerIO groups enough mutations to build 
> 1000 of these batches (5M cells, 1 GB of data), then sorts and batches them.
> When the number of cells and the size of data per mutation are very small 
> (<5 cells, <100 bytes), the memory overhead of storing millions of mutations 
> for batching is significant and can lead to OOMs.
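
As a rough back-of-the-envelope sketch of that overhead (the cell and batch limits are taken from the description above; the per-mutation object overhead is an assumed figure, not a measured one):

{code:java}
// Illustration only: why 'narrow' rows blow up memory in the grouping stage.
public class NarrowRowEstimate {
  public static void main(String[] args) {
    long cellsPerBatch = 5_000;      // per-batch cell limit
    long batchesGrouped = 1_000;     // batches grouped before sorting/batching
    long cellsPerMutation = 2;       // a 'narrow' row mutates very few cells
    long mutationsBuffered = cellsPerBatch * batchesGrouped / cellsPerMutation;

    long assumedOverheadBytes = 300; // assumed per-mutation JVM object overhead
    long overheadMb = mutationsBuffered * assumedOverheadBytes / (1024 * 1024);
    System.out.printf("~%,d mutations buffered, ~%,d MB of object overhead%n",
        mutationsBuffered, overheadMb);
  }
}
{code}

With roughly 2.5 million buffered mutations, the object overhead alone is on the order of hundreds of MB, which is why the linked PR adds a limit on the number of mutated rows to the batching/sorting stages in addition to the existing cell and byte limits.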



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8825) OOM when writing large numbers of 'narrow' rows

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8825?focusedWorklogId=360399&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360399
 ]

ASF GitHub Bot logged work on BEAM-8825:


Author: ASF GitHub Bot
Created on: 16/Dec/19 17:44
Start Date: 16/Dec/19 17:44
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10380: [BEAM-8825] Add 
limit on number of mutated rows to batching/sorting stages.
URL: https://github.com/apache/beam/pull/10380
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360399)
Time Spent: 1h 50m  (was: 1h 40m)

> OOM when writing large numbers of 'narrow' rows
> ---
>
> Key: BEAM-8825
> URL: https://issues.apache.org/jira/browse/BEAM-8825
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, 
> 2.16.0, 2.17.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> SpannerIO can OOM when writing large numbers of 'narrow' rows.
>  
> SpannerIO puts input mutation elements into batches for efficient writing.
> These batches are limited by the number of cells mutated and the size of data 
> written (5000 cells, 1 MB of data). SpannerIO groups enough mutations to build 
> 1000 of these batches (5M cells, 1 GB of data), then sorts and batches them.
> When the number of cells and the size of data per mutation are very small 
> (<5 cells, <100 bytes), the memory overhead of storing millions of mutations 
> for batching is significant and can lead to OOMs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-16 Thread Romain Manni-Bucau (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997492#comment-16997492
 ] 

Romain Manni-Bucau commented on BEAM-5495:
--

Any reason to use a one-man GitHub project (io.github.classgraph:classgraph) 
instead of the Apache XBean proposal?

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource-detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set the 
> staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?focusedWorklogId=360400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360400
 ]

ASF GitHub Bot logged work on BEAM-8967:


Author: ASF GitHub Bot
Created on: 16/Dec/19 17:49
Start Date: 16/Dec/19 17:49
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10385: 
[release-2.18.0][BEAM-8967] Maven artifact beam-sdks-java-core does not have 
JSR305 specified as "compile"
URL: https://github.com/apache/beam/pull/10385
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360400)
Time Spent: 1.5h  (was: 1h 20m)

> Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"
> --
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: 2.17.0, 2.18.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a followup of [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].
> Other 4 dependencies are not found in the snapshot pom either:
> {code:groovy}
>   compile library.java.antlr_runtime
>   compile library.java.protobuf_java
>   compile library.java.commons_compress
>   compile library.java.commons_lang3
> {code}
> h1. Compile-declared dependencies needed at runtime?
> h2. protobuf-java
> They are shaded. For example, Beam's TextBasedReader uses 
> {{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
> class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
> org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
> {noformat}
> h2. commons-compress
> They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
> uses 
> {{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
> The shaded class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
> {noformat}
> h2. commons-lang3
> They are shaded. For example, Beam's 
> {{org.apache.beam.sdk.io.LocalFileSystem}} uses 
> {{org.apache.commons.lang3.SystemUtils}}. The shaded class is in the 
> published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep SystemUtils
> org/apache/beam/repackaged/core/org/apache/commons/lang3/SystemUtils.class
> {noformat}
> h2. antlr-runtime
> Same.
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep org.antlr.v4 |head
> org/apache/beam/repackaged/core/org/antlr/v4/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/ANTLRErrorListener.class
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8824) Add support for allowed lateness in python sdk

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8824?focusedWorklogId=360405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360405
 ]

ASF GitHub Bot logged work on BEAM-8824:


Author: ASF GitHub Bot
Created on: 16/Dec/19 18:01
Start Date: 16/Dec/19 18:01
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10216: [BEAM-8824] Add 
support to allow specify window allowed_lateness in python sdk
URL: https://github.com/apache/beam/pull/10216#issuecomment-565542800
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360405)
Time Spent: 7h 10m  (was: 7h)

> Add support for allowed lateness in python sdk
> --
>
> Key: BEAM-8824
> URL: https://issues.apache.org/jira/browse/BEAM-8824
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8968) portableWordCount test for Spark/Flink failing: jar not found

2019-12-16 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8968:
--
Labels: currently-failing portability-flink portability-spark test-failure  
(was: portability-flink portability-spark test-failure)

> portableWordCount test for Spark/Flink failing: jar not found
> -
>
> Key: BEAM-8968
> URL: https://issues.apache.org/jira/browse/BEAM-8968
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: currently-failing, portability-flink, portability-spark, 
> test-failure
>
> This affects portableWordCountSparkRunnerBatch, 
> portableWordCountFlinkRunnerBatch, and portableWordCountFlinkRunnerStreaming.
> 22:43:23 RuntimeError: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/2022703441/lib/runners/flink/1.9/job-server/build/libs/beam-runners-flink-1.9-job-server-2.19.0-SNAPSHOT.jar
>  not found. Please build the server with 
> 22:43:23   cd 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/2022703441/lib;
>  ./gradlew runners:flink:1.9:job-server:shadowJar



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8824) Add support for allowed lateness in python sdk

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8824?focusedWorklogId=360404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360404
 ]

ASF GitHub Bot logged work on BEAM-8824:


Author: ASF GitHub Bot
Created on: 16/Dec/19 18:01
Start Date: 16/Dec/19 18:01
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10216: [BEAM-8824] Add 
support to allow specify window allowed_lateness in python sdk
URL: https://github.com/apache/beam/pull/10216#issuecomment-566172906
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360404)
Time Spent: 7h  (was: 6h 50m)

> Add support for allowed lateness in python sdk
> --
>
> Key: BEAM-8824
> URL: https://issues.apache.org/jira/browse/BEAM-8824
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8966) failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest

2019-12-16 Thread Chad Dombrova (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997506#comment-16997506
 ] 

Chad Dombrova commented on BEAM-8966:
-

OK, I'm working on reproducing this, but running into errors. This test copies 
the entirety of the sdks/python directory, including all of my intermediate 
build, target, and tox env directories, and that's tripping something up. It 
seems like another example of a highly inefficient approach to creating build 
isolation. Is there a reason the ITs don't go through tox?

> failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
> ---
>
> Key: BEAM-8966
> URL: https://issues.apache.org/jira/browse/BEAM-8966
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Chad Dombrova
>Priority: Major
>
> I believe this is due to https://github.com/apache/beam/pull/9915
> {code}
> Collecting mypy-protobuf==1.12
>   Using cached 
> https://files.pythonhosted.org/packages/b6/28/041dea47c93564bfc0ece050362894292ec4f173caa92fa82994a6d061d1/mypy_protobuf-1.12-py3-none-any.whl
> Installing collected packages: mypy-protobuf
> Successfully installed mypy-protobuf-1.12
> beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but 
> not used.
> beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not 
> used.
> Traceback (most recent call last):
>   File "/usr/local/bin/protoc-gen-mypy", line 13, in <module>
> import google.protobuf.descriptor_pb2 as d
> ModuleNotFoundError: No module named 'google'
> --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> Process Process-1:
> Traceback (most recent call last):
>   File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
> from grpc_tools import protoc
> ModuleNotFoundError: No module named 'grpc_tools'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in 
> _bootstrap
> self.run()
>   File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
> self._target(*self._args, **self._kwargs)
>   File "/app/sdks/python/gen_protos.py", line 189, in 
> _install_grpcio_tools_and_generate_proto_files
> generate_proto_files()
>   File "/app/sdks/python/gen_protos.py", line 144, in generate_proto_files
> '%s' % ret_code)
> RuntimeError: Protoc returned non-zero status (see logs for details): 1
> Traceback (most recent call last):
>   File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
> from grpc_tools import protoc
> ModuleNotFoundError: No module named 'grpc_tools'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "setup.py", line 295, in <module>
> 'mypy': generate_protos_first(mypy),
>   File "/usr/local/lib/python3.7/site-packages/setuptools/__init__.py", line 
> 145, in setup
> return distutils.core.setup(**attrs)
>   File "/usr/local/lib/python3.7/distutils/core.py", line 148, in setup
> dist.run_commands()
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 966, in run_commands
> self.run_command(cmd)
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
> cmd_obj.run()
>   File "/usr/local/lib/python3.7/site-packages/setuptools/command/sdist.py", 
> line 44, in run
> self.run_command('egg_info')
>   File "/usr/local/lib/python3.7/distutils/cmd.py", line 313, in run_command
> self.distribution.run_command(command)
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
> cmd_obj.run()
>   File "setup.py", line 220, in run
> gen_protos.generate_proto_files(log=log)
>   File "/app/sdks/python/gen_protos.py", line 121, in generate_proto_files
> raise ValueError("Proto generation failed (see log for details).")
> ValueError: Proto generation failed (see log for details).
> Service 'test' failed to build: The command '/bin/sh -c cd sdks/python && 
> python setup.py sdist && pip install --no-cache-dir $(ls 
> dist/apache-beam-*.tar.gz | tail -n1)[gcp]' returned a non-zero code: 1
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python37/1114/consoleText



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8973) Python PreCommit occasionally times out

2019-12-16 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-8973:
-

 Summary: Python PreCommit occasionally times out 
 Key: BEAM-8973
 URL: https://issues.apache.org/jira/browse/BEAM-8973
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core, test-failures
Reporter: Valentyn Tymofieiev


Sample timeouts in Cron jobs (~1 out of 10 jobs):

[https://builds.apache.org/job/beam_PreCommit_Python_Cron/2157/]

[https://builds.apache.org/job/beam_PreCommit_Python_Cron/2146/]

In jobs triggered on PRs, the error happens even more frequently; for example: 
[https://builds.apache.org/job/beam_PreCommit_Python_Commit/10373/]


--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8864) BigQueryQueryToTableIT.test_big_query_legacy_sql - fails in post commit tests

2019-12-16 Thread Chamikara Madhusanka Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997516#comment-16997516
 ] 

Chamikara Madhusanka Jayalath commented on BEAM-8864:
-

Seems like the verify query returned zero results:

[https://builds.apache.org/job/beam_PostCommit_Python35/1123/testReport/junit/apache_beam.io.gcp.big_query_query_to_table_it_test/BigQueryQueryToTableIT/test_big_query_legacy_sql/]

apache_beam.io.gcp.tests.bigquery_matcher: INFO: Read from given query (SELECT 
fruit from `python_query_to_table_1575309894232.output_table`;), total rows 0

Seems like the output table is just named "output_table": 
[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py#L87]

So this test will probably be flaky in the presence of multiple test suites, 
since one test suite could clean up the output table while another is still 
running the test. [~boyuanz], have we considered randomizing the output table name?

> BigQueryQueryToTableIT.test_big_query_legacy_sql - fails in post commit tests
> -
>
> Key: BEAM-8864
> URL: https://issues.apache.org/jira/browse/BEAM-8864
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp, test-failures
>Reporter: Ahmet Altay
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
> Fix For: Not applicable
>
>
> Logs: 
> [https://builds.apache.org/job/beam_PostCommit_Python35/1123/testReport/junit/apache_beam.io.gcp.big_query_query_to_table_it_test/BigQueryQueryToTableIT/test_big_query_legacy_sql/]
> Error Message
> Expected: (Test pipeline expected terminated in state: DONE and Expected 
> checksum is 158a8ea1c254fcf40d4ed3e7c0242c3ea0a29e72)
>  but: Expected checksum is 158a8ea1c254fcf40d4ed3e7c0242c3ea0a29e72 Actual 
> checksum is da39a3ee5e6b4b0d3255bfef95601890afd80709



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=360407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360407
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 16/Dec/19 18:15
Start Date: 16/Dec/19 18:15
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a 
race condition in proto stubs generation.
URL: https://github.com/apache/beam/pull/10378#issuecomment-566178193
 
 
   Thanks, @ibzib and @udim. Precommits have a timeout of 3 hrs. Looking at 
https://builds.apache.org/job/beam_PreCommit_Python_Cron/, about 1 in 10 precommit 
jobs times out, but most finish in under an hour. Opened 
https://issues.apache.org/jira/browse/BEAM-8973.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360407)
Time Spent: 2h 50m  (was: 2h 40m)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Critical
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite 
> seemingly frequently timing out. Other suites are not affected by these 
> timeouts. From the history, the issues started before Oct 10 and we cannot 
> pinpoint because history is lost.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=360408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360408
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 16/Dec/19 18:15
Start Date: 16/Dec/19 18:15
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a 
race condition in proto stubs generation.
URL: https://github.com/apache/beam/pull/10378#issuecomment-566178327
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360408)
Time Spent: 3h  (was: 2h 50m)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Critical
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite 
> seemingly frequently timing out. Other suites are not affected by these 
> timeouts. From the history, the issues started before Oct 10 and we cannot 
> pinpoint because history is lost.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8966) failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest

2019-12-16 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997522#comment-16997522
 ] 

Udi Meiri commented on BEAM-8966:
-

There's no particular reason the ITs don't go through tox, and I believe they should.


> failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
> ---
>
> Key: BEAM-8966
> URL: https://issues.apache.org/jira/browse/BEAM-8966
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Chad Dombrova
>Priority: Major
>
> I believe this is due to https://github.com/apache/beam/pull/9915
> {code}
> Collecting mypy-protobuf==1.12
>   Using cached 
> https://files.pythonhosted.org/packages/b6/28/041dea47c93564bfc0ece050362894292ec4f173caa92fa82994a6d061d1/mypy_protobuf-1.12-py3-none-any.whl
> Installing collected packages: mypy-protobuf
> Successfully installed mypy-protobuf-1.12
> beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but 
> not used.
> beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not 
> used.
> Traceback (most recent call last):
>   File "/usr/local/bin/protoc-gen-mypy", line 13, in <module>
> import google.protobuf.descriptor_pb2 as d
> ModuleNotFoundError: No module named 'google'
> --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> Process Process-1:
> Traceback (most recent call last):
>   File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
> from grpc_tools import protoc
> ModuleNotFoundError: No module named 'grpc_tools'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in 
> _bootstrap
> self.run()
>   File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
> self._target(*self._args, **self._kwargs)
>   File "/app/sdks/python/gen_protos.py", line 189, in 
> _install_grpcio_tools_and_generate_proto_files
> generate_proto_files()
>   File "/app/sdks/python/gen_protos.py", line 144, in generate_proto_files
> '%s' % ret_code)
> RuntimeError: Protoc returned non-zero status (see logs for details): 1
> Traceback (most recent call last):
>   File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
> from grpc_tools import protoc
> ModuleNotFoundError: No module named 'grpc_tools'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "setup.py", line 295, in <module>
> 'mypy': generate_protos_first(mypy),
>   File "/usr/local/lib/python3.7/site-packages/setuptools/__init__.py", line 
> 145, in setup
> return distutils.core.setup(**attrs)
>   File "/usr/local/lib/python3.7/distutils/core.py", line 148, in setup
> dist.run_commands()
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 966, in run_commands
> self.run_command(cmd)
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
> cmd_obj.run()
>   File "/usr/local/lib/python3.7/site-packages/setuptools/command/sdist.py", 
> line 44, in run
> self.run_command('egg_info')
>   File "/usr/local/lib/python3.7/distutils/cmd.py", line 313, in run_command
> self.distribution.run_command(command)
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
> cmd_obj.run()
>   File "setup.py", line 220, in run
> gen_protos.generate_proto_files(log=log)
>   File "/app/sdks/python/gen_protos.py", line 121, in generate_proto_files
> raise ValueError("Proto generation failed (see log for details).")
> ValueError: Proto generation failed (see log for details).
> Service 'test' failed to build: The command '/bin/sh -c cd sdks/python && 
> python setup.py sdist && pip install --no-cache-dir $(ls 
> dist/apache-beam-*.tar.gz | tail -n1)[gcp]' returned a non-zero code: 1
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python37/1114/consoleText



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=360412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360412
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 16/Dec/19 18:26
Start Date: 16/Dec/19 18:26
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9855: [BEAM-8446] Retrying BQ 
query on timeouts
URL: https://github.com/apache/beam/pull/9855#issuecomment-566182422
 
 
   > Hm I added it because the Pydoc specifies it is a possibility. I did not 
reproduce it because the job is flaky...
   > 
https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob.result
   
   If it's listed then we should definitely handle it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360412)
Time Spent: 6h 50m  (was: 6h 40m)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360411
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 18:26
Start Date: 16/Dec/19 18:26
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358392005
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java
 ##
 @@ -0,0 +1,424 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static java.util.Collections.emptyList;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * The {@link Document} class holds the elements of a Thrift file.
+ *
+ * A {@link Document} is made up of:
+ *
+ * <ul>
+ *   <li>{@link Header} - Contains: includes, cppIncludes, namespaces, and 
defaultNamespace.
+ *   <li>{@link Document#definitions} - Contains list of Thrift {@link 
Definition}.
+ * </ul>
+ */
+public class Document implements Serializable {
+  private Header header;
+  private List<Definition> definitions;
+
+  public Document(Header header, List<Definition> definitions) {
+this.header = checkNotNull(header, "header");
+this.definitions = ImmutableList.copyOf(checkNotNull(definitions, 
"definitions"));
+  }
+
+  /** Returns an empty {@link Document}. */
+  public static Document emptyDocument() {
+List<String> includes = emptyList();
+List<String> cppIncludes = emptyList();
+String defaultNamespace = null;
+Map<String, String> namespaces = Collections.emptyMap();
+Header header = new Header(includes, cppIncludes, defaultNamespace, 
namespaces);
+List<Definition> definitions = emptyList();
+return new Document(header, definitions);
+  }
+
+  public Document getDocument() {
+return this;
+  }
+
+  public Header getHeader() {
+return this.header;
+  }
+
+  public void setHeader(Header header) {
+this.header = header;
+  }
+
+  public List<Definition> getDefinitions() {
+return definitions;
+  }
+
+  public void setDefinitions(List<Definition> definitions) {
+this.definitions = definitions;
+  }
+
+  public void visit(final DocumentVisitor visitor) throws IOException {
+Preconditions.checkNotNull(visitor, "the visitor must not be null!");
+
+for (Definition definition : definitions) {
+  if (visitor.accept(definition)) {
+definition.visit(visitor);
+  }
+}
+  }
+
+  /** Gets Avro {@link Schema} for the object. */
+  public Schema getSchema() {
+return ReflectData.get().getSchema(Document.class);
+  }
+
+  /** Gets {@link Document} as a {@link GenericRecord}. */
+  public GenericRecord getAsGenericRecord() {
+GenericRecordBuilder genericRecordBuilder = new 
GenericRecordBuilder(this.getSchema());
+genericRecordBuilder.set("header", this.getHeader()).set("definitions", 
this.getDefinitions());
+
+return genericRecordBuilder.build();
+  }
+
+  /** Adds list of includes to {@link Document#header}. */
+  public void addIncludes(List<String> includes) {
+checkNotNull(includes, "includes");
+List<String> currentIncludes = new 
ArrayList<>(this.getHeader().getIncludes());
+currentIncludes.addAll(includes);
+this.header.setIncludes(curr
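
As a hedged usage sketch based only on the API visible in the quoted diff (the diff is from a PR still under review and is truncated above; the surrounding class name here is made up):

{code:java}
import java.util.Arrays;
import org.apache.beam.sdk.io.thrift.parser.model.Document;

public class DocumentUsageSketch {
  public static void main(String[] args) {
    // Start from an empty Document: empty Header, no Definitions.
    Document document = Document.emptyDocument();

    // addIncludes() appends to the Header's existing include list.
    document.addIncludes(Arrays.asList("shared.thrift", "common.thrift"));

    System.out.println(document.getHeader().getIncludes());  // [shared.thrift, common.thrift]
    System.out.println(document.getDefinitions().isEmpty()); // true
  }
}
{code}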

[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=360414&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360414
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 16/Dec/19 18:27
Start Date: 16/Dec/19 18:27
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9855: [BEAM-8446] 
Retrying BQ query on timeouts
URL: https://github.com/apache/beam/pull/9855
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360414)
Time Spent: 7h  (was: 6h 50m)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

