[jira] [Updated] (BEAM-8735) tox: yes or no

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8735:
--
Status: Open  (was: Triage Needed)

> tox: yes or no
> --
>
> Key: BEAM-8735
> URL: https://issues.apache.org/jira/browse/BEAM-8735
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Elliotte Rusty Harold
>Priority: Major
>
> https://beam.apache.org/contribute/ says "Python, virtualenv, and tox 
> installed for Python SDK development"
> However, https://github.com/apache/beam says
> If you'd like to build and install the whole project from the source 
> distribution, you may need some additional tools installed in your system. In 
> a Debian-based distribution:
> sudo apt-get install \
> openjdk-8-jdk \
> python-setuptools \
> python-pip \
> virtualenv
> Notice that the second makes no mention of tox. Please sync up these instructions.
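Whichever list wins, a contributor can sanity-check their environment against it. A minimal sketch (`missing_tools` is a hypothetical helper; the tool names follow the website's list, not any Beam script):

```python
import shutil


def missing_tools(required):
    """Return the subset of `required` commands not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


# Prerequisites as listed on https://beam.apache.org/contribute/:
print(missing_tools(["python3", "virtualenv", "tox"]))
```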



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8735) Beam site says to install tox, but GitHub README does not

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108775#comment-17108775
 ] 

Kenneth Knowles commented on BEAM-8735:
---

Ideally we'd just have one set of instructions.

> Beam site says to install tox, but GitHub README does not
> -
>
> Key: BEAM-8735
> URL: https://issues.apache.org/jira/browse/BEAM-8735
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Elliotte Rusty Harold
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8735) Beam site says to install tox, but GitHub README does not

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8735:
--
Summary: Beam site says to install tox, but GitHub README does not  (was: 
tox: yes or no)

> Beam site says to install tox, but GitHub README does not
> -
>
> Key: BEAM-8735
> URL: https://issues.apache.org/jira/browse/BEAM-8735
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Elliotte Rusty Harold
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9894) Add batch SnowflakeIO.Write to Java SDK

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9894:
--
Status: Open  (was: Triage Needed)

> Add batch SnowflakeIO.Write to Java SDK
> ---
>
> Key: BEAM-9894
> URL: https://issues.apache.org/jira/browse/BEAM-9894
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Dariusz Aniszewski
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9898) Add cross-language support to SnowflakeIO.Write

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9898:
--
Status: Open  (was: Triage Needed)

> Add cross-language support to SnowflakeIO.Write
> ---
>
> Key: BEAM-9898
> URL: https://issues.apache.org/jira/browse/BEAM-9898
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Dariusz Aniszewski
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9896) Add streaming for SnowflakeIO.Write to Java SDK

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9896:
--
Status: Open  (was: Triage Needed)

> Add streaming for SnowflakeIO.Write to Java SDK
> ---
>
> Key: BEAM-9896
> URL: https://issues.apache.org/jira/browse/BEAM-9896
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Dariusz Aniszewski
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9897) Add cross-language support to SnowflakeIO.Read

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9897:
--
Status: Open  (was: Triage Needed)

> Add cross-language support to SnowflakeIO.Read
> --
>
> Key: BEAM-9897
> URL: https://issues.apache.org/jira/browse/BEAM-9897
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Dariusz Aniszewski
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10024) Spark runner failing testOutputTimestampDefault

2020-05-18 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10024:
---
Status: Open  (was: Triage Needed)

> Spark runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10024
> URL: https://issues.apache.org/jira/browse/BEAM-10024
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: currently-failing
>
> This is causing the postcommit to fail:
> java.lang.UnsupportedOperationException: Found TimerId annotations on 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet 
> be used with timers in the SparkRunner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10025) Samza runner failing testOutputTimestampDefault

2020-05-18 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10025:
---
Status: Open  (was: Triage Needed)

> Samza runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10025
> URL: https://issues.apache.org/jira/browse/BEAM-10025
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: currently-failing
>
> This is causing the postcommit to fail:
> java.lang.UnsupportedOperationException: Found TimerId annotations on 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet 
> be used with timers in the SparkRunner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10027) Support for Kotlin-based Beam Katas

2020-05-18 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10027:
---
Status: Open  (was: Triage Needed)

> Support for Kotlin-based Beam Katas
> ---
>
> Key: BEAM-10027
> URL: https://issues.apache.org/jira/browse/BEAM-10027
> Project: Beam
>  Issue Type: Improvement
>  Components: katas
>Reporter: Rion
>Assignee: Rion
>Priority: P2
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Currently, there are several examples available demonstrating the use of 
> Apache Beam with Kotlin. It would be nice for the Beam Katas that exist for 
> Python, Go, and Java to also support Kotlin. 
> The port itself shouldn't be that involved since it can still target the JVM, 
> so it would likely just require the inclusion of Kotlin dependencies and a 
> conversion of all of the existing Java examples. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9961) Python MongoDBIO does not apply projection

2020-05-18 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-9961:
-

Assignee: Corvin Deboeser

> Python MongoDBIO does not apply projection
> --
>
> Key: BEAM-9961
> URL: https://issues.apache.org/jira/browse/BEAM-9961
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-mongodb
>Affects Versions: 2.20.0
>Reporter: Corvin Deboeser
>Assignee: Corvin Deboeser
>Priority: P2
>
> ReadFromMongoDB does not apply the provided projection when reading from the 
> client; only the filter is applied, as you can see here:
> https://github.com/apache/beam/blob/9f0cb649d39ee6236ea27f111acb4b66591a80ec/sdks/python/apache_beam/io/mongodbio.py#L204
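Until a fix lands, a pipeline can drop unwanted fields itself in a Map step after the read. A client-side sketch (`apply_projection` is a hypothetical helper, not part of the IO's API; `projection` is assumed to be a list of field names as ReadFromMongoDB accepts):

```python
def apply_projection(doc, projection):
    """Keep only the fields named in `projection`.

    MongoDB returns _id unless it is explicitly excluded, so this
    sketch keeps it too, mirroring server-side projection defaults.
    """
    keep = set(projection) | {"_id"}
    return {k: v for k, v in doc.items() if k in keep}


doc = {"_id": 1, "name": "a", "payload": "large blob"}
print(apply_projection(doc, ["name"]))  # {'_id': 1, 'name': 'a'}
```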



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9961) Python MongoDBIO does not apply projection

2020-05-18 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108875#comment-17108875
 ] 

Kenneth Knowles edited comment on BEAM-9961 at 5/18/20, 9:05 PM:
-

Hi [~kenn] sure thing! Could also take care of BEAM-9960, BEAM-10002 and 
BEAM-10004 in one go. All quite small fixes.


was (Author: corvin):
Hi [~kenn] sure thing! Could also take care of #9960, #10002 and #10004 in one 
go. All quite small fixes.

> Python MongoDBIO does not apply projection
> --
>
> Key: BEAM-9961
> URL: https://issues.apache.org/jira/browse/BEAM-9961
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-mongodb
>Affects Versions: 2.20.0
>Reporter: Corvin Deboeser
>Assignee: Corvin Deboeser
>Priority: P2
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9960) Python MongoDBIO fails when response of split vector command is larger than 16mb

2020-05-18 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-9960:
-

Assignee: Corvin Deboeser

> Python MongoDBIO fails when response of split vector command is larger than 
> 16mb
> 
>
> Key: BEAM-9960
> URL: https://issues.apache.org/jira/browse/BEAM-9960
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-mongodb
>Affects Versions: 2.20.0
>Reporter: Corvin Deboeser
>Assignee: Corvin Deboeser
>Priority: P2
>
> When using MongoDBIO on a large collection with, on average, large documents, 
> the split vector command results in a lot of splits if the desired bundle 
> size is small. In extreme cases, the response from the split vector command 
> can be larger than 16 MB, which is not supported by pymongo / MongoDB:
> {{pymongo.errors.ProtocolError: Message length (33699186) is larger than 
> server max message size (33554432)}}
>  
> Environment: Was running this on Google Dataflow / Beam Python SDK 2.20.
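One way to reason about a mitigation (a sketch under stated assumptions, not the actual fix): grow the effective bundle size until the estimated splitVector response, roughly one key per bundle, fits under the server's 16 MB message cap. The per-key byte overhead here is a guess, and `min_bundle_size` is a hypothetical helper:

```python
MAX_MESSAGE_BYTES = 16 * 1024 * 1024  # MongoDB's max server message size


def min_bundle_size(collection_bytes, desired_bundle_bytes, bytes_per_split_key=100):
    """Double the bundle size until the estimated splitVector response
    (number of keys times an assumed per-key overhead) fits under the cap."""
    bundle = desired_bundle_bytes
    while (collection_bytes // bundle) * bytes_per_split_key > MAX_MESSAGE_BYTES:
        bundle *= 2
    return bundle
```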



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-10004) ZeroDivisionError if source bundle smaller than 1mb

2020-05-18 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-10004:
--

Assignee: Corvin Deboeser

> ZeroDivisionError if source bundle smaller than 1mb
> ---
>
> Key: BEAM-10004
> URL: https://issues.apache.org/jira/browse/BEAM-10004
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-mongodb
>Affects Versions: 2.20.0
>Reporter: Corvin Deboeser
>Assignee: Corvin Deboeser
>Priority: P2
>
> If the desired_bundle_size is lower than 1 MB, split returns only 
> SourceBundles with weight=0, which leads to a ZeroDivisionError down the line: 
> {noformat}
> ZeroDivisionError: float division by zero{noformat}
> This error is raised from _compute_cumulative_weights here:
> [https://github.com/apache/beam/blob/9f0cb649d39ee6236ea27f111acb4b66591a80ec/sdks/python/apache_beam/io/concat_source.py#L154]
>  
> What worked for me: pulling the truncation out of _get_split_keys 
> ([here|https://github.com/apache/beam/blob/9f0cb649d39ee6236ea27f111acb4b66591a80ec/sdks/python/apache_beam/io/mongodbio.py#L226])
>  and into split instead.
>  
>  
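The failure mode is easy to reproduce in isolation. A sketch of cumulative-weight normalization (modeled on, but not copied from, concat_source.py) with the kind of guard the reporter's workaround implies:

```python
def cumulative_weights(weights):
    """Cumulative fractions of the total bundle weight.

    With all-zero weights the division below would raise
    ZeroDivisionError, so fall back to uniform weights instead.
    """
    total = sum(weights)
    if total == 0:
        weights = [1] * len(weights)  # guard: treat bundles as equal
        total = len(weights)
    out, acc = [0.0], 0
    for w in weights:
        acc += w
        out.append(acc / total)
    return out
```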



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-10002) Mongo cursor timeout leads to CursorNotFound error

2020-05-18 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-10002:
--

Assignee: Corvin Deboeser

> Mongo cursor timeout leads to CursorNotFound error
> --
>
> Key: BEAM-10002
> URL: https://issues.apache.org/jira/browse/BEAM-10002
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-mongodb
>Affects Versions: 2.20.0
>Reporter: Corvin Deboeser
>Assignee: Corvin Deboeser
>Priority: P2
>
> If some work items take a lot of processing time and the cursor of a bundle 
> is not queried for too long, MongoDB will time out the cursor, which 
> results in:
> {code:java}
> pymongo.errors.CursorNotFound: cursor id ... not found
> {code}
>  
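A common client-side mitigation (a sketch, not the SDK's code): resume from the last `_id` seen whenever the server expires the cursor. `CursorNotFound` here stands in for `pymongo.errors.CursorNotFound`, and `make_cursor` is a hypothetical factory that opens a cursor positioned after a given `_id`:

```python
class CursorNotFound(Exception):
    """Stand-in for pymongo.errors.CursorNotFound."""


def read_with_restart(make_cursor, last_id=None):
    """Yield documents, re-opening the cursor past the last _id seen
    whenever the server has expired it mid-read."""
    while True:
        try:
            for doc in make_cursor(last_id):
                yield doc
                last_id = doc["_id"]
            return
        except CursorNotFound:
            continue  # re-open the cursor and resume after last_id
```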



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9961) Python MongoDBIO does not apply projection

2020-05-18 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110618#comment-17110618
 ] 

Kenneth Knowles commented on BEAM-9961:
---

Great!

> Python MongoDBIO does not apply projection
> --
>
> Key: BEAM-9961
> URL: https://issues.apache.org/jira/browse/BEAM-9961
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-mongodb
>Affects Versions: 2.20.0
>Reporter: Corvin Deboeser
>Assignee: Corvin Deboeser
>Priority: P2
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.

2020-05-18 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110619#comment-17110619
 ] 

Kenneth Knowles commented on BEAM-9745:
---

Yeah, it was disabled because it has been perma-red for a very long time.

> [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to 
> deserialize Custom DoFns and Custom Coders.
> -
>
> Key: BEAM-9745
> URL: https://issues.apache.org/jira/browse/BEAM-9745
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, java-fn-execution, sdk-java-harness, 
> test-failures
>Reporter: Daniel Oliveira
>Assignee: Kenneth Knowles
>Priority: P0
>  Labels: currently-failing
> Fix For: 2.22.0
>
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4657/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/c3izncsa4u24k/tests/by-project]
> Initial investigation:
> The bug appears to be popping up on BigQuery tests mostly, but also a 
> BigTable and a Datastore test.
> Here's an example stacktrace of the two errors, showing _only_ the error 
> messages themselves. Source: 
> [https://scans.gradle.com/s/c3izncsa4u24k/tests/efn4wciuamvqq-ccxt3jvofvqbe]
> {noformat}
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error 
> received from SDK harness for instruction -191: 
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With 
> Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> ...
> Caused by: java.lang.RuntimeException: Error received from SDK harness for 
> instruction -191: java.lang.IllegalArgumentException: unable to deserialize 
> Custom DoFn With Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error 
> received from SDK harness for instruction -206: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: java.lang.IllegalArgumentException: unable to deserialize Custom 
> Coder Bytes
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder
> ...
> Caused by: java.lang.RuntimeException: Error received from SDK harness for 
> instruction -206: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.lang.IllegalArgumentException: unable to deserialize Custom Coder Bytes
> ...
> Caused by: java.lang.IllegalArgumentException: unable to deserialize Custom 
> Coder Bytes
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder
> ...
> {noformat}
> Update: Looks like this has been failing as far back as [Apr 
> 4|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4566/] 
> after a long period where the test was consistently timing out since [Mar 
> 31|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4546/]. 
> So it's hard to narrow down what commit may have caused this. Plus, the test 
> was failing due to a completely different BigQuery failure before anyway, so 
> it seems like this test will need to be completely fixed from scratch, 
> instead of tracking down a specific breaking change.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.

2020-05-18 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9745:
--
Priority: P1  (was: P0)

> [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to 
> deserialize Custom DoFns and Custom Coders.
> -
>
> Key: BEAM-9745
> URL: https://issues.apache.org/jira/browse/BEAM-9745
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, java-fn-execution, sdk-java-harness, 
> test-failures
>Reporter: Daniel Oliveira
>Assignee: Kenneth Knowles
>Priority: P1
>  Labels: currently-failing
> Fix For: 2.22.0
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.

2020-05-18 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110621#comment-17110621
 ] 

Kenneth Knowles commented on BEAM-9745:
---

BEAM-9868 indicates that the definitive test for this functionality would be 
the Python local FnAPI runner executing a Java pipeline.

> [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to 
> deserialize Custom DoFns and Custom Coders.
> -
>
> Key: BEAM-9745
> URL: https://issues.apache.org/jira/browse/BEAM-9745
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, java-fn-execution, sdk-java-harness, 
> test-failures
>Reporter: Daniel Oliveira
>Assignee: Kenneth Knowles
>Priority: P1
>  Labels: currently-failing
> Fix For: 2.22.0
>
>


--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9239) Dependency conflict with Spark using aws io

2020-05-18 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110625#comment-17110625
 ] 

Kenneth Knowles commented on BEAM-9239:
---

That sounds familiar to me as well.

> Dependency conflict with Spark using aws io
> ---
>
> Key: BEAM-9239
> URL: https://issues.apache.org/jira/browse/BEAM-9239
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws, runner-spark
>Affects Versions: 2.17.0
>Reporter: David McIntosh
>Priority: P1
>
> Starting with Beam 2.17.0, I get this error in the Spark 2.4.4 driver when AWS 
> IO is also used:
> {noformat}
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.databind.jsontype.TypeSerializer.typeId(Ljava/lang/Object;Lcom/fasterxml/jackson/core/JsonToken;)Lcom/fasterxml/jackson/core/type/WritableTypeId;
>   at 
> org.apache.beam.sdk.io.aws.options.AwsModule$AWSCredentialsProviderSerializer.serializeWithType(AwsModule.java:163)
>   at 
> org.apache.beam.sdk.io.aws.options.AwsModule$AWSCredentialsProviderSerializer.serializeWithType(AwsModule.java:134)
>   at 
> com.fasterxml.jackson.databind.ser.impl.TypeWrappedSerializer.serialize(TypeWrappedSerializer.java:32)
>   at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.ensureSerializable(ProxyInvocationHandler.java:721)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:647)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:635)
>   at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
>   at 
> org.apache.beam.runners.core.construction.SerializablePipelineOptions.serializeToJson(SerializablePipelineOptions.java:67)
>   at 
> org.apache.beam.runners.core.construction.SerializablePipelineOptions.<init>(SerializablePipelineOptions.java:43)
>   at 
> org.apache.beam.runners.spark.translation.EvaluationContext.<init>(EvaluationContext.java:71)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:215)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:90)
> {noformat}
> The cause seems to be that the Spark driver environment uses an older version 
> of Jackson. I tried to update jackson on the Spark cluster but that led to 
> several other errors. 
> The change that started causing this was:
> https://github.com/apache/beam/commit/b68d70a47b68ad84efcd9405c1799002739bd116
> After reverting that change I was able to successfully run my job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10028) [Java SDK] Support state backed iterables within the SDK harness

2020-05-18 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10028:
---
Status: Open  (was: Triage Needed)

> [Java SDK] Support state backed iterables within the SDK harness
> 
>
> Key: BEAM-10028
> URL: https://issues.apache.org/jira/browse/BEAM-10028
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-2762) Coverage report for Python code

2020-05-19 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-2762.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Coverage report for Python code
> ---
>
> Key: BEAM-2762
> URL: https://issues.apache.org/jira/browse/BEAM-2762
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core, testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: P2
> Fix For: Not applicable
>
>
> It's good to have code coverage in the Python SDK to show test coverage. Java 
> uses jacoco-maven-plugin to generate coverage reports and the coveralls 
> service to manage/display the data. 
> Python has a similar tool, coverage.py, for report generation, and 
> coveralls-python to send reports to the coveralls service API. 
> It would be nice to have one place (like the coveralls service) to manage and 
> show data from different SDKs together or separately. However, there are 
> still some problems due to the structure of Beam's CI system and the multiple 
> languages in Beam. The coveralls service doesn't have a good way to collect 
> data separately from different projects, but the postcommit builds are 
> separated by SDK. 
> As a first step, I think it's good to have the Python coverage report printed 
> in the build to give people an idea of the current coverage status, at least.
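coverage.py itself is a third-party package, but the line-counting idea the issue describes can be sketched with Python's standard-library trace module. This is a simplified, dependency-free stand-in for `coverage run`, not how Beam's CI actually collects coverage; the function under measurement is illustrative:

```python
import trace

def fizzbuzz(n):
    # toy function whose lines we want execution counts for
    if n % 15 == 0:
        return "fizzbuzz"
    if n % 3 == 0:
        return "fizz"
    if n % 5 == 0:
        return "buzz"
    return str(n)

# count executed lines, similar in spirit to what coverage.py records
tracer = trace.Trace(count=True, trace=False)
tracer.runfunc(lambda: [fizzbuzz(i) for i in range(1, 16)])

# results().counts maps (filename, lineno) -> number of executions
counts = tracer.results().counts
print(f"{len(counts)} distinct lines executed")
```

A real setup would feed such per-line counts to a report generator and then upload the report to a service like coveralls, which is the aggregation step the issue says is awkward across multiple SDKs.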



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-2762) Coverage report for Python code

2020-05-19 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reopened BEAM-2762:
---
  Assignee: (was: Mark Liu)

Actually I'm reopening this because done & done here would mean it is very easy 
to discover the coverage report.

> Coverage report for Python code
> ---
>
> Key: BEAM-2762
> URL: https://issues.apache.org/jira/browse/BEAM-2762
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core, testing
>Reporter: Mark Liu
>Priority: P2
> Fix For: Not applicable
>
>
> It's good to have code coverage in the Python SDK to show test coverage. Java 
> uses jacoco-maven-plugin to generate coverage reports and the coveralls 
> service to manage/display the data. 
> Python has a similar tool, coverage.py, for report generation, and 
> coveralls-python to send reports to the coveralls service API. 
> It would be nice to have one place (like the coveralls service) to manage and 
> show data from different SDKs together or separately. However, there are 
> still some problems due to the structure of Beam's CI system and the multiple 
> languages in Beam. The coveralls service doesn't have a good way to collect 
> data separately from different projects, but the postcommit builds are 
> separated by SDK. 
> As a first step, I think it's good to have the Python coverage report printed 
> in the build to give people an idea of the current coverage status, at least.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-2762) Coverage report for Python code

2020-05-19 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111288#comment-17111288
 ] 

Kenneth Knowles edited comment on BEAM-2762 at 5/19/20, 3:48 PM:
-

Actually I'm reopening this because done & done here would mean it is very easy 
to discover the coverage report, and I actually couldn't find it in a couple 
minutes.


was (Author: kenn):
Actually I'm reopening this because done & done here would mean it is very easy 
to discover the coverage report.

> Coverage report for Python code
> ---
>
> Key: BEAM-2762
> URL: https://issues.apache.org/jira/browse/BEAM-2762
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core, testing
>Reporter: Mark Liu
>Priority: P2
> Fix For: Not applicable
>
>
> It's good to have code coverage in the Python SDK to show test coverage. Java 
> uses jacoco-maven-plugin to generate coverage reports and the coveralls 
> service to manage/display the data. 
> Python has a similar tool, coverage.py, for report generation, and 
> coveralls-python to send reports to the coveralls service API. 
> It would be nice to have one place (like the coveralls service) to manage and 
> show data from different SDKs together or separately. However, there are 
> still some problems due to the structure of Beam's CI system and the multiple 
> languages in Beam. The coveralls service doesn't have a good way to collect 
> data separately from different projects, but the postcommit builds are 
> separated by SDK. 
> As a first step, I think it's good to have the Python coverage report printed 
> in the build to give people an idea of the current coverage status, at least.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-2318) Flake in HBaseIOTest

2020-05-19 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-2318:
--
Labels: flake  (was: )

> Flake in HBaseIOTest
> 
>
> Key: BEAM-2318
> URL: https://issues.apache.org/jira/browse/BEAM-2318
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hbase, testing
>Reporter: Kenneth Knowles
>Assignee: Dan Halperin
>Priority: P2
>  Labels: flake
> Fix For: 2.1.0
>
>
> Saw this failure in the nightly snapshot, but it doesn't reproduce right 
> away: 
> https://builds.apache.org/job/beam_Release_NightlySnapshot/org.apache.beam$beam-sdks-java-io-hbase/419/testReport/junit/org.apache.beam.sdk.io.hbase/HBaseIOTest/testWritingFailsTableDoesNotExist/
> Excerpting, since that link will be GC'd at some point:
> {code}
> Error Message
> Expected test to throw (an instance of java.lang.IllegalArgumentException and 
> exception with message a string containing "Table TEST-TABLE does not exist")
> Stacktrace
> java.lang.AssertionError: Expected test to throw (an instance of 
> java.lang.IllegalArgumentException and exception with message a string 
> containing "Table TEST-TABLE does not exist")
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.junit.rules.ExpectedException.failDueToMissingException(ExpectedException.java:263)
>   at 
> org.junit.rules.ExpectedException.access$200(ExpectedException.java:106)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:245)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at 
> org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-891) Flake in Spark metrics library?

2020-05-19 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-891:
-
Labels: flake  (was: )

> Flake in Spark metrics library?
> ---
>
> Key: BEAM-891
> URL: https://issues.apache.org/jira/browse/BEAM-891
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Dan Halperin
>Assignee: Stas Levin
>Priority: P2
>  Labels: flake
> Fix For: 0.4.0
>
>
> [~staslev] I think you implemented this functionality originally? Want to 
> take a look? CC [~amitsela]
> Run: 
> https://builds.apache.org/job/beam_PostCommit_RunnableOnService_SparkLocal/org.apache.beam$beam-runners-spark/43/testReport/junit/org.apache.beam.sdk.transforms/FilterTest/testFilterGreaterThan/
> Error:
> {code}
> java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 5
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:169)
>   at 
> org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:77)
>   at 
> org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:53)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:182)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:112)
>   at 
> org.apache.beam.sdk.transforms.FilterTest.testFilterGreaterThan(FilterTest.java:122)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at 
> org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IndexOutOfBoundsException: 5
>   at 
> scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:43)
>   at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:47)
>   at 
> scala.collection.IndexedSeqOptimized$class.segmentLength(IndexedSeqOptimized.scala:189)
>   at 
> scala.collection.mutable.ArrayBuffer.segmentLength(ArrayBuffer.scala:47)
>   at 
> scala.collection.IndexedSeqOptimized$class.indexWhere(IndexedSeqOptimized.scala:198)
>   at scala.collection.mutable.ArrayBuffer.indexWhere(ArrayBuffer.scala:47)
>   at scala.collection.GenSeqLike$class.indexOf(GenSeqLike.scala:144)
>   at scala.collection.AbstractSeq.indexOf(Seq.scala:40)
>   at scala.collection.GenSeqLike$class.indexOf(GenSeqLike.scala:128)
>   at scala.collection.AbstractSeq.indexOf(Seq.scala:40)
>   at 
> scala.collection.mutable.BufferLike$class.$minus$eq(BufferLike.scala:126)
>   at scala.collection.mutable.AbstractBuffer.$minus$eq(Buffer.scala:48)
>   at 
> org.apache.spark.metrics.MetricsSystem.removeSource(MetricsSystem.scala:159)
>   at 
> org.apache.beam.runners.spark.translation.SparkRuntimeContext.registerMetrics(SparkRuntimeContext.java:94)
>   at 
> org.apache.beam.runners.spark.translation.SparkRuntimeContext.<init>(SparkRuntimeContext.java:66)
>   at 
> org.apache.beam.runners.spark.translation.EvaluationContext.<init>(EvaluationContext.java:73)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:146)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8878) Possible flake in PortableRunnerTest.test_error_traceback_includes_user_code

2020-05-19 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8878:
--
Labels: flake  (was: )

> Possible flake in PortableRunnerTest.test_error_traceback_includes_user_code
> 
>
> Key: BEAM-8878
> URL: https://issues.apache.org/jira/browse/BEAM-8878
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, sdk-py-core
>Reporter: Udi Meiri
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: flake
>
> It's not clear why this failed. Perhaps a flake in the test?
> {code}
> 'second' not found in 'Traceback (most recent call last):\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py",
>  line 640, in test_error_traceback_includes_user_code\np | 
> beam.Create([0]) | beam.Map(first)  # pylint: 
> disable=expression-not-assigned\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 436, in __exit__\nself.run().wait_until_finish()\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/portability/portable_runner.py",
>  line 438, in wait_until_finish\nself._job_id, self._state, 
> self._last_error_message()))\nRuntimeError: Pipeline 
> job-65bf3005-de82-4a1c-8d78-7185c6f627eb failed in state FAILED: unknown 
> error\n'
> {code}
> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2111/testReport/junit/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_error_traceback_includes_user_code/
> (may need to reload a few times to get the failed run)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-5925) Test flake in ElasticsearchIOTest.testWriteFullAddressing

2020-05-19 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-5925:
--
Labels: flake  (was: )

> Test flake in ElasticsearchIOTest.testWriteFullAddressing
> -
>
> Key: BEAM-5925
> URL: https://issues.apache.org/jira/browse/BEAM-5925
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-elasticsearch
>Reporter: Kenneth Knowles
>Assignee: Wout Scheepers
>Priority: P1
>  Labels: flake
> Fix For: 2.10.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_GradleBuild/1789/
> https://scans.gradle.com/s/j42mwdsn5svcs
> {code}
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.IOException: 
> listener timeout after waiting for [3] ms
> {code}
> Log looks like this:
> {code}
> [2018-10-31T04:06:07,571][INFO ][o.a.b.s.i.e.ElasticsearchIOTest] 
> [testWriteFullAddressing]: before test
> [2018-10-31T04:06:07,572][INFO ][o.a.b.s.i.e.ElasticsearchIOTest] 
> [ElasticsearchIOTest#testWriteFullAddressing]: setting up test
> [2018-10-31T04:06:07,589][INFO ][o.e.c.m.MetaDataIndexTemplateService] 
> [node_s0] adding template [random_index_template] for index patterns [*]
> [2018-10-31T04:06:07,645][INFO ][o.a.b.s.i.e.ElasticsearchIOTest] 
> [ElasticsearchIOTest#testWriteFullAddressing]: all set up test
> [2018-10-31T04:06:10,536][INFO ][o.e.c.m.MetaDataCreateIndexService] 
> [node_s0] [galilei] creating index, cause [auto(bulk api)], templates 
> [random_index_template], shards [6]/[0], mappings []
> [2018-10-31T04:06:33,963][INFO ][o.e.c.m.MetaDataCreateIndexService] 
> [node_s0] [curie] creating index, cause [auto(bulk api)], templates 
> [random_index_template], shards [6]/[0], mappings []
> [2018-10-31T04:06:34,034][INFO ][o.e.c.m.MetaDataCreateIndexService] 
> [node_s0] [darwin] creating index, cause [auto(bulk api)], templates 
> [random_index_template], shards [6]/[0], mappings []
> [2018-10-31T04:06:34,050][INFO ][o.e.c.m.MetaDataCreateIndexService] 
> [node_s0] [copernicus] creating index, cause [auto(bulk api)], templates 
> [random_index_template], shards [6]/[0], mappings []
> [2018-10-31T04:06:34,075][INFO ][o.e.c.m.MetaDataCreateIndexService] 
> [node_s0] [faraday] creating index, cause [auto(bulk api)], templates 
> [random_index_template], shards [6]/[0], mappings []
> [2018-10-31T04:06:34,095][INFO ][o.e.c.m.MetaDataCreateIndexService] 
> [node_s0] [bohr] creating index, cause [auto(bulk api)], templates 
> [random_index_template], shards [6]/[0], mappings []
> [2018-10-31T04:06:34,113][INFO ][o.e.c.m.MetaDataCreateIndexService] 
> [node_s0] [pasteur] creating index, cause [auto(bulk api)], templates 
> [random_index_template], shards [6]/[0], mappings []
> [2018-10-31T04:06:34,142][INFO ][o.e.c.m.MetaDataCreateIndexService] 
> [node_s0] [einstein] creating index, cause [auto(bulk api)], templates 
> [random_index_template], shards [6]/[0], mappings []
> [2018-10-31T04:06:34,205][INFO ][o.e.c.m.MetaDataCreateIndexService] 
> [node_s0] [maxwell] creating index, cause [auto(bulk api)], templates 
> [random_index_template], shards [6]/[0], mappings []
> [2018-10-31T04:06:34,226][INFO ][o.e.c.m.MetaDataCreateIndexService] 
> [node_s0] [newton] creating index, cause [auto(bulk api)], templates 
> [random_index_template], shards [6]/[0], mappings []
> [2018-10-31T04:06:36,914][INFO ][o.e.c.r.a.AllocationService] [node_s0] 
> Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards 
> started [[galilei][4], [galilei][5]] ...]).
> [2018-10-31T04:06:36,970][INFO ][o.e.c.m.MetaDataMappingService] [node_s0] 
> [galilei/Vn1b8XXVSAmrTb5BVe2IJQ] create_mapping [TYPE_1]
> [2018-10-31T04:06:37,137][INFO ][o.e.c.m.MetaDataMappingService] [node_s0] 
> [newton/bjnImLt_QguBGEFH9lBJ6Q] create_mapping [TYPE_-1]
> [2018-10-31T04:06:37,385][INFO ][o.e.c.m.MetaDataMappingService] [node_s0] 
> [maxwell/-RZ32NbRRZWaGaVfaptFIA] create_mapping [TYPE_0]
> [2018-10-31T04:06:37,636][INFO ][o.e.c.m.MetaDataMappingService] [node_s0] 
> [einstein/2lgF5Vj6Ti2KTS-pYSzv3Q] create_mapping [TYPE_1]
> [2018-10-31T04:06:37,806][INFO ][o.e.c.m.MetaDataMappingService] [node_s0] 
> [pasteur/832OwzleRSOHsWx85vOH-w] create_mapping [TYPE_0]
> [2018-10-31T04:06:38,103][INFO ][o.e.c.m.MetaDataMappingService] [node_s0] 
> [bohr/9YTwB1yvTYKf9YjYCmHjwg] create_mapping [TYPE_1]
> [2018-10-31T04:06:38,229][INFO ][o.e.c.m.MetaDataMappingService] [node_s0] 
> [faraday/vIMYG8vpTQKqNkyajcFOxw] create_mapping [TYPE_0]
> [2018-10-31T04:06:38,576][INFO ][o.e.c.m.MetaDataMappingService] [node_s0] 
> [copernicus/NzCZssInSiOdZKTmLCoXRw] create_mapping [TYPE_1]
> [2018-10-31T04:06:38,890][INFO ][o.e.c.m.MetaDataMappingService] [node_s0] 
> [darwin/g_sIfS5aQwi6

[jira] [Updated] (BEAM-10034) ZetaSQLCalcRel should assert types match

2020-05-19 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10034:
---
Status: Open  (was: Triage Needed)

> ZetaSQLCalcRel should assert types match
> 
>
> Key: BEAM-10034
> URL: https://issues.apache.org/jira/browse/BEAM-10034
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: P2
>
> We should use PreparedExpression.getOutputType() in setup to validate that the 
> output type matches what we expect; there is a class of runtime bugs this 
> would fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10035) Pandas Dataframes API

2020-05-19 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10035:
---
Status: Open  (was: Triage Needed)

> Pandas Dataframes API
> -
>
> Key: BEAM-10035
> URL: https://issues.apache.org/jira/browse/BEAM-10035
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P1
>
> This is an umbrella bug for the work towards 
> https://s.apache.org/beam-dataframes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10036) More flexible dataframes partitioning.

2020-05-19 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10036:
---
Status: Open  (was: Triage Needed)

> More flexible dataframes partitioning.
> --
>
> Key: BEAM-10036
> URL: https://issues.apache.org/jira/browse/BEAM-10036
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>
> Currently we only track a boolean of whether a dataframe is partitioned by 
> the (full) index.
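The boolean described above could generalize to an ordering of partitioning guarantees. The sketch below is purely illustrative (the class and level names are invented here, not Beam's actual design): a lattice-like enum where combining inputs keeps only the weakest guarantee.

```python
import enum

class Partitioning(enum.IntEnum):
    """Hypothetical partitioning levels, weakest to strongest.
    Today's implementation only distinguishes ARBITRARY vs INDEX."""
    ARBITRARY = 0    # no partitioning guarantee
    INDEX_PREFIX = 1 # partitioned by a prefix of the index (a finer option)
    INDEX = 2        # partitioned by the full index (the current boolean True)

def output_partitioning(*inputs: Partitioning) -> Partitioning:
    # an elementwise operation preserves only the weakest input guarantee
    return Partitioning(min(inputs))

print(output_partitioning(Partitioning.INDEX, Partitioning.INDEX_PREFIX).name)
```

Tracking levels rather than a boolean would let the planner avoid unnecessary reshuffles when a weaker-than-full-index guarantee is still sufficient for an operation.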



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10037) BeamSqlExample.java fails to build when running ./gradlew command

2020-05-19 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10037:
---
Status: Open  (was: Triage Needed)

> BeamSqlExample.java fails to build when running ./gradlew command
> -
>
> Key: BEAM-10037
> URL: https://issues.apache.org/jira/browse/BEAM-10037
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Omar Ismail
>Assignee: Omar Ismail
>Priority: P3
>
> In the `BeamSqlExample.java` class, the instructions state that to run the 
> example, use: 
> `./gradlew :sdks:java:extensions:sql:runBasicExample`. 
> I tried this and the build failed due to `java.lang.IllegalStateException: 
> Unable to return a default Coder`
>  
> I will try to fix this!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10038) Add script to mass-comment Jenkins triggers on PR

2020-05-19 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10038:
---
Status: Open  (was: Triage Needed)

> Add script to mass-comment Jenkins triggers on PR
> -
>
> Key: BEAM-10038
> URL: https://issues.apache.org/jira/browse/BEAM-10038
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>
> This is a work in progress; it just needs to be touched up and added to the 
> Beam repo:
> https://gist.github.com/Ardagan/13e6031e8d1c9ebbd3029bf365c1a517
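The gist itself isn't reproduced here, but the core of such a script is a loop posting one comment per trigger phrase via the GitHub issue-comments REST endpoint. This sketch only builds the requests without sending them; the trigger phrases and token are placeholders, not the real Jenkins seed list:

```python
import json
import urllib.request

API = "https://api.github.com/repos/{owner}/{repo}/issues/{number}/comments"

def build_comment_request(owner, repo, pr_number, body, token):
    """Build (but do not send) a GitHub issue-comment POST for one phrase."""
    url = API.format(owner=owner, repo=repo, number=pr_number)
    req = urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        method="POST",
    )
    req.add_header("Authorization", "token " + token)
    req.add_header("Accept", "application/vnd.github.v3+json")
    return req

# hypothetical trigger phrases; the real list lives in the Jenkins job seeds
TRIGGERS = ["Run Java PreCommit", "Run Python PreCommit"]
requests_to_send = [
    build_comment_request("apache", "beam", 1234, t, "<token>") for t in TRIGGERS
]
# actually sending would be: urllib.request.urlopen(req) for each request
```

Each posted comment body matches a Jenkins GitHub-trigger phrase, which is what causes the corresponding job to run on the PR.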



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10048) Remove "manual steps" from release guide.

2020-05-20 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10048:
---
Status: Open  (was: Triage Needed)

> Remove "manual steps" from release guide.
> -
>
> Key: BEAM-10048
> URL: https://issues.apache.org/jira/browse/BEAM-10048
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>
> release-guide.md contains most of the same instructions as 
> build_release_candidate.sh ("(Alternative) Run all steps manually"). This is 
> not ideal:
> - Mirroring the instructions in release-guide.md doesn't add any value.
> - Every single change to the process requires two identical changes to each 
> file, and this makes it unnecessarily difficult to keep the two in sync.
> - All the extra instructions make release-guide.md harder to read, obscuring 
> information that the release manager actually does need to know.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10054) Direct Runner execution stalls with test pipeline

2020-05-21 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10054:
---
Status: Open  (was: Triage Needed)

> Direct Runner execution stalls with test pipeline
> -
>
> Key: BEAM-10054
> URL: https://issues.apache.org/jira/browse/BEAM-10054
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>
> Internally, we have a test pipeline which runs with the DirectRunner. When 
> upgrading from 2.18.0 to 2.21.0 the test failed with the following exception:
> {noformat}
> tp = Exception('Monitor task detected a pipeline stall.',), value = None, tb = None
>
>     def raise_(tp, value=None, tb=None):
>         """
>         A function that matches the Python 2.x ``raise`` statement. This
>         allows re-raising exceptions with the cls value and traceback on
>         Python 2 and 3.
>         """
>         if value is not None and isinstance(tp, Exception):
>             raise TypeError("instance exception may not have a separate value")
>         if value is not None:
>             exc = tp(value)
>         else:
>             exc = tp
>         if exc.__traceback__ is not tb:
>             raise exc.with_traceback(tb)
> >       raise exc
> E       Exception: Monitor task detected a pipeline stall.
> {noformat}
> I was able to bisect the error. This commit introduced the failure: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731
> If the following condition evaluates to False, the pipeline runs correctly: 
> https://github.com/apache/beam/commit/ea9b1f350b88c2996cafb4d24351869e82857731#diff-2bb845e226f3a97c0f0f737d0558c5dbR1273
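The `raise_` shim visible in that traceback is a Python 2/3 compatibility helper. Reproduced below with its body restored so it can be exercised on its own; the two usage cases are illustrative, not taken from the failing pipeline:

```python
import sys

def raise_(tp, value=None, tb=None):
    """Py2/3-compatible re-raise helper, mirroring the one in the traceback."""
    if value is not None and isinstance(tp, Exception):
        raise TypeError("instance exception may not have a separate value")
    exc = tp(value) if value is not None else tp
    if exc.__traceback__ is not tb:
        raise exc.with_traceback(tb)
    raise exc

# Case 1: re-raise an existing exception instance unchanged
err = Exception("Monitor task detected a pipeline stall.")
try:
    raise_(err)
except Exception as exc:
    caught = exc  # the very same instance comes back out

# Case 2: raise a new exception but attach an existing traceback
try:
    raise ValueError("original failure site")
except ValueError:
    tb = sys.exc_info()[2]
try:
    raise_(RuntimeError, "wrapped", tb)
except RuntimeError as exc:
    wrapped_exc = exc  # carries the ValueError's traceback

print(type(caught).__name__, "/", type(wrapped_exc).__name__)
```

In the monitor task, case 1 is the path taken: the stall exception instance is passed as `tp` with no value or traceback, so it is simply re-raised.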



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10056) Side Input Validation too tight, doesn't allow CoGBK

2020-05-21 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10056:
---
Status: Open  (was: Triage Needed)

> Side Input Validation too tight, doesn't allow CoGBK
> 
>
> Key: BEAM-10056
> URL: https://issues.apache.org/jira/browse/BEAM-10056
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P1
>
> The following doesn't pass validation, though it should as it's a valid 
> signature for ParDo accepting a PCollection *clientHistory>>
> func (fn *writer) StartBundle(ctx context.Context) error
> func (fn *writer) ProcessElement(
> ctx context.Context,
> key string,
> iter1, iter2 func(**clientHistory) bool)
> func (fn *writer) FinishBundle(ctx context.Context)
> It returns an error:
> Missing side inputs in the StartBundle method of a DoFn. If side inputs are 
> present in ProcessElement those side inputs must also be present in 
> StartBundle.
> Full error:
> inserting ParDo in scope root:
> graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
> side inputs expected in method StartBundle [recovered]
> panic: Missing side inputs in the StartBundle method of a DoFn. If 
> side inputs are present in ProcessElement those side inputs must also be 
> present in StartBundle.
> Full error:
> inserting ParDo in scope root:
> graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
> side inputs expected in method StartBundle
> This is happening in the input-unaware validation, which means it needs to be 
> loosened and validated elsewhere.
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L527
> There are "sibling" cases for the DoFn  signature
> func (fn *writer) StartBundle(context.Context, side func(**clientHistory) 
> bool) error
> func (fn *writer) ProcessElement(
> ctx context.Context,
> key string,
> iter, side func(**clientHistory) bool)
> func (fn *writer) FinishBundle( context.Context, side, func(**clientHistory) 
> bool)
> and
> func (fn *writer) StartBundle(context.Context, side1, side2 
> func(**clientHistory) bool) error
> func (fn *writer) ProcessElement(
> ctx context.Context,
> key string,
> side1, side2 func(**clientHistory) bool)
> func (fn *writer) FinishBundle( context.Context, side1, side2 
> func(**clientHistory) bool)
> Would be for  > with <*clientHistory> on the 
> side, and
>   with <*clientHistory> and <*clientHistory> on the side 
> respectively.
> Which would only be determinable fully with the input, and should provide a 
> clear error when PCollection binding is occurring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10015) output timestamp not properly propagated through the Dataflow runner

2020-05-22 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114517#comment-17114517
 ] 

Kenneth Knowles commented on BEAM-10015:


[~reuvenlax] the typical protocol I use is to put Fix Version = 2.22 on this 
bug and make a separate Jira for the cherrypick and put Fix Version = 2.21 and 
do not resolve until it is on the 2.21 branch so it is sure to be released.

> output timestamp not properly propagated through the Dataflow runner
> 
>
> Key: BEAM-10015
> URL: https://issues.apache.org/jira/browse/BEAM-10015
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: P1
> Fix For: 2.21.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Dataflow runner does not propagate the output timestamp into timer firing, 
> resulting in incorrect default timestamps when outputting from a processTimer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10072) RequiresTimeSortedInput not working for DoFns without state

2020-05-25 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10072:
---
Status: Open  (was: Triage Needed)

> RequiresTimeSortedInput not working for DoFns without state
> -
>
> Key: BEAM-10072
> URL: https://issues.apache.org/jira/browse/BEAM-10072
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: P2
> Fix For: 2.22.0
>
>
> When DoFn annotated with `@RequiresTimeSortedInput` doesn't have a 
> `StateSpec`, the ordering might break.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10073) Add dashboard for pubsub performance tests to grafana

2020-05-25 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10073:
---
Status: Open  (was: Triage Needed)

> Add dashboard for pubsub performance tests to grafana
> -
>
> Key: BEAM-10073
> URL: https://issues.apache.org/jira/browse/BEAM-10073
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: P1
> Fix For: Not applicable
>
>
> Pub/Sub IO performance test results are not published to InfluxDB and are not 
> displayed in Grafana dashboards.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10074) Hash Functions in BeamSQL

2020-05-25 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10074:
---
Status: Open  (was: Triage Needed)

> Hash Functions in BeamSQL
> -
>
> Key: BEAM-10074
> URL: https://issues.apache.org/jira/browse/BEAM-10074
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>
> I would like to propose hash functions: 
> # MD5
> Calculates an MD5 128-bit checksum of a string or bytes and returns it as a 
> hex string.
> _Note:_ Calcite has a function with String as input but not for bytes.
> {code:java}
> SELECT MD5("Some String") as md5;
> {code}
> # SHA1
> Calculates a SHA-1 hash value of a string and returns it as a hex string.
> _Note:_ Calcite has a function with String as input but not for bytes.
> {code:java}
> SELECT SHA1("Some String") as sha1;
> {code}
> # SHA256
> Calculates a SHA-256 hash value of a string and returns it as a hex string.
> {code:java}
> SELECT SHA256("Some String") as sha256;
> {code}
> # SHA512
> Calculates a SHA-512 hash value of a string and returns it as a hex string.
> {code:java}
> SELECT SHA512("Some String") as sha512;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10075) Allow users to tune the grouping table size in batch dataflow pipelines

2020-05-25 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10075:
---
Status: Open  (was: Triage Needed)

> Allow users to tune the grouping table size in batch dataflow pipelines
> ---
>
> Key: BEAM-10075
> URL: https://issues.apache.org/jira/browse/BEAM-10075
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.23.0
>
>
> The dataflow worker hard-codes the grouping table size to 100 MB.  We should 
> allow users to specify this as a pipeline parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10076) Dataflow worker status page incorrectly displays work item statuses

2020-05-25 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10076:
---
Status: Open  (was: Triage Needed)

> Dataflow worker status page incorrectly displays work item statuses
> ---
>
> Key: BEAM-10076
> URL: https://issues.apache.org/jira/browse/BEAM-10076
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: P2
> Fix For: 2.23.0
>
> Attachments: image-2020-05-25-17-13-49-465.png
>
>
> The work item status page renders its table incorrectly due to a misplaced 
> tag (see attached screenshot).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10077) using filename + hash instead of UUID for staging name

2020-05-25 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10077:
---
Status: Open  (was: Triage Needed)

> using filename + hash instead of UUID for staging name
> --
>
> Key: BEAM-10077
> URL: https://issues.apache.org/jira/browse/BEAM-10077
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>
> Change the staging name from a UUID to filename + hash so we can avoid 
> re-uploading the same artifact.
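The idea of a content-addressed staging name can be sketched in a few lines of 
Python. The helper {{staging_name}} and its naming scheme are hypothetical 
illustrations, not the actual Beam implementation:

```python
import hashlib
import os


def staging_name(path: str, content: bytes) -> str:
    # Derive a deterministic name from the file name plus a content hash, so
    # an unchanged artifact maps to the same name and re-uploading can be
    # skipped; a UUID-based name changes on every run.
    digest = hashlib.sha256(content).hexdigest()[:16]
    return f"{os.path.basename(path)}-{digest}"


# Identical content always yields the same staging name.
print(staging_name("/tmp/deps/dep.jar", b"artifact bytes"))
```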



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10005) Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when inputs not windowed by GlobalWindows

2020-05-26 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116890#comment-17116890
 ] 

Kenneth Knowles commented on BEAM-10005:


Thanks for checking.

I think the simplest solution would be to always expose the {{combineFn}} 
method, as is done here: 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Count.java#L53

> Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when 
> inputs not windowed by GlobalWindows
> 
>
> Key: BEAM-10005
> URL: https://issues.apache.org/jira/browse/BEAM-10005
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>
> Unable to use ApproximateQuantiles.globally or ApproximateUnique.globally 
> with input windowed not using GlobalWindows.
> To make it run we need to set either 
> {code:java}
> .withoutDefaults()
> {code}
> or
> {code:java}
> .asSingletonView()
> {code}
> Currently we can't call any of the above on 
> ApproximateQuantiles.globally()/ApproximateUnique.globally as it does not 
> return underlying Combine.globally, but PTransform or Globally in case of 
> ApproximateUnique.
> Example failing case:
> {code:java}
> PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
>   .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));
> PCollection<List<Long>> input = elements
>   
> .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
>   .apply(ApproximateQuantiles.globally(17));
> {code}
> It throws expected error from internal Combine.globally() transform:
> {code:java}
> Default values are not supported in Combine.globally() if the input 
> PCollection is not windowed by GlobalWindows. Instead, use 
> Combine.globally().withoutDefaults() to output an empty PCollection if the 
> input PCollection is empty, or Combine.globally().asSingletonView() to get 
> the default output of the CombineFn if the input PCollection is empty.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10093) Add ZetaSQL Nexmark variant

2020-05-26 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10093:
---
Status: Open  (was: Triage Needed)

> Add ZetaSQL Nexmark variant
> ---
>
> Key: BEAM-10093
> URL: https://issues.apache.org/jira/browse/BEAM-10093
> Project: Beam
>  Issue Type: New Feature
>  Components: testing-nexmark
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: P2
>
> Most queries will be identical, but best to simply stay decoupled, so this is 
> a copy/paste/modify job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10093) Add ZetaSQL Nexmark variant

2020-05-26 Thread Kenneth Knowles (Jira)
Kenneth Knowles created BEAM-10093:
--

 Summary: Add ZetaSQL Nexmark variant
 Key: BEAM-10093
 URL: https://issues.apache.org/jira/browse/BEAM-10093
 Project: Beam
  Issue Type: New Feature
  Components: testing-nexmark
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles


Most queries will be identical, but best to simply stay decoupled, so this is a 
copy/paste/modify job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10095) Add hyperlinks to the beam-overview page.

2020-05-26 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10095:
---
Status: Open  (was: Triage Needed)

> Add hyperlinks to the beam-overview page.
> -
>
> Key: BEAM-10095
> URL: https://issues.apache.org/jira/browse/BEAM-10095
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P3
>
> - Java, Python, and Go should be hyperlinked to respective quickstart guides.
> - Runners listed should be hyperlinked to respective runner pages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10096) Spark runners are numbered 1,2,2

2020-05-26 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10096:
---
Status: Open  (was: Triage Needed)

> Spark runners are numbered 1,2,2
> 
>
> Key: BEAM-10096
> URL: https://issues.apache.org/jira/browse/BEAM-10096
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P4
>
> https://beam.apache.org/documentation/runners/spark/
> 1. A legacy Runner...
> 2. An Structured Streaming Spark Runner...
> 2. A portable Runner...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-05-26 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10097:
---
Status: Open  (was: Triage Needed)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>
> Currently all the PCollection views have a trivial mapping from KV<K, 
> Iterable<V>> to the view that is being requested (singleton, iterable, list, 
> map, multimap).
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10102) Fix query in python pubsub IO streaming performance tests dashboards

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10102:
---
Status: Open  (was: Triage Needed)

> Fix query in python pubsub IO streaming performance tests dashboards
> 
>
> Key: BEAM-10102
> URL: https://issues.apache.org/jira/browse/BEAM-10102
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: P1
> Fix For: Not applicable
>
>
> There is
> "metric" = \"pubsub_io_perf_read_runtime\"
> instead of
> "metric" = 'pubsub_io_perf_read_runtime'
> in the Grafana dashboard JSON.
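The distinction matters because in InfluxQL, double quotes denote identifiers 
(field or tag names) while single quotes denote string literals, so the first 
form compares the field against another identifier rather than the intended 
value. A minimal sketch of the two queries (table and field names follow the 
issue; the measurement name `results` is an assumption):

```sql
-- Broken: "pubsub_io_perf_read_runtime" is parsed as an identifier,
-- so this compares two fields and typically matches nothing.
SELECT * FROM results WHERE "metric" = "pubsub_io_perf_read_runtime"

-- Intended: single quotes make it a string literal comparison.
SELECT * FROM results WHERE "metric" = 'pubsub_io_perf_read_runtime'
```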



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10106) Script the deployment of artifacts to pypi

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10106:
---
Status: Open  (was: Triage Needed)

> Script the deployment of artifacts to pypi
> --
>
> Key: BEAM-10106
> URL: https://issues.apache.org/jira/browse/BEAM-10106
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>
> Right now there are only manual instructions, which are tedious and error-prone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10107) beam website PR listed twice in release guide with contradictory instructions

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10107:
---
Status: Open  (was: Triage Needed)

> beam website PR listed twice in release guide with contradictory instructions
> -
>
> Key: BEAM-10107
> URL: https://issues.apache.org/jira/browse/BEAM-10107
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>
> The Beam website update PR is mentioned twice: once in step 5 with the new 
> instructions, and again in step 6 with the old instructions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10108) publish_docker_images.sh has out of date Flink versions

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10108:
---
Status: Open  (was: Triage Needed)

> publish_docker_images.sh has out of date Flink versions
> ---
>
> Key: BEAM-10108
> URL: https://issues.apache.org/jira/browse/BEAM-10108
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>
> Is 1.7, 1.8, 1.9. Should be 1.8, 1.9, 1.10.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10109) Fix context classloader in Spark portable runner

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10109:
---
Status: Open  (was: Triage Needed)

> Fix context classloader in Spark portable runner
> 
>
> Key: BEAM-10109
> URL: https://issues.apache.org/jira/browse/BEAM-10109
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: portability-spark
>
> Spark is setting the context class loader to support dynamic class loading, 
> leading to unpredictable behavior with duplicate jars being found on the 
> class path. We need to see if there is a way to disable this behavior so we 
> can use the context class loader deterministically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10110) Populate pipeline_proto_coder_id field for dataflow.

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10110:
---
Status: Open  (was: Triage Needed)

> Populate pipeline_proto_coder_id field for dataflow.
> 
>
> Key: BEAM-10110
> URL: https://issues.apache.org/jira/browse/BEAM-10110
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow, sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P2
>
> Dataflow isn't natively translating from the Beam pipeline proto yet, but 
> requires SDKs to translate the graph into its own format. Adding this hint 
> for custom coders (coders not known to Dataflow/Beam) avoids having Dataflow 
> re-synthesize coders from its format back to the pipeline proto.
> Currently there's an awkward restriction on which coders should receive the 
> ID, rather than having the SDK apply the field to all of them, but this is a 
> good first step. The restriction may be lifted in a subsequent Dataflow 
> release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10111) Create methods in fileio to read from / write to archive files

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10111:
---
Status: Open  (was: Triage Needed)

> Create methods in fileio to read from / write to archive files
> --
>
> Key: BEAM-10111
> URL: https://issues.apache.org/jira/browse/BEAM-10111
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-files
>Reporter: Ashwin Ramaswami
>Assignee: Ashwin Ramaswami
>Priority: P2
>
> It would be good to be able to read from / write to archive files (.zip, 
> .tar) using fileio. The difference between this proposal and what we already 
> have with CompressionTypes is that this would allow converting one file -> 
> multiple files and vice versa. Here's how it might look:
> *Reading all contents from archive files:*
> {code:python}
> files = (
> p
> | fileio.MatchFiles('hdfs://path/to/*.zip')
> | fileio.Extract()
> | fileio.MatchAll()
> | fileio.ReadMatches()
> | beam.Map(lambda x: (x.metadata.path, 
> x.metadata._parent_archive_paths, x.read_utf8()))
> )
> {code}
> `._parent_archive_paths` will then be equal to an array with the path of the 
> parent zip file (it's an array because we could conceivably nest this by 
> looking for archives within archives).
> *Nested archive example:* (look for "`*`" inside of "`*.tar`" inside of 
> "`*.zip`")
> {code:python}
> files = (
> p
> | fileio.MatchFiles('hdfs://path/to/*.zip')
> | fileio.Extract()
> | fileio.MatchAll('*.tar')
> | fileio.Extract()
> | fileio.MatchAll() # gets all entries
> | fileio.ReadMatches()
> | beam.Map(lambda x: (x.metadata.path, x.read_utf8()))
> )
> {code}
> Note that in this case, this would involve modifying MatchAll() to take an 
> argument, which would filter the files in the PCollection from the earlier 
> stage of the pipeline.
> *Reading from archive files and explicitly specifying the archive type (when 
> it can't be inferred by the file extension):*
> {code:python}
> files = (
> p
> | fileio.MatchFiles('hdfs://path/to/archive')
> | fileio.Extract(archivesystem=ArchiveSystem.TAR)
> | fileio.MatchAll(archive_path='*.txt')
> | fileio.ReadMatches()
> | beam.Map(lambda x: (x.metadata.path, x.read_utf8()))
> )
> {code}
> `ArchiveSystem` would be a generic class, just like `FileSystem`, which would 
> allow for different implementations of methods such as `list()` and 
> `extract()`. It would be implemented for .zip, .tar, etc.
> *Writing multiple files to an archive file:*
> {code:python}
> files = (
> p
> | fileio.MatchFiles('hdfs://path/to/files/*.txt')
> | fileio.ReadMatches()
> | fileio.Compress(archivesystem=ArchiveSystem.ZIP)
> | textio.WriteToText("output.zip")
> )
> {code}
> *Writing to a .tar.gz file:*
> {code:python}
> files = (
> p
> | fileio.MatchFiles('hdfs://path/to/files/*.txt')
> | fileio.ReadMatches()
> | fileio.Compress(archivesystem=ArchiveSystem.TAR)
> | textio.WriteToText("output.tar.gz")
> )
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10089) Apex Runner tests failing [Java 11]

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10089:
---
Status: Open  (was: Triage Needed)

> Apex Runner tests failing [Java 11]
> ---
>
> Key: BEAM-10089
> URL: https://issues.apache.org/jira/browse/BEAM-10089
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-apex
>Reporter: Pawel Pasterz
>Priority: P2
>  Labels: beam-fixit
>
> Gradle task _*:runners:apex:test*_ fails during the Java 11 PreCommit job.
> Example stack trace:
> {code:java}
> org.apache.beam.runners.apex.ApexYarnLauncherTest > 
> testGetYarnDeployDependencies FAILED
> java.lang.ClassCastException: class 
> jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
> java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
> java.net.URLClassLoader are in module java.base of loader 'bootstrap')
> at 
> org.apache.beam.runners.apex.ApexYarnLauncher.getYarnDeployDependencies(ApexYarnLauncher.java:222)
> at 
> org.apache.beam.runners.apex.ApexYarnLauncherTest.testGetYarnDeployDependencies(ApexYarnLauncherTest.java:56)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-05-27 Thread Kenneth Knowles (Jira)
Kenneth Knowles created BEAM-10115:
--

 Summary: Staging requirements.txt fails but staging setup.py 
succeeds
 Key: BEAM-10115
 URL: https://issues.apache.org/jira/browse/BEAM-10115
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, sdk-py-core
Reporter: Kenneth Knowles


User reports on StackOverflow: 
https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python

The issue appears to be a problem with staging, and a difference between using 
`requirements.txt` and `setup.py` for some reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-05-27 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117955#comment-17117955
 ] 

Kenneth Knowles commented on BEAM-10115:


The code paths are different. See 
https://github.com/apache/beam/blob/80a2ff66c4ce403ae183bcaa59bbd52959d2b3da/sdks/python/apache_beam/runners/portability/stager.py#L28

The requirements.txt is staged directly. The setup.py file is used to build an 
sdist and the tarball is staged.

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10116) Implement grouping on non-merging unknown windowing strategies.

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10116:
---
Status: Open  (was: Triage Needed)

> Implement grouping on non-merging unknown windowing strategies.
> ---
>
> Key: BEAM-10116
> URL: https://issues.apache.org/jira/browse/BEAM-10116
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-05-27 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117959#comment-17117959
 ] 

Kenneth Knowles commented on BEAM-10115:


[~heejong] [~goenka] do either of you have an idea about this? The underlying 
problem seems to be an invalid redirect served from GCS. But can we reproduce 
this in an integration test to see what causes the redirect?

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10118) Unconditionally use safe coders for data channels in FnAPI runner.

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10118:
---
Status: Open  (was: Triage Needed)

> Unconditionally use safe coders for data channels in FnAPI runner.
> --
>
> Key: BEAM-10118
> URL: https://issues.apache.org/jira/browse/BEAM-10118
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10121) Python RowCoder doesn't support nested structs

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10121:
---
Status: Open  (was: Triage Needed)

> Python RowCoder doesn't support nested structs
> --
>
> Key: BEAM-10121
> URL: https://issues.apache.org/jira/browse/BEAM-10121
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.20.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10122) Python RowCoder throws NotImplementedError in DataflowRunner

2020-05-27 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10122:
---
Status: Open  (was: Triage Needed)

> Python RowCoder throws NotImplementedError in DataflowRunner
> 
>
> Key: BEAM-10122
> URL: https://issues.apache.org/jira/browse/BEAM-10122
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.20.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>
> ... because it overrides as_cloud_object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10132) Remove reference to apachebeam/*

2020-05-28 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10132:
---
Status: Open  (was: Triage Needed)

> Remove reference to apachebeam/*
> 
>
> Key: BEAM-10132
> URL: https://issues.apache.org/jira/browse/BEAM-10132
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>
> The Flink runner page includes an outdated reference to the old Docker Hub 
> repo (apachebeam/flink1.9_job_server:latest).
> https://beam.apache.org/documentation/runners/flink/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-05-28 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10115:
---
Status: Open  (was: Triage Needed)

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10158) [Python] Reuse a shared unbounded thread pool

2020-05-29 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10158:
---
Status: Open  (was: Triage Needed)

> [Python] Reuse a shared unbounded thread pool
> -
>
> Key: BEAM-10158
> URL: https://issues.apache.org/jira/browse/BEAM-10158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>
> During testing we create a lot of thread pools, many of which we never shut 
> down, which can lead to thread exhaustion on some machines.
>  
> Swapping to use a shared thread pool will decrease the memory overhead for 
> these unused threads and allow for greater reuse.
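The shared-pool idea can be sketched with the standard library's 
{{concurrent.futures}}. The helper below is an illustration, not the Beam SDK 
change itself (note that {{ThreadPoolExecutor}}'s default worker limit is 
bounded; an unbounded pool would raise {{max_workers}} accordingly):

```python
import concurrent.futures
import threading

_pool_lock = threading.Lock()
_shared_pool = None


def shared_pool() -> concurrent.futures.ThreadPoolExecutor:
    # Lazily create a single process-wide executor instead of one pool per
    # caller, so idle threads are reused rather than accumulating.
    global _shared_pool
    with _pool_lock:
        if _shared_pool is None:
            _shared_pool = concurrent.futures.ThreadPoolExecutor()
        return _shared_pool


# Every caller gets the same executor instance.
assert shared_pool() is shared_pool()
```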



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9843) Flink UnboundedSourceWrapperTest flaky due to a timeout

2020-05-29 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120064#comment-17120064
 ] 

Kenneth Knowles commented on BEAM-9843:
---

[~mxm] timed out in 
https://builds.apache.org/job/beam_PreCommit_Java_Commit/11590/ which was a 
couple weeks after your comment. I cannot say if the root cause is related.

> Flink UnboundedSourceWrapperTest flaky due to a timeout
> ---
>
> Key: BEAM-9843
> URL: https://issues.apache.org/jira/browse/BEAM-9843
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Chamikara Madhusanka Jayalath
>Priority: P2
> Fix For: Not applicable
>
>
> For example,
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2685/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2684/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2682/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2680/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/2685/testReport/junit/org.apache.beam.runners.flink.translation.wrappers.streaming.io/UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest/testWatermarkEmission_numTasks___4__numSplits_4_/]
> org.junit.runners.model.TestTimedOutException: test timed out after 3 
> milliseconds at sun.misc.Unsafe.park(Native Method) at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) at 
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest.testWatermarkEmission(UnboundedSourceWrapperTest.java:354)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
>  
>  
>  
>  
>  
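The stack trace above shows the flake pattern: the test thread blocks in CountDownLatch.await() waiting for a watermark that never arrives, until JUnit's FailOnTimeout kills it. A minimal sketch of the pattern (class and method names here are hypothetical, not from the Beam test) — using a bounded await so the caller gets a boolean back instead of hanging until the harness timeout:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchTimeoutSketch {

    // Bounded wait: returns false if the latch was not released in time,
    // instead of blocking forever the way a plain latch.await() does.
    public static boolean awaitWithTimeout(CountDownLatch latch, long millis)
            throws InterruptedException {
        return latch.await(millis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        // Nothing ever calls latch.countDown(), so the bounded wait
        // returns false rather than hanging.
        System.out.println(awaitWithTimeout(latch, 50)); // prints "false"
    }
}
```

With the unbounded await() in the real test, the only observable failure mode is the TestTimedOutException seen in the report above.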





[jira] [Reopened] (BEAM-9843) Flink UnboundedSourceWrapperTest flaky due to a timeout

2020-05-29 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reopened BEAM-9843:
---
  Assignee: Maximilian Michels

> Flink UnboundedSourceWrapperTest flaky due to a timeout
> ---
>
> Key: BEAM-9843
> URL: https://issues.apache.org/jira/browse/BEAM-9843
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: Not applicable
>
>





[jira] [Updated] (BEAM-9843) Flink UnboundedSourceWrapperTest flaky due to a timeout

2020-05-29 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9843:
--
Priority: P1  (was: P2)

> Flink UnboundedSourceWrapperTest flaky due to a timeout
> ---
>
> Key: BEAM-9843
> URL: https://issues.apache.org/jira/browse/BEAM-9843
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Maximilian Michels
>Priority: P1
>  Labels: flake
> Fix For: Not applicable
>
>





[jira] [Commented] (BEAM-9843) Flink UnboundedSourceWrapperTest flaky due to a timeout

2020-05-29 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120079#comment-17120079
 ] 

Kenneth Knowles commented on BEAM-9843:
---

I'm reopening just because it is recent enough that you probably have context on 
this.

> Flink UnboundedSourceWrapperTest flaky due to a timeout
> ---
>
> Key: BEAM-9843
> URL: https://issues.apache.org/jira/browse/BEAM-9843
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: Not applicable
>
>





[jira] [Updated] (BEAM-9843) Flink UnboundedSourceWrapperTest flaky due to a timeout

2020-05-29 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9843:
--
Labels: flake  (was: )

> Flink UnboundedSourceWrapperTest flaky due to a timeout
> ---
>
> Key: BEAM-9843
> URL: https://issues.apache.org/jira/browse/BEAM-9843
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Maximilian Michels
>Priority: P2
>  Labels: flake
> Fix For: Not applicable
>
>





[jira] [Commented] (BEAM-9843) Flink UnboundedSourceWrapperTest flaky due to a timeout

2020-05-29 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120081#comment-17120081
 ] 

Kenneth Knowles commented on BEAM-9843:
---

And marking P1 due to flaky test policy.

> Flink UnboundedSourceWrapperTest flaky due to a timeout
> ---
>
> Key: BEAM-9843
> URL: https://issues.apache.org/jira/browse/BEAM-9843
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Maximilian Michels
>Priority: P1
>  Labels: flake
> Fix For: Not applicable
>
>





[jira] [Commented] (BEAM-9239) Dependency conflict with Spark using aws io

2020-05-29 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120088#comment-17120088
 ] 

Kenneth Knowles commented on BEAM-9239:
---

Yes, the Spark experimental configs for loading the user classpath first are what 
I meant in my first comment. I think the last time I tried it, the problem was 
that only part of my classpath was compatible.
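For reference, the experimental flags referred to above are passed at submit time; a minimal sketch (the job jar name is hypothetical):

```shell
# Prefer the user's jars (e.g. the newer Jackson bundled with the Beam job)
# over Spark's own copies, on both the driver and the executors.
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  my-beam-job.jar
```

Both flags are documented by Spark as experimental, which matches the mixed results described above.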

> Dependency conflict with Spark using aws io
> ---
>
> Key: BEAM-9239
> URL: https://issues.apache.org/jira/browse/BEAM-9239
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws, runner-spark
>Affects Versions: 2.17.0
>Reporter: David McIntosh
>Priority: P1
>
> Starting with beam 2.17.0 I get this error in the Spark 2.4.4 driver when aws 
> io is also used:
> {noformat}
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.databind.jsontype.TypeSerializer.typeId(Ljava/lang/Object;Lcom/fasterxml/jackson/core/JsonToken;)Lcom/fasterxml/jackson/core/type/WritableTypeId;
>   at 
> org.apache.beam.sdk.io.aws.options.AwsModule$AWSCredentialsProviderSerializer.serializeWithType(AwsModule.java:163)
>   at 
> org.apache.beam.sdk.io.aws.options.AwsModule$AWSCredentialsProviderSerializer.serializeWithType(AwsModule.java:134)
>   at 
> com.fasterxml.jackson.databind.ser.impl.TypeWrappedSerializer.serialize(TypeWrappedSerializer.java:32)
>   at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.ensureSerializable(ProxyInvocationHandler.java:721)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:647)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:635)
>   at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
>   at 
> org.apache.beam.runners.core.construction.SerializablePipelineOptions.serializeToJson(SerializablePipelineOptions.java:67)
>   at 
> org.apache.beam.runners.core.construction.SerializablePipelineOptions.<init>(SerializablePipelineOptions.java:43)
>   at 
> org.apache.beam.runners.spark.translation.EvaluationContext.<init>(EvaluationContext.java:71)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:215)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:90)
> {noformat}
> The cause seems to be that the Spark driver environment uses an older version 
> of Jackson. I tried to update jackson on the Spark cluster but that led to 
> several other errors. 
> The change that started causing this was:
> https://github.com/apache/beam/commit/b68d70a47b68ad84efcd9405c1799002739bd116
> After reverting that change I was able to successfully run my job.





[jira] [Comment Edited] (BEAM-3288) Guard against unsafe triggers at construction time

2020-05-29 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963195#comment-16963195
 ] 

Kenneth Knowles edited comment on BEAM-3288 at 5/30/20, 4:35 AM:
-

I believe it is a fundamental design flaw to have triggers able to drop data. I 
can't find the discussion now but I believe this has consensus. The trigger 
should regulate the flow rate but not alter the semantic answer. When a trigger 
claims to be "done", as a backstop we should still emit the final pane.


was (Author: kenn):
I believe it is a fundamental design but to have triggers able to drop data. I 
can't find the discussion now but I believe this has consensus. The trigger 
should regulate the flow rate but not alter the semantic answer. When a trigger 
claims to be "done", as a backstop we should still emit the final pane.

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: P2
> Fix For: 2.18.0
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There's multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 elements in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto more coarse key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark trigger. This will definitely increase safety, but lead to more 
> eager firing of downstream aggregations. It also will violate a user's 
> expectation that a fire-once trigger fires everything downstream only once, 
> but that expectation appears impossible to satisfy safely.
> - Make the continuation trigger of some triggers be the "invalid" trigger, 
> i.e. require the user to set it explicitly: there's in general no good and 
> safe way to infer what a trigger on a second GBK "truly" should be, based on 
> the trigger of the PCollection input into a first GBK. This is especially 
> true for terminating triggers.
> - Prohibit top-level terminating triggers entirely. This will ensure that the 
> only data that ever gets dropped is "droppably late" data.
> CC: [~bchambers] [~kenn] [~tgroh]





[jira] [Updated] (BEAM-10162) Add python pubsubIO performance tests to readme

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10162:
---
Status: Open  (was: Triage Needed)

> Add python pubsubIO performance tests to readme
> ---
>
> Key: BEAM-10162
> URL: https://issues.apache.org/jira/browse/BEAM-10162
> Project: Beam
>  Issue Type: Task
>  Components: benchmarking-py, sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: P2
> Fix For: Not applicable
>
>






[jira] [Updated] (BEAM-10163) Change pubsub and bq performance tests jobs name

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10163:
---
Status: Open  (was: Triage Needed)

> Change pubsub and bq performance tests jobs name
> 
>
> Key: BEAM-10163
> URL: https://issues.apache.org/jira/browse/BEAM-10163
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: P2
> Fix For: Not applicable
>
>
> The job names of those tests don't match the others





[jira] [Updated] (BEAM-10164) Flink: Memory efficient combine implementation for batch runner

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10164:
---
Status: Open  (was: Triage Needed)

> Flink: Memory efficient combine implementation for batch runner
> ---
>
> Key: BEAM-10164
> URL: https://issues.apache.org/jira/browse/BEAM-10164
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: P2
>
> Current Combine implementation assumes that all input values for a single key 
> (on both map and reduce side) fit in memory as it needs to sort them by 
> window before combining.
> We can easily optimize this for non-merging windows by pre-grouping elements 
> by (K, W) tuples.
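The pre-grouping idea described above can be sketched in plain Java (names are hypothetical; the real runner would key on the actual (K, BoundedWindow) pair): by grouping on the composite (key, window) tuple up front, a combiner only ever sees one window's values per bucket and never needs to sort a key's full value list in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyWindowGrouping {

    // Composite grouping key: records give us value-based equals/hashCode,
    // so (key, window) tuples bucket correctly in a HashMap.
    public record KW(String key, long window) {}

    // Pre-group elements by (key, window) instead of by key alone.
    public static Map<KW, List<Integer>> preGroup(List<Map.Entry<KW, Integer>> elems) {
        Map<KW, List<Integer>> grouped = new HashMap<>();
        for (Map.Entry<KW, Integer> e : elems) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(e.getValue());
        }
        return grouped;
    }
}
```

This only works for non-merging windows, as the description notes, because merging windows can change the window component of the key after grouping.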





[jira] [Updated] (BEAM-10167) Fix 2.21.0 downloads link in blog post

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10167:
---
Status: Open  (was: Triage Needed)

> Fix 2.21.0 downloads link in blog post
> --
>
> Key: BEAM-10167
> URL: https://issues.apache.org/jira/browse/BEAM-10167
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P4
>
> Right now it goes to 
> [https://beam.apache.org/get-started/downloads/#-], which is a valid 
> URL, but not exactly the one we want.





[jira] [Updated] (BEAM-10168) Add Github "publish release" to release guide

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10168:
---
Status: Open  (was: Triage Needed)

> Add Github "publish release" to release guide
> -
>
> Key: BEAM-10168
> URL: https://issues.apache.org/jira/browse/BEAM-10168
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>
> Github does not recognize tags as full-fledged releases unless they are 
> published through the Github API/UI. We need to add this step to the release 
> guide.





[jira] [Comment Edited] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121391#comment-17121391
 ] 

Kenneth Knowles edited comment on BEAM-10115 at 6/1/20, 11:05 PM:
--

I looked around a bit more and found 
https://stackoverflow.com/questions/59815620/gcloud-upload-httplib2-redirectmissinglocation-redirected-but-the-response-is-m
 which points to 
https://github.com/googleapis/google-api-python-client/issues/803. The issue is 
an incompatibility with an httplib2 version 0.16.0. Pinning is a workaround. I 
have not confirmed this is the Beam problem.
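A minimal sketch of the pinning workaround, assuming the incompatible release is httplib2 0.16.0 as the linked issue suggests:

```text
# requirements.txt — keep httplib2 below the incompatible release
httplib2<0.16.0
```

This only sidesteps the staging failure; whether it is the root cause for Beam is unconfirmed, per the comment above.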


was (Author: kenn):
I looked around a bit more and found 
https://stackoverflow.com/questions/59815620/gcloud-upload-httplib2-redirectmissinglocation-redirected-but-the-response-is-m
 which points to 
https://github.com/googleapis/google-api-python-client/issues/803. The issue is 
an incompatibility with an httplib2 version. Pinning is a workaround. I have 
not confirmed this is the Beam problem.

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.





[jira] [Commented] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121391#comment-17121391
 ] 

Kenneth Knowles commented on BEAM-10115:


I looked around a bit more and found 
https://stackoverflow.com/questions/59815620/gcloud-upload-httplib2-redirectmissinglocation-redirected-but-the-response-is-m
 which points to 
https://github.com/googleapis/google-api-python-client/issues/803. The issue is 
an incompatibility with an httplib2 version. Pinning is a workaround. I have 
not confirmed this is the Beam problem.

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.





[jira] [Updated] (BEAM-758) Per-step, per-execution nonce

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-758:
-
Labels: stale-assigned  (was: )

> Per-step, per-execution nonce
> -
>
> Key: BEAM-758
> URL: https://issues.apache.org/jira/browse/BEAM-758
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: Not applicable
>Reporter: Dan Halperin
>Assignee: Sam McVeety
>Priority: P2
>  Labels: stale-assigned
>
> In the forthcoming runner API, a user will be able to save a pipeline to JSON 
> and then run it repeatedly.
> Many pieces of code (e.g., BigQueryIO.Read or Write) rely on a single random 
> value (nonce). These values are typically generated at apply time, so that 
> they are deterministic (don't change across retries of DoFns) and global (are 
> the same across all workers).
> However, once the runner API lands, the existing code would result in the same 
> nonce being reused across jobs. Other possible solutions:
> * Generate nonce in {{Create(1) | ParDo}} then use this as a side input. 
> Should work, as long as side inputs are actually checkpointed. But does not 
> work for {{BoundedSource}}.
> * If a nonce is only needed for the lifetime of one bundle, can be generated 
> in {{startBundle}} and used in {{finishBundle}} [or {{tearDown}}].
> * Add some context somewhere that lets user code access unique step name, and 
> somehow generate a nonce consistently e.g. by hashing. Will usually work, but 
> this is similarly not available to sources.
> Another Q: I'm not sure we have a good way to generate nonces in unbounded 
> pipelines -- we probably need one. This would enable us to, e.g., use 
> {{BigQueryIO.Write}} in an unbounded pipeline [if we had, e.g., exactly-once 
> triggering per window]. Or generalizing to multiple firings...
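
The apply-time vs. execution-time distinction above can be sketched outside Beam (the classes below are hypothetical illustrations, not the Beam API): a nonce captured when a transform object is constructed survives serialization and repeats across runs of a saved pipeline, while one generated at execution time, as in the `Create(1) | ParDo` approach, is fresh per run.

```python
import uuid

class ApplyTimeNonce:
    """Captures a nonce at construction ("apply time").

    If the pipeline is saved to JSON and re-run, this value is baked in,
    so every execution reuses the same nonce -- the bug described above.
    """
    def __init__(self):
        self.nonce = uuid.uuid4().hex

    def run(self):
        return self.nonce

class ExecutionTimeNonce:
    """Generates the nonce at execution time, like Create(1) | ParDo."""
    def run(self):
        return uuid.uuid4().hex

saved = ApplyTimeNonce()
assert saved.run() == saved.run()   # same nonce across executions of a saved pipeline

fresh = ExecutionTimeNonce()
assert fresh.run() != fresh.run()   # distinct value per execution
```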





[jira] [Commented] (BEAM-9780) Add a DICOM IO Connector for Google Cloud Healthcare API

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121551#comment-17121551
 ] 

Kenneth Knowles commented on BEAM-9780:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Add a DICOM IO Connector for Google Cloud Healthcare API
> 
>
> Key: BEAM-9780
> URL: https://issues.apache.org/jira/browse/BEAM-9780
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: P3
>  Labels: stale-assigned
>
> Add IO Transforms for the DICOM store in the [Google Cloud Healthcare 
> API|https://cloud.google.com/healthcare/docs/]





[jira] [Commented] (BEAM-9709) timezone off by 8 hours

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121572#comment-17121572
 ] 

Kenneth Knowles commented on BEAM-9709:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> timezone off by 8 hours
> ---
>
> Key: BEAM-9709
> URL: https://issues.apache.org/jira/browse/BEAM-9709
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Robin Qiu
>Priority: P4
>  Labels: stale-assigned, zetasql-compliance
>
> two failures in shard 13, one failure in shard 19
> {code}
> Expected: ARRAY>[{2014-01-31 00:00:00+00}]
>   Actual: ARRAY>[{2014-01-31 08:00:00+00}], 
> {code}
> {code}
> select timestamp(date '2014-01-31')
> {code}
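
The 8-hour shift matches midnight being interpreted in a UTC-8 zone (e.g. US/Pacific in January, plausibly the evaluator's default zone) rather than UTC. A minimal sketch of the arithmetic, independent of ZetaSQL:

```python
from datetime import datetime, timezone, timedelta

# If DATE '2014-01-31' is anchored to midnight in a UTC-8 zone instead of UTC,
# the resulting instant is 2014-01-31 08:00:00+00 -- the "Actual" value above.
utc_minus_8 = timezone(timedelta(hours=-8))
local_midnight = datetime(2014, 1, 31, 0, 0, tzinfo=utc_minus_8)
as_utc = local_midnight.astimezone(timezone.utc)
assert as_utc == datetime(2014, 1, 31, 8, 0, tzinfo=timezone.utc)
```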





[jira] [Updated] (BEAM-9689) Update wordcount webpage with Spark/Flink + Go

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9689:
--
Labels: stale-assigned  (was: )

> Update wordcount webpage with Spark/Flink + Go
> --
>
> Key: BEAM-9689
> URL: https://issues.apache.org/jira/browse/BEAM-9689
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P3
>  Labels: stale-assigned
>
> Currently says "This runner is not yet available for the Go SDK." which is no 
> longer true.





[jira] [Updated] (BEAM-9643) Add user-facing Go SDF documentation.

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9643:
--
Labels: stale-assigned  (was: )

> Add user-facing Go SDF documentation.
> -
>
> Key: BEAM-9643
> URL: https://issues.apache.org/jira/browse/BEAM-9643
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This means adding the documentation about how to use SDFs and the contracts 
> of all the SDF methods to the Go SDK code, as well as updating the Go SDF 
> design doc.





[jira] [Commented] (BEAM-9546) Support for batching a schema-aware PCollection and processing as a Dataframe

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121626#comment-17121626
 ] 

Kenneth Knowles commented on BEAM-9546:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Support for batching a schema-aware PCollection and processing as a Dataframe
> -
>
> Key: BEAM-9546
> URL: https://issues.apache.org/jira/browse/BEAM-9546
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>  Labels: stale-assigned
>






[jira] [Updated] (BEAM-9839) OnTimerContext should not create a new one when processing each element/timer in FnApiDoFnRunner

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9839:
--
Labels: stale-assigned  (was: )

> OnTimerContext should not create a new one when processing each element/timer 
> in FnApiDoFnRunner
> 
>
> Key: BEAM-9839
> URL: https://issues.apache.org/jira/browse/BEAM-9839
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Reporter: Rehman Murad Ali
>Assignee: Rehman Murad Ali
>Priority: P2
>  Labels: stale-assigned
>
> The intent of these Context objects was to not create a new one when 
> processing each element/timer, and instead to reference a member variable.
> Discussed here:
> https://github.com/apache/beam/pull/11154/#discussion_r416023080





[jira] [Updated] (BEAM-9863) AvroUtils is converting incorrectly LogicalType Timestamps from long into Joda DateTimes

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9863:
--
Labels: stale-assigned  (was: )

> AvroUtils is converting incorrectly LogicalType Timestamps from long into 
> Joda DateTimes
> 
>
> Key: BEAM-9863
> URL: https://issues.apache.org/jira/browse/BEAM-9863
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0, 2.16.0, 2.17.0, 2.18.0, 2.19.0, 2.20.0, 2.21.0
>Reporter: Ismaël Mejía
>Assignee: Reuven Lax
>Priority: P2
>  Labels: stale-assigned
>
> Copied from the mailing list report:
> I think the method AvroUtils.toBeamSchema has an unexpected side effect. 
> I found out that, if you invoke it and then run a pipeline of 
> GenericRecords containing a timestamp (I tried with logical-type 
> timestamp-millis), Beam converts the timestamp from long to 
> org.joda.time.DateTime, even if you don't apply any transformation to the 
> pipeline.
> Do you think it's a bug? 
> More details on how to reproduce here:
> https://lists.apache.org/thread.html/r43fb2896e496b7493a962207eb3b95360abc30b9d091b26f110264d0%40%3Cuser.beam.apache.org%3E
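
For orientation, a plain-Python sketch (not Beam's AvroUtils) of the conversion in question: Avro's timestamp-millis logical type stores a long of epoch milliseconds, and converting it to a date-time object changes the field's runtime type, analogous to the long-to-Joda-DateTime conversion the report describes.

```python
from datetime import datetime, timezone

# timestamp-millis is a long of milliseconds since the epoch.
millis = 1_590_969_600_000

# Converting to a datetime object changes the field's runtime type, which is
# the surprising side effect reported above (long -> DateTime).
dt = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
assert dt == datetime(2020, 6, 1, tzinfo=timezone.utc)

# The round trip back to epoch millis is lossless for whole-second values.
back = int(dt.timestamp() * 1000)
assert back == millis
```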





[jira] [Commented] (BEAM-9863) AvroUtils is converting incorrectly LogicalType Timestamps from long into Joda DateTimes

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121542#comment-17121542
 ] 

Kenneth Knowles commented on BEAM-9863:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> AvroUtils is converting incorrectly LogicalType Timestamps from long into 
> Joda DateTimes
> 
>
> Key: BEAM-9863
> URL: https://issues.apache.org/jira/browse/BEAM-9863
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0, 2.16.0, 2.17.0, 2.18.0, 2.19.0, 2.20.0, 2.21.0
>Reporter: Ismaël Mejía
>Assignee: Reuven Lax
>Priority: P2
>  Labels: stale-assigned
>
> Copied from the mailing list report:
> I think the method AvroUtils.toBeamSchema has an unexpected side effect. 
> I found out that, if you invoke it and then run a pipeline of 
> GenericRecords containing a timestamp (I tried with logical-type 
> timestamp-millis), Beam converts the timestamp from long to 
> org.joda.time.DateTime, even if you don't apply any transformation to the 
> pipeline.
> Do you think it's a bug? 
> More details on how to reproduce here:
> https://lists.apache.org/thread.html/r43fb2896e496b7493a962207eb3b95360abc30b9d091b26f110264d0%40%3Cuser.beam.apache.org%3E





[jira] [Commented] (BEAM-9742) Add ability to pass FluentBackoff to JdbcIo.Write

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121565#comment-17121565
 ] 

Kenneth Knowles commented on BEAM-9742:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Add ability to pass FluentBackoff to JdbcIo.Write
> -
>
> Key: BEAM-9742
> URL: https://issues.apache.org/jira/browse/BEAM-9742
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Akshay Iyangar
>Assignee: Akshay Iyangar
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently, the FluentBackoff is hardcoded with `maxRetries` and 
> `initialBackoff`.
> It would be helpful if the client were able to pass these values.
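
To illustrate the requested knob (a hedged Python sketch of the concept; Beam's actual FluentBackoff is a Java utility with its own builder API): exponential backoff where `maxRetries` and `initialBackoff` are caller-supplied instead of hardcoded.

```python
def backoff_delays(initial_backoff=1.0, max_retries=5, multiplier=1.5):
    """Yield the sleep duration before each retry, growing exponentially.

    Parameter names mirror the hardcoded values mentioned above; the
    function itself is an illustration, not Beam's FluentBackoff API.
    """
    delay = initial_backoff
    for _ in range(max_retries):
        yield delay
        delay *= multiplier

# Caller-supplied values instead of hardcoded ones:
assert list(backoff_delays(initial_backoff=0.5, max_retries=3, multiplier=2.0)) == [0.5, 1.0, 2.0]
```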





[jira] [Commented] (BEAM-9861) BigQueryStorageStreamSource fails with split fractions of 0.0 or 1.0

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121543#comment-17121543
 ] 

Kenneth Knowles commented on BEAM-9861:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> BigQueryStorageStreamSource fails with split fractions of 0.0 or 1.0
> 
>
> Key: BEAM-9861
> URL: https://issues.apache.org/jira/browse/BEAM-9861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kenneth Jung
>Assignee: Kenneth Jung
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9322) Python SDK ignores manually set PCollection tags

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9322:
--
Labels: stale-assigned  (was: )

> Python SDK ignores manually set PCollection tags
> 
>
> Key: BEAM-9322
> URL: https://issues.apache.org/jira/browse/BEAM-9322
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: P1
>  Labels: stale-assigned
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> The Python SDK currently ignores any tags set manually on PCollections when 
> applying PTransforms and adding the PCollection to the PTransform 
> [outputs|https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595].
> In the 
> [add_output|https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]
> method, the tag is set to None for all PValues, meaning the output tags are 
> set to an enumeration index over the PCollection outputs. The tags are not 
> propagated correctly, which can be a problem when relying on the output 
> PCollection tags to match the user-set values.
> The fix is to correct BEAM-1833 and always pass in the tags. However, that 
> doesn't fix the problem for nested PCollections. If you have a dict of lists 
> of PCollections, what should their tags be set to? In order to fix 
> this, first propagate the correct tag, then discuss with the community the 
> best auto-generated tags.
> Some users may rely on the old implementation, so a flag will be created: 
> "force_generated_pcollection_output_ids", set to False by default. If 
> True, this will fall back to the old implementation and generate tags for 
> PCollections.





[jira] [Updated] (BEAM-9640) Track PCollection watermark across bundle executions

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9640:
--
Labels: stale-assigned  (was: )

> Track PCollection watermark across bundle executions
> 
>
> Key: BEAM-9640
> URL: https://issues.apache.org/jira/browse/BEAM-9640
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This can be done without relying on the watermark manager for execution.





[jira] [Updated] (BEAM-9787) Send clear error to users trying to use BigQuerySource on FnApi pipelines on Python SDK

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9787:
--
Labels: stale-assigned  (was: )

> Send clear error to users trying to use BigQuerySource on FnApi pipelines on 
> Python SDK
> ---
>
> Key: BEAM-9787
> URL: https://issues.apache.org/jira/browse/BEAM-9787
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-9850) Key should be available in @OnTimer methods (Spark Runner)

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121539#comment-17121539
 ] 

Kenneth Knowles commented on BEAM-9850:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

>  Key should be available in @OnTimer methods (Spark Runner)
> ---
>
> Key: BEAM-9850
> URL: https://issues.apache.org/jira/browse/BEAM-9850
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-java-core
>Reporter: Rehman Murad Ali
>Assignee: Rehman Murad Ali
>Priority: P2
>  Labels: stale-assigned
>
> Every timer firing has an associated key. This key should be available when 
> the timer is delivered to a user's {{DoFn}}, so they don't have to store it 
> in the state.





[jira] [Commented] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121689#comment-17121689
 ] 

Kenneth Knowles commented on BEAM-9286:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based on 
> MonitoringInfo).
>  
>  





[jira] [Commented] (BEAM-9626) pymongo should be an optional requirement

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121605#comment-17121605
 ] 

Kenneth Knowles commented on BEAM-9626:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> pymongo should be an optional requirement
> -
>
> Key: BEAM-9626
> URL: https://issues.apache.org/jira/browse/BEAM-9626
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The pymongo driver is installed by default, but as the number of IO 
> connectors in the Python SDK grows, I don't think this is the precedent we 
> want to set. We already have "extra" packages for gcp, aws, and interactive; 
> we should also add one for mongo. 
>  





[jira] [Updated] (BEAM-9682) Windowing | Go SDK Code Katas

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9682:
--
Labels: stale-assigned  (was: )

> Windowing | Go SDK Code Katas
> -
>
> Key: BEAM-9682
> URL: https://issues.apache.org/jira/browse/BEAM-9682
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Labels: stale-assigned
>
> A kata devoted to windowing patterned after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Windowing].
>  




