[jira] [Work logged] (BEAM-9147) [Java] PTransform that integrates Video Intelligence functionality

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9147?focusedWorklogId=413801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413801
 ]

ASF GitHub Bot logged work on BEAM-9147:


Author: ASF GitHub Bot
Created on: 01/Apr/20 06:12
Start Date: 01/Apr/20 06:12
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #11261: [BEAM-9147] Add a 
VideoIntelligence transform to Java SDK
URL: https://github.com/apache/beam/pull/11261#issuecomment-607055702
 
 
   @Ardagan can you take a look at this or delegate to somebody else? Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413801)
Time Spent: 50m  (was: 40m)

> [Java] PTransform that integrates Video Intelligence functionality
> --
>
> Key: BEAM-9147
> URL: https://issues.apache.org/jira/browse/BEAM-9147
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Kamil Wasilewski
>Assignee: Michał Walenia
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or the 
> video's data bytes as input.
> A module with the transform should be put into _`sdks/java/extensions`_ 
> folder.
> [1] [https://cloud.google.com/video-intelligence/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=413799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413799
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 01/Apr/20 06:05
Start Date: 01/Apr/20 06:05
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #11276: [BEAM-7923] An 
indicator of progress in notebooks
URL: https://github.com/apache/beam/pull/11276#issuecomment-607053264
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 413799)
Time Spent: 14h 10m  (was: 14h)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> This is the top-level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]].
> As development progresses, blocking tickets will be added to this one.
>  
>  





[jira] [Commented] (BEAM-9456) Upgrade to gradle 6.2

2020-03-31 Thread Alex Van Boxel (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072406#comment-17072406
 ] 

Alex Van Boxel commented on BEAM-9456:
--

it's a lot more involved than that. I already got protobuf, net.ltgt.gradle.*

[~dschmitt] Are you planning the upgrade? I'd hate to do double effort.

> Upgrade to gradle 6.2
> -
>
> Key: BEAM-9456
> URL: https://issues.apache.org/jira/browse/BEAM-9456
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
>






[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=413792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413792
 ]

ASF GitHub Bot logged work on BEAM-9008:


Author: ASF GitHub Bot
Created on: 01/Apr/20 05:37
Start Date: 01/Apr/20 05:37
Worklog Time Spent: 10m 
  Work Description: vmarquez commented on issue #10546: [BEAM-9008] Add 
CassandraIO readAll method
URL: https://github.com/apache/beam/pull/10546#issuecomment-607043571
 
 
   Hi @iemejia, I've been busy with work and life but was finally able to get 
this just about finished up!  To get around the connection issue (not caching 
the connection was causing the Cassandra tests to run for ~ten minutes!), we 
now pass a ReadAll in to the ReadFn. That lets us initiate a connection in the 
setup method while still 'dynamically' using the passed-in Read to generate 
specific queries or query ranges.  
   
   As for more advanced connection pooling, I'd prefer to get this merged in 
and then perhaps work on an additional PR.  
   
   There are some minor merge conflicts, but I wasn't sure how you wanted me to 
handle them. Do you prefer that I rebase against master (which might cause 
issues with GitHub's UI showing other folks' commits in my changelist), or 
should I merge master into this branch and then add another commit (which makes 
later squashing a bit harder)?  Advice on that would be appreciated!  Thanks 
again for your guidance and help on this. I like the new design, and the 
flexibility to modify the queries as well as ring ranges is pretty neat. 
   
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 413792)
Time Spent: 5.5h  (was: 5h 20m)

> Add readAll() method to CassandraIO
> ---
>
> Key: BEAM-9008
> URL: https://issues.apache.org/jira/browse/BEAM-9008
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-cassandra
>Affects Versions: 2.16.0
>Reporter: vincent marquez
>Assignee: vincent marquez
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> When querying a large Cassandra database, it's often *much* more useful to 
> programmatically generate the queries that need to be run rather than reading 
> all partitions and attempting some filtering.  
> As an example:
> {code:java}
> public class Event {
>     @PartitionKey(0) public UUID accountId;
>     @PartitionKey(1) public String yearMonthDay;
>     @ClusteringKey public UUID eventId;
>     // other data...
> }
> {code}
> If there are ten years' worth of data, you may want to query only one year's 
> worth.  Here each token range would represent one 'token' but all events for 
> the day. 
> {code:java}
> Set accounts = getRelevantAccounts();
> Set dateRange = generateDateRange("2018-01-01", "2019-01-01");
> PCollection tokens = generateTokens(accounts, dateRange); 
> {code}
>  
>  I propose an additional _readAll()_ PTransform that can take a PCollection 
> of token ranges and can return a PCollection of what the query would 
> return. 
> *Question: How much code should be in common between both methods?* 
> Currently the read connector already groups all partitions into a List of 
> Token Ranges, so it would be simple to refactor the current read() based 
> method to a 'ParDo' based one and have them both share the same function.  
> Reasons against sharing code between read and readAll
>  * Not having the read based method return a BoundedSource connector would 
> mean losing the ability to know the size of the data returned
>  * Currently the CassandraReader executes all the grouped TokenRange queries 
> *asynchronously* which is (maybe?) fine when all that's happening is 
> splitting up all the partition ranges but terrible for executing potentially 
> millions of queries. 
>  Reasons _for_ sharing code would be a simplified code base and that both of 
> the above issues would most likely have a negligible performance impact. 
>  
>  
>  
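A minimal sketch of the key-generation step described in the issue above, in plain Java with no Beam or Cassandra dependency. The `PartitionKeyPlanner` class, its `PartitionKey` type, `generateKeys`, and the half-open date interval are assumptions made for illustration; the issue names `generateDateRange` and `generateTokens` without defining them. In a pipeline, the resulting list would presumably be turned into the PCollection that the proposed readAll() consumes.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper: none of these names come from CassandraIO.
public class PartitionKeyPlanner {

    /** One full partition key of the Event table: (accountId, yearMonthDay). */
    public static final class PartitionKey {
        public final UUID accountId;
        public final String yearMonthDay;

        public PartitionKey(UUID accountId, String yearMonthDay) {
            this.accountId = accountId;
            this.yearMonthDay = yearMonthDay;
        }
    }

    /** Lists every day in [startInclusive, endExclusive) as an ISO yyyy-MM-dd string. */
    public static List<String> generateDateRange(String startInclusive, String endExclusive) {
        LocalDate end = LocalDate.parse(endExclusive);
        List<String> days = new ArrayList<>();
        for (LocalDate day = LocalDate.parse(startInclusive); day.isBefore(end); day = day.plusDays(1)) {
            days.add(day.toString());
        }
        return days;
    }

    /** Pairs every account with every day, yielding one key per partition to query. */
    public static List<PartitionKey> generateKeys(Set<UUID> accounts, List<String> days) {
        List<PartitionKey> keys = new ArrayList<>();
        for (UUID account : accounts) {
            for (String day : days) {
                keys.add(new PartitionKey(account, day));
            }
        }
        return keys;
    }
}
```

The number of generated keys grows as accounts times days, so the work is proportional to the slice of data actually wanted rather than to the size of the whole table.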





[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=413791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413791
 ]

ASF GitHub Bot logged work on BEAM-9008:


Author: ASF GitHub Bot
Created on: 01/Apr/20 05:36
Start Date: 01/Apr/20 05:36
Worklog Time Spent: 10m 
  Work Description: vmarquez commented on issue #10546: [BEAM-9008] Add 
CassandraIO readAll method
URL: https://github.com/apache/beam/pull/10546#issuecomment-607043571
 
 
   Hi @iemejia, I've been busy with work and life but was finally able to get 
this just about finished up!  To get around the connection issue (not caching 
the connection was causing the Cassandra tests to run for ~ten minutes!), we 
now pass a ReadAll in to the ReadFn. That lets us initiate a connection in the 
setup method while still 'dynamically' using the passed-in Read to generate 
specific queries or query ranges.  
   
   As for more advanced connection pooling, I'd prefer to get this merged in 
and then perhaps work on an additional PR.  
   
   There are some minor conflicts, but I wasn't sure how you wanted to handle 
them. Do you prefer that I rebase against master (which might cause issues 
with GitHub's UI showing other folks' commits in my changelist), or should I 
merge master into this branch and then add another commit (which makes later 
squashing a bit harder)?  Advice on that would be appreciated!  Thanks again 
for your guidance and help on this. I like the new design, and the flexibility 
to modify the queries as well as ring ranges is pretty neat. 
   
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 413791)
Time Spent: 5h 20m  (was: 5h 10m)

> Add readAll() method to CassandraIO
> ---
>
> Key: BEAM-9008
> URL: https://issues.apache.org/jira/browse/BEAM-9008
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-cassandra
>Affects Versions: 2.16.0
>Reporter: vincent marquez
>Assignee: vincent marquez
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-9636) Run Dataflow ValidatesRunner (:runners:google-cloud-dataflow-java:validatesRunnerLegacyWorkerTest) failing

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9636?focusedWorklogId=413789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413789
 ]

ASF GitHub Bot logged work on BEAM-9636:


Author: ASF GitHub Bot
Created on: 01/Apr/20 05:17
Start Date: 01/Apr/20 05:17
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #11266: [BEAM-9636] Using 
TimerFamily should also be considered as stateful.
URL: https://github.com/apache/beam/pull/11266#issuecomment-607037664
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 413789)
Time Spent: 2h 10m  (was: 2h)

> Run Dataflow ValidatesRunner 
> (:runners:google-cloud-dataflow-java:validatesRunnerLegacyWorkerTest) failing
> --
>
> Key: BEAM-9636
> URL: https://issues.apache.org/jira/browse/BEAM-9636
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Tomo Suzuki
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/4719/console
> {noformat}
> 00:27:33 > Task 
> :runners:google-cloud-dataflow-java:validatesRunnerLegacyWorkerTest
> 00:28:48 
> 00:28:48 org.apache.beam.sdk.transforms.ParDoTest$TimerFamilyTests > 
> testTimerWithMultipleTimerFamilyUnbounded FAILED
> 00:28:48 java.lang.RuntimeException at ParDoTest.java:4770
> 00:28:48 Caused by: java.lang.IllegalArgumentException at 
> ParDoTest.java:4770
> 00:42:57 
> 00:42:57 org.apache.beam.sdk.transforms.ParDoTest$TimerFamilyTests > 
> testTimerFamilyEventTimeUnbounded FAILED
> 00:42:57 java.lang.RuntimeException at ParDoTest.java:4708
> 00:42:57 Caused by: java.lang.IllegalArgumentException at 
> ParDoTest.java:4708
> {noformat}
> History: 
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/





[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=413769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413769
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 01/Apr/20 04:38
Start Date: 01/Apr/20 04:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11183: [BEAM-8889]add 
experiment flag use_grpc_for_gcs
URL: https://github.com/apache/beam/pull/11183#issuecomment-607027090
 
 
   Run JavaPortabilityApi PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 413769)
Remaining Estimate: 145h 50m  (was: 146h)
Time Spent: 22h 10m  (was: 22h)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 22h 10m
>  Remaining Estimate: 145h 50m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class for accessing Google Cloud Storage in Apache Beam. The 
> current implementation creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel directly to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java],
>  an abstract class that provides basic IO capability and eventually 
> creates the channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to 
> be more flexible because it can pick up new changes (e.g. a new channel 
> implementation using a new protocol) without code changes.
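
The delegation idea in the description above can be sketched in plain Java. This is only an illustration under stated assumptions: `ObjectStore`, `InMemoryStore`, and `StorageUtil` are hypothetical names, not the GoogleCloudStorage or GcsUtil APIs, and the in-memory store exists solely so the example runs without any cloud dependency. The point is that the utility asks an abstraction for a channel instead of constructing a concrete channel class itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class StorageDelegationSketch {

    /** Hypothetical stand-in for the storage abstraction that hands out channels. */
    public interface ObjectStore {
        ReadableByteChannel open(String path);
    }

    /** In-memory implementation used only to keep the sketch self-contained. */
    public static final class InMemoryStore implements ObjectStore {
        private final Map<String, byte[]> objects = new HashMap<>();

        public void put(String path, String contents) {
            objects.put(path, contents.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public ReadableByteChannel open(String path) {
            byte[] data = objects.get(path);
            if (data == null) {
                throw new IllegalArgumentException("No such object: " + path);
            }
            return Channels.newChannel(new ByteArrayInputStream(data));
        }
    }

    /** Utility that depends only on ObjectStore, never on a concrete channel class. */
    public static final class StorageUtil {
        private final ObjectStore store;

        public StorageUtil(ObjectStore store) {
            this.store = store;
        }

        /** Reads the whole object at the given path and decodes it as UTF-8. */
        public String readAll(String path) {
            try (ReadableByteChannel channel = store.open(path)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                ByteBuffer buffer = ByteBuffer.allocate(64);
                while (channel.read(buffer) != -1) {
                    buffer.flip();
                    out.write(buffer.array(), 0, buffer.limit());
                    buffer.clear();
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Swapping in a different `ObjectStore` implementation changes how channels are created without touching `StorageUtil`, which is the flexibility the issue is after.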





[jira] [Work logged] (BEAM-9263) Bump python sdk fnapi version to enable status reporting

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9263?focusedWorklogId=413722&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413722
 ]

ASF GitHub Bot logged work on BEAM-9263:


Author: ASF GitHub Bot
Created on: 01/Apr/20 03:29
Start Date: 01/Apr/20 03:29
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #10870: [BEAM-9263] 
Bump up sdk dataflow environment major versions
URL: https://github.com/apache/beam/pull/10870
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 413722)
Time Spent: 1h 50m  (was: 1h 40m)

> Bump python sdk fnapi version to enable status reporting
> 
>
> Key: BEAM-9263
> URL: https://issues.apache.org/jira/browse/BEAM-9263
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Affects Versions: 2.20.0
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Bump the Python SDK Fn API environment version to 8 to roll out the SDK 
> harness status reporting feature.





[jira] [Work logged] (BEAM-9608) Add context managers for FnApiRunner to manage execution of each bundle

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9608?focusedWorklogId=413718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413718
 ]

ASF GitHub Bot logged work on BEAM-9608:


Author: ASF GitHub Bot
Created on: 01/Apr/20 03:19
Start Date: 01/Apr/20 03:19
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11229: [BEAM-9608] 
Increasing scope of context managers for FnApiRunner
URL: https://github.com/apache/beam/pull/11229
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 413718)
Time Spent: 2h 50m  (was: 2h 40m)

> Add context managers for FnApiRunner to manage execution of each bundle
> ---
>
> Key: BEAM-9608
> URL: https://issues.apache.org/jira/browse/BEAM-9608
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=413696&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413696
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 01/Apr/20 02:45
Start Date: 01/Apr/20 02:45
Worklog Time Spent: 10m 
  Work Description: vnorigoog commented on issue #11183: [BEAM-8889]add 
experiment flag use_grpc_for_gcs
URL: https://github.com/apache/beam/pull/11183#issuecomment-606995193
 
 
   On Tue, Mar 31, 2020 at 7:09 PM Lukasz Cwik 
   wrote:
   
   > Did you run check task of the gradle project to cover
   > spotbugs/checkstyle/unit tests/... since test only covers testing.
   >
   I didn't.
   that explains the step I am missing.
   
   thanks
   
   > Also, I had to run the command locally after checking out your PR since
   > the full error log in Jenkins points to a local file on the Jenkins host
   > which isn't accessible.
   >
   
 



Issue Time Tracking
---

Worklog Id: (was: 413696)
Remaining Estimate: 146h  (was: 146h 10m)
Time Spent: 22h  (was: 21h 50m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 22h
>  Remaining Estimate: 146h
>





[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=413691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413691
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 01/Apr/20 02:27
Start Date: 01/Apr/20 02:27
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11183: [BEAM-8889]add 
experiment flag use_grpc_for_gcs
URL: https://github.com/apache/beam/pull/11183#issuecomment-606984664
 
 
   Did you run the `check` task of the Gradle project to cover 
spotbugs/checkstyle/unit tests/..., since `test` only covers testing?
   
   Also, I had to run the command locally after checking out your PR since the 
full error log in Jenkins points to a local file on the Jenkins host which 
isn't accessible.
 



Issue Time Tracking
---

Worklog Id: (was: 413691)
Remaining Estimate: 146h 10m  (was: 146h 20m)
Time Spent: 21h 50m  (was: 21h 40m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 21h 50m
>  Remaining Estimate: 146h 10m
>





[jira] [Work logged] (BEAM-9624) Combine operation should support only converting to accumulators

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9624?focusedWorklogId=413682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413682
 ]

ASF GitHub Bot logged work on BEAM-9624:


Author: ASF GitHub Bot
Created on: 01/Apr/20 02:18
Start Date: 01/Apr/20 02:18
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11271: [BEAM-9624] 
Adds Convert to Accumulators operator for use in combiner lifting for streaming 
pipelines
URL: https://github.com/apache/beam/pull/11271#discussion_r401318044
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/combine_test.go
 ##
 @@ -113,6 +113,40 @@ func TestLiftedCombine(t *testing.T) {
 
 }
 
+// TestConvertToaccumulators verifies that if we just do ConvertToAccumulators instead
+// of LiftedCombine before the GBK we still get the right answer.
+func TestConvertToAccumulators(t *testing.T) {
+    withCoder := func(t *testing.T, suffix string, key interface{}, keyCoder *coder.Coder) {
+        for _, test := range tests {
+            t.Run(fnName(test.Fn)+"_"+suffix, func(t *testing.T) {
+                edge := getCombineEdge(t, test.Fn, reflectx.Int, test.AccumCoder)
+
+                out := &CaptureNode{UID: 1}
+                extract := &ExtractOutput{Combine: &Combine{UID: 2, Fn: edge.CombineFn, Out: out}}
+                merge := &MergeAccumulators{Combine: &Combine{UID: 3, Fn: edge.CombineFn, Out: extract}}
+                gbk := &simpleGBK{UID: 4, KeyCoder: keyCoder, Out: merge}
+                convertToAccumulators := &ConvertToAccumulators{Combine: &Combine{UID: 5, Fn: edge.CombineFn, Out: gbk}}
 
 Review comment:
   This will validate that you get the right output but doesn't actually ensure 
that convertToAccumulators didn't do any combining.
   
   It might be worthwhile to just test the convertToAccumulators step without 
any of the other steps alongside it.
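
One way to read the suggestion above: a test of the conversion step alone can assert that the number of accumulators equals the number of inputs, a property that checking only the final combined result cannot detect. The sketch below illustrates that property in plain Java. `SimpleCombineFn`, `SUM`, and `convertToAccumulators` are hypothetical stand-ins written for this example, not the Beam Go types under review, and the per-element behaviour shown is an assumption about what the step is meant to do.

```java
import java.util.ArrayList;
import java.util.List;

public class ConvertToAccumulatorsSketch {

    /** Minimal stand-in for a combine function; hypothetical, not Beam's CombineFn. */
    public interface SimpleCombineFn<InputT, AccumT> {
        AccumT createAccumulator();

        AccumT addInput(AccumT accumulator, InputT input);
    }

    /** Sums integers into a long accumulator. */
    public static final SimpleCombineFn<Integer, Long> SUM =
        new SimpleCombineFn<Integer, Long>() {
            @Override
            public Long createAccumulator() {
                return 0L;
            }

            @Override
            public Long addInput(Long accumulator, Integer input) {
                return accumulator + input;
            }
        };

    /** Wraps each input in its own fresh accumulator and never merges two inputs. */
    public static <InputT, AccumT> List<AccumT> convertToAccumulators(
            SimpleCombineFn<InputT, AccumT> fn, List<InputT> inputs) {
        List<AccumT> accumulators = new ArrayList<>();
        for (InputT input : inputs) {
            accumulators.add(fn.addInput(fn.createAccumulator(), input));
        }
        return accumulators;
    }
}
```

A step that pre-combined its inputs would still produce the same final sum after the merge, but it would emit fewer accumulators than inputs, so a size check on the step's direct output is what distinguishes the two behaviours.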
 



Issue Time Tracking
---

Worklog Id: (was: 413682)
Time Spent: 1h 40m  (was: 1.5h)

> Combine operation should support only converting to accumulators
> 
>
> Key: BEAM-9624
> URL: https://issues.apache.org/jira/browse/BEAM-9624
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> For streaming pipelines, we want to be able to lift the combiner into the 
> MergeBuckets without having to also do a PartialGroupByKey before the 
> shuffle. We don't want to do the PGBK since it could cause non-deterministic 
> results when used with some triggers.
> We propose adding a new URN for doing just the convert to accumulators step 
> and adding support for it in Java/Python/Go.





[jira] [Work logged] (BEAM-9624) Combine operation should support only converting to accumulators

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9624?focusedWorklogId=413677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413677
 ]

ASF GitHub Bot logged work on BEAM-9624:


Author: ASF GitHub Bot
Created on: 01/Apr/20 02:14
Start Date: 01/Apr/20 02:14
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11271: [BEAM-9624] Adds 
Convert to Accumulators operator for use in combiner lifting for streaming 
pipelines
URL: https://github.com/apache/beam/pull/11271#issuecomment-606986088
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 413677)
Time Spent: 1.5h  (was: 1h 20m)

> Combine operation should support only converting to accumulators
> 
>
> Key: BEAM-9624
> URL: https://issues.apache.org/jira/browse/BEAM-9624
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=413676&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413676
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 01/Apr/20 02:09
Start Date: 01/Apr/20 02:09
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11183: [BEAM-8889]add 
experiment flag use_grpc_for_gcs
URL: https://github.com/apache/beam/pull/11183#issuecomment-606984664
 
 
   Did you run the `check` task of the Gradle project to cover 
spotbugs/checkstyle/unit tests/..., since `test` only covers testing?
   
   Also, I had to run the command locally after checking out your PR since the 
full error log in Jenkins points to a local file on the Jenkins host which 
isn't accessible.
 



Issue Time Tracking
---

Worklog Id: (was: 413676)
Remaining Estimate: 146h 20m  (was: 146.5h)
Time Spent: 21h 40m  (was: 21.5h)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 21h 40m
>  Remaining Estimate: 146h 20m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class to access Google Cloud Storage on Apache Beam. Current 
> implementation directly creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel by itself to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
>  which is an abstract class providing basic IO capability which eventually 
> creates channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to 
> be flexible because it can easily pick up new changes (e.g. a new channel 
> implementation using a new protocol) without code changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=413670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413670
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 01/Apr/20 02:00
Start Date: 01/Apr/20 02:00
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11183: [BEAM-8889]add 
experiment flag use_grpc_for_gcs
URL: https://github.com/apache/beam/pull/11183#issuecomment-606982232
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 413670)
Remaining Estimate: 146.5h  (was: 146h 40m)
Time Spent: 21.5h  (was: 21h 20m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 21.5h
>  Remaining Estimate: 146.5h
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class to access Google Cloud Storage on Apache Beam. Current 
> implementation directly creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel by itself to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
>  which is an abstract class providing basic IO capability which eventually 
> creates channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to 
> be flexible because it can easily pick up new changes (e.g. a new channel 
> implementation using a new protocol) without code changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8831) Python PreCommit Failures: Could not copy file '/some/path/file.egg'

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8831?focusedWorklogId=413663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413663
 ]

ASF GitHub Bot logged work on BEAM-8831:


Author: ASF GitHub Bot
Created on: 01/Apr/20 01:41
Start Date: 01/Apr/20 01:41
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #10230: [BEAM-8831] 
Exclude generated files for Python source copy
URL: https://github.com/apache/beam/pull/10230#issuecomment-606976144
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 



Issue Time Tracking
---

Worklog Id: (was: 413663)
Time Spent: 2h  (was: 1h 50m)

> Python PreCommit Failures: Could not copy file '/some/path/file.egg'
> 
>
> Key: BEAM-8831
> URL: https://issues.apache.org/jira/browse/BEAM-8831
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Luke Cwik
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Several precommits fail due to "Could not copy file '/some/path/file.egg'"
> Examples
> [https://scans.gradle.com/s/ihfmrxr7evslw/failure?openFailures=WzFd&openStackTraces=WzZd#top=0]
> [https://scans.gradle.com/s/ihfmrxr7evslw]
> {code}
> > Cannot create directory 
> > '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/.eggs/timeloop-1.0.2-py3.7.egg'
> >  as it already exists, but is not a directory
> {code}
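
A minimal sketch of the failure mode in the log above, written in Python rather than Gradle and not taken from the Beam build. It assumes the cause is a regular file already sitting at the path where a directory is about to be created; the ticket does not confirm this. The file name and the helper names try_make_dir and reproduce are hypothetical.

```python
import os
import tempfile


def try_make_dir(path):
    """Return None on success, or the error text if creation fails."""
    try:
        # exist_ok=True tolerates an existing directory, but still fails
        # when the existing entry is a regular file.
        os.makedirs(path, exist_ok=True)
        return None
    except FileExistsError as err:
        return str(err)


def reproduce():
    """Create a file named like an egg, then try to create a directory there."""
    with tempfile.TemporaryDirectory() as root:
        egg = os.path.join(root, "example-1.0.0-py3.7.egg")  # hypothetical name
        with open(egg, "w") as f:
            f.write("placeholder for a zipped egg")
        return try_make_dir(egg)
```

Calling reproduce() returns an error message instead of None, which mirrors "already exists, but is not a directory".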



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9624) Combine operation should support only converting to accumulators

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9624?focusedWorklogId=413645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413645
 ]

ASF GitHub Bot logged work on BEAM-9624:


Author: ASF GitHub Bot
Created on: 01/Apr/20 00:44
Start Date: 01/Apr/20 00:44
Worklog Time Spent: 10m 
  Work Description: acrites commented on issue #11271: [BEAM-9624] Adds 
Convert to Accumulators operator for use in combiner lifting for streaming 
pipelines
URL: https://github.com/apache/beam/pull/11271#issuecomment-606961902
 
 
   I fixed the problem in :sdks:java:harness:compileTestJava. It turns out my 
branch was old and createRunnerForPTransform only had 14 parameters, but now 
it's 15.
 



Issue Time Tracking
---

Worklog Id: (was: 413645)
Time Spent: 1h 20m  (was: 1h 10m)

> Combine operation should support only converting to accumulators
> 
>
> Key: BEAM-9624
> URL: https://issues.apache.org/jira/browse/BEAM-9624
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> For streaming pipelines, we want to be able to lift the combiner into the 
> MergeBuckets without having to also do a PartialGroupByKey before the 
> shuffle. We don't want to do the PGBK since it could cause non-deterministic 
> results when used with some triggers.
> We propose adding a new URN for doing just the convert to accumulators step 
> and adding support for it in Java/Python/Go.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9608) Add context managers for FnApiRunner to manage execution of each bundle

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9608?focusedWorklogId=413641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413641
 ]

ASF GitHub Bot logged work on BEAM-9608:


Author: ASF GitHub Bot
Created on: 01/Apr/20 00:40
Start Date: 01/Apr/20 00:40
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11229: [BEAM-9608] 
Increasing scope of context managers for FnApiRunner
URL: https://github.com/apache/beam/pull/11229#discussion_r401293360
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
 ##
 @@ -369,8 +366,8 @@ def _store_side_inputs_in_state(self,
   state_key = beam_fn_api_pb2.StateKey(
   iterable_side_input=beam_fn_api_pb2.StateKey.IterableSideInput(
   transform_id=transform_id, side_input_id=tag, window=window))
-  bundle_context_manager.worker_handler.state.append_raw(
-  state_key, elements_data)
+  runner_execution_context.worker_handler_manager.state_servicer\
 
 Review comment:
   Link to JIRA for reference?
 



Issue Time Tracking
---

Worklog Id: (was: 413641)
Time Spent: 2h 40m  (was: 2.5h)

> Add context managers for FnApiRunner to manage execution of each bundle
> ---
>
> Key: BEAM-9608
> URL: https://issues.apache.org/jira/browse/BEAM-9608
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=413636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413636
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 01/Apr/20 00:33
Start Date: 01/Apr/20 00:33
Worklog Time Spent: 10m 
  Work Description: vnorigoog commented on issue #11183: [BEAM-8889]add 
experiment flag use_grpc_for_gcs
URL: https://github.com/apache/beam/pull/11183#issuecomment-606959315
 
 
   That is a real problem.
   unused variable "gcsOptions"
   
   Thanks, Luke.
   
   Could you please issue "Run Java PreCommit" again?
   I am trying to understand:
   1. Why the local build didn't show this error when spotbugs ran on it.
   2. Why I didn't see on GitHub the error you saw. I should have. I wonder
   if I didn't look in the right place. Not happy with myself.
   
   
   On Tue, Mar 31, 2020 at 4:50 PM Lukasz Cwik wrote:
   
   > It looks like spotbugs is failing in the Java PreCommit with:
   >
   > Dead store to gcsOptions in 
org.apache.beam.sdk.extensions.gcp.util.GcsUtil$GcsUtilFactory.create(PipelineOptions,
 Storage, HttpRequestInitializer, ExecutorService, Integer)
   > Bug type DLS_DEAD_LOCAL_STORE (click for details)
   > In class org.apache.beam.sdk.extensions.gcp.util.GcsUtil$GcsUtilFactory
   > In method 
org.apache.beam.sdk.extensions.gcp.util.GcsUtil$GcsUtilFactory.create(PipelineOptions,
 Storage, HttpRequestInitializer, ExecutorService, Integer)
   > Local variable named gcsOptions
   > At GcsUtil.java:[line 115]
   >
   
 



Issue Time Tracking
---

Worklog Id: (was: 413636)
Remaining Estimate: 146h 40m  (was: 146h 50m)
Time Spent: 21h 20m  (was: 21h 10m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 21h 20m
>  Remaining Estimate: 146h 40m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class to access Google Cloud Storage on Apache Beam. Current 
> implementation directly creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel by itself to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
>  which is an abstract class providing basic IO capability which eventually 
> creates channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to 
> be flexible because it can easily pick up new changes (e.g. a new channel 
> implementation using a new protocol) without code changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9624) Combine operation should support only converting to accumulators

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9624?focusedWorklogId=413633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413633
 ]

ASF GitHub Bot logged work on BEAM-9624:


Author: ASF GitHub Bot
Created on: 01/Apr/20 00:22
Start Date: 01/Apr/20 00:22
Worklog Time Spent: 10m 
  Work Description: acrites commented on pull request #11271: [BEAM-9624] 
Adds Convert to Accumulators operator for use in combiner lifting for streaming 
pipelines
URL: https://github.com/apache/beam/pull/11271#discussion_r401288586
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/combine.go
 ##
 @@ -499,3 +499,30 @@ func (n *ExtractOutput) ProcessElement(ctx 
context.Context, value *FullValue, va
}
return n.Out.ProcessElement(n.Combine.ctx, &FullValue{Windows: 
value.Windows, Elm: value.Elm, Elm2: out, Timestamp: value.Timestamp})
 }
+
+// ConvertToAccumulators is an executor for converting an input value to an 
accumulator value.
+type ConvertToAccumulators struct {
+   *Combine
+}
+
+func (n *ConvertToAccumulators) String() string {
 
 Review comment:
   Talked with Robert and we came up with TestConvertToAccumulators.
 



Issue Time Tracking
---

Worklog Id: (was: 413633)
Time Spent: 1h 10m  (was: 1h)

> Combine operation should support only converting to accumulators
> 
>
> Key: BEAM-9624
> URL: https://issues.apache.org/jira/browse/BEAM-9624
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> For streaming pipelines, we want to be able to lift the combiner into the 
> MergeBuckets without having to also do a PartialGroupByKey before the 
> shuffle. We don't want to do the PGBK since it could cause non-deterministic 
> results when used with some triggers.
> We propose adding a new URN for doing just the convert to accumulators step 
> and adding support for it in Java/Python/Go.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9624) Combine operation should support only converting to accumulators

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9624?focusedWorklogId=413632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413632
 ]

ASF GitHub Bot logged work on BEAM-9624:


Author: ASF GitHub Bot
Created on: 01/Apr/20 00:21
Start Date: 01/Apr/20 00:21
Worklog Time Spent: 10m 
  Work Description: acrites commented on pull request #11271: [BEAM-9624] 
Adds Convert to Accumulators operator for use in combiner lifting for streaming 
pipelines
URL: https://github.com/apache/beam/pull/11271#discussion_r401288240
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java
 ##
 @@ -101,6 +101,8 @@
   "beam:transform:combine_per_key_merge_accumulators:v1";
   public static final String COMBINE_PER_KEY_EXTRACT_OUTPUTS_TRANSFORM_URN =
   "beam:transform:combine_per_key_extract_outputs:v1";
+  public static final String 
COMBINE_PER_KEY_CONVERT_TO_ACCUMULATORS_TRANSFORM_URN =
 
 Review comment:
   Done!
 



Issue Time Tracking
---

Worklog Id: (was: 413632)
Time Spent: 1h  (was: 50m)

> Combine operation should support only converting to accumulators
> 
>
> Key: BEAM-9624
> URL: https://issues.apache.org/jira/browse/BEAM-9624
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> For streaming pipelines, we want to be able to lift the combiner into the 
> MergeBuckets without having to also do a PartialGroupByKey before the 
> shuffle. We don't want to do the PGBK since it could cause non-deterministic 
> results when used with some triggers.
> We propose adding a new URN for doing just the convert to accumulators step 
> and adding support for it in Java/Python/Go.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=413629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413629
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 01/Apr/20 00:09
Start Date: 01/Apr/20 00:09
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11276: [BEAM-7923] 
An indicator of progress in notebooks
URL: https://github.com/apache/beam/pull/11276#discussion_r401285069
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -142,3 +142,61 @@ def obfuscate(*inputs):
   str_inputs = [str(input) for input in inputs]
   merged_inputs = '_'.join(str_inputs)
   return hashlib.md5(merged_inputs.encode('utf-8')).hexdigest()
+
+
+class ProgressIndicator(object):
+  """An indicator visualizing code execution in progress."""
+  spinner_template = """
+<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css" 
integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
 crossorigin="anonymous">
 
 Review comment:
   Added a TODO item to change the CDN. Once we create a JupyterLab 
extension, we could have the HTML+JS+CSS dependencies pre-installed, and the 
CDN would no longer be necessary.
 



Issue Time Tracking
---

Worklog Id: (was: 413629)
Time Spent: 14h  (was: 13h 50m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 14h
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=413628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413628
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 31/Mar/20 23:50
Start Date: 31/Mar/20 23:50
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11183: [BEAM-8889]add 
experiment flag use_grpc_for_gcs
URL: https://github.com/apache/beam/pull/11183#issuecomment-606946300
 
 
   It looks like spotbugs is failing in the Java PreCommit with:
   ```
   Dead store to gcsOptions in 
org.apache.beam.sdk.extensions.gcp.util.GcsUtil$GcsUtilFactory.create(PipelineOptions,
 Storage, HttpRequestInitializer, ExecutorService, Integer)
   Bug type DLS_DEAD_LOCAL_STORE (click for details)
   In class org.apache.beam.sdk.extensions.gcp.util.GcsUtil$GcsUtilFactory
   In method 
org.apache.beam.sdk.extensions.gcp.util.GcsUtil$GcsUtilFactory.create(PipelineOptions,
 Storage, HttpRequestInitializer, ExecutorService, Integer)
   Local variable named gcsOptions
   At GcsUtil.java:[line 115]
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 413628)
Remaining Estimate: 146h 50m  (was: 147h)
Time Spent: 21h 10m  (was: 21h)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 21h 10m
>  Remaining Estimate: 146h 50m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class to access Google Cloud Storage on Apache Beam. Current 
> implementation directly creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel by itself to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
>  which is an abstract class providing basic IO capability which eventually 
> creates channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to 
> be flexible because it can easily pick up new changes (e.g. a new channel 
> implementation using a new protocol) without code changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9456) Upgrade to gradle 6.2

2020-03-31 Thread dasch (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072242#comment-17072242
 ] 

dasch commented on BEAM-9456:
-

Based on a quick look, I believe this will (at least) require updating the 
spotless version to 3.24.2+ [0] and changing the build-scan plugin [1].

[0] https://github.com/android/plaid/issues/802
[1] https://docs.gradle.com/enterprise/gradle-plugin/#gradle_6_x_and_later

> Upgrade to gradle 6.2
> -
>
> Key: BEAM-9456
> URL: https://issues.apache.org/jira/browse/BEAM-9456
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=413621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413621
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 31/Mar/20 23:33
Start Date: 31/Mar/20 23:33
Worklog Time Spent: 10m 
  Work Description: lastomato commented on issue #11151: [BEAM-9468]  Hl7v2 
io
URL: https://github.com/apache/beam/pull/11151#issuecomment-606940440
 
 
   Sorry for the delay. Please see my comments inline.
   
   > Open Questions:
   > 
   > 1. Should we remove adaptive throttling?
   
   I think it is fine to remove it since the API has quota enabled by default, 
and retry logic is in place.
   
   >
   >* Seems that we're using retries in the client request initializer and 
right now a "bad record" will slow down the Read / Write (even though the error 
has nothing to do with the HL7v2 store being overwhelmed). Originally we wanted 
to be safe with overwhelming QPS on the HL7v2 store in batch scenarios.
   > 2. Should we add more to the `HealthcareIOError`?
   
   The second point below would be very helpful, but I am fine with adding it 
to the next PR (if that's easier).
   
   >
   >* Add (processing time) Timestamp?
   >* Add a convenience DoFn `HealthcareIOErrrorToTableRowFn` to ease 
writing deadletter queue to BigQuery.
   > 3. Would it be more useful to expose an error rate metric than an error 
count?
   
   Services like Stackdriver probably already provide this functionality, so 
we might want to wait until there is a concrete use case.
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 413621)
Time Spent: 8h 40m  (was: 8.5h)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=413620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413620
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 31/Mar/20 23:32
Start Date: 31/Mar/20 23:32
Worklog Time Spent: 10m 
  Work Description: lastomato commented on issue #11151: [BEAM-9468]  Hl7v2 
io
URL: https://github.com/apache/beam/pull/11151#issuecomment-606940440
 
 
   Sorry for the delay. Please see my comments inline.
   
   > Open Questions:
   > 
   > 1. Should we remove adaptive throttling?
   I think it is fine to remove it since the API has quota enabled by default, 
and retry logic is in place.
   >
   >* Seems that we're using retries in the client request initializer and 
right now a "bad record" will slow down the Read / Write (even though the error 
has nothing to do with the HL7v2 store being overwhelmed). Originally we wanted 
to be safe with overwhelming QPS on the HL7v2 store in batch scenarios.
   > 2. Should we add more to the `HealthcareIOError`?
   The second point below would be very helpful, but I am fine with adding it 
to the next PR (if that's easier).
   >
   >* Add (processing time) Timestamp?
   >* Add a convenience DoFn `HealthcareIOErrrorToTableRowFn` to ease 
writing deadletter queue to BigQuery.
   > 3. Would it be more useful to expose an error rate metric than an error 
count?
   Services like Stackdriver probably already provide this functionality, so 
we might want to wait until there is a concrete use case.
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 413620)
Time Spent: 8.5h  (was: 8h 20m)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9652) BigQueryIO MultiPartitionsWriteTables fails with ClassCastException: java.lang.Object cannot be cast to org.apache.beam.sdk.values.KV

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9652?focusedWorklogId=413614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413614
 ]

ASF GitHub Bot logged work on BEAM-9652:


Author: ASF GitHub Bot
Created on: 31/Mar/20 23:21
Start Date: 31/Mar/20 23:21
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11280: [BEAM-9652] 
Ensure that the multipartition write sets the correct coder to use instead of 
relying on coder inference.
URL: https://github.com/apache/beam/pull/11280#discussion_r401270488
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java
 ##
 @@ -399,11 +396,9 @@ public WriteResult 
expandUntriggered(PCollection> inp
 PCollection> tempTables =
 writeTempTables(partitions.get(multiPartitionsTag), 
loadJobIdPrefixView);
 
-Coder tableDestinationCoder =
-clusteringEnabled ? TableDestinationCoderV3.of() : 
TableDestinationCoderV2.of();
 tempTables
 .apply("ReifyRenameInput", new ReifyAsIterable<>())
-.setCoder(IterableCoder.of(KvCoder.of(tableDestinationCoder, 
StringUtf8Coder.of(
+.setCoder(IterableCoder.of(tempTables.getCoder()))
 
 Review comment:
   Fixed and double checked that the inputs to ReifyAsIterable are setting the 
correct coder.
 



Issue Time Tracking
---

Worklog Id: (was: 413614)
Time Spent: 40m  (was: 0.5h)

> BigQueryIO MultiPartitionsWriteTables fails with ClassCastException: 
> java.lang.Object cannot be cast to org.apache.beam.sdk.values.KV
> -
>
> Key: BEAM-9652
> URL: https://issues.apache.org/jira/browse/BEAM-9652
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It looks like coder inference fails for BatchLoads.writeTempTables and 
> selects an Avro coder:
> {code:java}
> object_value: <
>   type: "org.apache.beam.sdk.coders.AvroCoder"
>   parameters: <
>     name: "type"
>     value: <
>       string_value: "java.lang.Object"
>     >
>   >
>   parameters: <
>     name: "schema"
>     value: <
>       string_value: "{\"type\":\"record\",\"name\":\"Object\",\"namespace\":\"java.lang\",\"fields\":[]}"
>     >
>   >
> {code}
> Full exception:
> {code:java}
> exception: "java.lang.ClassCastException: java.lang.Object cannot be cast to org.apache.beam.sdk.values.KV
>   at org.apache.beam.sdk.coders.KvCoder.registerByteSizeObserver(KvCoder.java:36)
>   at org.apache.beam.sdk.coders.IterableLikeCoder.registerByteSizeObserver(IterableLikeCoder.java:191)
>   at org.apache.beam.sdk.coders.IterableLikeCoder.registerByteSizeObserver(IterableLikeCoder.java:60)
>   at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:623)
>   at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:539)
>   at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:400)
> {code}





[jira] [Work logged] (BEAM-9199) Make --region a required flag for DataflowRunner

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9199?focusedWorklogId=413613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413613
 ]

ASF GitHub Bot logged work on BEAM-9199:


Author: ASF GitHub Bot
Created on: 31/Mar/20 23:20
Start Date: 31/Mar/20 23:20
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11281: [BEAM-9199] 
Require --region option for Dataflow in Java SDK.
URL: https://github.com/apache/beam/pull/11281
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-9263) Bump python sdk fnapi version to enable status reporting

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9263?focusedWorklogId=413604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413604
 ]

ASF GitHub Bot logged work on BEAM-9263:


Author: ASF GitHub Bot
Created on: 31/Mar/20 23:10
Start Date: 31/Mar/20 23:10
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #10870: [BEAM-9263] Bump up 
sdk dataflow environment major versions
URL: https://github.com/apache/beam/pull/10870#issuecomment-606932628
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 413604)
Time Spent: 1h 40m  (was: 1.5h)

> Bump python sdk fnapi version to enable status reporting
> 
>
> Key: BEAM-9263
> URL: https://issues.apache.org/jira/browse/BEAM-9263
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Affects Versions: 2.20.0
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Bump the Python SDK Fn API environment version to 8 to roll out the status 
> feature for SDK harness status reporting.





[jira] [Work logged] (BEAM-9652) BigQueryIO MultiPartitionsWriteTables fails with ClassCastException: java.lang.Object cannot be cast to org.apache.beam.sdk.values.KV

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9652?focusedWorklogId=413603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413603
 ]

ASF GitHub Bot logged work on BEAM-9652:


Author: ASF GitHub Bot
Created on: 31/Mar/20 23:09
Start Date: 31/Mar/20 23:09
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #11280: [BEAM-9652] 
Ensure that the multipartition write sets the correct coder to use instead of 
relying on coder inference.
URL: https://github.com/apache/beam/pull/11280#discussion_r401266788
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java
 ##
 @@ -399,11 +396,9 @@ public WriteResult expandUntriggered(PCollection<KV<DestinationT, ElementT>> inp
      PCollection<KV<TableDestination, String>> tempTables =
          writeTempTables(partitions.get(multiPartitionsTag), loadJobIdPrefixView);
 
-    Coder<TableDestination> tableDestinationCoder =
-        clusteringEnabled ? TableDestinationCoderV3.of() : TableDestinationCoderV2.of();
      tempTables
          .apply("ReifyRenameInput", new ReifyAsIterable<>())
-        .setCoder(IterableCoder.of(KvCoder.of(tableDestinationCoder, StringUtf8Coder.of())))
+        .setCoder(IterableCoder.of(tempTables.getCoder()))
 
 Review comment:
   I think this setCoder belongs inside the ReifyAsIterable PTransform
 



Issue Time Tracking
---

Worklog Id: (was: 413603)
Time Spent: 0.5h  (was: 20m)

> BigQueryIO MultiPartitionsWriteTables fails with ClassCastException: 
> java.lang.Object cannot be cast to org.apache.beam.sdk.values.KV
> -
>
> Key: BEAM-9652
> URL: https://issues.apache.org/jira/browse/BEAM-9652
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-9652) BigQueryIO MultiPartitionsWriteTables fails with ClassCastException: java.lang.Object cannot be cast to org.apache.beam.sdk.values.KV

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9652?focusedWorklogId=413600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413600
 ]

ASF GitHub Bot logged work on BEAM-9652:


Author: ASF GitHub Bot
Created on: 31/Mar/20 23:02
Start Date: 31/Mar/20 23:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11280: [BEAM-9652] Ensure 
that the multipartition write sets the correct coder to use instead of relying 
on coder inference.
URL: https://github.com/apache/beam/pull/11280#issuecomment-606929469
 
 
   R: @reuvenlax 
 



Issue Time Tracking
---

Worklog Id: (was: 413600)
Time Spent: 20m  (was: 10m)

> BigQueryIO MultiPartitionsWriteTables fails with ClassCastException: 
> java.lang.Object cannot be cast to org.apache.beam.sdk.values.KV
> -
>
> Key: BEAM-9652
> URL: https://issues.apache.org/jira/browse/BEAM-9652
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-9652) BigQueryIO MultiPartitionsWriteTables fails with ClassCastException: java.lang.Object cannot be cast to org.apache.beam.sdk.values.KV

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9652?focusedWorklogId=413599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413599
 ]

ASF GitHub Bot logged work on BEAM-9652:


Author: ASF GitHub Bot
Created on: 31/Mar/20 23:01
Start Date: 31/Mar/20 23:01
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11280: [BEAM-9652] 
Ensure that the multipartition write sets the correct coder to use instead of 
relying on coder inference.
URL: https://github.com/apache/beam/pull/11280
 
 
   
   
   

[jira] [Work logged] (BEAM-3097) Allow BigQuerySource to take a ValueProvider as a table input.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3097?focusedWorklogId=413593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413593
 ]

ASF GitHub Bot logged work on BEAM-3097:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:50
Start Date: 31/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11244: [BEAM-3097] 
_ReadFromBigQuery supports valueprovider for table
URL: https://github.com/apache/beam/pull/11244#issuecomment-606925872
 
 
   r: @chamikaramj 
 



Issue Time Tracking
---

Worklog Id: (was: 413593)
Remaining Estimate: 1h 10m  (was: 1h 20m)
Time Spent: 50m  (was: 40m)

> Allow BigQuerySource to take a ValueProvider as a table input.
> --
>
> Key: BEAM-3097
> URL: https://issues.apache.org/jira/browse/BEAM-3097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ed Mothershaw
>Priority: Minor
>   Original Estimate: 2h
>  Time Spent: 50m
>  Remaining Estimate: 1h 10m
>
> In file sdks/python/apache_beam/io/gcp/bigquery.py, class BigQuery, line 389: 
> when a ValueProvider is passed as the table, the script fails.





[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=413592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413592
 ]

ASF GitHub Bot logged work on BEAM-9577:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:49
Start Date: 31/Mar/20 22:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11203: [BEAM-9577] 
Define and implement dependency-aware artifact staging service.
URL: https://github.com/apache/beam/pull/11203
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 413592)
Time Spent: 7.5h  (was: 7h 20m)

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=413591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413591
 ]

ASF GitHub Bot logged work on BEAM-9577:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:49
Start Date: 31/Mar/20 22:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #11203: [BEAM-9577] Define 
and implement dependency-aware artifact staging service.
URL: https://github.com/apache/beam/pull/11203#issuecomment-606925303
 
 
   The Java PreCommit has a different (unrelated) flake each time it's run. 
 



Issue Time Tracking
---

Worklog Id: (was: 413591)
Time Spent: 7h 20m  (was: 7h 10m)

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=413578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413578
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:32
Start Date: 31/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11276: [BEAM-7923] An 
indicator of progress in notebooks
URL: https://github.com/apache/beam/pull/11276#discussion_r401254017
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -142,3 +142,61 @@ def obfuscate(*inputs):
   str_inputs = [str(input) for input in inputs]
   merged_inputs = '_'.join(str_inputs)
   return hashlib.md5(merged_inputs.encode('utf-8')).hexdigest()
+
+
+class ProgressIndicator(object):
+  """An indicator visualizing code execution in progress."""
+  spinner_template = """
+<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css" integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh" crossorigin="anonymous">
 
 Review comment:
   This will ping a third-party server from the user's environment. I do not 
think this is great, but if we do not have a better option, that is fine.
 



Issue Time Tracking
---

Worklog Id: (was: 413578)
Time Spent: 13h 50m  (was: 13h 40m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> This is the top-level ticket for all efforts leveraging [interactive 
> Beam|https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive].
> As development proceeds, blocking tickets will be added to this one.
>  
>  





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=413574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413574
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:31
Start Date: 31/Mar/20 22:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies for Python
URL: https://github.com/apache/beam/pull/11067#discussion_r401176311
 
 

 ##
 File path: sdks/python/container/license_scripts/pull_licenses_py.py
 ##
 @@ -0,0 +1,143 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+A script to pull licenses for Python.
+The script is executed within Docker.
+"""
+import json
+import os
+import shutil
+import subprocess
+import sys
+import tempfile
+import traceback
+import yaml
+
+from future.moves.urllib.request import urlopen
+from tenacity import retry
+from tenacity import stop_after_attempt
+from tenacity import wait_exponential
+
+LICENSE_DIR = '/opt/apache/beam/third_party_licenses'
+
+
+def run_bash_command(command):
+  return subprocess.check_output(command.split()).decode('utf-8')
+
+
+def run_pip_licenses():
+  command = 'pip-licenses --with-license-file --format=json'
+  dependencies = run_bash_command(command)
+  return json.loads(dependencies)
+
+
+@retry(stop=stop_after_attempt(3))
+def copy_license_files(dep):
+  source_license_file = dep['LicenseFile']
+  if source_license_file.lower() == 'unknown':
+    return False
+  name = dep['Name'].lower()
+  dest_dir = '/'.join([LICENSE_DIR, name])
+  try:
+    os.mkdir(dest_dir)
+    shutil.copy(source_license_file, dest_dir + '/LICENSE')
+    print(
+        'Successfully pulled license for {dep} with pip-licenses.'.format(
+            dep=name))
+    return True
+  except Exception as e:
 
 Review comment:
   Since you are catching the exception, tenacity will not know about it.
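
   To illustrate the point of this comment, here is a minimal sketch that uses
   only the standard library. The `retry_on_exception` decorator is a
   hypothetical stand-in for a retry decorator such as tenacity's `retry`; it
   is not tenacity's implementation, and it assumes the common behavior that a
   retry is triggered only when the wrapped call raises. The function names and
   the `calls` counter are made up for the illustration.

```python
import functools


def retry_on_exception(max_attempts):
  """Hypothetical stand-in for a retry decorator: retries only when the call raises."""
  def decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
      for attempt in range(1, max_attempts + 1):
        try:
          return func(*args, **kwargs)
        except Exception:
          # Give up and surface the error once the attempts are used up.
          if attempt == max_attempts:
            raise
    return wrapper
  return decorator


calls = {'swallow': 0, 'propagate': 0}


@retry_on_exception(max_attempts=3)
def copy_swallowing_errors():
  calls['swallow'] += 1
  try:
    raise IOError('simulated copy failure')
  except Exception:
    # The error is handled here, so the decorator sees a normal return (None).
    return None


@retry_on_exception(max_attempts=3)
def copy_propagating_errors():
  calls['propagate'] += 1
  # Nothing catches the error inside the function, so the decorator sees it.
  raise IOError('simulated copy failure')


copy_swallowing_errors()
try:
  copy_propagating_errors()
except IOError:
  pass

print(calls)
```

   The function that swallows its error runs once, while the one that lets the
   error propagate runs three times. Logging the failure and then re-raising
   would keep both the log output and the retry.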
 



Issue Time Tracking
---

Worklog Id: (was: 413574)
Time Spent: 15h  (was: 14h 50m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=413575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413575
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:31
Start Date: 31/Mar/20 22:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies for Python
URL: https://github.com/apache/beam/pull/11067#discussion_r401240289
 
 

 ##
 File path: sdks/python/container/license_scripts/pull_licenses_py.py
 ##
 @@ -0,0 +1,143 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+A script to pull licenses for Python.
+The script is executed within Docker.
+"""
+import json
+import os
+import shutil
+import subprocess
+import sys
+import tempfile
+import traceback
+import yaml
+
+from future.moves.urllib.request import urlopen
+from tenacity import retry
+from tenacity import stop_after_attempt
+from tenacity import wait_exponential
+
+LICENSE_DIR = '/opt/apache/beam/third_party_licenses'
+
+
+def run_bash_command(command):
+  return subprocess.check_output(command.split()).decode('utf-8')
+
+
+def run_pip_licenses():
+  command = 'pip-licenses --with-license-file --format=json'
+  dependencies = run_bash_command(command)
+  return json.loads(dependencies)
+
+
+@retry(stop=stop_after_attempt(3))
+def copy_license_files(dep):
+  source_license_file = dep['LicenseFile']
+  if source_license_file.lower() == 'unknown':
+    return False
+  name = dep['Name'].lower()
+  dest_dir = '/'.join([LICENSE_DIR, name])
+  try:
+    os.mkdir(dest_dir)
+    shutil.copy(source_license_file, dest_dir + '/LICENSE')
+    print(
+        'Successfully pulled license for {dep} with pip-licenses.'.format(
+            dep=name))
+    return True
+  except Exception as e:
+    print(
+        'Failed to copy from {source} to {dest}'.format(
+            source=source_license_file, dest=dest_dir + '/LICENSE'))
+    traceback.print_exc()
+
+
+@retry(wait=wait_exponential(multiplier=1, min=30, max=180))
+def pull_from_url(dep, configs):
+  '''
+  :param dep: name of a dependency
+  :param configs: a dict from dep_urls_py.yaml
+  :return: boolean
+
+  It downloads files form urls to a temp directory first in order to avoid
+  to deal with any temp files. It helps keep clean final directory.
+  '''
+  if dep in configs:
+    config = configs[dep]
+    dest_dir = '/'.join([LICENSE_DIR, dep])
+    cur_temp_dir = tempfile.mkdtemp()
+
+    try:
+      if config['license'] == 'skip':
+        print('Skip pulling license for ', dep)
+      else:
+        url_read = urlopen(config['license'])
+        with open(cur_temp_dir + '/LICENSE', 'wb') as temp_write:
+          shutil.copyfileobj(url_read, temp_write, length=1 << 20)
+        print(
+            'Successfully pulled license for {dep} from {url}.'.format(
+                dep=dep, url=config['license']))
+
+      # notice is optional.
+      if 'notice' in config:
+        url_read = urlopen(config['notice'])
+        with open(cur_temp_dir + '/NOTICE', 'wb') as temp_write:
+          shutil.copyfileobj(url_read, temp_write, length=1 << 20)
+
+      shutil.copytree(cur_temp_dir, dest_dir)
+      shutil.rmtree(cur_temp_dir)
+      return True
+    except Exception as e:
+      shutil.rmtree(cur_temp_dir)
 
 Review comment:
   You can move 
   ```
 shutil.rmtree(cur_temp_dir)
   ```
   to a `finally` block.
   Also, the same comment applies here: you are catching all exceptions, so 
tenacity will not retry.
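
A minimal sketch of this suggestion, using only the standard library. `retry_n` is a hypothetical stand-in for tenacity's `@retry`, and `pull_to_temp` and its `fetch` callback are illustrative names, not part of the PR. The point it shows: cleanup lives in `finally`, and there is no broad `except`, so the failure propagates to the retry wrapper.

```python
import os
import shutil
import tempfile


def retry_n(attempts):
  """Hypothetical stand-in for tenacity's @retry: re-invokes on any exception."""
  def decorator(fn):
    def wrapper(*args, **kwargs):
      for attempt in range(1, attempts + 1):
        try:
          return fn(*args, **kwargs)
        except Exception:
          if attempt == attempts:
            raise  # out of attempts: surface the original error
    return wrapper
  return decorator


calls = []  # records every temp dir created, one per attempt


@retry_n(3)
def pull_to_temp(fetch):
  """Runs fetch(temp_dir); the temp dir is removed whether or not it fails."""
  cur_temp_dir = tempfile.mkdtemp()
  calls.append(cur_temp_dir)
  try:
    fetch(cur_temp_dir)  # may raise; nothing swallows it, so the wrapper retries
    return True
  finally:
    shutil.rmtree(cur_temp_dir)
```

With this shape, a transient failure in `fetch` triggers another attempt, and no temp directory is left behind by any attempt.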
 



Issue Time Tracking
---

Worklog Id: (was: 413575)
Time Spent: 15h  (was: 14h 50m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: http

[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=413577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413577
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:31
Start Date: 31/Mar/20 22:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies for Python
URL: https://github.com/apache/beam/pull/11067#discussion_r401249019
 
 

 ##
 File path: sdks/python/container/license_scripts/pull_licenses_py.py
 ##
 @@ -0,0 +1,143 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+A script to pull licenses for Python.
+The script is executed within Docker.
+"""
+import json
+import os
+import shutil
+import subprocess
+import sys
+import tempfile
+import traceback
+import yaml
+
+from future.moves.urllib.request import urlopen
+from tenacity import retry
+from tenacity import stop_after_attempt
+from tenacity import wait_exponential
+
+LICENSE_DIR = '/opt/apache/beam/third_party_licenses'
+
+
+def run_bash_command(command):
+  return subprocess.check_output(command.split()).decode('utf-8')
+
+
+def run_pip_licenses():
+  command = 'pip-licenses --with-license-file --format=json'
+  dependencies = run_bash_command(command)
+  return json.loads(dependencies)
+
+
+@retry(stop=stop_after_attempt(3))
+def copy_license_files(dep):
+  source_license_file = dep['LicenseFile']
+  if source_license_file.lower() == 'unknown':
+    return False
+  name = dep['Name'].lower()
+  dest_dir = '/'.join([LICENSE_DIR, name])
+  try:
+    os.mkdir(dest_dir)
+    shutil.copy(source_license_file, dest_dir + '/LICENSE')
+    print(
+        'Successfully pulled license for {dep} with pip-licenses.'.format(
+            dep=name))
+    return True
+  except Exception as e:
+    print(
+        'Failed to copy from {source} to {dest}'.format(
+            source=source_license_file, dest=dest_dir + '/LICENSE'))
+    traceback.print_exc()
+
+
+@retry(wait=wait_exponential(multiplier=1, min=30, max=180))
+def pull_from_url(dep, configs):
+  '''
+  :param dep: name of a dependency
+  :param configs: a dict from dep_urls_py.yaml
+  :return: boolean
+
+  It downloads files form urls to a temp directory first in order to avoid
+  to deal with any temp files. It helps keep clean final directory.
+  '''
+  if dep in configs:
+    config = configs[dep]
+    dest_dir = '/'.join([LICENSE_DIR, dep])
+    cur_temp_dir = tempfile.mkdtemp()
+
+    try:
+      if config['license'] == 'skip':
+        print('Skip pulling license for ', dep)
+      else:
+        url_read = urlopen(config['license'])
+        with open(cur_temp_dir + '/LICENSE', 'wb') as temp_write:
+          shutil.copyfileobj(url_read, temp_write, length=1 << 20)
 
 Review comment:
   FYI, `length` is an optional argument (the buffer size). I hope licenses are 
much smaller than 1 MB.
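
To illustrate the point about `length`: it only sets the chunk size used while copying, not a cap on how much is copied. A small sketch with in-memory streams (`copy_stream` and the payload are illustrative, not from the PR):

```python
import io
import shutil


def copy_stream(src, dst, buffer_size=1 << 20):
  """Copies all of src into dst, reading at most buffer_size bytes per chunk."""
  shutil.copyfileobj(src, dst, length=buffer_size)


# Stand-in for a license file that is larger than the buffer used below.
payload = b'x' * 10000
out = io.BytesIO()
copy_stream(io.BytesIO(payload), out, buffer_size=4096)
copied = out.getvalue()  # the full payload, copied in several chunks
```

Omitting `length` would behave the same for small files, since the default buffer is already large enough.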
 



Issue Time Tracking
---

Worklog Id: (was: 413577)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=413573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413573
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:31
Start Date: 31/Mar/20 22:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies for Python
URL: https://github.com/apache/beam/pull/11067#discussion_r400642519
 
 

 ##
 File path: sdks/python/container/license_scripts/pull_licenses_py.py
 ##
 @@ -63,11 +61,13 @@ def copy_license_files(dep):
 dep=name))
 return True
   except Exception as e:
-print(e)
-return False
+print(
+'Failed to copy from {source} to {dest}'.format(
+source=source_license_file, dest=dest_dir + '/LICENSE'))
+traceback.print_exc()
 
 
-@retry(stop=stop_after_attempt(3))
+@retry(wait=wait_exponential(multiplier=1, min=30, max=180))
 
 Review comment:
   What would the retry intervals be with this spec?
   I think it's going to be max(2, 30) = 30, max(4, 30) = 30, ..., max(32, 30) 
= 32, then 64, 128, then 180, 180, 180, ... indefinitely.
   
   I would make the initial retry faster, then increase the delay, and give 
up after a few attempts. Perhaps somebody made a typo, so it's better to find 
out sooner rather than later. 
   
   WDYT about:
   `@retry(reraise=True, wait=wait_exponential(multiplier=2), 
stop=stop_after_attempt(5))`
   This should translate to: 4s, 8s, 16s, 32s -> timeout.
   `reraise=True` re-raises the original exception with its stack trace instead 
of throwing `RetryError`.
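
The arithmetic above can be checked with a short sketch. It assumes, as the numbers in this comment do, that the raw wait after the n-th failed attempt is `multiplier * 2**n`, clamped to `[min, max]`; the exact exponent used by a given tenacity release is not verified here. `wait_schedule` is a hypothetical helper, not a tenacity API.

```python
def wait_schedule(multiplier, attempts, min_wait=0, max_wait=float('inf')):
  """Waits after failed attempts 1..attempts.

  Assumption: wait = multiplier * 2**n, clamped to [min_wait, max_wait].
  """
  waits = []
  for n in range(1, attempts + 1):
    raw = multiplier * 2 ** n
    waits.append(max(min_wait, min(raw, max_wait)))
  return waits


# Spec in the PR: multiplier=1, min=30, max=180, no stop condition.
current = wait_schedule(multiplier=1, attempts=9, min_wait=30, max_wait=180)
# Proposed spec: multiplier=2 with five attempts, so four waits in between.
proposed = wait_schedule(multiplier=2, attempts=4)
```

Under that assumption the current spec waits 30s four times before it starts growing, and never stops, while the proposal fails within about a minute.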
 



Issue Time Tracking
---

Worklog Id: (was: 413573)
Time Spent: 14h 50m  (was: 14h 40m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=413576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413576
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:31
Start Date: 31/Mar/20 22:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies for Python
URL: https://github.com/apache/beam/pull/11067#discussion_r401244854
 
 

 ##
 File path: sdks/python/container/Dockerfile
 ##
 @@ -51,16 +51,20 @@ RUN ln -s /usr/bin/ccache /usr/local/bin/gcc
 RUN ccache --set-config=sloppiness=file_macro && ccache 
--set-config=hash_dir=false
 
 COPY target/apache-beam.tar.gz /opt/apache/beam/tars/
+ADD target/license_scripts /tmp/license_scripts/
 
 Review comment:
   COPY is preferred over ADD. Can we remove the usages of ADD, so that readers 
don't have to wonder why we sometimes use COPY and sometimes ADD?
   See: 
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#add-or-copy
 



Issue Time Tracking
---

Worklog Id: (was: 413576)
Time Spent: 15h 10m  (was: 15h)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=413572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413572
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:30
Start Date: 31/Mar/20 22:30
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11276: [BEAM-7923] 
An indicator of progress in notebooks
URL: https://github.com/apache/beam/pull/11276#discussion_r401251743
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -142,3 +142,61 @@ def obfuscate(*inputs):
   str_inputs = [str(input) for input in inputs]
   merged_inputs = '_'.join(str_inputs)
   return hashlib.md5(merged_inputs.encode('utf-8')).hexdigest()
+
+
+class ProgressIndicator(object):
+  """An indicator visualizing code execution in progress."""
+  spinner_template = """
+https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"; 
integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
 crossorigin="anonymous">
+
+"""
+  spinner_removal_template = """
+$("#{id}").remove();"""
+
+  def __init__(self, enter_text, exit_text):
+    # type: (str, str) -> None
+
+    self._id = 'progress_indicator_{}'.format(obfuscate(id(self)))
 
 Review comment:
   `obfuscate` is defined within the same module.
   It's a way to disguise your backend logic from the frontend, basically a 
hash. It's common to see obfuscation in JavaScript.
   Here we obfuscate the id of a Python object in the kernel and use it as part 
of the id of a div element in the frontend.
   Here is some additional reading about obfuscation: 
https://medium.com/nodesimplified/obfuscation-what-is-obfuscation-in-javascript-why-obfuscation-is-used-f6a5f5bcf022
   
   We also use the hex digest so that the final id is always usable as a valid 
selector by jQuery and plain JavaScript.
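
A small sketch of why the hex digest keeps the id selector-friendly. `obfuscate` mirrors the helper shown in the diff; `Indicator` is a hypothetical stand-in for `ProgressIndicator`, reduced to the id logic.

```python
import hashlib
import re


def obfuscate(*inputs):
  """Same shape as the helper in the diff: an MD5 hex digest of the inputs."""
  str_inputs = [str(i) for i in inputs]
  merged_inputs = '_'.join(str_inputs)
  return hashlib.md5(merged_inputs.encode('utf-8')).hexdigest()


class Indicator(object):  # hypothetical stand-in for ProgressIndicator
  def __init__(self):
    self._id = 'progress_indicator_{}'.format(obfuscate(id(self)))


element_id = Indicator()._id
selector = '#' + element_id  # what $("#{id}") receives after formatting
# The id starts with a letter and contains only [a-z0-9_], so the selector
# needs no escaping.
is_simple = re.match(r'^progress_indicator_[0-9a-f]{32}$', element_id) is not None
```

The raw `id(self)` would also be a plain integer, so the hash is about hiding the kernel-side value rather than about escaping.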
 



Issue Time Tracking
---

Worklog Id: (was: 413572)
Time Spent: 13h 40m  (was: 13.5h)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  





[jira] [Work logged] (BEAM-9263) Bump python sdk fnapi version to enable status reporting

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9263?focusedWorklogId=413571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413571
 ]

ASF GitHub Bot logged work on BEAM-9263:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:28
Start Date: 31/Mar/20 22:28
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #10870: [BEAM-9263] Bump up 
sdk dataflow environment major versions
URL: https://github.com/apache/beam/pull/10870#issuecomment-606918153
 
 
   Retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 413571)
Time Spent: 1.5h  (was: 1h 20m)

> Bump python sdk fnapi version to enable status reporting
> 
>
> Key: BEAM-9263
> URL: https://issues.apache.org/jira/browse/BEAM-9263
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Affects Versions: 2.20.0
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Bump python sdk fn api environment version to 8 for roll out the status 
> feature for sdk harness status reporting.





[jira] [Work logged] (BEAM-9653) remove _AvroSource in favor of using _FastAvroSource

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9653?focusedWorklogId=413566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413566
 ]

ASF GitHub Bot logged work on BEAM-9653:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:26
Start Date: 31/Mar/20 22:26
Worklog Time Spent: 10m 
  Work Description: blcksrx commented on pull request #11279: [BEAM-9653] 
remove _AvroSource in favor of using _FastAvroSource
URL: https://github.com/apache/beam/pull/11279
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=413568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413568
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:26
Start Date: 31/Mar/20 22:26
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11276: [BEAM-7923] 
An indicator of progress in notebooks
URL: https://github.com/apache/beam/pull/11276#discussion_r401251743
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -142,3 +142,61 @@ def obfuscate(*inputs):
   str_inputs = [str(input) for input in inputs]
   merged_inputs = '_'.join(str_inputs)
   return hashlib.md5(merged_inputs.encode('utf-8')).hexdigest()
+
+
+class ProgressIndicator(object):
+  """An indicator visualizing code execution in progress."""
+  spinner_template = """
+https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"; 
integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
 crossorigin="anonymous">
+
+"""
+  spinner_removal_template = """
+$("#{id}").remove();"""
+
+  def __init__(self, enter_text, exit_text):
+    # type: (str, str) -> None
+
+    self._id = 'progress_indicator_{}'.format(obfuscate(id(self)))
 
 Review comment:
   `obfuscate` is defined within the same module.
   It's a way to disguise your backend logic from the frontend, basically a 
hash. It's common to see obfuscation in JavaScript.
   Here we obfuscate the id of a Python object in the kernel and use it as part 
of the id of a div element in the frontend.
   Here is some additional reading about obfuscation: 
https://medium.com/nodesimplified/obfuscation-what-is-obfuscation-in-javascript-why-obfuscation-is-used-f6a5f5bcf022
 



Issue Time Tracking
---

Worklog Id: (was: 413568)
Time Spent: 13.5h  (was: 13h 20m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  





[jira] [Work logged] (BEAM-9653) remove _AvroSource in favor of using _FastAvroSource

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9653?focusedWorklogId=413567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413567
 ]

ASF GitHub Bot logged work on BEAM-9653:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:26
Start Date: 31/Mar/20 22:26
Worklog Time Spent: 10m 
  Work Description: blcksrx commented on issue #11279: [BEAM-9653] remove 
_AvroSource in favor of using _FastAvroSource
URL: https://github.com/apache/beam/pull/11279#issuecomment-606917166
 
 
   R: @mxm
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 413567)
Time Spent: 20m  (was: 10m)

>  remove _AvroSource in favor of using _FastAvroSource
> -
>
> Key: BEAM-9653
> URL: https://issues.apache.org/jira/browse/BEAM-9653
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-avro
>Affects Versions: 2.19.0
>Reporter: Sayed Mohammad Hossein Torabi
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=413563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413563
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:23
Start Date: 31/Mar/20 22:23
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11276: [BEAM-7923] 
An indicator of progress in notebooks
URL: https://github.com/apache/beam/pull/11276#discussion_r401250464
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -142,3 +142,61 @@ def obfuscate(*inputs):
   str_inputs = [str(input) for input in inputs]
   merged_inputs = '_'.join(str_inputs)
   return hashlib.md5(merged_inputs.encode('utf-8')).hexdigest()
+
+
+class ProgressIndicator(object):
+  """An indicator visualizing code execution in progress."""
+  spinner_template = """
+https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"; 
integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
 crossorigin="anonymous">
 
 Review comment:
   Yes, it's just a CSS file. We've been using the same CSS file in the 
IPythonLogHandler too. Bootstrap is widely known and used, so it's very 
safe.
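
For context on the `integrity="sha384-..."` attribute in the template: it pins the stylesheet to a digest, so a browser that supports Subresource Integrity refuses the file if the CDN serves different bytes. A sketch of how such a value is derived and checked (the CSS bytes are a placeholder, and `sri_sha384` and `matches_integrity` are illustrative helpers):

```python
import base64
import hashlib


def sri_sha384(content):
  """Returns an SRI-style value: 'sha384-' + base64 of the SHA-384 digest."""
  digest = hashlib.sha384(content).digest()
  return 'sha384-' + base64.b64encode(digest).decode('ascii')


def matches_integrity(content, integrity):
  """True if content hashes to the pinned integrity value."""
  return sri_sha384(content) == integrity


css = b'.spinner { display: block; }'  # placeholder, not the Bootstrap file
pinned = sri_sha384(css)
```

Any change to the served bytes, even a single trailing space, yields a different value and fails the check.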
 



Issue Time Tracking
---

Worklog Id: (was: 413563)
Time Spent: 13h 20m  (was: 13h 10m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  





[jira] [Created] (BEAM-9653) remove _AvroSource in favor of using _FastAvroSource

2020-03-31 Thread Sayed Mohammad Hossein Torabi (Jira)
Sayed Mohammad Hossein Torabi created BEAM-9653:
---

 Summary:  remove _AvroSource in favor of using _FastAvroSource
 Key: BEAM-9653
 URL: https://issues.apache.org/jira/browse/BEAM-9653
 Project: Beam
  Issue Type: Improvement
  Components: io-py-avro
Affects Versions: 2.19.0
Reporter: Sayed Mohammad Hossein Torabi








[jira] [Work logged] (BEAM-9636) Run Dataflow ValidatesRunner (:runners:google-cloud-dataflow-java:validatesRunnerLegacyWorkerTest) failing

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9636?focusedWorklogId=413561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413561
 ]

ASF GitHub Bot logged work on BEAM-9636:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:20
Start Date: 31/Mar/20 22:20
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #11266: [BEAM-9636] Using 
TimerFamily should also be considered as stateful.
URL: https://github.com/apache/beam/pull/11266#issuecomment-606908499
 
 
   cc: @robertwb 
 



Issue Time Tracking
---

Worklog Id: (was: 413561)
Time Spent: 2h  (was: 1h 50m)

> Run Dataflow ValidatesRunner 
> (:runners:google-cloud-dataflow-java:validatesRunnerLegacyWorkerTest) failing
> --
>
> Key: BEAM-9636
> URL: https://issues.apache.org/jira/browse/BEAM-9636
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Tomo Suzuki
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/4719/console
> {noformat}
> 00:27:33 > Task 
> :runners:google-cloud-dataflow-java:validatesRunnerLegacyWorkerTest
> 00:28:48 
> 00:28:48 org.apache.beam.sdk.transforms.ParDoTest$TimerFamilyTests > 
> testTimerWithMultipleTimerFamilyUnbounded FAILED
> 00:28:48 java.lang.RuntimeException at ParDoTest.java:4770
> 00:28:48 Caused by: java.lang.IllegalArgumentException at 
> ParDoTest.java:4770
> 00:42:57 
> 00:42:57 org.apache.beam.sdk.transforms.ParDoTest$TimerFamilyTests > 
> testTimerFamilyEventTimeUnbounded FAILED
> 00:42:57 java.lang.RuntimeException at ParDoTest.java:4708
> 00:42:57 Caused by: java.lang.IllegalArgumentException at 
> ParDoTest.java:4708
> {noformat}
> History: 
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=413560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413560
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:17
Start Date: 31/Mar/20 22:17
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11276: [BEAM-7923] An 
indicator of progress in notebooks
URL: https://github.com/apache/beam/pull/11276#discussion_r401248341
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -142,3 +142,61 @@ def obfuscate(*inputs):
   str_inputs = [str(input) for input in inputs]
   merged_inputs = '_'.join(str_inputs)
   return hashlib.md5(merged_inputs.encode('utf-8')).hexdigest()
+
+
+class ProgressIndicator(object):
+  """An indicator visualizing code execution in progress."""
+  spinner_template = """
+https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"; 
integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
 crossorigin="anonymous">
+
+"""
+  spinner_removal_template = """
+$("#{id}").remove();"""
+
+  def __init__(self, enter_text, exit_text):
+    # type: (str, str) -> None
+
+    self._id = 'progress_indicator_{}'.format(obfuscate(id(self)))
 
 Review comment:
   What is `obfuscate`, and why obfuscate the id?
 



Issue Time Tracking
---

Worklog Id: (was: 413560)
Time Spent: 13h 10m  (was: 13h)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=413559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413559
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 31/Mar/20 22:17
Start Date: 31/Mar/20 22:17
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11276: [BEAM-7923] An 
indicator of progress in notebooks
URL: https://github.com/apache/beam/pull/11276#discussion_r401248323
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/utils.py
 ##
 @@ -142,3 +142,61 @@ def obfuscate(*inputs):
   str_inputs = [str(input) for input in inputs]
   merged_inputs = '_'.join(str_inputs)
   return hashlib.md5(merged_inputs.encode('utf-8')).hexdigest()
+
+
+class ProgressIndicator(object):
+  """An indicator visualizing code execution in progress."""
+  spinner_template = """
+<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css" integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh" crossorigin="anonymous">
 
 Review comment:
   Is it fine to link to a 3rd party domain here?
 



Issue Time Tracking
---

Worklog Id: (was: 413559)
Time Spent: 13h  (was: 12h 50m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9652) BigQueryIO MultiPartitionsWriteTables fails with ClassCastException: java.lang.Object cannot be cast to org.apache.beam.sdk.values.KV

2020-03-31 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-9652:
---

 Summary: BigQueryIO MultiPartitionsWriteTables fails with 
ClassCastException: java.lang.Object cannot be cast to 
org.apache.beam.sdk.values.KV
 Key: BEAM-9652
 URL: https://issues.apache.org/jira/browse/BEAM-9652
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Affects Versions: 2.19.0
Reporter: Luke Cwik
Assignee: Luke Cwik
 Fix For: 2.21.0


It looks like coder inference fails for BatchLoad.writeTempTables and 
selects an Avro coder:

{code:java}
object_value: <
  type: "org.apache.beam.sdk.coders.AvroCoder"
  parameters: <
    name: "type"
    value: <
      string_value: "java.lang.Object"
    >
  >
  parameters: <
    name: "schema"
    value: <
      string_value: "{\"type\":\"record\",\"name\":\"Object\",\"namespace\":\"java.lang\",\"fields\":[]}"
    >
  >
{code}


Full exception:

{code:java}
exception: "java.lang.ClassCastException: java.lang.Object cannot be cast to org.apache.beam.sdk.values.KV
 at org.apache.beam.sdk.coders.KvCoder.registerByteSizeObserver(KvCoder.java:36)
 at org.apache.beam.sdk.coders.IterableLikeCoder.registerByteSizeObserver(IterableLikeCoder.java:191)
 at org.apache.beam.sdk.coders.IterableLikeCoder.registerByteSizeObserver(IterableLikeCoder.java:60)
 at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:623)
 at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:539)
 at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:400)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-03-31 Thread Sam Whittle (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072207#comment-17072207
 ] 

Sam Whittle commented on BEAM-9651:
---

This is different from the other issue because we are not making the call from 
a grpc thread.  The executor for grpc client callbacks is unlimited, so those 
callbacks blocking doesn't explain why the rpc never becomes ready.
The rpc may be unready because the channel is either not accepting or has too 
much data queued.  It is unclear why that happened, but we should time out when 
waiting (at the stream deadline, for example) instead of blocking forever.
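
The idea of waiting with a deadline instead of blocking forever can be sketched as follows. This is an illustrative Python analogue only, not the Dataflow worker's Java code; `wait_until_ready`, `ready_event` and `deadline_seconds` are hypothetical names introduced for the example.

```python
import threading


def wait_until_ready(ready_event, deadline_seconds):
  """Waits for readiness, but gives up once the deadline has passed."""
  # Event.wait returns False if the timeout elapsed before the event was set.
  if not ready_event.wait(timeout=deadline_seconds):
    raise TimeoutError(
        'stream not ready after {} seconds'.format(deadline_seconds))


# Simulates an rpc that never becomes ready: the wait fails fast instead of
# parking the calling thread indefinitely.
never_ready = threading.Event()
try:
  wait_until_ready(never_ready, 0.05)
  timed_out = False
except TimeoutError:
  timed_out = True

# Simulates an rpc that is already ready: the wait returns immediately.
ready = threading.Event()
ready.set()
wait_until_ready(ready, 0.05)
```

With a bounded wait, a caller that holds a lock while starting a stream would surface an error rather than block every other thread that needs the same lock.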

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>
> Operation ongoing in step  for at least 28h10m00s without outputting or completing in state windmill-read
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
> at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
> at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
> at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
> at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
> at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
> at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
> at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
> at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
> at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
> at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
> at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
> at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
> at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
> at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
> at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
> at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
> at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=413535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413535
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 31/Mar/20 21:45
Start Date: 31/Mar/20 21:45
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #11183: [BEAM-8889]add 
experiment flag use_grpc_for_gcs
URL: https://github.com/apache/beam/pull/11183#issuecomment-606893670
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 413535)
Remaining Estimate: 147h  (was: 147h 10m)
Time Spent: 21h  (was: 20h 50m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 21h
>  Remaining Estimate: 147h
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is the primary class for accessing Google Cloud Storage in Apache Beam. The 
> current implementation creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel directly to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
>  which is an abstract class providing basic IO capability and eventually 
> creates the channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to be 
> more flexible because it can pick up new changes, e.g. a new channel 
> implementation using a new protocol, without code changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=413527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413527
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 31/Mar/20 21:26
Start Date: 31/Mar/20 21:26
Worklog Time Spent: 10m 
  Work Description: jaketf commented on issue #11151: [BEAM-9468]  Hl7v2 io
URL: https://github.com/apache/beam/pull/11151#issuecomment-603980265
 
 
   Open Questions:
   1. Should we remove adaptive throttling?
   - It seems that we're using retries in the client request initializer, and 
right now a "bad record" will slow down the Read / Write (even though the error 
has nothing to do with the HL7v2 store being overwhelmed). Originally we wanted 
to be safe about overwhelming the HL7v2 store with QPS in batch scenarios.
   1. Should we add more to the `HealthcareIOError`?
   - Add a (processing time) Timestamp?
   - Add a convenience DoFn `HealthcareIOErrorToTableRowFn` to ease 
writing a dead-letter queue to BigQuery.
   1. Would it be more useful to expose an error rate metric than an error 
count?
 



Issue Time Tracking
---

Worklog Id: (was: 413527)
Time Spent: 8h 20m  (was: 8h 10m)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=413526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413526
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 31/Mar/20 21:25
Start Date: 31/Mar/20 21:25
Worklog Time Spent: 10m 
  Work Description: jaketf commented on issue #11151: [BEAM-9468]  Hl7v2 io
URL: https://github.com/apache/beam/pull/11151#issuecomment-604655037
 
 
   OK, an update here from an internal thread with the API team.
   
   1. Message.List returning message contents is available in the beta API with 
the view parameter.
   1. Schematized data should be in the next beta release, in roughly 2 weeks.
   1. Right now the sink is outputting schematized data JSON wrapped in 
"{data=}" 
   
   In light of these I will do the following refactors: 
   1. [x] Refactor how we batch read to always avoid the double get. This will 
make it a completely parallel code path to the real-time path, but I think 
that's OK.
   1. [x] Refactor to use the beta client library (once it includes schematizedData)
   - FYI, pre-work for the v1beta1 migration is in this branch: 
[migration/v1beta1](https://github.com/jaketf/beam/tree/migration/v1beta1). 
I will merge it back to this branch once we have schematizedData in the beta 
client library.
   1. [x] I'll strip out that `{data=}` wrapper to make this easier for users.
 



Issue Time Tracking
---

Worklog Id: (was: 413526)
Time Spent: 8h 10m  (was: 8h)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9340) Properly populate pipeline proto requirements.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9340?focusedWorklogId=413516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413516
 ]

ASF GitHub Bot logged work on BEAM-9340:


Author: ASF GitHub Bot
Created on: 31/Mar/20 21:04
Start Date: 31/Mar/20 21:04
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11277: [BEAM-9340] 
Populate requirement for timer families.
URL: https://github.com/apache/beam/pull/11277#issuecomment-606874607
 
 
   CC: @boyuanzz 
   
   This looks like a version that covers more than #11266
 



Issue Time Tracking
---

Worklog Id: (was: 413516)
Time Spent: 5h 20m  (was: 5h 10m)

> Properly populate pipeline proto requirements.
> --
>
> Key: BEAM-9340
> URL: https://issues.apache.org/jira/browse/BEAM-9340
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-go, sdk-java-core, sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=413515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413515
 ]

ASF GitHub Bot logged work on BEAM-9577:


Author: ASF GitHub Bot
Created on: 31/Mar/20 21:00
Start Date: 31/Mar/20 21:00
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #11203: [BEAM-9577] Define 
and implement dependency-aware artifact staging service.
URL: https://github.com/apache/beam/pull/11203#issuecomment-606872418
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 413515)
Time Spent: 7h 10m  (was: 7h)

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=413510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413510
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 31/Mar/20 20:52
Start Date: 31/Mar/20 20:52
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #11276: [BEAM-7923] An 
indicator of progress in notebooks
URL: https://github.com/apache/beam/pull/11276#issuecomment-606868917
 
 
   `yapf` formatted.
   `lint` passed locally.
   
   R: @aaltay 
   R: @davidyan74 
   R: @rohdesamuel 
   
   PTAL, thx!
 



Issue Time Tracking
---

Worklog Id: (was: 413510)
Time Spent: 12h 50m  (was: 12h 40m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]]
> As the development goes, blocking tickets will be added to this one.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9340) Properly populate pipeline proto requirements.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9340?focusedWorklogId=413511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413511
 ]

ASF GitHub Bot logged work on BEAM-9340:


Author: ASF GitHub Bot
Created on: 31/Mar/20 20:52
Start Date: 31/Mar/20 20:52
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #11277: [BEAM-9340] 
Populate requirement for timer families.
URL: https://github.com/apache/beam/pull/11277#issuecomment-606869063
 
 
   We should consider consolidating timers and timer families at a lower level, 
at lower priority than getting the protos in shape. 
 



Issue Time Tracking
---

Worklog Id: (was: 413511)
Time Spent: 5h 10m  (was: 5h)

> Properly populate pipeline proto requirements.
> --
>
> Key: BEAM-9340
> URL: https://issues.apache.org/jira/browse/BEAM-9340
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-go, sdk-java-core, sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9340) Properly populate pipeline proto requirements.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9340?focusedWorklogId=413509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413509
 ]

ASF GitHub Bot logged work on BEAM-9340:


Author: ASF GitHub Bot
Created on: 31/Mar/20 20:50
Start Date: 31/Mar/20 20:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11277: [BEAM-9340] 
Populate requirement for timer families.
URL: https://github.com/apache/beam/pull/11277
 
 
   
   
   

[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=413506&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413506
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 31/Mar/20 20:44
Start Date: 31/Mar/20 20:44
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11254: 
[BEAM-7961] Refactors X-Lang test pipelines.
URL: https://github.com/apache/beam/pull/11254
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 413506)
Time Spent: 24h 40m  (was: 24.5h)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 24h 40m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=413505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413505
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 31/Mar/20 20:43
Start Date: 31/Mar/20 20:43
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #11254: [BEAM-7961] 
Refactors X-Lang test pipelines.
URL: https://github.com/apache/beam/pull/11254#issuecomment-606864502
 
 
   Thanks.
 



Issue Time Tracking
---

Worklog Id: (was: 413505)
Time Spent: 24.5h  (was: 24h 20m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 24.5h
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=413503&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413503
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 31/Mar/20 20:31
Start Date: 31/Mar/20 20:31
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11276: [BEAM-7923] 
An indicator of progress in notebooks
URL: https://github.com/apache/beam/pull/11276
 
 
   1. The problem: when an intentionally blocking call such as `show`, `collect`
  or `head` is invoked, the user sometimes doesn't realize the code is
  still running and blocking. Also, the `*` prompts in notebooks are
  easily overlooked and sometimes buggy. We need an obvious progress
  indicator to tell the user that the data/pipeline is being processed
  and potentially expose more metrics about running pipelines through
  the mechanism.
   2. Added a ProgressIndicator class that functions as a context manager
  to display/remove a spinner on entering/exiting a blocking code
  execution.
   3. Added a progress_indicated decorator to make any callable use the
  ProgressIndicator.
   4. When in notebooks, a spinner is displayed. When in ipython but
  without frontends, text will be displayed on entering/exiting.
  Otherwise, NOOP.
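
The context-manager and decorator pattern described in points 2 to 4 can be sketched roughly as below. This is a text-only illustration under stated assumptions, not the code in the PR: it only prints on entering and exiting (no spinner, no environment detection), and the strings `'Processing...'` and `'Done.'` as well as `collect_example` are hypothetical.

```python
import functools


class ProgressIndicator(object):
  """Text-only sketch: prints on entering and exiting a blocking section."""

  def __init__(self, enter_text, exit_text):
    self._enter_text = enter_text
    self._exit_text = exit_text

  def __enter__(self):
    print(self._enter_text)
    return self

  def __exit__(self, exc_type, exc_value, traceback):
    print(self._exit_text)
    return False  # Do not swallow exceptions raised inside the block.


def progress_indicated(func):
  """Wraps a callable so that it runs inside a ProgressIndicator."""

  @functools.wraps(func)
  def wrapper(*args, **kwargs):
    # Hypothetical enter/exit texts chosen for this example.
    with ProgressIndicator('Processing...', 'Done.'):
      return func(*args, **kwargs)

  return wrapper


@progress_indicated
def collect_example(values):
  # Hypothetical blocking call standing in for show/collect/head.
  return sorted(values)


result = collect_example([3, 1, 2])
```

Because the indicator is a context manager, the exit text is printed even when the wrapped call raises, which matches the intent of removing the indicator once the blocking call ends.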
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRu

[jira] [Resolved] (BEAM-9325) UnownedOutputStream not overriding Array write method.

2020-03-31 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-9325.
-
Resolution: Fixed

> UnownedOutputStream not overriding Array write method.
> --
>
> Key: BEAM-9325
> URL: https://issues.apache.org/jira/browse/BEAM-9325
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Kyoungha Min
>Assignee: Kyoungha Min
>Priority: Major
> Fix For: 2.21.0
>
>   Original Estimate: 1m
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> org.apache.beam.sdk.util.UnownedOutputStream does not override the method
> `public void write(byte b[], int off, int len) throws IOException`,
> resulting in extremely slow writes.
> This is because the inherited `java.io.FilterOutputStream` implementation writes the array one byte at a time.
>  
>  
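The slowdown described above can be reproduced in isolation. The following is a minimal sketch, not the actual Beam patch: `CountingStream` and `BulkForwardingStream` are hypothetical names introduced only for illustration. It counts how often each `write` overload reaches the underlying stream, first through a plain `java.io.FilterOutputStream` and then through a subclass that overrides the array overload.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BulkWriteSketch {
  /** Hypothetical sink that counts which write overload is called on it. */
  static class CountingStream extends ByteArrayOutputStream {
    int singleByteWrites = 0;
    int arrayWrites = 0;

    @Override
    public synchronized void write(int b) {
      singleByteWrites++;
      super.write(b);
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) {
      arrayWrites++;
      super.write(b, off, len);
    }
  }

  /** Hypothetical wrapper that forwards array writes to the delegate in one call. */
  static class BulkForwardingStream extends FilterOutputStream {
    BulkForwardingStream(OutputStream out) {
      super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
      // Without this override, FilterOutputStream loops and calls write(int) per byte.
      out.write(b, off, len);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[1024];

    CountingStream viaDefault = new CountingStream();
    new FilterOutputStream(viaDefault).write(data, 0, data.length);

    CountingStream viaOverride = new CountingStream();
    new BulkForwardingStream(viaOverride).write(data, 0, data.length);

    System.out.println("default:  single=" + viaDefault.singleByteWrites
        + " array=" + viaDefault.arrayWrites);
    System.out.println("override: single=" + viaOverride.singleByteWrites
        + " array=" + viaOverride.arrayWrites);
  }
}
```

With the plain wrapper the sink is reached once per byte; with the override it is reached once for the whole array, which is where the speedup would come from when the delegate has any per-call cost.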



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9325) UnownedOutputStream not overriding Array write method.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9325?focusedWorklogId=413493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413493
 ]

ASF GitHub Bot logged work on BEAM-9325:


Author: ASF GitHub Bot
Created on: 31/Mar/20 20:07
Start Date: 31/Mar/20 20:07
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11263: [BEAM-9325] 
Override proper write method in UnownedOutputStream
URL: https://github.com/apache/beam/pull/11263
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413493)
Time Spent: 4h 20m  (was: 4h 10m)

> UnownedOutputStream not overriding Array write method.
> --
>
> Key: BEAM-9325
> URL: https://issues.apache.org/jira/browse/BEAM-9325
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Kyoungha Min
>Assignee: Kyoungha Min
>Priority: Major
> Fix For: 2.21.0
>
>   Original Estimate: 1m
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> org.apache.beam.sdk.util.UnownedOutputStream does not override the method
> `public void write(byte b[], int off, int len) throws IOException`,
> resulting in extremely slow writes.
> This is because the inherited `java.io.FilterOutputStream` implementation writes the array one byte at a time.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=413491&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413491
 ]

ASF GitHub Bot logged work on BEAM-9577:


Author: ASF GitHub Bot
Created on: 31/Mar/20 20:04
Start Date: 31/Mar/20 20:04
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #11203: [BEAM-9577] Define 
and implement dependency-aware artifact staging service.
URL: https://github.com/apache/beam/pull/11203#issuecomment-606845984
 
 
   Run JavaPortabilityApi PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413491)
Time Spent: 7h  (was: 6h 50m)

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=413490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413490
 ]

ASF GitHub Bot logged work on BEAM-9577:


Author: ASF GitHub Bot
Created on: 31/Mar/20 20:04
Start Date: 31/Mar/20 20:04
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #11203: [BEAM-9577] Define 
and implement dependency-aware artifact staging service.
URL: https://github.com/apache/beam/pull/11203#issuecomment-606845864
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413490)
Time Spent: 6h 50m  (was: 6h 40m)

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9399) Possible deadlock between DataflowWorkerLoggingHandler and overridden System.err PrintStream

2020-03-31 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-9399.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

> Possible deadlock between DataflowWorkerLoggingHandler and overridden 
> System.err PrintStream
> 
>
> Key: BEAM-9399
> URL: https://issues.apache.org/jira/browse/BEAM-9399
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> When an exception is encountered in DataflowWorkerLoggingHandler the 
> ErrorManager is used to log the exception.  ErrorManager uses System.err 
> which is overridden to be a PrintStream that writes back into 
> DataflowWorkerLoggingHandler.
> This has the lock ordering DataflowWorkerLoggingHandler -> PrintStream.
> Other logging of System.err has the inverse lock ordering 
> PrintStream->DataflowWorkerLoggingHandler so there is potential for deadlock.
> This is one known cause of the inversion, but any other System.err logs from 
> inside DataflowWorkerLoggingHandler could cause the same issue.
> Proposed fix is to address low-hanging fruit of having ErrorManager output to 
> the original System.err.  A full fix would be to improve our override of 
> System.err to a PrintStream that can detect the locking inversion or possibly 
> we could use the PrintStream mutex in both cases.
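The "low-hanging fruit" part of the proposal, having the ErrorManager write to the original System.err, can be sketched as follows. This is an illustration under stated assumptions, not the change that was merged: `FixedStreamErrorManager` and `FailingHandler` are hypothetical names, and the sketch only shows how error output is routed away from the redirected stream; it does not model the locks themselves.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.logging.ErrorManager;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

public class OriginalStderrErrorManagerSketch {
  /** Hypothetical ErrorManager that reports to a stream captured before redirection. */
  static class FixedStreamErrorManager extends ErrorManager {
    private final PrintStream target;

    FixedStreamErrorManager(PrintStream target) {
      this.target = target;
    }

    @Override
    public synchronized void error(String msg, Exception ex, int code) {
      // Writes to the captured stream, never to whatever System.err is at call time.
      target.println("logging failure (code " + code + "): " + msg);
      if (ex != null) {
        ex.printStackTrace(target);
      }
    }
  }

  /** Hypothetical handler that always fails, to exercise the error path. */
  static class FailingHandler extends Handler {
    @Override
    public void publish(LogRecord record) {
      reportError("publish failed", new IllegalStateException("boom"),
          ErrorManager.WRITE_FAILURE);
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
  }

  public static void main(String[] args) {
    PrintStream original = System.err; // capture before System.err is overridden
    ByteArrayOutputStream redirected = new ByteArrayOutputStream();

    FailingHandler handler = new FailingHandler();
    handler.setErrorManager(new FixedStreamErrorManager(original));

    System.setErr(new PrintStream(redirected, true));
    try {
      handler.publish(new LogRecord(Level.INFO, "hello"));
    } finally {
      System.setErr(original);
    }
    System.out.println("bytes written to redirected System.err: " + redirected.size());
  }
}
```

Because the error path never touches the redirected stream, it cannot re-enter the handler through System.err, which removes this one source of the DataflowWorkerLoggingHandler -> PrintStream ordering. Other System.err writes from inside the handler would still need the fuller fix described above.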



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9399) Possible deadlock between DataflowWorkerLoggingHandler and overridden System.err PrintStream

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9399?focusedWorklogId=413484&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413484
 ]

ASF GitHub Bot logged work on BEAM-9399:


Author: ASF GitHub Bot
Created on: 31/Mar/20 19:53
Start Date: 31/Mar/20 19:53
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11096: [BEAM-9399] 
Change the redirection of System.err to be a custom PrintStream
URL: https://github.com/apache/beam/pull/11096
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413484)
Time Spent: 5.5h  (was: 5h 20m)

> Possible deadlock between DataflowWorkerLoggingHandler and overridden 
> System.err PrintStream
> 
>
> Key: BEAM-9399
> URL: https://issues.apache.org/jira/browse/BEAM-9399
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> When an exception is encountered in DataflowWorkerLoggingHandler the 
> ErrorManager is used to log the exception.  ErrorManager uses System.err 
> which is overridden to be a PrintStream that writes back into 
> DataflowWorkerLoggingHandler.
> This has the lock ordering DataflowWorkerLoggingHandler -> PrintStream.
> Other logging of System.err has the inverse lock ordering 
> PrintStream->DataflowWorkerLoggingHandler so there is potential for deadlock.
> This is one known cause of the inversion, but any other System.err logs from 
> inside DataflowWorkerLoggingHandler could cause the same issue.
> Proposed fix is to address low-hanging fruit of having ErrorManager output to 
> the original System.err.  A full fix would be to improve our override of 
> System.err to a PrintStream that can detect the locking inversion or possibly 
> we could use the PrintStream mutex in both cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9399) Possible deadlock between DataflowWorkerLoggingHandler and overridden System.err PrintStream

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9399?focusedWorklogId=413483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413483
 ]

ASF GitHub Bot logged work on BEAM-9399:


Author: ASF GitHub Bot
Created on: 31/Mar/20 19:52
Start Date: 31/Mar/20 19:52
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11096: [BEAM-9399] Change 
the redirection of System.err to be a custom PrintStream
URL: https://github.com/apache/beam/pull/11096#issuecomment-606839742
 
 
   https://builds.apache.org/job/beam_PreCommit_Java_Commit/10616/ passed, 
merging.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413483)
Time Spent: 5h 20m  (was: 5h 10m)

> Possible deadlock between DataflowWorkerLoggingHandler and overridden 
> System.err PrintStream
> 
>
> Key: BEAM-9399
> URL: https://issues.apache.org/jira/browse/BEAM-9399
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> When an exception is encountered in DataflowWorkerLoggingHandler the 
> ErrorManager is used to log the exception.  ErrorManager uses System.err 
> which is overridden to be a PrintStream that writes back into 
> DataflowWorkerLoggingHandler.
> This has the lock ordering DataflowWorkerLoggingHandler -> PrintStream.
> Other logging of System.err has the inverse lock ordering 
> PrintStream->DataflowWorkerLoggingHandler so there is potential for deadlock.
> This is one known cause of the inversion, but any other System.err logs from 
> inside DataflowWorkerLoggingHandler could cause the same issue.
> Proposed fix is to address low-hanging fruit of having ErrorManager output to 
> the original System.err.  A full fix would be to improve our override of 
> System.err to a PrintStream that can detect the locking inversion or possibly 
> we could use the PrintStream mutex in both cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-03-31 Thread Sam Whittle (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072114#comment-17072114
 ] 

Sam Whittle commented on BEAM-9651:
---

This looks similar to BEAM-4280 regarding the fnapi DirectObserver class which 
now times out if the phase is not reached to avoid deadlock.

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>
> Operation ongoing in step  for at least 28h10m00s without outputting or completing in state windmill-read
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
>   at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
>   at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
>   at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>   at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.
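The comment above mentions that the fnapi observer now times out instead of waiting on the phase forever. A bounded wait of that kind can be sketched with the timed overload of `Phaser.awaitAdvanceInterruptibly`. This is a minimal illustration, not the DirectStreamObserver code; the helper name `awaitPhase` and the timeout values are assumptions chosen for the example.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PhaserTimeoutSketch {
  /**
   * Hypothetical helper: waits for the phaser to advance past the given phase.
   * Returns true if it advanced within the timeout, false if the wait timed out.
   */
  static boolean awaitPhase(Phaser phaser, int phase, long timeoutMillis)
      throws InterruptedException {
    try {
      phaser.awaitAdvanceInterruptibly(phase, timeoutMillis, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      // The caller can now log, retry, or fail instead of parking indefinitely.
      return false;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Phaser phaser = new Phaser(1); // one registered party that has not arrived yet
    int phase = phaser.getPhase();

    System.out.println("advanced before arrival: " + awaitPhase(phaser, phase, 100));

    phaser.arrive(); // the single party arrives, so the phase advances
    System.out.println("advanced after arrival: " + awaitPhase(phaser, phase, 100));
  }
}
```

A bounded wait does not explain why the phase was never reached, but it would keep a thread holding the StreamPool lock from blocking every other thread indefinitely.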



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7699) Remove duplicated null checks in Row.equals

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7699?focusedWorklogId=413478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413478
 ]

ASF GitHub Bot logged work on BEAM-7699:


Author: ASF GitHub Bot
Created on: 31/Mar/20 19:41
Start Date: 31/Mar/20 19:41
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #9009: [BEAM-7699] 
Propagate null check/comparison to deepEquals
URL: https://github.com/apache/beam/pull/9009#issuecomment-606833632
 
 
   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413478)
Time Spent: 2h 50m  (was: 2h 40m)

> Remove duplicated null checks in Row.equals
> ---
>
> Key: BEAM-7699
> URL: https://issues.apache.org/jira/browse/BEAM-7699
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Naheon Kim
>Assignee: Nicholas Got
>Priority: Trivial
>  Labels: easyfix, starter
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> After [BEAM-5866|https://issues.apache.org/jira/browse/BEAM-5866] and 
> [BEAM-7125|https://issues.apache.org/jira/browse/BEAM-7125] Row has 
> duplicated null value checks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7699) Remove duplicated null checks in Row.equals

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7699?focusedWorklogId=413479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413479
 ]

ASF GitHub Bot logged work on BEAM-7699:


Author: ASF GitHub Bot
Created on: 31/Mar/20 19:41
Start Date: 31/Mar/20 19:41
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #9009: [BEAM-7699] 
Propagate null check/comparison to deepEquals
URL: https://github.com/apache/beam/pull/9009
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413479)
Time Spent: 3h  (was: 2h 50m)

> Remove duplicated null checks in Row.equals
> ---
>
> Key: BEAM-7699
> URL: https://issues.apache.org/jira/browse/BEAM-7699
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Naheon Kim
>Assignee: Nicholas Got
>Priority: Trivial
>  Labels: easyfix, starter
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> After [BEAM-5866|https://issues.apache.org/jira/browse/BEAM-5866] and 
> [BEAM-7125|https://issues.apache.org/jira/browse/BEAM-7125] Row has 
> duplicated null value checks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-03-31 Thread Sam Whittle (Jira)
Sam Whittle created BEAM-9651:
-

 Summary: StreamingDataflowWorker stuck waiting for 
org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
 Key: BEAM-9651
 URL: https://issues.apache.org/jira/browse/BEAM-9651
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Reporter: Sam Whittle
Assignee: Sam Whittle


Operation ongoing in step  for at least 28h10m00s without outputting or completing in state windmill-read
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
  at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
  at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
  at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
  at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
  at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
  at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
  at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
  at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
  at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
  at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
  at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
  at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
  at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
  at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
  at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
  at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
  at


Because the stream is started in a StreamPool synchronized block, all other 
threads interacting with StreamPool to get or release streams end up blocking.

It is unclear if the stream never became usable and thus blocked forever or if 
there is a race with the use of the Phaser that causes the stuckness.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor

2020-03-31 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072103#comment-17072103
 ] 

Maximilian Michels commented on BEAM-8944:
--

Essentially, yes. This is especially a concern for latency because Flink has 
to hold back in-flight elements while performing the checkpoint alignment of 
the operators. It appears the alignment is off due to the Python bundles not 
completing in time. Not sure why that is the case. We use the load-balancing 
feature of DefaultJobBundleFactory, where we use 16 Python environments and 
round-robin across them.

> Python SDK harness performance degradation with UnboundedThreadPoolExecutor
> ---
>
> Key: BEAM-8944
> URL: https://issues.apache.org/jira/browse/BEAM-8944
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.18.0
>Reporter: Yichi Zhang
>Priority: Critical
> Fix For: 2.18.0
>
> Attachments: profiling.png, profiling_one_thread.png, 
> profiling_twelve_threads.png
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We are seeing a performance degradation for python streaming word count load 
> tests.
>  
> After some investigation, it appears to be caused by swapping the original 
> ThreadPoolExecutor to UnboundedThreadPoolExecutor in sdk worker. Suspicion is 
> that python performance is worse with more threads on cpu-bounded tasks.
>  
> A simple test for comparing the multiple thread pool executor performance:
>  
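> {code:python}
> import queue
> import time
> from concurrent import futures
>
> # Import path assumed; adjust if UnboundedThreadPoolExecutor lives elsewhere.
> from apache_beam.utils.thread_pool_executor import UnboundedThreadPoolExecutor
>
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>
>     def task(number):
>       # Small CPU-bound unit of work that feeds the queue again.
>       hash(number)
>       q.put(number + 200)
>       return number
>
>     t = time.time()
>     count = 0
>     # Seed the queue so the submit loop never starves.
>     for i in range(200):
>       q.put(i)
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code}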
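> Results (seconds; the executor object names were lost in extraction, so the
> rows are listed in the order the executors are run in the test):
>  UnboundedThreadPoolExecutor (object at 0x7fab400dbe50): 268.160675049
>  ThreadPoolExecutor, max_workers=1: 79.904583931
>  ThreadPoolExecutor, max_workers=12: 191.179054976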
> Profiling:
> UnboundedThreadPoolExecutor:
>  !profiling.png! 
> 1 Thread ThreadPoolExecutor:
>  !profiling_one_thread.png! 
> 12 Threads ThreadPoolExecutor:
>  !profiling_twelve_threads.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9325) UnownedOutputStream not overriding Array write method.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9325?focusedWorklogId=413468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413468
 ]

ASF GitHub Bot logged work on BEAM-9325:


Author: ASF GitHub Bot
Created on: 31/Mar/20 19:18
Start Date: 31/Mar/20 19:18
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #11263: [BEAM-9325] Override 
proper write method in UnownedOutputStream
URL: https://github.com/apache/beam/pull/11263#issuecomment-606822025
 
 
   > I wish I found it in a fancier way, but I just found it by luck.
   
   Curious, this looks like something that static analyzers could catch 
'easily'.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413468)
Time Spent: 4h 10m  (was: 4h)

> UnownedOutputStream not overriding Array write method.
> --
>
> Key: BEAM-9325
> URL: https://issues.apache.org/jira/browse/BEAM-9325
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Kyoungha Min
>Assignee: Kyoungha Min
>Priority: Major
> Fix For: 2.21.0
>
>   Original Estimate: 1m
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> org.apache.beam.sdk.util.UnownedOutputStream does not override the method
> `public void write(byte b[], int off, int len) throws IOException`,
> resulting in extremely slow writes.
> This is because the inherited `java.io.FilterOutputStream` implementation writes the array one byte at a time.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9325) UnownedOutputStream not overriding Array write method.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9325?focusedWorklogId=413466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413466
 ]

ASF GitHub Bot logged work on BEAM-9325:


Author: ASF GitHub Bot
Created on: 31/Mar/20 19:17
Start Date: 31/Mar/20 19:17
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #11263: [BEAM-9325] Override 
proper write method in UnownedOutputStream
URL: https://github.com/apache/beam/pull/11263#issuecomment-606821403
 
 
   Mmm seems tests are not running on this one, weird.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413466)
Time Spent: 3h 50m  (was: 3h 40m)

> UnownedOutputStream not overriding Array write method.
> --
>
> Key: BEAM-9325
> URL: https://issues.apache.org/jira/browse/BEAM-9325
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Kyoungha Min
>Assignee: Kyoungha Min
>Priority: Major
> Fix For: 2.21.0
>
>   Original Estimate: 1m
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> org.apache.beam.sdk.util.UnownedOutputStream does not override the method
> `public void write(byte b[], int off, int len) throws IOException`,
> resulting in extremely slow writes.
> This is because the inherited `java.io.FilterOutputStream` implementation writes the array one byte at a time.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9325) UnownedOutputStream not overriding Array write method.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9325?focusedWorklogId=413467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413467
 ]

ASF GitHub Bot logged work on BEAM-9325:


Author: ASF GitHub Bot
Created on: 31/Mar/20 19:17
Start Date: 31/Mar/20 19:17
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #11263: [BEAM-9325] Override 
proper write method in UnownedOutputStream
URL: https://github.com/apache/beam/pull/11263#issuecomment-606821653
 
 
   Now they are!  Time to wait and then merge.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413467)
Time Spent: 4h  (was: 3h 50m)

> UnownedOutputStream not overriding Array write method.
> --
>
> Key: BEAM-9325
> URL: https://issues.apache.org/jira/browse/BEAM-9325
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Kyoungha Min
>Assignee: Kyoungha Min
>Priority: Major
> Fix For: 2.21.0
>
>   Original Estimate: 1m
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> org.apache.beam.sdk.util.UnownedOutputStream does not override the method
> `public void write(byte b[], int off, int len) throws IOException`,
> resulting in extremely slow writes.
> This is because the inherited `java.io.FilterOutputStream` implementation writes the array one byte at a time.
>  
>  





[jira] [Assigned] (BEAM-9650) Add consistent slowly changing side inputs support

2020-03-31 Thread Mikhail Gryzykhin (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Gryzykhin reassigned BEAM-9650:
---

Assignee: Mikhail Gryzykhin

> Add consistent slowly changing side inputs support
> --
>
> Key: BEAM-9650
> URL: https://issues.apache.org/jira/browse/BEAM-9650
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>
> Add an implementation for slowly changing dimensions based on the
> [design doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit)





[jira] [Created] (BEAM-9650) Add consistent slowly changing side inputs support

2020-03-31 Thread Mikhail Gryzykhin (Jira)
Mikhail Gryzykhin created BEAM-9650:
---

 Summary: Add consistent slowly changing side inputs support
 Key: BEAM-9650
 URL: https://issues.apache.org/jira/browse/BEAM-9650
 Project: Beam
  Issue Type: Bug
  Components: io-ideas
Reporter: Mikhail Gryzykhin


Add an implementation for slowly changing dimensions based on the
[design doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit)





[jira] [Assigned] (BEAM-5411) Separate BeamUnnest and BeamCalc

2020-03-31 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud reassigned BEAM-5411:


Assignee: (was: Andrew Pilloud)

> Separate BeamUnnest and BeamCalc
> 
>
> Key: BEAM-5411
> URL: https://issues.apache.org/jira/browse/BEAM-5411
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently Correlated Uncollect (BeamUnnest) embeds a fork of BeamCalc. This 
> isn't actually needed; simplifying this node would make it easier to replace 
> Calc.





[jira] [Work started] (BEAM-9514) AssertionError type mismatch from SUM

2020-03-31 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9514 started by Andrew Pilloud.

> AssertionError type mismatch from SUM
> -
>
> Key: BEAM-9514
> URL: https://issues.apache.org/jira/browse/BEAM-9514
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Labels: zetasql-compliance
>
> {code:java}
> Mar 16, 2020 12:59:49 PM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: select sum(distinct_4) from TableDistincts
> group by distinct_2
> having false
> Exception in thread "pool-1-thread-1" java.lang.AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(BIGINT distinct_2, BIGINT $col1) NOT NULL
> rowtype of set:
> RecordType(BIGINT distinct_2, BIGINT NOT NULL $col1) NOT NULL
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Litmus$1.fail(Litmus.java:31)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:1984)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.RelSubset.add(RelSubset.java:284)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.RelSet.add(RelSet.java:148)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1806)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.VolcanoPlanner.reregister(VolcanoPlanner.java:1480)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.RelSet.mergeWith(RelSet.java:331)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.VolcanoPlanner.merge(VolcanoPlanner.java:1571)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:863)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1927)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:129)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.AggregateRemoveRule.onMatch(AggregateRemoveRule.java:126)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:208)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:631)
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:328)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:180)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:150)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:115)
>   at 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:242)
>   at 
> com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423)
>   at 
> com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
>   at 
> com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
>   at 
> com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
>   at 
> com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748) {code}





[jira] [Assigned] (BEAM-8362) Don't use ZetaSQL's unimplemented functions

2020-03-31 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud reassigned BEAM-8362:


Assignee: (was: Andrew Pilloud)

> Don't use ZetaSQL's unimplemented functions
> ---
>
> Key: BEAM-8362
> URL: https://issues.apache.org/jira/browse/BEAM-8362
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Affects Versions: 2.15.0
>Reporter: Andrew Pilloud
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Unfortunately a bunch of debug functionality is still unimplemented in 
> ZetaSQL. We should avoid calling those functions.





[jira] [Assigned] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2020-03-31 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud reassigned BEAM-8896:


Assignee: (was: Andrew Pilloud)

> WITH query AS + SELECT query JOIN other throws invalid type
> ---
>
> Key: BEAM-8896
> URL: https://issues.apache.org/jira/browse/BEAM-8896
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.16.0
>Reporter: fdiazgon
>Priority: Major
>
> The first of the following three queries fails, even though the queries are 
> equivalent:
> {code:java}
> Pipeline p = Pipeline.create();
> Schema schemaA =
>     Schema.of(
>         Schema.Field.of("id", Schema.FieldType.BYTES),
>         Schema.Field.of("fA1", Schema.FieldType.STRING));
> Schema schemaB =
>     Schema.of(
>         Schema.Field.of("id", Schema.FieldType.STRING),
>         Schema.Field.of("fB1", Schema.FieldType.STRING));
> PCollection inputA =
>     p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));
> PCollection inputB =
>     p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));
> // Fails
> String query1 =
>     "WITH query AS "
>         + "( "
>         + " SELECT id, fA1, fA1 AS fA1_2 "
>         + " FROM tblA"
>         + ") "
>         + "SELECT fA1, fB1, fA1_2 "
>         + "FROM query "
>         + "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";
> // Ok
> String query2 =
>     "WITH query AS "
>         + "( "
>         + " SELECT fA1, fB1, fA1 AS fA1_2 "
>         + " FROM tblA "
>         + " JOIN tblB "
>         + " ON (TO_HEX(tblA.id) = tblB.id) "
>         + ")"
>         + "SELECT fA1, fB1, fA1_2 "
>         + "FROM query ";
> // Ok
> String query3 =
>     "WITH query AS "
>         + "( "
>         + " SELECT TO_HEX(id) AS id, fA1, fA1 AS fA1_2 "
>         + " FROM tblA"
>         + ") "
>         + "SELECT fA1, fB1, fA1_2 "
>         + "FROM query "
>         + "JOIN tblB ON (query.id = tblB.id)";
> Schema transform3 =
>     PCollectionTuple.of("tblA", inputA)
>         .and("tblB", inputB)
>         .apply(SqlTransform.query(query3))
>         .getSchema();
> System.out.println(transform3);
> Schema transform2 =
>     PCollectionTuple.of("tblA", inputA)
>         .and("tblB", inputB)
>         .apply(SqlTransform.query(query2))
>         .getSchema();
> System.out.println(transform2);
> Schema transform1 =
>     PCollectionTuple.of("tblA", inputA)
>         .and("tblB", inputB)
>         .apply(SqlTransform.query(query1))
>         .getSchema();
> System.out.println(transform1);
> {code}
>  
> The error is:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is invalid for type 'RecordType(VARBINARY id, VARCHAR fA1)'
>   at org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197)
> {noformat}
>  
> If I change `schemaB.id` to `BYTES` (and also stop using `TO_HEX`), all 
> queries work fine. 





[jira] [Resolved] (BEAM-8401) Upgrade to ZetaSQL 2019.10.1

2020-03-31 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud resolved BEAM-8401.
--
Fix Version/s: Not applicable
   Resolution: Fixed

> Upgrade to ZetaSQL 2019.10.1
> 
>
> Key: BEAM-8401
> URL: https://issues.apache.org/jira/browse/BEAM-8401
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> ZetaSQL 2019.10.1 will be available in maven central momentarily. We should 
> upgrade to it. The new version fixes a circular dependency issue that breaks 
> maven and a threading issue that may result in a shutdown hang.





[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=413433&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413433
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 31/Mar/20 18:18
Start Date: 31/Mar/20 18:18
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10717: [BEAM-8280] 
Enable type hint annotations
URL: https://github.com/apache/beam/pull/10717#discussion_r401118829
 
 

 ##
 File path: sdks/python/apache_beam/typehints/decorators.py
 ##
 @@ -138,8 +139,7 @@ def foo((a, b)):
 
 _ANY_VAR_POSITIONAL = typehints.Tuple[typehints.Any, ...]
 _ANY_VAR_KEYWORD = typehints.Dict[typehints.Any, typehints.Any]
-# TODO(BEAM-8280): Remove this when from_callable is ready to be enabled.
-_enable_from_callable = False
+_disable_from_callable = False
 
 Review comment:
   No, I inverted the logic and enabled type annotations by default.
 



Issue Time Tracking
---

Worklog Id: (was: 413433)
Time Spent: 12h  (was: 11h 50m)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279





[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=413428&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413428
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 31/Mar/20 18:16
Start Date: 31/Mar/20 18:16
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #10717: [BEAM-8280] 
Enable type hint annotations
URL: https://github.com/apache/beam/pull/10717#discussion_r401117161
 
 

 ##
 File path: sdks/python/apache_beam/typehints/decorators.py
 ##
 @@ -138,8 +139,7 @@ def foo((a, b)):
 
 _ANY_VAR_POSITIONAL = typehints.Tuple[typehints.Any, ...]
 _ANY_VAR_KEYWORD = typehints.Dict[typehints.Any, typehints.Any]
-# TODO(BEAM-8280): Remove this when from_callable is ready to be enabled.
-_enable_from_callable = False
+_disable_from_callable = False
 
 Review comment:
   Should this be True? (based on the check in line 234)
 



Issue Time Tracking
---

Worklog Id: (was: 413428)
Time Spent: 11h 50m  (was: 11h 40m)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=413422&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413422
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 31/Mar/20 18:11
Start Date: 31/Mar/20 18:11
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11246: [BEAM-9136]Add 
licenses for dependencies for Go
URL: https://github.com/apache/beam/pull/11246#discussion_r401114775
 
 

 ##
 File path: sdks/go/container/license_script.sh
 ##
 @@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+output_dir=third_party_licenses
+# remove output_dir if existing
+if [ -d "$output_dir" ]; then rm -rf $output_dir; fi
+
+# get go-licenses and run
+go get github.com/google/go-licenses
+$GOPATH/bin/go-licenses save "github.com/apache/beam/sdks/go/pkg/beam/" 
--save_path="$output_dir"
 
 Review comment:
   Sounds good. 
   
   @Hannah-Jiang - could we remove Go containers from Beam release instructions 
and add a link to JIRA that needs to be resolved before starting that process 
again?
 



Issue Time Tracking
---

Worklog Id: (was: 413422)
Time Spent: 14h 40m  (was: 14.5h)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9399) Possible deadlock between DataflowWorkerLoggingHandler and overridden System.err PrintStream

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9399?focusedWorklogId=413417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413417
 ]

ASF GitHub Bot logged work on BEAM-9399:


Author: ASF GitHub Bot
Created on: 31/Mar/20 18:05
Start Date: 31/Mar/20 18:05
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11096: [BEAM-9399] Change 
the redirection of System.err to be a custom PrintStream
URL: https://github.com/apache/beam/pull/11096#issuecomment-606785621
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 413417)
Time Spent: 5h 10m  (was: 5h)

> Possible deadlock between DataflowWorkerLoggingHandler and overridden 
> System.err PrintStream
> 
>
> Key: BEAM-9399
> URL: https://issues.apache.org/jira/browse/BEAM-9399
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> When an exception is encountered in DataflowWorkerLoggingHandler the 
> ErrorManager is used to log the exception.  ErrorManager uses System.err 
> which is overridden to be a PrintStream that writes back into 
> DataflowWorkerLoggingHandler.
> This has the lock ordering DataflowWorkerLoggingHandler -> PrintStream.
> Other logging of System.err has the inverse lock ordering 
> PrintStream->DataflowWorkerLoggingHandler so there is potential for deadlock.
> This is one known cause of the inversion, but any other System.err logs from 
> inside DataflowWorkerLoggingHandler could cause the same issue.
> Proposed fix is to address low-hanging fruit of having ErrorManager output to 
> the original System.err.  A full fix would be to improve our override of 
> System.err to a PrintStream that can detect the locking inversion or possibly 
> we could use the PrintStream mutex in both cases.
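The low-hanging fix proposed above (having the ErrorManager write to the original System.err) might look roughly like the following. This is a sketch under assumptions, not the Dataflow worker's code: `OriginalErrErrorManager` and `OriginalErrDemo` are hypothetical names, and two in-memory streams stand in for the original and the redirected System.err. Only `java.util.logging.ErrorManager` and its `error(String, Exception, int)` method come from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.logging.ErrorManager;

/** Hypothetical demo: handler errors bypass the redirected System.err. */
class OriginalErrDemo {
  /** Returns {text written to the captured stream, text written to the redirected stream}. */
  static String[] run() {
    ByteArrayOutputStream capturedBytes = new ByteArrayOutputStream();
    ByteArrayOutputStream redirectedBytes = new ByteArrayOutputStream();
    PrintStream saved = System.err;
    try {
      // Stand-in for the System.err that existed before the logging override.
      PrintStream captured = new PrintStream(capturedBytes, true);
      ErrorManager manager = new OriginalErrErrorManager(captured);
      // Stand-in for the override that feeds System.err back into the logging handler.
      System.setErr(new PrintStream(redirectedBytes, true));
      manager.error("could not publish record", null, ErrorManager.WRITE_FAILURE);
    } finally {
      System.setErr(saved);
    }
    return new String[] {capturedBytes.toString(), redirectedBytes.toString()};
  }

  public static void main(String[] args) {
    String[] seen = run();
    System.out.println("captured stream saw:   " + seen[0].trim());
    System.out.println("redirected stream saw: " + seen[1].trim());
  }
}

/** Hypothetical ErrorManager that never touches the current (redirected) System.err. */
class OriginalErrErrorManager extends ErrorManager {
  private final PrintStream originalErr;

  OriginalErrErrorManager(PrintStream originalErr) {
    this.originalErr = originalErr;
  }

  @Override
  public void error(String msg, Exception ex, int code) {
    originalErr.println("logging failure (code " + code + "): " + msg);
    if (ex != null) {
      ex.printStackTrace(originalErr);
    }
  }
}
```

Because the error path never takes the redirected PrintStream's lock, a handler that reports failures this way avoids the DataflowWorkerLoggingHandler -> PrintStream ordering for this one case. It does not address other System.err writes made from inside the handler.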





[jira] [Updated] (BEAM-9622) Support for consuming tagged PCollections in Python SqlTransform

2020-03-31 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud updated BEAM-9622:
-
Status: Open  (was: Triage Needed)

> Support for consuming tagged PCollections in Python SqlTransform
> 
>
> Key: BEAM-9622
> URL: https://issues.apache.org/jira/browse/BEAM-9622
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>






[jira] [Updated] (BEAM-9623) Add support for TableProviders in Python SqlTransform

2020-03-31 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud updated BEAM-9623:
-
Status: Open  (was: Triage Needed)

> Add support for TableProviders in Python SqlTransform
> -
>
> Key: BEAM-9623
> URL: https://issues.apache.org/jira/browse/BEAM-9623
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>
> It should be possible to use e.g. DataCatalogTableProvider and access 
> BigQuery, PubSub, and GCS in queries.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=413412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413412
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 31/Mar/20 17:53
Start Date: 31/Mar/20 17:53
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11246: 
[BEAM-9136]Add licenses for dependencies for Go
URL: https://github.com/apache/beam/pull/11246#discussion_r401103831
 
 

 ##
 File path: sdks/go/container/license_script.sh
 ##
 @@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+output_dir=third_party_licenses
+# remove output_dir if existing
+if [ -d "$output_dir" ]; then rm -rf $output_dir; fi
+
+# get go-licenses and run
+go get github.com/google/go-licenses
+$GOPATH/bin/go-licenses save "github.com/apache/beam/sdks/go/pkg/beam/" 
--save_path="$output_dir"
 
 Review comment:
   The instructions currently indicate users should build their own. The Go SDK 
Jenkins runs only use their own built images so testing will continue to work. 
The code doesn't currently make reasonable default choices for built images.
 



Issue Time Tracking
---

Worklog Id: (was: 413412)
Time Spent: 14.5h  (was: 14h 20m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Updated] (BEAM-9641) Support ZetaSQL DATE functions in BeamSQL

2020-03-31 Thread Yueyang Qiu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yueyang Qiu updated BEAM-9641:
--
Labels: zetasql-compliance  (was: )

> Support ZetaSQL DATE functions in BeamSQL
> -
>
> Key: BEAM-9641
> URL: https://issues.apache.org/jira/browse/BEAM-9641
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Labels: zetasql-compliance
>






[jira] [Work started] (BEAM-9641) Support ZetaSQL DATE functions in BeamSQL

2020-03-31 Thread Yueyang Qiu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9641 started by Yueyang Qiu.
-
> Support ZetaSQL DATE functions in BeamSQL
> -
>
> Key: BEAM-9641
> URL: https://issues.apache.org/jira/browse/BEAM-9641
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Labels: zetasql-compliance
>






[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=413400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413400
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 31/Mar/20 17:37
Start Date: 31/Mar/20 17:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11183: [BEAM-8889]add 
experiment flag use_grpc_for_gcs
URL: https://github.com/apache/beam/pull/11183#issuecomment-606771191
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 413400)
Remaining Estimate: 147h 10m  (was: 147h 20m)
Time Spent: 20h 50m  (was: 20h 40m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 20h 50m
>  Remaining Estimate: 147h 10m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is the primary class for accessing Google Cloud Storage in Apache Beam. The current 
> implementation creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel directly to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java],
>  an abstract class that provides basic IO capability and eventually 
> creates the channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create the read and write channels, which is expected to be 
> more flexible because it can pick up new changes, e.g. a new channel 
> implementation using a new protocol, without code changes in GcsUtil.
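The design argument in this request can be illustrated with a small sketch. Everything here is hypothetical and uses only JDK types: `ChannelFactory` stands in for an abstraction that hands out channels, `StorageReader` for a utility that depends on it, and the two fake factories for different channel implementations. None of these names are Beam or gcsio APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: the reader is unchanged while the channel implementation varies. */
class ChannelFactoryDemo {
  /** Builds a fake factory whose channels return a string tagged with the protocol name. */
  static ChannelFactory fake(String protocol) {
    return path ->
        Channels.newChannel(
            new ByteArrayInputStream(
                (protocol + " read of " + path).getBytes(StandardCharsets.UTF_8)));
  }

  public static void main(String[] args) {
    System.out.println(new StorageReader(fake("http")).readAll("gs://bucket/object"));
    System.out.println(new StorageReader(fake("grpc")).readAll("gs://bucket/object"));
  }
}

/** Hypothetical abstraction that decides which channel implementation to create. */
interface ChannelFactory {
  ReadableByteChannel open(String path) throws IOException;
}

/** Hypothetical utility that depends only on the abstraction, not on a concrete channel. */
class StorageReader {
  private final ChannelFactory factory;

  StorageReader(ChannelFactory factory) {
    this.factory = factory;
  }

  String readAll(String path) {
    ByteArrayOutputStream collected = new ByteArrayOutputStream();
    try (ReadableByteChannel channel = factory.open(path)) {
      ByteBuffer buffer = ByteBuffer.allocate(16);
      while (channel.read(buffer) != -1) {
        buffer.flip();
        collected.write(buffer.array(), 0, buffer.limit());
        buffer.clear();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(collected.toByteArray(), StandardCharsets.UTF_8);
  }
}
```

Swapping the factory changes which channel is used without touching `StorageReader`, which is the kind of flexibility the request expects from delegating channel creation.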





[jira] [Commented] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor

2020-03-31 Thread Yichi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071998#comment-17071998
 ] 

Yichi Zhang commented on BEAM-8944:
---

[~mxm] does the lower checkpoint duration mean that FlinkRunner is faster 
without UnboundedThreadPoolExecutor? 

> Python SDK harness performance degradation with UnboundedThreadPoolExecutor
> ---
>
> Key: BEAM-8944
> URL: https://issues.apache.org/jira/browse/BEAM-8944
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.18.0
>Reporter: Yichi Zhang
>Priority: Critical
> Fix For: 2.18.0
>
> Attachments: profiling.png, profiling_one_thread.png, 
> profiling_twelve_threads.png
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We are seeing a performance degradation in the Python streaming word count load 
> tests.
>  
> After some investigation, it appears to be caused by swapping the original 
> ThreadPoolExecutor for UnboundedThreadPoolExecutor in the SDK worker. The suspicion is 
> that Python performance is worse with more threads on CPU-bound tasks.
>  
> A simple test for comparing the multiple thread pool executor performance:
>  
> {code:python}
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>
>     def task(number):
>       # Small CPU-bound unit of work that feeds the queue again.
>       hash(number)
>       q.put(number + 200)
>       return number
>
>     t = time.time()
>     count = 0
>     for i in range(200):
>       q.put(i)
>
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code}
> Results:
>  0x7fab400dbe50> uses 268.160675049
>  uses 79.904583931
>  uses 191.179054976
> Profiling:
> UnboundedThreadPoolExecutor:
>  !profiling.png! 
> 1 Thread ThreadPoolExecutor:
>  !profiling_one_thread.png! 
> 12 Threads ThreadPoolExecutor:
>  !profiling_twelve_threads.png! 





[jira] [Work logged] (BEAM-9325) UnownedOutputStream not overriding Array write method.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9325?focusedWorklogId=413393&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413393
 ]

ASF GitHub Bot logged work on BEAM-9325:


Author: ASF GitHub Bot
Created on: 31/Mar/20 17:17
Start Date: 31/Mar/20 17:17
Worklog Time Spent: 10m 
  Work Description: lukemin89 commented on issue #11263: [BEAM-9325] 
Override proper write method in UnownedOutputStream
URL: https://github.com/apache/beam/pull/11263#issuecomment-606756553
 
 
   I wish I found it in a fancier way, but I just found it by luck.
 



Issue Time Tracking
---

Worklog Id: (was: 413393)
Time Spent: 3h 40m  (was: 3.5h)

> UnownedOutputStream not overriding Array write method.
> --
>
> Key: BEAM-9325
> URL: https://issues.apache.org/jira/browse/BEAM-9325
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Affects Versions: 2.19.0
> Reporter: Kyoungha Min
> Assignee: Kyoungha Min
> Priority: Major
> Fix For: 2.21.0
>
> Original Estimate: 1m
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> org.apache.beam.sdk.util.UnownedOutputStream does not override the method
> `public void write(byte b[], int off, int len) throws IOException`,
> resulting in extremely slow writes.
> This is because the inherited `java.io.FilterOutputStream` implementation of
> that method writes one byte at a time.
>  
>  
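The slowdown described above can be reproduced with plain JDK classes. The following is a minimal sketch, not the Beam code: `BulkWriteSketch`, `CountingSink`, `ForwardingStream`, and `writeThrough` are hypothetical names chosen for illustration. It counts how many calls reach the underlying stream when an array is written through a stock `java.io.FilterOutputStream` versus a subclass that overrides `write(byte[], int, int)`.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BulkWriteSketch {

  /** Sink that counts how each write reaches it. */
  static class CountingSink extends ByteArrayOutputStream {
    int singleByteWrites = 0;
    int arrayWrites = 0;

    @Override
    public synchronized void write(int b) {
      singleByteWrites++;
      super.write(b);
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) {
      arrayWrites++;
      super.write(b, off, len);
    }
  }

  /** Hypothetical wrapper that forwards array writes to the delegate in one call. */
  static class ForwardingStream extends FilterOutputStream {
    ForwardingStream(OutputStream out) {
      super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
      // Delegate the whole range instead of looping over write(int).
      out.write(b, off, len);
    }
  }

  /** Writes len bytes through the chosen wrapper and returns the sink's counters. */
  static CountingSink writeThrough(boolean overrideArrayWrite, int len) {
    CountingSink sink = new CountingSink();
    OutputStream wrapper =
        overrideArrayWrite ? new ForwardingStream(sink) : new FilterOutputStream(sink);
    try {
      wrapper.write(new byte[len], 0, len);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink;
  }

  public static void main(String[] args) {
    CountingSink plain = writeThrough(false, 1024);
    CountingSink fixed = writeThrough(true, 1024);
    System.out.println("default:  " + plain.singleByteWrites + " single-byte writes");
    System.out.println("override: " + fixed.arrayWrites + " array write(s)");
  }
}
```

With the stock wrapper, every byte of the array arrives at the sink as a separate `write(int)` call; with the override, the sink sees a single array write. Any per-call cost in the delegate (locking, bounds checks, position tracking) is therefore paid once per byte in the first case.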





[jira] [Work logged] (BEAM-9399) Possible deadlock between DataflowWorkerLoggingHandler and overridden System.err PrintStream

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9399?focusedWorklogId=413389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413389
 ]

ASF GitHub Bot logged work on BEAM-9399:


Author: ASF GitHub Bot
Created on: 31/Mar/20 17:10
Start Date: 31/Mar/20 17:10
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11096: [BEAM-9399] Change 
the redirection of System.err to be a custom PrintStream
URL: https://github.com/apache/beam/pull/11096#issuecomment-606757069
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 413389)
Time Spent: 5h  (was: 4h 50m)

> Possible deadlock between DataflowWorkerLoggingHandler and overridden 
> System.err PrintStream
> 
>
> Key: BEAM-9399
> URL: https://issues.apache.org/jira/browse/BEAM-9399
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: Minor
> Time Spent: 5h
> Remaining Estimate: 0h
>
> When an exception is encountered in DataflowWorkerLoggingHandler, the
> ErrorManager is used to log the exception. ErrorManager uses System.err,
> which is overridden to be a PrintStream that writes back into
> DataflowWorkerLoggingHandler.
> This has the lock ordering DataflowWorkerLoggingHandler -> PrintStream.
> Other logging to System.err has the inverse lock ordering
> PrintStream -> DataflowWorkerLoggingHandler, so there is potential for deadlock.
> This is one known cause of the inversion, but any other System.err logging from
> inside DataflowWorkerLoggingHandler could cause the same issue.
> The proposed fix addresses the low-hanging fruit by having ErrorManager output
> to the original System.err. A full fix would be to replace our override of
> System.err with a PrintStream that can detect the locking inversion, or
> possibly to use the PrintStream mutex in both cases.
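The proposed fix can be illustrated with `java.util.logging` alone. This is a sketch under stated assumptions, not the change made in the pull request: `OriginalStderrSketch`, `FixedStreamErrorManager`, and `FailingHandler` are hypothetical names, and byte buffers stand in for the process's real stderr and for the redirected stream. The idea is to capture the stream before `System.err` is overridden and give the handler an `ErrorManager` that writes only to that captured stream, so handler errors never re-enter the redirected `PrintStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.logging.ErrorManager;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

class OriginalStderrSketch {

  /** ErrorManager bound to a fixed stream instead of the current System.err. */
  static class FixedStreamErrorManager extends ErrorManager {
    private final PrintStream target;

    FixedStreamErrorManager(PrintStream target) {
      this.target = target;
    }

    @Override
    public synchronized void error(String msg, Exception ex, int code) {
      target.println("handler error (" + code + "): " + msg);
      if (ex != null) {
        ex.printStackTrace(target);
      }
    }
  }

  /** Hypothetical handler whose publish always fails, to exercise the ErrorManager. */
  static class FailingHandler extends Handler {
    @Override
    public void publish(LogRecord record) {
      reportError("publish failed", new IllegalStateException("boom"),
          ErrorManager.WRITE_FAILURE);
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
  }

  /** Returns {text written to the original stream, text written to the redirected stream}. */
  static String[] demo() {
    PrintStream realErr = System.err;
    ByteArrayOutputStream originalBuf = new ByteArrayOutputStream();
    ByteArrayOutputStream redirectedBuf = new ByteArrayOutputStream();
    try {
      // Stand-in for the process's original stderr.
      System.setErr(new PrintStream(originalBuf, true));
      PrintStream captured = System.err; // capture BEFORE the redirect

      FailingHandler handler = new FailingHandler();
      handler.setErrorManager(new FixedStreamErrorManager(captured));

      // Stand-in for the override that routes System.err back into logging.
      System.setErr(new PrintStream(redirectedBuf, true));
      handler.publish(new LogRecord(Level.INFO, "hello"));
    } finally {
      System.setErr(realErr);
    }
    return new String[] {originalBuf.toString(), redirectedBuf.toString()};
  }

  public static void main(String[] args) {
    String[] out = demo();
    System.out.println("original stream received: " + out[0].trim());
    System.out.println("redirected stream empty: " + out[1].isEmpty());
  }
}
```

Because the error path holds only the handler's lock and then writes to a stream that does not call back into the handler, the handler-to-PrintStream edge of the lock cycle is removed for this one case. As the description notes, other `System.err` logging from inside the handler would still need the fuller fix.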





[jira] [Work logged] (BEAM-9325) UnownedOutputStream not overriding Array write method.

2020-03-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9325?focusedWorklogId=413387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-413387
 ]

ASF GitHub Bot logged work on BEAM-9325:


Author: ASF GitHub Bot
Created on: 31/Mar/20 17:09
Start Date: 31/Mar/20 17:09
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11263: [BEAM-9325] 
Override proper write method in UnownedOutputStream
URL: https://github.com/apache/beam/pull/11263#issuecomment-606756607
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 413387)
Time Spent: 3.5h  (was: 3h 20m)

> UnownedOutputStream not overriding Array write method.
> --
>
> Key: BEAM-9325
> URL: https://issues.apache.org/jira/browse/BEAM-9325
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Affects Versions: 2.19.0
> Reporter: Kyoungha Min
> Assignee: Kyoungha Min
> Priority: Major
> Fix For: 2.21.0
>
> Original Estimate: 1m
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> org.apache.beam.sdk.util.UnownedOutputStream does not override the method
> `public void write(byte b[], int off, int len) throws IOException`,
> resulting in extremely slow writes.
> This is because the inherited `java.io.FilterOutputStream` implementation of
> that method writes one byte at a time.
>  
>  




