[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=432012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432012
 ]

ASF GitHub Bot logged work on BEAM-9876:


Author: ASF GitHub Bot
Created on: 08/May/20 07:11
Start Date: 08/May/20 07:11
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #11554:
URL: https://github.com/apache/beam/pull/11554#issuecomment-625673344


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 432012)
Time Spent: 8h 10m  (was: 8h)

> Migrate the Beam website from Jekyll to Hugo to enable localization of the 
> site content
> ---
>
> Key: BEAM-9876
> URL: https://issues.apache.org/jira/browse/BEAM-9876
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Enable internationalization of the Apache Beam website to increase the reach 
> of the project, and facilitate adoption and growth of its community.
> The proposal was to do this by migrating the current Apache Beam website from 
> Jekyll to Hugo [1]. Hugo supports internationalization out of the box, making 
> it easier for both contributors and maintainers to support the 
> internationalization effort.
> Further discussion of the implementation can be viewed here [2].
> [1] 
> [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E]
> [2] 
> [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=432019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432019
 ]

ASF GitHub Bot logged work on BEAM-9723:


Author: ASF GitHub Bot
Created on: 08/May/20 07:44
Start Date: 08/May/20 07:44
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #11566:
URL: https://github.com/apache/beam/pull/11566#issuecomment-625686624


   @santhh Thanks for the feedback! I need to think a little about table 
support, but as for the batch size, it's configurable through the builder. The 
upper bound for the batch size is hardcoded and checked at runtime.
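
The pattern described in this comment (a batch size set through a builder, with a fixed ceiling enforced at runtime) can be sketched roughly as follows. This is only an illustration in Python, not the code from the pull request: the names `BatchingConfig`, `BatchingConfigBuilder` and `with_batch_size`, and the limit value `MAX_BATCH_SIZE = 500`, are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical hard ceiling; the real limit used in the PR is not stated in this thread.
MAX_BATCH_SIZE = 500


@dataclass(frozen=True)
class BatchingConfig:
    """Immutable result of the builder."""
    batch_size: int


class BatchingConfigBuilder:
    """Toy builder: the batch size is configurable, the ceiling is not."""

    def __init__(self):
        # Default to the ceiling when the caller does not choose a size.
        self._batch_size = MAX_BATCH_SIZE

    def with_batch_size(self, batch_size):
        self._batch_size = batch_size
        return self  # returning self allows call chaining

    def build(self):
        # The bound is checked at runtime, at the moment the config is built.
        if not 0 < self._batch_size <= MAX_BATCH_SIZE:
            raise ValueError(
                f"batch_size must be in 1..{MAX_BATCH_SIZE}, got {self._batch_size}"
            )
        return BatchingConfig(self._batch_size)
```

With this sketch, `BatchingConfigBuilder().with_batch_size(100).build()` succeeds, while asking for `MAX_BATCH_SIZE + 1` raises `ValueError` at build time.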





Issue Time Tracking
---

Worklog Id: (was: 432019)
Remaining Estimate: 0h
Time Spent: 10m

> [Java] PTransform that connects to Cloud DLP deidentification service
> -
>
> Key: BEAM-9723
> URL: https://issues.apache.org/jira/browse/BEAM-9723
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=432021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432021
 ]

ASF GitHub Bot logged work on BEAM-9723:


Author: ASF GitHub Bot
Created on: 08/May/20 07:47
Start Date: 08/May/20 07:47
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #11566:
URL: https://github.com/apache/beam/pull/11566#issuecomment-625687808


   Run JavaPortabilityApiJava11 PreCommit





Issue Time Tracking
---

Worklog Id: (was: 432021)
Time Spent: 20m  (was: 10m)

> [Java] PTransform that connects to Cloud DLP deidentification service
> -
>
> Key: BEAM-9723
> URL: https://issues.apache.org/jira/browse/BEAM-9723
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=432032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432032
 ]

ASF GitHub Bot logged work on BEAM-9876:


Author: ASF GitHub Bot
Created on: 08/May/20 08:05
Start Date: 08/May/20 08:05
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11554:
URL: https://github.com/apache/beam/pull/11554#issuecomment-625695376


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 432032)
Time Spent: 8h 20m  (was: 8h 10m)

> Migrate the Beam website from Jekyll to Hugo to enable localization of the 
> site content
> ---
>
> Key: BEAM-9876
> URL: https://issues.apache.org/jira/browse/BEAM-9876
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Enable internationalization of the Apache Beam website to increase the reach 
> of the project, and facilitate adoption and growth of its community.
> The proposal was to do this by migrating the current Apache Beam website from 
> Jekyll to Hugo [1]. Hugo supports internationalization out of the box, making 
> it easier for both contributors and maintainers to support the 
> internationalization effort.
> Further discussion of the implementation can be viewed here [2].
> [1] 
> [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E]
> [2] 
> [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E]
>  





[jira] [Work logged] (BEAM-6710) Add Landing page to community metrics dashboard

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6710?focusedWorklogId=432035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432035
 ]

ASF GitHub Bot logged work on BEAM-6710:


Author: ASF GitHub Bot
Created on: 08/May/20 08:11
Start Date: 08/May/20 08:11
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11629:
URL: https://github.com/apache/beam/pull/11629#issuecomment-625697491


   cc: @aaltay 





Issue Time Tracking
---

Worklog Id: (was: 432035)
Time Spent: 0.5h  (was: 20m)

> Add Landing page to community metrics dashboard
> 
>
> Key: BEAM-6710
> URL: https://issues.apache.org/jira/browse/BEAM-6710
> Project: Beam
>  Issue Type: New Feature
>  Components: community-metrics, project-management
>Reporter: Mikhail Gryzykhin
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The community metrics dashboard sends users to a list of recently opened 
> dashboards, which is empty. This confuses new users. 
> We want to add a landing page with links to the relevant dashboards.
> Link: http://104.154.241.245/
>  





[jira] [Commented] (BEAM-9760) KafkaIO supports consumer group?

2020-05-08 Thread Ka Wah WONG (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102363#comment-17102363
 ] 

Ka Wah WONG commented on BEAM-9760:
---

Hi [~aromanenko], my proposal applies to the case where a topic has multiple 
partitions and multiple consumers in one consumer group subscribe to that 
topic.

From my understanding, getting a partition assigned dynamically through group 
management by the Kafka coordinator requires KafkaConsumer's subscribe 
method. KafkaConsumer's assign method manually assigns a partition 
to the consumer and does not use the consumer's group management 
functionality. (Reference: Javadoc of 
org.apache.kafka.clients.consumer.KafkaConsumer.)

If two separate Java applications, both using Apache Beam, subscribe to the 
same Kafka topic through KafkaIO with the same consumer group defined, I would 
like failover to be supported.

For example, suppose topic-partition-0 is assigned to App-0 and 
topic-partition-1 is assigned to App-1. If App-0 goes down, the Kafka 
coordinator / broker reassigns topic-partition-0 to App-1, and App-1 then 
consumes messages from both topic-partition-0 and topic-partition-1 while 
App-0 is down. Note that App-0 and App-1 define the same consumer group and 
subscribe to the same topic with its 2 partitions.
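
The failover scenario above can be illustrated with a small, self-contained simulation. This is a toy model in Python, not Kafka's rebalance protocol and not the KafkaConsumer API: `assign_partitions` and `surviving_manual_assignment` are hypothetical helpers, and the round-robin strategy is an assumption chosen only because it reproduces the App-0 / App-1 example.

```python
def assign_partitions(partitions, live_members):
    """Toy coordinator: spread partitions round-robin over the live group members."""
    members = sorted(live_members)
    if not members:
        return {}
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


def surviving_manual_assignment(manual, live_members):
    """Manual (assign-style) mapping: nothing is moved when a member disappears."""
    return {m: parts for m, parts in manual.items() if m in live_members}


partitions = ["topic-partition-0", "topic-partition-1"]

# Group-managed (subscribe-style): the coordinator recomputes the assignment.
before = assign_partitions(partitions, {"App-0", "App-1"})
after = assign_partitions(partitions, {"App-1"})  # App-0 is down

# Manual (assign-style): the mapping is fixed, so App-0's partition is orphaned.
manual = {"App-0": ["topic-partition-0"], "App-1": ["topic-partition-1"]}
manual_after = surviving_manual_assignment(manual, {"App-1"})
```

In the group-managed case `after` gives App-1 both partitions, whereas `manual_after` leaves topic-partition-0 without a reader, which is the gap the proposal is about.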

 

> KafkaIO supports consumer group?
> 
>
> Key: BEAM-9760
> URL: https://issues.apache.org/jira/browse/BEAM-9760
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Ka Wah WONG
>Priority: Minor
>
> It seems that only the assign method of the Kafka Consumer class is called in 
> the org.apache.beam.sdk.io.kafka.ConsumerSpEL class. According to the 
> documentation of org.apache.kafka.clients.consumer.KafkaConsumer, manual 
> topic assignment through this assign method does not use the consumer's group 
> management functionality.
> May I ask if KafkaIO will be enhanced to support the consumer's group 
> management by using the Kafka consumer's subscribe method?
>  





[jira] [Updated] (BEAM-9930) Add announcement for Beam Summit Digital 2020 to the blog

2020-05-08 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9930:
--
Status: Open  (was: Triage Needed)

> Add announcement for Beam Summit Digital 2020 to the blog
> -
>
> Key: BEAM-9930
> URL: https://issues.apache.org/jira/browse/BEAM-9930
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>
> We need to announce Beam Summit Digital 2020 on the blog.





[jira] [Created] (BEAM-9930) Add announcement for Beam Summit Digital 2020 to the blog

2020-05-08 Thread Maximilian Michels (Jira)
Maximilian Michels created BEAM-9930:


 Summary: Add announcement for Beam Summit Digital 2020 to the blog
 Key: BEAM-9930
 URL: https://issues.apache.org/jira/browse/BEAM-9930
 Project: Beam
  Issue Type: Task
  Components: website
Reporter: Maximilian Michels
Assignee: Maximilian Michels


We need to announce Beam Summit Digital 2020 on the blog.





[jira] [Work logged] (BEAM-9930) Add announcement for Beam Summit Digital 2020 to the blog

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9930?focusedWorklogId=432041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432041
 ]

ASF GitHub Bot logged work on BEAM-9930:


Author: ASF GitHub Bot
Created on: 08/May/20 08:26
Start Date: 08/May/20 08:26
Worklog Time Spent: 10m 
  Work Description: mxm opened a new pull request #11640:
URL: https://github.com/apache/beam/pull/11640


   Not the best timing in light of #11608 but let's get this out now and 
migrate it later on. 
   

[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432055
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 09:20
Start Date: 08/May/20 09:20
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #11577:
URL: https://github.com/apache/beam/pull/11577#issuecomment-625725372


   LGTM, thanks for the contribution! :) Resolve the conflicts and feel free to 
merge





Issue Time Tracking
---

Worklog Id: (was: 432055)
Time Spent: 3h  (was: 2h 50m)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432063
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 10:20
Start Date: 08/May/20 10:20
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11577:
URL: https://github.com/apache/beam/pull/11577#issuecomment-625747576


   Run Seed Job





Issue Time Tracking
---

Worklog Id: (was: 432063)
Time Spent: 3h 10m  (was: 3h)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432064
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 10:35
Start Date: 08/May/20 10:35
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11577:
URL: https://github.com/apache/beam/pull/11577#issuecomment-625753189


   Run Java HadoopFormatIO Performance Test





Issue Time Tracking
---

Worklog Id: (was: 432064)
Time Spent: 3h 20m  (was: 3h 10m)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432065
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 10:36
Start Date: 08/May/20 10:36
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11577:
URL: https://github.com/apache/beam/pull/11577#issuecomment-625753366


   Run Python Load Tests ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 432065)
Time Spent: 3.5h  (was: 3h 20m)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432079
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 10:56
Start Date: 08/May/20 10:56
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11567:
URL: https://github.com/apache/beam/pull/11567#issuecomment-625760052


   Run PythonDocker PreCommit





Issue Time Tracking
---

Worklog Id: (was: 432079)
Time Spent: 3h 40m  (was: 3.5h)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432081
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 10:59
Start Date: 08/May/20 10:59
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11577:
URL: https://github.com/apache/beam/pull/11577#issuecomment-625761300


   Run Python Load Tests ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 432081)
Time Spent: 3h 50m  (was: 3h 40m)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432085
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 11:26
Start Date: 08/May/20 11:26
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11567:
URL: https://github.com/apache/beam/pull/11567#issuecomment-625770721


   Run Python PreCommit





Issue Time Tracking
---

Worklog Id: (was: 432085)
Time Spent: 4h 10m  (was: 4h)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432084
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 11:26
Start Date: 08/May/20 11:26
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11577:
URL: https://github.com/apache/beam/pull/11577#issuecomment-625770546


   Run Python Load Tests ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 432084)
Time Spent: 4h  (was: 3h 50m)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=432094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432094
 ]

ASF GitHub Bot logged work on BEAM-9723:


Author: ASF GitHub Bot
Created on: 08/May/20 12:10
Start Date: 08/May/20 12:10
Worklog Time Spent: 10m 
  Work Description: mwalenia removed a comment on pull request #11566:
URL: https://github.com/apache/beam/pull/11566#issuecomment-625687808


   Run JavaPortabilityApiJava11 PreCommit





Issue Time Tracking
---

Worklog Id: (was: 432094)
Time Spent: 0.5h  (was: 20m)

> [Java] PTransform that connects to Cloud DLP deidentification service
> -
>
> Key: BEAM-9723
> URL: https://issues.apache.org/jira/browse/BEAM-9723
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432097
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 12:28
Start Date: 08/May/20 12:28
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11577:
URL: https://github.com/apache/beam/pull/11577#issuecomment-625791717


   Run seed job





Issue Time Tracking
---

Worklog Id: (was: 432097)
Time Spent: 4h 20m  (was: 4h 10m)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=432102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432102
 ]

ASF GitHub Bot logged work on BEAM-8949:


Author: ASF GitHub Bot
Created on: 08/May/20 12:56
Start Date: 08/May/20 12:56
Worklog Time Spent: 10m 
  Work Description: mszb commented on a change in pull request #11210:
URL: https://github.com/apache/beam/pull/11210#discussion_r422104136



##
File path: sdks/python/apache_beam/io/gcp/experimental/spannerio_test.py
##
@@ -499,6 +499,7 @@ def test_batch_byte_size(
   # and each bach should contains 25 mutations.
   res = (
   p | beam.Create(mutation_group)
+  | 'combine to list' >> beam.combiners.ToList()

Review comment:
   Yes, the `_BatchFn` requires a single iterable collection and loops 
through it to make the batches. I'm just replicating the same batching 
pipeline as in the `_WriteGroup` transform.

##
File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py
##
@@ -1008,31 +1007,30 @@ def _reset_count(self):
 self._cells = 0
 
   def process(self, element):
-mg_info = element.info
+for elem in element:

Review comment:
   There was no issue in processing the mutation groups; the issue was with the 
batch size. According to the Beam execution model, ‘**The division of the 
collection into bundles is arbitrary and selected by the runner.**’ As a 
result, finish_bundle is called multiple times rather than once on the complete 
collection, which produces the wrong number of batches on the Dataflow 
runner. That's why I've added the ToList transform to make a single 
collection and generate the batches properly.

##
File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py
##
@@ -1008,31 +1007,30 @@ def _reset_count(self):
 self._cells = 0
 
   def process(self, element):
-mg_info = element.info
+for elem in element:
+  mg_info = elem.info
+  if mg_info['byte_size'] + self._size_in_bytes > \

Review comment:
   Sure. Should I create a new Jira ticket and (1) add the ticket number to 
this PR for reference, or (2) create a new PR for this change and, once it gets 
merged, rebase this PR and request review?
   
   I think the first approach requires less time to close the tickets. What do 
you suggest?







Issue Time Tracking
---

Worklog Id: (was: 432102)
Time Spent: 9h 50m  (was: 9h 40m)

> Add Spanner IO Integration Test for Python
> --
>
> Key: BEAM-8949
> URL: https://issues.apache.org/jira/browse/BEAM-8949
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shoaib Zafar
>Assignee: Shoaib Zafar
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Spanner IO (Python SDK) contains a PTransform which uses the BatchAPI to read 
> from Spanner. Currently, it only has direct runner unit tests. In 
> order to make this functionality available to users, integration tests 
> also need to be added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432106&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432106
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 13:04
Start Date: 08/May/20 13:04
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11577:
URL: https://github.com/apache/beam/pull/11577#issuecomment-625805262


   Run Java HadoopFormatIO Performance Test





Issue Time Tracking
---

Worklog Id: (was: 432106)
Time Spent: 4.5h  (was: 4h 20m)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432108
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 13:07
Start Date: 08/May/20 13:07
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on a change in pull request #11567:
URL: https://github.com/apache/beam/pull/11567#discussion_r422124124



##
File path: sdks/python/apache_beam/testing/load_tests/load_test.py
##
@@ -45,6 +47,19 @@ def _add_argparse_args(cls, parser):
 '--metrics_table',
 help='A BigQuery table where metrics should be '
 'written.')
+parser.add_argument(
+'--influx_measurement',
+help='An InfluxDB measurement where metrics should be published to. If 
'

Review comment:
   I am not sure I correctly understand what measurement means. Is it the 
name of a metric, such as `runtime`, or the name of the place where the metric 
will be stored, like a "table" or "column"?

##
File path: sdks/python/apache_beam/testing/load_tests/load_test.py
##
@@ -67,22 +82,30 @@ def _str_to_boolean(value):
 
 
 class LoadTest(object):
+  """Base class for all integration and performance tests which export
+  metrics to external databases: BigQuery or/and InfluxDB.
+
+  Refer to :class:`~apache_beam.testing.load_tests.LoadTestOptions` for more
+  information on the required pipeline options.
+
+  If using InfluxDB with Basic HTTP authentication enabled, provide the
+  following environment options: `INFLUXDB_USER` and `INFLUXDB_USER_PASSWORD`.

Review comment:
   Could we allow these to be provided via PipelineOptions as well?

##
File path: sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py
##
@@ -167,14 +175,15 @@ class MetricsReader(object):
   A :class:`MetricsReader` retrieves metrics from pipeline result,
   prepares it for publishers and setup publishers.
   """
-  publishers = []  # type: List[ConsoleMetricsPublisher]
+  publishers = []  # type: List[Any]
 
   def __init__(
   self,
   project_name=None,
   bq_table=None,
   bq_dataset=None,
   publish_to_bq=False,

Review comment:
   Do you think it would be good to have consistent parameter naming for 
influx and bq? Or do we plan to abandon bq in the future?

##
File path: sdks/python/apache_beam/testing/load_tests/load_test.py
##
@@ -67,22 +82,30 @@ def _str_to_boolean(value):
 
 
 class LoadTest(object):
+  """Base class for all integration and performance tests which export
+  metrics to external databases: BigQuery or/and InfluxDB.
+
+  Refer to :class:`~apache_beam.testing.load_tests.LoadTestOptions` for more
+  information on the required pipeline options.
+
+  If using InfluxDB with Basic HTTP authentication enabled, provide the
+  following environment options: `INFLUXDB_USER` and `INFLUXDB_USER_PASSWORD`.
+  """
   def __init__(self):
 # Be sure to set blocking to false for timeout_ms to work properly
 self.pipeline = TestPipeline(is_integration_test=True, blocking=False)
 assert not self.pipeline.blocking
 
-load_test_options = self.pipeline.get_pipeline_options().view_as(
-LoadTestOptions)
-self.timeout_ms = load_test_options.timeout_ms
-self.input_options = load_test_options.input_options
-self.metrics_namespace = load_test_options.metrics_table or 'default'
-publish_to_bq = load_test_options.publish_to_big_query
+options = self.pipeline.get_pipeline_options().view_as(LoadTestOptions)
+self.timeout_ms = options.timeout_ms
+self.input_options = options.input_options
+self.metrics_namespace = options.metrics_table or 'default'
+publish_to_bq = options.publish_to_big_query
 if publish_to_bq is None:

Review comment:
   Maybe we should remove this `if`, since we now have two targets to which we 
publish metrics?

##
File path: sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py
##
@@ -404,6 +419,77 @@ def save(self, results):
 return self._client.insert_rows(self._bq_table, results)
 
 
+class InfluxDBMetricsPublisherOptions(object):
+  def __init__(
+  self,
+  measurement,  # type: str
+  db_name,  # type: str
+  hostname='http://localhost:8086',  # type: str

Review comment:
   Why do we need this default value here? Isn't it already provided by the 
pipeline option's default value?







Issue Time Tracking
---

Worklog Id: (was: 432108)
Time Spent: 4h 

[jira] [Work logged] (BEAM-9646) [Java] PTransform that integrates Cloud Vision functionality

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9646?focusedWorklogId=432114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432114
 ]

ASF GitHub Bot logged work on BEAM-9646:


Author: ASF GitHub Bot
Created on: 08/May/20 13:34
Start Date: 08/May/20 13:34
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #11331:
URL: https://github.com/apache/beam/pull/11331#issuecomment-625818489


   @tysonjh Pinging for review





Issue Time Tracking
---

Worklog Id: (was: 432114)
Time Spent: 1h  (was: 50m)

> [Java] PTransform that integrates Cloud Vision functionality
> 
>
> Key: BEAM-9646
> URL: https://issues.apache.org/jira/browse/BEAM-9646
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432125
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 14:00
Start Date: 08/May/20 14:00
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on a change in pull request #11567:
URL: https://github.com/apache/beam/pull/11567#discussion_r422159594



##
File path: sdks/python/apache_beam/testing/load_tests/load_test.py
##
@@ -45,6 +47,19 @@ def _add_argparse_args(cls, parser):
 '--metrics_table',
 help='A BigQuery table where metrics should be '
 'written.')
+parser.add_argument(
+'--influx_measurement',
+help='An InfluxDB measurement where metrics should be published to. If 
'

Review comment:
   It's the name of the place where the metrics will be stored. It's like a 
"table" in other databases.
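
   To illustrate the analogy, here is a minimal sketch of how one metric could 
be formatted in InfluxDB line protocol, where the measurement name comes first, 
much like a table name. The helper `to_line_protocol`, the measurement 
`python_batch_pardo`, and the tag and field names are hypothetical and are not 
taken from this PR; escaping of special characters is left out.

```python
def _format_field(value):
    """Formats a field value the way line protocol expects (no escaping)."""
    if isinstance(value, bool):  # bool must be checked before int
        return 'true' if value else 'false'
    if isinstance(value, int):
        return '%di' % value  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    return '"%s"' % value  # anything else is written as a quoted string


def to_line_protocol(measurement, fields, tags=None, timestamp_ns=None):
    """Builds 'measurement,tag=val field=val timestamp' for a single point."""
    head = [measurement]
    for key, value in sorted((tags or {}).items()):
        head.append('%s=%s' % (key, value))
    body = ','.join(
        '%s=%s' % (key, _format_field(value))
        for key, value in sorted(fields.items()))
    line = '%s %s' % (','.join(head), body)
    if timestamp_ns is not None:
        line += ' %d' % timestamp_ns
    return line


# Hypothetical point: the measurement plays the role of the "table".
point = to_line_protocol(
    'python_batch_pardo', {'value': 12.5}, tags={'metric': 'runtime'})
```

   Here `python_batch_pardo` is the measurement, `metric` is a tag, and `value` 
is a field.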







Issue Time Tracking
---

Worklog Id: (was: 432125)
Time Spent: 4h 50m  (was: 4h 40m)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432128
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 14:07
Start Date: 08/May/20 14:07
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on a change in pull request #11567:
URL: https://github.com/apache/beam/pull/11567#discussion_r422163533



##
File path: sdks/python/apache_beam/testing/load_tests/load_test.py
##
@@ -67,22 +82,30 @@ def _str_to_boolean(value):
 
 
 class LoadTest(object):
+  """Base class for all integration and performance tests which export
+  metrics to external databases: BigQuery or/and InfluxDB.
+
+  Refer to :class:`~apache_beam.testing.load_tests.LoadTestOptions` for more
+  information on the required pipeline options.
+
+  If using InfluxDB with Basic HTTP authentication enabled, provide the
+  following environment options: `INFLUXDB_USER` and `INFLUXDB_USER_PASSWORD`.

Review comment:
   If we did, we could put sensitive data (such as the password) at risk by 
exposing it in logs. Apart from that, I think the only way to use Jenkins 
credentials is through environment variables.
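
   As a rough sketch of that approach, the credentials could be read from the 
environment roughly like this. The variable names come from the docstring 
above; the helper `influx_auth_from_env` and its optional `environ` argument 
are hypothetical and only for illustration.

```python
import os


def influx_auth_from_env(environ=None):
    """Returns a (user, password) tuple, or None if either variable is unset."""
    environ = os.environ if environ is None else environ
    user = environ.get('INFLUXDB_USER')
    password = environ.get('INFLUXDB_USER_PASSWORD')
    if user is None or password is None:
        # Without both values, fall back to an unauthenticated request.
        return None
    return (user, password)
```

   The values never appear in the pipeline options, so they are not printed 
when the options are logged.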








Issue Time Tracking
---

Worklog Id: (was: 432128)
Time Spent: 5h  (was: 4h 50m)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432137
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 14:26
Start Date: 08/May/20 14:26
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on a change in pull request #11567:
URL: https://github.com/apache/beam/pull/11567#discussion_r422172322



##
File path: sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py
##
@@ -404,6 +419,77 @@ def save(self, results):
 return self._client.insert_rows(self._bq_table, results)
 
 
+class InfluxDBMetricsPublisherOptions(object):
+  def __init__(
+  self,
+  measurement,  # type: str
+  db_name,  # type: str
+  hostname='http://localhost:8086',  # type: str

Review comment:
   There is a small chance that someone would use InfluxDBPublisher in 
their code without casting the pipeline options to LoadTestOptions 
(view_as(...)). Besides that, there's no particular reason.







Issue Time Tracking
---

Worklog Id: (was: 432137)
Time Spent: 5h 20m  (was: 5h 10m)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432133
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 14:15
Start Date: 08/May/20 14:15
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on a change in pull request #11567:
URL: https://github.com/apache/beam/pull/11567#discussion_r422167855



##
File path: sdks/python/apache_beam/testing/load_tests/load_test.py
##
@@ -67,22 +82,30 @@ def _str_to_boolean(value):
 
 
 class LoadTest(object):
+  """Base class for all integration and performance tests which export
+  metrics to external databases: BigQuery or/and InfluxDB.
+
+  Refer to :class:`~apache_beam.testing.load_tests.LoadTestOptions` for more
+  information on the required pipeline options.
+
+  If using InfluxDB with Basic HTTP authentication enabled, provide the
+  following environment options: `INFLUXDB_USER` and `INFLUXDB_USER_PASSWORD`.
+  """
   def __init__(self):
 # Be sure to set blocking to false for timeout_ms to work properly
 self.pipeline = TestPipeline(is_integration_test=True, blocking=False)
 assert not self.pipeline.blocking
 
-load_test_options = self.pipeline.get_pipeline_options().view_as(
-LoadTestOptions)
-self.timeout_ms = load_test_options.timeout_ms
-self.input_options = load_test_options.input_options
-self.metrics_namespace = load_test_options.metrics_table or 'default'
-publish_to_bq = load_test_options.publish_to_big_query
+options = self.pipeline.get_pipeline_options().view_as(LoadTestOptions)
+self.timeout_ms = options.timeout_ms
+self.input_options = options.input_options
+self.metrics_namespace = options.metrics_table or 'default'
+publish_to_bq = options.publish_to_big_query
 if publish_to_bq is None:

Review comment:
   Our goal is to keep sending metrics to BigQuery for some time. I think 
it'd be good to keep the interface intact until we eventually abandon bq.







Issue Time Tracking
---

Worklog Id: (was: 432133)
Time Spent: 5h 10m  (was: 5h)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8132) Create metrics publisher in Python SDK

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8132?focusedWorklogId=432139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432139
 ]

ASF GitHub Bot logged work on BEAM-8132:


Author: ASF GitHub Bot
Created on: 08/May/20 14:26
Start Date: 08/May/20 14:26
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on a change in pull request #11567:
URL: https://github.com/apache/beam/pull/11567#discussion_r422174187



##
File path: sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py
##
@@ -167,14 +175,15 @@ class MetricsReader(object):
   A :class:`MetricsReader` retrieves metrics from pipeline result,
   prepares it for publishers and setup publishers.
   """
-  publishers = []  # type: List[ConsoleMetricsPublisher]
+  publishers = []  # type: List[Any]
 
   def __init__(
   self,
   project_name=None,
   bq_table=None,
   bq_dataset=None,
   publish_to_bq=False,

Review comment:
   It's likely, but there's no decision yet. What influx parameters do you 
think can be improved? 







Issue Time Tracking
---

Worklog Id: (was: 432139)
Time Spent: 5.5h  (was: 5h 20m)

> Create metrics publisher in Python SDK
> --
>
> Key: BEAM-8132
> URL: https://issues.apache.org/jira/browse/BEAM-8132
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=432141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432141
 ]

ASF GitHub Bot logged work on BEAM-8949:


Author: ASF GitHub Bot
Created on: 08/May/20 14:38
Start Date: 08/May/20 14:38
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#11210:
URL: https://github.com/apache/beam/pull/11210#discussion_r422180693



##
File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py
##
@@ -1008,31 +1007,30 @@ def _reset_count(self):
 self._cells = 0
 
   def process(self, element):
-mg_info = element.info
+for elem in element:

Review comment:
   "Which causes finish_bundle to be called multiple times" do you mean 
that finish_bundle will be called once per bundle? 
   This is the expected behavior, and users will observe it as well. 
The implementation should work for arbitrary bundle sizes without users having 
to group PCollection elements together.
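
   To make the point concrete, here is a minimal plain-Python simulation (not 
Beam code, and not the `_BatchFn` from this PR). The class `BatchingFn` and the 
helper `run_in_bundles` are hypothetical names; the helper only imitates a 
runner that splits the input into bundles of a chosen size. The number of 
batches changes with the bundling, but every element is still emitted exactly 
once and no batch exceeds the limit.

```python
class BatchingFn(object):
    """Collects elements into batches of at most max_size, flushed per bundle."""
    def __init__(self, max_size):
        self._max_size = max_size
        self._batch = []

    def start_bundle(self):
        self._batch = []

    def process(self, element):
        self._batch.append(element)
        if len(self._batch) >= self._max_size:
            yield self._flush()

    def finish_bundle(self):
        # Called once per bundle, so a partial batch is emitted here.
        if self._batch:
            yield self._flush()

    def _flush(self):
        batch, self._batch = self._batch, []
        return batch


def run_in_bundles(fn, elements, bundle_size):
    """Imitates a runner that splits `elements` into bundles of `bundle_size`."""
    output = []
    for start in range(0, len(elements), bundle_size):
        fn.start_bundle()
        for element in elements[start:start + bundle_size]:
            output.extend(fn.process(element))
        output.extend(fn.finish_bundle())
    return output


elements = list(range(50))
one_bundle = run_in_bundles(BatchingFn(25), elements, bundle_size=50)
small_bundles = run_in_bundles(BatchingFn(25), elements, bundle_size=20)
```

   With a single bundle of 50 elements this yields 2 batches; with bundles of 
20, 20 and 10 it yields 3. A test that asserts an exact batch count therefore 
depends on how the runner happens to bundle the input.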







Issue Time Tracking
---

Worklog Id: (was: 432141)
Time Spent: 10h  (was: 9h 50m)

> Add Spanner IO Integration Test for Python
> --
>
> Key: BEAM-8949
> URL: https://issues.apache.org/jira/browse/BEAM-8949
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shoaib Zafar
>Assignee: Shoaib Zafar
>Priority: Major
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Spanner IO (Python SDK) contains a PTransform which uses the BatchAPI to read 
> from Spanner. Currently, it only has direct runner unit tests. In 
> order to make this functionality available to users, integration tests 
> also need to be added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=432142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432142
 ]

ASF GitHub Bot logged work on BEAM-8949:


Author: ASF GitHub Bot
Created on: 08/May/20 14:39
Start Date: 08/May/20 14:39
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#11210:
URL: https://github.com/apache/beam/pull/11210#discussion_r422181467



##
File path: sdks/python/apache_beam/io/gcp/experimental/spannerio_test.py
##
@@ -499,6 +499,7 @@ def test_batch_byte_size(
   # and each bach should contains 25 mutations.
   res = (
   p | beam.Create(mutation_group)
+  | 'combine to list' >> beam.combiners.ToList()

Review comment:
   Do users have to do this as well? It seems like we are missing something 
in the implementation. How does the Java implementation operate?







Issue Time Tracking
---

Worklog Id: (was: 432142)
Time Spent: 10h 10m  (was: 10h)

> Add Spanner IO Integration Test for Python
> --
>
> Key: BEAM-8949
> URL: https://issues.apache.org/jira/browse/BEAM-8949
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shoaib Zafar
>Assignee: Shoaib Zafar
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Spanner IO (Python SDK) contains a PTransform which uses the BatchAPI to read 
> from Spanner. Currently, it only has direct runner unit tests. In 
> order to make this functionality available to users, integration tests 
> also need to be added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=432143&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432143
 ]

ASF GitHub Bot logged work on BEAM-8949:


Author: ASF GitHub Bot
Created on: 08/May/20 14:41
Start Date: 08/May/20 14:41
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#11210:
URL: https://github.com/apache/beam/pull/11210#discussion_r422182633



##
File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py
##
@@ -1008,31 +1007,30 @@ def _reset_count(self):
 self._cells = 0
 
   def process(self, element):
-mg_info = element.info
+for elem in element:
+  mg_info = elem.info
+  if mg_info['byte_size'] + self._size_in_bytes > \

Review comment:
   I think (2) is better, but we should fix the Spanner connector 
implementation to work for arbitrary bundle sizes rather than reducing the 
bundle to a single element for the test.







Issue Time Tracking
---

Worklog Id: (was: 432143)
Time Spent: 10h 20m  (was: 10h 10m)

> Add Spanner IO Integration Test for Python
> --
>
> Key: BEAM-8949
> URL: https://issues.apache.org/jira/browse/BEAM-8949
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shoaib Zafar
>Assignee: Shoaib Zafar
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Spanner IO (Python SDK) contains a PTransform which uses the BatchAPI to read 
> from Spanner. Currently, it only has direct runner unit tests. In 
> order to make this functionality available to users, integration tests 
> also need to be added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9633) PubsubIO performance tests

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9633?focusedWorklogId=432144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432144
 ]

ASF GitHub Bot logged work on BEAM-9633:


Author: ASF GitHub Bot
Created on: 08/May/20 14:42
Start Date: 08/May/20 14:42
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on a change in pull request #11274:
URL: https://github.com/apache/beam/pull/11274#discussion_r422183052



##
File path: .test-infra/jenkins/job_PerformanceTests_PubsubIO_Python.groovy
##
@@ -41,7 +42,7 @@ def psio_test = [
 metrics_dataset  : 'beam_performance',
 metrics_table: 'psio_io_2GB_msg_results',
 input_options: '\'{' +
-'"num_records": 2097152,' +
+'"num_records": 2097152' +

Review comment:
   Why did you remove that comma?







Issue Time Tracking
---

Worklog Id: (was: 432144)
Time Spent: 12h 50m  (was: 12h 40m)

> PubsubIO performance tests
> --
>
> Key: BEAM-9633
> URL: https://issues.apache.org/jira/browse/BEAM-9633
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> There are no performance tests for PubsubIO in the Python SDK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9795) Support custom avro DatumWriters when writing to BigQuery

2020-05-08 Thread Steve Niemitz (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Niemitz resolved BEAM-9795.
-
Fix Version/s: 2.22.0
   Resolution: Fixed

> Support custom avro DatumWriters when writing to BigQuery
> -
>
> Key: BEAM-9795
> URL: https://issues.apache.org/jira/browse/BEAM-9795
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> There are times (for more advanced use cases) where I want more control over 
> the user -> avro process when writing to BigQuery.  Being able to pass in my 
> own DatumWriter and configuration around that would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9931) Support custom avro DatumReaders in AvroIO

2020-05-08 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9931:
--
Status: Open  (was: Triage Needed)

> Support custom avro DatumReaders in AvroIO
> --
>
> Key: BEAM-9931
> URL: https://issues.apache.org/jira/browse/BEAM-9931
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-avro
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9931) Support custom avro DatumReaders in AvroIO

2020-05-08 Thread Steve Niemitz (Jira)
Steve Niemitz created BEAM-9931:
---

 Summary: Support custom avro DatumReaders in AvroIO
 Key: BEAM-9931
 URL: https://issues.apache.org/jira/browse/BEAM-9931
 Project: Beam
  Issue Type: Improvement
  Components: io-java-avro
Reporter: Steve Niemitz
Assignee: Steve Niemitz






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2530) Make Beam compatible with next Java LTS version (Java 11)

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2530?focusedWorklogId=432150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432150
 ]

ASF GitHub Bot logged work on BEAM-2530:


Author: ASF GitHub Bot
Created on: 08/May/20 15:04
Start Date: 08/May/20 15:04
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #11619:
URL: https://github.com/apache/beam/pull/11619#issuecomment-625859443


   Merged manually to add the missing ticket prefix `[BEAM-2530]`. Thanks again!





Issue Time Tracking
---

Worklog Id: (was: 432150)
Time Spent: 4h 50m  (was: 4h 40m)

> Make Beam compatible with next Java LTS version (Java 11)
> -
>
> Key: BEAM-2530
> URL: https://issues.apache.org/jira/browse/BEAM-2530
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: Java11, java9
> Fix For: Not applicable
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The goal of this task is to validate that the Java SDK and the Java Direct 
> Runner (and its tests) work as intended on the next Java LTS version (Java 11 
> /18.9). For this we will base the compilation on the java.base profile and 
> include other core Java modules when needed.  
> *Notes:*
> - Ideally validation of the IOs/extensions will be included but if serious 
> issues are found they will be tracked independently.
> - The goal of using the Java Platform module system is out of the scope of 
> this work.
> - Support for other runners will be tracked as a separate effort because 
> other runners depend strongly on the support of the native runner ecosystems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9931) Support custom avro DatumReaders in AvroIO

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9931?focusedWorklogId=432152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432152
 ]

ASF GitHub Bot logged work on BEAM-9931:


Author: ASF GitHub Bot
Created on: 08/May/20 15:05
Start Date: 08/May/20 15:05
Worklog Time Spent: 10m 
  Work Description: steveniemitz opened a new pull request #11641:
URL: https://github.com/apache/beam/pull/11641


   Similar to PR #11479, it would be useful to be able to explicitly pass a 
DatumReader factory to AvroIO, and have it use that instead of 
GenericDatumReader or SpecificDatumReader.
   
   This PR adds `withDatumReaderFactory` to AvroIO and plumbs it through into 
AvroSource.
   
   R: @iemejia 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Created] (BEAM-9932) Add documentation describing cross-language test pipelines

2020-05-08 Thread Chamikara Madhusanka Jayalath (Jira)
Chamikara Madhusanka Jayalath created BEAM-9932:
---

 Summary: Add documentation describing cross-language test pipelines
 Key: BEAM-9932
 URL: https://issues.apache.org/jira/browse/BEAM-9932
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Chamikara Madhusanka Jayalath
Assignee: Chamikara Madhusanka Jayalath


We designed cross-language test pipelines [1][2] based on the discussion in [3].

Adding some pydocs and Javadocs regarding the rationale behind each pipeline will 
be helpful.

[1] 
[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/validate_runner_xlang_test.py]

[2] 
[https://github.com/apache/beam/blob/master/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java]

 [3] 
[https://docs.google.com/document/d/1xQp0ElIV84b8OCVz8CD2hvbiWdR8w4BvWxPTZJZA6NA/edit?usp=sharing]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9932) Add documentation describing cross-language test pipelines

2020-05-08 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9932:
--
Status: Open  (was: Triage Needed)

> Add documentation describing cross-language test pipelines
> --
>
> Key: BEAM-9932
> URL: https://issues.apache.org/jira/browse/BEAM-9932
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> We designed cross-language test pipelines [1][2] based on the discussion in 
> [3].
> Adding some pydocs and Javadocs regarding the rationale behind each pipeline will 
> be helpful.
> [1] 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/validate_runner_xlang_test.py]
> [2] 
> [https://github.com/apache/beam/blob/master/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java]
>  [3] 
> [https://docs.google.com/document/d/1xQp0ElIV84b8OCVz8CD2hvbiWdR8w4BvWxPTZJZA6NA/edit?usp=sharing]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9633) PubsubIO performance tests

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9633?focusedWorklogId=432155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432155
 ]

ASF GitHub Bot logged work on BEAM-9633:


Author: ASF GitHub Bot
Created on: 08/May/20 15:15
Start Date: 08/May/20 15:15
Worklog Time Spent: 10m 
  Work Description: piotr-szuberski commented on a change in pull request 
#11274:
URL: https://github.com/apache/beam/pull/11274#discussion_r422202409



##
File path: .test-infra/jenkins/job_PerformanceTests_PubsubIO_Python.groovy
##
@@ -41,7 +42,7 @@ def psio_test = [
 metrics_dataset  : 'beam_performance',
 metrics_table: 'psio_io_2GB_msg_results',
 input_options: '\'{' +
-'"num_records": 2097152,' +
+'"num_records": 2097152' +

Review comment:
   It sounds stupid but my cat went through my keyboard and I deleted too 
many digits and overlooked it
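
A quick sanity check of the kind that catches a dropped digit can be sketched as follows. This is only an illustration: it assumes the "2GB" in the table name `psio_io_2GB_msg_results` means 2 GiB of total payload, which the thread does not state, and the helper name is hypothetical.

```python
# Sanity-check sketch for the num_records option shown in the diff above.
NUM_RECORDS = 2_097_152              # value taken from the diff
ASSUMED_TOTAL_BYTES = 2 * 1024 ** 3  # assumption: "2GB" read as 2 GiB


def is_power_of_two(n: int) -> bool:
    """Return True when n is a positive power of two."""
    return n > 0 and n & (n - 1) == 0


# Under the assumption above, each record would carry this many bytes.
bytes_per_record = ASSUMED_TOTAL_BYTES // NUM_RECORDS

print(is_power_of_two(NUM_RECORDS), bytes_per_record)
```

Under that assumption the record count is an exact power of two, so a value with a missing digit (for example 209715) would fail the check.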







Issue Time Tracking
---

Worklog Id: (was: 432155)
Time Spent: 13h  (was: 12h 50m)

> PubsubIO performance tests
> --
>
> Key: BEAM-9633
> URL: https://issues.apache.org/jira/browse/BEAM-9633
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> There are no performance tests for PubsubIO in the Python SDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9931) Support custom Avro DatumReaders in AvroIO

2020-05-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9931:
---
Summary: Support custom Avro DatumReaders in AvroIO  (was: Support custom 
avro DatumReaders in AvroIO)

> Support custom Avro DatumReaders in AvroIO
> --
>
> Key: BEAM-9931
> URL: https://issues.apache.org/jira/browse/BEAM-9931
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-avro
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9633) PubsubIO performance tests

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9633?focusedWorklogId=432159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432159
 ]

ASF GitHub Bot logged work on BEAM-9633:


Author: ASF GitHub Bot
Created on: 08/May/20 15:35
Start Date: 08/May/20 15:35
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11274:
URL: https://github.com/apache/beam/pull/11274#issuecomment-625874085


   Retest this please





Issue Time Tracking
---

Worklog Id: (was: 432159)
Time Spent: 13h 10m  (was: 13h)

> PubsubIO performance tests
> --
>
> Key: BEAM-9633
> URL: https://issues.apache.org/jira/browse/BEAM-9633
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> There are no performance tests for PubsubIO in the Python SDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9449) Consider passing pipeline options for expansion service.

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9449?focusedWorklogId=432160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432160
 ]

ASF GitHub Bot logged work on BEAM-9449:


Author: ASF GitHub Bot
Created on: 08/May/20 15:37
Start Date: 08/May/20 15:37
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11638:
URL: https://github.com/apache/beam/pull/11638#issuecomment-625874811


   Run RAT PreCommit





Issue Time Tracking
---

Worklog Id: (was: 432160)
Time Spent: 1h 40m  (was: 1.5h)

> Consider passing pipeline options for expansion service.
> 
>
> Key: BEAM-9449
> URL: https://issues.apache.org/jira/browse/BEAM-9449
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9930) Add announcement for Beam Summit Digital 2020 to the blog

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9930?focusedWorklogId=432161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432161
 ]

ASF GitHub Bot logged work on BEAM-9930:


Author: ASF GitHub Bot
Created on: 08/May/20 15:39
Start Date: 08/May/20 15:39
Worklog Time Spent: 10m 
  Work Description: matthiasa4 commented on pull request #11640:
URL: https://github.com/apache/beam/pull/11640#issuecomment-625875861


   Added a few minor changes - LGTM for the rest! Thanks Max!





Issue Time Tracking
---

Worklog Id: (was: 432161)
Time Spent: 20m  (was: 10m)

> Add announcement for Beam Summit Digital 2020 to the blog
> -
>
> Key: BEAM-9930
> URL: https://issues.apache.org/jira/browse/BEAM-9930
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We need to announce Beam Summit Digital 2020 on the blog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9633) PubsubIO performance tests

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9633?focusedWorklogId=432165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432165
 ]

ASF GitHub Bot logged work on BEAM-9633:


Author: ASF GitHub Bot
Created on: 08/May/20 15:44
Start Date: 08/May/20 15:44
Worklog Time Spent: 10m 
  Work Description: kamilwu removed a comment on pull request #11274:
URL: https://github.com/apache/beam/pull/11274#issuecomment-625874085


   Retest this please





Issue Time Tracking
---

Worklog Id: (was: 432165)
Time Spent: 13h 40m  (was: 13.5h)

> PubsubIO performance tests
> --
>
> Key: BEAM-9633
> URL: https://issues.apache.org/jira/browse/BEAM-9633
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> There are no performance tests for PubsubIO in the Python SDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9633) PubsubIO performance tests

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9633?focusedWorklogId=432164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432164
 ]

ASF GitHub Bot logged work on BEAM-9633:


Author: ASF GitHub Bot
Created on: 08/May/20 15:44
Start Date: 08/May/20 15:44
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11274:
URL: https://github.com/apache/beam/pull/11274#issuecomment-625878298


   Run seed job





Issue Time Tracking
---

Worklog Id: (was: 432164)
Time Spent: 13.5h  (was: 13h 20m)

> PubsubIO performance tests
> --
>
> Key: BEAM-9633
> URL: https://issues.apache.org/jira/browse/BEAM-9633
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> There are no performance tests for PubsubIO in the Python SDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9633) PubsubIO performance tests

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9633?focusedWorklogId=432163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432163
 ]

ASF GitHub Bot logged work on BEAM-9633:


Author: ASF GitHub Bot
Created on: 08/May/20 15:44
Start Date: 08/May/20 15:44
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11274:
URL: https://github.com/apache/beam/pull/11274#issuecomment-625878134


   Retest this please





Issue Time Tracking
---

Worklog Id: (was: 432163)
Time Spent: 13h 20m  (was: 13h 10m)

> PubsubIO performance tests
> --
>
> Key: BEAM-9633
> URL: https://issues.apache.org/jira/browse/BEAM-9633
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> There are no performance tests for PubsubIO in the Python SDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9810) Add a Tox (precommit) suite for Python 3.8

2020-05-08 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-9810:
--

Assignee: Kamil Wasilewski

> Add a Tox (precommit) suite for Python 3.8
> --
>
> Key: BEAM-9810
> URL: https://issues.apache.org/jira/browse/BEAM-9810
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core, testing
>Reporter: Valentyn Tymofieiev
>Assignee: Kamil Wasilewski
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8494) Python 3.8 Support

2020-05-08 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102687#comment-17102687
 ] 

Kamil Wasilewski commented on BEAM-8494:


I've assigned https://issues.apache.org/jira/browse/BEAM-9810 to myself and I'd 
like to start working on this, if you don't mind. I'll create new tox suites 
(py38, py38-cloud, py38-cython) and ensure that they pass on Python 3.8.

> Python 3.8 Support
> --
>
> Key: BEAM-8494
> URL: https://issues.apache.org/jira/browse/BEAM-8494
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9911) Replace SpannerIO.write latency counter to distribution

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9911?focusedWorklogId=432166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432166
 ]

ASF GitHub Bot logged work on BEAM-9911:


Author: ASF GitHub Bot
Created on: 08/May/20 15:56
Start Date: 08/May/20 15:56
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11628:
URL: https://github.com/apache/beam/pull/11628#issuecomment-625883964


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 432166)
Time Spent: 40m  (was: 0.5h)

> Replace SpannerIO.write latency counter to distribution
> ---
>
> Key: BEAM-9911
> URL: https://issues.apache.org/jira/browse/BEAM-9911
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Allen Pradeep Xavier
>Assignee: Allen Pradeep Xavier
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As part of improvements to Spanner write, spanner_write_total_latency_ms was 
> added for more visibility.
> This counter tracks the total latency in milliseconds incurred by all the 
> write calls to Spanner and is not actionable.
> Replacing this with a Distribution makes this more actionable, as it provides 
> four counters (MIN, MAX, MEAN, COUNT):
> https://beam.apache.org/releases/javadoc/2.15.0/org/apache/beam/sdk/metrics/Distribution.html
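
The difference between a single running total and a distribution can be sketched in a few lines. This is a standalone illustration, not Beam's `Distribution` implementation; the class name `LatencyDistribution` and the sample latencies are made up for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class LatencyDistribution:
    """Hypothetical stand-in for a distribution metric (not the Beam class)."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, value_ms: float) -> None:
        # Fold one observation into the running aggregates.
        self.count += 1
        self.total += value_ms
        self.minimum = min(self.minimum, value_ms)
        self.maximum = max(self.maximum, value_ms)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


# Made-up write latencies in milliseconds: three fast calls and one outlier.
latencies_ms = [12, 15, 11, 950]

total_latency_counter = sum(latencies_ms)  # all a plain counter can report
dist = LatencyDistribution()
for value in latencies_ms:
    dist.update(value)

print(total_latency_counter)
print(dist.minimum, dist.maximum, dist.mean, dist.count)
```

The total alone (988 ms here) cannot distinguish many slow writes from a single outlier, whereas the min, max, mean and count together make the outlier visible.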



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9633) PubsubIO performance tests

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9633?focusedWorklogId=432167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432167
 ]

ASF GitHub Bot logged work on BEAM-9633:


Author: ASF GitHub Bot
Created on: 08/May/20 16:01
Start Date: 08/May/20 16:01
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11274:
URL: https://github.com/apache/beam/pull/11274#issuecomment-625886138


   Retest this please





Issue Time Tracking
---

Worklog Id: (was: 432167)
Time Spent: 13h 50m  (was: 13h 40m)

> PubsubIO performance tests
> --
>
> Key: BEAM-9633
> URL: https://issues.apache.org/jira/browse/BEAM-9633
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> There are no performance tests for PubsubIO in the Python SDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9633) PubsubIO performance tests

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9633?focusedWorklogId=432171&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432171
 ]

ASF GitHub Bot logged work on BEAM-9633:


Author: ASF GitHub Bot
Created on: 08/May/20 16:03
Start Date: 08/May/20 16:03
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11274:
URL: https://github.com/apache/beam/pull/11274#issuecomment-625887287


   Run PubsubIO Performance Test Python





Issue Time Tracking
---

Worklog Id: (was: 432171)
Time Spent: 14h 10m  (was: 14h)

> PubsubIO performance tests
> --
>
> Key: BEAM-9633
> URL: https://issues.apache.org/jira/browse/BEAM-9633
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> There are no performance tests for PubsubIO in the Python SDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9633) PubsubIO performance tests

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9633?focusedWorklogId=432170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432170
 ]

ASF GitHub Bot logged work on BEAM-9633:


Author: ASF GitHub Bot
Created on: 08/May/20 16:03
Start Date: 08/May/20 16:03
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #11274:
URL: https://github.com/apache/beam/pull/11274#issuecomment-625887140


   Retest this please





Issue Time Tracking
---

Worklog Id: (was: 432170)
Time Spent: 14h  (was: 13h 50m)

> PubsubIO performance tests
> --
>
> Key: BEAM-9633
> URL: https://issues.apache.org/jira/browse/BEAM-9633
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Affects Versions: Not applicable
>Reporter: Piotr Szuberski
>Assignee: Piotr Szuberski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 14h
>  Remaining Estimate: 0h
>
> There are no performance tests for PubsubIO in the Python SDK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=432172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432172
 ]

ASF GitHub Bot logged work on BEAM-9876:


Author: ASF GitHub Bot
Created on: 08/May/20 16:04
Start Date: 08/May/20 16:04
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on a change in pull request 
#11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r48268



##
File path: website/www/site/data/meetings.yml
##
@@ -9,31 +9,30 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-# Welcome to Jekyll!
 
 events:
-- date: 2016/04/01
-  time: "9:30 - 16:00 Pacific"
-  location: PayPalSan Jose, CA, USA
-  type: Dev/PPMC Meeting
-  materials:
-- title: Presentation - PPMC Deep Dive
-  link: "https://docs.google.com/presentation/d/1uTb7dx4-Y2OM_B0_3XF_whwAL2FlDTTuq2QzP9sJ4Mg/edit?usp=sharing"
+  - date: 2016/04/01
+time: "9:30 - 16:00 Pacific"
+location: PayPalSan Jose, CA, USA
+type: Dev/PPMC Meeting
+materials:
+  - title: Presentation - PPMC Deep Dive
+link: "https://docs.google.com/presentation/d/1uTb7dx4-Y2OM_B0_3XF_whwAL2FlDTTuq2QzP9sJ4Mg/edit?usp=sharing"
 
-- title: Notes - PPMC Deep Dive
-  link: "https://docs.google.com/document/d/1SXSLj7FMIgKqj43nTcczFpJzqASeUMUCpbyklk2fBkg/edit?usp=sharing"
-  notes:
+  - title: Notes - PPMC Deep Dive
+link: "https://docs.google.com/document/d/1SXSLj7FMIgKqj43nTcczFpJzqASeUMUCpbyklk2fBkg/edit?usp=sharing"
+notes:
 
-- date: 2016/05/04
-  time: "8:00 - 11:00 Pacific"
-  location: Virtual
-  type: Technical Deep Dive
-  materials:
-- title: Presentation - Beam Community Meeting
-  link: "https://drive.google.com/open?id=17i7SHViboWtLEZw27iabdMisPl987WWxvapJaXg_dEE"
+  - date: 2016/05/04
+time: "8:00 - 11:00 Pacific"
+location: Virtual
+type: Technical Deep Dive
+materials:
+  - title: Presentation - Beam Community Meeting
+link: "https://drive.google.com/open?id=17i7SHViboWtLEZw27iabdMisPl987WWxvapJaXg_dEE"
 
-- title: Notes - Beam Community Meeting
-  link: "https://drive.google.com/open?id=1szhEE_pfhEtrQye61jXAidUcMW7oebZCRc2InUe3ou0"
-  notes:
+  - title: Notes - Beam Community Meeting
+link: "https://drive.google.com/open?id=1szhEE_pfhEtrQye61jXAidUcMW7oebZCRc2InUe3ou0"
+notes:

Review comment:
   Are the whitespace changes in these yaml files necessary?







Issue Time Tracking
---

Worklog Id: (was: 432172)
Time Spent: 8.5h  (was: 8h 20m)

> Migrate the Beam website from Jekyll to Hugo to enable localization of the 
> site content
> ---
>
> Key: BEAM-9876
> URL: https://issues.apache.org/jira/browse/BEAM-9876
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Enable internationalization of the Apache Beam website to increase the reach 
> of the project, and facilitate adoption and growth of its community.
> The proposal was to do this by migrating the current Apache Beam website from 
> Jekyll to Hugo [1]. Hugo supports internationalization out-of-the-box, making 
> it easier for both contributors and maintainers to support the 
> internationalization effort.
> Further discussion of the implementation can be viewed here [2].
> [1] 
> [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E]
> [2] 
> [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9856) HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9856?focusedWorklogId=432174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432174
 ]

ASF GitHub Bot logged work on BEAM-9856:


Author: ASF GitHub Bot
Created on: 08/May/20 16:14
Start Date: 08/May/20 16:14
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #11596:
URL: https://github.com/apache/beam/pull/11596#discussion_r422233303



##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
##
@@ -472,24 +548,120 @@ public void initClient() throws IOException {
   this.client = new HttpHealthcareApiClient();
 }
 
+@GetInitialRestriction
+public OrderedTimeRange getEarliestToLatestRestriction(@Element String hl7v2Store)
+throws IOException {
+  from = this.client.getEarliestHL7v2SendTime(hl7v2Store, this.filter);
+  // filters are [from, to) to match logic of OffsetRangeTracker but need latest element to be
+  // included in results set to add an extra ms to the upper bound.
+  to = this.client.getLatestHL7v2SendTime(hl7v2Store, this.filter).plus(1);
+  return new OrderedTimeRange(from, to);
+}
+
+@NewTracker
+public OrderedTimeRangeTracker newTracker(@Restriction OrderedTimeRange timeRange) {
+  return timeRange.newTracker();
+}
+
+@SplitRestriction
+public void split(
+@Restriction OrderedTimeRange timeRange, OutputReceiver out) {
+  // TODO(jaketf) How to pick optimal values for desiredNumOffsetsPerSplit ?

Review comment:
   That seems like a lot.
   
   Dataflow has an API limit of 20 MB on the split descriptions it returns, 
which usually tops out at around 10k splits for sources, but even 10k is too 
many. Typically 20-50 splits are enough, since dynamic splitting will ramp 
that up to 1000s if necessary.
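
As a rough illustration of the sizing advice above, the sketch below caps the number of initial splits at a small constant and leaves any further subdivision to the runner's dynamic splitting. It is written in Python for brevity rather than in the Java of the pull request. The function name initial_splits, the default cap of 32, and the integer millisecond bounds are assumptions made for this example and are not part of HL7v2IO.

```python
from typing import List, Tuple


def initial_splits(start_ms: int, end_ms: int,
                   max_splits: int = 32) -> List[Tuple[int, int]]:
    """Split the half-open range [start_ms, end_ms) into at most max_splits pieces."""
    if end_ms <= start_ms:
        return []
    total = end_ms - start_ms
    # Never create more splits than there are milliseconds in the range.
    num = min(max_splits, total)
    # Ceiling division, so that `num` pieces always cover the whole range.
    step = -(-total // num)
    splits = []
    lo = start_ms
    while lo < end_ms:
        hi = min(lo + step, end_ms)
        splits.append((lo, hi))
        lo = hi
    return splits
```

For example, initial_splits(0, 100, 4) produces four contiguous 25 ms ranges. The cap keeps the split description small no matter how wide the time range is.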





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 432174)
Time Spent: 4h 10m  (was: 4h)

> HL7v2IO.ListHL7v2Messages should be refactored to support more parallelization
> --
>
> Key: BEAM-9856
> URL: https://issues.apache.org/jira/browse/BEAM-9856
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Currently the List Messages API paginates through in a single ProcessElement 
> call.
> However, we could get a restriction based on createTime using the Messages.List 
> filter and orderby.
>  
> This is in line with the future roadmap: once the HL7v2 bulk export API becomes 
> available, it should allow splitting on (e.g.) the create time dimension. 
> Leveraging this bulk export might be a future optimization to explore.
>  
> This could take one of two forms:
> 1. dynamically splittable via splittable DoFn (sexy, Beam idiomatic: make 
> optimization the runner's problem; potentially unnecessarily complex for this 
> use case)
> 2. static splitting on some time partition, e.g. finding the earliest 
> createTime and emitting a PCollection of 1 hour partitions, then paginating 
> through each hour of data within the time frame that the store spans, in a 
> separate ProcessElement (easy to implement but will likely have hot keys / 
> stragglers based on "busy hours")
>  
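
The second form in the description (static one-hour partitions) can be sketched as follows. This is only an illustration in Python, not code from the pull request. The name hourly_partitions is hypothetical, and the sketch assumes the earliest and latest timestamps have already been fetched from the store.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def hourly_partitions(earliest: datetime,
                      latest: datetime) -> Iterator[Tuple[datetime, datetime]]:
    """Yield half-open [start, end) one-hour windows covering earliest..latest."""
    # Align the first window to the top of the hour containing `earliest`.
    start = earliest.replace(minute=0, second=0, microsecond=0)
    # Use <= so that an element stamped exactly at `latest` falls in a window.
    while start <= latest:
        end = start + timedelta(hours=1)
        yield (start, end)
        start = end
```

Each yielded window would then be paginated in its own ProcessElement call. As the description notes, windows covering busy hours carry more data than others, which is where stragglers come from.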





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=432176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432176
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 08/May/20 16:14
Start Date: 08/May/20 16:14
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11632:
URL: https://github.com/apache/beam/pull/11632#issuecomment-625892367


   PTAL





Issue Time Tracking
---

Worklog Id: (was: 432176)
Time Spent: 82h 10m  (was: 82h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 82h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=432175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432175
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 08/May/20 16:14
Start Date: 08/May/20 16:14
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #11632:
URL: https://github.com/apache/beam/pull/11632#discussion_r422233278



##
File path: sdks/python/apache_beam/dataframe/convert.py
##
@@ -16,13 +16,23 @@
 
 from __future__ import absolute_import
 
+import typing
+
 import inspect
 
 from apache_beam import pvalue
 from apache_beam.dataframe import expressions
 from apache_beam.dataframe import frame_base
 from apache_beam.dataframe import transforms
 
+if typing.TYPE_CHECKING:
+  # pylint: disable=ungrouped-imports
+  from typing import Any
+  from typing import Dict
+  from typing import Tuple
+  from typing import Union

Review comment:
   So lint complains about unguarded imports, so I put them back. We'll 
just do a massive sweep to fix these when we change to use type annotations. 
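
For context, a minimal sketch of the guarded-import pattern the diff uses: names imported under typing.TYPE_CHECKING are visible to a static analyzer such as mypy but are never imported at runtime, which works because PEP 484 type comments are not evaluated. The function sorted_keys is a made-up example and is not taken from convert.py.

```python
import typing

if typing.TYPE_CHECKING:
  # Only evaluated by static analyzers such as mypy; skipped at runtime.
  from typing import Dict
  from typing import Tuple


def sorted_keys(pairs):
  # type: (Dict[str, int]) -> Tuple[str, ...]

  """Return the keys of `pairs` in sorted order."""
  return tuple(sorted(pairs))
```

Once the code moves to inline annotations, these names would be needed at import time (unless annotation evaluation is deferred), which is presumably why the guards are to be removed in a single sweep.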







Issue Time Tracking
---

Worklog Id: (was: 432175)
Time Spent: 82h  (was: 81h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 82h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-9430) Migrate from ProcessContext#updateWatermark to WatermarkEstimators

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9430?focusedWorklogId=432179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432179
 ]

ASF GitHub Bot logged work on BEAM-9430:


Author: ASF GitHub Bot
Created on: 08/May/20 16:20
Start Date: 08/May/20 16:20
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #11607:
URL: https://github.com/apache/beam/pull/11607#discussion_r422236167



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.java
##
@@ -37,14 +37,16 @@
 private Instant lastReportedWatermark;
 
 public Manual(Instant watermark) {
-  this.watermark = checkNotNull(watermark, "watermark must not be null.");
-  if (watermark.isBefore(GlobalWindow.TIMESTAMP_MIN_VALUE)
-  || watermark.isAfter(GlobalWindow.TIMESTAMP_MAX_VALUE)) {
-throw new IllegalArgumentException(
-String.format(
-"Provided watermark %s must be within bounds [%s, %s].",
-watermark, GlobalWindow.TIMESTAMP_MIN_VALUE, 
GlobalWindow.TIMESTAMP_MAX_VALUE));
+  checkNotNull(watermark, "watermark must not be null.");
+
+  // Making sure that the watermark is within bounds.

Review comment:
   You're right, though it would be good to migrate to using BoundedWindow as 
the import for the static.
   
   I think it makes sense to make the constructor validate the bounds and have 
setWatermark ensure that the value is within the range as expected. We can fix 
the UnboundedSource SDF wrapper to clamp the watermark value that is being 
reported from UnboundedReader instead.
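
The two behaviours discussed here, rejecting an out-of-range watermark versus clamping it, can be contrasted in a short sketch. It is written in Python for brevity. The bounds MIN_TS and MAX_TS are placeholder values for illustration and are not Beam's actual constants, and both function names are hypothetical.

```python
# Hypothetical bounds in milliseconds, chosen only for illustration.
MIN_TS = -10**15
MAX_TS = 10**15


def validate_watermark(watermark: int) -> int:
    """Constructor-style check: reject values outside [MIN_TS, MAX_TS]."""
    if watermark < MIN_TS or watermark > MAX_TS:
        raise ValueError(
            "Provided watermark %s must be within bounds [%s, %s]."
            % (watermark, MIN_TS, MAX_TS))
    return watermark


def clamp_watermark(watermark: int) -> int:
    """Wrapper-style fix: pull an out-of-range value back to the nearest bound."""
    return max(MIN_TS, min(watermark, MAX_TS))
```

With this division of labour the estimator stays strict, and only the wrapper around a reader that may report out-of-range values calls the clamping helper before handing the watermark over.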







Issue Time Tracking
---

Worklog Id: (was: 432179)
Time Spent: 7h 10m  (was: 7h)

> Migrate from ProcessContext#updateWatermark to WatermarkEstimators
> --
>
> Key: BEAM-9430
> URL: https://issues.apache.org/jira/browse/BEAM-9430
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Labels: backward-incompatible
> Fix For: 2.21.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Current discussion underway in 
> [https://lists.apache.org/thread.html/r5d974b6a58bc04ff4c02682fda4ef68608121f1bf23a86e9d592ca6e%40%3Cdev.beam.apache.org%3E]
>  
> Proposed API: [https://github.com/apache/beam/pull/10992]





[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=432180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432180
 ]

ASF GitHub Bot logged work on BEAM-9876:


Author: ASF GitHub Bot
Created on: 08/May/20 16:24
Start Date: 08/May/20 16:24
Worklog Time Spent: 10m 
  Work Description: bntnam commented on a change in pull request #11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r422238541



##
File path: website/www/site/data/meetings.yml
##
@@ -9,31 +9,30 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-# Welcome to Jekyll!
 
 events:
-- date: 2016/04/01
-  time: "9:30 - 16:00 Pacific"
-  location: PayPalSan Jose, CA, USA
-  type: Dev/PPMC Meeting
-  materials:
-- title: Presentation - PPMC Deep Dive
-  link: "https://docs.google.com/presentation/d/1uTb7dx4-Y2OM_B0_3XF_whwAL2FlDTTuq2QzP9sJ4Mg/edit?usp=sharing"
+  - date: 2016/04/01
+time: "9:30 - 16:00 Pacific"
+location: PayPalSan Jose, CA, USA
+type: Dev/PPMC Meeting
+materials:
+  - title: Presentation - PPMC Deep Dive
+link: "https://docs.google.com/presentation/d/1uTb7dx4-Y2OM_B0_3XF_whwAL2FlDTTuq2QzP9sJ4Mg/edit?usp=sharing"
 
-- title: Notes - PPMC Deep Dive
-  link: "https://docs.google.com/document/d/1SXSLj7FMIgKqj43nTcczFpJzqASeUMUCpbyklk2fBkg/edit?usp=sharing"
-  notes:
+  - title: Notes - PPMC Deep Dive
+link: "https://docs.google.com/document/d/1SXSLj7FMIgKqj43nTcczFpJzqASeUMUCpbyklk2fBkg/edit?usp=sharing"
+notes:
 
-- date: 2016/05/04
-  time: "8:00 - 11:00 Pacific"
-  location: Virtual
-  type: Technical Deep Dive
-  materials:
-- title: Presentation - Beam Community Meeting
-  link: "https://drive.google.com/open?id=17i7SHViboWtLEZw27iabdMisPl987WWxvapJaXg_dEE"
+  - date: 2016/05/04
+time: "8:00 - 11:00 Pacific"
+location: Virtual
+type: Technical Deep Dive
+materials:
+  - title: Presentation - Beam Community Meeting
+link: "https://drive.google.com/open?id=17i7SHViboWtLEZw27iabdMisPl987WWxvapJaXg_dEE"
 
-- title: Notes - Beam Community Meeting
-  link: "https://drive.google.com/open?id=1szhEE_pfhEtrQye61jXAidUcMW7oebZCRc2InUe3ou0"
-  notes:
+  - title: Notes - Beam Community Meeting
+link: "https://drive.google.com/open?id=1szhEE_pfhEtrQye61jXAidUcMW7oebZCRc2InUe3ou0"
+notes:

Review comment:
   @TheNeuralBit: The file is formatted correctly according to the 
indentation rule [1].
   
   [1] 
https://docs.saltstack.com/en/master/topics/troubleshooting/yaml_idiosyncrasies.html







Issue Time Tracking
---

Worklog Id: (was: 432180)
Time Spent: 8h 40m  (was: 8.5h)

> Migrate the Beam website from Jekyll to Hugo to enable localization of the 
> site content
> ---
>
> Key: BEAM-9876
> URL: https://issues.apache.org/jira/browse/BEAM-9876
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Enable internationalization of the Apache Beam website to increase the reach 
> of the project, and facilitate adoption and growth of its community.
> The proposal was to do this by migrating the current Apache Beam website from 
> Jekyll to Hugo [1]. Hugo supports internationalization out-of-the-box, making 
> it easier for both contributors and maintainers to support the 
> internationalization effort.
> The further discussion on implementation can be viewed here  [2]
> [1] 
> [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E]
> [2] 
> [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E]
>  





[jira] [Work logged] (BEAM-9646) [Java] PTransform that integrates Cloud Vision functionality

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9646?focusedWorklogId=432182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432182
 ]

ASF GitHub Bot logged work on BEAM-9646:


Author: ASF GitHub Bot
Created on: 08/May/20 16:27
Start Date: 08/May/20 16:27
Worklog Time Spent: 10m 
  Work Description: tysonjh commented on a change in pull request #11331:
URL: https://github.com/apache/beam/pull/11331#discussion_r421689993



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/AnnotateImages.java
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.cloud.vision.v1.AnnotateImageRequest;
+import com.google.cloud.vision.v1.AnnotateImageResponse;
+import com.google.cloud.vision.v1.BatchAnnotateImagesResponse;
+import com.google.cloud.vision.v1.Feature;
+import com.google.cloud.vision.v1.ImageAnnotatorClient;
+import com.google.cloud.vision.v1.ImageContext;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.GroupIntoBatches;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * Parent class for transform utilizing Cloud Vision API.
+ *
+ * @param  Type of input PCollection.
+ */
+public abstract class AnnotateImages
+extends PTransform, 
PCollection>> {
+
+  private static final Long MIN_BATCH_SIZE = 1L;
+  private static final Long MAX_BATCH_SIZE = 5L;
+
+  protected final PCollectionView> contextSideInput;
+  protected final List featureList;
+  private long batchSize;
+
+  public AnnotateImages(

Review comment:
   Would you please add comments to this as well? It would be useful for 
those implementing subclasses.


[jira] [Resolved] (BEAM-2112) Add support for PCollectionView in spark runner in streaming mode

2020-05-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-2112.

Fix Version/s: Not applicable
   Resolution: Duplicate

> Add support for PCollectionView in spark runner in streaming mode
> -
>
> Key: BEAM-2112
> URL: https://issues.apache.org/jira/browse/BEAM-2112
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: Major
> Fix For: Not applicable
>
>
> As a test, Nexmark query7 can be used
> run Nexmark query7 (https://github.com/iemejia/beam/tree/BEAM-160-nexmark) in 
> streaming mode using Spark.
> Run main in
> {code}org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver{code}
> with VMOptions:
> {code} -Dspark.ui.enabled=false -DSPARK_LOCAL_IP=localhost 
> -Dsun.io.serialization.extendedDebugInfo=true {code}
> with Program arguments:
> {code}--query=7  --streaming=true --numEventGenerators=4 
> --manageResources=false --monitorJobs=true --enforceEncodability=false 
> --enforceImmutability=false{code}
> StackTrace is 
> {code}
> Exception in thread "main" java.lang.IllegalStateException: No 
> TransformEvaluator registered for UNBOUNDED transform class 
> org.apache.beam.sdk.transforms.View$CreatePCollectionView
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:518)
>   at 
> org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$Translator.translateUnbounded(StreamingTransformTranslator.java:529)
>   at 
> org.apache.beam.runners.spark.SparkRunner$Evaluator.translate(SparkRunner.java:435)
>   at 
> org.apache.beam.runners.spark.SparkRunner$Evaluator.doVisitTransform(SparkRunner.java:405)
>   at 
> org.apache.beam.runners.spark.SparkRunner$Evaluator.visitPrimitiveTransform(SparkRunner.java:395)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:488)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:232)
>   at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:207)
>   at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:384)
>   at 
> org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory.call(SparkRunnerStreamingContextFactory.java:88)
>   at 
> org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory.call(SparkRunnerStreamingContextFactory.java:47)
>   at 
> org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$10.apply(JavaStreamingContext.scala:776)
>   at 
> org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$10.apply(JavaStreamingContext.scala:775)
>   at scala.Option.getOrElse(Option.scala:120)
>   at 
> org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:864)
>   at 
> org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:775)
>   at 
> org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:155)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:85)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:277)
>   at 
> org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1281)
>   at 
> org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
>   at 
> org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> {code}





[jira] [Commented] (BEAM-8494) Python 3.8 Support

2020-05-08 Thread yoshiki obata (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102721#comment-17102721
 ] 

yoshiki obata commented on BEAM-8494:
-

Thank you [~kamilwu].
I'll update the gradle tasks this weekend so that we can run py38 precommit 
tests via gradle.

> Python 3.8 Support
> --
>
> Key: BEAM-8494
> URL: https://issues.apache.org/jira/browse/BEAM-8494
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>






[jira] [Updated] (BEAM-9933) Support Streaming in Spark Structured Streaming Runner

2020-05-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9933:
---
Labels: structured-streaming  (was: )

> Support Streaming in Spark Structured Streaming Runner
> --
>
> Key: BEAM-9933
> URL: https://issues.apache.org/jira/browse/BEAM-9933
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: structured-streaming
>
> The current version of the Spark Structured Streaming Runner only supports 
> Bounded IO. This ticket is to address the support of full Streaming.
> This issue is blocked because of issues on Spark with multiple aggregations
> https://issues.apache.org/jira/browse/SPARK-26655





[jira] [Created] (BEAM-9933) Support Streaming in Spark Structured Streaming Runner

2020-05-08 Thread Jira
Ismaël Mejía created BEAM-9933:
--

 Summary: Support Streaming in Spark Structured Streaming Runner
 Key: BEAM-9933
 URL: https://issues.apache.org/jira/browse/BEAM-9933
 Project: Beam
  Issue Type: New Feature
  Components: runner-spark
Reporter: Ismaël Mejía


The current version of the Spark Structured Streaming Runner only supports 
Bounded IO. This ticket is to address the support of full Streaming.

This issue is blocked because of issues on Spark with multiple aggregations
https://issues.apache.org/jira/browse/SPARK-26655






[jira] [Updated] (BEAM-9933) Support Streaming in Spark Structured Streaming Runner

2020-05-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9933:
---
Status: Open  (was: Triage Needed)

> Support Streaming in Spark Structured Streaming Runner
> --
>
> Key: BEAM-9933
> URL: https://issues.apache.org/jira/browse/BEAM-9933
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: structured-streaming
>
> The current version of the Spark Structured Streaming Runner only supports 
> Bounded IO. This ticket is to address the support of full Streaming.
> This issue is blocked because of issues on Spark with multiple aggregations
> https://issues.apache.org/jira/browse/SPARK-26655





[jira] [Commented] (BEAM-9933) Support Streaming in Spark Structured Streaming Runner

2020-05-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102724#comment-17102724
 ] 

Ismaël Mejía commented on BEAM-9933:


I did an experiment to reproduce the aggregation issue in pure Spark here
https://github.com/iemejia/spark-playground


> Support Streaming in Spark Structured Streaming Runner
> --
>
> Key: BEAM-9933
> URL: https://issues.apache.org/jira/browse/BEAM-9933
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: structured-streaming
>
> The current version of the Spark Structured Streaming Runner only supports 
> Bounded IO. This ticket is to address the support of full Streaming.
> This issue is blocked because of issues on Spark with multiple aggregations
> https://issues.apache.org/jira/browse/SPARK-26655





[jira] [Updated] (BEAM-9933) Support Streaming in Spark Structured Streaming Runner

2020-05-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9933:
---
Description: 
The current version of the Spark Structured Streaming Runner only supports 
Bounded IO. This ticket is to address the support of full Streaming.

This issue is blocked because of issues on Spark with multiple aggregations 
SPARK-26655


  was:
The current version of the Spark Structured Streaming Runner only supports 
Bounded IO. This ticket is to address the support of full Streaming.

This issue is blocked because of issues on Spark with multiple aggregations
https://issues.apache.org/jira/browse/SPARK-26655



> Support Streaming in Spark Structured Streaming Runner
> --
>
> Key: BEAM-9933
> URL: https://issues.apache.org/jira/browse/BEAM-9933
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: structured-streaming
>
> The current version of the Spark Structured Streaming Runner only supports 
> Bounded IO. This ticket is to address the support of full Streaming.
> This issue is blocked because of issues on Spark with multiple aggregations 
> SPARK-26655





[jira] [Comment Edited] (BEAM-9933) Support Streaming in Spark Structured Streaming Runner

2020-05-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102724#comment-17102724
 ] 

Ismaël Mejía edited comment on BEAM-9933 at 5/8/20, 4:51 PM:
-

I did an experiment to reproduce the aggregation issue in pure Spark
https://github.com/iemejia/spark-playground



was (Author: iemejia):
I did an experiment to reproduce the aggregation issue in pure Spark here
https://github.com/iemejia/spark-playground


> Support Streaming in Spark Structured Streaming Runner
> --
>
> Key: BEAM-9933
> URL: https://issues.apache.org/jira/browse/BEAM-9933
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: structured-streaming
>
> The current version of the Spark Structured Streaming Runner only support 
> Bounded IO. This ticket is to address the support of full Streaming.
> This issue is blocked because of issues on Spark with multiple aggregations 
> SPARK-26655





[jira] [Commented] (BEAM-9933) Support Streaming in Spark Structured Streaming Runner

2020-05-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102734#comment-17102734
 ] 

Ismaël Mejía commented on BEAM-9933:


Spark has a checker of unsupported operations that matches this
https://github.com/apache/spark/blob/0fb607ef3744e073a09423460a79e3325b24b6ad/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala#L128
It can be worked around with bytecode manipulation, but this is not an ideal 
solution.

> Support Streaming in Spark Structured Streaming Runner
> --
>
> Key: BEAM-9933
> URL: https://issues.apache.org/jira/browse/BEAM-9933
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: structured-streaming
>
> The current version of the Spark Structured Streaming Runner only support 
> Bounded IO. This ticket is to address the support of full Streaming.
> This issue is blocked because of issues on Spark with multiple aggregations 
> SPARK-26655



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9907) apache_beam.transforms.external_test.ExternalTransformTest.test_nested flaky

2020-05-08 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-9907.
-
Fix Version/s: 2.22.0
   Resolution: Fixed

> apache_beam.transforms.external_test.ExternalTransformTest.test_nested flaky
> 
>
> Key: BEAM-9907
> URL: https://issues.apache.org/jira/browse/BEAM-9907
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Reporter: Ning Kang
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Example test failures:
> https://builds.apache.org/job/beam_PreCommit_Python_Commit/12682/
> https://builds.apache.org/job/beam_PreCommit_Python_Commit/12684/
> A stacktrace
> {code:bash}
> apache_beam.transforms.external_test.ExternalTransformTest.test_nested (from 
> py37-cloud)
> Failing for the past 1 build (Since Failed#12682 )
> Took 54 ms.
> Error Message
> google.protobuf.json_format.ParseError: Unexpected type for Value message.
> Stacktrace
> self =  testMethod=test_nested>
> def test_nested(self):
>   with beam.Pipeline() as p:
> >   assert_that(p | FibTransform(6), equal_to([8]))
> apache_beam/transforms/external_test.py:250: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> apache_beam/transforms/ptransform.py:562: in __ror__
> result = p.apply(self, pvalueish, label)
> apache_beam/pipeline.py:651: in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> apache_beam/runners/runner.py:198: in apply
> return m(transform, input, options)
> apache_beam/runners/runner.py:228: in apply_PTransform
> return transform.expand(input)
> apache_beam/runners/portability/expansion_service_test.py:257: in expand
> expansion_service.ExpansionServiceServicer())
> apache_beam/pvalue.py:140: in __or__
> return self.pipeline.apply(ptransform, self)
> apache_beam/pipeline.py:598: in apply
> transform.transform, pvalueish, label or transform.label)
> apache_beam/pipeline.py:608: in apply
> return self.apply(transform, pvalueish)
> apache_beam/pipeline.py:651: in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> apache_beam/runners/runner.py:198: in apply
> return m(transform, input, options)
> apache_beam/runners/runner.py:228: in apply_PTransform
> return transform.expand(input)
> apache_beam/transforms/external.py:322: in expand
> pipeline_options=job_utils.pipeline_options_dict_to_struct(options))
> apache_beam/runners/job/utils.py:38: in pipeline_options_dict_to_struct
> v in options.items() if v is not None
> apache_beam/runners/job/utils.py:44: in dict_to_struct
> return json_format.ParseDict(dict_obj, struct_pb2.Struct())
> target/.tox-py37-cloud/py37-cloud/lib/python3.7/site-packages/google/protobuf/json_format.py:450:
>  in ParseDict
> parser.ConvertMessage(js_dict, message)
> target/.tox-py37-cloud/py37-cloud/lib/python3.7/site-packages/google/protobuf/json_format.py:479:
>  in ConvertMessage
> methodcaller(_WKTJSONMETHODS[full_name][1], value, message)(self)
> target/.tox-py37-cloud/py37-cloud/lib/python3.7/site-packages/google/protobuf/json_format.py:667:
>  in _ConvertStructMessage
> self._ConvertValueMessage(value[key], message.fields[key])
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = 
> value =  0x7f35a4c00390>
> message = 
> def _ConvertValueMessage(self, value, message):
>   """Convert a JSON representation into Value message."""
>   if isinstance(value, dict):
>     self._ConvertStructMessage(value, message.struct_value)
>   elif isinstance(value, list):
>     self._ConvertListValueMessage(value, message.list_value)
>   elif value is None:
>     message.null_value = 0
>   elif isinstance(value, bool):
>     message.bool_value = value
>   elif isinstance(value, six.string_types):
>     message.string_value = value
>   elif isinstance(value, _INT_OR_FLOAT):
>     message.number_value = value
>   else:
> >     raise ParseError('Unexpected type for Value message.')
> E     google.protobuf.json_format.ParseError: Unexpected type for Value message.
> target/.tox-py37-cloud/py37-cloud/lib/python3.7/site-packages/google/protobuf/json_format.py:647:
>  ParseError
> {code}
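
The traceback above ends in the final `else` branch of `_ConvertValueMessage`: the value handed to `ParseDict` was not a dict, list, None, bool, string, or number. The following is a stdlib-only sketch of that branch order, not protobuf's actual code. `classify_value` and `find_unsupported` are hypothetical helper names, and the `"options"` path prefix is an assumption for illustration. A helper like this could be used to spot which option value would trip the check before it reaches `json_format.ParseDict`.

```python
# Hypothetical helpers that mirror the type checks shown in the traceback.
# They do not call protobuf; they only reproduce the branch order.


def classify_value(value):
    """Return the name of the Value field a Python value would map to."""
    if isinstance(value, dict):
        return "struct_value"
    if isinstance(value, list):
        return "list_value"
    if value is None:
        return "null_value"
    # bool is tested before numbers because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool_value"
    if isinstance(value, str):
        return "string_value"
    if isinstance(value, (int, float)):
        return "number_value"
    raise TypeError("Unexpected type for Value message: %r" % type(value))


def find_unsupported(obj, path="options"):
    """Return paths of nested values that would reach the final else branch."""
    if isinstance(obj, dict):
        bad = []
        for key, value in obj.items():
            bad.extend(find_unsupported(value, "%s.%s" % (path, key)))
        return bad
    if isinstance(obj, list):
        bad = []
        for index, value in enumerate(obj):
            bad.extend(find_unsupported(value, "%s[%d]" % (path, index)))
        return bad
    try:
        classify_value(obj)
    except TypeError:
        return [path]
    return []
```

Calling `find_unsupported(some_options_dict)` returns an empty list when every nested value falls into one of the supported branches.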





[jira] [Comment Edited] (BEAM-9933) Support Streaming in Spark Structured Streaming Runner

2020-05-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102734#comment-17102734
 ] 

Ismaël Mejía edited comment on BEAM-9933 at 5/8/20, 4:54 PM:
-

Spark has a [checker of unsupported operations that matches 
this|https://github.com/apache/spark/blob/0fb607ef3744e073a09423460a79e3325b24b6ad/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala#L128].
This is also explained in the [Spark structured streaming programming 
guide|https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#unsupported-operations].
It can be worked around with bytecode manipulation, but this 
is not an ideal solution.


was (Author: iemejia):
Spark has a checker of unsupported operations that matches this
https://github.com/apache/spark/blob/0fb607ef3744e073a09423460a79e3325b24b6ad/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala#L128
It can be worked around with bytecode manipulation, but this 
is not an ideal solution.

> Support Streaming in Spark Structured Streaming Runner
> --
>
> Key: BEAM-9933
> URL: https://issues.apache.org/jira/browse/BEAM-9933
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: structured-streaming
>
> The current version of the Spark Structured Streaming Runner only supports 
> Bounded IO. This ticket tracks support for full streaming.
> This issue is blocked by a Spark limitation on multiple aggregations 
> (SPARK-26655).





[jira] [Commented] (BEAM-9907) apache_beam.transforms.external_test.ExternalTransformTest.test_nested flaky

2020-05-08 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102737#comment-17102737
 ] 

Brian Hulette commented on BEAM-9907:
-

closed! thanks

> apache_beam.transforms.external_test.ExternalTransformTest.test_nested flaky
> 
>
> Key: BEAM-9907
> URL: https://issues.apache.org/jira/browse/BEAM-9907
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Reporter: Ning Kang
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Example test failures:
> https://builds.apache.org/job/beam_PreCommit_Python_Commit/12682/
> https://builds.apache.org/job/beam_PreCommit_Python_Commit/12684/
> A stacktrace
> {code:bash}
> apache_beam.transforms.external_test.ExternalTransformTest.test_nested (from 
> py37-cloud)
> Failing for the past 1 build (Since Failed#12682 )
> Took 54 ms.
> Error Message
> google.protobuf.json_format.ParseError: Unexpected type for Value message.
> Stacktrace
> self =  testMethod=test_nested>
> def test_nested(self):
>   with beam.Pipeline() as p:
> >   assert_that(p | FibTransform(6), equal_to([8]))
> apache_beam/transforms/external_test.py:250: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> apache_beam/transforms/ptransform.py:562: in __ror__
> result = p.apply(self, pvalueish, label)
> apache_beam/pipeline.py:651: in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> apache_beam/runners/runner.py:198: in apply
> return m(transform, input, options)
> apache_beam/runners/runner.py:228: in apply_PTransform
> return transform.expand(input)
> apache_beam/runners/portability/expansion_service_test.py:257: in expand
> expansion_service.ExpansionServiceServicer())
> apache_beam/pvalue.py:140: in __or__
> return self.pipeline.apply(ptransform, self)
> apache_beam/pipeline.py:598: in apply
> transform.transform, pvalueish, label or transform.label)
> apache_beam/pipeline.py:608: in apply
> return self.apply(transform, pvalueish)
> apache_beam/pipeline.py:651: in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> apache_beam/runners/runner.py:198: in apply
> return m(transform, input, options)
> apache_beam/runners/runner.py:228: in apply_PTransform
> return transform.expand(input)
> apache_beam/transforms/external.py:322: in expand
> pipeline_options=job_utils.pipeline_options_dict_to_struct(options))
> apache_beam/runners/job/utils.py:38: in pipeline_options_dict_to_struct
> v in options.items() if v is not None
> apache_beam/runners/job/utils.py:44: in dict_to_struct
> return json_format.ParseDict(dict_obj, struct_pb2.Struct())
> target/.tox-py37-cloud/py37-cloud/lib/python3.7/site-packages/google/protobuf/json_format.py:450:
>  in ParseDict
> parser.ConvertMessage(js_dict, message)
> target/.tox-py37-cloud/py37-cloud/lib/python3.7/site-packages/google/protobuf/json_format.py:479:
>  in ConvertMessage
> methodcaller(_WKTJSONMETHODS[full_name][1], value, message)(self)
> target/.tox-py37-cloud/py37-cloud/lib/python3.7/site-packages/google/protobuf/json_format.py:667:
>  in _ConvertStructMessage
> self._ConvertValueMessage(value[key], message.fields[key])
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = 
> value =  0x7f35a4c00390>
> message = 
> def _ConvertValueMessage(self, value, message):
>   """Convert a JSON representation into Value message."""
>   if isinstance(value, dict):
>     self._ConvertStructMessage(value, message.struct_value)
>   elif isinstance(value, list):
>     self._ConvertListValueMessage(value, message.list_value)
>   elif value is None:
>     message.null_value = 0
>   elif isinstance(value, bool):
>     message.bool_value = value
>   elif isinstance(value, six.string_types):
>     message.string_value = value
>   elif isinstance(value, _INT_OR_FLOAT):
>     message.number_value = value
>   else:
> >     raise ParseError('Unexpected type for Value message.')
> E     google.protobuf.json_format.ParseError: Unexpected type for Value message.
> target/.tox-py37-cloud/py37-cloud/lib/python3.7/site-packages/google/protobuf/json_format.py:647:
>  ParseError
> {code}





[jira] [Comment Edited] (BEAM-9933) Support Streaming in Spark Structured Streaming Runner

2020-05-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102724#comment-17102724
 ] 

Ismaël Mejía edited comment on BEAM-9933 at 5/8/20, 4:55 PM:
-

I did [an experiment to reproduce the double aggregation issue in pure 
Spark|https://github.com/iemejia/spark-playground]


was (Author: iemejia):
I did an experiment to reproduce the aggregation issue in pure Spark
https://github.com/iemejia/spark-playground


> Support Streaming in Spark Structured Streaming Runner
> --
>
> Key: BEAM-9933
> URL: https://issues.apache.org/jira/browse/BEAM-9933
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: structured-streaming
>
> The current version of the Spark Structured Streaming Runner only supports 
> Bounded IO. This ticket tracks support for full streaming.
> This issue is blocked by a Spark limitation on multiple aggregations 
> (SPARK-26655).





[jira] [Resolved] (BEAM-9771) colab links in example notebooks don't work

2020-05-08 Thread David Cavazos (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cavazos resolved BEAM-9771.
-
Fix Version/s: 2.22.0
   Resolution: Fixed

> colab links in example notebooks don't work
> ---
>
> Key: BEAM-9771
> URL: https://issues.apache.org/jira/browse/BEAM-9771
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python
>Reporter: Ahmet Altay
>Assignee: David Cavazos
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example:
> https://github.com/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/map-py.ipynb
> Error:
> Notebook not found
> There was an error loading this notebook. Ensure that the file is accessible 
> and try again.
> Ensure that you have permission to view this notebook in GitHub and authorize 
> Colaboratory to use the GitHub API.
> https://github.com/apache/beam/blob/master/Users/dcavazos/src/beam/examples/notebooks/documentation/transforms/python/elementwise/map-py.ipynb
> I believe this is true for all files in that folder at least. I did not check 
> other places.





[jira] [Resolved] (BEAM-7389) Colab examples for element-wise transforms (Python)

2020-05-08 Thread David Cavazos (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cavazos resolved BEAM-7389.
-
Fix Version/s: 2.16.0
   Resolution: Fixed

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 76.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=432197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432197
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 08/May/20 17:34
Start Date: 08/May/20 17:34
Worklog Time Spent: 10m 
  Work Description: chadrik commented on a change in pull request #11632:
URL: https://github.com/apache/beam/pull/11632#discussion_r422273845



##
File path: sdks/python/apache_beam/dataframe/convert.py
##
@@ -16,13 +16,23 @@
 
 from __future__ import absolute_import
 
+import typing
+
 import inspect
 
 from apache_beam import pvalue
 from apache_beam.dataframe import expressions
 from apache_beam.dataframe import frame_base
 from apache_beam.dataframe import transforms
 
+if typing.TYPE_CHECKING:
+  # pylint: disable=ungrouped-imports
+  from typing import Any
+  from typing import Dict
+  from typing import Tuple
+  from typing import Union

Review comment:
   What's the lint error? Is it because of the unused `typing` import?
   
   I'm confused because unguarded typing imports are used all over the Beam 
codebase without any lint errors. Check `pipeline`, `pipeline_context`, and 
`pipeline_options`, for starters.
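
For context on the question above, here is a minimal sketch of the two import styles being compared. The function names are made up for illustration and do not come from `convert.py`. The point is only that names imported under `typing.TYPE_CHECKING` are not bound at runtime, so they can appear in type comments but not in runtime expressions, whereas unguarded imports are bound either way.

```python
from __future__ import absolute_import

import typing
from typing import Dict  # unguarded: bound at runtime as well

if typing.TYPE_CHECKING:
    # Guarded: visible to static checkers such as mypy, skipped at runtime.
    from typing import Tuple


def count_words(words):
    # type: (typing.List[str]) -> Dict[str, int]
    """Count occurrences of each word (illustrative helper)."""
    counts = {}  # type: Dict[str, int]
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def first_pair(counts):
    # type: (Dict[str, int]) -> Tuple[str, int]
    """Return the alphabetically first key with its count."""
    # The type comment may mention Tuple even though it is unbound at runtime.
    key = sorted(counts)[0]
    return key, counts[key]
```

Both functions run normally because type comments are never evaluated; only a runtime use of `Tuple` would fail with the guarded import.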
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 432197)
Time Spent: 82h 20m  (was: 82h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 82h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-4782) Enforce KV coders for MultiMap side inputs

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4782?focusedWorklogId=432198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432198
 ]

ASF GitHub Bot logged work on BEAM-4782:


Author: ASF GitHub Bot
Created on: 08/May/20 17:36
Start Date: 08/May/20 17:36
Worklog Time Spent: 10m 
  Work Description: ibzib opened a new pull request #11643:
URL: https://github.com/apache/beam/pull/11643


   Just some minor code cleanup. R: @udim 
   
   
   

[jira] [Commented] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2020-05-08 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102781#comment-17102781
 ] 

Udi Meiri commented on BEAM-3713:
-

Yes, see #7949

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> Per [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode.





[jira] [Commented] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2020-05-08 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102780#comment-17102780
 ] 

Udi Meiri commented on BEAM-3713:
-

Yes, see #7949

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> Per [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode.





[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=432202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432202
 ]

ASF GitHub Bot logged work on BEAM-8949:


Author: ASF GitHub Bot
Created on: 08/May/20 17:48
Start Date: 08/May/20 17:48
Worklog Time Spent: 10m 
  Work Description: mszb commented on a change in pull request #11210:
URL: https://github.com/apache/beam/pull/11210#discussion_r422280614



##
File path: sdks/python/apache_beam/io/gcp/experimental/spannerio_test.py
##
@@ -499,6 +499,7 @@ def test_batch_byte_size(
   # and each bach should contains 25 mutations.
   res = (
   p | beam.Create(mutation_group)
+  | 'combine to list' >> beam.combiners.ToList()

Review comment:
   The user does not have to add the ToList transform in a production 
pipeline. I only added it to test the batching process.
   The previous implementation of batching (without the ToList transform) 
followed the Java implementation, but without sorting the transactions by 
table and primary key (this is also documented as a feature to be added later). 

##
File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py
##
@@ -1008,31 +1007,30 @@ def _reset_count(self):
 self._cells = 0
 
   def process(self, element):
-mg_info = element.info
+for elem in element:

Review comment:
   Makes sense. In that case we don't need to alter the connector code 
anymore; it was working as expected. Thanks, @chamikaramj, for the feedback; 
it is always helpful.
   I'll remove the changes from the Spanner IO connector and update the IT test 
code for the assertion.







Issue Time Tracking
---

Worklog Id: (was: 432202)
Time Spent: 10.5h  (was: 10h 20m)

> Add Spanner IO Integration Test for Python
> --
>
> Key: BEAM-8949
> URL: https://issues.apache.org/jira/browse/BEAM-8949
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shoaib Zafar
>Assignee: Shoaib Zafar
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Spanner IO (Python SDK) contains a PTransform that uses the Batch API to read 
> from Spanner. Currently, it only has direct runner unit tests. To 
> make this functionality available to users, integration tests 
> also need to be added.





[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=432203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432203
 ]

ASF GitHub Bot logged work on BEAM-8949:


Author: ASF GitHub Bot
Created on: 08/May/20 17:53
Start Date: 08/May/20 17:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#11210:
URL: https://github.com/apache/beam/pull/11210#discussion_r422282706



##
File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py
##
@@ -1008,31 +1007,30 @@ def _reset_count(self):
 self._cells = 0
 
   def process(self, element):
-mg_info = element.info
+for elem in element:

Review comment:
   Thanks. Let me know when this is ready for another look. Also, let's 
trigger the IT with the new changes to make sure it passes.







Issue Time Tracking
---

Worklog Id: (was: 432203)
Time Spent: 10h 40m  (was: 10.5h)

> Add Spanner IO Integration Test for Python
> --
>
> Key: BEAM-8949
> URL: https://issues.apache.org/jira/browse/BEAM-8949
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shoaib Zafar
>Assignee: Shoaib Zafar
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Spanner IO (Python SDK) contains a PTransform that uses the Batch API to read 
> from Spanner. Currently, it only has direct runner unit tests. To 
> make this functionality available to users, integration tests 
> also need to be added.





[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=432206&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432206
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 08/May/20 17:56
Start Date: 08/May/20 17:56
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8457:
URL: https://github.com/apache/beam/pull/8457#issuecomment-625940125


   +1 for starting a new PR. It's surprising to hear that the Jenkins IT trigger 
does not capture your updates. Hopefully you won't run into this in the new 
PR. If you do, it's probably worth an email to the dev list to check whether 
someone else has run into it.





Issue Time Tracking
---

Worklog Id: (was: 432206)
Time Spent: 47h  (was: 46h 50m)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 47h
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.





[jira] [Work logged] (BEAM-9883) Generalize SDF-validating restrictions

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9883?focusedWorklogId=432209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432209
 ]

ASF GitHub Bot logged work on BEAM-9883:


Author: ASF GitHub Bot
Created on: 08/May/20 18:10
Start Date: 08/May/20 18:10
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11605:
URL: https://github.com/apache/beam/pull/11605#issuecomment-625946118


   LGTM thanks!





Issue Time Tracking
---

Worklog Id: (was: 432209)
Time Spent: 50m  (was: 40m)

> Generalize SDF-validating restrictions
> --
>
> Key: BEAM-9883
> URL: https://issues.apache.org/jira/browse/BEAM-9883
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We have some restrictions written for the purpose of validating that SDFs 
> work in sdf_invokers_test.go, but they can be improved and generalized. The 
> main improvement is changing the validation approach so that the restriction 
> keeps track of each method it's had called on it. Then this can be 
> generalized so that it can be used in upcoming integration tests to validate 
> which methods are being called.
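
The approach described above, a restriction that records each method invoked on it, can be sketched as follows. The real code lives in the Go SDK's `sdf_invokers_test.go`; this Python version is only an illustration, and `TrackingRestriction`, `FakeSdf`, `run_sdf`, and the method names are hypothetical stand-ins rather than Beam APIs.

```python
class TrackingRestriction(object):
    """Restriction stand-in that remembers which SDF methods touched it."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.calls = []  # method names, in call order

    def record(self, method_name):
        self.calls.append(method_name)
        return self


class FakeSdf(object):
    """Hypothetical splittable DoFn whose methods mark the restriction."""

    def create_initial_restriction(self, element):
        return TrackingRestriction(0, len(element)).record(
            "create_initial_restriction")

    def split_restriction(self, element, restriction):
        restriction.record("split_restriction")
        mid = (restriction.start + restriction.end) // 2
        halves = []
        for start, end in ((restriction.start, mid), (mid, restriction.end)):
            half = TrackingRestriction(start, end)
            half.calls = list(restriction.calls)  # carry the history forward
            halves.append(half)
        return halves

    def restriction_size(self, element, restriction):
        restriction.record("restriction_size")
        return restriction.end - restriction.start


def run_sdf(sdf, element):
    """Drive the SDF methods in order and return the resulting restrictions."""
    initial = sdf.create_initial_restriction(element)
    parts = sdf.split_restriction(element, initial)
    for part in parts:
        sdf.restriction_size(element, part)
    return parts
```

A test can then assert on `restriction.calls` to check which methods ran and in what order, instead of asserting only on the final output.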





[jira] [Commented] (BEAM-4782) Enforce KV coders for MultiMap side inputs

2020-05-08 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102815#comment-17102815
 ] 

Kyle Weaver commented on BEAM-4782:
---

Looks like this was fixed by https://github.com/apache/beam/pull/8654.

> Enforce KV coders for MultiMap side inputs
> --
>
> Key: BEAM-4782
> URL: https://issues.apache.org/jira/browse/BEAM-4782
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Non-Python runners don't understand that the (default) FastPrimitivesCoder 
> may be a KV coder.





[jira] [Resolved] (BEAM-4782) Enforce KV coders for MultiMap side inputs

2020-05-08 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-4782.
---
Fix Version/s: 2.14.0
 Assignee: Udi Meiri  (was: Robert Bradshaw)
   Resolution: Fixed

> Enforce KV coders for MultiMap side inputs
> --
>
> Key: BEAM-4782
> URL: https://issues.apache.org/jira/browse/BEAM-4782
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Non-Python runners don't understand that the (default) FastPrimitivesCoder 
> may be a KV coder.





[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=432213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432213
 ]

ASF GitHub Bot logged work on BEAM-9876:


Author: ASF GitHub Bot
Created on: 08/May/20 18:32
Start Date: 08/May/20 18:32
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on a change in pull request 
#11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r422302302



##
File path: website/www/site/content/en/blog/capability-matrix.md
##
@@ -0,0 +1,604 @@
+---
+title:  "Clarifying & Formalizing Runner Capabilities"
+date:   2016-03-17 11:00:00 -0700
+categories:
+  - beam
+  - capability
+aliases:
+  - /beam/capability/2016/03/17/capability-matrix.html
+authors:
+  - fjp
+  - takidau
+
+capability-matrix-snapshot:

Review comment:
   Would it be possible to keep this in its own separate YAML file and just 
reference it here rather than inlining it? That would reduce the diff somewhat.
   
   Not a blocker, just a nice-to-have
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 432213)
Time Spent: 8h 50m  (was: 8h 40m)

> Migrate the Beam website from Jekyll to Hugo to enable localization of the 
> site content
> ---
>
> Key: BEAM-9876
> URL: https://issues.apache.org/jira/browse/BEAM-9876
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Enable internationalization of the Apache Beam website to increase the reach 
> of the project, and facilitate adoption and growth of its community.
> The proposal was to do this by migrating the current Apache Beam website from 
> Jekyll to Hugo [1]. Hugo supports internationalization out of the box, making 
> it easier for both contributors and maintainers to support the 
> internationalization effort.
> Further discussion of the implementation can be found here [2].
> [1] 
> [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E]
> [2] 
> [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E]
>  





[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=432214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432214
 ]

ASF GitHub Bot logged work on BEAM-9876:


Author: ASF GitHub Bot
Created on: 08/May/20 18:38
Start Date: 08/May/20 18:38
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on a change in pull request 
#11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r422305429



##
File path: build.gradle
##
@@ -86,14 +86,18 @@ rat {
 // JDBC package config files
 "**/META-INF/services/java.sql.Driver",
 
-// Ruby build files
+// Website build files
 "**/Gemfile.lock",
 "**/Rakefile",
 "**/.htaccess",
-"website/src/_sass/_bootstrap.scss",
-"website/src/_sass/bootstrap/**/*",
-"website/src/js/bootstrap*.js",
-"website/src/js/bootstrap/**/*",
+"website/www/site/assets/scss/_bootstrap.scss",
+"website/www/site/assets/scss/bootstrap/**/*",
+"website/www/site/static/js/bootstrap*.js",
+"website/www/site/static/js/bootstrap/**/*",
+"website/www/site/static/.htaccess",

Review comment:
   nit: I think this is redundant because of the "**/.htaccess" above
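
   To illustrate why the extra entry adds nothing, here is a minimal sketch in 
Python, assuming the exclude list uses Ant-style patterns in which a leading 
"**/" matches any number of directories. The helper `matches` is hypothetical 
and deliberately simplified; it is not the matcher RAT itself uses.

```python
import re


def matches(pattern, path):
    """Simplified Ant-style match: '**/' spans directories, '*' stays in one."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything within a single path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.match("".join(parts) + r"\Z", path) is not None


general = "**/.htaccess"
specific = "website/www/site/static/.htaccess"

# The path named by the specific entry is already covered by the general one.
covered = matches(general, specific)
```

   Under that assumption, every path the specific entry could match is already 
excluded by the general pattern.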







Issue Time Tracking
---

Worklog Id: (was: 432214)
Time Spent: 9h  (was: 8h 50m)

> Migrate the Beam website from Jekyll to Hugo to enable localization of the 
> site content
> ---
>
> Key: BEAM-9876
> URL: https://issues.apache.org/jira/browse/BEAM-9876
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Enable internationalization of the Apache Beam website to increase the reach 
> of the project, and facilitate adoption and growth of its community.
> The proposal was to do this by migrating the current Apache Beam website from 
> Jekyll to Hugo [1]. Hugo supports internationalization out of the box, making 
> it easier for both contributors and maintainers to support the 
> internationalization effort.
> Further discussion of the implementation can be found here [2].
> [1] 
> [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E]
> [2] 
> [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E]
>  





[jira] [Commented] (BEAM-9835) test_multimap_multiside_input failing on Spark Python

2020-05-08 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102843#comment-17102843
 ] 

Kyle Weaver commented on BEAM-9835:
---

I am leaving this as a starter task for an incoming intern.

The failure can be reproduced by running the following command in your local 
Beam repo:

./gradlew :sdks:python:test-suites:portable:py2:sparkValidatesRunner 
-Ptests="test_multimap_multiside_input"

This test uses the same PCollection as a side input multiple times. The reason 
this test fails is that, since the Spark portable runner keys broadcasts [1] by 
PCollection ID, we end up with duplicate keys. Since PCollections are 
immutable, it is only necessary to broadcast a PCollection once, no matter how 
many times it is used as a side input.

Extra credit: does this same bug affect the classic Spark runner? If so, that 
should be fixed as well.

[1] 
https://spark.apache.org/docs/2.4.5/api/java/org/apache/spark/broadcast/Broadcast.html
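
The failure mode and the suggested fix can be sketched in plain Python. This is 
an illustration only: the runner itself is Java, and the names `broadcast`, 
`build_strict`, and `build_deduped` are hypothetical stand-ins, not functions 
from the Spark portable runner.

```python
import itertools

# Hypothetical counter so each broadcast gets a distinct handle.
_broadcast_ids = itertools.count(36)


def broadcast(pcollection_id):
    """Stand-in for creating a broadcast variable; returns a fresh handle."""
    return "Broadcast(%d)" % next(_broadcast_ids)


def build_strict(side_input_pcollection_ids):
    """Broadcast once per side-input use and reject duplicate keys."""
    entries = [(pc_id, broadcast(pc_id)) for pc_id in side_input_pcollection_ids]
    result = {}
    for key, value in entries:
        if key in result:
            raise ValueError("Multiple entries with same key: %s" % key)
        result[key] = value
    return result


def build_deduped(side_input_pcollection_ids):
    """Broadcast each distinct PCollection once, however often it is used."""
    result = {}
    for pc_id in side_input_pcollection_ids:
        if pc_id not in result:
            result[pc_id] = broadcast(pc_id)
    return result


# The same PCollection consumed as a side input twice.
uses = ["ref_PCollection_PCollection_21", "ref_PCollection_PCollection_21"]

try:
    build_strict(uses)
    strict_failed = False
except ValueError:
    strict_failed = True

deduped = build_deduped(uses)
```

Because a PCollection is immutable, reusing the first broadcast for every later 
use of the same ID loses nothing.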

> test_multimap_multiside_input failing on Spark Python
> -
>
> Key: BEAM-9835
> URL: https://issues.apache.org/jira/browse/BEAM-9835
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Priority: Major
>
> beam_PostCommit_Python_VR_Spark is red.
> 18:32:46 ERROR: test_multimap_multiside_input (__main__.SparkRunnerTest)
> 18:32:46 
> --
> 18:32:46 Traceback (most recent call last):
> 18:32:46   File 
> "apache_beam/runners/portability/fn_api_runner/fn_runner_test.py", line 265, 
> in test_multimap_multiside_input
> 18:32:46 equal_to([('a', [1, 3], [1, 2, 3]), ('b', [2], [1, 2, 3])]))
> 18:32:46   File "apache_beam/pipeline.py", line 529, in __exit__
> 18:32:46 self.run().wait_until_finish()
> 18:32:46   File "apache_beam/runners/portability/portable_runner.py", line 
> 571, in wait_until_finish
> 18:32:46 (self._job_id, self._state, self._last_error_message()))
> 18:32:46 RuntimeError: Pipeline 
> test_multimap_multiside_input_1588026700.62_3808162b-fc6a-4eb0-be3a-3efd819560f7
>  failed in state FAILED: java.lang.IllegalArgumentException: Multiple entries 
> with same key: 
> ref_PCollection_PCollection_21=(Broadcast(37),WindowedValue$FullWindowedValueCoder(KvCoder(ByteArrayCoder,VarLongCoder),GlobalWindow$Coder))
>  and 
> ref_PCollection_PCollection_21=(Broadcast(36),WindowedValue$FullWindowedValueCoder(KvCoder(ByteArrayCoder,VarLongCoder),GlobalWindow$Coder))





[jira] [Issue Comment Deleted] (BEAM-9835) test_multimap_multiside_input failing on Spark Python

2020-05-08 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-9835:
--
Comment: was deleted

(was: I am leaving this as a starter task for an incoming intern.

The failure can be reproduced by running the following command in your local 
Beam repo:

./gradlew :sdks:python:test-suites:portable:py2:sparkValidatesRunner 
-Ptests="test_multimap_multiside_input"

This test uses the same PCollection as a side input multiple times. The reason 
this test fails is that, since the Spark portable runner keys broadcasts [1] by 
PCollection ID, we end up with duplicate keys. Since PCollections are 
immutable, it is only necessary to broadcast a PCollection once, no matter how 
many times it is used as a side input.

Extra credit: does this same bug affect the classic Spark runner? If so, that 
should be fixed as well.

[1] 
https://spark.apache.org/docs/2.4.5/api/java/org/apache/spark/broadcast/Broadcast.html)

> test_multimap_multiside_input failing on Spark Python
> -
>
> Key: BEAM-9835
> URL: https://issues.apache.org/jira/browse/BEAM-9835
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Priority: Major
>
> beam_PostCommit_Python_VR_Spark is red.
> 18:32:46 ERROR: test_multimap_multiside_input (__main__.SparkRunnerTest)
> 18:32:46 
> --
> 18:32:46 Traceback (most recent call last):
> 18:32:46   File 
> "apache_beam/runners/portability/fn_api_runner/fn_runner_test.py", line 265, 
> in test_multimap_multiside_input
> 18:32:46 equal_to([('a', [1, 3], [1, 2, 3]), ('b', [2], [1, 2, 3])]))
> 18:32:46   File "apache_beam/pipeline.py", line 529, in __exit__
> 18:32:46 self.run().wait_until_finish()
> 18:32:46   File "apache_beam/runners/portability/portable_runner.py", line 
> 571, in wait_until_finish
> 18:32:46 (self._job_id, self._state, self._last_error_message()))
> 18:32:46 RuntimeError: Pipeline 
> test_multimap_multiside_input_1588026700.62_3808162b-fc6a-4eb0-be3a-3efd819560f7
>  failed in state FAILED: java.lang.IllegalArgumentException: Multiple entries 
> with same key: 
> ref_PCollection_PCollection_21=(Broadcast(37),WindowedValue$FullWindowedValueCoder(KvCoder(ByteArrayCoder,VarLongCoder),GlobalWindow$Coder))
>  and 
> ref_PCollection_PCollection_21=(Broadcast(36),WindowedValue$FullWindowedValueCoder(KvCoder(ByteArrayCoder,VarLongCoder),GlobalWindow$Coder))





[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=432221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432221
 ]

ASF GitHub Bot logged work on BEAM-9876:


Author: ASF GitHub Bot
Created on: 08/May/20 19:15
Start Date: 08/May/20 19:15
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on a change in pull request 
#11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r422323757



##
File path: website/Dockerfile
##
@@ -1,33 +1,65 @@
-###
-#  Licensed to the Apache Software Foundation (ASF) under one
-#  or more contributor license agreements.  See the NOTICE file
-#  distributed with this work for additional information
-#  regarding copyright ownership.  The ASF licenses this file
-#  to you under the Apache License, Version 2.0 (the
-#  "License"); you may not use this file except in compliance
-#  with the License.  You may obtain a copy of the License at
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
 #
-#  http://www.apache.org/licenses/LICENSE-2.0
+#   http://www.apache.org/licenses/LICENSE-2.0
 #
-#  Unless required by applicable law or agreed to in writing, software
-#  distributed under the License is distributed on an "AS IS" BASIS,
-#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#  See the License for the specific language governing permissions and
-# limitations under the License.
-###
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
 
-# This image contains Ruby and dependencies required to build and test the Beam
-# website. It is used by tasks in build.gradle.

Review comment:
   nit: could you add a comment like this at the start of the new 
Dockerfile?
   
   Also some comments above each of the "RUN" statements below saying what 
they're doing would be nice. They're a little inscrutable by themselves, but 
some comments like "Install misc deps", "Install node", "Install yarn" would 
make it easy to inspect







Issue Time Tracking
---

Worklog Id: (was: 432221)
Time Spent: 9h 10m  (was: 9h)

> Migrate the Beam website from Jekyll to Hugo to enable localization of the 
> site content
> ---
>
> Key: BEAM-9876
> URL: https://issues.apache.org/jira/browse/BEAM-9876
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Enable internationalization of the Apache Beam website to increase the reach 
> of the project, and facilitate adoption and growth of its community.
> The proposal was to do this by migrating the current Apache Beam website from 
> Jekyll to Hugo [1]. Hugo supports internationalization out of the box, making 
> it easier for both contributors and maintainers to support the 
> internationalization effort.
> Further discussion of the implementation can be found here [2].
> [1] 
> [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E]
> [2] 
> [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E]
>  





[jira] [Work logged] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9876?focusedWorklogId=43&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-43
 ]

ASF GitHub Bot logged work on BEAM-9876:


Author: ASF GitHub Bot
Created on: 08/May/20 19:17
Start Date: 08/May/20 19:17
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on a change in pull request 
#11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r422324477



##
File path: website/Dockerfile
##
@@ -1,33 +1,65 @@
-###
-#  Licensed to the Apache Software Foundation (ASF) under one
-#  or more contributor license agreements.  See the NOTICE file
-#  distributed with this work for additional information
-#  regarding copyright ownership.  The ASF licenses this file
-#  to you under the Apache License, Version 2.0 (the
-#  "License"); you may not use this file except in compliance
-#  with the License.  You may obtain a copy of the License at
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
 #
-#  http://www.apache.org/licenses/LICENSE-2.0
+#   http://www.apache.org/licenses/LICENSE-2.0
 #
-#  Unless required by applicable law or agreed to in writing, software
-#  distributed under the License is distributed on an "AS IS" BASIS,
-#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#  See the License for the specific language governing permissions and
-# limitations under the License.
-###
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
 
-# This image contains Ruby and dependencies required to build and test the Beam
-# website. It is used by tasks in build.gradle.
+FROM debian:stretch-slim
 
-FROM ruby:2.5
+SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
 
-WORKDIR /ruby
-RUN gem install bundler
-# Update buildDockerImage's inputs.files if you change this list.
-ADD Gemfile Gemfile.lock /ruby/
-RUN bundle install --deployment --path $GEM_HOME
+ENV DEBIAN_FRONTEND=noninteractive \
+LANGUAGE=C.UTF-8 \
+LANG=C.UTF-8 \
+LC_ALL=C.UTF-8 \
+LC_CTYPE=C.UTF-8 \
+LC_MESSAGES=C.UTF-8
 
-# Required for website testing using HTMLProofer.
-ENV LC_ALL C.UTF-8
+RUN apt-get update \
+&& apt-get install -y --no-install-recommends \
+ca-certificates \
+curl \
+git \
+gnupg2 \
+gosu \
+lynx \
+&& apt-get autoremove -yqq --purge \
+&& apt-get clean \
+&& rm -rf /var/lib/apt/lists/*
 
-CMD sleep 3600
+RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - \
+&& apt-get update \
+&& apt-get install -y --no-install-recommends \
+nodejs \
+&& apt-get autoremove -yqq --purge \
+&& apt-get clean \
+&& rm -rf /var/lib/apt/lists/*
+
+RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add - \
+&& echo "deb https://dl.yarnpkg.com/debian/ stable main" | tee /etc/apt/sources.list.d/yarn.list \
+&& apt-get update \
+&& apt-get install -y --no-install-recommends yarn \
+&& apt-get autoremove -yqq --purge \
+&& apt-get clean \
+&& rm -rf /var/lib/apt/lists/*
+
+RUN HUGOHOME="$(mktemp -d)" \
+&& export HUGOHOME \
+&& curl -sL https://github.com/gohugoio/hugo/releases/download/v0.68.3/hugo_extended_0.68.3_Linux-64bit.tar.gz > "${HUGOHOME}/hugo.tar.gz" \
+&& tar -xzvf "${HUGOHOME}/hugo.tar.gz" hugo \
+&& mv hugo /usr/local/bin/hugo \
+&& chmod +x /usr/local/bin/hugo \
+&& rm -r "${HUGOHOME}"

Review comment:
   Why not install from the Debian repo with apt-get? 
https://gohugo.io/getting-started/installing/#debian-and-ubuntu
   
   If we keep it this way, it would be nice to pull the version number out into 
a variable so it's easy to upgrade.
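
   As a rough sketch of the idea (written in Python purely for illustration; in 
the Dockerfile itself this would be a build variable), the version appears 
twice in the release URL, so deriving the URL from a single value keeps the two 
in sync. The names `HUGO_VERSION` and `hugo_release_url` are hypothetical.

```python
# Single place to bump when upgrading Hugo (hypothetical name).
HUGO_VERSION = "0.68.3"


def hugo_release_url(version):
    """Build the extended Linux 64-bit release URL for a given Hugo version."""
    return (
        "https://github.com/gohugoio/hugo/releases/download/"
        "v{v}/hugo_extended_{v}_Linux-64bit.tar.gz".format(v=version)
    )


url = hugo_release_url(HUGO_VERSION)
```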
   
   







Issue Time Tracking
---

Worklog Id: (wa

[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=432223&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432223
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 08/May/20 19:24
Start Date: 08/May/20 19:24
Worklog Time Spent: 10m 
  Work Description: pulasthi commented on pull request #10888:
URL: https://github.com/apache/beam/pull/10888#issuecomment-625977538


   @iemejia Hope you are doing well. I just wanted to follow up and ask whether 
you have had time to work on the pull request.





Issue Time Tracking
---

Worklog Id: (was: 432223)
Time Spent: 14h 40m  (was: 14.5h)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
> Fix For: 2.22.0
>
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop a Beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]





[jira] [Reopened] (BEAM-5073) Enable SortRemoveRule

2020-05-08 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang reopened BEAM-5073:

  Assignee: (was: Rui Wang)

> Enable SortRemoveRule
> -
>
> Key: BEAM-5073
> URL: https://issues.apache.org/jira/browse/BEAM-5073
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
> Fix For: Not applicable
>
>
> Enable SortRemoveRule by using RelCollationTraitDef.





[jira] [Updated] (BEAM-5073) Enable SortRemoveRule

2020-05-08 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-5073:
---
Status: Open  (was: Triage Needed)

> Enable SortRemoveRule
> -
>
> Key: BEAM-5073
> URL: https://issues.apache.org/jira/browse/BEAM-5073
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
> Fix For: Not applicable
>
>
> Enable SortRemoveRule by using RelCollationTraitDef.





[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=432224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432224
 ]

ASF GitHub Bot logged work on BEAM-9577:


Author: ASF GitHub Bot
Created on: 08/May/20 19:29
Start Date: 08/May/20 19:29
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11521:
URL: https://github.com/apache/beam/pull/11521#issuecomment-625979601


   Run XVR_Flink PostCommit





Issue Time Tracking
---

Worklog Id: (was: 432224)
Time Spent: 20h 50m  (was: 20h 40m)

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7505) Create SideInput Python Load Test Jenkins Job

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7505?focusedWorklogId=432226&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432226
 ]

ASF GitHub Bot logged work on BEAM-7505:


Author: ASF GitHub Bot
Created on: 08/May/20 19:39
Start Date: 08/May/20 19:39
Worklog Time Spent: 10m 
  Work Description: tysonjh commented on a change in pull request #11236:
URL: https://github.com/apache/beam/pull/11236#discussion_r422315222



##
File path: .test-infra/jenkins/SideInputTestSuite.groovy
##
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import static LoadTestConfig.extendTemplate
+
+
+class SideInputTestSuite {
+static def configurations = { LoadTestConfig template -> [
+extendTemplate(template) {
+title 'SideInput 2MB 100 byte records: global window'

Review comment:
   Changing the number of records for the main input to be 1 is fine for 
some of the tests but not all of them. For example, for tests that are expected 
to access the entire side input, if the side input is a list or iterable, then 
having 1 input record would be fine. We can assume that that 1 input record 
iterates through the entire side input.
   
   If however the side input is a dict, to simulate accessing all the side 
input records, the main input and side input should have the same number of 
records.
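
   The difference between the two access patterns can be sketched with a small 
Python simulation. It does not use Beam at all; the function names and the 
record count of 1000 are assumptions chosen only to make the point concrete.

```python
def touched_by_iterating(side_input_list, main_input):
    """Each main record walks the whole list/iterable side input."""
    touched = set()
    for _ in main_input:
        for index, _record in enumerate(side_input_list):
            touched.add(index)
    return touched


def touched_by_lookup(side_input_dict, main_input):
    """Each main record looks up exactly one key in the dict side input."""
    touched = set()
    for key in main_input:
        if key in side_input_dict:
            touched.add(key)
    return touched


side_records = 1000  # assumed size, for illustration only
side_list = list(range(side_records))
side_dict = {k: k for k in range(side_records)}

one_record = [0]
all_records = list(range(side_records))

# One main record is enough to read an entire list side input...
list_coverage = len(touched_by_iterating(side_list, one_record))
# ...but it reaches only a single entry of a dict side input.
dict_coverage_one = len(touched_by_lookup(side_dict, one_record))
# Matching the main input size to the side input covers every key.
dict_coverage_all = len(touched_by_lookup(side_dict, all_records))
```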







Issue Time Tracking
---

Worklog Id: (was: 432226)
Time Spent: 7h 20m  (was: 7h 10m)

> Create SideInput Python Load Test Jenkins Job
> -
>
> Key: BEAM-7505
> URL: https://issues.apache.org/jira/browse/BEAM-7505
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9835) test_multimap_multiside_input failing on Spark Python

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9835?focusedWorklogId=432227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432227
 ]

ASF GitHub Bot logged work on BEAM-9835:


Author: ASF GitHub Bot
Created on: 08/May/20 19:43
Start Date: 08/May/20 19:43
Worklog Time Spent: 10m 
  Work Description: ibzib opened a new pull request #11644:
URL: https://github.com/apache/beam/pull/11644


   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Assigned] (BEAM-9835) test_multimap_multiside_input failing on Spark Python

2020-05-08 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver reassigned BEAM-9835:
-

Assignee: Kyle Weaver

> test_multimap_multiside_input failing on Spark Python
> -
>
> Key: BEAM-9835
> URL: https://issues.apache.org/jira/browse/BEAM-9835
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> beam_PostCommit_Python_VR_Spark is red.
> 18:32:46 ERROR: test_multimap_multiside_input (__main__.SparkRunnerTest)
> 18:32:46 --
> 18:32:46 Traceback (most recent call last):
> 18:32:46   File "apache_beam/runners/portability/fn_api_runner/fn_runner_test.py", line 265, in test_multimap_multiside_input
> 18:32:46 equal_to([('a', [1, 3], [1, 2, 3]), ('b', [2], [1, 2, 3])]))
> 18:32:46   File "apache_beam/pipeline.py", line 529, in __exit__
> 18:32:46 self.run().wait_until_finish()
> 18:32:46   File "apache_beam/runners/portability/portable_runner.py", line 571, in wait_until_finish
> 18:32:46 (self._job_id, self._state, self._last_error_message()))
> 18:32:46 RuntimeError: Pipeline test_multimap_multiside_input_1588026700.62_3808162b-fc6a-4eb0-be3a-3efd819560f7 failed in state FAILED: java.lang.IllegalArgumentException: Multiple entries with same key:
> ref_PCollection_PCollection_21=(Broadcast(37),WindowedValue$FullWindowedValueCoder(KvCoder(ByteArrayCoder,VarLongCoder),GlobalWindow$Coder))
> and
> ref_PCollection_PCollection_21=(Broadcast(36),WindowedValue$FullWindowedValueCoder(KvCoder(ByteArrayCoder,VarLongCoder),GlobalWindow$Coder))
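The error text above is consistent with a strict map builder rejecting a key that was registered twice (here the same PCollection id paired with two different broadcasts). The following is only a minimal plain-Python sketch of that failure mode, not the Spark runner's code and not the actual fix. The function names `build_strict_map` and `build_deduplicated_map` are hypothetical, and the shortened values are placeholders for the entries in the log.

```python
def build_strict_map(entries):
    """Build a dict, raising if any key is registered more than once."""
    result = {}
    for key, value in entries:
        if key in result:
            raise ValueError(
                "Multiple entries with same key: %s=%s and %s=%s"
                % (key, value, key, result[key]))
        result[key] = value
    return result


def build_deduplicated_map(entries):
    """Build a dict, keeping the first value seen for each key."""
    result = {}
    for key, value in entries:
        result.setdefault(key, value)
    return result


# Hypothetical stand-ins for the two entries named in the error message:
# one PCollection id, registered once per side-input use.
side_inputs = [
    ("ref_PCollection_PCollection_21", "Broadcast(36)"),
    ("ref_PCollection_PCollection_21", "Broadcast(37)"),
]

try:
    build_strict_map(side_inputs)
except ValueError as err:
    print("strict builder failed:", err)

print("deduplicated:", build_deduplicated_map(side_inputs))
```

Keeping the first entry is just one way to tolerate the repetition; the thread does not say how the runner resolves it.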



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9835) test_multimap_multiside_input failing on Spark Python

2020-05-08 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-9835:
--
Status: Open  (was: Triage Needed)






[jira] [Work logged] (BEAM-9639) Abstract bundle execution logic from stage execution logic

2020-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9639?focusedWorklogId=432228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432228
 ]

ASF GitHub Bot logged work on BEAM-9639:


Author: ASF GitHub Bot
Created on: 08/May/20 19:46
Start Date: 08/May/20 19:46
Worklog Time Spent: 10m 
  Work Description: ibzib commented on a change in pull request #11270:
URL: https://github.com/apache/beam/pull/11270#discussion_r422338193



##
File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py
##
@@ -240,6 +240,30 @@ def test_multimap_side_input(self):
   lambda k, d: (k, sorted(d[k])), beam.pvalue.AsMultiMap(side)),
   equal_to([('a', [1, 3]), ('b', [2])]))
 
+  def test_multimap_multiside_input(self):

Review comment:
   Thanks for reporting, Boyuan. This was a flaw in the Spark runner. Fix: 
#11644
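For readers unfamiliar with the test under review, the multimap lookup in the hunk above can be imitated in plain Python without Beam. This is a sketch under assumptions: the inputs `main` and `side` are hypothetical values chosen to reproduce the `equal_to(...)` expectations quoted in this thread, `as_multimap` is a hypothetical helper standing in for `beam.pvalue.AsMultiMap`, and treating the second side input as a plain list view of the same data is also an assumption.

```python
from collections import defaultdict


def as_multimap(pairs):
    """Group (key, value) pairs into key -> list of values."""
    multimap = defaultdict(list)
    for key, value in pairs:
        multimap[key].append(value)
    return multimap


# Hypothetical inputs (not taken from the pull request).
main = ["a", "b"]
side = [("a", 1), ("b", 2), ("a", 3)]

multimap_view = as_multimap(side)  # first view of the side data
list_view = list(side)             # second view of the same side data

# One side input: look up each main element in the multimap.
single_side = [(k, sorted(multimap_view[k])) for k in main]

# Two side inputs built from the same data: multimap lookup plus all values.
multi_side = [
    (k, sorted(multimap_view[k]), sorted(v for _, v in list_view))
    for k in main
]

print(single_side)
print(multi_side)
```

The point of the sketch is that both views come from the same underlying collection, which is the situation the failing test exercises.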







Issue Time Tracking
---

Worklog Id: (was: 432228)
Time Spent: 6h 10m  (was: 6h)

> Abstract bundle execution logic from stage execution logic
> --
>
> Key: BEAM-9639
> URL: https://issues.apache.org/jira/browse/BEAM-9639
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Pablo Estrada
> Assignee: Pablo Estrada
> Priority: Major
> Time Spent: 6h 10m
> Remaining Estimate: 0h
>
> The FnApiRunner currently works on a per-stage basis and does not abstract 
> single-bundle execution much. This work item is to clearly define the code 
> that executes a single bundle.




