[jira] [Commented] (BEAM-10168) Add Github "publish release" to release guide

2020-06-01 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121291#comment-17121291
 ] 

Udi Meiri commented on BEAM-10168:
--

I uploaded my GPG public key used to sign the release and I get the same 
unverified message for 2.18. Probably because the key is tied to my @apache.org 
addr but I create commits using my @google.com addr.

> Add Github "publish release" to release guide
> -
>
> Key: BEAM-10168
> URL: https://issues.apache.org/jira/browse/BEAM-10168
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>
> Github does not recognize tags as full-fledged releases unless they are 
> published through the Github API/UI. We need to add this step to the release 
> guide.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10158) [Python] Reuse a shared unbounded thread pool

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10158?focusedWorklogId=439698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439698
 ]

ASF GitHub Bot logged work on BEAM-10158:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 19:49
Start Date: 01/Jun/20 19:49
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11867:
URL: https://github.com/apache/beam/pull/11867#issuecomment-637066948


   LGTM.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 439698)
Time Spent: 1.5h  (was: 1h 20m)

> [Python] Reuse a shared unbounded thread pool
> -
>
> Key: BEAM-10158
> URL: https://issues.apache.org/jira/browse/BEAM-10158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> During testing we create a lot of thread pools, many of which we don't 
> shut down, which can lead to thread exhaustion on some machines.
>  
> Swapping to use a shared thread pool will decrease the memory overhead for 
> these unused threads and allow for greater reuse.
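
The idea in the description above can be illustrated with a minimal sketch, assuming a lazily created, process-wide executor. The names `shared_thread_pool` and `square_all` are hypothetical and are not taken from the Beam code base; the standard-library `ThreadPoolExecutor` is bounded, so it only stands in for the unbounded pool the issue title mentions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_lock = threading.Lock()
_shared_pool = None  # created on first use, then reused by every caller


def shared_thread_pool(max_workers=32):
    """Return one process-wide executor (hypothetical helper)."""
    global _shared_pool
    with _lock:
        if _shared_pool is None:
            _shared_pool = ThreadPoolExecutor(max_workers=max_workers)
        return _shared_pool


def square_all(values):
    # Callers borrow the shared pool instead of constructing their own,
    # so no per-caller pool is left behind without being shut down.
    return list(shared_thread_pool().map(lambda v: v * v, values))
```

Because every caller receives the same executor object, tests that previously built and abandoned their own pools would reuse the same worker threads.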





[jira] [Work logged] (BEAM-9785) Add PostCommit suite for Python 3.8

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9785?focusedWorklogId=439702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439702
 ]

ASF GitHub Bot logged work on BEAM-9785:


Author: ASF GitHub Bot
Created on: 01/Jun/20 20:03
Start Date: 01/Jun/20 20:03
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-637073369


   Run Python 3.8 PostCommit





Issue Time Tracking
---

Worklog Id: (was: 439702)
Time Spent: 2h  (was: 1h 50m)

> Add PostCommit suite for Python 3.8
> ---
>
> Key: BEAM-9785
> URL: https://issues.apache.org/jira/browse/BEAM-9785
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: yoshiki obata
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add PostCommit suites for Python 3.8.





[jira] [Work logged] (BEAM-9785) Add PostCommit suite for Python 3.8

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9785?focusedWorklogId=439703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439703
 ]

ASF GitHub Bot logged work on BEAM-9785:


Author: ASF GitHub Bot
Created on: 01/Jun/20 20:04
Start Date: 01/Jun/20 20:04
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-637073561


   run seed job





Issue Time Tracking
---

Worklog Id: (was: 439703)
Time Spent: 2h 10m  (was: 2h)

> Add PostCommit suite for Python 3.8
> ---
>
> Key: BEAM-9785
> URL: https://issues.apache.org/jira/browse/BEAM-9785
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: yoshiki obata
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Add PostCommit suites for Python 3.8.





[jira] [Work logged] (BEAM-9094) Support setting some options such as endpoint_url and credential infos for AWS S3 Filesystem in Python SDKs

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9094?focusedWorklogId=439704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439704
 ]

ASF GitHub Bot logged work on BEAM-9094:


Author: ASF GitHub Bot
Created on: 01/Jun/20 20:08
Start Date: 01/Jun/20 20:08
Worklog Time Spent: 10m 
  Work Description: barrettpoth commented on pull request #10560:
URL: https://github.com/apache/beam/pull/10560#issuecomment-637075672


   Are there any updates on this? Is there a way to set the endpoint_url in 
order to use local s3 like minio?





Issue Time Tracking
---

Worklog Id: (was: 439704)
Time Spent: 1h 50m  (was: 1h 40m)

> Support setting some options such as endpoint_url and credential infos for 
> AWS S3 Filesystem in Python SDKs
> ---
>
> Key: BEAM-9094
> URL: https://issues.apache.org/jira/browse/BEAM-9094
> Project: Beam
>  Issue Type: Improvement
>  Components: io-ideas
>Affects Versions: 2.19.0
>Reporter: Keunhyun Oh
>Assignee: Keunhyun Oh
>Priority: P3
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> AWS S3 File System is implemented in BEAM-2572.
> To use a local S3-compatible store such as minio, it is necessary to support 
> setting some options such as endpoint_url and credential information.
> One idea is to implement this using environment variables.
>  
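
The environment-variable approach proposed above could look roughly like the following sketch. `BEAM_S3_ENDPOINT_URL` and `s3_client_kwargs` are hypothetical names chosen for illustration; `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the conventional AWS variable names, and the returned keys mirror keyword arguments that boto3's client constructor accepts. Nothing here is taken from the pull request itself.

```python
import os


def s3_client_kwargs(environ=None):
    """Collect optional S3 client settings from environment variables."""
    environ = os.environ if environ is None else environ
    mapping = {
        # keyword argument      -> environment variable
        "endpoint_url": "BEAM_S3_ENDPOINT_URL",  # hypothetical variable name
        "aws_access_key_id": "AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    }
    # Only include settings that are actually set and non-empty, so the
    # client's defaults apply when nothing is configured.
    return {key: environ[var] for key, var in mapping.items() if environ.get(var)}
```

With a local minio instance, setting the endpoint variable to something like `http://localhost:9000` would yield a dictionary that can be passed as keyword arguments when the client is constructed.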





[jira] [Work logged] (BEAM-9094) Support setting some options such as endpoint_url and credential infos for AWS S3 Filesystem in Python SDKs

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9094?focusedWorklogId=439710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439710
 ]

ASF GitHub Bot logged work on BEAM-9094:


Author: ASF GitHub Bot
Created on: 01/Jun/20 20:17
Start Date: 01/Jun/20 20:17
Worklog Time Spent: 10m 
  Work Description: barrettpoth-eog commented on pull request #10560:
URL: https://github.com/apache/beam/pull/10560#issuecomment-637079645


   Are there any updates on this? Is there a way to set the endpoint_url in 
order to use local s3 like minio?





Issue Time Tracking
---

Worklog Id: (was: 439710)
Time Spent: 2h  (was: 1h 50m)

> Support setting some options such as endpoint_url and credential infos for 
> AWS S3 Filesystem in Python SDKs
> ---
>
> Key: BEAM-9094
> URL: https://issues.apache.org/jira/browse/BEAM-9094
> Project: Beam
>  Issue Type: Improvement
>  Components: io-ideas
>Affects Versions: 2.19.0
>Reporter: Keunhyun Oh
>Assignee: Keunhyun Oh
>Priority: P3
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> AWS S3 File System is implemented in BEAM-2572.
> To use a local S3-compatible store such as minio, it is necessary to support 
> setting some options such as endpoint_url and credential information.
> One idea is to implement this using environment variables.
>  





[jira] [Work logged] (BEAM-9094) Support setting some options such as endpoint_url and credential infos for AWS S3 Filesystem in Python SDKs

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9094?focusedWorklogId=439711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439711
 ]

ASF GitHub Bot logged work on BEAM-9094:


Author: ASF GitHub Bot
Created on: 01/Jun/20 20:18
Start Date: 01/Jun/20 20:18
Worklog Time Spent: 10m 
  Work Description: barrettpoth removed a comment on pull request #10560:
URL: https://github.com/apache/beam/pull/10560#issuecomment-637075672


   Are there any updates on this? Is there a way to set the endpoint_url in 
order to use local s3 like minio?





Issue Time Tracking
---

Worklog Id: (was: 439711)
Time Spent: 2h 10m  (was: 2h)

> Support setting some options such as endpoint_url and credential infos for 
> AWS S3 Filesystem in Python SDKs
> ---
>
> Key: BEAM-9094
> URL: https://issues.apache.org/jira/browse/BEAM-9094
> Project: Beam
>  Issue Type: Improvement
>  Components: io-ideas
>Affects Versions: 2.19.0
>Reporter: Keunhyun Oh
>Assignee: Keunhyun Oh
>Priority: P3
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> AWS S3 File System is implemented in BEAM-2572.
> To use a local S3-compatible store such as minio, it is necessary to support 
> setting some options such as endpoint_url and credential information.
> One idea is to implement this using environment variables.
>  





[jira] [Work logged] (BEAM-10158) [Python] Reuse a shared unbounded thread pool

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10158?focusedWorklogId=439715&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439715
 ]

ASF GitHub Bot logged work on BEAM-10158:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 20:27
Start Date: 01/Jun/20 20:27
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11867:
URL: https://github.com/apache/beam/pull/11867#issuecomment-637084591


   Run Python PreCommit





Issue Time Tracking
---

Worklog Id: (was: 439715)
Time Spent: 1h 40m  (was: 1.5h)

> [Python] Reuse a shared unbounded thread pool
> -
>
> Key: BEAM-10158
> URL: https://issues.apache.org/jira/browse/BEAM-10158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> During testing we create a lot of thread pools, many of which we don't 
> shut down, which can lead to thread exhaustion on some machines.
>  
> Swapping to use a shared thread pool will decrease the memory overhead for 
> these unused threads and allow for greater reuse.





[jira] [Work logged] (BEAM-9785) Add PostCommit suite for Python 3.8

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9785?focusedWorklogId=439716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439716
 ]

ASF GitHub Bot logged work on BEAM-9785:


Author: ASF GitHub Bot
Created on: 01/Jun/20 20:29
Start Date: 01/Jun/20 20:29
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-637085217


   Run Python 3.8 PostCommit





Issue Time Tracking
---

Worklog Id: (was: 439716)
Time Spent: 2h 20m  (was: 2h 10m)

> Add PostCommit suite for Python 3.8
> ---
>
> Key: BEAM-9785
> URL: https://issues.apache.org/jira/browse/BEAM-9785
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: yoshiki obata
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Add PostCommit suites for Python 3.8.





[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=439718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439718
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 20:32
Start Date: 01/Jun/20 20:32
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-637086679


   Run Python2_PVR_Flink PreCommit





Issue Time Tracking
---

Worklog Id: (was: 439718)
Time Spent: 3h 50m  (was: 3h 40m)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trivial mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap).
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.





[jira] [Work logged] (BEAM-10112) Add python sdk state and timer examples to website

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10112?focusedWorklogId=439719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439719
 ]

ASF GitHub Bot logged work on BEAM-10112:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 20:35
Start Date: 01/Jun/20 20:35
Worklog Time Spent: 10m 
  Work Description: y1chi opened a new pull request #11882:
URL: https://github.com/apache/beam/pull/11882



[jira] [Work logged] (BEAM-10112) Add python sdk state and timer examples to website

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10112?focusedWorklogId=439720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439720
 ]

ASF GitHub Bot logged work on BEAM-10112:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 20:36
Start Date: 01/Jun/20 20:36
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #11882:
URL: https://github.com/apache/beam/pull/11882#issuecomment-637088326


   R: @angoenka 





Issue Time Tracking
---

Worklog Id: (was: 439720)
Time Spent: 20m  (was: 10m)

> Add python sdk state and timer examples to website
> --
>
> Key: BEAM-10112
> URL: https://issues.apache.org/jira/browse/BEAM-10112
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9785) Add PostCommit suite for Python 3.8

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9785?focusedWorklogId=439725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439725
 ]

ASF GitHub Bot logged work on BEAM-9785:


Author: ASF GitHub Bot
Created on: 01/Jun/20 20:48
Start Date: 01/Jun/20 20:48
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-637093914


   @epicfaace looks like the newly added postcommit fails, PTAL: 
https://builds.apache.org/job/beam_PostCommit_Python38_PR/1/console





Issue Time Tracking
---

Worklog Id: (was: 439725)
Time Spent: 2.5h  (was: 2h 20m)

> Add PostCommit suite for Python 3.8
> ---
>
> Key: BEAM-9785
> URL: https://issues.apache.org/jira/browse/BEAM-9785
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: yoshiki obata
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Add PostCommit suites for Python 3.8.





[jira] [Work logged] (BEAM-10166) Improve execution time errors

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10166?focusedWorklogId=439728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439728
 ]

ASF GitHub Bot logged work on BEAM-10166:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 20:53
Start Date: 01/Jun/20 20:53
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11881:
URL: https://github.com/apache/beam/pull/11881#issuecomment-637096008


   Run Go PostCommit





Issue Time Tracking
---

Worklog Id: (was: 439728)
Time Spent: 0.5h  (was: 20m)

> Improve execution time errors
> -
>
> Key: BEAM-10166
> URL: https://issues.apache.org/jira/browse/BEAM-10166
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Priority: P3
>  Labels: beginner, n00b, starter
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Go SDK uses errors returned by DoFns to signal failures to process 
> bundles and to terminate bundle processing. However, if the preceding DoFn uses 
> emitters rather than error returns, the code has no choice but to panic to avoid 
> user code handling or ignoring the cross-DoFn error (which could cause 
> data loss or other correctness problems). 
> All bundle executions are wrapped in `callNoPanic` to prevent worker 
> termination on such panics and to terminate just the affected bundle in an 
> orderly way instead. `callNoPanic` uses Go's built-in recover mechanism to get 
> the error and provide a stack trace.
> We can do better.
> The value returned by recover is just an interface{}, which means we could 
> detect the specific type of error it is. In particular, we could have the 
> exec package define an error type that we can detect. If the recovered value is that 
> error, then we could use it to provide a clearer error message than a 
> panic stack trace.
> Such an error wrapper would contain: the error in question, the user DoFn 
> that caused it, the debug id of the DoFn node (To be related back to the 
> plan.)
> Then in `callNoPanic` we could detect this error wrapper and produce a 
> clearer error message based on the existing plan. If not, we can maintain the 
> current behavior. This latter part is necessary to handle panics originating 
> in user code. 
> To avoid mistaken user use which would breach this protocol, we're best off 
> keeping the wrapper unexported from the exec package.





[jira] [Work logged] (BEAM-10158) [Python] Reuse a shared unbounded thread pool

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10158?focusedWorklogId=439731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439731
 ]

ASF GitHub Bot logged work on BEAM-10158:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 20:57
Start Date: 01/Jun/20 20:57
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11867:
URL: https://github.com/apache/beam/pull/11867#issuecomment-637097925


   Run Python PreCommit





Issue Time Tracking
---

Worklog Id: (was: 439731)
Time Spent: 1h 50m  (was: 1h 40m)

> [Python] Reuse a shared unbounded thread pool
> -
>
> Key: BEAM-10158
> URL: https://issues.apache.org/jira/browse/BEAM-10158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> During testing we create a lot of thread pools, many of which we don't 
> shut down, which can lead to thread exhaustion on some machines.
>  
> Swapping to use a shared thread pool will decrease the memory overhead for 
> these unused threads and allow for greater reuse.





[jira] [Commented] (BEAM-9388) Consider using github actions for building python wheels and more (aka. Transition from Travis)

2020-06-01 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121328#comment-17121328
 ] 

Ahmet Altay commented on BEAM-9388:
---

Adding more details:

We can add a Beam setup for running github actions:

- One of these actions is to take Beam repo master branch at a recent commit; 
build wheel files; Optionally publish them to a temporary GCS location.
- It is possible to manually trigger the same job on the release branch. This 
version needs to stage its output to a given GCS location. Signing is out of 
scope.
- Action will produce the same wheel set as 
(https://github.com/apache/beam-wheels) (e.g. different python version 
linux/mac) and an additional wheel version for windows.
- Same github action also produces a tarball of the sdk
- https://github.com/apache/beam-wheels - is deprecated. (i.e. removed from 
release notes, https://github.com/apache/beam-wheels is deleted or has a readme 
to not use it)
- Beam docs are updated to explain how to use this Github action.

> Consider using github actions for building python wheels and more (aka. 
> Transition from Travis)
> ---
>
> Key: BEAM-9388
> URL: https://issues.apache.org/jira/browse/BEAM-9388
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-py-core
>Reporter: Ahmet Altay
>Priority: P2
>
> Context on the mailing list: 
> https://lists.apache.org/thread.html/r4a7d34e64a34e9fe589d06aec74d9b464d252c516fe96c35b2d6c9ae%40%3Cdev.beam.apache.org%3E
> Consider using github actions instead of travis for building python wheels during 
> releases. This will have the following advantages:
> - We will eliminate one repo. (If you don't know, we have 
> https://github.com/apache/beam-wheels for the sole purpose of building wheel 
> files.)
> - Workflow will be stored in the same repo. This will prevent bit rot that is 
> only discovered at release times. (happened a few times, although usually 
> easy to fix.)
> - github actions supports ubuntu, mac, windows environments. We could try to 
> build wheels for windows as well. (Travis also supports the same environments 
> but we only use linux and mac environments. Maybe there are other blockers 
> for building wheels for Windows.)
> - We could do more, like daily python builds.





[jira] [Updated] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-06-01 Thread Damon Douglas (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damon Douglas updated BEAM-9679:

Description: 
A kata devoted to core Beam transforms, patterned after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms],
 where the takeaway is an individual's ability to master the following in 
an Apache Beam pipeline using the Golang SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed|
|Combine Simple Function|[11866|https://github.com/apache/beam/pull/11866]| 
Closed|
|CombineFn| |Open|
|Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |

  was:
A kata devoted to core Beam transforms, patterned after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms],
 where the takeaway is an individual's ability to master the following in 
an Apache Beam pipeline using the Golang SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed|
|Combine:
Simple Function|[
11866|https://github.com/apache/beam/pull/11866]| 
Closed|
|Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |


> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> A kata devoted to core Beam transforms, patterned after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms],
>  where the takeaway is an individual's ability to master the following in 
> an Apache Beam pipeline using the Golang SDK.
>  
> ||Transform||Pull Request||Status||
> |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
> |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
> |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed|
> |Combine Simple Function|[11866|https://github.com/apache/beam/pull/11866]|Closed|
> |CombineFn| |Open|
> |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
> |Partition| | |
> |Side Input| | |
> |Side Output| | |
> |Branching| | |
> |Composite Transform| | |
> |DoFn Additional Parameters| | |





[jira] [Commented] (BEAM-9953) Beam ZetaSQL supports multiple statements in a query

2020-06-01 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121333#comment-17121333
 ] 

Kyle Weaver commented on BEAM-9953:
---

While it's true we only need to get table names from the last SELECT statement, 
parsing out the last SELECT statement is non-trivial. analyzeNextStatement 
depends on the extracted tables. We might be able to do this:
 # analyze next statement
 # if analyze succeeded, continue.
 # if analyze failed due to "table not found," extract tables and re-analyze 
with the extracted tables.

But that seems hacky.

> Beam ZetaSQL supports multiple statements in a query
> 
>
> Key: BEAM-9953
> URL: https://issues.apache.org/jira/browse/BEAM-9953
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Kyle Weaver
>Priority: P2
>
> One example of a multi-statement query:
> {code:java}
> CREATE FUNCTION fun_a (param_1 INT64); SELECT fun_a(10);
> {code}





[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=439734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439734
 ]

ASF GitHub Bot logged work on BEAM-9679:


Author: ASF GitHub Bot
Created on: 01/Jun/20 21:04
Start Date: 01/Jun/20 21:04
Worklog Time Spent: 10m 
  Work Description: damondouglas opened a new pull request #11883:
URL: https://github.com/apache/beam/pull/11883


   This pull request adds a Combine/CombineFn lesson to the Go SDK katas. I 
would like to request the following reviewers:
   
   (R: @lostluck )
   (R: @henryken )
   
   If accepted, please wait until the [Stepik 
course](https://stepik.org/course/70387) is updated before finally merging this 
PR.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Updated] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-06-01 Thread Damon Douglas (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damon Douglas updated BEAM-9679:

Description: 
A kata devoted to core beam transforms patterns after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
 where the take away is an individual's ability to master the following using 
an Apache Beam pipeline using the Golang SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed|
|Combine Simple Function|[11866|https://github.com/apache/beam/pull/11866]|Closed|
|CombineFn|[11883|https://github.com/apache/beam/pull/11883]|Open|
|Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |

  was:
A kata devoted to core beam transforms patterns after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
 where the take away is an individual's ability to master the following using 
an Apache Beam pipeline using the Golang SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed|
|Combine Simple Function|[11866|https://github.com/apache/beam/pull/11866]| 
Closed|
|CombineFn| |Open|
|Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |


> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> A kata devoted to core beam transforms patterns after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
>  where the take away is an individual's ability to master the following using 
> an Apache Beam pipeline using the Golang SDK.
>  
> ||Transform||Pull Request||Status||
> |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
> |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
> |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed|
> |Combine Simple Function|[11866|https://github.com/apache/beam/pull/11866]|Closed|
> |CombineFn|[11883|https://github.com/apache/beam/pull/11883]|Open|
> |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
> |Partition| | |
> |Side Input| | |
> |Side Output| | |
> |Branching| | |
> |Composite Transform| | |
> |DoFn Additional Parameters| | |





[jira] [Updated] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-06-01 Thread Damon Douglas (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damon Douglas updated BEAM-9679:

Description: 
A kata devoted to core beam transforms patterns after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
 where the take away is an individual's ability to master the following using 
an Apache Beam pipeline using the Golang SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed|
|Combine Simple Function|[11866|https://github.com/apache/beam/pull/11866]|Closed|
|CombineFn|[11883|https://github.com/apache/beam/pull/11883]|Open|
|Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |

  was:
A kata devoted to core beam transforms patterns after 
[https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
 where the take away is an individual's ability to master the following using 
an Apache Beam pipeline using the Golang SDK.

 
||Transform||Pull Request||Status||
|Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
|GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
|CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed|
|Combine Simple Function|[11866|https://github.com/apache/beam/pull/11866]| 
Closed|
|CombineFn|[11883|https://github.com/apache/beam/pull/11883]|Open|
|Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
|Partition| | |
|Side Input| | |
|Side Output| | |
|Branching| | |
|Composite Transform| | |
|DoFn Additional Parameters| | |


> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> A kata devoted to core beam transforms patterns after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
>  where the take away is an individual's ability to master the following using 
> an Apache Beam pipeline using the Golang SDK.
>  
> ||Transform||Pull Request||Status||
> |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
> |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
> |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed|
> |Combine Simple Function|[11866|https://github.com/apache/beam/pull/11866]|Closed|
> |CombineFn|[11883|https://github.com/apache/beam/pull/11883]|Open|
> |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
> |Partition| | |
> |Side Input| | |
> |Side Output| | |
> |Branching| | |
> |Composite Transform| | |
> |DoFn Additional Parameters| | |





[jira] [Resolved] (BEAM-8201) clean up the current container API

2020-06-01 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw resolved BEAM-8201.
---
Fix Version/s: 2.21.0
   Resolution: Fixed

> clean up the current container API
> --
>
> Key: BEAM-8201
> URL: https://issues.apache.org/jira/browse/BEAM-8201
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.21.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> From [~robertwb]
> As part of this project, I propose we look at and clean up the current 
> container API before we "release" it as public and stable. IIRC, we currently 
> provide the worker arguments through a combination of (1) environment 
> variables (2) command line parameters to docker and (3) via the provisioning 
> API. It would be good to have a more principled approach to specifying 
> arguments (either all the same way, or if they vary, good reason for doing so 
> rather than by historical accident).





[jira] [Work logged] (BEAM-10154) Stray version number in SQL overview

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10154?focusedWorklogId=439739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439739
 ]

ASF GitHub Bot logged work on BEAM-10154:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 21:11
Start Date: 01/Jun/20 21:11
Worklog Time Spent: 10m 
  Work Description: ibzib merged pull request #11865:
URL: https://github.com/apache/beam/pull/11865


   





Issue Time Tracking
---

Worklog Id: (was: 439739)
Time Spent: 20m  (was: 10m)

> Stray version number in SQL overview
> 
>
> Key: BEAM-10154
> URL: https://issues.apache.org/jira/browse/BEAM-10154
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P4
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Not clear what it means. Probably should just delete it.





[jira] [Commented] (BEAM-9953) Beam ZetaSQL supports multiple statements in a query

2020-06-01 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121347#comment-17121347
 ] 

Rui Wang commented on BEAM-9953:


Agreed. Though the hacky approach can help us develop SQL UDFs faster, having 
"extractTableNamesFromNextStatement" will make this feature production-ready. 

> Beam ZetaSQL supports multiple statements in a query
> 
>
> Key: BEAM-9953
> URL: https://issues.apache.org/jira/browse/BEAM-9953
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Kyle Weaver
>Priority: P2
>
> One example of a multi-statement query:
> {code:java}
> CREATE FUNCTION fun_a (param_1 INT64); SELECT fun_a(10);
> {code}





[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=439745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439745
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 01/Jun/20 21:24
Start Date: 01/Jun/20 21:24
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11884:
URL: https://github.com/apache/beam/pull/11884#issuecomment-637109865


   R: @aaltay 
   R: @rohdesamuel 
   
   PTAL, thx!





Issue Time Tracking
---

Worklog Id: (was: 439745)
Time Spent: 23h  (was: 22h 50m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 23h
>  Remaining Estimate: 0h
>
> This is the top-level ticket for all efforts leveraging [interactive 
> Beam|https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive].
> As development proceeds, blocking tickets will be added to this one.
>  
>  





[jira] [Work logged] (BEAM-10024) Spark runner failing testOutputTimestampDefault

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10024?focusedWorklogId=439753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439753
 ]

ASF GitHub Bot logged work on BEAM-10024:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 21:41
Start Date: 01/Jun/20 21:41
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11739:
URL: https://github.com/apache/beam/pull/11739#issuecomment-637128985


   Run Spark ValidatesRunner





Issue Time Tracking
---

Worklog Id: (was: 439753)
Time Spent: 1h 10m  (was: 1h)

> Spark runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10024
> URL: https://issues.apache.org/jira/browse/BEAM-10024
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: currently-failing
> Fix For: 2.22.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This is causing postcommit to fail
> java.lang.UnsupportedOperationException: Found TimerId annotations on 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet 
> be used with timers in the SparkRunner.





[jira] [Work logged] (BEAM-10024) Spark runner failing testOutputTimestampDefault

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10024?focusedWorklogId=439754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439754
 ]

ASF GitHub Bot logged work on BEAM-10024:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 21:42
Start Date: 01/Jun/20 21:42
Worklog Time Spent: 10m 
  Work Description: ibzib commented on a change in pull request #11739:
URL: https://github.com/apache/beam/pull/11739#discussion_r433506182



##
File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
##
@@ -3539,6 +3539,7 @@ public void onTimer() {}
 @Category({
   ValidatesRunner.class,
   UsesTimersInParDo.class,
+  UsesUnboundedPCollections.class,

Review comment:
   I split the test into batch and streaming variants, PTAL







Issue Time Tracking
---

Worklog Id: (was: 439754)
Time Spent: 1h 20m  (was: 1h 10m)

> Spark runner failing testOutputTimestampDefault
> ---
>
> Key: BEAM-10024
> URL: https://issues.apache.org/jira/browse/BEAM-10024
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: currently-failing
> Fix For: 2.22.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is causing postcommit to fail
> java.lang.UnsupportedOperationException: Found TimerId annotations on 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests$12, but DoFn cannot yet 
> be used with timers in the SparkRunner.





[jira] [Work logged] (BEAM-9951) Create Go SDK synthetic sources.

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9951?focusedWorklogId=439761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439761
 ]

ASF GitHub Bot logged work on BEAM-9951:


Author: ASF GitHub Bot
Created on: 01/Jun/20 21:50
Start Date: 01/Jun/20 21:50
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #11870:
URL: https://github.com/apache/beam/pull/11870#discussion_r433509338



##
File path: sdks/go/pkg/beam/testing/passert/count.go
##
@@ -0,0 +1,52 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package passert
+
+import (
+   "fmt"
+
+   "github.com/apache/beam/sdks/go/pkg/beam"
+   "github.com/apache/beam/sdks/go/pkg/beam/core/typex"
+)
+
+func Count(s beam.Scope, col beam.PCollection, name string, count int) {
+   s = s.Scope(fmt.Sprintf("passert.Count(%v)", name))
+
+   if typex.IsKV(col.Type()) {
+      col = beam.DropKey(s, col)
+   }
+   counted := beam.Combine(s, &elmCountCombineFn{}, col)
+   Equals(s, counted, count)

Review comment:
   Can the Sum transform be re-used instead?
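
   To illustrate the idea behind this question: counting elements is equivalent to mapping every element to 1 and summing the ones. The following is a minimal plain-Go sketch of that equivalence only. It does not use the Beam SDK, and `sum` and `countViaSum` are hypothetical helper names, not functions from this pull request.

```go
package main

import "fmt"

// sum adds up a slice of ints; it stands in for a Sum-style combiner.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// countViaSum counts elements by mapping each one to 1 and summing the ones,
// so no dedicated counting combiner is needed.
func countViaSum(elems []string) int {
	ones := make([]int, len(elems))
	for i := range elems {
		ones[i] = 1
	}
	return sum(ones)
}

func main() {
	fmt.Println(countViaSum([]string{"a", "b", "c"}))
}
```

   In a pipeline the same shape would be a per-element map followed by a sum combine, which is what would let an existing Sum transform replace a custom count CombineFn.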







Issue Time Tracking
---

Worklog Id: (was: 439761)
Time Spent: 3h 50m  (was: 3h 40m)

> Create Go SDK synthetic sources.
> 
>
> Key: BEAM-9951
> URL: https://issues.apache.org/jira/browse/BEAM-9951
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: P2
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Create synthetic sources for the Go SDK like 
> [Java|https://github.com/apache/beam/tree/master/sdks/java/io/synthetic/src/main/java/org/apache/beam/sdk/io/synthetic]
>  and 
> [Python|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/synthetic_pipeline.py]
>  have.





[jira] [Commented] (BEAM-9388) Consider using github actions for building python wheels and more (aka. Transition from Travis)

2020-06-01 Thread Bruce Arctor (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121372#comment-17121372
 ] 

Bruce Arctor commented on BEAM-9388:


[~TobKed] – Very happy to collaborate, though it looks like you are well on 
your way; let's figure out what's sensible, as I wouldn't want to slow you 
down.  While I'm interested in helping with builds/automation, it's also quite 
valuable for this to get accomplished to minimize manual effort and increase 
what is possible for Beam.  

 

[~altay] – That's helpful/clarifying.

 

> Consider using github actions for building python wheels and more (aka. 
> Transition from Travis)
> ---
>
> Key: BEAM-9388
> URL: https://issues.apache.org/jira/browse/BEAM-9388
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-py-core
>Reporter: Ahmet Altay
>Priority: P2
>
> Context on the mailing list: 
> https://lists.apache.org/thread.html/r4a7d34e64a34e9fe589d06aec74d9b464d252c516fe96c35b2d6c9ae%40%3Cdev.beam.apache.org%3E
> Consider using GitHub Actions instead of Travis for building Python wheels 
> during releases. This will have the following advantages:
> - We will eliminate one repo. (If you don't know, we have 
> https://github.com/apache/beam-wheels for the sole purpose of building wheel 
> files.)
> - Workflow will be stored in the same repo. This will prevent bit rot that is 
> only discovered at release times. (happened a few times, although usually 
> easy to fix.)
> - github actions supports ubuntu, mac, windows environments. We could try to 
> build wheels for windows as well. (Travis also supports the same environments 
> but we only use linux and mac environments. Maybe there are other blockers 
> for building wheels for Windows.)
> - We could do more, like daily python builds.





[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=439767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439767
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 01/Jun/20 22:12
Start Date: 01/Jun/20 22:12
Worklog Time Spent: 10m 
  Work Description: KevinGG edited a comment on pull request #11884:
URL: https://github.com/apache/beam/pull/11884#issuecomment-637109865


   R: @robertwb 
   R: @rohdesamuel 
   
   PTAL, thx!





Issue Time Tracking
---

Worklog Id: (was: 439767)
Time Spent: 23h 10m  (was: 23h)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Time Spent: 23h 10m
>  Remaining Estimate: 0h
>
> This is the top-level ticket for all efforts leveraging [interactive 
> Beam|https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive].
> As development proceeds, blocking tickets will be added to this one.
>  
>  





[jira] [Resolved] (BEAM-10158) [Python] Reuse a shared unbounded thread pool

2020-06-01 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-10158.
--
Fix Version/s: 2.23.0
   Resolution: Fixed

> [Python] Reuse a shared unbounded thread pool
> -
>
> Key: BEAM-10158
> URL: https://issues.apache.org/jira/browse/BEAM-10158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> During testing we create a lot of thread pools, many of which we don't 
> shut down, which can lead to thread exhaustion on some machines.
>  
> Swapping to use a shared thread pool will decrease the memory overhead for 
> these unused threads and allow for greater reuse.





[jira] [Work logged] (BEAM-10158) [Python] Reuse a shared unbounded thread pool

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10158?focusedWorklogId=439769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439769
 ]

ASF GitHub Bot logged work on BEAM-10158:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 22:16
Start Date: 01/Jun/20 22:16
Worklog Time Spent: 10m 
  Work Description: lukecwik merged pull request #11867:
URL: https://github.com/apache/beam/pull/11867


   





Issue Time Tracking
---

Worklog Id: (was: 439769)
Time Spent: 2h  (was: 1h 50m)

> [Python] Reuse a shared unbounded thread pool
> -
>
> Key: BEAM-10158
> URL: https://issues.apache.org/jira/browse/BEAM-10158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> During testing we create a lot of thread pools, many of which we don't 
> shut down, which can lead to thread exhaustion on some machines.
>  
> Swapping to use a shared thread pool will decrease the memory overhead for 
> these unused threads and allow for greater reuse.





[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=439770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439770
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 22:17
Start Date: 01/Jun/20 22:17
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-637154715









Issue Time Tracking
---

Worklog Id: (was: 439770)
Time Spent: 4h  (was: 3h 50m)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trivial mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap).
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.





[jira] [Created] (BEAM-10169) ParDo* functions should declare the correct output N in their error message

2020-06-01 Thread Robert Burke (Jira)
Robert Burke created BEAM-10169:
---

 Summary: ParDo* functions should declare the correct output N in 
their error message
 Key: BEAM-10169
 URL: https://issues.apache.org/jira/browse/BEAM-10169
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Reporter: Robert Burke


User report noted the confusion in the error if you use a DoFn with 0 outputs 
with beam.ParDo instead of beam.ParDo0. 

In that case, a panic stack trace is followed by the cryptic: "expected 1 
output. Found: []"

We can do better.

While we can't change the return signature dynamically (that's for ParDoN 
only), we can instead clearly indicate: 
* the DoFn in question,
* the number of outputs the DoFn has,
* and a recommendation to use ParDo0, ParDo, ParDo2, ..., ParDo7, or ParDoN, as 
appropriate.

https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pardo.go#L361 would 
need to change as well as any of the specific cases that follow. 
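The proposed message can be sketched as follows. This is a language-neutral illustration in Python, not the Go SDK code in pardo.go; the helper names `recommended_pardo` and `output_mismatch_message` and the exact wording of the message are assumptions made for the example.

```python
def recommended_pardo(num_outputs):
    """Map a DoFn's output count to the ParDo variant named in the issue."""
    if num_outputs < 0:
        raise ValueError("output count cannot be negative")
    if num_outputs == 0:
        return "ParDo0"
    if num_outputs == 1:
        return "ParDo"
    if num_outputs <= 7:
        return "ParDo%d" % num_outputs
    return "ParDoN"


def output_mismatch_message(dofn_name, expected, actual):
    """Build an error that names the DoFn, its output count, and a fix."""
    return (
        "DoFn %s has %d output(s), but the transform used expects %d; "
        "use beam.%s instead"
        % (dofn_name, actual, expected, recommended_pardo(actual))
    )
```

For the case in the user report (a DoFn with 0 outputs passed to beam.ParDo), the message would name the DoFn and point to ParDo0 rather than printing only "expected 1 output. Found: []".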





[jira] [Created] (BEAM-10170) TextBasedReader may not respect the source end offset

2020-06-01 Thread Jessica Wise (Jira)
Jessica Wise created BEAM-10170:
---

 Summary: TextBasedReader may not respect the source end offset
 Key: BEAM-10170
 URL: https://issues.apache.org/jira/browse/BEAM-10170
 Project: Beam
  Issue Type: Bug
  Components: beam-model
Reporter: Jessica Wise


[TextBasedReader|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L88]
 is backed by a TextSource, which may have a start and end offset. If the end 
offset does not correspond to a delimiter, the TextBasedReader will not respect 
the end offset and will instead read past the end offset to the next instance 
of a delimiter. See 
[TextBasedReader#findDelimiterBounds|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L176]
 which finds the end of the next record to read: this method will "consume the 
channel till either EOF or the delimiter bounds are found."  I believe this is 
a bug because this method should also check for the end offset, not just EOF or 
a delimiter.
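The behavior described in the report can be illustrated with a small stand-alone sketch. It is written in Python for brevity and is not the TextSource code; `find_record_end` and `read_records` are hypothetical names. It shows a reader that starts records only before the end offset but scans each record through to the next delimiter or EOF.

```python
def find_record_end(data, start, delimiter=b"\n"):
    """Scan from start to the next delimiter, or to EOF if there is none."""
    index = data.find(delimiter, start)
    return len(data) if index == -1 else index


def read_records(data, start_offset, end_offset, delimiter=b"\n"):
    """Return (start position, record) pairs for records that begin in range.

    A record is only started while the position is before end_offset, but the
    scan for its end ignores end_offset, as described in the report.
    """
    records = []
    position = start_offset
    while position < end_offset and position < len(data):
        end = find_record_end(data, position, delimiter)
        records.append((position, data[position:end]))
        position = end + len(delimiter)
    return records
```

With `data = b"alpha\nbravo\ncharlie\n"` and the range 0 to 8, the second record starts at byte 6, inside the range, and is read through byte 10, past the end offset of 8.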





[jira] [Updated] (BEAM-10170) TextBasedReader may not respect the source end offset

2020-06-01 Thread Jessica Wise (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jessica Wise updated BEAM-10170:

Description: 
[TextBasedReader|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L88]
 is backed by a TextSource, which may have a start and end offset. If the end 
offset does not correspond to a delimiter, the TextBasedReader will not respect 
the end offset and will instead read past the end offset to the next instance 
of a delimiter. See 
[TextBasedReader#findDelimiterBounds|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L176]
 which finds the end of the next record to read: this method will "consume the 
channel till either EOF or the delimiter bounds are found."  I believe this is 
a bug and that this method should also check for the end offset, not just EOF 
or a delimiter.  (was: 
[TextBasedReader|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L88]
 is backed by a TextSource, which may have a start and end offset. If the end 
offset does not correspond to a delimiter, the TextBasedReader will not respect 
the end offset and will instead read past the end offset to the next instance 
of a delimiter. See 
[TextBasedReader#findDelimiterBounds|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L176]
 which finds the end of the next record to read: this method will "consume the 
channel till either EOF or the delimiter bounds are found."  I believe this is 
a bug because this method should also check for the end offset, not just EOF or 
a delimiter.)

> TextBasedReader may not respect the source end offset
> -
>
> Key: BEAM-10170
> URL: https://issues.apache.org/jira/browse/BEAM-10170
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Jessica Wise
>Priority: P2
>
> [TextBasedReader|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L88]
>  is backed by a TextSource, which may have a start and end offset. If the end 
> offset does not correspond to a delimiter, the TextBasedReader will not 
> respect the end offset and will instead read past the end offset to the next 
> instance of a delimiter. See 
> [TextBasedReader#findDelimiterBounds|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextSource.java#L176]
>  which finds the end of the next record to read: this method will "consume 
> the channel till either EOF or the delimiter bounds are found."  I believe 
> this is a bug and that this method should also check for the end offset, not 
> just EOF or a delimiter.





[jira] [Work logged] (BEAM-10063) Run pandas doctests for Beam dataframes API.

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10063?focusedWorklogId=439784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439784
 ]

ASF GitHub Bot logged work on BEAM-10063:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 23:01
Start Date: 01/Jun/20 23:01
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on a change in pull request 
#11785:
URL: https://github.com/apache/beam/pull/11785#discussion_r433527036



##
File path: sdks/python/apache_beam/dataframe/doctests.py
##
@@ -242,6 +242,11 @@ def __init__(self, env, use_beam=True, **kwargs):
 **kwargs)
 
   def run(self, test, **kwargs):
+for example in test.examples:
+  if example.exc_msg is None:
+# Don't fail doctests that raise this error.
+example.exc_msg = (
+'apache_beam.dataframe.frame_base.WontImplementError: ...')

Review comment:
   The doctest docs say that `exc_msg != None` indicates an example is 
_expected_ to generate an exception matching the description. In practice I 
guess it really means the example is _allowed_ to throw such an exception?
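The reviewer's reading can be checked against the standard library. A minimal sketch, assuming CPython's `doctest` runner; `SOURCE` and `run_examples` are names made up for this example and are not part of the pull request. It runs the same two examples twice, once unmodified and once with `exc_msg` filled in the way the diff does.

```python
import doctest

# Two examples: one that succeeds, and one that raises ValueError although
# its expected output ("0") says nothing about an exception.
SOURCE = '''
>>> 1 + 1
2
>>> int("not a number")
0
'''


def run_examples(source, allowed_exc_msg=None):
    """Run the examples in source, optionally tolerating one error message."""
    test = doctest.DocTestParser().get_doctest(
        source, globs={}, name="sketch", filename=None, lineno=0)
    if allowed_exc_msg is not None:
        for example in test.examples:
            if example.exc_msg is None:
                # Same trick as in the diff: mark the error as acceptable.
                example.exc_msg = allowed_exc_msg
    runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
    # Discard the runner's failure report; only the counts are of interest.
    return runner.run(test, out=lambda text: None)


strict = run_examples(SOURCE)
lenient = run_examples(SOURCE, allowed_exc_msg="ValueError: ...\n")
```

In the strict run the second example fails because it raises unexpectedly. In the lenient run both examples pass: the first does not raise and its output still matches, and the second raises an exception that matches the allowed message. So a non-None `exc_msg` behaves as "allowed to raise" rather than "must raise" in this runner.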

##
File path: sdks/python/apache_beam/dataframe/frames.py
##
@@ -24,7 +24,8 @@
 
 @frame_base.DeferredFrame._register_for(pd.Series)
 class DeferredSeries(frame_base.DeferredFrame):
-  pass
+  def __array__(self, dtype=None):
+raise frame_base.WontImplementError('non-deferred')

Review comment:
   Can you make this message more descriptive?







Issue Time Tracking
---

Worklog Id: (was: 439784)
Time Spent: 40m  (was: 0.5h)

> Run pandas doctests for Beam dataframes API.
> 
>
> Key: BEAM-10063
> URL: https://issues.apache.org/jira/browse/BEAM-10063
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121391#comment-17121391
 ] 

Kenneth Knowles edited comment on BEAM-10115 at 6/1/20, 11:05 PM:
--

I looked around a bit more and found 
https://stackoverflow.com/questions/59815620/gcloud-upload-httplib2-redirectmissinglocation-redirected-but-the-response-is-m
 which points to 
https://github.com/googleapis/google-api-python-client/issues/803. The issue is 
an incompatibility with httplib2 version 0.16.0. Pinning is a workaround. I 
have not confirmed this is the Beam problem.


was (Author: kenn):
I looked around a bit more and found 
https://stackoverflow.com/questions/59815620/gcloud-upload-httplib2-redirectmissinglocation-redirected-but-the-response-is-m
 which points to 
https://github.com/googleapis/google-api-python-client/issues/803. The issue is 
an incompatibility with an httplib2 version. Pinning is a workaround. I have 
not confirmed this is the Beam problem.

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.





[jira] [Commented] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121391#comment-17121391
 ] 

Kenneth Knowles commented on BEAM-10115:


I looked around a bit more and found 
https://stackoverflow.com/questions/59815620/gcloud-upload-httplib2-redirectmissinglocation-redirected-but-the-response-is-m
 which points to 
https://github.com/googleapis/google-api-python-client/issues/803. The issue is 
an incompatibility with an httplib2 version. Pinning is a workaround. I have 
not confirmed this is the Beam problem.

> Staging requirements.txt fails but staging setup.py succeeds
> 
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kenneth Knowles
>Priority: P2
>
> User reports on StackOverflow: 
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between 
> using `requirements.txt` and `setup.py` for some reason.





[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=439788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439788
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 23:14
Start Date: 01/Jun/20 23:14
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-637174452


   Existing spark failure is due to BEAM-10024





Issue Time Tracking
---

Worklog Id: (was: 439788)
Time Spent: 4h 10m  (was: 4h)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trivial mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap).
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.





[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=439790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439790
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 23:15
Start Date: 01/Jun/20 23:15
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-637174817


   This is ready for review.





Issue Time Tracking
---

Worklog Id: (was: 439790)
Time Spent: 4.5h  (was: 4h 20m)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trivial mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap).
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.





[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=439789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439789
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 23:15
Start Date: 01/Jun/20 23:15
Worklog Time Spent: 10m 
  Work Description: lukecwik removed a comment on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-637086679









Issue Time Tracking
---

Worklog Id: (was: 439789)
Time Spent: 4h 20m  (was: 4h 10m)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trivial mapping from KV Iterable> to the view that is being requested (singleton, iterable, list, 
> map, multimap).
> We should be using the primitive views (iterable, multimap) directly without 
> going through the naive mapping.





[jira] [Work logged] (BEAM-10036) More flexible dataframes partitioning.

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10036?focusedWorklogId=439798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439798
 ]

ASF GitHub Bot logged work on BEAM-10036:
-

Author: ASF GitHub Bot
Created on: 01/Jun/20 23:34
Start Date: 01/Jun/20 23:34
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on a change in pull request 
#11766:
URL: https://github.com/apache/beam/pull/11766#discussion_r433537957



##
File path: sdks/python/apache_beam/dataframe/partitionings.py
##
@@ -0,0 +1,133 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+
+from typing import Any
+from typing import Iterable
+from typing import TypeVar
+
+import pandas as pd
+
+Frame = TypeVar('Frame', bound=pd.core.generic.NDFrame)
+
+
+class Partitioning(object):
+  """A class representing a (consistent) partitioning of dataframe objects.
+  """
+  def is_subpartition_of(self, other):
+# type: (Partitioning) -> bool
+
+"""Returns whether self is a sub-partition of other.
+
+Specifically, returns whether something partitioned by self is necessarily
+also partitioned by other.
+"""
+raise NotImplementedError
+
+  def partition_fn(self, df):
+# type: (Frame) -> Iterable[Tuple[Any, Frame]]
+
+"""A callable that actually performs the partitioning of a Frame df.
+
+This will be invoked via a FlatMap in conjunction with a GroupKey to
+achieve the desired partitioning.
+"""
+raise NotImplementedError
+
+
+class Index(Partitioning):
+  """A partitioning by index (either fully or partially).
+
+  If the set of "levels" of the index to consider is not specified, the entire
+  index is used.
+
+  These form a partial order, given by
+
+  Nothing() < Index([i]) < Index([i, j]) < ... < Index() < Singleton()

Review comment:
   This ordering is determined by `is_subpartition_of` correct? I wonder if 
there's a way to clearly say that in this docstring?

##
File path: sdks/python/apache_beam/dataframe/frames_test.py
##
@@ -23,6 +23,7 @@
 
 from apache_beam.dataframe import expressions
 from apache_beam.dataframe import frame_base
+from apache_beam.dataframe import frames  # pylint: disable=unused-import

Review comment:
   What is this for?


[jira] [Assigned] (BEAM-10068) Modify behavior of Dynamic Destinations

2020-06-01 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada reassigned BEAM-10068:


Assignee: Pablo Estrada

> Modify behavior of Dynamic Destinations
> ---
>
> Key: BEAM-10068
> URL: https://issues.apache.org/jira/browse/BEAM-10068
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Mihir Borkar
>Assignee: Pablo Estrada
>Priority: P2
>
> The writeDynamic() method, which implements Dynamic Destinations, writes files per 
> destination per window per pane. 
> This leads to an increase in the number of files generated.
> The request is as follows:
> A way to make it possible for the user to modify the behavior of Dynamic 
> Destinations to control the number of output files being produced.
> a.) We can consider adding user-configurable parameters like writers per 
> bundle, increasing number of records processed per bundle
> and/or
> b.) Introduce a method implementing Dynamic Destinations but more dependent 
> on the data passing through the pipeline, instead of windows/panes.
> So instead of splitting every output file into roughly the number of 
> destinations being written to, we let the user configure how output files 
> should be divided across destinations.
> Links:
> [1] 
> [https://beam.apache.org/releases/javadoc/2.19.0/index.html?org/apache/beam/sdk/io/FileIO.html]
> [2] 
> [https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
>  
>  





[jira] [Updated] (BEAM-10068) Modify behavior of Dynamic Destinations

2020-06-01 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-10068:
-
Issue Type: New Feature  (was: Improvement)

> Modify behavior of Dynamic Destinations
> ---
>
> Key: BEAM-10068
> URL: https://issues.apache.org/jira/browse/BEAM-10068
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Mihir Borkar
>Priority: P2
>
> The writeDynamic() method, which implements Dynamic Destinations, writes files per 
> destination per window per pane. 
> This leads to an increase in the number of files generated.
> The request is as follows:
> A way to make it possible for the user to modify the behavior of Dynamic 
> Destinations to control the number of output files being produced.
> a.) We can consider adding user-configurable parameters like writers per 
> bundle, increasing number of records processed per bundle
> and/or
> b.) Introduce a method implementing Dynamic Destinations but more dependent 
> on the data passing through the pipeline, instead of windows/panes.
> So instead of splitting every output file into roughly the number of 
> destinations being written to, we let the user configure how output files 
> should be divided across destinations.
> Links:
> [1] 
> [https://beam.apache.org/releases/javadoc/2.19.0/index.html?org/apache/beam/sdk/io/FileIO.html]
> [2] 
> [https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
>  
>  





[jira] [Commented] (BEAM-10068) Modify behavior of Dynamic Destinations

2020-06-01 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121408#comment-17121408
 ] 

Pablo Estrada commented on BEAM-10068:
--

Made this into a New Feature issue. [~mborkar], can you tell whether this 
corresponds to a feature allowing `specifying per-destination numShards`?
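To make the file-count concern concrete, here is a rough back-of-the-envelope sketch. It assumes that every destination receives data in every pane of every window and that a shard count could be set per destination; `estimate_output_files` is a hypothetical helper, not a Beam API.

```python
def estimate_output_files(shards_by_destination, num_windows, panes_per_window):
    """Estimate files written when output is split per destination, window, pane.

    Assumption: each destination sees data in every pane of every window, so
    each (destination, window, pane) combination yields one file per shard.
    """
    return sum(
        shards * num_windows * panes_per_window
        for shards in shards_by_destination.values()
    )
```

Under these assumptions, three destinations with one shard each, 24 windows, and 2 panes per window already produce 144 files, which is why a per-destination control on the shard count would be useful.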

> Modify behavior of Dynamic Destinations
> ---
>
> Key: BEAM-10068
> URL: https://issues.apache.org/jira/browse/BEAM-10068
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Mihir Borkar
>Assignee: Reuven Lax
>Priority: P2
>
> The writeDynamic() method, which implements Dynamic Destinations, writes files per 
> destination per window per pane. 
> This leads to an increase in the number of files generated.
> The request is as follows:
> A way to make it possible for the user to modify the behavior of Dynamic 
> Destinations to control the number of output files being produced.
> a.) We can consider adding user-configurable parameters like writers per 
> bundle, increasing number of records processed per bundle
> and/or
> b.) Introduce a method implementing Dynamic Destinations but more dependent 
> on the data passing through the pipeline, instead of windows/panes.
> So instead of splitting every output file into roughly the number of 
> destinations being written to, we let the user configure how output files 
> should be divided across destinations.
> Links:
> [1] 
> [https://beam.apache.org/releases/javadoc/2.19.0/index.html?org/apache/beam/sdk/io/FileIO.html]
> [2] 
> [https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
>  
>  





[jira] [Assigned] (BEAM-10068) Modify behavior of Dynamic Destinations

2020-06-01 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada reassigned BEAM-10068:


Assignee: Reuven Lax  (was: Pablo Estrada)

> Modify behavior of Dynamic Destinations
> ---
>
> Key: BEAM-10068
> URL: https://issues.apache.org/jira/browse/BEAM-10068
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Mihir Borkar
>Assignee: Reuven Lax
>Priority: P2
>
> The writeDynamic() method, which implements Dynamic Destinations, writes files per 
> destination per window per pane. 
> This leads to an increase in the number of files generated.
> The request is as follows:
> A way to make it possible for the user to modify the behavior of Dynamic 
> Destinations to control the number of output files being produced.
> a.) We can consider adding user-configurable parameters like writers per 
> bundle, increasing number of records processed per bundle
> and/or
> b.) Introduce a method implementing Dynamic Destinations but more dependent 
> on the data passing through the pipeline, instead of windows/panes.
> So instead of splitting every output file into roughly the number of 
> destinations being written to, we let the user configure how output files 
> should be divided across destinations.
> Links:
> [1] 
> [https://beam.apache.org/releases/javadoc/2.19.0/index.html?org/apache/beam/sdk/io/FileIO.html]
> [2] 
> [https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java]
>  
>  





[jira] [Work logged] (BEAM-10027) Support for Kotlin-based Beam Katas

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10027?focusedWorklogId=439824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439824
 ]

ASF GitHub Bot logged work on BEAM-10027:
-

Author: ASF GitHub Bot
Created on: 02/Jun/20 00:36
Start Date: 02/Jun/20 00:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11761:
URL: https://github.com/apache/beam/pull/11761#issuecomment-637199445


   woah very exciting!





Issue Time Tracking
---

Worklog Id: (was: 439824)
Time Spent: 13.5h  (was: 13h 20m)

> Support for Kotlin-based Beam Katas
> ---
>
> Key: BEAM-10027
> URL: https://issues.apache.org/jira/browse/BEAM-10027
> Project: Beam
>  Issue Type: Improvement
>  Components: katas
>Reporter: Rion Williams
>Assignee: Rion Williams
>Priority: P2
>   Original Estimate: 8h
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> Currently, there are a series of examples available demonstrating the use of 
> Apache Beam with Kotlin. It would be nice to have support for the same Beam 
> Katas that exist for Python, Go, and Java to also support Kotlin. 
> The port itself shouldn't be that involved since it can still target the JVM, 
> so it would likely just require the inclusion for Kotlin dependencies and a 
> conversion for all of the existing Java examples. 





[jira] [Work logged] (BEAM-9926) Certain code examples for programming guide are not showing

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9926?focusedWorklogId=439835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439835
 ]

ASF GitHub Bot logged work on BEAM-9926:


Author: ASF GitHub Bot
Created on: 02/Jun/20 01:22
Start Date: 02/Jun/20 01:22
Worklog Time Spent: 10m 
  Work Description: rosetn commented on a change in pull request #11790:
URL: https://github.com/apache/beam/pull/11790#discussion_r433569849



##
File path: website/www/site/content/en/documentation/programming-guide.md
##
@@ -527,7 +530,7 @@ The graph of this pipeline looks like the following:
 *Figure 1: A linear pipeline with three sequential transforms.*
 
 However, note that a transform *does not consume or otherwise alter* the input
-collection--remember that a `PCollection` is immutable by definition. This 
means
+collection - remember that a `PCollection` is immutable by definition. This 
means

Review comment:
   Em dash with no spaces between words is the correct usage here: —

##
File path: website/www/site/content/en/documentation/programming-guide.md
##
@@ -1842,6 +1847,9 @@ transform's intermediate data changes type multiple times.
 {{< github_sample 
"/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py"
 pipeline_monitoring_composite >}}
 {{< /highlight >}}
 
+Note that because `Count` is itself a composite transform,

Review comment:
   I'd either format this as a note (see other notes on this page) or 
remove "Note that"







Issue Time Tracking
---

Worklog Id: (was: 439835)
Time Spent: 1.5h  (was: 1h 20m)

> Certain code examples for programming guide are not showing
> ---
>
> Key: BEAM-9926
> URL: https://issues.apache.org/jira/browse/BEAM-9926
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Ashwin Ramaswami
>Priority: P2
> Attachments: screenshot-1.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Seems like the code examples for the entire State section are missing. See 
> [https://beam.apache.org/documentation/programming-guide/#state-and-timers]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8451) Interactive Beam example failing from stack overflow

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8451?focusedWorklogId=439838&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439838
 ]

ASF GitHub Bot logged work on BEAM-8451:


Author: ASF GitHub Bot
Created on: 02/Jun/20 01:27
Start Date: 02/Jun/20 01:27
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #11706:
URL: https://github.com/apache/beam/pull/11706#issuecomment-637213391


   Run Website_Stage_GCS PreCommit





Issue Time Tracking
---

Worklog Id: (was: 439838)
Time Spent: 2h 20m  (was: 2h 10m)

> Interactive Beam example failing from stack overflow
> 
>
> Key: BEAM-8451
> URL: https://issues.apache.org/jira/browse/BEAM-8451
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python, runner-py-interactive
>Reporter: Igor Durovic
>Assignee: Chun Yang
>Priority: P2
> Fix For: 2.18.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
>  
> RecursionError: maximum recursion depth exceeded in __instancecheck__
> at 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py#L405]
>  
> This occurred after the execution of the last cell in 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb]





[jira] [Assigned] (BEAM-10148) Data loaded using ReadAllFromText is not projected properly as side input

2020-06-01 Thread Prathap Kumar Parvathareddy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prathap Kumar Parvathareddy reassigned BEAM-10148:
--

Assignee: Prathap Kumar Parvathareddy

> Data loaded using ReadAllFromText is not projected properly as side input
> -
>
> Key: BEAM-10148
> URL: https://issues.apache.org/jira/browse/BEAM-10148
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-files
>Affects Versions: 2.19.0
> Environment: Runner: Dataflow Runner
> Beam SDK:  Python 2.19
>Reporter: Prathap Kumar Parvathareddy
>Assignee: Prathap Kumar Parvathareddy
>Priority: P2
>
> *Context:*
> Data enrichment pattern with 2 sources. 
> Source 1:  Google Cloud PubSub delivering JSON messages
> Source 2:  Google Cloud Storage files (acts as a side input) 
> *Steps:* 
> 1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText, based 
> on file paths inside a PCollection, and convert each record to a tuple with 2 
> fields.
> 2. Pass the tuples loaded in step 1 as a side input, using pvalue.AsDict, to 
> the main input that is being loaded from PubSub.
> 3. The expectation is that the side input should be fully available, but for 
> some reason AsDict does not contain all the data that was loaded from GCS.
>  
> *Possible Issues:*
>       Below are a few possible issues that can be ruled out, as I have already 
> validated them.
>  # Window mismatches -  The main input window is within the scope of the side 
> input window.
>  # Delay in updating side input state -  The pipeline has just 1 VM, the side 
> input has only 8 JSON messages, and its total size is around 40 KB.
>  
> *Troubleshooting:*
>  # Works fine with the DirectRunner.
>  # Validated that the ReadAllFromText transform loads all the data properly 
> from multiple files, and that the subsequent transform builds the KVs. However, 
> when the output of the KV transform is used as a side input via AsDict(), for 
> some reason certain elements are skipped and are not available in the lookup.
> *Code* 
>   Will shortly add a link to the complete code, to provide more 
> visibility.
>  
>  
>  
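
The enrichment pattern in the steps above can be approximated in plain Python, without Beam, to sanity-check the lookup logic. This is only a sketch under assumptions: the `key,value` record format, the `id` field on the messages, and the helper names `to_kv`, `build_lookup` and `enrich` are hypothetical and do not come from the reporter's pipeline. It also counts key collisions, because a dict keeps one value per key; whether that is related to the reported behavior is not known.

```python
from collections import Counter


def to_kv(line):
    """Split a hypothetical 'key,value' text record into a 2-field tuple."""
    key, value = line.split(",", 1)
    return key.strip(), value.strip()


def build_lookup(kvs):
    """Build the dict a side input would expose, and report colliding keys."""
    counts = Counter(key for key, _ in kvs)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    return dict(kvs), duplicates


def enrich(message, lookup):
    """Attach the looked-up value for the message's (hypothetical) 'id' field."""
    enriched = dict(message)
    enriched["extra"] = lookup.get(message["id"])  # None when the key is absent
    return enriched


lines = ["a,1", "b,2", "a,3"]  # stand-in for records read from the files
kvs = [to_kv(line) for line in lines]
lookup, duplicates = build_lookup(kvs)
print(len(kvs), len(lookup), duplicates)
print(enrich({"id": "b"}, lookup))
```

Comparing the number of records with the number of dict entries is a cheap first check before looking at windowing or runner behavior.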





[jira] [Assigned] (BEAM-10148) Data loaded using ReadAllFromText is not projected properly as side input

2020-06-01 Thread Prathap Kumar Parvathareddy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prathap Kumar Parvathareddy reassigned BEAM-10148:
--

Assignee: (was: Prathap Kumar Parvathareddy)

> Data loaded using ReadAllFromText is not projected properly as side input
> -
>
> Key: BEAM-10148
> URL: https://issues.apache.org/jira/browse/BEAM-10148
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-files
>Affects Versions: 2.19.0
> Environment: Runner: Dataflow Runner
> Beam SDK:  Python 2.19
>Reporter: Prathap Kumar Parvathareddy
>Priority: P2
>





[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=439876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439876
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 02/Jun/20 03:54
Start Date: 02/Jun/20 03:54
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #11610:
URL: https://github.com/apache/beam/pull/11610#issuecomment-637255622


   From what I can tell, comments are addressed properly already. So merging 
this PR.





Issue Time Tracking
---

Worklog Id: (was: 439876)
Remaining Estimate: 85h  (was: 85h 10m)
Time Spent: 11h  (was: 10h 50m)

> Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
> --
>
> Key: BEAM-9825
> URL: https://issues.apache.org/jira/browse/BEAM-9825
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 11h
>  Remaining Estimate: 85h
>
> I'd like to propose the following new high-level transforms.
>  * Intersect
> Compute the intersection between the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are common to both _leftCollection_ 
> and _rightCollection_.
>  
>  * Except
> Compute the difference between the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are in _leftCollection_ but not in 
> _rightCollection_.
>  * Union
> Find the elements that are in either of two PCollections.
> Also implement the IntersectAll, ExceptAll and UnionAll variants of these 
> transforms.
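
The difference between the plain and the "All" variants can be sketched in plain Python, with lists standing in for PCollections. The issue does not spell out the multiplicity rules, so this assumes the "All" variants follow the SQL semantics of INTERSECT ALL, EXCEPT ALL and UNION ALL; the function names are hypothetical and this is not the Java implementation from the PR.

```python
from collections import Counter


def intersect_all(left, right):
    # Each element appears min(count in left, count in right) times.
    return sorted((Counter(left) & Counter(right)).elements())


def except_all(left, right):
    # Each element appears max(0, count in left - count in right) times.
    return sorted((Counter(left) - Counter(right)).elements())


def union_all(left, right):
    # Each element appears (count in left + count in right) times.
    return sorted((Counter(left) + Counter(right)).elements())


def intersect(left, right):
    # Distinct elements present in both inputs.
    return sorted(set(left) & set(right))


def except_(left, right):
    # Distinct elements present in left but not in right.
    return sorted(set(left) - set(right))


def union(left, right):
    # Distinct elements present in either input.
    return sorted(set(left) | set(right))


left = [1, 1, 1, 2, 3]
right = [1, 1, 2, 4]
print(intersect(left, right), intersect_all(left, right))
print(except_(left, right), except_all(left, right))
print(union(left, right), union_all(left, right))
```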





[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=439877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439877
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 02/Jun/20 03:55
Start Date: 02/Jun/20 03:55
Worklog Time Spent: 10m 
  Work Description: amaliujia merged pull request #11610:
URL: https://github.com/apache/beam/pull/11610


   





Issue Time Tracking
---

Worklog Id: (was: 439877)
Remaining Estimate: 84h 50m  (was: 85h)
Time Spent: 11h 10m  (was: 11h)

> Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
> --
>
> Key: BEAM-9825
> URL: https://issues.apache.org/jira/browse/BEAM-9825
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 11h 10m
>  Remaining Estimate: 84h 50m
>





[jira] [Updated] (BEAM-758) Per-step, per-execution nonce

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-758:
-
Labels: stale-assigned  (was: )

> Per-step, per-execution nonce
> -
>
> Key: BEAM-758
> URL: https://issues.apache.org/jira/browse/BEAM-758
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: Not applicable
>Reporter: Dan Halperin
>Assignee: Sam McVeety
>Priority: P2
>  Labels: stale-assigned
>
> In the forthcoming runner API, a user will be able to save a pipeline to JSON 
> and then run it repeatedly.
> Many pieces of code (e.g., BigQueryIO.Read or Write) rely on a single random 
> value (nonce). These values are typically generated at apply time, so that 
> they are deterministic (don't change across retries of DoFns) and global (are 
> the same across all workers).
> However, once the runner API lands the existing code would result in the same 
> nonce being reused across jobs. Other possible solutions:
> * Generate nonce in {{Create(1) | ParDo}} then use this as a side input. 
> Should work, as long as side inputs are actually checkpointed. But does not 
> work for {{BoundedSource}}.
> * If a nonce is only needed for the lifetime of one bundle, can be generated 
> in {{startBundle}} and used in {{finishBundle}} [or {{tearDown}}].
> * Add some context somewhere that lets user code access unique step name, and 
> somehow generate a nonce consistently e.g. by hashing. Will usually work, but 
> this is similarly not available to sources.
> Another Q: I'm not sure we have a good way to generate nonces in unbounded 
> pipelines -- we probably need one. This would enable us to, e.g., use 
> {{BigQueryIO.Write}} in an unbounded pipeline [if we had, e.g., exactly-once 
> triggering per window]. Or generalizing to multiple firings...
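
The hashing option in the list above could look roughly like the following plain-Python sketch. It assumes that some per-execution identifier and a unique step name are available to user code; `execution_id`, `step_name` and `step_nonce` are hypothetical names, not an existing Beam API.

```python
import hashlib


def step_nonce(execution_id, step_name):
    """Derive a nonce that is stable within one execution of one step.

    Retries on any worker recompute the same value, while a new execution
    (a different execution_id) or a different step yields a different value.
    """
    # repr() of the tuple keeps the two fields unambiguous when combined.
    material = repr((execution_id, step_name)).encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:16]


print(step_nonce("job-1", "WriteToTable"))
print(step_nonce("job-2", "WriteToTable"))
```

Because the value is computed rather than stored, it is deterministic across retries and identical on all workers, which are the two properties the issue asks for.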





[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=439878&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-439878
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 02/Jun/20 03:55
Start Date: 02/Jun/20 03:55
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #11610:
URL: https://github.com/apache/beam/pull/11610#issuecomment-637255904


   This really becomes a big contribution at the end! Thanks @darshanj!





Issue Time Tracking
---

Worklog Id: (was: 439878)
Remaining Estimate: 84h 40m  (was: 84h 50m)
Time Spent: 11h 20m  (was: 11h 10m)

> Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
> --
>
> Key: BEAM-9825
> URL: https://issues.apache.org/jira/browse/BEAM-9825
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: P2
>   Original Estimate: 96h
>  Time Spent: 11h 20m
>  Remaining Estimate: 84h 40m
>





[jira] [Commented] (BEAM-9780) Add a DICOM IO Connector for Google Cloud Healthcare API

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121551#comment-17121551
 ] 

Kenneth Knowles commented on BEAM-9780:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Add a DICOM IO Connector for Google Cloud Healthcare API
> 
>
> Key: BEAM-9780
> URL: https://issues.apache.org/jira/browse/BEAM-9780
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: P3
>  Labels: stale-assigned
>
> Add IO Transforms for the DICOM store in the [Google Cloud Healthcare 
> API|https://cloud.google.com/healthcare/docs/]





[jira] [Commented] (BEAM-9709) timezone off by 8 hours

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121572#comment-17121572
 ] 

Kenneth Knowles commented on BEAM-9709:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> timezone off by 8 hours
> ---
>
> Key: BEAM-9709
> URL: https://issues.apache.org/jira/browse/BEAM-9709
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Robin Qiu
>Priority: P4
>  Labels: stale-assigned, zetasql-compliance
>
> two failures in shard 13, one failure in shard 19
> {code}
> Expected: ARRAY>[{2014-01-31 00:00:00+00}]
>   Actual: ARRAY>[{2014-01-31 08:00:00+00}], 
> {code}
> {code}
> select timestamp(date '2014-01-31')
> {code}
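
An 8-hour shift like the one above is what results when midnight of the date is interpreted in a UTC-8 zone instead of UTC. Whether that is the actual cause here is an assumption; the fixed-offset zone and the helper name `date_to_utc_timestamp` below are purely illustrative, and the sketch is plain Python rather than ZetaSQL.

```python
from datetime import date, datetime, time, timedelta, timezone


def date_to_utc_timestamp(d, tz):
    """Interpret midnight of `d` in zone `tz`, then express it in UTC."""
    return datetime.combine(d, time.min, tzinfo=tz).astimezone(timezone.utc)


utc_minus_8 = timezone(timedelta(hours=-8))  # illustrative fixed offset
d = date(2014, 1, 31)
expected = date_to_utc_timestamp(d, timezone.utc)
shifted = date_to_utc_timestamp(d, utc_minus_8)
print(expected.isoformat(), shifted.isoformat())
```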





[jira] [Updated] (BEAM-9689) Update wordcount webpage with Spark/Flink + Go

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9689:
--
Labels: stale-assigned  (was: )

> Update wordcount webpage with Spark/Flink + Go
> --
>
> Key: BEAM-9689
> URL: https://issues.apache.org/jira/browse/BEAM-9689
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P3
>  Labels: stale-assigned
>
> Currently says "This runner is not yet available for the Go SDK." which is no 
> longer true.





[jira] [Updated] (BEAM-9643) Add user-facing Go SDF documentation.

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9643:
--
Labels: stale-assigned  (was: )

> Add user-facing Go SDF documentation.
> -
>
> Key: BEAM-9643
> URL: https://issues.apache.org/jira/browse/BEAM-9643
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This means adding the documentation about how to use SDFs and the contracts 
> of all the SDF methods to the Go SDK code, as well as updating the Go SDF 
> design doc.





[jira] [Commented] (BEAM-9546) Support for batching a schema-aware PCollection and processing as a Dataframe

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121626#comment-17121626
 ] 

Kenneth Knowles commented on BEAM-9546:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Support for batching a schema-aware PCollection and processing as a Dataframe
> -
>
> Key: BEAM-9546
> URL: https://issues.apache.org/jira/browse/BEAM-9546
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: P2
>  Labels: stale-assigned
>






[jira] [Updated] (BEAM-9839) OnTimerContext should not create a new one when processing each element/timer in FnApiDoFnRunner

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9839:
--
Labels: stale-assigned  (was: )

> OnTimerContext should not create a new one when processing each element/timer 
> in FnApiDoFnRunner
> 
>
> Key: BEAM-9839
> URL: https://issues.apache.org/jira/browse/BEAM-9839
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Reporter: Rehman Murad Ali
>Assignee: Rehman Murad Ali
>Priority: P2
>  Labels: stale-assigned
>
> The intent of these Context objects was to not create a new 
> one when processing each element/timer and instead to reference a member 
> variable as can be seen in:
>  
> Discussed here :
> https://github.com/apache/beam/pull/11154/#discussion_r416023080
>  
>  
>  





[jira] [Updated] (BEAM-9863) AvroUtils is converting incorrectly LogicalType Timestamps from long into Joda DateTimes

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9863:
--
Labels: stale-assigned  (was: )

> AvroUtils is converting incorrectly LogicalType Timestamps from long into 
> Joda DateTimes
> 
>
> Key: BEAM-9863
> URL: https://issues.apache.org/jira/browse/BEAM-9863
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0, 2.16.0, 2.17.0, 2.18.0, 2.19.0, 2.20.0, 2.21.0
>Reporter: Ismaël Mejía
>Assignee: Reuven Lax
>Priority: P2
>  Labels: stale-assigned
>
> Copied from the mailing list report:
> I think the method AvroUtils.toBeamSchema has an unexpected side effect. 
> I found out that, if you invoke it and then run a pipeline of 
> GenericRecords containing a timestamp (I tried with logical-type 
> timestamp-millis), Beam converts that timestamp from long to 
> org.joda.time.DateTime, even if you don't apply any transformation to the 
> pipeline.
> Do you think it's a bug? 
> More details on how to reproduce here:
> https://lists.apache.org/thread.html/r43fb2896e496b7493a962207eb3b95360abc30b9d091b26f110264d0%40%3Cuser.beam.apache.org%3E
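
For context, Avro's `timestamp-millis` logical type annotates a long that counts milliseconds since the Unix epoch in UTC. The two representations the report mentions (a raw long versus a date-time object) can be illustrated with a small Python sketch; it uses Python's `datetime` in place of Joda's `DateTime`, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_datetime(millis):
    """Turn a timestamp-millis long into a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def datetime_to_millis(dt):
    """Turn an aware datetime back into whole milliseconds since the epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


raw = 1590969600123
as_datetime = millis_to_datetime(raw)
print(raw, as_datetime.isoformat(), datetime_to_millis(as_datetime))
```

The conversion itself is lossless at millisecond precision; the surprise in the report is that the value's type changes without the user asking for it.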





[jira] [Commented] (BEAM-9863) AvroUtils is converting incorrectly LogicalType Timestamps from long into Joda DateTimes

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121542#comment-17121542
 ] 

Kenneth Knowles commented on BEAM-9863:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> AvroUtils is converting incorrectly LogicalType Timestamps from long into 
> Joda DateTimes
> 
>
> Key: BEAM-9863
> URL: https://issues.apache.org/jira/browse/BEAM-9863
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0, 2.16.0, 2.17.0, 2.18.0, 2.19.0, 2.20.0, 2.21.0
>Reporter: Ismaël Mejía
>Assignee: Reuven Lax
>Priority: P2
>  Labels: stale-assigned
>





[jira] [Commented] (BEAM-9742) Add ability to pass FluentBackoff to JdbcIo.Write

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121565#comment-17121565
 ] 

Kenneth Knowles commented on BEAM-9742:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Add ability to pass FluentBackoff to JdbcIo.Write
> -
>
> Key: BEAM-9742
> URL: https://issues.apache.org/jira/browse/BEAM-9742
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Akshay Iyangar
>Assignee: Akshay Iyangar
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently, the FluentBackoff is hardcoded with `maxRetries` and 
> `initialBackoff`.
> It would be helpful if the client were able to pass these values.
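
The request amounts to making the two retry parameters caller-supplied instead of fixed. A minimal Python sketch of that idea follows; it is not Beam's Java `FluentBackoff`. The parameter names mirror the `maxRetries` and `initialBackoff` mentioned in the issue, while `retry_with_backoff`, the doubling policy and the injectable `sleep` are assumptions made for illustration.

```python
import time


def retry_with_backoff(fn, max_retries=5, initial_backoff=1.0, sleep=time.sleep):
    """Call fn(); on failure wait, double the wait, and retry up to max_retries times."""
    delay = initial_backoff
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delay)
            delay *= 2


attempts = []
waits = []


def flaky_write():
    # Hypothetical operation that fails twice, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = retry_with_backoff(flaky_write, max_retries=3, initial_backoff=0.5,
                            sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the example fast and testable; a real caller would leave the default.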





[jira] [Commented] (BEAM-9861) BigQueryStorageStreamSource fails with split fractions of 0.0 or 1.0

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121543#comment-17121543
 ] 

Kenneth Knowles commented on BEAM-9861:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> BigQueryStorageStreamSource fails with split fractions of 0.0 or 1.0
> 
>
> Key: BEAM-9861
> URL: https://issues.apache.org/jira/browse/BEAM-9861
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Kenneth Jung
>Assignee: Kenneth Jung
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-9322) Python SDK ignores manually set PCollection tags

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9322:
--
Labels: stale-assigned  (was: )

> Python SDK ignores manually set PCollection tags
> 
>
> Key: BEAM-9322
> URL: https://issues.apache.org/jira/browse/BEAM-9322
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: P1
>  Labels: stale-assigned
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> The Python SDK currently ignores any tags set manually on PCollections when 
> applying PTransforms and adding the PCollection to the PTransform 
> [outputs|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595]].
>  In the 
> [add_output|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]]
>  method, the tag is set to None for all PValues, meaning the output tags are 
> set to an enumeration index over the PCollection outputs. The tags are not 
> propagated correctly, which can be a problem when relying on the output 
> PCollection tags to match the user-set values.
> The fix is to correct BEAM-1833, and always pass in the tags. However, that 
> doesn't fix the problem for nested PCollections. If you have a dict of lists 
> of PCollections, what should their tags be set to? In order to fix 
> this, first propagate the correct tag, then talk with the community about the 
> best auto-generated tags.
> Some users may rely on the old implementation, so a flag will be created, 
> "force_generated_pcollection_output_ids", set to False by default. If 
> True, this will go to the old implementation and generate tags for 
> PCollections.
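
The behavior described above can be modeled in a few lines of plain Python. This is only a sketch of the idea, not the SDK's `pipeline.py` code: the `Output` class and the `output_ids` function are hypothetical, the flag name is taken from the issue, and falling back to the index for untagged outputs is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Output:
    """Stand-in for a PCollection carrying an optional user-set tag."""
    name: str
    tag: Optional[str] = None


def output_ids(outputs, force_generated_pcollection_output_ids=False):
    """Map each output to an id: the user tag, or its enumeration index."""
    ids = {}
    for index, out in enumerate(outputs):
        if force_generated_pcollection_output_ids or out.tag is None:
            key = str(index)  # old behavior: enumeration index
        else:
            key = out.tag  # proposed behavior: keep the user-set tag
        ids[key] = out.name
    return ids


outs = [Output("pc_a", tag="main"), Output("pc_b")]
print(output_ids(outs))
print(output_ids(outs, force_generated_pcollection_output_ids=True))
```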





[jira] [Updated] (BEAM-9640) Track PCollection watermark across bundle executions

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9640:
--
Labels: stale-assigned  (was: )

> Track PCollection watermark across bundle executions
> 
>
> Key: BEAM-9640
> URL: https://issues.apache.org/jira/browse/BEAM-9640
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This can be done without relying on the watermark manager for execution.





[jira] [Updated] (BEAM-9787) Send clear error to users trying to use BigQuerySource on FnApi pipelines on Python SDK

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9787:
--
Labels: stale-assigned  (was: )

> Send clear error to users trying to use BigQuerySource on FnApi pipelines on 
> Python SDK
> ---
>
> Key: BEAM-9787
> URL: https://issues.apache.org/jira/browse/BEAM-9787
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-9850) Key should be available in @OnTimer methods (Spark Runner)

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121539#comment-17121539
 ] 

Kenneth Knowles commented on BEAM-9850:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

>  Key should be available in @OnTimer methods (Spark Runner)
> ---
>
> Key: BEAM-9850
> URL: https://issues.apache.org/jira/browse/BEAM-9850
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-java-core
>Reporter: Rehman Murad Ali
>Assignee: Rehman Murad Ali
>Priority: P2
>  Labels: stale-assigned
>
> Every timer firing has an associated key. This key should be available when 
> the timer is delivered to a user's {{DoFn}}, so they don't have to store it 
> in the state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121689#comment-17121689
 ] 

Kenneth Knowles commented on BEAM-9286:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based on 
> MonitoringInfo). 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9626) pymongo should be an optional requirement

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121605#comment-17121605
 ] 

Kenneth Knowles commented on BEAM-9626:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> pymongo should be an optional requirement
> -
>
> Key: BEAM-9626
> URL: https://issues.apache.org/jira/browse/BEAM-9626
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The pymongo driver is installed by default, but as the number of IO 
> connectors in the Python SDK grows, I don't think this is the precedent we 
> want to set.  We already have "extra" packages for gcp, aws, and interactive; 
> we should also add one for mongo. 
>  
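
A minimal sketch of what an optional pymongo dependency could look like, assuming a hypothetical extra named "mongo" (neither the extra name nor the helper function comes from the issue). The driver is imported lazily, and a clear error is raised only when MongoDB support is actually used:

```python
# Hypothetical sketch: the extra name "mongo" and the helper below are
# assumptions for illustration, not the actual Beam setup.py contents.
EXTRAS_REQUIRE = {
    "mongo": ["pymongo"],
}

try:
    import pymongo  # present only when the optional extra is installed
except ImportError:
    pymongo = None


def require_pymongo():
    """Return the pymongo module, or raise a clear error if it is missing."""
    if pymongo is None:
        raise ImportError(
            "pymongo is not installed; install the optional 'mongo' extra "
            "to use the MongoDB connector")
    return pymongo
```

With this layout, importing the SDK never fails because pymongo is absent; only code paths that call require_pymongo() do.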



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9682) Windowing | Go SDK Code Katas

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9682:
--
Labels: stale-assigned  (was: )

> Windowing | Go SDK Code Katas
> -
>
> Key: BEAM-9682
> URL: https://issues.apache.org/jira/browse/BEAM-9682
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Labels: stale-assigned
>
> A kata devoted to windowing patterned after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Windowing].
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9770) Add BigQuery DeadLetter pattern to Patterns Page

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9770:
--
Labels: pipeline-patterns stale-assigned  (was: pipeline-patterns)

> Add BigQuery DeadLetter pattern to Patterns Page
> 
>
> Key: BEAM-9770
> URL: https://issues.apache.org/jira/browse/BEAM-9770
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Reza ardeshir rokni
>Assignee: Reza ardeshir rokni
>Priority: P4
>  Labels: pipeline-patterns, stale-assigned
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9598) _CustomBigQuerySource checks valueprovider when it's not needed

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121615#comment-17121615
 ] 

Kenneth Knowles commented on BEAM-9598:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> _CustomBigQuerySource checks valueprovider when it's not needed
> ---
>
> Key: BEAM-9598
> URL: https://issues.apache.org/jira/browse/BEAM-9598
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9763) Make _ReadFromBigQuery public

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121556#comment-17121556
 ] 

Kenneth Knowles commented on BEAM-9763:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Make _ReadFromBigQuery public
> -
>
> Key: BEAM-9763
> URL: https://issues.apache.org/jira/browse/BEAM-9763
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This means removing the underscore from it, but keeping it tagged as 
> experimental.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9781) Cross language test: -v: unary operator expected

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121550#comment-17121550
 ] 

Kenneth Knowles commented on BEAM-9781:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Cross language test:  -v: unary operator expected
> -
>
> Key: BEAM-9781
> URL: https://issues.apache.org/jira/browse/BEAM-9781
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P3
>  Labels: stale-assigned
>
> /Users/kcweaver/go/src/github.com/apache/beam/sdks/python/scripts/run_job_server.sh:
>  line 71: [: -v: unary operator expected
> This happens on my Mac:
> GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin19)
> Copyright (C) 2007 Free Software Foundation, Inc.
> But does not happen on my Linux desktop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9777) Support UNNEST(ARRAY[STRUCT()])

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9777:
--
Labels: stale-assigned  (was: )

> Support UNNEST(ARRAY[STRUCT()])
> ---
>
> Key: BEAM-9777
> URL: https://issues.apache.org/jira/browse/BEAM-9777
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Kenneth Knowles
>Priority: P2
>  Labels: stale-assigned
>
> It would be nice to be able to UNNEST an array of structs. For example:
> {code:sql}
> WITH pcol AS
>   (SELECT 1 AS key,
>           ARRAY[STRUCT("abc" AS name, "rst" AS slot),
>                 STRUCT("def" AS name, "uvw" AS slot)] AS promo
>    UNION ALL
>    SELECT 2 AS key, ARRAY[STRUCT("ghi" AS name, "xyz" AS slot)] AS promo)
> SELECT * FROM pcol, UNNEST(pcol.promo);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9680) Common Transforms | Go SDK Code Katas

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121585#comment-17121585
 ] 

Kenneth Knowles commented on BEAM-9680:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Common Transforms | Go SDK Code Katas
> -
>
> Key: BEAM-9680
> URL: https://issues.apache.org/jira/browse/BEAM-9680
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Labels: stale-assigned
>
> A kata devoted to common Apache Beam transforms, patterned after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Common%20Transforms].
>   The takeaway for an individual is to master the following using the Golang 
> SDK.
>  * Aggregation
>  * Filter
>  * Key/Value (i.e. 
> [https://beam.apache.org/releases/javadoc/2.19.0/org/apache/beam/sdk/values/KV.html])



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9687) Names of temporary files created by interactive runner include characters invalid on some platforms.

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9687:
--
Labels: stale-assigned  (was: )

> Names of temporary files created by interactive runner include characters 
> invalid on some platforms.
> 
>
> Key: BEAM-9687
> URL: https://issues.apache.org/jira/browse/BEAM-9687
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Sam Rohde
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Temporary files created by the interactive runner in streaming scenarios include 
> pipe '|' characters, which are not allowed in filenames on Windows. 
> This causes test failures on Windows:
> 
> python setup.py nosetests --tests apache_beam/runners/interactive/pipeline_instrument_test.py:PipelineInstrumentTest.test_instrument_example_unbounded_pipeline_to_multiple_read_cache
> ==
> ERROR: Tests that the instrumenter works for multiple unbounded sources.
> --
> Traceback (most recent call last):
>   File "C:\projects\apache_beam\runners\interactive\pipeline_instrument_test.py", line 698, in test_instrument_example_unbounded_pipeline_to_multiple_read_cache
>     self._mock_write_cache([b''], cache_key)
>   File "C:\projects\apache_beam\runners\interactive\pipeline_instrument_test.py", line 227, in _mock_write_cache
>     ie.current_env().cache_manager().write(values, *labels)
>   File "C:\projects\apache_beam\runners\interactive\caching\streaming_cache.py", line 323, in write
>     with open(filepath, 'ab') as f:
> IOError: [Errno 22] invalid mode ('ab') or filename: 'c:\\users\\deft-t~1\\appdata\\local\\temp\\2\\interactive-temp-xwg5qi\\full\\pcoll_1|149781752|149781920|149231600'
> 
> [1] 
> https://github.com/apache/beam/blob/e6b37c44d542969b6104fc97ee6f25b6f7d2ddba/sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py#L323
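
One possible way to avoid the failure, sketched under assumptions: replace characters that Windows rejects in file names before building the cache path. The helper name `safe_filename` and the choice of '-' as the replacement are hypothetical and not taken from the Beam code base; the sample label is shaped like the one in the traceback above.

```python
import re

# Characters Windows does not allow in file names (< > : " / \ | ? *),
# plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(label, replacement="-"):
    """Return label with every Windows-invalid character replaced."""
    return _INVALID_CHARS.sub(replacement, label)


label = "pcoll_1|149781752|149781920|149231600"
print(safe_filename(label))  # pcoll_1-149781752-149781920-149231600
```

If distinct labels must map to distinct file names, the replacement character should be one that cannot already occur in a label; that is a design choice this sketch does not settle.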



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9356) Flink python test logs are too noisy

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121674#comment-17121674
 ] 

Kenneth Knowles commented on BEAM-9356:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Flink python test logs are too noisy
> 
>
> Key: BEAM-9356
> URL: https://issues.apache.org/jira/browse/BEAM-9356
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P3
>  Labels: portability-flink, stale-assigned, testing
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When running Python tests on the Flink runner, all the info logs from the 
> Flink local cluster are printed to the test log, which creates a lot of 
> noise. This is especially severe for the Flink Python PVR tests, which have 
> 30+ MB log files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9700) Support ValueProvider for HCatalogIO

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121575#comment-17121575
 ] 

Kenneth Knowles commented on BEAM-9700:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Support ValueProvider for HCatalogIO
> 
>
> Key: BEAM-9700
> URL: https://issues.apache.org/jira/browse/BEAM-9700
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hcatalog
>Reporter: chie hayashida
>Assignee: chie hayashida
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I'd like to integrate modules that use Hive as input into 
> [DataflowTemplates|https://github.com/GoogleCloudPlatform/DataflowTemplates],
> but the current HCatalogIO.java doesn't support ValueProvider.
> I'd like to add ValueProvider support to HCatalogIO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9785) Add PostCommit suite for Python 3.8

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9785:
--
Labels: stale-assigned  (was: )

> Add PostCommit suite for Python 3.8
> ---
>
> Key: BEAM-9785
> URL: https://issues.apache.org/jira/browse/BEAM-9785
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: yoshiki obata
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Add PostCommit suites for Python 3.8.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9869) adding self-contained Kafka service jar for testing

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9869:
--
Labels: stale-assigned  (was: )

> adding self-contained Kafka service jar for testing
> ---
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9709) timezone off by 8 hours

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9709:
--
Labels: stale-assigned zetasql-compliance  (was: zetasql-compliance)

> timezone off by 8 hours
> ---
>
> Key: BEAM-9709
> URL: https://issues.apache.org/jira/browse/BEAM-9709
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Robin Qiu
>Priority: P4
>  Labels: stale-assigned, zetasql-compliance
>
> two failures in shard 13, one failure in shard 19
> {code}
> Expected: ARRAY>[{2014-01-31 00:00:00+00}]
>   Actual: ARRAY>[{2014-01-31 08:00:00+00}], 
> {code}
> {code}
> select timestamp(date '2014-01-31')
> {code}
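
The issue does not state the cause. An 8-hour shift is what one would see if the date were interpreted as midnight in a UTC-8 default time zone rather than in UTC. The sketch below only illustrates that arithmetic; the function name and the fixed UTC-8 offset are assumptions, not the ZetaSQL or Beam implementation.

```python
from datetime import date, datetime, time, timedelta, timezone


def timestamp_from_date(d, tz):
    """Return midnight of date d in time zone tz, expressed in UTC."""
    local_midnight = datetime.combine(d, time.min, tzinfo=tz)
    return local_midnight.astimezone(timezone.utc)


d = date(2014, 1, 31)

# Interpreting the date in UTC gives the expected value from the issue.
utc_result = timestamp_from_date(d, timezone.utc)

# A fixed UTC-8 offset stands in for a Pacific-time default (assumption).
shifted_result = timestamp_from_date(d, timezone(timedelta(hours=-8)))

print(utc_result.isoformat())      # 2014-01-31T00:00:00+00:00
print(shifted_result.isoformat())  # 2014-01-31T08:00:00+00:00
```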



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9729) Cleanup bundle registration now that SDKs can pull.

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9729:
--
Labels: stale-assigned  (was: )

> Cleanup bundle registration now that SDKs can pull.
> ---
>
> Key: BEAM-9729
> URL: https://issues.apache.org/jira/browse/BEAM-9729
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Once all runners (in particular dataflow) support pull descriptors, we can 
> clean things up by removing the push registration code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9649) beam_python_mongoio_load_test started failing due to mismatched results

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9649:
--
Labels: stale-assigned  (was: )

> beam_python_mongoio_load_test started failing due to mismatched results
> ---
>
> Key: BEAM-9649
> URL: https://issues.apache.org/jira/browse/BEAM-9649
> Project: Beam
>  Issue Type: Task
>  Components: io-py-mongodb
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: P1
>  Labels: stale-assigned
> Attachments: j5vwSDNmTBK.png, mHP2wb3rdTG.png
>
>
> The load tests sometimes fail with a mismatched sum result, for example:
> [https://builds.apache.org/job/beam_python_mongoio_load_test/438/console]
> It seems the Read operation is sometimes not able to fetch all the data.
> !j5vwSDNmTBK.png|width=1005,height=752!
> !mHP2wb3rdTG.png|width=994,height=780!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9612) Implement job metrics in Spark uber jar job server

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121610#comment-17121610
 ] 

Kenneth Knowles commented on BEAM-9612:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Implement job metrics in Spark uber jar job server
> --
>
> Key: BEAM-9612
> URL: https://issues.apache.org/jira/browse/BEAM-9612
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: portability-spark, stale-assigned
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9850) Key should be available in @OnTimer methods (Spark Runner)

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9850:
--
Labels: stale-assigned  (was: )

>  Key should be available in @OnTimer methods (Spark Runner)
> ---
>
> Key: BEAM-9850
> URL: https://issues.apache.org/jira/browse/BEAM-9850
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-java-core
>Reporter: Rehman Murad Ali
>Assignee: Rehman Murad Ali
>Priority: P2
>  Labels: stale-assigned
>
> Every timer firing has an associated key. This key should be available when 
> the timer is delivered to a user's {{DoFn}}, so they don't have to store it 
> in the state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9683) Triggers | Go SDK Code Katas

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9683:
--
Labels: stale-assigned  (was: )

> Triggers | Go SDK Code Katas
> 
>
> Key: BEAM-9683
> URL: https://issues.apache.org/jira/browse/BEAM-9683
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Labels: stale-assigned
>
> A kata devoted to triggers in Apache Beam patterned after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Triggers].
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9712) setting default timezone doesn't work

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121570#comment-17121570
 ] 

Kenneth Knowles commented on BEAM-9712:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> setting default timezone doesn't work
> -
>
> Key: BEAM-9712
> URL: https://issues.apache.org/jira/browse/BEAM-9712
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Robin Qiu
>Priority: P4
>  Labels: stale-assigned, zetasql-compliance
>
> several failures in shard 14
> (note: fixing the internal tests requires plumbing through the timezone 
> config.)
> {code}
> [name=timestamp_to_string_1]
> select [cast(timestamp "2015-01-28" as string),
> cast(timestamp "2015-01-28 00:00:00" as string),
> cast(timestamp "2015-01-28 00:00:00.0" as string),
> cast(timestamp "2015-01-28 00:00:00.00" as string),
> cast(timestamp "2015-01-28 00:00:00.000" as string),
> cast(timestamp "2015-01-28 00:00:00." as string),
> cast(timestamp "2015-01-28 00:00:00.0" as string),
> cast(timestamp "2015-01-28 00:00:00.00" as string)]
> --
> ARRAY>>[
>   {ARRAY[
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45"
>]}
> ]
> {code}
> {code}
> [default_time_zone=Pacific/Chatham]
> [name=timestamp_to_string_1]
> select [cast(timestamp "2015-01-28" as string),
> cast(timestamp "2015-01-28 00:00:00" as string),
> cast(timestamp "2015-01-28 00:00:00.0" as string),
> cast(timestamp "2015-01-28 00:00:00.00" as string),
> cast(timestamp "2015-01-28 00:00:00.000" as string),
> cast(timestamp "2015-01-28 00:00:00." as string),
> cast(timestamp "2015-01-28 00:00:00.0" as string),
> cast(timestamp "2015-01-28 00:00:00.00" as string)]
> --
> ARRAY>>[
>   {ARRAY[
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45",
>  "2015-01-28 00:00:00+13:45"
>]}
> ]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9690) Go build failing: undefined: primitives.Reshuffle(KV)

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121577#comment-17121577
 ] 

Kenneth Knowles commented on BEAM-9690:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Go build failing: undefined: primitives.Reshuffle(KV)
> -
>
> Key: BEAM-9690
> URL: https://issues.apache.org/jira/browse/BEAM-9690
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Kyle Weaver
>Assignee: Robert Burke
>Priority: P2
>  Labels: stale-assigned
>
> Go SDK build is failing on head (1d3e3ef9ffb4aaa913dc223d92626ca9f0f43207). I 
> tried ./gradlew sdks:go:clean but it didn't seem to make a difference.
> Logs:
> ./gradlew :sdks:go:container:docker
> Resolving dependencies...
> # github.com/apache/beam/sdks/go/test/integration
> .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/driver.go:67:27:
>  undefined: primitives.Reshuffle
> .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/driver.go:68:29:
>  undefined: primitives.ReshuffleKV
> > Task :sdks:go:buildDarwinAmd64 FAILED
> FAILURE: Build failed with an exception.
> * What went wrong:
> Execution failed for task ':sdks:go:buildDarwinAmd64'.
> > Build failed due to return code 2 of: 
>   Command:
>/Users/kcweaver/.gradle/go/binary/1.12/go/bin/go build -o 
> ./build/bin/integration github.com/apache/beam/sdks/go/test/integration
>   Env:
>GOEXE=
>
> GOPATH=/Users/kcweaver/go/src/github.com/apache/beam/sdks/go/.gogradle/project_gopath
>GOROOT=/Users/kcweaver/.gradle/go/binary/1.12/go
>GOOS=darwin
>GOARCH=amd64



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9500) Refactor load tests

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9500:
--
Labels: stale-assigned  (was: )

> Refactor load tests
> ---
>
> Key: BEAM-9500
> URL: https://issues.apache.org/jira/browse/BEAM-9500
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michał Walenia
>Assignee: Piotr Szuberski
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Make {{LoadTest}} parameterized instead of a base class.
> Each subclass of {{LoadTest}} is really just the main {{loadTest}} function, 
> and that function is about the same as writing a {{PTransform}}. If 
> you eliminate subclassing, {{LoadTest}} can own the pipeline setup, 
> so it will never be possible to forget or mess up 
> {{readSourceFromOptions}} and {{ParDo.of(runtimeMonitor)}}. There will also 
> be less repeated boilerplate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9137) beam_PostCommit_Py_ValCont should run with dataflow_worker_jar

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121720#comment-17121720
 ] 

Kenneth Knowles commented on BEAM-9137:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> beam_PostCommit_Py_ValCont should run with dataflow_worker_jar
> --
>
> Key: BEAM-9137
> URL: https://issues.apache.org/jira/browse/BEAM-9137
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Boyuan Zhang
>Assignee: Valentyn Tymofieiev
>Priority: P2
>  Labels: stale-assigned
>
> For the first failure, please refer to 
> https://builds.apache.org/job/beam_PostCommit_Py_ValCont/5172/#showFailuresLink



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9571) Add withMongoClientProvider to MongoDbIO

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9571:
--
Labels: stale-assigned  (was: )

> Add withMongoClientProvider to MongoDbIO
> 
>
> Key: BEAM-9571
> URL: https://issues.apache.org/jira/browse/BEAM-9571
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-mongodb
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: P3
>  Labels: stale-assigned
> Fix For: 2.23.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Users of MongoDB may want to provide more details or configure advanced 
> parameters like SSL validations explicitly. We can offer a method 
> `.withMongoClientProvider` that provides clients as needed, as we do in other 
> IOs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9646) [Java] PTransform that integrates Cloud Vision functionality

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121598#comment-17121598
 ] 

Kenneth Knowles commented on BEAM-9646:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> [Java] PTransform that integrates Cloud Vision functionality
> 
>
> Key: BEAM-9646
> URL: https://issues.apache.org/jira/browse/BEAM-9646
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9527) apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerSplitTest.test_split_crazy_sdf is flaky

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121630#comment-17121630
 ] 

Kenneth Knowles commented on BEAM-9527:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerSplitTest.test_split_crazy_sdf
>  is flaky
> -
>
> Key: BEAM-9527
> URL: https://issues.apache.org/jira/browse/BEAM-9527
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Boyuan Zhang
>Priority: P2
>  Labels: stale-assigned
>
> {noformat}
> self =  0x7fe494edb450>
> split_manager = 
> inputs = {'ref_PCollection_PCollection_3_split/Read': 
> ['\x7f\xdf;dZ\x1c\xac\t\x00\x00\x00\x01\x0f\x08V\xff\x80\x02capache_beam\nOffsetRange\nq\x01)\x81q\x02}q\x03(U\x04stopq\x04K\x05U\x05startq\x05K\x00ub.\x01\x00@\x14\x00\x00\x00\x00\x00\x00']}
> process_bundle_id = 'bundle_2575'
>     def _generate_splits_for_testing(self,
>                                      split_manager,
>                                      inputs,  # type: Mapping[str, PartitionableBuffer]
>                                      process_bundle_id):
>       # type: (...) -> List[beam_fn_api_pb2.ProcessBundleSplitResponse]
>       split_results = []  # type: List[beam_fn_api_pb2.ProcessBundleSplitResponse]
>       read_transform_id, buffer_data = only_element(inputs.items())
>       byte_stream = b''.join(buffer_data)
>       num_elements = len(
>           list(
>               self._get_input_coder_impl(read_transform_id).decode_all(
>                   byte_stream)))
> 
>       # Start the split manager in case it wants to set any breakpoints.
>       split_manager_generator = split_manager(num_elements)
>       try:
>         split_fraction = next(split_manager_generator)
>         done = False
>       except StopIteration:
>         done = True
> 
>       # Send all the data.
>       self._send_input_to_worker(
>           process_bundle_id, read_transform_id, [byte_stream])
> 
>       assert self._worker_handler is not None
> 
>       # Execute the requested splits.
>       while not done:
>         if split_fraction is None:
>           split_result = None
>         else:
>           split_request = beam_fn_api_pb2.InstructionRequest(
>               process_bundle_split=beam_fn_api_pb2.ProcessBundleSplitRequest(
>                   instruction_id=process_bundle_id,
>                   desired_splits={
>                       read_transform_id: beam_fn_api_pb2.
>                       ProcessBundleSplitRequest.DesiredSplit(
>                           fraction_of_remainder=split_fraction,
>                           estimated_input_elements=num_elements)
>                   }))
>           split_response = self._worker_handler.control_conn.push(
>               split_request).get()  # type: beam_fn_api_pb2.InstructionResponse
>           for t in (0.05, 0.1, 0.2):
>             waiting = ('Instruction not running', 'not yet scheduled')
>             if any(msg in split_response.error for msg in waiting):
>               time.sleep(t)
>               split_response = self._worker_handler.control_conn.push(
>                   split_request).get()
>           if 'Unknown process bundle' in split_response.error:
>             # It may have finished too fast.
>             split_result = None
>           elif split_response.error:
> >           raise RuntimeError(split_response.error)
> E   RuntimeError: Traceback (most recent call last):
> E File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 190, in _execute
> E   response = task()
> E File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 229, in <lambda>
> E   lambda: self.create_worker().do_instruction(request), request)
> E File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 416, in do_instruction
> E   getattr(request, request_type), request.instruction_id)
> E File 
> "/home/jenkin

[jira] [Updated] (BEAM-9830) Flink and Samza pipeline options are incompatible

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9830:
--
Labels: stale-assigned  (was: )

> Flink and Samza pipeline options are incompatible
> -
>
> Key: BEAM-9830
> URL: https://issues.apache.org/jira/browse/BEAM-9830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-samza
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Labels: stale-assigned
>
> When both the Flink and Samza runners are included in the same application, 
> an exception occurs because they define getMaxBundleSize with different types 
> (long and Long):
> Caused by: java.lang.IllegalArgumentException: methods with same signature 
> getMaxBundleSize() but incompatible return types: long and others
> Originally reported here: 
> https://stackoverflow.com/questions/61441333/conflict-with-runner-dependencies-in-beam





[jira] [Updated] (BEAM-9781) Cross language test: -v: unary operator expected

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9781:
--
Labels: stale-assigned  (was: )

> Cross language test:  -v: unary operator expected
> -
>
> Key: BEAM-9781
> URL: https://issues.apache.org/jira/browse/BEAM-9781
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P3
>  Labels: stale-assigned
>
> /Users/kcweaver/go/src/github.com/apache/beam/sdks/python/scripts/run_job_server.sh:
>  line 71: [: -v: unary operator expected
> This happens on my Mac:
> GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin19)
> Copyright (C) 2007 Free Software Foundation, Inc.
> But does not happen on my Linux desktop.
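
A likely cause, stated as an assumption because line 71 of run_job_server.sh is not quoted in this message: the `-v` test operator was added in bash 4.2, so bash 3.2 (the version reported above) does not recognize it and prints "unary operator expected". The sketch below shows a portable way to test whether a variable is set. EXAMPLE_VAR and is_example_var_set are hypothetical names chosen for illustration and are not taken from the script.

```shell
#!/bin/sh
# Portable "is this variable set?" check.
# `[ -v NAME ]` requires bash >= 4.2; older bash reports
# "[: -v: unary operator expected". The ${NAME+x} expansion is POSIX.

# EXAMPLE_VAR is a hypothetical variable name used only for this sketch.
is_example_var_set() {
  # ${EXAMPLE_VAR+x} expands to "x" when EXAMPLE_VAR is set (even if it is
  # the empty string) and to nothing when it is unset.
  [ -n "${EXAMPLE_VAR+x}" ]
}

unset EXAMPLE_VAR
if is_example_var_set; then echo "set"; else echo "unset"; fi

EXAMPLE_VAR=""
if is_example_var_set; then echo "set"; else echo "unset"; fi
```

The `${NAME+x}` form is also safe under `set -u`, and unlike `[ -n "$NAME" ]` it distinguishes an unset variable from one set to the empty string, which is the same distinction `-v` makes.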





[jira] [Commented] (BEAM-9641) Support ZetaSQL DATE functions in BeamSQL

2020-06-01 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121596#comment-17121596
 ] 

Kenneth Knowles commented on BEAM-9641:
---

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Support ZetaSQL DATE functions in BeamSQL
> -
>
> Key: BEAM-9641
> URL: https://issues.apache.org/jira/browse/BEAM-9641
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Robin Qiu
>Assignee: Robin Qiu
>Priority: P2
>  Labels: stale-assigned, zetasql-compliance
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>





