[jira] [Commented] (BEAM-9045) Implement an Ignite runner using Apache Ignite compute grid

2020-06-06 Thread Saikat Maitra (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127470#comment-17127470
 ] 

Saikat Maitra commented on BEAM-9045:
-

[~kenn]

 
I was hoping to connect and discuss whether the runner tasks can be divided into 
smaller Jira stories. I was thinking of implementing each feature listed in the 
capability matrix, such as ParDo and GroupByKey, in a separate PR and proceeding 
accordingly.
 
If you think the runner can be implemented in smaller tasks, and you have 
suggestions about which specific feature I should pick up first, please let me 
know.
 
Regards,
Saikat

> Implement an Ignite runner using Apache Ignite compute grid
> ---
>
> Key: BEAM-9045
> URL: https://issues.apache.org/jira/browse/BEAM-9045
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Saikat Maitra
>Assignee: Saikat Maitra
>Priority: P2
>  Labels: stale-assigned
>
> Implement an Ignite runner using Apache Ignite compute grid.
> Runner guide [https://beam.apache.org/contribute/runner-guide/]
> Capability Matrix 
> [https://beam.apache.org/documentation/runners/capability-matrix/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=442376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442376
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 07/Jun/20 00:35
Start Date: 07/Jun/20 00:35
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8457:
URL: https://github.com/apache/beam/pull/8457#issuecomment-640137292


   Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 442376)
Time Spent: 48h 20m  (was: 48h 10m)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 48h 20m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.





[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=442375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442375
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 07/Jun/20 00:27
Start Date: 07/Jun/20 00:27
Worklog Time Spent: 10m 
  Work Description: mf2199 commented on pull request #8457:
URL: https://github.com/apache/beam/pull/8457#issuecomment-640136676


   @chamikaramj The other PR uses a different branch. I'm gonna update it then.





Issue Time Tracking
---

Worklog Id: (was: 442375)
Time Spent: 48h 10m  (was: 48h)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 48h 10m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.





[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=442340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442340
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 07/Jun/20 00:25
Start Date: 07/Jun/20 00:25
Worklog Time Spent: 10m 
  Work Description: chamikaramj closed pull request #8457:
URL: https://github.com/apache/beam/pull/8457


   





Issue Time Tracking
---

Worklog Id: (was: 442340)
Time Spent: 47h 50m  (was: 47h 40m)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 47h 50m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.





[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=442343&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442343
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 07/Jun/20 00:25
Start Date: 07/Jun/20 00:25
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8457:
URL: https://github.com/apache/beam/pull/8457#issuecomment-640136370


   I reopened that PR and triggered tests. Please address any failures. Let's 
continue the review there.
   
   Closing this PR.





Issue Time Tracking
---

Worklog Id: (was: 442343)
Time Spent: 48h  (was: 47h 50m)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 48h
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.





[jira] [Work logged] (BEAM-10101) Add a HttpIO / HttpFileSystem

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10101?focusedWorklogId=442317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442317
 ]

ASF GitHub Bot logged work on BEAM-10101:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 22:04
Start Date: 06/Jun/20 22:04
Worklog Time Spent: 10m 
  Work Description: epicfaace commented on pull request #11824:
URL: https://github.com/apache/beam/pull/11824#issuecomment-640123988


   @pabloem I've addressed your changes and also made the implementation a bit 
cleaner; please take a look.





Issue Time Tracking
---

Worklog Id: (was: 442317)
Time Spent: 1h 50m  (was: 1h 40m)

> Add a HttpIO / HttpFileSystem
> -
>
> Key: BEAM-10101
> URL: https://issues.apache.org/jira/browse/BEAM-10101
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Reporter: Ashwin Ramaswami
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Add HttpIO (and a related HttpFileSystem and HttpsFileSystem), which can 
> download files from a particular http:// or https:// URL. HttpIO cannot 
> upload / write to files, though, because there's no standardized way to write 
> to files using HTTP.
> Sample usage:
>  
> {code:python}
> (
>     p
>     | ReadFromText("https://raw.githubusercontent.com/apache/beam/5ff5313f0913ec81d31ad306400ad30c0a928b34/NOTICE")
>     | WriteToText("output.txt", shard_name_template="", num_shards=0)
> )
> {code}
>  





[jira] [Assigned] (BEAM-9502) SchemaCoder assigns random UUID, causes Dataflow's compatibility check to fail

2020-06-06 Thread Cameron Morgan (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cameron Morgan reassigned BEAM-9502:


Assignee: (was: Cameron Morgan)

> SchemaCoder assigns random UUID, causes Dataflow's compatibility check to fail
> --
>
> Key: BEAM-9502
> URL: https://issues.apache.org/jira/browse/BEAM-9502
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Reporter: Yaron Neuman
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> After fe4b7794, _Schema.equals_ compares only the UUIDs, for faster 
> comparison.
> After 0b3b18c6, _SchemaCoder_ forces a random UUID when schema.uuid is null.
> Thus, when trying to update (--update) a Dataflow job with row schemas in 
> user code, the compatibility check fails because SchemaCoder produces 
> another random UUID.
>  
> The user can set the UUID after creating the Schema, but not with 
> Schema.Builder, and I'm afraid most users, who are not aware of the internal 
> implementation, won't do that.
>  
> In my branch, I added _.withUUID_ and _.withRandomUUID_ to _Schema.Builder_,
> but I think a better solution would be to calculate the UUID based on the 
> schema itself.
> Any thoughts?
> [~reuvenlax]
>  
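
The idea of calculating the UUID from the schema itself can be sketched as 
follows. This is a plain-Python illustration of the technique only, not Beam's 
Java API: the `schema_uuid` helper, the `SCHEMA_NAMESPACE` constant, and the 
(name, type, nullable) field tuples are all assumptions made for the example. 
It derives a name-based (version 5) UUID from a canonical serialization of the 
fields, so two equal schemas get the same UUID on every run.

```python
import json
import uuid

# Hypothetical fixed namespace for schema-derived UUIDs; any constant UUID works.
SCHEMA_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/schema-uuid")


def schema_uuid(fields):
    """Return a UUID that depends only on the ordered (name, type, nullable) fields."""
    # JSON gives an unambiguous canonical form, even if names contain separators.
    canonical = json.dumps([list(field) for field in fields], separators=(",", ":"))
    # uuid5 hashes the namespace and name, so the same input yields the same UUID.
    return uuid.uuid5(SCHEMA_NAMESPACE, canonical)


user_schema = [("id", "INT64", False), ("name", "STRING", True)]
same_schema = [("id", "INT64", False), ("name", "STRING", True)]
other_schema = [("id", "INT64", False), ("email", "STRING", True)]

print(schema_uuid(user_schema) == schema_uuid(same_schema))   # identical fields
print(schema_uuid(user_schema) == schema_uuid(other_schema))  # different fields
```

With a scheme like this, a job resubmitted with an unchanged schema would 
present the same UUID to a compatibility check, while any change to field 
names, types, order, or nullability would produce a different one.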





[jira] [Updated] (BEAM-10210) Blog Post for Kotlin-based Beam Katas

2020-06-06 Thread Rion Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rion Williams updated BEAM-10210:
-
   Component/s: website
Labels: blog kotlin  (was: )
Remaining Estimate: 2h  (was: 7h 50m)

> Blog Post for Kotlin-based Beam Katas
> -
>
> Key: BEAM-10210
> URL: https://issues.apache.org/jira/browse/BEAM-10210
> Project: Beam
>  Issue Type: Improvement
>  Components: katas, website
>Reporter: Rion Williams
>Assignee: Rion Williams
>Priority: P2
>  Labels: blog, kotlin
> Fix For: 2.23.0
>
>   Original Estimate: 8h
>  Time Spent: 10m
>  Remaining Estimate: 2h
>
> Recently BEAM-10027 introduced a Kotlin version of the existing Beam Katas 
> for Java, Python, and Go. It was recommended that a blog post be written for 
> the Apache Blog to notify those interested in Beam about the release.





[jira] [Work logged] (BEAM-10210) Blog Post for Kotlin-based Beam Katas

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10210?focusedWorklogId=442304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442304
 ]

ASF GitHub Bot logged work on BEAM-10210:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 19:40
Start Date: 06/Jun/20 19:40
Worklog Time Spent: 10m 
  Work Description: rionmonster opened a new pull request #11944:
URL: https://github.com/apache/beam/pull/11944


   A brief blog post announcing the recent addition of the Kotlin-based Katas 
to the existing Beam Katas family along with a short introduction to Kotlin 
itself.
   
   CC: @henryken / @pabloem 
   
   I don't know if either of you would like to review this, or if there are 
any additional areas related to blog posts that I'd need to update (this is the 
first I've written), such as the `authors.yaml` that I adjusted.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [X] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [X] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Updated] (BEAM-10210) Blog Post for Kotlin-based Beam Katas

2020-06-06 Thread Rion Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rion Williams updated BEAM-10210:
-
Description: Recently BEAM-10027 introduced a Kotlin version of the 
existing Beam Katas for Java, Python, and Go. It was recommended that a blog 
post be written for the Apache Blog to notify those interested in Beam about 
the release.  (was: Currently, there are a series of examples available 
demonstrating the use of Apache Beam with Kotlin. It would be nice for the same 
Beam Katas that exist for Python, Go, and Java to also be available in Kotlin. 

The port itself shouldn't be that involved since it can still target the JVM, 
so it would likely just require the inclusion of Kotlin dependencies and a 
conversion of all of the existing Java examples. )

> Blog Post for Kotlin-based Beam Katas
> -
>
> Key: BEAM-10210
> URL: https://issues.apache.org/jira/browse/BEAM-10210
> Project: Beam
>  Issue Type: Improvement
>  Components: katas
>Reporter: Rion Williams
>Assignee: Rion Williams
>Priority: P2
> Fix For: 2.23.0
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Recently BEAM-10027 introduced a Kotlin version of the existing Beam Katas 
> for Java, Python, and Go. It was recommended that a blog post be written for 
> the Apache Blog to notify those interested in Beam about the release.





[jira] [Created] (BEAM-10210) Blog Post for Kotlin-based Beam Katas

2020-06-06 Thread Rion Williams (Jira)
Rion Williams created BEAM-10210:


 Summary: Blog Post for Kotlin-based Beam Katas
 Key: BEAM-10210
 URL: https://issues.apache.org/jira/browse/BEAM-10210
 Project: Beam
  Issue Type: Improvement
  Components: katas
Reporter: Rion Williams
Assignee: Rion Williams
 Fix For: 2.23.0


Currently, there are a series of examples available demonstrating the use of 
Apache Beam with Kotlin. It would be nice for the same Beam Katas that exist 
for Python, Go, and Java to also be available in Kotlin. 

The port itself shouldn't be that involved since it can still target the JVM, 
so it would likely just require the inclusion of Kotlin dependencies and a 
conversion of all of the existing Java examples. 





[jira] [Updated] (BEAM-10210) Blog Post for Kotlin-based Beam Katas

2020-06-06 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10210:
---
Status: Open  (was: Triage Needed)

> Blog Post for Kotlin-based Beam Katas
> -
>
> Key: BEAM-10210
> URL: https://issues.apache.org/jira/browse/BEAM-10210
> Project: Beam
>  Issue Type: Improvement
>  Components: katas
>Reporter: Rion Williams
>Assignee: Rion Williams
>Priority: P2
> Fix For: 2.23.0
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Currently, there are a series of examples available demonstrating the use of 
> Apache Beam with Kotlin. It would be nice for the same Beam Katas that exist 
> for Python, Go, and Java to also be available in Kotlin. 
> The port itself shouldn't be that involved since it can still target the JVM, 
> so it would likely just require the inclusion of Kotlin dependencies and a 
> conversion of all of the existing Java examples. 





[jira] [Closed] (BEAM-10027) Support for Kotlin-based Beam Katas

2020-06-06 Thread Rion Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rion Williams closed BEAM-10027.

Resolution: Done

> Support for Kotlin-based Beam Katas
> ---
>
> Key: BEAM-10027
> URL: https://issues.apache.org/jira/browse/BEAM-10027
> Project: Beam
>  Issue Type: Improvement
>  Components: katas
>Reporter: Rion Williams
>Assignee: Rion Williams
>Priority: P2
> Fix For: 2.23.0
>
>   Original Estimate: 8h
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> Currently, there are a series of examples available demonstrating the use of 
> Apache Beam with Kotlin. It would be nice for the same Beam Katas that exist 
> for Python, Go, and Java to also be available in Kotlin. 
> The port itself shouldn't be that involved since it can still target the JVM, 
> so it would likely just require the inclusion of Kotlin dependencies and a 
> conversion of all of the existing Java examples. 





[jira] [Work logged] (BEAM-10209) Add without_defaults to Mean

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10209?focusedWorklogId=442289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442289
 ]

ASF GitHub Bot logged work on BEAM-10209:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 16:40
Start Date: 06/Jun/20 16:40
Worklog Time Spent: 10m 
  Work Description: InigoSJ commented on pull request #11943:
URL: https://github.com/apache/beam/pull/11943#issuecomment-640087218


   Modification: 
   
   Changed from GlobalWindows to FixedWindows, since GlobalWindows doesn't 
require .without_defaults()





Issue Time Tracking
---

Worklog Id: (was: 442289)
Time Spent: 0.5h  (was: 20m)

> Add without_defaults to Mean 
> -
>
> Key: BEAM-10209
> URL: https://issues.apache.org/jira/browse/BEAM-10209
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.21.0
>Reporter: Inigo San Jose Visiers
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When using windows with a global combiner, we need to use 
> `without_defaults()`. This is not possible with the built-in combiner 
> `Mean`, and the workaround is to do 
> `CombineGlobally(MeanCombineFn()).without_defaults()`. 
> Adding the option to use .without_defaults() directly would help both 
> code readability and ease of use.
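
Why windowed input needs the defaults switched off can be illustrated without 
Beam. The sketch below is a toy model in plain Python, not Beam code: 
`mean_per_window`, its `with_defaults` flag, and the window labels are 
hypothetical names chosen for the example. It assumes that a default output 
for empty input is only meaningful when there is a single global window; with 
several windows there is no obvious window to attach the default to, so empty 
windows simply produce no output.

```python
import math

GLOBAL = "global"  # hypothetical label standing in for the single global window


def mean_per_window(values_by_window, with_defaults=True):
    """Toy model of a global mean combine, keyed by window label."""
    if with_defaults and set(values_by_window) - {GLOBAL}:
        # Defaults are only modelled for the global window in this sketch.
        raise ValueError("defaults need the global window; pass with_defaults=False")
    result = {}
    for window, values in values_by_window.items():
        if values:  # empty windows produce no output
            result[window] = sum(values) / len(values)
    if with_defaults and not result:
        # Empty global input still yields one default value (NaN in this model).
        result[GLOBAL] = math.nan
    return result


fixed_windows = {"[0, 60)": [1.0, 3.0], "[60, 120)": [10.0]}
print(mean_per_window(fixed_windows, with_defaults=False))
print(mean_per_window({GLOBAL: []}))
```

In this model the fixed-window call only works with `with_defaults=False`, 
which mirrors why the issue asks for `.without_defaults()` to be reachable 
from `Mean` directly.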





[jira] [Work logged] (BEAM-10209) Add without_defaults to Mean

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10209?focusedWorklogId=442290&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442290
 ]

ASF GitHub Bot logged work on BEAM-10209:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 16:41
Start Date: 06/Jun/20 16:41
Worklog Time Spent: 10m 
  Work Description: InigoSJ edited a comment on pull request #11943:
URL: https://github.com/apache/beam/pull/11943#issuecomment-640087218


   Modification: 
   
   Changed from GlobalWindows to FixedWindows in the combiners_test, since 
GlobalWindows doesn't require .without_defaults()





Issue Time Tracking
---

Worklog Id: (was: 442290)
Time Spent: 40m  (was: 0.5h)

> Add without_defaults to Mean 
> -
>
> Key: BEAM-10209
> URL: https://issues.apache.org/jira/browse/BEAM-10209
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.21.0
>Reporter: Inigo San Jose Visiers
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When using windows with a global combiner, we need to use 
> `without_defaults()`. This is not possible with the built-in combiner 
> `Mean`, and the workaround is to do 
> `CombineGlobally(MeanCombineFn()).without_defaults()`. 
> Adding the option to use .without_defaults() directly would help both 
> code readability and ease of use.





[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=442073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442073
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 05/Jun/20 23:39
Start Date: 05/Jun/20 23:39
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11941:
URL: https://github.com/apache/beam/pull/11941#issuecomment-639897770


   R: @boyuanzz @youngoli 





Issue Time Tracking
---

Worklog Id: (was: 442073)
Time Spent: 38h 20m  (was: 38h 10m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: P2
>  Labels: portability, stale-assigned
>  Time Spent: 38h 20m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.





[jira] [Commented] (BEAM-10169) ParDo* functions should declare the correct output N in their error message

2020-06-06 Thread Aaron Tillekeratne (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127136#comment-17127136
 ] 

Aaron Tillekeratne commented on BEAM-10169:
---

I like it; referencing the DoFn is a great idea.

I agree about the edge cases; they do stop the code from being an elegant string 
format. Let me conjure up some Go code and we can have a review.

Thanks

> ParDo* functions should declare the correct output N in their error message
> ---
>
> Key: BEAM-10169
> URL: https://issues.apache.org/jira/browse/BEAM-10169
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Aaron Tillekeratne
>Priority: P3
>  Labels: noob, starter
>
> User report noted the confusion in the error if you use a DoFn with 0 outputs 
> with beam.ParDo instead of beam.ParDo0. 
> In that case, a panic stack trace is followed by the cryptic: "expected 1 
> output. Found: []"
> We can do better.
> While we can't change the return signature dynamically (that's for ParDoN 
> only), we can instead clearly indicate: 
> *  the DoFn in question.
> * the number of outputs the DoFn has
> * and recommend using ParDo0, ParDo, ParDo2,...ParDo7,  or ParDoN, as 
> appropriate.
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/pardo.go#L361 
> would need to change as well as any of the specific cases that follow. 
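
The three pieces of information listed above could be combined roughly as 
follows. This is a language-neutral sketch written in Python, not the Go SDK 
code in pardo.go: the helper names `pardo_variant` and 
`output_mismatch_message`, the DoFn name `main.logFn`, and the exact wording 
of the message are all hypothetical.

```python
def pardo_variant(num_outputs):
    """Name the ParDo helper matching a DoFn's output count (0..7, else ParDoN)."""
    if num_outputs < 0:
        raise ValueError("output count cannot be negative")
    if num_outputs == 0:
        return "ParDo0"
    if num_outputs == 1:
        return "ParDo"
    if num_outputs <= 7:
        return f"ParDo{num_outputs}"
    return "ParDoN"


def output_mismatch_message(dofn_name, called_variant, num_outputs):
    """Build an error naming the DoFn, its output count, and the suggested helper."""
    suggestion = pardo_variant(num_outputs)
    return (
        f"{called_variant} called with DoFn {dofn_name}, which has "
        f"{num_outputs} output(s); use {suggestion} instead"
    )


# A DoFn with no outputs passed to ParDo, the case from the user report.
print(output_mismatch_message("main.logFn", "ParDo", 0))
```

Keeping the variant lookup separate from the formatting is one way to handle 
the edge cases (0, 1, and more than 7 outputs) without complicating the 
message template itself.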





[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=442063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442063
 ]

ASF GitHub Bot logged work on BEAM-10093:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 22:27
Start Date: 05/Jun/20 22:27
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #11820:
URL: https://github.com/apache/beam/pull/11820#discussion_r436191652



##
File path: 
sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/sql/SqlBoundedSideInputJoinTest.java
##
@@ -48,166 +47,182 @@
 import org.junit.Before;
 import org.junit.Rule;
 import org.junit.Test;
-import org.junit.experimental.categories.Category;
+import org.junit.experimental.runners.Enclosed;
 import org.junit.runner.RunWith;
 import org.junit.runners.JUnit4;
 
 /** Test the various NEXMark queries yield results coherent with their models. 
*/
-@RunWith(JUnit4.class)
+@RunWith(Enclosed.class)
 public class SqlBoundedSideInputJoinTest {
 
-  @Rule public TestPipeline p = TestPipeline.create();
+  private abstract static class SqlBoundedSideInputJoinTestCases {
 
-  @Before
-  public void setupPipeline() {
-NexmarkUtils.setupPipeline(NexmarkUtils.CoderStrategy.HAND, p);
-  }
+protected abstract SqlBoundedSideInputJoin getQuery(NexmarkConfiguration 
configuration);
+
+@Rule public TestPipeline p = TestPipeline.create();
+
+@Before
+public void setupPipeline() {
+  NexmarkUtils.setupPipeline(NexmarkUtils.CoderStrategy.HAND, p);
+}
 
-  /** Test {@code query} matches {@code model}. */
-  private  void queryMatchesModel(
-  String name,
-  NexmarkConfiguration config,
-  NexmarkQueryTransform query,
-  NexmarkQueryModel model,
-  boolean streamingMode)
-  throws Exception {
-
-ResourceId sideInputResourceId =
-FileSystems.matchNewResource(
-String.format(
-"%s/JoinToFiles-%s", p.getOptions().getTempLocation(), new 
Random().nextInt()),
-false);
-config.sideInputUrl = sideInputResourceId.toString();
-
-try {
+/** Test {@code query} matches {@code model}. */
+private  void queryMatchesModel(
+String name,
+NexmarkConfiguration config,
+NexmarkQueryTransform query,
+NexmarkQueryModel model,
+boolean streamingMode)
+throws Exception {
+
+  ResourceId sideInputResourceId =
+  FileSystems.matchNewResource(
+  String.format(
+  "%s/JoinToFiles-%s", p.getOptions().getTempLocation(), new 
Random().nextInt()),
+  false);
+  config.sideInputUrl = sideInputResourceId.toString();
+
+  try {
+PCollection> sideInput = 
NexmarkUtils.prepareSideInput(p, config);
+query.setSideInput(sideInput);
+
+PCollection events =
+p.apply(
+name + ".Read",
+streamingMode
+? NexmarkUtils.streamEventsSource(config)
+: NexmarkUtils.batchEventsSource(config));
+
+PCollection> results =
+(PCollection>) events.apply(new 
NexmarkQuery<>(config, query));
+PAssert.that(results).satisfies(model.assertionFor());
+PipelineResult result = p.run();
+result.waitUntilFinish();
+  } finally {
+NexmarkUtils.cleanUpSideInput(config);
+  }
+}
+
+/**
+ * A smoke test that the count of input bids and outputs are the same, to 
help diagnose
+ * flakiness in more complex tests.
+ */
+@Test
+public void inputOutputSameEvents() throws Exception {
+  NexmarkConfiguration config = NexmarkConfiguration.DEFAULT.copy();
+  config.sideInputType = NexmarkUtils.SideInputType.DIRECT;
+  config.numEventGenerators = 1;
+  config.numEvents = 5000;
+  config.sideInputRowCount = 10;
+  config.sideInputNumShards = 3;
   PCollection> sideInput = 
NexmarkUtils.prepareSideInput(p, config);
-  query.setSideInput(sideInput);
-
-  PCollection events =
-  p.apply(
-  name + ".Read",
-  streamingMode
-  ? NexmarkUtils.streamEventsSource(config)
-  : NexmarkUtils.batchEventsSource(config));
-
-  PCollection> results =
-  (PCollection>) events.apply(new 
NexmarkQuery<>(config, query));
-  PAssert.that(results).satisfies(model.assertionFor());
-  PipelineResult result = p.run();
-  result.waitUntilFinish();
-} finally {
-  NexmarkUtils.cleanUpSideInput(config);
+
+  try {
+PCollection input = 
p.apply(NexmarkUtils.batchEventsSource(config));
+PCollection justBids = input.apply(NexmarkQueryUtil.JUST_BIDS);
+PCollection bidCount = justBids.apply("Count Bids", 
Count.globally());
+
+NexmarkQueryTransform query = getQuery

[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=442072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442072
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 05/Jun/20 23:39
Start Date: 05/Jun/20 23:39
Worklog Time Spent: 10m 
  Work Description: lukecwik opened a new pull request #11941:
URL: https://github.com/apache/beam/pull/11941


   getInitialRestriction/splitAndSize should not be wrapped with 
startBundle/FinishBundle invocations.
   Instead of copying the stateAccessor initialization (used for side inputs), I 
made it so that it is initialized only once, and I clean up the 
caches/references in the finalizeState call.
   
   Tested within Google using runner_v2.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_Val

[jira] [Work logged] (BEAM-10197) Support type hints for frozenset

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10197?focusedWorklogId=442061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442061
 ]

ASF GitHub Bot logged work on BEAM-10197:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 22:22
Start Date: 05/Jun/20 22:22
Worklog Time Spent: 10m 
  Work Description: udim edited a comment on pull request #11939:
URL: https://github.com/apache/beam/pull/11939#issuecomment-639864241


   arbitrary comment to trigger tests



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 442061)
Time Spent: 40m  (was: 0.5h)

> Support type hints for frozenset 
> -
>
> Key: BEAM-10197
> URL: https://issues.apache.org/jira/browse/BEAM-10197
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Saavan Nanavati
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Beam's internal typing system currently supports type hints for set but not 
> frozenset.
>  
> This Jira ticket will add type annotation support for both frozenset and 
> typing.FrozenSet.
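
For illustration only, the two annotation spellings named in the ticket look like this in plain Python. This is a minimal sketch that uses only the standard `typing` module; the function names `distinct_words` and `vowels` are hypothetical, and nothing here shows how Beam's internal typing system handles these hints.

```python
from typing import FrozenSet, Iterable, get_type_hints


def distinct_words(lines: Iterable[str]) -> FrozenSet[str]:
    """Return the immutable set of whitespace-separated words in `lines`."""
    return frozenset(word for line in lines for word in line.split())


def vowels() -> frozenset:
    """Same idea, annotated with the builtin `frozenset` instead of typing.FrozenSet."""
    return frozenset("aeiou")


# Both annotation styles are visible through standard introspection.
print(get_type_hints(distinct_words)["return"])
print(get_type_hints(vowels)["return"])
```

A type-hint-aware framework would need to recognize both the `typing.FrozenSet[...]` form and the bare builtin `frozenset`, which is why the ticket lists them separately.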



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10208) add cross-language KafkaIO integration test

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10208?focusedWorklogId=442183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442183
 ]

ASF GitHub Bot logged work on BEAM-10208:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 03:01
Start Date: 06/Jun/20 03:01
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11942:
URL: https://github.com/apache/beam/pull/11942#issuecomment-639965704


   R: @chamikaramj 





Issue Time Tracking
---

Worklog Id: (was: 442183)
Time Spent: 20m  (was: 10m)

> add cross-language KafkaIO integration test
> ---
>
> Key: BEAM-10208
> URL: https://issues.apache.org/jira/browse/BEAM-10208
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> add cross-language KafkaIO integration test





[jira] [Work logged] (BEAM-10197) Support type hints for frozenset

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10197?focusedWorklogId=442060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442060
 ]

ASF GitHub Bot logged work on BEAM-10197:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 22:21
Start Date: 05/Jun/20 22:21
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11939:
URL: https://github.com/apache/beam/pull/11939#issuecomment-639864241


   random comment to trigger tests





Issue Time Tracking
---

Worklog Id: (was: 442060)
Time Spent: 0.5h  (was: 20m)

> Support type hints for frozenset 
> -
>
> Key: BEAM-10197
> URL: https://issues.apache.org/jira/browse/BEAM-10197
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Saavan Nanavati
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Beam's internal typing system currently supports type hints for set but not 
> frozenset.
>  
> This Jira ticket will add type annotation support for both frozenset and 
> typing.FrozenSet.





[jira] [Work logged] (BEAM-10208) add cross-language KafkaIO integration test

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10208?focusedWorklogId=442184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442184
 ]

ASF GitHub Bot logged work on BEAM-10208:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 03:04
Start Date: 06/Jun/20 03:04
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11942:
URL: https://github.com/apache/beam/pull/11942#issuecomment-639966084


   Please also run "Run Python 2 PostCommit" and "Run Python 3.7 PostCommit"





Issue Time Tracking
---

Worklog Id: (was: 442184)
Time Spent: 0.5h  (was: 20m)

> add cross-language KafkaIO integration test
> ---
>
> Key: BEAM-10208
> URL: https://issues.apache.org/jira/browse/BEAM-10208
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> add cross-language KafkaIO integration test





[jira] [Resolved] (BEAM-10145) Kafka IO performance tests leaving behind unused disks on apache-beam-testing

2020-06-06 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski resolved BEAM-10145.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Kafka IO performance tests leaving behind unused disks on apache-beam-testing
> -
>
> Key: BEAM-10145
> URL: https://issues.apache.org/jira/browse/BEAM-10145
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Udi Meiri
>Assignee: Kamil Wasilewski
>Priority: P2
> Fix For: Not applicable
>
> Attachments: VfRJduZigE1.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Sample disk description:
> {"kubernetes.io/created-for/pv/name":"pvc-97dd8abb-a0ac-11ea-aa65-42010a80013b","kubernetes.io/created-for/pvc/name":"data-pzoo-0","kubernetes.io/created-for/pvc/namespace":"beam-performancetests-kafka-io-826"}





[jira] [Work logged] (BEAM-10208) add cross-language KafkaIO integration test

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10208?focusedWorklogId=442182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442182
 ]

ASF GitHub Bot logged work on BEAM-10208:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 03:00
Start Date: 06/Jun/20 03:00
Worklog Time Spent: 10m 
  Work Description: ihji opened a new pull request #11942:
URL: https://github.com/apache/beam/pull/11942


   
   
   

[jira] [Work logged] (BEAM-10209) Add without_defaults to Mean

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10209?focusedWorklogId=442249&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442249
 ]

ASF GitHub Bot logged work on BEAM-10209:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 13:22
Start Date: 06/Jun/20 13:22
Worklog Time Spent: 10m 
  Work Description: InigoSJ commented on pull request #11943:
URL: https://github.com/apache/beam/pull/11943#issuecomment-640060464


   R: @robertwb 





Issue Time Tracking
---

Worklog Id: (was: 442249)
Time Spent: 20m  (was: 10m)

> Add without_defaults to Mean 
> -
>
> Key: BEAM-10209
> URL: https://issues.apache.org/jira/browse/BEAM-10209
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.21.0
>Reporter: Inigo San Jose Visiers
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using windows with a global combiner, we need to use 
> `without_defaults()`. This is not possible with the built-in combiner 
> `Mean`, and the workaround is to use 
> `CombineGlobally(MeanCombineFn()).without_defaults()`. 
> Adding the option to use `.without_defaults()` directly would improve both 
> code readability and ease of use.
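
To make the windowed case concrete, here is a plain-Python model of a per-window mean. It is only a sketch under stated assumptions: `mean_per_window` is a hypothetical helper, timestamps are integers, windows are fixed-size, and no Beam API is used. The point it illustrates is that a window with no data produces no output at all, rather than a default value.

```python
from collections import defaultdict


def mean_per_window(timestamped_values, window_size):
    """Group (timestamp, value) pairs into fixed windows and average each window.

    Only windows that contain data appear in the result; empty windows are
    simply absent instead of being filled with a default.
    """
    sums = defaultdict(lambda: [0.0, 0])  # window start -> [sum, count]
    for timestamp, value in timestamped_values:
        start = (timestamp // window_size) * window_size
        sums[start][0] += value
        sums[start][1] += 1
    return {start: total / count for start, (total, count) in sorted(sums.items())}


# The window starting at 10 has no elements, so it does not appear in the result.
print(mean_per_window([(1, 2.0), (3, 4.0), (25, 10.0)], 10))
```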





[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=442056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442056
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 05/Jun/20 22:14
Start Date: 05/Jun/20 22:14
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on a change in pull request #11868:
URL: https://github.com/apache/beam/pull/11868#discussion_r436188166



##
File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java
##
@@ -4805,6 +4805,93 @@ public void testTVFTumbleAggregation() {
 pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
   }
 
+  @Test

Review comment:
   Good point. It would be a good idea to move all streaming tests to 
another place.







Issue Time Tracking
---

Worklog Id: (was: 442056)
Time Spent: 9h 10m  (was: 9h)

> BeamSQL windowing as TVF
> 
>
> Key: BEAM-9363
> URL: https://issues.apache.org/jira/browse/BEAM-9363
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
>  This Jira tracks the implementation for 
> https://s.apache.org/streaming-beam-sql
> A TVF is a table-valued function, a SQL feature that produces a table as 
> the function's output.
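
As a rough illustration of the table-in, table-out idea behind a windowing TVF, the sketch below models a tumbling-window function in plain Python. It is not Beam SQL or ZetaSQL code: the function name `tumble`, the column names `window_start` and `window_end`, and the integer timestamps are all assumptions made for the example.

```python
def tumble(rows, timestamp_key, size):
    """Table-valued function sketch: takes a table (a list of dicts) and returns
    a new table in which every row gains window_start/window_end columns."""
    output = []
    for row in rows:
        # Align the timestamp down to the start of its fixed-size window.
        start = (row[timestamp_key] // size) * size
        output.append(dict(row, window_start=start, window_end=start + size))
    return output


bids = [{"price": 3, "ts": 1}, {"price": 5, "ts": 12}, {"price": 7, "ts": 14}]
windowed = tumble(bids, "ts", 10)
for row in windowed:
    print(row)
```

Because the result is itself a table, a later step can group or aggregate on the added window columns like any other column.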





[jira] [Work logged] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9977?focusedWorklogId=442055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442055
 ]

ASF GitHub Bot logged work on BEAM-9977:


Author: ASF GitHub Bot
Created on: 05/Jun/20 22:11
Start Date: 05/Jun/20 22:11
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #11749:
URL: https://github.com/apache/beam/pull/11749#discussion_r436187245



##
File path: sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaViaSDF.java
##
@@ -0,0 +1,697 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.kafka;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import com.google.auto.value.AutoValue;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.CoderRegistry;
+import org.apache.beam.sdk.io.range.OffsetRange;
+import org.apache.beam.sdk.options.ExperimentalOptions;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.DoFn.Element;
+import org.apache.beam.sdk.transforms.DoFn.GetRestrictionCoder;
+import org.apache.beam.sdk.transforms.DoFn.OutputReceiver;
+import org.apache.beam.sdk.transforms.DoFn.ProcessElement;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.splittabledofn.GrowableOffsetRangeTracker;
+import org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker;
+import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;
+import org.apache.beam.sdk.transforms.splittabledofn.WatermarkEstimator;
+import org.apache.beam.sdk.transforms.splittabledofn.WatermarkEstimators.MonotonicallyIncreasing;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Closeables;
+import org.apache.kafka.clients.consumer.Consumer;
+import org.apache.kafka.clients.consumer.ConsumerConfig;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.common.TopicPartition;
+import org.apache.kafka.common.serialization.Deserializer;
+import org.apache.kafka.common.utils.AppInfoParser;
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link PTransform} that takes a PCollection of {@link KafkaSourceDescription} as input and
+ * outputs a PCollection of {@link KafkaRecord}. The core implementation is based on {@code
+ * SplittableDoFn}. For more details about the concept of {@code SplittableDoFn}, please refer to
+ * the beam blog post: https://beam.apache.org/blog/splittable-do-fn/ and design
+ * doc:https://s.apache.org/beam-fn-api. The major difference from {@link KafkaIO.Read} is, {@link
+ * ReadFromKafkaViaSDF} doesn't require source descriptions(e.g., {@link
+ * KafkaIO.Read#getTopicPartitions()}, {@link KafkaIO.Read#getTopics()}, {@link
+ * KafkaIO.Read#getStartReadTime()}, etc.) during the pipeline construction time. Instead, the
+ * pipeline can populate these source descriptions during runtime. For example, the pipeline can
+ * query Kafka topics from BigQuery table and read these topics via {@link ReadFromKafkaViaSDF}.
+ *
+ * Common Kafka Consumer Configurations
+ *
+ * Most Kafka consumer configurations are similar to {@link KafkaIO.Read}:
+ *
+ * 
+ *   {@link ReadFromKafkaViaSDF#getConsumerConfig()} is the same as {@link
+ *   KafkaIO.Read#getConsumerConfig()}.
+ *   {@link ReadFromKafkaViaSDF#getConsumerFactoryFn()} is the same as {@link
+ *   KafkaIO.Read#getConsumerFactoryFn()}.
+ *   {@link ReadFromKafkaViaSDF#getOffsetConsumerConfig()}

[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2020-06-06 Thread Beam JIRA Bot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127226#comment-17127226
 ] 

Beam JIRA Bot commented on BEAM-3342:
-

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 47h 40m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.





[jira] [Work logged] (BEAM-10176) Support BigQuery type aliases in WriteToBigQuery with Avro format

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10176?focusedWorklogId=442170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442170
 ]

ASF GitHub Bot logged work on BEAM-10176:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 01:45
Start Date: 06/Jun/20 01:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11923:
URL: https://github.com/apache/beam/pull/11923#issuecomment-639949346


   Run Python 3.7 PostCommit





Issue Time Tracking
---

Worklog Id: (was: 442170)
Remaining Estimate: 2h 10m  (was: 2h 20m)
Time Spent: 50m  (was: 40m)

> Support BigQuery type aliases in WriteToBigQuery with Avro format
> -
>
> Key: BEAM-10176
> URL: https://issues.apache.org/jira/browse/BEAM-10176
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.23.0
>
>   Original Estimate: 3h
>  Time Spent: 50m
>  Remaining Estimate: 2h 10m
>
> Support BigQuery type aliases in WriteToBigQuery with avro temp file format. 
> The following aliases are missing:
> * {{STRUCT}} ({{==RECORD}})
> * {{FLOAT64}} ({{==FLOAT}})
> * {{INT64}} ({{==INTEGER}})
> Reference:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
> https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableFieldSchema.html#setType-java.lang.String-
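
The alias relationships listed in the ticket can be sketched as a small lookup table. This is only an illustrative model in plain Python, not the code from the pull request: the names `BIGQUERY_TYPE_ALIASES`, `canonical_type`, and `normalize_schema`, and the dict-based schema shape, are assumptions made for the example.

```python
# Hypothetical alias table: standard SQL type name -> the name it equals per the ticket.
BIGQUERY_TYPE_ALIASES = {
    "STRUCT": "RECORD",
    "FLOAT64": "FLOAT",
    "INT64": "INTEGER",
}


def canonical_type(type_name: str) -> str:
    """Map an alias to its equivalent name; leave any other type name unchanged."""
    upper = type_name.upper()
    return BIGQUERY_TYPE_ALIASES.get(upper, upper)


def normalize_schema(fields):
    """Return a copy of a list of {'name', 'type', 'fields'?} dicts with aliases resolved."""
    normalized = []
    for field in fields:
        new_field = dict(field, type=canonical_type(field["type"]))
        if "fields" in field:  # nested STRUCT/RECORD fields are handled recursively
            new_field["fields"] = normalize_schema(field["fields"])
        normalized.append(new_field)
    return normalized
```

Resolving aliases once, before any format-specific conversion, keeps the rest of a writer free of duplicated type checks.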





[jira] [Updated] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2020-06-06 Thread Beam JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beam JIRA Bot updated BEAM-3342:

Labels: stale-assigned  (was: )

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 47h 40m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.





[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=442181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442181
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 06/Jun/20 02:55
Start Date: 06/Jun/20 02:55
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11941:
URL: https://github.com/apache/beam/pull/11941#issuecomment-639964771


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 442181)
Time Spent: 38.5h  (was: 38h 20m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: P2
>  Labels: portability, stale-assigned
>  Time Spent: 38.5h
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.





[jira] [Work logged] (BEAM-10209) Add without_defaults to Mean

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10209?focusedWorklogId=442248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442248
 ]

ASF GitHub Bot logged work on BEAM-10209:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 13:20
Start Date: 06/Jun/20 13:20
Worklog Time Spent: 10m 
  Work Description: InigoSJ opened a new pull request #11943:
URL: https://github.com/apache/beam/pull/11943


   When we need windows and the Mean combiner applied globally, we cannot specify 
`without_defaults()`, forcing us to use the workaround of 
`CombineGlobally(MeanCombineFn()).without_defaults()`.
   
   This PR fixes this by adding `without_defaults()` to Mean in Python.
   
   

[jira] [Created] (BEAM-10208) add cross-language KafkaIO integration test

2020-06-06 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-10208:
--

 Summary: add cross-language KafkaIO integration test
 Key: BEAM-10208
 URL: https://issues.apache.org/jira/browse/BEAM-10208
 Project: Beam
  Issue Type: Improvement
  Components: cross-language, testing
Reporter: Heejong Lee


add cross-language KafkaIO integration test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10101) Add a HttpIO / HttpFileSystem

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10101?focusedWorklogId=442169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442169
 ]

ASF GitHub Bot logged work on BEAM-10101:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 01:38
Start Date: 06/Jun/20 01:38
Worklog Time Spent: 10m 
  Work Description: epicfaace commented on pull request #11824:
URL: https://github.com/apache/beam/pull/11824#issuecomment-639946797


   I'll be making changes soon.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 442169)
Time Spent: 1h 40m  (was: 1.5h)

> Add a HttpIO / HttpFileSystem
> -
>
> Key: BEAM-10101
> URL: https://issues.apache.org/jira/browse/BEAM-10101
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Reporter: Ashwin Ramaswami
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add HttpIO (and a related HttpFileSystem and HttpsFileSystem), which can 
> download files from a particular http:// or https:// URL. HttpIO cannot 
> upload / write to files, though, because there's no standardized way to write 
> to files using HTTP.
> Sample usage:
>  
> {code:python}
> (
>     p
>     | ReadFromText("https://raw.githubusercontent.com/apache/beam/5ff5313f0913ec81d31ad306400ad30c0a928b34/NOTICE")
>     | WriteToText("output.txt", shard_name_template="", num_shards=0)
> )
> {code}
>  
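
The read-only behavior described above can be sketched with the standard 
library alone. `ReadOnlyHttpSource` is a hypothetical class written for 
illustration; it is not the HttpIO or HttpFileSystem implementation from the 
pull request, and its method names are assumptions.

```python
import urllib.request


class ReadOnlyHttpSource:
    """Hypothetical read-only source for http:// and https:// URLs."""

    SCHEMES = ("http://", "https://")

    def __init__(self, url):
        if not url.startswith(self.SCHEMES):
            raise ValueError("unsupported URL scheme: %r" % url)
        self.url = url

    def read_lines(self, timeout=30):
        # Issue a GET request and yield the body one decoded line at a time.
        with urllib.request.urlopen(self.url, timeout=timeout) as response:
            for raw_line in response:
                yield raw_line.decode("utf-8").rstrip("\n")

    def write(self, data):
        # HTTP offers no standardized way to write to a file, so refuse.
        raise NotImplementedError("HTTP sources are read-only")
```

Reading is lazy: no request is made until `read_lines()` is iterated, and any 
attempt to write fails immediately.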



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10201) Enhance JsonToRow to add Deadletter Support

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10201?focusedWorklogId=442159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442159
 ]

ASF GitHub Bot logged work on BEAM-10201:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 00:28
Start Date: 06/Jun/20 00:28
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #11929:
URL: https://github.com/apache/beam/pull/11929#discussion_r436215155



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
##
@@ -116,4 +157,65 @@ private ObjectMapper objectMapper() {
   return this.objectMapper;
 }
   }
+
+  static class JsonToRowWithFailureCaptureFn
+  extends PTransform<PCollection<String>, PCollectionTuple> {
+private transient volatile @Nullable ObjectMapper objectMapper;
+private Schema schema;
+private static final String METRIC_NAMESPACE = "JsonToRowFn";
+private static final String DEAD_LETTER_METRIC_NAME = 
"JsonToRowFn_ParseFailure";
+
+private Distribution jsonConversionErrors =
+Metrics.distribution(METRIC_NAMESPACE, DEAD_LETTER_METRIC_NAME);
+
+public static final TupleTag<Row> main = MAIN_TUPLE_TAG;
+public static final TupleTag<Row> deadLetter = DEAD_LETTER_TUPLE_TAG;
+
+PCollection<Row> deadLetterCollection;
+
+static JsonToRowWithFailureCaptureFn forSchema(Schema rowSchema) {
+  // Throw exception if this schema is not supported by RowJson
+  RowJson.verifySchemaSupported(rowSchema);
+  return new JsonToRowWithFailureCaptureFn(rowSchema);
+}
+
+private JsonToRowWithFailureCaptureFn(Schema schema) {
+  this.schema = schema;
+}
+
+@Override
+public PCollectionTuple expand(PCollection<String> jsonStrings) {
+
+  return jsonStrings.apply(
+  ParDo.of(
+  new DoFn<String, Row>() {
+@ProcessElement
+public void processElement(ProcessContext context) {
+  try {
+context.output(jsonToRow(objectMapper(), 
context.element()));
+  } catch (Exception ex) {
+context.output(
+deadLetter,
+Row.withSchema(ERROR_ROW_SCHEMA)
+.addValue(context.element())
+.addValue(ex.getMessage())
+.build());

Review comment:
   (I am mostly leaning towards not doing this, but lmk what you think)







Issue Time Tracking
---

Worklog Id: (was: 442159)
Time Spent: 40m  (was: 0.5h)

> Enhance JsonToRow to add Deadletter Support
> ---
>
> Key: BEAM-10201
> URL: https://issues.apache.org/jira/browse/BEAM-10201
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Reza ardeshir rokni
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Current JsonToRow transform does not support Dead Letter pattern on parse 
> failures.
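
The dead-letter pattern mentioned here can be illustrated in plain Python. 
The sketch below is not the JsonToRow implementation under review; 
`parse_with_dead_letter`, the `required_fields` check, and the `line` / `err` 
field names are assumptions chosen for the example.

```python
import json


def parse_with_dead_letter(json_strings, required_fields):
    """Split inputs into parsed rows and dead-letter records.

    Instead of failing the whole batch on the first bad element, each
    failure is captured together with the original input and the error text.
    """
    main, dead_letter = [], []
    for line in json_strings:
        try:
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError("expected a JSON object")
            missing = [f for f in required_fields if f not in record]
            if missing:
                raise ValueError("missing fields: %s" % ", ".join(missing))
            main.append({f: record[f] for f in required_fields})
        except ValueError as ex:
            # json.JSONDecodeError is a subclass of ValueError.
            dead_letter.append({"line": line, "err": str(ex)})
    return main, dead_letter


rows, failures = parse_with_dead_letter(
    ['{"id": 1, "name": "a"}', "not json", '{"id": 2}'], ["id", "name"]
)
```

Downstream code can then process `rows` normally and log, inspect, or replay 
`failures` separately.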



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10201) Enhance JsonToRow to add Deadletter Support

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10201?focusedWorklogId=442157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442157
 ]

ASF GitHub Bot logged work on BEAM-10201:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 00:27
Start Date: 06/Jun/20 00:27
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #11929:
URL: https://github.com/apache/beam/pull/11929#discussion_r436214157



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
##
@@ -116,4 +157,65 @@ private ObjectMapper objectMapper() {
   return this.objectMapper;
 }
   }
+
+  static class JsonToRowWithFailureCaptureFn
+  extends PTransform<PCollection<String>, PCollectionTuple> {
+private transient volatile @Nullable ObjectMapper objectMapper;
+private Schema schema;
+private static final String METRIC_NAMESPACE = "JsonToRowFn";
+private static final String DEAD_LETTER_METRIC_NAME = 
"JsonToRowFn_ParseFailure";
+
+private Distribution jsonConversionErrors =
+Metrics.distribution(METRIC_NAMESPACE, DEAD_LETTER_METRIC_NAME);
+
+public static final TupleTag<Row> main = MAIN_TUPLE_TAG;
+public static final TupleTag<Row> deadLetter = DEAD_LETTER_TUPLE_TAG;
+
+PCollection<Row> deadLetterCollection;
+
+static JsonToRowWithFailureCaptureFn forSchema(Schema rowSchema) {
+  // Throw exception if this schema is not supported by RowJson
+  RowJson.verifySchemaSupported(rowSchema);
+  return new JsonToRowWithFailureCaptureFn(rowSchema);
+}
+
+private JsonToRowWithFailureCaptureFn(Schema schema) {
+  this.schema = schema;
+}
+
+@Override
+public PCollectionTuple expand(PCollection<String> jsonStrings) {
+
+  return jsonStrings.apply(
+  ParDo.of(
+  new DoFn<String, Row>() {
+@ProcessElement
+public void processElement(ProcessContext context) {
+  try {
+context.output(jsonToRow(objectMapper(), 
context.element()));
+  } catch (Exception ex) {
+context.output(
+deadLetter,
+Row.withSchema(ERROR_ROW_SCHEMA)
+.addValue(context.element())
+.addValue(ex.getMessage())
+.build());

Review comment:
   I guess this doesn't make sense, but would it help to include the Row 
Schema that we tried (and failed) to use for this JSON string? Some users may 
not need it, and others can add it themselves in a downstream ParDo, but it's 
possible it may help. Thoughts?

##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
##
@@ -116,4 +157,65 @@ private ObjectMapper objectMapper() {
   return this.objectMapper;
 }
   }
+
+  static class JsonToRowWithFailureCaptureFn
+  extends PTransform<PCollection<String>, PCollectionTuple> {
+private transient volatile @Nullable ObjectMapper objectMapper;
+private Schema schema;
+private static final String METRIC_NAMESPACE = "JsonToRowFn";
+private static final String DEAD_LETTER_METRIC_NAME = 
"JsonToRowFn_ParseFailure";
+
+private Distribution jsonConversionErrors =
+Metrics.distribution(METRIC_NAMESPACE, DEAD_LETTER_METRIC_NAME);
+
+public static final TupleTag<Row> main = MAIN_TUPLE_TAG;
+public static final TupleTag<Row> deadLetter = DEAD_LETTER_TUPLE_TAG;
+
+PCollection<Row> deadLetterCollection;
+
+static JsonToRowWithFailureCaptureFn forSchema(Schema rowSchema) {
+  // Throw exception if this schema is not supported by RowJson
+  RowJson.verifySchemaSupported(rowSchema);
+  return new JsonToRowWithFailureCaptureFn(rowSchema);
+}
+
+private JsonToRowWithFailureCaptureFn(Schema schema) {
+  this.schema = schema;
+}
+
+@Override
+public PCollectionTuple expand(PCollection<String> jsonStrings) {
+
+  return jsonStrings.apply(
+  ParDo.of(
+  new DoFn<String, Row>() {
+@ProcessElement
+public void processElement(ProcessContext context) {
+  try {
+context.output(jsonToRow(objectMapper(), 
context.element()));
+  } catch (Exception ex) {
+context.output(
+deadLetter,
+Row.withSchema(ERROR_ROW_SCHEMA)
+.addValue(context.element())
+.addValue(ex.getMessage())
+.build());
+  }
+}
+  })
+  .withOutputTags(main, TupleTagList.of(deadLetter)));

Review comment:
   you can add the schema for the outputs here, so that users do

[jira] [Assigned] (BEAM-10208) add cross-language KafkaIO integration test

2020-06-06 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee reassigned BEAM-10208:
--

Assignee: Heejong Lee

> add cross-language KafkaIO integration test
> ---
>
> Key: BEAM-10208
> URL: https://issues.apache.org/jira/browse/BEAM-10208
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>
> add cross-language KafkaIO integration test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10208) add cross-language KafkaIO integration test

2020-06-06 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-10208:
---
Status: Open  (was: Triage Needed)

> add cross-language KafkaIO integration test
> ---
>
> Key: BEAM-10208
> URL: https://issues.apache.org/jira/browse/BEAM-10208
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, testing
>Reporter: Heejong Lee
>Priority: P2
>
> add cross-language KafkaIO integration test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9869) adding self-contained Kafka service jar for testing

2020-06-06 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee resolved BEAM-9869.
---
Fix Version/s: 2.23.0
   Resolution: Fixed

> adding self-contained Kafka service jar for testing
> ---
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
> Fix For: 2.23.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-10052) check hash and avoid duplicates when uploading artifact in Python Dataflow Runner

2020-06-06 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee resolved BEAM-10052.

Resolution: Fixed

> check hash and avoid duplicates when uploading artifact in Python Dataflow 
> Runner
> -
>
> Key: BEAM-10052
> URL: https://issues.apache.org/jira/browse/BEAM-10052
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> An xlang pipeline could have many duplicated jars. It would be great if we 
> checked hashes and avoided duplicate uploads.
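
The idea of hashing artifacts and skipping duplicate uploads can be sketched 
as follows. This is not the Dataflow runner's code: `upload_unique`, the 
`upload` callback, and the staged-name scheme are hypothetical, and SHA-256 
is an assumed choice of hash.

```python
import hashlib


def sha256_of(payload):
    """Return the hex SHA-256 digest of a bytes payload."""
    return hashlib.sha256(payload).hexdigest()


def upload_unique(artifacts, upload):
    """Upload each distinct payload once.

    `artifacts` is an iterable of (name, bytes) pairs and `upload` is a
    callable taking (staged_name, payload). Returns {name: staged_name},
    so artifacts with identical content share one staged copy.
    """
    staged_by_hash = {}
    locations = {}
    for name, payload in artifacts:
        digest = sha256_of(payload)
        if digest not in staged_by_hash:
            staged_name = "%s-%s" % (digest[:16], name)
            upload(staged_name, payload)
            staged_by_hash[digest] = staged_name
        locations[name] = staged_by_hash[digest]
    return locations
```

Two jars with different names but identical bytes trigger a single call to 
`upload`, and both names resolve to the same staged location.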



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-10125) adding cross-language KafkaIO integration test

2020-06-06 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee resolved BEAM-10125.

Fix Version/s: 2.23.0
   Resolution: Fixed

> adding cross-language KafkaIO integration test
> --
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=442050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442050
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 05/Jun/20 22:04
Start Date: 05/Jun/20 22:04
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on a change in pull request #11868:
URL: https://github.com/apache/beam/pull/11868#discussion_r436178923



##
File path: 
sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java
##
@@ -4805,6 +4805,93 @@ public void testTVFTumbleAggregation() {
 
pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
   }
 
+  @Test

Review comment:
   +1. IIRC there are some other tests in this file that test our streaming 
extension. It makes sense to move them into a separate file.

##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/TVFToPTransform.java
##
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl;
+
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+
+/** Provides a function that produces a PCollection based on TVF and upstream 
PCollection. */
+public interface TVFToPTransform {

Review comment:
   +1







Issue Time Tracking
---

Worklog Id: (was: 442050)
Time Spent: 9h  (was: 8h 50m)

> BeamSQL windowing as TVF
> 
>
> Key: BEAM-9363
> URL: https://issues.apache.org/jira/browse/BEAM-9363
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
>  This Jira tracks the implementation for 
> https://s.apache.org/streaming-beam-sql
> TVF stands for table-valued function, a SQL feature in which a function 
> produces a table as its output.
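
To make the table-in, table-out idea concrete, here is a minimal Python 
sketch of a tumbling-window TVF. It is an illustration only: `tumble`, its 
parameters, and the `window_start` / `window_end` column names are 
assumptions and are not taken from the BeamSQL implementation.

```python
def tumble(rows, timestamp_column, size):
    """Table-valued function: takes a table (list of dicts), returns a table.

    Each output row is the input row extended with the bounds of the fixed
    (tumbling) window of length `size` that contains its timestamp.
    """
    output = []
    for row in rows:
        start = row[timestamp_column] - row[timestamp_column] % size
        output.append(dict(row, window_start=start, window_end=start + size))
    return output


table = [{"ts": 7, "v": 1}, {"ts": 12, "v": 2}]
windowed = tumble(table, "ts", 5)
```

Because the result is itself a table, it can be grouped by `window_start` 
and `window_end` in a later step, which is how a windowed aggregation would 
be expressed on top of such a function.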



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10052) check hash and avoid duplicates when uploading artifact in Python Dataflow Runner

2020-06-06 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-10052:
---
Fix Version/s: 2.22.0

> check hash and avoid duplicates when uploading artifact in Python Dataflow 
> Runner
> -
>
> Key: BEAM-10052
> URL: https://issues.apache.org/jira/browse/BEAM-10052
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> An xlang pipeline could have many duplicated jars. It would be great if we 
> checked hashes and avoided duplicate uploads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=442051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442051
 ]

ASF GitHub Bot logged work on BEAM-9577:


Author: ASF GitHub Bot
Created on: 05/Jun/20 22:04
Start Date: 05/Jun/20 22:04
Worklog Time Spent: 10m 
  Work Description: ibzib commented on a change in pull request #11932:
URL: https://github.com/apache/beam/pull/11932#discussion_r436178918



##
File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/ArtifactRetrievalService.java
##
@@ -89,34 +92,41 @@ public void resolveArtifacts(
   public void getArtifact(
   ArtifactApi.GetArtifactRequest request,
   StreamObserver<ArtifactApi.GetArtifactResponse> responseObserver) {
-switch (request.getArtifact().getTypeUrn()) {
+try {
+  InputStream inputStream = getArtifact(request.getArtifact());
+  byte[] buffer = new byte[bufferSize];
+  int bytesRead;
+  while ((bytesRead = inputStream.read(buffer)) > 0) {
+responseObserver.onNext(
+ArtifactApi.GetArtifactResponse.newBuilder()
+.setData(ByteString.copyFrom(buffer, 0, bytesRead))
+.build());
+  }
+  responseObserver.onCompleted();
+} catch (IOException exn) {
+  exn.printStackTrace();
+  responseObserver.onError(exn);

Review comment:
   Should we have wrapped this exception in a StatusException as well?
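
   The pattern in this hunk (read a stream in fixed-size chunks, emit each 
chunk, and surface read failures as a service-level error) can be sketched in 
Python. `ArtifactReadError` is a hypothetical stand-in for a status-carrying 
exception; whether the PR ends up wrapping the exception this way is not 
something this thread confirms.

```python
import io


class ArtifactReadError(Exception):
    """Hypothetical stand-in for a status-carrying RPC error."""


def stream_chunks(input_stream, buffer_size):
    """Yield the stream's contents in chunks of at most `buffer_size` bytes."""
    try:
        while True:
            chunk = input_stream.read(buffer_size)
            if not chunk:
                # An empty read signals end of stream.
                break
            yield chunk
    except OSError as exn:
        # Wrap the low-level I/O failure so callers see one error type.
        raise ArtifactReadError("failed to read artifact: %s" % exn) from exn


chunks = list(stream_chunks(io.BytesIO(b"abcdefg"), 3))
```

   Wrapping keeps the original exception available as `__cause__` while 
giving the caller a single error type to handle.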







Issue Time Tracking
---

Worklog Id: (was: 442051)
Time Spent: 24h 20m  (was: 24h 10m)

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 24h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=442088&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442088
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 06/Jun/20 00:20
Start Date: 06/Jun/20 00:20
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #8457:
URL: https://github.com/apache/beam/pull/8457#issuecomment-639912763


   > There are still failing tests on #11295. @mf2199 - What is the next step 
for this PR?
   
   Ping on this? What is our plan for this PR?





Issue Time Tracking
---

Worklog Id: (was: 442088)
Time Spent: 47h 40m  (was: 47.5h)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: P2
>  Time Spent: 47h 40m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9626) pymongo should be an optional requirement

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9626?focusedWorklogId=442209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442209
 ]

ASF GitHub Bot logged work on BEAM-9626:


Author: ASF GitHub Bot
Created on: 06/Jun/20 08:29
Start Date: 06/Jun/20 08:29
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #11256:
URL: https://github.com/apache/beam/pull/11256#issuecomment-640012351


   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   





Issue Time Tracking
---

Worklog Id: (was: 442209)
Time Spent: 1h  (was: 50m)

> pymongo should be an optional requirement
> -
>
> Key: BEAM-9626
> URL: https://issues.apache.org/jira/browse/BEAM-9626
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The pymongo driver is installed by default, but as the number of IO 
> connectors in the Python SDK grows, I don't think this is the precedent we 
> want to set. We already have "extra" packages for gcp, aws, and interactive; 
> we should also add one for mongo. 
>  
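
One common way to make a driver optional is to import it lazily and fail 
with an actionable message when it is missing. The helper below is a sketch 
under assumptions: `require_optional` is a hypothetical function, and the 
`mongo` extra shown in the usage note is the one proposed in this issue, not 
one confirmed to exist.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency or raise a helpful ImportError.

    `extra` is the name of the pip extra that would install the module.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            "%s is not installed; install it with: "
            "pip install 'apache-beam[%s]'" % (module_name, extra)
        ) from exc
```

A MongoDB connector could then call `require_optional("pymongo", "mongo")` at 
the point where the driver is first needed, so that users who never touch 
that connector are not forced to install it.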



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9626) pymongo should be an optional requirement

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9626?focusedWorklogId=442208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442208
 ]

ASF GitHub Bot logged work on BEAM-9626:


Author: ASF GitHub Bot
Created on: 06/Jun/20 08:29
Start Date: 06/Jun/20 08:29
Worklog Time Spent: 10m 
  Work Description: stale[bot] closed pull request #11256:
URL: https://github.com/apache/beam/pull/11256


   





Issue Time Tracking
---

Worklog Id: (was: 442208)
Time Spent: 50m  (was: 40m)

> pymongo should be an optional requirement
> -
>
> Key: BEAM-9626
> URL: https://issues.apache.org/jira/browse/BEAM-9626
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: P3
>  Labels: stale-assigned
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The pymongo driver is installed by default, but as the number of IO 
> connectors in the Python SDK grows, I don't think this is the precedent we 
> want to set. We already have "extra" packages for gcp, aws, and interactive; 
> we should also add one for mongo. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6215) FlatMap no longer shows User function name

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6215?focusedWorklogId=442068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442068
 ]

ASF GitHub Bot logged work on BEAM-6215:


Author: ASF GitHub Bot
Created on: 05/Jun/20 23:08
Start Date: 05/Jun/20 23:08
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11940:
URL: https://github.com/apache/beam/pull/11940#issuecomment-639884926


   Run Python PreCommit





Issue Time Tracking
---

Worklog Id: (was: 442068)
Time Spent: 0.5h  (was: 20m)

> FlatMap no longer shows User function name
> --
>
> Key: BEAM-6215
> URL: https://issues.apache.org/jira/browse/BEAM-6215
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: P2
>  Labels: stale-P2
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:python}
> def my_function(element):
>     ...
> pcoll | beam.FlatMap(my_function)
> {code}
> now has a stage name of {{FlatMap()}} rather than 
> {{FlatMap(my_function)}}. Not sure when this regression was introduced.
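
The expected labeling can be illustrated with a small helper that derives a 
stage name from the user's callable. `default_label` is a hypothetical 
function written for this example; it is not Beam's labeling code, and the 
fallback to the type name is an assumption.

```python
def default_label(transform_name, fn):
    """Build a stage label such as 'FlatMap(my_function)'.

    Uses the callable's __name__ when available and falls back to the name
    of its type (for example, for functools.partial objects).
    """
    fn_name = getattr(fn, "__name__", type(fn).__name__)
    return "%s(%s)" % (transform_name, fn_name)


def my_function(element):
    return [element]


label = default_label("FlatMap", my_function)
```

With this helper, a named function keeps its name in the label, which is the 
behavior the report says was lost.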



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10101) Add a HttpIO / HttpFileSystem

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10101?focusedWorklogId=442070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442070
 ]

ASF GitHub Bot logged work on BEAM-10101:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 23:09
Start Date: 05/Jun/20 23:09
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11824:
URL: https://github.com/apache/beam/pull/11824#issuecomment-639885325


   @epicfaace lmk what are your plans for this PR





Issue Time Tracking
---

Worklog Id: (was: 442070)
Time Spent: 1.5h  (was: 1h 20m)

> Add a HttpIO / HttpFileSystem
> -
>
> Key: BEAM-10101
> URL: https://issues.apache.org/jira/browse/BEAM-10101
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Reporter: Ashwin Ramaswami
>Assignee: Ashwin Ramaswami
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add HttpIO (and a related HttpFileSystem and HttpsFileSystem), which can 
> download files from a particular http:// or https:// URL. HttpIO cannot 
> upload / write to files, though, because there's no standardized way to write 
> to files using HTTP.
> Sample usage:
>  
> {code:python}
> (
>     p
>     | ReadFromText("https://raw.githubusercontent.com/apache/beam/5ff5313f0913ec81d31ad306400ad30c0a928b34/NOTICE")
>     | WriteToText("output.txt", shard_name_template="", num_shards=0)
> )
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10201) Enhance JsonToRow to add Deadletter Support

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10201?focusedWorklogId=442069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442069
 ]

ASF GitHub Bot logged work on BEAM-10201:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 23:08
Start Date: 05/Jun/20 23:08
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11929:
URL: https://github.com/apache/beam/pull/11929#issuecomment-639885056


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 442069)
Time Spent: 20m  (was: 10m)

> Enhance JsonToRow to add Deadletter Support
> ---
>
> Key: BEAM-10201
> URL: https://issues.apache.org/jira/browse/BEAM-10201
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Reza ardeshir rokni
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current JsonToRow transform does not support the dead-letter pattern on 
> parse failures.





[jira] [Work logged] (BEAM-10208) add cross-language KafkaIO integration test

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10208?focusedWorklogId=442192&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442192
 ]

ASF GitHub Bot logged work on BEAM-10208:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 04:56
Start Date: 06/Jun/20 04:56
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11942:
URL: https://github.com/apache/beam/pull/11942#issuecomment-639983521


   CC: @robertwb 





Issue Time Tracking
---

Worklog Id: (was: 442192)
Time Spent: 40m  (was: 0.5h)

> add cross-language KafkaIO integration test
> ---
>
> Key: BEAM-10208
> URL: https://issues.apache.org/jira/browse/BEAM-10208
> Project: Beam
>  Issue Type: Improvement
>  Components: cross-language, testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> add cross-language KafkaIO integration test





[jira] [Work logged] (BEAM-10176) Support BigQuery type aliases in WriteToBigQuery with Avro format

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10176?focusedWorklogId=442087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442087
 ]

ASF GitHub Bot logged work on BEAM-10176:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 00:14
Start Date: 06/Jun/20 00:14
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11923:
URL: https://github.com/apache/beam/pull/11923#issuecomment-639911150


   Run Python 3.7 PostCommit





Issue Time Tracking
---

Worklog Id: (was: 442087)
Remaining Estimate: 2h 20m  (was: 2.5h)
Time Spent: 40m  (was: 0.5h)

> Support BigQuery type aliases in WriteToBigQuery with Avro format
> -
>
> Key: BEAM-10176
> URL: https://issues.apache.org/jira/browse/BEAM-10176
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.23.0
>
>   Original Estimate: 3h
>  Time Spent: 40m
>  Remaining Estimate: 2h 20m
>
> Support BigQuery type aliases in WriteToBigQuery with the Avro temp file format. 
> The following aliases are missing:
> * {{STRUCT}} ({{==RECORD}})
> * {{FLOAT64}} ({{==FLOAT}})
> * {{INT64}} ({{==INTEGER}})
> Reference:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
> https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableFieldSchema.html#setType-java.lang.String-
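
As a rough illustration of the alias handling requested here, the sketch below maps each of the three aliases listed in the issue to its legacy type name. The names `BIGQUERY_TYPE_ALIASES` and `normalize_bq_type` are hypothetical and are not taken from the Beam codebase; only the alias pairs come from the issue text.

```python
# Alias -> legacy name pairs, as listed in the issue description.
BIGQUERY_TYPE_ALIASES = {
    "STRUCT": "RECORD",
    "FLOAT64": "FLOAT",
    "INT64": "INTEGER",
}


def normalize_bq_type(type_name):
    """Return the legacy type name for an alias; pass other names through.

    Hypothetical helper: the lookup is case-insensitive and returns the
    upper-cased name when no alias matches.
    """
    upper = type_name.upper()
    return BIGQUERY_TYPE_ALIASES.get(upper, upper)


legacy = [normalize_bq_type(name) for name in ("INT64", "float64", "STRING")]
```

Normalizing up front would let the rest of a schema conversion deal with a single spelling per type; whether the actual fix takes this shape is not stated in the issue.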





[jira] [Work logged] (BEAM-10145) Kafka IO performance tests leaving behind unused disks on apache-beam-testing

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10145?focusedWorklogId=442066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442066
 ]

ASF GitHub Bot logged work on BEAM-10145:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 23:02
Start Date: 05/Jun/20 23:02
Worklog Time Spent: 10m 
  Work Description: pabloem merged pull request #11931:
URL: https://github.com/apache/beam/pull/11931


   





Issue Time Tracking
---

Worklog Id: (was: 442066)
Time Spent: 1h 10m  (was: 1h)

> Kafka IO performance tests leaving behind unused disks on apache-beam-testing
> -
>
> Key: BEAM-10145
> URL: https://issues.apache.org/jira/browse/BEAM-10145
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Udi Meiri
>Assignee: Kamil Wasilewski
>Priority: P2
> Attachments: VfRJduZigE1.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Sample disk description:
> {"kubernetes.io/created-for/pv/name":"pvc-97dd8abb-a0ac-11ea-aa65-42010a80013b","kubernetes.io/created-for/pvc/name":"data-pzoo-0","kubernetes.io/created-for/pvc/namespace":"beam-performancetests-kafka-io-826"}





[jira] [Work logged] (BEAM-10197) Support type hints for frozenset

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10197?focusedWorklogId=442065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442065
 ]

ASF GitHub Bot logged work on BEAM-10197:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 23:01
Start Date: 05/Jun/20 23:01
Worklog Time Spent: 10m 
  Work Description: udim commented on a change in pull request #11939:
URL: https://github.com/apache/beam/pull/11939#discussion_r436187871



##
File path: sdks/python/apache_beam/typehints/typehints_test.py
##
@@ -612,54 +613,70 @@ def test_match_type_variables(self):
  hint.match_type_variables(typehints.Dict[int, str]))
 
 
-class SetHintTestCase(TypeHintTestCase):
+class BaseSetHintTest:
+  def __init__(self, string_type, py_type, beam_type, *args, **kwargs):

Review comment:
   You should omit `*args, **kwargs` if there aren't any more args allowed. 
This is the style we follow in this codebase. Here and below as well.
   
   For example: if some code called `BaseSetHintTest('Set', set, typehints.Set, 
typehints.FrozenSet)`, would you rather the extra argument be silently ignored, 
or have the call fail with an incorrect-number-of-arguments error?
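
The question above can be made concrete with a small stand-alone sketch. The class names `LooseBase` and `StrictBase` are hypothetical, and plain strings stand in for `typehints.Set` and `typehints.FrozenSet` so that the snippet has no Beam dependency.

```python
class LooseBase:
    # Accepts and silently drops any extra positional or keyword arguments.
    def __init__(self, string_type, py_type, beam_type, *args, **kwargs):
        self.string_type = string_type
        self.py_type = py_type
        self.beam_type = beam_type


class StrictBase:
    # Declares exactly the arguments it uses, so extras fail at the call site.
    def __init__(self, string_type, py_type, beam_type):
        self.string_type = string_type
        self.py_type = py_type
        self.beam_type = beam_type


# The stray fourth argument goes unnoticed here.
loose = LooseBase("Set", set, "Set", "FrozenSet")

# The same call against the strict signature raises TypeError.
error = None
try:
    StrictBase("Set", set, "Set", "FrozenSet")
except TypeError as exc:
    error = str(exc)
```

With the strict signature, the mistaken call is reported immediately instead of producing a test object that quietly ignores part of its input.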

##
File path: sdks/python/apache_beam/typehints/typehints_test.py
##
@@ -612,54 +613,70 @@ def test_match_type_variables(self):
  hint.match_type_variables(typehints.Dict[int, str]))
 
 
-class SetHintTestCase(TypeHintTestCase):
+class BaseSetHintTest:

Review comment:
   If `BaseSetHintTest` inherited from `TypeHintTestCase` you could avoid 
multiple inheritance in the sub-classes below. 
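
A minimal sketch of that layout, under stated assumptions: `unittest.TestCase` stands in for `TypeHintTestCase`, and the type under test is chosen through a class attribute rather than through the `__init__` arguments used in the PR. The test body and the `run_case` helper are hypothetical.

```python
import unittest


class BaseSetHintTest(unittest.TestCase):
    # Hypothetical: subclasses select the type under test via this attribute.
    py_type = None

    def test_round_trip(self):
        if self.py_type is None:
            # The base class itself has nothing to check.
            self.skipTest("base class has no type to check")
        value = self.py_type([1, 2, 2])
        self.assertIsInstance(value, self.py_type)
        self.assertEqual(len(value), 2)


class SetHintTest(BaseSetHintTest):  # single inheritance only
    py_type = set


class FrozenSetHintTest(BaseSetHintTest):  # single inheritance only
    py_type = frozenset


def run_case(case):
    """Run one TestCase class and return its unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

One trade-off of this design is that the test loader also collects the base class, which is why the sketch skips when no type is configured.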







Issue Time Tracking
---

Worklog Id: (was: 442065)
Time Spent: 50m  (was: 40m)

> Support type hints for frozenset 
> -
>
> Key: BEAM-10197
> URL: https://issues.apache.org/jira/browse/BEAM-10197
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Saavan Nanavati
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Beam's internal typing system currently supports type hints for set but not 
> frozenset.
>  
> This Jira ticket will add type annotation support for both frozenset and 
> typing.FrozenSet.





[jira] [Work logged] (BEAM-9990) FhirIO should support conditional create / update methods

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9990?focusedWorklogId=442086&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442086
 ]

ASF GitHub Bot logged work on BEAM-9990:


Author: ASF GitHub Bot
Created on: 06/Jun/20 00:12
Start Date: 06/Jun/20 00:12
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #11702:
URL: https://github.com/apache/beam/pull/11702#discussion_r436210077



##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java
##
@@ -1173,4 +1276,339 @@ public void executeBundles(ProcessContext context) {
   }
 }
   }
+
+  /**
+   * Create resources fhir io . create resources.
+   *
+   * @param  the type parameter
+   * @param fhirStore the fhir store
+   * @return the fhir io . create resources
+   */
+  public static  FhirIO.CreateResources 
createResources(ValueProvider fhirStore) {
+return new CreateResources(fhirStore);
+  }
+
+  /**
+   * Create resources fhir io . create resources.
+   *
+   * @param  the type parameter
+   * @param fhirStore the fhir store
+   * @return the fhir io . create resources
+   */
+  public static  FhirIO.CreateResources createResources(String 
fhirStore) {
+return new CreateResources(fhirStore);
+  }
+  /**
+   * {@link PTransform} for Creating FHIR resources.
+   *
+   * 
https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/create
+   */
+  public static class CreateResources extends PTransform, 
Write.Result> {
+private final String fhirStore;
+private SerializableFunction ifNoneExistFunction;
+private SerializableFunction formatBodyFunction;
+private SerializableFunction typeFunction;
+private static final Logger LOG = 
LoggerFactory.getLogger(CreateResources.class);
+
+/**
+ * Instantiates a new Create resources transform.
+ *
+ * @param fhirStore the fhir store
+ */
+CreateResources(ValueProvider fhirStore) {
+  this.fhirStore = fhirStore.get();
+}
+
+/**
+ * Instantiates a new Create resources.
+ *
+ * @param fhirStore the fhir store
+ */
+CreateResources(String fhirStore) {
+  this.fhirStore = fhirStore;
+}
+
+/**
+ * This adds a {@link SerializableFunction} that reads an resource string 
and extracts an
+ * If-None-Exists query for conditional create. Typically this will just 
be extracting an ID to
+ * look for.
+ *
+ * 
https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/create
+ *
+ * @param ifNoneExistFunction the if none exist function
+ * @return the create resources
+ */
+public CreateResources withIfNotExistFunction(
+SerializableFunction ifNoneExistFunction) {
+  this.ifNoneExistFunction = ifNoneExistFunction;
+  return this;
+}
+
+/**
+ * This adds a {@link SerializableFunction} that reads an resource string 
and extracts an
+ * resource type.
+ *
+ * 
https://cloud.google.com/healthcare/docs/reference/rest/v1beta1/projects.locations.datasets.fhirStores.fhir/create
+ *
+ * @param typeFunction for extracting type from a resource.
+ * @return the create resources
+ */
+public CreateResources withTypeFunction(SerializableFunction 
typeFunction) {
+  this.typeFunction = typeFunction;
+  return this;
+}
+/**
+ * With format body function create resources.
+ *
+ * @param formatBodyFunction the format body function
+ * @return the create resources
+ */
+public CreateResources withFormatBodyFunction(

Review comment:
   I don't think I understand this function very well. It seems to be a fn 
that reformats a resource when its formatting is not correct? Could you 
expand the documentation for it?

##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java
##
@@ -155,17 +168,53 @@
  * FhirIO.Write.Result writeResult =
  * output.apply("Execute FHIR Bundles", 
FhirIO.executeBundles(options.getExistingFhirStore()));
  *
+ * // Alternatively you could use import for high throughput to a new store.
+ * FhirIO.Write.Result writeResult =
+ * output.apply("Import FHIR Resources", 
FhirIO.executeBundles(options.getNewFhirStore()));
+ * // [End Writing ]
+ *
  * PCollection> failedBundles = 
writeResult.getFailedInsertsWithErr();
  *
+ * // [Begin Writing to Dead Letter Queue]
  * failedBundles.apply("Write failed bundles to BigQuery",
  * BigQueryIO
  * .write()
  * .to(option.getBQFhirExecuteBundlesDeadLetterTable())
  * .withFormatFunction(new HealthcareIOErrorToTableRow()));
+ * // [End Writing to Dead Letter Queue]
+ *
+ * // Alternatively you may want to h

[jira] [Work logged] (BEAM-10176) Support BigQuery type aliases in WriteToBigQuery with Avro format

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10176?focusedWorklogId=442067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442067
 ]

ASF GitHub Bot logged work on BEAM-10176:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 23:02
Start Date: 05/Jun/20 23:02
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11923:
URL: https://github.com/apache/beam/pull/11923#issuecomment-639882291


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 442067)
Remaining Estimate: 2.5h  (was: 2h 40m)
Time Spent: 0.5h  (was: 20m)

> Support BigQuery type aliases in WriteToBigQuery with Avro format
> -
>
> Key: BEAM-10176
> URL: https://issues.apache.org/jira/browse/BEAM-10176
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.23.0
>
>   Original Estimate: 3h
>  Time Spent: 0.5h
>  Remaining Estimate: 2.5h
>
> Support BigQuery type aliases in WriteToBigQuery with the Avro temp file format. 
> The following aliases are missing:
> * {{STRUCT}} ({{==RECORD}})
> * {{FLOAT64}} ({{==FLOAT}})
> * {{INT64}} ({{==INTEGER}})
> Reference:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
> https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableFieldSchema.html#setType-java.lang.String-





[jira] [Created] (BEAM-10209) Add without_defaults to Mean

2020-06-06 Thread Inigo San Jose Visiers (Jira)
Inigo San Jose Visiers created BEAM-10209:
-

 Summary: Add without_defaults to Mean 
 Key: BEAM-10209
 URL: https://issues.apache.org/jira/browse/BEAM-10209
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Affects Versions: 2.21.0
Reporter: Inigo San Jose Visiers


When combining globally over a windowed PCollection, we need to call 
`without_defaults()`. The built-in combiner `Mean` does not expose it, so the 
workaround is `CombineGlobally(MeanCombineFn()).without_defaults()`. 

Adding the option to call `.without_defaults()` on `Mean` directly would improve 
both code readability and ease of use.
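
Why a default value is awkward under windowing can be seen in a plain-Python model. This is not Beam code: `windowed_mean`, its parameters, and the fixed-window arithmetic are hypothetical stand-ins. Only windows that actually contain data yield a mean, which is the behavior `without_defaults()` asks for.

```python
from collections import defaultdict


def windowed_mean(timestamped_values, window_size):
    """Mean per fixed window; windows with no data produce no output.

    timestamped_values is an iterable of (timestamp, value) pairs and the
    result maps each non-empty window's start time to the mean of its values.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for timestamp, value in timestamped_values:
        start = (timestamp // window_size) * window_size
        sums[start][0] += value
        sums[start][1] += 1
    return {
        start: total / count
        for start, (total, count) in sorted(sums.items())
    }


means = windowed_mean([(1, 2.0), (3, 4.0), (25, 10.0)], window_size=10)
```

In this model the window starting at 10 has no elements and therefore no entry in the result; emitting a default for it would mean emitting one for every possible empty window.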





[jira] [Work logged] (BEAM-10201) Enhance JsonToRow to add Deadletter Support

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10201?focusedWorklogId=442165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442165
 ]

ASF GitHub Bot logged work on BEAM-10201:
-

Author: ASF GitHub Bot
Created on: 06/Jun/20 00:53
Start Date: 06/Jun/20 00:53
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #11929:
URL: https://github.com/apache/beam/pull/11929#discussion_r436220739



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
##
@@ -72,10 +79,44 @@
 @Experimental(Kind.SCHEMAS)
 public class JsonToRow {
 
+  private static final String LINE_FIELD_NAME = "line";
+  private static final String ERROR_FIELD_NAME = "err";
+
+  public static final Schema ERROR_ROW_SCHEMA =
+  Schema.of(
+  Field.of(LINE_FIELD_NAME, FieldType.STRING),
+  Field.of(ERROR_FIELD_NAME, FieldType.STRING));
+
+  public static final TupleTag MAIN_TUPLE_TAG = new TupleTag() {};
+  public static final TupleTag DEAD_LETTER_TUPLE_TAG = new 
TupleTag() {};
+
   public static PTransform, PCollection> 
withSchema(Schema rowSchema) {
 return JsonToRowFn.forSchema(rowSchema);
   }
 
+  /**
+   * Enable Dead letter support. If this value is set errors in the parsing 
layer are returned as
+   * Row objects of form: {@link JsonToRow#ERROR_ROW_SCHEMA} line : The 
original json string err :
+   * The error message from the parsing function.
+   *
+   * You can access the results by using:
+   *
+   * {@link JsonToRow#MAIN_TUPLE_TAG}
+   *
+   * {@Code PCollection personRows =
+   * results.get(JsonToRow.MAIN_TUPLE_TAG).setRowSchema(personSchema)}

Review comment:
   +1. If you output a Row, you should be setting the schema in your 
transform.

##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
##
@@ -116,4 +157,65 @@ private ObjectMapper objectMapper() {
   return this.objectMapper;
 }
   }
+
+  static class JsonToRowWithFailureCaptureFn
+  extends PTransform, PCollectionTuple> {
+private transient volatile @Nullable ObjectMapper objectMapper;
+private Schema schema;

Review comment:
   make final

##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
##
@@ -72,10 +79,44 @@
 @Experimental(Kind.SCHEMAS)
 public class JsonToRow {
 
+  private static final String LINE_FIELD_NAME = "line";
+  private static final String ERROR_FIELD_NAME = "err";
+
+  public static final Schema ERROR_ROW_SCHEMA =
+  Schema.of(
+  Field.of(LINE_FIELD_NAME, FieldType.STRING),
+  Field.of(ERROR_FIELD_NAME, FieldType.STRING));
+

Review comment:
   It would be nicer to make these field names configurable, though with 
defaults.
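
A language-neutral sketch of that suggestion, written in Python rather than against the Java class under review: the function name `parse_with_dead_letter` and its keyword parameters are hypothetical, while the default names `line` and `err` mirror the constants in the diff.

```python
import json


def parse_with_dead_letter(lines, line_field="line", error_field="err"):
    """Split JSON lines into parsed rows and dead-letter records.

    The dead-letter field names are configurable but default to the
    names used in the diff above.
    """
    parsed, dead_letters = [], []
    for line in lines:
        try:
            parsed.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Keep the original input and the error instead of failing.
            dead_letters.append({line_field: line, error_field: str(exc)})
    return parsed, dead_letters


rows, failures = parse_with_dead_letter(['{"name": "a"}', '{"name": '])
_, renamed = parse_with_dead_letter(
    ['{"name": '], line_field="raw", error_field="reason")
```

Callers that are happy with the defaults pass nothing extra, and callers whose sink expects other column names override only what they need.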

##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
##
@@ -116,4 +157,65 @@ private ObjectMapper objectMapper() {
   return this.objectMapper;
 }
   }
+
+  static class JsonToRowWithFailureCaptureFn
+  extends PTransform, PCollectionTuple> {

Review comment:
   I think it would be cleaner to wrap this in a custom result class and 
not expose the TupleTags to users. Look at 
org.apache.beam.sdk.io.gcp.bigquery.WriteResult for an example.
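
The shape of that idea, sketched in Python rather than Java and not modeled on the actual `WriteResult` API: `ParseResult`, `rows()`, `failed_lines()`, and `parse_lines` are hypothetical names. Callers reach both outputs through named accessors instead of looking them up by tag.

```python
import json


class ParseResult:
    """Hypothetical wrapper that hides how the two outputs are stored."""

    def __init__(self, rows, failures):
        self._rows = rows
        self._failures = failures

    def rows(self):
        # Successfully parsed records.
        return list(self._rows)

    def failed_lines(self):
        # (original line, error message) pairs for inputs that did not parse.
        return list(self._failures)


def parse_lines(lines):
    """Parse JSON lines and return both outcomes in one result object."""
    rows, failures = [], []
    for line in lines:
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as exc:
            failures.append((line, str(exc)))
    return ParseResult(rows, failures)


result = parse_lines(['{"id": 1}', "not json"])
```

Because the storage stays private, the internal representation could change later without breaking callers that only use the accessors.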

##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/JsonToRow.java
##
@@ -116,4 +157,65 @@ private ObjectMapper objectMapper() {
   return this.objectMapper;
 }
   }
+
+  static class JsonToRowWithFailureCaptureFn
+  extends PTransform, PCollectionTuple> {
+private transient volatile @Nullable ObjectMapper objectMapper;
+private Schema schema;
+private static final String METRIC_NAMESPACE = "JsonToRowFn";
+private static final String DEAD_LETTER_METRIC_NAME = 
"JsonToRowFn_ParseFailure";
+
+private Distribution jsonConversionErrors =
+Metrics.distribution(METRIC_NAMESPACE, DEAD_LETTER_METRIC_NAME);
+
+public static final TupleTag main = MAIN_TUPLE_TAG;
+public static final TupleTag deadLetter = DEAD_LETTER_TUPLE_TAG;
+
+PCollection deadLetterCollection;
+
+static JsonToRowWithFailureCaptureFn forSchema(Schema rowSchema) {
+  // Throw exception if this schema is not supported by RowJson
+  RowJson.verifySchemaSupported(rowSchema);
+  return new JsonToRowWithFailureCaptureFn(rowSchema);
+}
+
+private JsonToRowWithFailureCaptureFn(Schema schema) {
+  this.schema = schema;
+}
+
+@Override
+public PCollectionTuple expand(PCollection jsonStrings) {
+
+  return jsonStrings.apply(
+  ParDo.of(
+  new DoFn() {
+@ProcessElement
+public void processElement(ProcessContext context) {

Review comment:
   Why not use injected parameters instead of ProcessContext?