[jira] [Resolved] (BEAM-9079) release: gradle release task doesn't push commits upstream

2020-01-09 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-9079.
---
Fix Version/s: Not applicable
   Resolution: Not A Bug

This is intended, per the linked Jira.

> release: gradle release task doesn't push commits upstream
> --
>
> Key: BEAM-9079
> URL: https://issues.apache.org/jira/browse/BEAM-9079
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
> Fix For: Not applicable
>
>
> The release guide and build_release_candidate.sh both mention running 
> "gradlew release". It should be committing 2 commits and 1 RC tag.
> For example in 2.10.0:
> https://github.com/apache/beam/commit/34f9c131b908453a753098e95f2d2fa8e236d188
> https://github.com/apache/beam/commit/b87b09b0339e189a0a1e344dba6abce826a12b03
> https://github.com/apache/beam/commits/v2.10.0-RC3
> This seems to be happening locally for me, but it doesn't get pushed to 
> github.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369611
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 04:26
Start Date: 10/Jan/20 04:26
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365068235
 
 

 ##
 File path: sdks/python/apache_beam/pvalue.py
 ##
 @@ -82,7 +82,7 @@ class PValue(object):
   """
 
   def __init__(self,
-   pipeline,  # type: Pipeline
+   pipeline,  # type: Optional[Pipeline]
 
 Review comment:
   > Is it possible to instead suppress the warning at the callsite?
   
   yes, if the state where the pipeline is `None` is short-lived then this is 
probably preferable to adding lots more assertions.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369611)
Time Spent: 46h 40m  (was: 46.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 46h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9079) release: gradle release task doesn't push commits upstream

2020-01-09 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012445#comment-17012445
 ] 

Kenneth Knowles commented on BEAM-9079:
---

Notably, the release scripts have needed revision almost every release. They 
aren't at a level of reliability where we should have them modify the repo. And 
it isn't necessary. But it does seem I did not update the release guide 
corresponding to the changes to the scripts. Sorry!

> release: gradle release task doesn't push commits upstream
> --
>
> Key: BEAM-9079
> URL: https://issues.apache.org/jira/browse/BEAM-9079
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> The release guide and build_release_candidate.sh both mention running 
> "gradlew release". It should be committing 2 commits and 1 RC tag.
> For example in 2.10.0:
> https://github.com/apache/beam/commit/34f9c131b908453a753098e95f2d2fa8e236d188
> https://github.com/apache/beam/commit/b87b09b0339e189a0a1e344dba6abce826a12b03
> https://github.com/apache/beam/commits/v2.10.0-RC3
> This seems to be happening locally for me, but it doesn't get pushed to 
> github.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6595) build_release_candidate.sh should not push to apache org on github

2020-01-09 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012444#comment-17012444
 ] 

Kenneth Knowles commented on BEAM-6595:
---

What has happened is that we have been releasing RCs where the exact commit 
with the changed version is not pushed to GitHub.

> build_release_candidate.sh should not push to apache org on github
> --
>
> Key: BEAM-6595
> URL: https://issues.apache.org/jira/browse/BEAM-6595
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, the build_release_candidate.sh does many things beyond the build
>  - Edits files in place to update the version from SNAPSHOT to non-SNAPSHOT
>  - Makes a local commit
>  - Pushes commits to release branch
>  - Reverts on failure, pushes those to release branch
> Instead, the release manager should determine what gets pushed. It is less 
> fragile of a process and avoids cruft getting pushed and churning the branch. 
> The only thing the plugin is really good for is flipping SNAPSHOT away and 
> back. And it isn't even that great because that's Java only and other 
> languages are at non-SNAPSHOT anyhow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6595) build_release_candidate.sh should not push to apache org on github

2020-01-09 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012440#comment-17012440
 ] 

Kenneth Knowles commented on BEAM-6595:
---

See 
https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8%40%3Cdev.beam.apache.org%3E

> build_release_candidate.sh should not push to apache org on github
> --
>
> Key: BEAM-6595
> URL: https://issues.apache.org/jira/browse/BEAM-6595
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, the build_release_candidate.sh does many things beyond the build
>  - Edits files in place to update the version from SNAPSHOT to non-SNAPSHOT
>  - Makes a local commit
>  - Pushes commits to release branch
>  - Reverts on failure, pushes those to release branch
> Instead, the release manager should determine what gets pushed. It is less 
> fragile of a process and avoids cruft getting pushed and churning the branch. 
> The only thing the plugin is really good for is flipping SNAPSHOT away and 
> back. And it isn't even that great because that's Java only and other 
> languages are at non-SNAPSHOT anyhow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9079) release: gradle release task doesn't push commits upstream

2020-01-09 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012439#comment-17012439
 ] 

Kenneth Knowles commented on BEAM-9079:
---

We should really not bother with the gradle release plugin. It doesn't add 
anything, and its way of operating is against how git really can work best.

https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8%40%3Cdev.beam.apache.org%3E

> release: gradle release task doesn't push commits upstream
> --
>
> Key: BEAM-9079
> URL: https://issues.apache.org/jira/browse/BEAM-9079
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> The release guide and build_release_candidate.sh both mention running 
> "gradlew release". It should be committing 2 commits and 1 RC tag.
> For example in 2.10.0:
> https://github.com/apache/beam/commit/34f9c131b908453a753098e95f2d2fa8e236d188
> https://github.com/apache/beam/commit/b87b09b0339e189a0a1e344dba6abce826a12b03
> https://github.com/apache/beam/commits/v2.10.0-RC3
> This seems to be happening locally for me, but it doesn't get pushed to 
> github.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8685) Beam Dependency Update Request: com.google.auth:google-auth-library-oauth2-http

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8685?focusedWorklogId=369604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369604
 ]

ASF GitHub Bot logged work on BEAM-8685:


Author: ASF GitHub Bot
Created on: 10/Jan/20 04:08
Start Date: 10/Jan/20 04:08
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10508: [BEAM-8685] 
sdks/java: google_auth_version 0.19.0
URL: https://github.com/apache/beam/pull/10508#issuecomment-572864568
 
 
   21 successful checks
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369604)
Time Spent: 3.5h  (was: 3h 20m)

> Beam Dependency Update Request: 
> com.google.auth:google-auth-library-oauth2-http
> ---
>
> Key: BEAM-8685
> URL: https://issues.apache.org/jira/browse/BEAM-8685
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:27.324449 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.18.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:05:03.844285 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.18.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:10:30.864371 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.18.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:09:38.646889 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.18.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-23 12:09:39.967215 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.19.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-30 14:05:10.534268 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.19.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-06 12:08:45.451960 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.19.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369603
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 04:07
Start Date: 10/Jan/20 04:07
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r364991139
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -808,7 +810,7 @@ class AppliedPTransform(object):
 
   def __init__(self,
parent,
-   transform,  # type: ptransform.PTransform
+   transform,  # type: Optional[ptransform.PTransform]
 
 Review comment:
   On the second issue, I should point that mypy knows that `proto.spec` is not 
`None` when we call `PTransform. from_runner_api(proto.spec, context)` (because 
`proto.spec` is always non-None), and we could _almost_ use that knowledge to 
solve this problem with `@overload`s of `PTransform.from_runner_api()`, like 
this:
   
   ```python
 @overload
 @classmethod
 def from_runner_api(cls,
 proto,  # type: None
 context  # type: PipelineContext
):
   # type: (...) -> None
   pass
   
 @overload
 @classmethod
 def from_runner_api(cls,
 proto,  # type: beam_runner_api_pb2.FunctionSpec
 context  # type: PipelineContext
):
   # type: (...) -> PTransform
   pass
 
 @classmethod
 def from_runner_api(cls, proto, context):
   if proto is None or not proto.urn:
 return None
   ```
   
   Unfortunately, typing can't track whether the value of `proto.urn` is an 
empty string, which means that the above overload strategy doesn't actually 
work.   Is there any chance that this could be changed to `if proto is None or 
proto.urn is None`?
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369603)
Time Spent: 46.5h  (was: 46h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 46.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369602=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369602
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 04:05
Start Date: 10/Jan/20 04:05
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r364988919
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -808,7 +810,7 @@ class AppliedPTransform(object):
 
   def __init__(self,
parent,
-   transform,  # type: ptransform.PTransform
+   transform,  # type: Optional[ptransform.PTransform]
 
 Review comment:
   in `Pipeline.__init__`:
   
   ```python
   # Stack of transforms generated by nested apply() calls. The stack will
   # contain a root node as an enclosing (parent) node for top transforms.
   self.transforms_stack = [AppliedPTransform(None, None, '', None)]
   ```
   
   Best way to deal with this may be a special `RootAppliedTransform` subclass.
   
   ---
   
   It can also possibly be `None` in `AppliedPTransform.from_runner_api()`:
   
   ```python
   transform = ptransform.PTransform.from_runner_api(proto.spec, context)
   result = AppliedPTransform(
   parent=None,
   transform=transform,
   full_label=proto.unique_name,
   inputs=main_inputs)
   ```
   
   This is because `PTransform.from_runner_api()` returns `Optional[PTransform]`
   
   ```python
 @classmethod
 def from_runner_api(cls,
 proto,  # type: 
Optional[beam_runner_api_pb2.FunctionSpec]
 context  # type: PipelineContext
):
   # type: (...) -> Optional[PTransform]
   if proto is None or not proto.urn:
 return None
   parameter_type, constructor = cls._known_urns[proto.urn]
   ```
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369602)
Time Spent: 46h 20m  (was: 46h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 46h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369601=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369601
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 04:04
Start Date: 10/Jan/20 04:04
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365065181
 
 

 ##
 File path: sdks/python/apache_beam/runners/common.py
 ##
 @@ -437,13 +439,15 @@ def invoke_start_bundle(self):
 # type: () -> None
 """Invokes the DoFn.start_bundle() method.
 """
+assert self.output_processor is not None
 
 Review comment:
   Ok, how about I change these to `type: ignore` and you create a ticket to 
remove `output_processor`?  I would make the ticket but I don't think I have 
enough context to explain it properly.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369601)
Time Spent: 46h 10m  (was: 46h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 46h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9002) test_flatten_same_pcollections (apache_beam.transforms.ptransform_test.PTransformTest) does not work in Streaming VR suite on Dataflow

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9002?focusedWorklogId=369599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369599
 ]

ASF GitHub Bot logged work on BEAM-9002:


Author: ASF GitHub Bot
Created on: 10/Jan/20 04:03
Start Date: 10/Jan/20 04:03
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #10550: [BEAM-9002] 
Add test_flatten_same_pcollections to fnapi runner
URL: https://github.com/apache/beam/pull/10550
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369596
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 03:36
Start Date: 10/Jan/20 03:36
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #10190: [BEAM-8575] Added 
two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#issuecomment-572858348
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369596)
Time Spent: 47h 40m  (was: 47.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 47h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369595=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369595
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 03:36
Start Date: 10/Jan/20 03:36
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #10190: [BEAM-8575] Added 
two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#issuecomment-572858605
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369595)
Time Spent: 47.5h  (was: 47h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 47.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369594
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 03:35
Start Date: 10/Jan/20 03:35
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #10190: [BEAM-8575] Added 
two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#issuecomment-572858348
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369594)
Time Spent: 47h 20m  (was: 47h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 47h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369593=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369593
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 03:27
Start Date: 10/Jan/20 03:27
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365059540
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -485,6 +520,113 @@ def test_fixed_windows_combine(self):
   equal_to([('c', 3), ('c', 10), ('d', 5), ('d', 17)]),
   label='sum per key')
 
+  # Test that three different kinds of metrics work with a customized
+  # CounterIncrememtingCombineFn.
+  def test_custormized_counters_in_combine_fn(self):
+p = TestPipeline()
+input = (p
+ | beam.Create([('c', 'i'),
+('c', 'go'),
+('c', 'run'),
+('d', 'beam'),
+('d', 'tests')]))
+
+# The result of concatenating all values regardless of key.
+global_concat = (input
+ | beam.Values()
+ | beam.CombineGlobally(CounterIncrememtingCombineFn())
+ | "sort global result" >> _SortLists)
 
 Review comment:
   Deleted. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369593)
Time Spent: 47h 10m  (was: 47h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 47h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9002) test_flatten_same_pcollections (apache_beam.transforms.ptransform_test.PTransformTest) does not work in Streaming VR suite on Dataflow

2020-01-09 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9002:
---
Component/s: (was: sdk-py-core)
 runner-dataflow

> test_flatten_same_pcollections 
> (apache_beam.transforms.ptransform_test.PTransformTest) does not work in 
> Streaming VR suite on Dataflow
> --
>
> Key: BEAM-9002
> URL: https://issues.apache.org/jira/browse/BEAM-9002
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
>
> Per investigation in https://issues.apache.org/jira/browse/BEAM-8877, the 
> test times out and was recently added to VR test suite.
> [~liumomo315], I will sickbay this test for streaming, could you please help 
> triage the failure?
> Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369592=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369592
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 03:20
Start Date: 10/Jan/20 03:20
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365055230
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
+
+  def merge_accumulators(self, accs):
+return ''.join(sorted(''.join(accs)))
+
+  def extract_output(self, acc):
+return acc
 
 Review comment:
   Once sorted, acc became a list again. So I use ''.join() to convert it back 
to a string.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369592)
Time Spent: 47h  (was: 46h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 47h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369585
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 03:06
Start Date: 10/Jan/20 03:06
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365049352
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
 
 Review comment:
   That doesn't work.
   This is quite tricky.
   The input strings are automatically taken as a list. For example, for the 
input item ('key1', 'abc'), the string 'abc' is processed as ['a', 'b', 'c'].
   Therefore, if we don't use ''.join() to convert the list to a string, we 
will get this result:
[('key2', ['u', 'u', 'v', 'v', 'x', 'x', 'y', 'y', 'z']), ('key1', ['a', 
'a', 'a', 'b', 'b', 'c'])],
   
   with modified code:
 def create_accumulator(self):
 return ''
   
 def add_input(self, acc, element):
 return acc + element
   
 def merge_accumulators(self, accs):
 return [ele for acc in accs for ele in acc]
   
 def extract_output(self, acc):
 return sorted(acc)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369585)
Time Spent: 46h 50m  (was: 46h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 46h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369584
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 03:05
Start Date: 10/Jan/20 03:05
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365049352
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
 
 Review comment:
   That doesn't work.
   This is quite tricky.
   The input strings are automatically taken as a list. For example, for the 
input item ('key1', 'abc'), the string 'abc' is processed ['a', 'b', 'c'].
   Therefore, if we don't use ''.join() to convert the list to a string, we 
will get this result:
[('key2', ['u', 'u', 'v', 'v', 'x', 'x', 'y', 'y', 'z']), ('key1', ['a', 
'a', 'a', 'b', 'b', 'c'])],
   
   with modified code:
 def create_accumulator(self):
 return ''
   
 def add_input(self, acc, element):
 return acc + element
   
 def merge_accumulators(self, accs):
 return [ele for acc in accs for ele in acc]
   
 def extract_output(self, acc):
 return sorted(acc)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369584)
Time Spent: 46h 40m  (was: 46.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 46h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369583=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369583
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 03:01
Start Date: 10/Jan/20 03:01
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365055230
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
+
+  def merge_accumulators(self, accs):
+return ''.join(sorted(''.join(accs)))
+
+  def extract_output(self, acc):
+return acc
 
 Review comment:
   Once sorted, acc became a list again. ''.join() is used to convert it back 
to a string.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369583)
Time Spent: 46.5h  (was: 46h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 46.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369582=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369582
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 03:00
Start Date: 10/Jan/20 03:00
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365055059
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
+
+  def merge_accumulators(self, accs):
+return ''.join(sorted(''.join(accs)))
+
+  def extract_output(self, acc):
+return acc
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369582)
Time Spent: 46h 20m  (was: 46h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 46h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369572=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369572
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 02:37
Start Date: 10/Jan/20 02:37
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365049352
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
 
 Review comment:
   That doesn't work.
   This is quick tricky.
   The input strings are automatically taken as a list. For example, for the 
input item ('key1', 'abc'), the string 'abc' is processed ['a', 'b', 'c'].
   Therefore, if we don't use ''.join() to convert the list to a string, we 
will get this result:
[('key2', ['u', 'u', 'v', 'v', 'x', 'x', 'y', 'y', 'z']), ('key1', ['a', 
'a', 'a', 'b', 'b', 'c'])],
   
   with modified code:
 def create_accumulator(self):
 return ''
   
 def add_input(self, acc, element):
 return acc + element
   
 def merge_accumulators(self, accs):
 return [ele for acc in accs for ele in acc]
   
 def extract_output(self, acc):
 return sorted(acc)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369572)
Time Spent: 46h 10m  (was: 46h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 46h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369570
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 02:36
Start Date: 10/Jan/20 02:36
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365049352
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
 
 Review comment:
   That doesn't work.
   The input strings are automatically taken as a list. For example, for the 
input item ('key1', 'abc'), the string 'abc' is processed ['a', 'b', 'c'].
   Therefore, if we don't use ''.join() to convert the list to a string, we 
will get this result:
[('key2', ['u', 'u', 'v', 'v', 'x', 'x', 'y', 'y', 'z']), ('key1', ['a', 
'a', 'a', 'b', 'b', 'c'])],
   
   with modified code:
 def create_accumulator(self):
 return ''
   
 def add_input(self, acc, element):
 return acc + element
   
 def merge_accumulators(self, accs):
 return [ele for acc in accs for ele in acc]
   
 def extract_output(self, acc):
 return sorted(acc)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369570)
Time Spent: 45h 50m  (was: 45h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 45h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369569=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369569
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 02:36
Start Date: 10/Jan/20 02:36
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365049352
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
 
 Review comment:
   That doesn't work.
   The input strings are automatically taken as a list. For example, for the 
input item ('key1', 'abc'), the string 'abc' is processed ['a', 'b', 'c'].
   Therefore, if we don't use ''.join() to convert the list to a string, we 
will get this result:
[('key2', ['u', 'u', 'v', 'v', 'x', 'x', 'y', 'y', 'z']), ('key1', ['a', 
'a', 'a', 'b', 'b', 'c'])],
   
   with modified code:
 def create_accumulator(self):
   return ''
   
 def add_input(self, acc, element):
   return acc + element
   
 def merge_accumulators(self, accs):
   return [ele for acc in accs for ele in acc]
   
 def extract_output(self, acc):
   return sorted(acc)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369569)
Time Spent: 45h 40m  (was: 45.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 45h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369565
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 02:29
Start Date: 10/Jan/20 02:29
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365049352
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
 
 Review comment:
   That doesn't work.
   The input strings are automatically taken as a list. For example, for the 
input item ('key1', 'abc'), the string 'abc' is processed ['a', 'b', 'c'].
   Therefore, if we don't use ''.join() to convert the list to a string, we 
will get this result:
[('key2', ['u', 'u', 'v', 'v', 'x', 'x', 'y', 'y', 'z']), ('key1', ['a', 
'a', 'a', 'b', 'b', 'c'])].
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369565)
Time Spent: 45.5h  (was: 45h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 45.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4287) SplittableDoFn: splitAtFraction() API for Java

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4287?focusedWorklogId=369543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369543
 ]

ASF GitHub Bot logged work on BEAM-4287:


Author: ASF GitHub Bot
Created on: 10/Jan/20 02:00
Start Date: 10/Jan/20 02:00
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10302: [BEAM-4287] Fix 
to use the residual instead of the current restriction on process continuations.
URL: https://github.com/apache/beam/pull/10302
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369543)
Time Spent: 3.5h  (was: 3h 20m)

> SplittableDoFn: splitAtFraction() API for Java
> --
>
> Key: BEAM-4287
> URL: https://issues.apache.org/jira/browse/BEAM-4287
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> SDF currently only has a checkpoint() API. This Jira is about adding the 
> splitAtFraction() API and its support in runners that support the respective 
> feature for sources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7861) Make it easy to change between multi-process and multi-thread mode for Python Direct runners

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7861?focusedWorklogId=369542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369542
 ]

ASF GitHub Bot logged work on BEAM-7861:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:58
Start Date: 10/Jan/20 01:58
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10536: [BEAM-7861] Add 
direct_running_mode option for direct runners to switch between multi_threading 
and multi_processing easily
URL: https://github.com/apache/beam/pull/10536
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369542)
Time Spent: 2h 40m  (was: 2.5h)

> Make it easy to change between multi-process and multi-thread mode for Python 
> Direct runners
> 
>
> Key: BEAM-7861
> URL: https://issues.apache.org/jira/browse/BEAM-7861
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> BEAM-3645 makes it possible to run a map task parallel.
> However, users need to change runner when switch between multithreading and 
> multiprocessing mode.
> We want to add a flag (ex: --use-multiprocess) to make the switch easy 
> without changing the runner each time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=369541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369541
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:55
Start Date: 10/Jan/20 01:55
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10548: [BEAM-8337] Fix 
Flink version munging.
URL: https://github.com/apache/beam/pull/10548
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369541)
Time Spent: 5h 50m  (was: 5h 40m)

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=369540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369540
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:53
Start Date: 10/Jan/20 01:53
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10549: [BEAM-8337] 
Hard-code Flink versions.
URL: https://github.com/apache/beam/pull/10549
 
 
   Also publish the RC images instead of rebuilding.
   
   There were problems with the bash code for munging Flink versions, so I got 
rid of it and replaced with a hard-coded list. The release manager will have to 
manually validate that they're correct, but I think that's a good idea anyway.
   
   @Hannah-Jiang @udim @mxm 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)

[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369539=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369539
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:51
Start Date: 10/Jan/20 01:51
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365042199
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369539)
Time Spent: 45h 20m  (was: 45h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 45h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369538
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:48
Start Date: 10/Jan/20 01:48
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365041530
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -485,6 +520,113 @@ def test_fixed_windows_combine(self):
   equal_to([('c', 3), ('c', 10), ('d', 5), ('d', 17)]),
   label='sum per key')
 
+  # Test that three different kinds of metrics work with a customized
+  # CounterIncrememtingCombineFn.
+  def test_custormized_counters_in_combine_fn(self):
+p = TestPipeline()
+input = (p
+ | beam.Create([('c', 'i'),
+('c', 'go'),
+('c', 'run'),
+('d', 'beam'),
+('d', 'tests')]))
+
+# The result of concatenating all values regardless of key.
+global_concat = (input
+ | beam.Values()
+ | beam.CombineGlobally(CounterIncrememtingCombineFn())
+ | "sort global result" >> _SortLists)
+
+# The (key, concatenated_string) pairs for all keys.
+concat_per_key = (input
+  | beam.CombinePerKey(CounterIncrememtingCombineFn())
+  | "sort per key result" >> _SortLists)
+
+result = p.run()
+result.wait_until_finish()
+
+# Verify the concatenated strings are correct.
+expected_concat_per_key = [('c', 'ginoru'), ('d', 'abeemsstt')]
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369538)
Time Spent: 45h 10m  (was: 45h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 45h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369532=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369532
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:39
Start Date: 10/Jan/20 01:39
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365039681
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/statesampler_slow.py
 ##
 @@ -80,8 +80,7 @@ def stop(self):
 
   def reset(self):
 # type: () -> None
-for state in self._states_by_name.values():
-  state.nsecs = 0
+pass
 
 Review comment:
   I think this was discussed a bit in the original PR.  
`statesampler_slow.StateSampler` does not have a `_states_by_name` attribute, 
so presumably this is not called. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369532)
Time Spent: 46h  (was: 45h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 46h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369531
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:38
Start Date: 10/Jan/20 01:38
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365039681
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/statesampler_slow.py
 ##
 @@ -80,8 +80,7 @@ def stop(self):
 
   def reset(self):
 # type: () -> None
-for state in self._states_by_name.values():
-  state.nsecs = 0
+pass
 
 Review comment:
   I think this was discussed a bit in the original PR.  
`statesampler_slow.StateSampler` does not have a `_states_by_name` , so 
presumably this is not called. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369531)
Time Spent: 45h 50m  (was: 45h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 45h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369529
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:37
Start Date: 10/Jan/20 01:37
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365039417
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -265,7 +265,7 @@ def finish(self):
 
 class _StateBackedIterable(object):
   def __init__(self,
-   state_handler,
+   state_handler,  # type: sdk_worker.CachingStateHandler
 
 Review comment:
   `CachingStateHandler` does not inherit from `StateHandler` nor does it 
implement its abstract methods
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369529)
Time Spent: 45h 40m  (was: 45.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 45h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369528=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369528
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:36
Start Date: 10/Jan/20 01:36
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365039298
 
 

 ##
 File path: sdks/python/apache_beam/transforms/environments.py
 ##
 @@ -55,12 +74,52 @@ class Environment(object):
   For internal use only. No backwards compatibility guarantees.
   """
 
-  _known_urns = {}
-  _urn_to_env_cls = {}
+  _known_urns = {}  # type: Dict[str, Tuple[Optional[type], ConstructorFn]]
+  _urn_to_env_cls = {}  # type: Dict[str, type]
 
   def to_runner_api_parameter(self, context):
+# type: (PipelineContext) -> Tuple[str, Optional[Union[message.Message, 
bytes, str]]]
 raise NotImplementedError
 
+
+  @classmethod
+  @overload
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type,  # type: Type[T]
+  ):
+# type: (...) -> Callable[[Union[type, Callable[[T, PipelineContext], 
Any]]], Callable[[T, PipelineContext], Any]]
+pass
+
+  @classmethod
+  @overload
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type,  # type: None
+  ):
+# type: (...) -> Callable[[Union[type, Callable[[bytes, PipelineContext], 
Any]]], Callable[[bytes, PipelineContext], Any]]
+pass
+
+  @classmethod
+  @overload
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type,  # type: Type[T]
+   constructor  # type: Callable[[T, PipelineContext], Any]
+  ):
+# type: (...) -> None
 
 Review comment:
   it could be, but unions can be a real pain in the ass.  the overloads allow 
mypy to distinguish the result type based on the signature used.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369528)
Time Spent: 45.5h  (was: 45h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 45.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369527
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:35
Start Date: 10/Jan/20 01:35
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365039017
 
 

 ##
 File path: sdks/python/apache_beam/transforms/environments.py
 ##
 @@ -55,12 +74,52 @@ class Environment(object):
   For internal use only. No backwards compatibility guarantees.
   """
 
-  _known_urns = {}
-  _urn_to_env_cls = {}
+  _known_urns = {}  # type: Dict[str, Tuple[Optional[type], ConstructorFn]]
+  _urn_to_env_cls = {}  # type: Dict[str, type]
 
   def to_runner_api_parameter(self, context):
+# type: (PipelineContext) -> Tuple[str, Optional[Union[message.Message, 
bytes, str]]]
 raise NotImplementedError
 
+
+  @classmethod
+  @overload
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type,  # type: Type[T]
+  ):
+# type: (...) -> Callable[[Union[type, Callable[[T, PipelineContext], 
Any]]], Callable[[T, PipelineContext], Any]]
+pass
+
+  @classmethod
+  @overload
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type,  # type: None
 
 Review comment:
   that would be nice from a typing simplicity standpoint.  not sure about the 
implications of that though.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369527)
Time Spent: 45h 20m  (was: 45h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 45h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369526=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369526
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:34
Start Date: 10/Jan/20 01:34
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365038793
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##
 @@ -511,14 +540,14 @@ class GrpcStateHandlerFactory(StateHandlerFactory):
   """
 
   def __init__(self, state_cache, credentials=None):
-self._state_handler_cache = {}  # type: Dict[str, GrpcStateHandler]
+self._state_handler_cache = {}  # type: Dict[str, CachingStateHandler]
 
 Review comment:
   `CachingStateHandler` does not inherit from `StateHandler` nor does it 
implement its abstract methods  :(
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369526)
Time Spent: 45h 10m  (was: 45h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 45h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369525
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:33
Start Date: 10/Jan/20 01:33
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365038623
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -1130,19 +1132,26 @@ def process(self, windowed_value):
 
 @BeamTransformFactory.register_urn(
 DATA_INPUT_URN, beam_fn_api_pb2.RemoteGrpcPort)
-def create(factory, transform_id, transform_proto, grpc_port, consumers):
+def create_source_runner(
 
 Review comment:
   That's an idea. It would still add a level of indirection and boilerplate...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369525)
Time Spent: 45h  (was: 44h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 45h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369522
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:28
Start Date: 10/Jan/20 01:28
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365037593
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -1130,19 +1132,26 @@ def process(self, windowed_value):
 
 @BeamTransformFactory.register_urn(
 DATA_INPUT_URN, beam_fn_api_pb2.RemoteGrpcPort)
-def create(factory, transform_id, transform_proto, grpc_port, consumers):
+def create_source_runner(
 
 Review comment:
   I think the best way to reduce the noise here would be make the registration 
more object oriented.  
   
   Quick sketch:
   
   ```python
   class OpCreator(Generic[OperatorT]):
 def __init__(
 self,
 factory,  # type: BeamTransformFactory
 transform_id,  # type: str
 transform_proto,  # type: beam_runner_api_pb2.PTransform
 consumers  # type: Dict[str, List[operations.Operation]]
 ):
   self.factory = factory
   self.transform_id = transform_id
   self.transform_proto = transform_proto
   self.consumers = consumers
   
 def create(self, parameter):
   # type: (Any) -> OperatorT
   raise NotImplementedError
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369522)
Time Spent: 44h 50m  (was: 44h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 44h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369517
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:24
Start Date: 10/Jan/20 01:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10190: [BEAM-8575] 
Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365036430
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
 
 Review comment:
   No need to sort here, just return acc + element.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369517)
Time Spent: 44.5h  (was: 44h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 44.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369521=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369521
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:24
Start Date: 10/Jan/20 01:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10190: [BEAM-8575] 
Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365035848
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -485,6 +520,113 @@ def test_fixed_windows_combine(self):
   equal_to([('c', 3), ('c', 10), ('d', 5), ('d', 17)]),
   label='sum per key')
 
+  # Test that three different kinds of metrics work with a customized
+  # CounterIncrememtingCombineFn.
+  def test_custormized_counters_in_combine_fn(self):
+p = TestPipeline()
+input = (p
+ | beam.Create([('c', 'i'),
+('c', 'go'),
+('c', 'run'),
+('d', 'beam'),
+('d', 'tests')]))
+
+# The result of concatenating all values regardless of key.
+global_concat = (input
+ | beam.Values()
+ | beam.CombineGlobally(CounterIncrememtingCombineFn())
+ | "sort global result" >> _SortLists)
+
+# The (key, concatenated_string) pairs for all keys.
+concat_per_key = (input
+  | beam.CombinePerKey(CounterIncrememtingCombineFn())
+  | "sort per key result" >> _SortLists)
+
+result = p.run()
+result.wait_until_finish()
+
+# Verify the concatenated strings are correct.
+expected_concat_per_key = [('c', 'ginoru'), ('d', 'abeemsstt')]
 
 Review comment:
   Nit: it's really hard to tell from reading this whether it is correct. Maybe 
make the input something like 
   
   ('key1': 'a'), ('key1': 'ab'), .., ('key2', 'xyz'), ...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369521)
Time Spent: 45h  (was: 44h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 45h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369516
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:24
Start Date: 10/Jan/20 01:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10190: [BEAM-8575] 
Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365036214
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
 
 Review comment:
   Maybe call this SortedConcatWithCounters?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369516)
Time Spent: 44.5h  (was: 44h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 44.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369519
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:24
Start Date: 10/Jan/20 01:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10190: [BEAM-8575] 
Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365036559
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
+
+  def merge_accumulators(self, accs):
+return ''.join(sorted(''.join(accs)))
+
+  def extract_output(self, acc):
+return acc
 
 Review comment:
   I'd do the sorting here. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369519)
Time Spent: 44h 40m  (was: 44.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 44h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369520
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:24
Start Date: 10/Jan/20 01:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10190: [BEAM-8575] 
Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365036669
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -485,6 +520,113 @@ def test_fixed_windows_combine(self):
   equal_to([('c', 3), ('c', 10), ('d', 5), ('d', 17)]),
   label='sum per key')
 
+  # Test that three different kinds of metrics work with a customized
+  # CounterIncrememtingCombineFn.
+  def test_custormized_counters_in_combine_fn(self):
+p = TestPipeline()
+input = (p
+ | beam.Create([('c', 'i'),
+('c', 'go'),
+('c', 'run'),
+('d', 'beam'),
+('d', 'tests')]))
+
+# The result of concatenating all values regardless of key.
+global_concat = (input
+ | beam.Values()
+ | beam.CombineGlobally(CounterIncrememtingCombineFn())
+ | "sort global result" >> _SortLists)
 
 Review comment:
   There's no need for _SortLists anywhere here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369520)
Time Spent: 44h 50m  (was: 44h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 44h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=369518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369518
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:24
Start Date: 10/Jan/20 01:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10190: [BEAM-8575] 
Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r365036490
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -50,6 +52,39 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+ counter, distribution, gauge,
+ at the same time concatenating words."""
+
+  def __init__(self):
+beam.CombineFn.__init__(self)
+self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+self.word_lengths_counter = Metrics.counter(
+self.__class__, 'word_lengths')
+self.word_lengths_dist = Metrics.distribution(
+self.__class__, 'word_len_dist')
+self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+return ''
+
+  def add_input(self, acc, element):
+self.word_counter.inc(1)
+self.word_lengths_counter.inc(len(element))
+self.word_lengths_dist.update(len(element))
+self.last_word_len.set(len(element))
+
+# ''.join() converts the list to a string.
+return ''.join(sorted(acc + element))
+
+  def merge_accumulators(self, accs):
+return ''.join(sorted(''.join(accs)))
 
 Review comment:
   Likewise, `return sum(accs, '')`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369518)
Time Spent: 44.5h  (was: 44h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 44.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=369515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369515
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:23
Start Date: 10/Jan/20 01:23
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #10535: [BEAM-5605] 
Add support for executing pair with restriction, split restriction, split and 
size restriction, process element and restriction and process sized element and 
restriction within the Java SDK harness.
URL: https://github.com/apache/beam/pull/10535#discussion_r365025326
 
 

 ##
 File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
 ##
 @@ -162,9 +475,127 @@ public void output(OutputT output, Instant timestamp, 
BoundedWindow window) {
 outputTo(consumers, WindowedValue.of(output, timestamp, window, 
PaneInfo.NO_FIRING));
   }
 };
+switch (context.pTransform.getSpec().getUrn()) {
+  case PTransformTranslation.SPLITTABLE_SPLIT_RESTRICTION_URN:
+this.outputSplitRestrictionReceiver =
+new OutputReceiver() {
+
+  @Override
+  public void output(RestrictionT output) {
+outputTo(
+mainOutputConsumers,
+(WindowedValue)
+
currentElement.withValue(KV.of(currentElement.getValue(), output)));
+  }
+
+  @Override
+  public void outputWithTimestamp(RestrictionT output, Instant 
timestamp) {
+outputTo(
+mainOutputConsumers,
+(WindowedValue)
+WindowedValue.of(
+KV.of(currentElement.getValue(), output),
+timestamp,
+currentWindow,
+currentElement.getPane()));
+  }
+};
+break;
+  case PTransformTranslation.SPLITTABLE_SPLIT_AND_SIZE_RESTRICTIONS_URN:
+this.outputSplitRestrictionReceiver =
+new OutputReceiver() {
+
+  @Override
+  public void output(RestrictionT output) {
+RestrictionTracker outputTracker =
+doFnInvoker.invokeNewTracker(output);
+outputTo(
+mainOutputConsumers,
+(WindowedValue)
+currentElement.withValue(
+KV.of(
+KV.of(currentElement.getValue(), output),
+outputTracker instanceof HasSize
+? ((HasSize) outputTracker).getSize()
+: 1.0)));
+  }
+
+  @Override
+  public void outputWithTimestamp(RestrictionT output, Instant 
timestamp) {
+outputTo(
+mainOutputConsumers,
+(WindowedValue)
+WindowedValue.of(
+KV.of(currentElement.getValue(), output),
 
 Review comment:
   Shouldn't this method also produce sizes, like the `output` method above?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369515)
Time Spent: 4h 40m  (was: 4.5h)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369504=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369504
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:13
Start Date: 10/Jan/20 01:13
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r364979889
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -1130,19 +1132,26 @@ def process(self, windowed_value):
 
 @BeamTransformFactory.register_urn(
 DATA_INPUT_URN, beam_fn_api_pb2.RemoteGrpcPort)
-def create(factory, transform_id, transform_proto, grpc_port, consumers):
+def create_source_runner(
 
 Review comment:
   These registered constructors (necessarily) all have the same signature. Is 
there a way to declare that in a common place? (The return type is always 
Operation, what type is not ever introspected.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369504)
Time Spent: 44h 10m  (was: 44h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 44h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369506
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:13
Start Date: 10/Jan/20 01:13
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365006782
 
 

 ##
 File path: sdks/python/apache_beam/transforms/environments.py
 ##
 @@ -55,12 +74,52 @@ class Environment(object):
   For internal use only. No backwards compatibility guarantees.
   """
 
-  _known_urns = {}
-  _urn_to_env_cls = {}
+  _known_urns = {}  # type: Dict[str, Tuple[Optional[type], ConstructorFn]]
+  _urn_to_env_cls = {}  # type: Dict[str, type]
 
   def to_runner_api_parameter(self, context):
+# type: (PipelineContext) -> Tuple[str, Optional[Union[message.Message, 
bytes, str]]]
 raise NotImplementedError
 
+
+  @classmethod
+  @overload
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type,  # type: Type[T]
+  ):
+# type: (...) -> Callable[[Union[type, Callable[[T, PipelineContext], 
Any]]], Callable[[T, PipelineContext], Any]]
+pass
+
+  @classmethod
+  @overload
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type,  # type: None
+  ):
+# type: (...) -> Callable[[Union[type, Callable[[bytes, PipelineContext], 
Any]]], Callable[[bytes, PipelineContext], Any]]
+pass
+
+  @classmethod
+  @overload
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type,  # type: Type[T]
+   constructor  # type: Callable[[T, PipelineContext], Any]
+  ):
+# type: (...) -> None
 
 Review comment:
   Can the return type be a union rather than a third and fourth overload?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369506)
Time Spent: 44.5h  (was: 44h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 44.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369507=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369507
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:13
Start Date: 10/Jan/20 01:13
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365003360
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##
 @@ -511,14 +540,14 @@ class GrpcStateHandlerFactory(StateHandlerFactory):
   """
 
   def __init__(self, state_cache, credentials=None):
-self._state_handler_cache = {}  # type: Dict[str, GrpcStateHandler]
+self._state_handler_cache = {}  # type: Dict[str, CachingStateHandler]
 
 Review comment:
   Should this just be a map to StateHandler? (And below.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369507)
Time Spent: 44.5h  (was: 44h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 44.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369509
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:13
Start Date: 10/Jan/20 01:13
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365006498
 
 

 ##
 File path: sdks/python/apache_beam/transforms/environments.py
 ##
 @@ -55,12 +74,52 @@ class Environment(object):
   For internal use only. No backwards compatibility guarantees.
   """
 
-  _known_urns = {}
-  _urn_to_env_cls = {}
+  _known_urns = {}  # type: Dict[str, Tuple[Optional[type], ConstructorFn]]
+  _urn_to_env_cls = {}  # type: Dict[str, type]
 
   def to_runner_api_parameter(self, context):
+# type: (PipelineContext) -> Tuple[str, Optional[Union[message.Message, 
bytes, str]]]
 raise NotImplementedError
 
+
+  @classmethod
+  @overload
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type,  # type: Type[T]
+  ):
+# type: (...) -> Callable[[Union[type, Callable[[T, PipelineContext], 
Any]]], Callable[[T, PipelineContext], Any]]
+pass
+
+  @classmethod
+  @overload
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type,  # type: None
 
 Review comment:
   Idea: could we unify these and update all callers that currently pass None 
to pass bytes?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369509)
Time Spent: 44h 40m  (was: 44.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 44h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369505
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:13
Start Date: 10/Jan/20 01:13
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r364979191
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -265,7 +265,7 @@ def finish(self):
 
 class _StateBackedIterable(object):
   def __init__(self,
-   state_handler,
+   state_handler,  # type: sdk_worker.CachingStateHandler
 
 Review comment:
   Is the 'Caching' part necessary here (even if it always is right now)?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369505)
Time Spent: 44h 20m  (was: 44h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 44h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369508
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 10/Jan/20 01:13
Start Date: 10/Jan/20 01:13
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r365003963
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/statesampler_slow.py
 ##
 @@ -80,8 +80,7 @@ def stop(self):
 
   def reset(self):
 # type: () -> None
-for state in self._states_by_name.values():
-  state.nsecs = 0
+pass
 
 Review comment:
   Do states in this case not have a nsecs attribute? Why was this code not hit 
before?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369508)
Time Spent: 44h 40m  (was: 44.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 44h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=369497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369497
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:54
Start Date: 10/Jan/20 00:54
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #10115: [BEAM-8624] 
Implement Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#discussion_r365030515
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/status/BeamWorkerStatusGrpcService.java
 ##
 @@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.status;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ConcurrentSkipListMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusRequest;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusResponse;
+import 
org.apache.beam.model.fnexecution.v1.BeamFnWorkerStatusGrpc.BeamFnWorkerStatusImplBase;
+import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.runners.fnexecution.HeaderAccessor;
+import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.StreamObserver;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A Fn Status service which can collect run-time status information from SDK 
harnesses for
+ * debugging purpose.
+ */
+public class BeamWorkerStatusGrpcService extends BeamFnWorkerStatusImplBase 
implements FnService {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(BeamWorkerStatusGrpcService.class);
+  private static final String DEFAULT_EXCEPTION_RESPONSE =
+  "Error: exception encountered getting status from SDK harness";
+
+  private final HeaderAccessor headerAccessor;
+  private final Map> 
connectedClient =
+  Collections.synchronizedMap(new HashMap<>());
+
+  private BeamWorkerStatusGrpcService(
+  ApiServiceDescriptor apiServiceDescriptor, HeaderAccessor 
headerAccessor) {
+this.headerAccessor = headerAccessor;
+LOG.info("Launched Beam Fn Status service at {}", apiServiceDescriptor);
+  }
+
+  /**
+   * Create new instance of {@link BeamWorkerStatusGrpcService}.
+   *
+   * @param apiServiceDescriptor describes the configuration for the endpoint 
the server will
+   * expose.
+   * @param headerAccessor headerAccessor gRPC header accessor used to obtain 
SDK harness worker id.
+   * @return {@link BeamWorkerStatusGrpcService}
+   */
+  public static BeamWorkerStatusGrpcService create(
+  ApiServiceDescriptor apiServiceDescriptor, HeaderAccessor 
headerAccessor) {
+return new BeamWorkerStatusGrpcService(apiServiceDescriptor, 
headerAccessor);
+  }
+
+  @Override
+  public void close() throws Exception {
+synchronized (connectedClient) {
+  for (CompletableFuture clientFuture : 
connectedClient.values()) {
+if (clientFuture.isDone()) {
+  clientFuture.get().close();
+}
+  }
+  connectedClient.clear();
+}
+  }
+
+  @Override
+  public StreamObserver workerStatus(
+  StreamObserver requestObserver) {
+String workerId = headerAccessor.getSdkWorkerId();
+LOG.info("Beam Fn Status client connected with id {}", workerId);
+
+WorkerStatusClient fnApiStatusClient =
+WorkerStatusClient.forRequestObserver(workerId, requestObserver);
+

[jira] [Updated] (BEAM-9084) Cleaning up SDK docker image tagging

2020-01-09 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-9084:
---
Status: Open  (was: Triage Needed)

> Cleaning up SDK docker image tagging
> 
>
> Key: BEAM-9084
> URL: https://issues.apache.org/jira/browse/BEAM-9084
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=369496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369496
 ]

ASF GitHub Bot logged work on BEAM-8932:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:52
Start Date: 10/Jan/20 00:52
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10476: [BEAM-8932][Cleanup] 
Move external PubsubIO hooks outside of PubsubIO.
URL: https://github.com/apache/beam/pull/10476#issuecomment-572824070
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369496)
Time Spent: 12h  (was: 11h 50m)

> Expose complete Cloud Pub/Sub messages through PubsubIO API
> ---
>
> Key: BEAM-8932
> URL: https://issues.apache.org/jira/browse/BEAM-8932
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Daniel Collins
>Assignee: Daniel Collins
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> The PubsubIO API only exposes a subset of the fields in the underlying 
> PubsubMessage protocol buffer. To accomodate future feature changes as well 
> as for greater compatability with code using the Cloud Pub/Sub apis, a method 
> to read and write these protocol messages should be exposed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9068) Use local docker image if available

2020-01-09 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-9068:
---
Status: Open  (was: Triage Needed)

> Use local docker image if available
> ---
>
> Key: BEAM-9068
> URL: https://issues.apache.org/jira/browse/BEAM-9068
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Affects Versions: 2.17.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-7861) Make it easy to change between multi-process and multi-thread mode for Python Direct runners

2020-01-09 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-7861:
---
Fix Version/s: 2.19.0

> Make it easy to change between multi-process and multi-thread mode for Python 
> Direct runners
> 
>
> Key: BEAM-7861
> URL: https://issues.apache.org/jira/browse/BEAM-7861
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> BEAM-3645 makes it possible to run a map task parallel.
> However, users need to change runner when switch between multithreading and 
> multiprocessing mode.
> We want to add a flag (ex: --use-multiprocess) to make the switch easy 
> without changing the runner each time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9084) Cleaning up SDK docker image tagging

2020-01-09 Thread Hannah Jiang (Jira)
Hannah Jiang created BEAM-9084:
--

 Summary: Cleaning up SDK docker image tagging
 Key: BEAM-9084
 URL: https://issues.apache.org/jira/browse/BEAM-9084
 Project: Beam
  Issue Type: Task
  Components: build-system
Affects Versions: 2.17.0, 2.16.0
Reporter: Hannah Jiang
Assignee: Hannah Jiang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9068) Use local docker image if available

2020-01-09 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-9068:
---
Description: (was: Python already implemented in this way.)

> Use local docker image if available
> ---
>
> Key: BEAM-9068
> URL: https://issues.apache.org/jira/browse/BEAM-9068
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Affects Versions: 2.17.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9068) Use local docker image if available

2020-01-09 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-9068:
---
Summary: Use local docker image if available  (was: Use local docker image 
if available for Java and Go)

> Use local docker image if available
> ---
>
> Key: BEAM-9068
> URL: https://issues.apache.org/jira/browse/BEAM-9068
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Affects Versions: 2.17.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>
> Python already implemented in this way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=369495=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369495
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:48
Start Date: 10/Jan/20 00:48
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10548: [BEAM-8337] Fix Flink 
version munging.
URL: https://github.com/apache/beam/pull/10548#issuecomment-572823411
 
 
   Please do this on the master branch, as I run the release scripts from 
there, and there have been recent relevant changes.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369495)
Time Spent: 5.5h  (was: 5h 20m)

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8685) Beam Dependency Update Request: com.google.auth:google-auth-library-oauth2-http

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8685?focusedWorklogId=369494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369494
 ]

ASF GitHub Bot logged work on BEAM-8685:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:39
Start Date: 10/Jan/20 00:39
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10508: [BEAM-8685] 
sdks/java: google_auth_version 0.19.0
URL: https://github.com/apache/beam/pull/10508#issuecomment-572821432
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369494)
Time Spent: 3h 20m  (was: 3h 10m)

> Beam Dependency Update Request: 
> com.google.auth:google-auth-library-oauth2-http
> ---
>
> Key: BEAM-8685
> URL: https://issues.apache.org/jira/browse/BEAM-8685
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:27.324449 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.18.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:05:03.844285 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.18.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:10:30.864371 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.18.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:09:38.646889 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.18.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-23 12:09:39.967215 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.19.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-30 14:05:10.534268 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.19.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-06 12:08:45.451960 
> -
> Please consider upgrading the dependency 
> com.google.auth:google-auth-library-oauth2-http. 
> The current version is 0.12.0. The latest version is 0.19.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369486=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369486
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:38
Start Date: 10/Jan/20 00:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r365025392
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1236,6 +1236,74 @@ def try_claim(self, position):
 raise NotImplementedError
 
 
+class WatermarkEstimator(object):
+  """A WatermarkEstimator which is used for estimating output_watermark based 
on
+  the timestamp of output records or manual modifications.
+
+  The base class provides common APIs that are called by the framework, which
+  are also accessible inside a DoFn.process() body. One derived watermark
+  estimator should implement all framework-called APIs, meanwhile, new added
+  APIs of the derived watermark estimator will also be exposed to a DoFn body.
+
+  Multi-threading safety is guarded by ThreadsafeWatermarkEstimator.
 
 Review comment:
   ```suggestion
 Internal state must not be updated asynchronously.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369486)
Time Spent: 1h  (was: 50m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369490
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:38
Start Date: 10/Jan/20 00:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r365024488
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1236,6 +1236,74 @@ def try_claim(self, position):
 raise NotImplementedError
 
 
+class WatermarkEstimator(object):
+  """A WatermarkEstimator which is used for estimating output_watermark based 
on
+  the timestamp of output records or manual modifications.
+
+  The base class provides common APIs that are called by the framework, which
+  are also accessible inside a DoFn.process() body. One derived watermark
+  estimator should implement all framework-called APIs, meanwhile, new added
+  APIs of the derived watermark estimator will also be exposed to a DoFn body.
 
 Review comment:
   ```suggestion
 and will be available when invoked from within a DoFn.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369490)
Time Spent: 1h 20m  (was: 1h 10m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369492
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:38
Start Date: 10/Jan/20 00:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r365026174
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1236,6 +1236,74 @@ def try_claim(self, position):
 raise NotImplementedError
 
 
+class WatermarkEstimator(object):
+  """A WatermarkEstimator which is used for estimating output_watermark based 
on
+  the timestamp of output records or manual modifications.
+
+  The base class provides common APIs that are called by the framework, which
+  are also accessible inside a DoFn.process() body. One derived watermark
+  estimator should implement all framework-called APIs, meanwhile, new added
+  APIs of the derived watermark estimator will also be exposed to a DoFn body.
+
+  Multi-threading safety is guarded by ThreadsafeWatermarkEstimator.
+  """
+  def get_estimator_state(self):
+"""Get current state of the WatermarkEstimator instance, which can be used
+to recreate the WatermarkEstimator when processing residual.
+
 
 Review comment:
   ```suggestion
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369492)
Time Spent: 1.5h  (was: 1h 20m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369493
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:38
Start Date: 10/Jan/20 00:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r365026144
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1236,6 +1236,74 @@ def try_claim(self, position):
 raise NotImplementedError
 
 
+class WatermarkEstimator(object):
+  """A WatermarkEstimator which is used for estimating output_watermark based 
on
+  the timestamp of output records or manual modifications.
+
+  The base class provides common APIs that are called by the framework, which
+  are also accessible inside a DoFn.process() body. One derived watermark
+  estimator should implement all framework-called APIs, meanwhile, new added
+  APIs of the derived watermark estimator will also be exposed to a DoFn body.
+
+  Multi-threading safety is guarded by ThreadsafeWatermarkEstimator.
+  """
+  def get_estimator_state(self):
+"""Get current state of the WatermarkEstimator instance, which can be used
+to recreate the WatermarkEstimator when processing residual.
+
+This function is called by the system.
 
 Review comment:
   ```suggestion
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369493)
Time Spent: 1h 40m  (was: 1.5h)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369485
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:38
Start Date: 10/Jan/20 00:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r365024408
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1236,6 +1236,74 @@ def try_claim(self, position):
 raise NotImplementedError
 
 
+class WatermarkEstimator(object):
+  """A WatermarkEstimator which is used for estimating output_watermark based 
on
+  the timestamp of output records or manual modifications.
+
+  The base class provides common APIs that are called by the framework, which
+  are also accessible inside a DoFn.process() body. One derived watermark
+  estimator should implement all framework-called APIs, meanwhile, new added
 
 Review comment:
   ```suggestion
 estimators should implement all APIs listed below. Additional methods can 
be implemented
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369485)
Time Spent: 50m  (was: 40m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369489
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:38
Start Date: 10/Jan/20 00:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r365026060
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1236,6 +1236,74 @@ def try_claim(self, position):
 raise NotImplementedError
 
 
+class WatermarkEstimator(object):
+  """A WatermarkEstimator which is used for estimating output_watermark based 
on
+  the timestamp of output records or manual modifications.
+
+  The base class provides common APIs that are called by the framework, which
+  are also accessible inside a DoFn.process() body. One derived watermark
+  estimator should implement all framework-called APIs, meanwhile, new added
+  APIs of the derived watermark estimator will also be exposed to a DoFn body.
+
+  Multi-threading safety is guarded by ThreadsafeWatermarkEstimator.
+  """
+  def get_estimator_state(self):
+"""Get current state of the WatermarkEstimator instance, which can be used
+to recreate the WatermarkEstimator when processing residual.
+
+This function is called by the system.
+"""
+raise NotImplementedError(type(self))
+
+  def current_watermark(self):
+"""Get estimated output_watermark. This function is called by system."""
 
 Review comment:
   ```suggestion
   """Return the estimated output_watermark. This function must return 
monotonically increasing watermarks."""
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369489)
Time Spent: 1h 20m  (was: 1h 10m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369488
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:38
Start Date: 10/Jan/20 00:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r365025890
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1236,6 +1236,74 @@ def try_claim(self, position):
 raise NotImplementedError
 
 
+class WatermarkEstimator(object):
+  """A WatermarkEstimator which is used for estimating output_watermark based 
on
+  the timestamp of output records or manual modifications.
+
+  The base class provides common APIs that are called by the framework, which
+  are also accessible inside a DoFn.process() body. One derived watermark
+  estimator should implement all framework-called APIs, meanwhile, new added
+  APIs of the derived watermark estimator will also be exposed to a DoFn body.
+
+  Multi-threading safety is guarded by ThreadsafeWatermarkEstimator.
+  """
+  def get_estimator_state(self):
+"""Get current state of the WatermarkEstimator instance, which can be used
+to recreate the WatermarkEstimator when processing residual.
+
+This function is called by the system.
+"""
+raise NotImplementedError(type(self))
+
+  def current_watermark(self):
+"""Get estimated output_watermark. This function is called by system."""
+raise NotImplementedError(type(self))
+
+  def observe_timestamp(self, timestamp):
+"""Update tracking  watermark with latest output timestamp.
+
+Args:
+  timestamp: the `timestamp.Timestamp` of current output element.
+
+This function is called by system.
 
 Review comment:
   ```suggestion
   This is called with the timestamp of every element output from the DoFn.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369488)
Time Spent: 1h 10m  (was: 1h)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369484
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:38
Start Date: 10/Jan/20 00:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r365023850
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1236,6 +1236,74 @@ def try_claim(self, position):
 raise NotImplementedError
 
 
+class WatermarkEstimator(object):
+  """A WatermarkEstimator which is used for estimating output_watermark based 
on
+  the timestamp of output records or manual modifications.
+
+  The base class provides common APIs that are called by the framework, which
+  are also accessible inside a DoFn.process() body. One derived watermark
+  estimator should implement all framework-called APIs, meanwhile, new added
+  APIs of the derived watermark estimator will also be exposed to a DoFn body.
+
+  Multi-threading safety is guarded by ThreadsafeWatermarkEstimator.
+  """
+  def get_estimator_state(self):
+"""Get current state of the WatermarkEstimator instance, which can be used
+to recreate the WatermarkEstimator when processing residual.
+
+This function is called by the system.
+"""
+raise NotImplementedError(type(self))
+
+  def current_watermark(self):
+"""Get estimated output_watermark. This function is called by system."""
+raise NotImplementedError(type(self))
+
+  def observe_timestamp(self, timestamp):
+"""Update tracking  watermark with latest output timestamp.
+
+Args:
+  timestamp: the `timestamp.Timestamp` of current output element.
+
+This function is called by system.
+"""
+raise NotImplementedError(type(self))
+
+
+class ThreadsafeWatermarkEstimator(object):
 
 Review comment:
   We should hide this as an implementation detail within the sdk harness code 
as we don't want users to have to worry about this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369484)
Time Spent: 40m  (was: 0.5h)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369487=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369487
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:38
Start Date: 10/Jan/20 00:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r365024172
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1236,6 +1236,74 @@ def try_claim(self, position):
 raise NotImplementedError
 
 
+class WatermarkEstimator(object):
+  """A WatermarkEstimator which is used for estimating output_watermark based 
on
+  the timestamp of output records or manual modifications.
+
+  The base class provides common APIs that are called by the framework, which
+  are also accessible inside a DoFn.process() body. One derived watermark
 
 Review comment:
   ```suggestion
 are also accessible inside a DoFn.process() body. Derived watermark
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369487)
Time Spent: 1h  (was: 50m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=369483=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369483
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:34
Start Date: 10/Jan/20 00:34
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10548: [BEAM-8337] Fix 
Flink version munging.
URL: https://github.com/apache/beam/pull/10548#discussion_r365025858
 
 

 ##
 File path: release/src/main/scripts/publish_docker_images.sh
 ##
 @@ -62,6 +62,11 @@ if [[ $confirmation = "y" ]]; then
   docker push apachebeam/go_sdk:latest
 
   echo '-Generating and Pushing Flink job server 
images-'
+  FLINK_VER=("$(ls -1 runners/flink | awk '/^[0-9]+\.[0-9]+$/{print}')")
 
 Review comment:
   It's deleting the local images only.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369483)
Time Spent: 5h 20m  (was: 5h 10m)

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=369482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369482
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:33
Start Date: 10/Jan/20 00:33
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10548: [BEAM-8337] Fix 
Flink version munging.
URL: https://github.com/apache/beam/pull/10548#discussion_r365025655
 
 

 ##
 File path: release/src/main/scripts/publish_docker_images.sh
 ##
 @@ -62,6 +62,11 @@ if [[ $confirmation = "y" ]]; then
   docker push apachebeam/go_sdk:latest
 
   echo '-Generating and Pushing Flink job server 
images-'
+  FLINK_VER=("$(ls -1 runners/flink | awk '/^[0-9]+\.[0-9]+$/{print}')")
 
 Review comment:
   Look below at lines 81 onwards (I can't comment there because github).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369482)
Time Spent: 5h 10m  (was: 5h)

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=369477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369477
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:27
Start Date: 10/Jan/20 00:27
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10548: [BEAM-8337] Fix 
Flink version munging.
URL: https://github.com/apache/beam/pull/10548#discussion_r365024326
 
 

 ##
 File path: release/src/main/scripts/publish_docker_images.sh
 ##
 @@ -62,6 +62,11 @@ if [[ $confirmation = "y" ]]; then
   docker push apachebeam/go_sdk:latest
 
   echo '-Generating and Pushing Flink job server 
images-'
+  FLINK_VER=("$(ls -1 runners/flink | awk '/^[0-9]+\.[0-9]+$/{print}')")
 
 Review comment:
   The published images are meant to be permanent.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369477)
Time Spent: 5h  (was: 4h 50m)

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=369475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369475
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:24
Start Date: 10/Jan/20 00:24
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10548: [BEAM-8337] Fix 
Flink version munging.
URL: https://github.com/apache/beam/pull/10548#discussion_r365023595
 
 

 ##
 File path: release/src/main/scripts/publish_docker_images.sh
 ##
 @@ -62,6 +62,11 @@ if [[ $confirmation = "y" ]]; then
   docker push apachebeam/go_sdk:latest
 
   echo '-Generating and Pushing Flink job server 
images-'
+  FLINK_VER=("$(ls -1 runners/flink | awk '/^[0-9]+\.[0-9]+$/{print}')")
 
 Review comment:
   Missing `docker rmi` for Flink images. (the other file has it)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369475)
Time Spent: 4h 50m  (was: 4h 40m)

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369472=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369472
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:15
Start Date: 10/Jan/20 00:15
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r365021479
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -1264,14 +1275,28 @@ def process(
 def create(*args):
 
   class SplitAndSizeRestrictions(beam.DoFn):
-def __init__(self, fn, restriction_provider):
+def __init__(self,
+ fn,
+ restriction_provider,
+ watermark_estimator_provider=None):
   self.restriction_provider = restriction_provider
+  self.watermark_estimator_provider = watermark_estimator_provider
 
 def process(self, element_restriction, *args, **kwargs):
-  element, restriction = element_restriction
-  for part, size in self.restriction_provider.split_and_size(
-  element, restriction):
-yield ((element, part), size)
+  if self.watermark_estimator_provider:
+element, (restriction, _) = element_restriction
+for part, size in self.restriction_provider.split_and_size(
+element, restriction):
+  yield ((element,
+  (part,
+   self.watermark_estimator_provider.initial_estimator_state(
 
 Review comment:
   Note that I call `initial_estimator_state(element, splitted_restriction)` 
again given that restriction could be changed by initial splitting.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369472)
Time Spent: 0.5h  (was: 20m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369471=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369471
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:14
Start Date: 10/Jan/20 00:14
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r365021157
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -1243,17 +1243,28 @@ def create(factory, transform_id, transform_proto, 
serialized_fn, consumers):
 def create(*args):
 
   class PairWithRestriction(beam.DoFn):
-def __init__(self, fn, restriction_provider):
+def __init__(self,
+ fn,
+ restriction_provider,
+ watermark_estimator_provider=None):
   self.restriction_provider = restriction_provider
-
+  self.watermark_estimator_provider = watermark_estimator_provider
 # An unused window is requested to force explosion of multi-window
 # WindowedValues.
 def process(
 self, element, _unused_window=beam.DoFn.WindowParam, *args, **kwargs):
   # TODO(SDF): Do we want to allow mutation of the element?
   # (E.g. it could be nice to shift bulky description to the portion
   # that can be distributed.)
-  yield element, self.restriction_provider.initial_restriction(element)
+  initial_restriction = self.restriction_provider.initial_restriction(
+  element)
+  if self.watermark_estimator_provider:
+yield (element,
 
 Review comment:
   We also need to emit (initial_restriction, initial_estimator_state) here to 
make sure `get_restriction_coder` works,
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369471)
Time Spent: 20m  (was: 10m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8337) Add Flink job server container images to release process

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?focusedWorklogId=369470=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369470
 ]

ASF GitHub Bot logged work on BEAM-8337:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:12
Start Date: 10/Jan/20 00:12
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10548: [BEAM-8337] Fix 
Flink version munging.
URL: https://github.com/apache/beam/pull/10548
 
 
   Also exit if parsing fails.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=369469=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369469
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:03
Start Date: 10/Jan/20 00:03
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10115: [BEAM-8624] 
Implement Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#discussion_r365017047
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/status/BeamWorkerStatusGrpcService.java
 ##
 @@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.status;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ConcurrentSkipListMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusRequest;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusResponse;
+import 
org.apache.beam.model.fnexecution.v1.BeamFnWorkerStatusGrpc.BeamFnWorkerStatusImplBase;
+import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor;
+import org.apache.beam.runners.fnexecution.FnService;
+import org.apache.beam.runners.fnexecution.HeaderAccessor;
+import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.StreamObserver;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Strings;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableSet;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A Fn Status service which can collect run-time status information from SDK 
harnesses for
+ * debugging purpose.
+ */
+public class BeamWorkerStatusGrpcService extends BeamFnWorkerStatusImplBase 
implements FnService {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(BeamWorkerStatusGrpcService.class);
+  private static final String DEFAULT_EXCEPTION_RESPONSE =
+  "Error: exception encountered getting status from SDK harness";
+
+  private final HeaderAccessor headerAccessor;
+  private final Map> 
connectedClient =
+  Collections.synchronizedMap(new HashMap<>());
+
+  private BeamWorkerStatusGrpcService(
+  ApiServiceDescriptor apiServiceDescriptor, HeaderAccessor 
headerAccessor) {
+this.headerAccessor = headerAccessor;
+LOG.info("Launched Beam Fn Status service at {}", apiServiceDescriptor);
+  }
+
+  /**
+   * Create new instance of {@link BeamWorkerStatusGrpcService}.
+   *
+   * @param apiServiceDescriptor describes the configuration for the endpoint 
the server will
+   * expose.
+   * @param headerAccessor headerAccessor gRPC header accessor used to obtain 
SDK harness worker id.
+   * @return {@link BeamWorkerStatusGrpcService}
+   */
+  public static BeamWorkerStatusGrpcService create(
+  ApiServiceDescriptor apiServiceDescriptor, HeaderAccessor 
headerAccessor) {
+return new BeamWorkerStatusGrpcService(apiServiceDescriptor, 
headerAccessor);
+  }
+
+  @Override
+  public void close() throws Exception {
+synchronized (connectedClient) {
+  for (CompletableFuture clientFuture : 
connectedClient.values()) {
+if (clientFuture.isDone()) {
+  clientFuture.get().close();
+}
+  }
+  connectedClient.clear();
+}
+  }
+
+  @Override
+  public StreamObserver workerStatus(
+  StreamObserver requestObserver) {
+String workerId = headerAccessor.getSdkWorkerId();
+LOG.info("Beam Fn Status client connected with id {}", workerId);
+
+WorkerStatusClient fnApiStatusClient =
+WorkerStatusClient.forRequestObserver(workerId, requestObserver);
+

[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=369468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369468
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 10/Jan/20 00:01
Start Date: 10/Jan/20 00:01
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #10375: [BEAM-8537] Provide 
WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#issuecomment-572812388
 
 
   Run PythonLint PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369468)
Remaining Estimate: 0h
Time Spent: 10m

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9083) PR9677 breaks ValidatesRunnerTest of open source runners

2020-01-09 Thread Boyuan Zhang (Jira)
Boyuan Zhang created BEAM-9083:
--

 Summary: PR9677 breaks ValidatesRunnerTest of open source runners 
 Key: BEAM-9083
 URL: https://issues.apache.org/jira/browse/BEAM-9083
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Boyuan Zhang
Assignee: Shehzaad Nakhoda


Breakages in:
Java_ValidatesRunner_Samza: 
https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/
Java_ValidatesRunner_Spark: 
https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/
Java_PVR_Flink_Streaming: 
https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming
Java_PVR_Flink_Batch: 
https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=369467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369467
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 09/Jan/20 23:56
Start Date: 09/Jan/20 23:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10115: [BEAM-8624] 
Implement Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#discussion_r365016815
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/status/WorkerStatusClient.java
 ##
 @@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.status;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.Map;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.function.Consumer;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusRequest;
+import org.apache.beam.model.fnexecution.v1.BeamFnApi.WorkerStatusResponse;
+import org.apache.beam.sdk.fn.IdGenerator;
+import org.apache.beam.sdk.fn.IdGenerators;
+import org.apache.beam.sdk.fn.stream.SynchronizedStreamObserver;
+import org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.StreamObserver;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Client for handling requests and responses over Fn Worker Status Api 
between runner and SDK
+ * Harness.
+ */
+class WorkerStatusClient implements Closeable {
+
+  public static final Logger LOG = 
LoggerFactory.getLogger(WorkerStatusClient.class);
+  private final IdGenerator idGenerator = IdGenerators.incrementingLongs();
+  private final StreamObserver requestReceiver;
+  private final Map> 
responseQueue =
+  new ConcurrentHashMap<>();
+  private final String workerId;
+  private Consumer deregisterCallback;
+  private AtomicBoolean isClosed = new AtomicBoolean(false);
+
+  private WorkerStatusClient(String workerId, 
StreamObserver requestReceiver) {
+this.requestReceiver = 
SynchronizedStreamObserver.wrapping(requestReceiver);
+this.workerId = workerId;
+  }
+
+  /**
+   * Create new status api client with SDK Harness worker id and request 
observer.
+   *
+   * @param workerId SDK Harness worker id.
+   * @param requestObserver The outbound request observer this client uses to 
send new status
+   * requests to its corresponding SDK Harness.
+   * @return {@link WorkerStatusClient}
+   */
+  public static WorkerStatusClient forRequestObserver(
+  String workerId, StreamObserver requestObserver) {
+return new WorkerStatusClient(workerId, requestObserver);
+  }
+
+  /**
+   * Get the latest sdk worker status from the client's corresponding SDK 
Harness. A random id will
+   * be used to specify the request_id field.
+   *
+   * @return {@link CompletableFuture} of the SDK Harness status response.
+   */
+  public CompletableFuture getWorkerStatus() {
+WorkerStatusRequest request =
+WorkerStatusRequest.newBuilder().setId(idGenerator.getId()).build();
+return getWorkerStatus(request);
+  }
+
+  /**
+   * Get the latest sdk worker status from the client's corresponding SDK 
Harness with request.
+   *
+   * @param request WorkerStatusRequest to be sent to SDK Harness.
+   * @return {@link CompletableFuture} of the SDK Harness status response.
+   */
+  CompletableFuture getWorkerStatus(WorkerStatusRequest 
request) {
+CompletableFuture future = new CompletableFuture<>();
 
 Review comment:
   Your right because the javadoc does say that Collections.synchronousMap 
synchronizes on the map delegate so this is not needed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time 

[jira] [Work logged] (BEAM-8844) [SQL] Create performance tests for BigQueryTable

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8844?focusedWorklogId=369466=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369466
 ]

ASF GitHub Bot logged work on BEAM-8844:


Author: ASF GitHub Bot
Created on: 09/Jan/20 23:54
Start Date: 09/Jan/20 23:54
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #10547: [BEAM-8844] 
Missing commit with suggested review changes
URL: https://github.com/apache/beam/pull/10547
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369466)
Time Spent: 10h 50m  (was: 10h 40m)

> [SQL] Create performance tests for BigQueryTable
> 
>
> Key: BEAM-8844
> URL: https://issues.apache.org/jira/browse/BEAM-8844
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> They should measure read-time for:
>  * DIRECT_READ w/o push-down
>  * DIRECT_READ w/ push-down
>  * DEFAULT



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7861) Make it easy to change between multi-process and multi-thread mode for Python Direct runners

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7861?focusedWorklogId=369465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369465
 ]

ASF GitHub Bot logged work on BEAM-7861:


Author: ASF GitHub Bot
Created on: 09/Jan/20 23:48
Start Date: 09/Jan/20 23:48
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #10536: 
[BEAM-7861] Add direct_running_mode option for direct runners to switch between 
multi_threading and multi_processing easily
URL: https://github.com/apache/beam/pull/10536#discussion_r365015081
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -450,6 +450,19 @@ def run_pipeline(self,
 pipeline_options.DirectOptions).direct_runner_bundle_repeat
 self._num_workers = options.view_as(
 pipeline_options.DirectOptions).direct_num_workers or self._num_workers
+
+# set direct workers running mode if it is defined with pipeline options.
+running_mode = \
+  options.view_as(pipeline_options.DirectOptions).direct_running_mode
+if running_mode == 'multi_threading':
+  self._default_environment = environments.EmbeddedPythonGrpcEnvironment()
+elif running_mode == 'multi_processing':
+  command_bytes = b'%s -m apache_beam.runners.worker.sdk_worker_main' \
+   % sys.executable.encode('ascii')
 
 Review comment:
   I removed `command_bytes` because we don't need it anymore. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369465)
Time Spent: 2.5h  (was: 2h 20m)

> Make it easy to change between multi-process and multi-thread mode for Python 
> Direct runners
> 
>
> Key: BEAM-7861
> URL: https://issues.apache.org/jira/browse/BEAM-7861
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> BEAM-3645 makes it possible to run a map task parallel.
> However, users need to change runner when switch between multithreading and 
> multiprocessing mode.
> We want to add a flag (ex: --use-multiprocess) to make the switch easy 
> without changing the runner each time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=369463=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369463
 ]

ASF GitHub Bot logged work on BEAM-8932:


Author: ASF GitHub Bot
Created on: 09/Jan/20 23:37
Start Date: 09/Jan/20 23:37
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10476: [BEAM-8932][Cleanup] 
Move external PubsubIO hooks outside of PubsubIO.
URL: https://github.com/apache/beam/pull/10476#issuecomment-572806539
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369463)
Time Spent: 11h 50m  (was: 11h 40m)

> Expose complete Cloud Pub/Sub messages through PubsubIO API
> ---
>
> Key: BEAM-8932
> URL: https://issues.apache.org/jira/browse/BEAM-8932
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Daniel Collins
>Assignee: Daniel Collins
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> The PubsubIO API only exposes a subset of the fields in the underlying 
> PubsubMessage protocol buffer. To accomodate future feature changes as well 
> as for greater compatability with code using the Cloud Pub/Sub apis, a method 
> to read and write these protocol messages should be exposed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8844) [SQL] Create performance tests for BigQueryTable

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8844?focusedWorklogId=369461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369461
 ]

ASF GitHub Bot logged work on BEAM-8844:


Author: ASF GitHub Bot
Created on: 09/Jan/20 23:28
Start Date: 09/Jan/20 23:28
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #10547: [BEAM-8844] Missing 
commit with suggested review changes
URL: https://github.com/apache/beam/pull/10547#issuecomment-572804259
 
 
   This is a attempt to hack! Do not run tests Jenkins!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369461)
Time Spent: 10h 40m  (was: 10.5h)

> [SQL] Create performance tests for BigQueryTable
> 
>
> Key: BEAM-8844
> URL: https://issues.apache.org/jira/browse/BEAM-8844
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> They should measure read-time for:
>  * DIRECT_READ w/o push-down
>  * DIRECT_READ w/ push-down
>  * DEFAULT



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9082) Spurious GRPC errors in Flink/Spark runner log output

2020-01-09 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012295#comment-17012295
 ] 

Kyle Weaver commented on BEAM-9082:
---

This happens because the data plane is closed by the server without notifying 
the client.

https://github.com/apache/beam/blob/4c18cb4ada2650552a0006dfffd68d0775dd76c6/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/data/BeamFnDataGrpcService.java#L91


> Spurious GRPC errors in Flink/Spark runner log output
> -
>
> Key: BEAM-9082
> URL: https://issues.apache.org/jira/browse/BEAM-9082
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink, portability-spark
>
> We often see "Socket closed" errors on job shutdown, even though the pipeline 
> has finished successfully. They are misleading and especially annoying at 
> scale.
> ERROR:root:Failed to read inputs in the data plane.
> ...
> grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that 
> terminated with:
>   status = StatusCode.UNAVAILABLE
>   details = "Socket closed"
>   debug_error_string = 
> "{"created":"@1578597616.309419460","description":"Error received from peer 
> ipv6:[::1]:37211","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Socket
>  closed","grpc_status":14}"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8844) [SQL] Create performance tests for BigQueryTable

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8844?focusedWorklogId=369460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369460
 ]

ASF GitHub Bot logged work on BEAM-8844:


Author: ASF GitHub Bot
Created on: 09/Jan/20 23:24
Start Date: 09/Jan/20 23:24
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #10547: [BEAM-8844] 
Missing commit with suggested review changes
URL: https://github.com/apache/beam/pull/10547
 
 
   PR #10226 is missing a commit with addressed review suggestions.
   R: @apilloud 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Created] (BEAM-9082) Spurious GRPC errors in Flink/Spark runner log output

2020-01-09 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-9082:
-

 Summary: Spurious GRPC errors in Flink/Spark runner log output
 Key: BEAM-9082
 URL: https://issues.apache.org/jira/browse/BEAM-9082
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink, runner-spark
Reporter: Kyle Weaver
Assignee: Kyle Weaver


We often see "Socket closed" errors on job shutdown, even though the pipeline 
has finished successfully. They are misleading and especially annoying at scale.

ERROR:root:Failed to read inputs in the data plane.
...
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that 
terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = 
"{"created":"@1578597616.309419460","description":"Error received from peer 
ipv6:[::1]:37211","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Socket
 closed","grpc_status":14}"




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7232) Python test_external_transforms fails on Spark runner

2020-01-09 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-7232.
---
Fix Version/s: Not applicable
   Resolution: Not A Problem

> Python test_external_transforms fails on Spark runner
> -
>
> Key: BEAM-7232
> URL: https://issues.apache.org/jira/browse/BEAM-7232
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
> Fix For: Not applicable
>
>
> test_external_transforms should theoretically work, but it instead throws a 
> cryptic error:
> <_Rendezvous of RPC that terminated with:
>  status = StatusCode.UNKNOWN
>  details = ""
>  debug_error_string = 
> "\{"created":"@1557171218.145369401","description":"Error received from peer 
> ipv4:127.0.0.1:48159","file":"src/core/lib/surface/call.cc","file_line":1041,"grpc_message":"","grpc_status":2}"
>  >
> I could just be overlooking some obvious config issue.
>  
> EDIT: the previous issue has been fixed. now it seems to fail because the 
> read source is unbounded
> [https://github.com/apache/beam/blob/23714a335e8a6e0d91106a366e470a3a4820ae27/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReadTranslation.java#L92]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8993) [SQL] MongoDb should use predicate push-down

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8993?focusedWorklogId=369449=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369449
 ]

ASF GitHub Bot logged work on BEAM-8993:


Author: ASF GitHub Bot
Created on: 09/Jan/20 22:55
Start Date: 09/Jan/20 22:55
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #10417: [BEAM-8993] [SQL] 
MongoDB predicate push down.
URL: https://github.com/apache/beam/pull/10417#issuecomment-572794760
 
 
   Run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369449)
Time Spent: 1h 40m  (was: 1.5h)

> [SQL] MongoDb should use predicate push-down
> 
>
> Key: BEAM-8993
> URL: https://issues.apache.org/jira/browse/BEAM-8993
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> * Add a MongoDbFilter class, implementing BeamSqlTableFilter.
>  ** Support simple comparison operations.
>  ** Support boolean field.
>  ** Support nested conjunction/disjunction.
>  * Update MongoDbTable#buildIOReader
>  ** Construct a push-down filter from RexNodes.
>  ** Set filter to FindQuery.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8993) [SQL] MongoDb should use predicate push-down

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8993?focusedWorklogId=369448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369448
 ]

ASF GitHub Bot logged work on BEAM-8993:


Author: ASF GitHub Bot
Created on: 09/Jan/20 22:55
Start Date: 09/Jan/20 22:55
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #10417: [BEAM-8993] [SQL] 
MongoDB predicate push down.
URL: https://github.com/apache/beam/pull/10417#issuecomment-572794691
 
 
   Run java presubmit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369448)
Time Spent: 1.5h  (was: 1h 20m)

> [SQL] MongoDb should use predicate push-down
> 
>
> Key: BEAM-8993
> URL: https://issues.apache.org/jira/browse/BEAM-8993
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> * Add a MongoDbFilter class, implementing BeamSqlTableFilter.
>  ** Support simple comparison operations.
>  ** Support boolean field.
>  ** Support nested conjunction/disjunction.
>  * Update MongoDbTable#buildIOReader
>  ** Construct a push-down filter from RexNodes.
>  ** Set filter to FindQuery.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8844) [SQL] Create performance tests for BigQueryTable

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8844?focusedWorklogId=369447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369447
 ]

ASF GitHub Bot logged work on BEAM-8844:


Author: ASF GitHub Bot
Created on: 09/Jan/20 22:55
Start Date: 09/Jan/20 22:55
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #10226: [BEAM-8844] 
Add a new Jenkins job for SQL perf tests
URL: https://github.com/apache/beam/pull/10226
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369447)
Time Spent: 10h 20m  (was: 10h 10m)

> [SQL] Create performance tests for BigQueryTable
> 
>
> Key: BEAM-8844
> URL: https://issues.apache.org/jira/browse/BEAM-8844
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> They should measure read-time for:
>  * DIRECT_READ w/o push-down
>  * DIRECT_READ w/ push-down
>  * DEFAULT



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=369446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369446
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 09/Jan/20 22:53
Start Date: 09/Jan/20 22:53
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #10527: [BEAM-2879] 
Metric name should not be constant
URL: https://github.com/apache/beam/pull/10527
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369446)
Time Spent: 6h 40m  (was: 6.5h)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON. Which is a costly solution compared to an Avro alternative 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7861) Make it easy to change between multi-process and multi-thread mode for Python Direct runners

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7861?focusedWorklogId=369444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369444
 ]

ASF GitHub Bot logged work on BEAM-7861:


Author: ASF GitHub Bot
Created on: 09/Jan/20 22:52
Start Date: 09/Jan/20 22:52
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10536: [BEAM-7861] Add 
direct_running_mode option for direct runners to switch between multi_threading 
and multi_processing easily
URL: https://github.com/apache/beam/pull/10536#discussion_r364999362
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -450,6 +450,19 @@ def run_pipeline(self,
 pipeline_options.DirectOptions).direct_runner_bundle_repeat
 self._num_workers = options.view_as(
 pipeline_options.DirectOptions).direct_num_workers or self._num_workers
+
+# set direct workers running mode if it is defined with pipeline options.
+running_mode = \
+  options.view_as(pipeline_options.DirectOptions).direct_running_mode
+if running_mode == 'multi_threading':
+  self._default_environment = environments.EmbeddedPythonGrpcEnvironment()
+elif running_mode == 'multi_processing':
+  command_bytes = b'%s -m apache_beam.runners.worker.sdk_worker_main' \
+   % sys.executable.encode('ascii')
 
 Review comment:
   Why not utf-8?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369444)
Time Spent: 2h 20m  (was: 2h 10m)

> Make it easy to change between multi-process and multi-thread mode for Python 
> Direct runners
> 
>
> Key: BEAM-7861
> URL: https://issues.apache.org/jira/browse/BEAM-7861
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> BEAM-3645 makes it possible to run a map task parallel.
> However, users need to change runner when switch between multithreading and 
> multiprocessing mode.
> We want to add a flag (ex: --use-multiprocess) to make the switch easy 
> without changing the runner each time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369438
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 09/Jan/20 22:41
Start Date: 09/Jan/20 22:41
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r364991139
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -808,7 +810,7 @@ class AppliedPTransform(object):
 
   def __init__(self,
parent,
-   transform,  # type: ptransform.PTransform
+   transform,  # type: Optional[ptransform.PTransform]
 
 Review comment:
   On the second issue, I should point that mypy knows that `proto` is not 
`None` when we call `PTransform. from_runner_api ()` (because `proto.spec` is 
always non-None), and we could _almost_ use that knowledge to solve this 
problem with `@overload`s of `PTransform.from_runner_api()`, like this:
   
   ```python
 @overload
 @classmethod
 def from_runner_api(cls,
 proto,  # type: None
 context  # type: PipelineContext
):
   # type: (...) -> None
   pass
   
 @overload
 @classmethod
 def from_runner_api(cls,
 proto,  # type: beam_runner_api_pb2.FunctionSpec
 context  # type: PipelineContext
):
   # type: (...) -> PTransform
   pass
 
 @classmethod
 def from_runner_api(cls, proto, context):
   if proto is None or not proto.urn:
 return None
   ```
   
   Unfortunately, typing can't track whether the value of `proto.urn` is empty, 
which means that the above is not really accurate.   Is there any chance that 
this could be changed to `if proto is None or proto.urn is None`?
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369438)
Time Spent: 44h  (was: 43h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 44h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8337) Add Flink job server container images to release process

2020-01-09 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012275#comment-17012275
 ] 

Udi Meiri commented on BEAM-8337:
-

It failed. This is the output:

{code}
-Generating and Pushing Flink job server images-
Building containers for the following Flink versions: 
Configuration on demand is an incubating feature.

FAILURE: Build failed with an exception.

* What went wrong:
874eee0d2c3b: Pushed 

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with 
Gradle 6.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See 
https://docs.gradle.org/5.2.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 0s
2.18.0_rc1: digest: 
sha256:22b36a46e8e0dacbbdae6a6e1f9b316e824d3cdfc5e4be982bedd3ef0dca size: 
4104
{code}

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7861) Make it easy to change between multi-process and multi-thread mode for Python Direct runners

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7861?focusedWorklogId=369434=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369434
 ]

ASF GitHub Bot logged work on BEAM-7861:


Author: ASF GitHub Bot
Created on: 09/Jan/20 22:31
Start Date: 09/Jan/20 22:31
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #10536: [BEAM-7861] Add 
direct_running_mode option for direct runners to switch between multi_threading 
and multi_processing easily
URL: https://github.com/apache/beam/pull/10536#issuecomment-572787148
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369434)
Time Spent: 2h 10m  (was: 2h)

> Make it easy to change between multi-process and multi-thread mode for Python 
> Direct runners
> 
>
> Key: BEAM-7861
> URL: https://issues.apache.org/jira/browse/BEAM-7861
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> BEAM-3645 makes it possible to run a map task parallel.
> However, users need to change runner when switch between multithreading and 
> multiprocessing mode.
> We want to add a flag (ex: --use-multiprocess) to make the switch easy 
> without changing the runner each time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9008) add readAll() method to CassandraIO

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=369433=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369433
 ]

ASF GitHub Bot logged work on BEAM-9008:


Author: ASF GitHub Bot
Created on: 09/Jan/20 22:30
Start Date: 09/Jan/20 22:30
Worklog Time Spent: 10m 
  Work Description: vmarquez commented on pull request #10546: [BEAM-9008] 
add CassandraIO readAll method
URL: https://github.com/apache/beam/pull/10546
 
 
   This PR adds a readAll method that returns a `PTransform, 
A>` to allow for parallel querying of a small subset of a Cassandra databse.  
In addition, it 
refactors the current readAll PTransform to be a ParDo based one, so we can 
share code between read and readAll.  
   
   Note: One thing I wasn't sure about was how to abstract over AutoBuilder 
classes.  I'm much more familiar with Scala than Java, so I wasn't sure if 
there was a better way than what I did with having a `getCassandraConfig` on 
both the static Read class as well as the ReadAll class.  Don't think I can use 
an interface since I don't want the methods public.  Open to suggestions here. 
   
   Thanks in advance for being patient with a large change and please let me 
know if I can change/add anything!  
   
   R:@iemejia 
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=369432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369432
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 09/Jan/20 22:28
Start Date: 09/Jan/20 22:28
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10367: [BEAM-7746] 
Add python type hints (part 2)
URL: https://github.com/apache/beam/pull/10367#discussion_r364991139
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -808,7 +810,7 @@ class AppliedPTransform(object):
 
   def __init__(self,
parent,
-   transform,  # type: ptransform.PTransform
+   transform,  # type: Optional[ptransform.PTransform]
 
 Review comment:
   On the second issue, I should point that mypy knows that `proto` is not 
`None` when we call `PTransform. from_runner_api ()` (because `proto.spec` is 
always non-None), and we could _almost_ use that knowledge to solve this 
problem with `@overload`s of `PTransform.from_runner_api()`, like this:
   
   ```python
 @overload
 @classmethod
 def from_runner_api(cls,
 proto,  # type: None
 context  # type: PipelineContext
):
   # type: (...) -> None
   pass
   
 @overload
 @classmethod
 def from_runner_api(cls,
 proto,  # type: beam_runner_api_pb2.FunctionSpec
 context  # type: PipelineContext
):
   # type: (...) -> PTransform
   pass
 
 @classmethod
 def from_runner_api(cls, proto, context):
   if proto is None or not proto.urn:
 return None
   ```
   
   Unfortunately, typing can't track whether the value of `proto.urn` is empty. 
  Is there any chance that this could be changed to `proto.urn is None`?
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 369432)
Time Spent: 43h 50m  (was: 43h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 43h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   >