[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400557=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400557
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 10/Mar/20 04:48
Start Date: 10/Mar/20 04:48
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r390093798
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -166,11 +168,20 @@ public static DefaultJobBundleFactory create(
 CacheBuilder.newBuilder()
 .removalListener(
 (RemovalNotification 
notification) -> {
-  int refCount = notification.getValue().unref();
-  LOG.debug(
-  "Removed environment {} with {} remaining bundle 
references.",
-  notification.getKey(),
-  refCount);
+  WrappedSdkHarnessClient client = notification.getValue();
 
 Review comment:
   The correct solution might be to enforce this from close, not from the cache 
removal listener.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400557)
Time Spent: 3h  (was: 2h 50m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400556=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400556
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 10/Mar/20 04:46
Start Date: 10/Mar/20 04:46
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r390093216
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -166,11 +168,20 @@ public static DefaultJobBundleFactory create(
 CacheBuilder.newBuilder()
 .removalListener(
 (RemovalNotification 
notification) -> {
-  int refCount = notification.getValue().unref();
-  LOG.debug(
-  "Removed environment {} with {} remaining bundle 
references.",
-  notification.getKey(),
-  refCount);
+  WrappedSdkHarnessClient client = notification.getValue();
 
 Review comment:
   As stated above, reference counting is required for environment expiration. 
The environment can only be closed when all bundles that reference it have 
finished. I would prefer we limit this PR scope strictly to the original 
purpose and discuss other changes separately.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400556)
Time Spent: 2h 50m  (was: 2h 40m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9465) Reshuffle should trigger repeatedly

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9465?focusedWorklogId=400550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400550
 ]

ASF GitHub Bot logged work on BEAM-9465:


Author: ASF GitHub Bot
Created on: 10/Mar/20 04:09
Start Date: 10/Mar/20 04:09
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #11071: [BEAM-9465] Fire 
repeatedly in reshuffle
URL: https://github.com/apache/beam/pull/11071#issuecomment-596892250
 
 
   Please hold PR a bit if there is a failure in precommit. I am narrowing down 
issues in release branch now so trying to avoid introducing noise.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400550)
Time Spent: 2h  (was: 1h 50m)

> Reshuffle should trigger repeatedly
> ---
>
> Key: BEAM-9465
> URL: https://issues.apache.org/jira/browse/BEAM-9465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py#L516|https://www.google.com/url?q=https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py%23L516=D]
> should fire repeatedly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9479) Provide option to run pylint in local git pre-commit

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9479?focusedWorklogId=400547=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400547
 ]

ASF GitHub Bot logged work on BEAM-9479:


Author: ASF GitHub Bot
Created on: 10/Mar/20 03:30
Start Date: 10/Mar/20 03:30
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #11091: [BEAM-9479] 
Add pre-commit hook for pylint
URL: https://github.com/apache/beam/pull/11091
 
 
   As with yapf, it would be great to run pylint locally on files that I 
edited. 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Created] (BEAM-9479) Provide option to run pylint in local git pre-commit

2020-03-09 Thread Chad Dombrova (Jira)
Chad Dombrova created BEAM-9479:
---

 Summary: Provide option to run pylint in local git pre-commit
 Key: BEAM-9479
 URL: https://issues.apache.org/jira/browse/BEAM-9479
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Chad Dombrova
Assignee: Chad Dombrova


Now that we have support for running yapf in pre-commit, it would be nice to 
add pylint. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=400543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400543
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 10/Mar/20 03:21
Start Date: 10/Mar/20 03:21
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11068: [BEAM-2939, 
BEAM-9458] Use deduplication transform for UnboundedSources wrapped as SDFs 
that need dedup
URL: https://github.com/apache/beam/pull/11068
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400543)
Time Spent: 25h 20m  (was: 25h 10m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 25h 20m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=400538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400538
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 10/Mar/20 03:09
Start Date: 10/Mar/20 03:09
Worklog Time Spent: 10m 
  Work Description: jfarr commented on issue #9765: [BEAM-8382] Add rate 
limit policy to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-596878950
 
 
   https://github.com/apache/beam/pull/11090
   
   On Mon, Mar 9, 2020 at 3:34 PM Jonothan Farr 
   wrote:
   
   > Sure, I will take a look.
   >
   > On Mar 9, 2020, at 9:29 AM, Alexey Romanenko 
   > wrote:
   >
   > 
   >
   > Hmm, there is some flaky test in KinesisIO caused by
   > 
org.apache.beam.sdk.io.kinesis.ShardReadersPoolTest.shouldCallRateLimitPolicy
   > introduced in this PR. Could you take a look please on this issue
   > ?
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > 
,
   > or unsubscribe
   > 

   > .
   >
   >
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400538)
Time Spent: 15h 50m  (was: 15h 40m)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=400533=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400533
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 10/Mar/20 03:00
Start Date: 10/Mar/20 03:00
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #9942: [BEAM-3288] Do not 
drop data after trigger finishes
URL: https://github.com/apache/beam/pull/9942#issuecomment-596876766
 
 
   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400533)
Time Spent: 5h 50m  (was: 5h 40m)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There's multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 elements in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto more coarse key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark trigger. This will definitely increase safety, but lead to more 
> eager firing of downstream aggregations. It also will violate a user's 
> expectation that a fire-once trigger fires everything downstream only once, 
> but that expectation appears impossible to satisfy safely.
> - Make the continuation trigger of some triggers be the "invalid" trigger, 
> i.e. require the user to set it explicitly: there's in general no good and 
> safe way to infer what a trigger on a second GBK "truly" should be, based on 
> the trigger of the PCollection input into a first GBK. This is especially 
> true for terminating triggers.
> - Prohibit top-level terminating triggers entirely. This will ensure that the 
> only data that ever gets dropped is "droppably late" data.
> CC: [~bchambers] [~kenn] [~tgroh]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9470) :sdks:java:io:kinesis:test is flaky

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9470?focusedWorklogId=400532=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400532
 ]

ASF GitHub Bot logged work on BEAM-9470:


Author: ASF GitHub Bot
Created on: 10/Mar/20 03:00
Start Date: 10/Mar/20 03:00
Worklog Time Spent: 10m 
  Work Description: jfarr commented on pull request #11090: [BEAM-9470] 
:sdks:java:io:kinesis:test is flaky
URL: https://github.com/apache/beam/pull/11090
 
 
   I believe that the way this test was written it may have been sensitive to 
the order of concurrent operations. This implementation should not be. r: 
@aromanenko-dev 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=400534=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400534
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 10/Mar/20 03:00
Start Date: 10/Mar/20 03:00
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #9942: [BEAM-3288] 
Do not drop data after trigger finishes
URL: https://github.com/apache/beam/pull/9942
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400534)
Time Spent: 6h  (was: 5h 50m)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There's multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 elements in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto more coarse key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark trigger. This will definitely increase safety, but lead to more 
> eager firing of downstream aggregations. It also will violate a user's 
> expectation that a fire-once trigger fires everything downstream only once, 
> but that expectation appears impossible to satisfy safely.
> - Make the continuation trigger of some triggers be the "invalid" trigger, 
> i.e. require the user to set it explicitly: there's in general no good and 
> safe way to infer what a trigger on a second GBK "truly" should be, based on 
> the trigger of the PCollection input into a first GBK. This is especially 
> true for terminating triggers.
> - Prohibit top-level terminating triggers entirely. This will ensure that the 
> only data that ever gets dropped is "droppably late" data.
> CC: [~bchambers] [~kenn] [~tgroh]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=400518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400518
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 10/Mar/20 02:24
Start Date: 10/Mar/20 02:24
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11068: [BEAM-2939, 
BEAM-9458] Use deduplication transform for UnboundedSources wrapped as SDFs 
that need dedup
URL: https://github.com/apache/beam/pull/11068#issuecomment-596868352
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400518)
Time Spent: 25h 10m  (was: 25h)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 25h 10m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9470) :sdks:java:io:kinesis:test is flaky

2020-03-09 Thread Jonothan Farr (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonothan Farr reassigned BEAM-9470:
---

Assignee: Jonothan Farr

> :sdks:java:io:kinesis:test is flaky
> ---
>
> Key: BEAM-9470
> URL: https://issues.apache.org/jira/browse/BEAM-9470
> Project: Beam
>  Issue Type: Test
>  Components: io-java-kinesis
>Reporter: Etienne Chauchot
>Assignee: Jonothan Farr
>Priority: Major
>
> [https://scans.gradle.com/s/b4jmmu72ku5jc/console-log?task=:sdks:java:io:kinesis:test]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5898) Beam Dependency Update Request: io.grpc:grpc-core

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5898?focusedWorklogId=400509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400509
 ]

ASF GitHub Bot logged work on BEAM-5898:


Author: ASF GitHub Bot
Created on: 10/Mar/20 01:59
Start Date: 10/Mar/20 01:59
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #11063: [BEAM-5898] 
Upgrading gRPC to 1.27
URL: https://github.com/apache/beam/pull/11063#issuecomment-596862312
 
 
   R: @lukecwik 
   22 successful checks (and 1 unknown Java 11 test)!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400509)
Time Spent: 4.5h  (was: 4h 20m)

> Beam Dependency Update Request: io.grpc:grpc-core
> -
>
> Key: BEAM-5898
> URL: https://issues.apache.org/jira/browse/BEAM-5898
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
>  - 2018-10-29 12:15:30.587081 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-05 12:13:13.811384 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-12 12:13:11.117333 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-19 12:13:49.472228 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-26 12:12:56.623629 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-03 12:13:19.520184 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-10 12:15:49.586565 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.17.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-17 12:16:08.426850 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.17.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-31 15:22:26.517526 
> -
> Please consider upgrading the dependency 

[jira] [Work logged] (BEAM-9431) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not supported with streaming engine

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9431?focusedWorklogId=400507=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400507
 ]

ASF GitHub Bot logged work on BEAM-9431:


Author: ASF GitHub Bot
Created on: 10/Mar/20 01:55
Start Date: 10/Mar/20 01:55
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #11033: [BEAM-9431] Remove 
ReadFromPubSub/Read-out0-ElementCount from the met…
URL: https://github.com/apache/beam/pull/11033#issuecomment-596861410
 
 
   @Ardagan Friendly ping. Do the change look good?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400507)
Time Spent: 1h  (was: 50m)

> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not 
> supported with streaming engine
> -
>
> Key: BEAM-9431
> URL: https://issues.apache.org/jira/browse/BEAM-9431
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is failing on 
> Dataflow V2 as ReadFromPubSub/Read-out0-ElementCount is not implemented in 
> with streaming engine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9465) Reshuffle should trigger repeatedly

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9465?focusedWorklogId=400506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400506
 ]

ASF GitHub Bot logged work on BEAM-9465:


Author: ASF GitHub Bot
Created on: 10/Mar/20 01:54
Start Date: 10/Mar/20 01:54
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #11071: [BEAM-9465] Fire 
repeatedly in reshuffle
URL: https://github.com/apache/beam/pull/11071#issuecomment-596861227
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400506)
Time Spent: 1h 50m  (was: 1h 40m)

> Reshuffle should trigger repeatedly
> ---
>
> Key: BEAM-9465
> URL: https://issues.apache.org/jira/browse/BEAM-9465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py#L516|https://www.google.com/url?q=https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py%23L516=D]
> should fire repeatedly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=400503=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400503
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 10/Mar/20 01:20
Start Date: 10/Mar/20 01:20
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies
URL: https://github.com/apache/beam/pull/11067#discussion_r390045526
 
 

 ##
 File path: licenses/scripts/pull_licenses_py.py
 ##
 @@ -0,0 +1,157 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""A script to pull licenses for Python.
+"""
+import json
+import os
+import shutil
+import subprocess
+import sys
+import yaml
+
+from tenacity import retry
+from tenacity import stop_after_attempt
+
+
+def run_bash_command(command):
+  process = subprocess.Popen(command.split(),
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE)
+  result, error = process.communicate()
+  if error:
+raise RuntimeError('Error occurred when running a bash command.',
+   'command: ', command, 'error message: ',
+   error.decode('utf-8'))
+  return result.decode('utf-8')
+
+
+try:
+  import wget
+except:
+  command = 'pip install wget --no-cache-dir'
+  run_bash_command(command)
+  import wget
+
+
+def install_pip_licenses():
+  command = 'pip install pip-licenses --no-cache-dir'
+  run_bash_command(command)
+
+
+def run_pip_licenses():
+  command = 'pip-licenses --with-license-file --format=json'
+  dependencies = run_bash_command(command)
+  return json.loads(dependencies)
+
+
+@retry(stop=stop_after_attempt(3))
+def copy_license_files(dep):
+  source_license_file = dep['LicenseFile']
+  if source_license_file.lower() == 'unknown':
+return False
+  name = dep['Name']
+  dest_dir = '/'.join([license_dir, name.lower()])
+  try:
+if not os.path.isdir(dest_dir):
+  os.mkdir(dest_dir)
+shutil.copy(source_license_file, dest_dir + '/LICENSE')
+return True
+  except Exception as e:
+print(e)
+return False
+
+
+@retry(stop=stop_after_attempt(3))
+def pull_from_url(dep, configs):
+  '''
+  :param dep: name of a dependency
+  :param configs: a dict from dep_urls_py.yaml
+  :return: boolean
+
+  It downloads files form urls to a temp directory first in order to avoid
+  to deal with any temp files. It helps keep clean final directory.
+  '''
+  if dep in configs.keys():
+config = configs[dep]
+dest_dir = '/'.join([license_dir, dep])
+cur_temp_dir = 'temp_license_' + dep
+os.mkdir(cur_temp_dir)
+try:
+  is_file_available = False
+  # license is required, but not all dependencies have license.
+  # In case we have to skip, print out a message.
+  if config['license'] != 'skip':
+wget.download(config['license'], cur_temp_dir + '/LICENSE')
+is_file_available = True
+  # notice is optional.
+  if 'notice' in config:
+wget.download(config['notice'], cur_temp_dir + '/NOTICE')
+is_file_available = True
+  # copy from temp dir to final dir only when either file is abailable.
+  if is_file_available:
+if os.path.isdir(dest_dir):
+  shutil.rmtree(dest_dir)
+shutil.copytree(cur_temp_dir, dest_dir)
+  result = True
+except Exception as e:
+  print('Error occurred when pull license from url.', 'dependency =',
+dep, 'url =', config, 'error = ', e.decode('utf-8'))
+  result = False
+finally:
+  shutil.rmtree(cur_temp_dir)
+  return result
+  else:
+return False
+
+
+if __name__ == "__main__":
+  cur_dir = os.getcwd()
+  if cur_dir.split('/')[-1] != 'beam':
+raise RuntimeError('This script should run from ~/beam directory.')
+  license_dir = os.getcwd() + '/licenses/python'
+  no_licenses = []
+
+  with open('licenses/scripts/dep_urls_py.yaml') as file:
+dep_config = yaml.load(file)
+
+  install_pip_licenses()
+  dependencies = run_pip_licenses()
+  # add licenses for pip installed packages.
+  # try to pull licenses 

[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=400496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400496
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 10/Mar/20 01:09
Start Date: 10/Mar/20 01:09
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies
URL: https://github.com/apache/beam/pull/11067#discussion_r390011284
 
 

 ##
 File path: licenses/scripts/pull_licenses_py.py
 ##
 @@ -0,0 +1,157 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""A script to pull licenses for Python.
+"""
+import json
+import os
+import shutil
+import subprocess
+import sys
+import yaml
+
+from tenacity import retry
+from tenacity import stop_after_attempt
+
+
+def run_bash_command(command):
+  process = subprocess.Popen(command.split(),
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE)
+  result, error = process.communicate()
+  if error:
+raise RuntimeError('Error occurred when running a bash command.',
+   'command: ', command, 'error message: ',
+   error.decode('utf-8'))
+  return result.decode('utf-8')
+
+
+try:
+  import wget
+except:
+  command = 'pip install wget --no-cache-dir'
+  run_bash_command(command)
+  import wget
+
+
+def install_pip_licenses():
+  command = 'pip install pip-licenses --no-cache-dir'
+  run_bash_command(command)
+
+
+def run_pip_licenses():
+  command = 'pip-licenses --with-license-file --format=json'
+  dependencies = run_bash_command(command)
+  return json.loads(dependencies)
+
+
+@retry(stop=stop_after_attempt(3))
+def copy_license_files(dep):
+  source_license_file = dep['LicenseFile']
+  if source_license_file.lower() == 'unknown':
+return False
+  name = dep['Name']
+  dest_dir = '/'.join([license_dir, name.lower()])
+  try:
+if not os.path.isdir(dest_dir):
+  os.mkdir(dest_dir)
+shutil.copy(source_license_file, dest_dir + '/LICENSE')
+return True
+  except Exception as e:
+print(e)
+return False
+
+
+@retry(stop=stop_after_attempt(3))
+def pull_from_url(dep, configs):
+  '''
+  :param dep: name of a dependency
+  :param configs: a dict from dep_urls_py.yaml
+  :return: boolean
+
+  It downloads files form urls to a temp directory first in order to avoid
+  to deal with any temp files. It helps keep clean final directory.
+  '''
+  if dep in configs.keys():
+config = configs[dep]
+dest_dir = '/'.join([license_dir, dep])
+cur_temp_dir = 'temp_license_' + dep
+os.mkdir(cur_temp_dir)
+try:
+  is_file_available = False
+  # license is required, but not all dependencies have license.
+  # In case we have to skip, print out a message.
+  if config['license'] != 'skip':
+wget.download(config['license'], cur_temp_dir + '/LICENSE')
+is_file_available = True
+  # notice is optional.
+  if 'notice' in config:
+wget.download(config['notice'], cur_temp_dir + '/NOTICE')
+is_file_available = True
+  # copy from temp dir to final dir only when either file is abailable.
+  if is_file_available:
+if os.path.isdir(dest_dir):
+  shutil.rmtree(dest_dir)
+shutil.copytree(cur_temp_dir, dest_dir)
+  result = True
+except Exception as e:
+  print('Error occurred when pull license from url.', 'dependency =',
+dep, 'url =', config, 'error = ', e.decode('utf-8'))
+  result = False
+finally:
+  shutil.rmtree(cur_temp_dir)
+  return result
+  else:
+return False
+
+
+if __name__ == "__main__":
+  cur_dir = os.getcwd()
+  if cur_dir.split('/')[-1] != 'beam':
+raise RuntimeError('This script should run from ~/beam directory.')
+  license_dir = os.getcwd() + '/licenses/python'
+  no_licenses = []
+
+  with open('licenses/scripts/dep_urls_py.yaml') as file:
+dep_config = yaml.load(file)
 
 Review comment:
   This line generates a warning: `calling yaml.load() without Loader=... is 
deprecated, as the default Loader is 

[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=400492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400492
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 10/Mar/20 01:09
Start Date: 10/Mar/20 01:09
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies
URL: https://github.com/apache/beam/pull/11067#discussion_r390016591
 
 

 ##
 File path: licenses/scripts/pull_licenses_py.py
 ##
 @@ -0,0 +1,157 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""A script to pull licenses for Python.
+"""
+import json
+import os
+import shutil
+import subprocess
+import sys
+import yaml
+
+from tenacity import retry
+from tenacity import stop_after_attempt
+
+
+def run_bash_command(command):
+  process = subprocess.Popen(command.split(),
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE)
+  result, error = process.communicate()
+  if error:
+raise RuntimeError('Error occurred when running a bash command.',
+   'command: ', command, 'error message: ',
+   error.decode('utf-8'))
+  return result.decode('utf-8')
+
+
+try:
+  import wget
+except:
+  command = 'pip install wget --no-cache-dir'
+  run_bash_command(command)
+  import wget
+
+
+def install_pip_licenses():
+  command = 'pip install pip-licenses --no-cache-dir'
+  run_bash_command(command)
+
+
+def run_pip_licenses():
+  command = 'pip-licenses --with-license-file --format=json'
+  dependencies = run_bash_command(command)
+  return json.loads(dependencies)
+
+
+@retry(stop=stop_after_attempt(3))
+def copy_license_files(dep):
+  source_license_file = dep['LicenseFile']
+  if source_license_file.lower() == 'unknown':
+return False
+  name = dep['Name']
+  dest_dir = '/'.join([license_dir, name.lower()])
+  try:
+if not os.path.isdir(dest_dir):
+  os.mkdir(dest_dir)
+shutil.copy(source_license_file, dest_dir + '/LICENSE')
+return True
+  except Exception as e:
+print(e)
+return False
+
+
+@retry(stop=stop_after_attempt(3))
+def pull_from_url(dep, configs):
+  '''
+  :param dep: name of a dependency
+  :param configs: a dict from dep_urls_py.yaml
+  :return: boolean
+
+  It downloads files form urls to a temp directory first in order to avoid
+  to deal with any temp files. It helps keep clean final directory.
+  '''
+  if dep in configs.keys():
+config = configs[dep]
+dest_dir = '/'.join([license_dir, dep])
+cur_temp_dir = 'temp_license_' + dep
+os.mkdir(cur_temp_dir)
+try:
+  is_file_available = False
+  # license is required, but not all dependencies have license.
+  # In case we have to skip, print out a message.
+  if config['license'] != 'skip':
+wget.download(config['license'], cur_temp_dir + '/LICENSE')
+is_file_available = True
+  # notice is optional.
+  if 'notice' in config:
+wget.download(config['notice'], cur_temp_dir + '/NOTICE')
+is_file_available = True
+  # copy from temp dir to final dir only when either file is abailable.
+  if is_file_available:
+if os.path.isdir(dest_dir):
+  shutil.rmtree(dest_dir)
+shutil.copytree(cur_temp_dir, dest_dir)
+  result = True
+except Exception as e:
+  print('Error occurred when pull license from url.', 'dependency =',
+dep, 'url =', config, 'error = ', e.decode('utf-8'))
+  result = False
+finally:
+  shutil.rmtree(cur_temp_dir)
+  return result
+  else:
+return False
+
+
+if __name__ == "__main__":
+  cur_dir = os.getcwd()
+  if cur_dir.split('/')[-1] != 'beam':
+raise RuntimeError('This script should run from ~/beam directory.')
 
 Review comment:
   `~` - is a pointer to a home directory. Perhaps say 'from the root of the 
repository`?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL 

[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=400493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400493
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 10/Mar/20 01:09
Start Date: 10/Mar/20 01:09
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies
URL: https://github.com/apache/beam/pull/11067#discussion_r390020220
 
 

 ##
 File path: licenses/scripts/pull_licenses_py.py
 ##
 @@ -0,0 +1,157 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""A script to pull licenses for Python.
+"""
+import json
+import os
+import shutil
+import subprocess
+import sys
+import yaml
+
+from tenacity import retry
+from tenacity import stop_after_attempt
+
+
+def run_bash_command(command):
+  process = subprocess.Popen(command.split(),
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE)
+  result, error = process.communicate()
+  if error:
+raise RuntimeError('Error occurred when running a bash command.',
+   'command: ', command, 'error message: ',
+   error.decode('utf-8'))
+  return result.decode('utf-8')
+
+
+try:
+  import wget
+except:
+  command = 'pip install wget --no-cache-dir'
+  run_bash_command(command)
+  import wget
+
+
+def install_pip_licenses():
+  command = 'pip install pip-licenses --no-cache-dir'
+  run_bash_command(command)
+
+
+def run_pip_licenses():
+  command = 'pip-licenses --with-license-file --format=json'
+  dependencies = run_bash_command(command)
+  return json.loads(dependencies)
+
+
+@retry(stop=stop_after_attempt(3))
+def copy_license_files(dep):
+  source_license_file = dep['LicenseFile']
+  if source_license_file.lower() == 'unknown':
+return False
+  name = dep['Name']
+  dest_dir = '/'.join([license_dir, name.lower()])
+  try:
+if not os.path.isdir(dest_dir):
+  os.mkdir(dest_dir)
+shutil.copy(source_license_file, dest_dir + '/LICENSE')
+return True
+  except Exception as e:
+print(e)
+return False
+
+
+@retry(stop=stop_after_attempt(3))
+def pull_from_url(dep, configs):
+  '''
+  :param dep: name of a dependency
+  :param configs: a dict from dep_urls_py.yaml
+  :return: boolean
+
+  It downloads files form urls to a temp directory first in order to avoid
+  to deal with any temp files. It helps keep clean final directory.
+  '''
+  if dep in configs.keys():
+config = configs[dep]
+dest_dir = '/'.join([license_dir, dep])
+cur_temp_dir = 'temp_license_' + dep
+os.mkdir(cur_temp_dir)
+try:
+  is_file_available = False
+  # license is required, but not all dependencies have license.
+  # In case we have to skip, print out a message.
+  if config['license'] != 'skip':
+wget.download(config['license'], cur_temp_dir + '/LICENSE')
+is_file_available = True
+  # notice is optional.
+  if 'notice' in config:
+wget.download(config['notice'], cur_temp_dir + '/NOTICE')
+is_file_available = True
+  # copy from temp dir to final dir only when either file is abailable.
+  if is_file_available:
+if os.path.isdir(dest_dir):
+  shutil.rmtree(dest_dir)
+shutil.copytree(cur_temp_dir, dest_dir)
+  result = True
+except Exception as e:
+  print('Error occurred when pull license from url.', 'dependency =',
+dep, 'url =', config, 'error = ', e.decode('utf-8'))
+  result = False
+finally:
+  shutil.rmtree(cur_temp_dir)
+  return result
+  else:
+return False
+
+
+if __name__ == "__main__":
+  cur_dir = os.getcwd()
+  if cur_dir.split('/')[-1] != 'beam':
+raise RuntimeError('This script should run from ~/beam directory.')
+  license_dir = os.getcwd() + '/licenses/python'
+  no_licenses = []
+
+  with open('licenses/scripts/dep_urls_py.yaml') as file:
+dep_config = yaml.load(file)
+
+  install_pip_licenses()
 
 Review comment:
   besides pip-licenses, this script needs pyyaml, tenacity, wget. Maybe add a 
requirements 

[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=400495=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400495
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 10/Mar/20 01:09
Start Date: 10/Mar/20 01:09
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies
URL: https://github.com/apache/beam/pull/11067#discussion_r390022663
 
 

 ##
 File path: licenses/scripts/dep_urls_py.yaml
 ##
 @@ -0,0 +1,106 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+pip_dependencies:
+  # template:
+  # dependency_name:
+  #   license: url_to_license or "skip"
+  #   (optional)notice: url_to_notice
+  apache-beam:
+license: "skip" # don't include self dependency.
+  apipkg:
+license: 
"https://raw.githubusercontent.com/pytest-dev/apipkg/master/LICENSE;
+  atomicwrites:
+license: 
"https://raw.githubusercontent.com/untitaker/python-atomicwrites/master/LICENSE;
+  avro-python3:
+license: "https://raw.githubusercontent.com/apache/avro/master/LICENSE.txt;
+notice: "https://raw.githubusercontent.com/apache/avro/master/NOTICE.txt;
+  beautifulsoup4:
+license: "skip" # it is manually added.
+  chardet:
+license: "https://raw.githubusercontent.com/chardet/chardet/master/LICENSE;
+  fastavro:
+license: 
"https://raw.githubusercontent.com/fastavro/fastavro/master/LICENSE;
+notice: 
"https://raw.githubusercontent.com/fastavro/fastavro/master/NOTICE.txt;
+  freezegun:
+license: 
"https://raw.githubusercontent.com/spulec/freezegun/master/LICENSE;
+  google-apitools:
+license: "https://raw.githubusercontent.com/google/apitools/master/LICENSE;
+  grpcio:
+license: "https://raw.githubusercontent.com/grpc/grpc/master/LICENSE;
+notice: "https://raw.githubusercontent.com/grpc/grpc/master/NOTICE.txt;
+  grpcio-gcp:
+license: 
"https://raw.githubusercontent.com/GoogleCloudPlatform/grpc-gcp-python/master/LICENSE;
+  hdfs:
+license: "https://raw.githubusercontent.com/mtth/hdfs/master/LICENSE;
+  httplib2:
+license: 
"https://raw.githubusercontent.com/httplib2/httplib2/master/LICENSE;
+  mock:
+license: 
"https://raw.githubusercontent.com/testing-cabal/mock/master/LICENSE.txt;
+  monotonic:
+license: "https://raw.githubusercontent.com/atdt/monotonic/master/LICENSE;
+  mypy-protobuf:
+license: 
"https://raw.githubusercontent.com/dropbox/mypy-protobuf/master/LICENSE;
+  nose:
+license: 
"https://raw.githubusercontent.com/nose-devs/nose2/master/license.txt;
+  nose-xunitmp:
+license: "skip" # no license available.
+  numpy:
+license: "https://raw.githubusercontent.com/numpy/numpy/master/LICENSE.txt;
+  oauth2client:
+license: 
"https://raw.githubusercontent.com/googleapis/oauth2client/master/LICENSE;
+  pandas:
+license: 
"https://raw.githubusercontent.com/pandas-dev/pandas/master/LICENSE;
+  parameterized:
+license: 
"https://raw.githubusercontent.com/wolever/parameterized/master/LICENSE.txt;
+  pathlib2:
+license: 
"https://raw.githubusercontent.com/mcmtroffaes/pathlib2/develop/LICENSE.rst;
+  pkg-resources:
+license: "https://raw.githubusercontent.com/pypa/setuptools/master/LICENSE;
+  protobuf:
+license: 
"https://raw.githubusercontent.com/protocolbuffers/protobuf/master/LICENSE;
+  pyarrow:
+license: 
"https://raw.githubusercontent.com/apache/arrow/master/LICENSE.txt;
+notice: "https://raw.githubusercontent.com/apache/arrow/master/NOTICE.txt;
+  pyhamcrest:
+license: 
"https://raw.githubusercontent.com/hamcrest/PyHamcrest/master/LICENSE.txt;
+  pymongo:
+license: 
"https://raw.githubusercontent.com/mongodb/mongo-python-driver/master/LICENSE;
+  wget:
+license: "https://raw.githubusercontent.com/mirror/wget/master/COPYING;
+  yapf:
+license: "https://raw.githubusercontent.com/google/yapf/master/LICENSE;
+
+docker_dependencies:
 
 Review comment:
   Can you please add a comment here how this list is compiled?
   
 

This is an automated message from 

[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=400497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400497
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 10/Mar/20 01:09
Start Date: 10/Mar/20 01:09
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies
URL: https://github.com/apache/beam/pull/11067#discussion_r390042510
 
 

 ##
 File path: licenses/scripts/pull_licenses_py.py
 ##
 @@ -0,0 +1,157 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""A script to pull licenses for Python.
+"""
+import json
+import os
+import shutil
+import subprocess
+import sys
+import yaml
+
+from tenacity import retry
+from tenacity import stop_after_attempt
+
+
+def run_bash_command(command):
+  process = subprocess.Popen(command.split(),
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE)
+  result, error = process.communicate()
+  if error:
+raise RuntimeError('Error occurred when running a bash command.',
+   'command: ', command, 'error message: ',
+   error.decode('utf-8'))
+  return result.decode('utf-8')
+
+
+try:
+  import wget
+except:
+  command = 'pip install wget --no-cache-dir'
+  run_bash_command(command)
+  import wget
+
+
+def install_pip_licenses():
+  command = 'pip install pip-licenses --no-cache-dir'
+  run_bash_command(command)
+
+
+def run_pip_licenses():
+  command = 'pip-licenses --with-license-file --format=json'
+  dependencies = run_bash_command(command)
+  return json.loads(dependencies)
+
+
+@retry(stop=stop_after_attempt(3))
+def copy_license_files(dep):
+  source_license_file = dep['LicenseFile']
+  if source_license_file.lower() == 'unknown':
+return False
+  name = dep['Name']
+  dest_dir = '/'.join([license_dir, name.lower()])
+  try:
+if not os.path.isdir(dest_dir):
+  os.mkdir(dest_dir)
+shutil.copy(source_license_file, dest_dir + '/LICENSE')
+return True
+  except Exception as e:
+print(e)
+return False
+
+
+@retry(stop=stop_after_attempt(3))
+def pull_from_url(dep, configs):
+  '''
+  :param dep: name of a dependency
+  :param configs: a dict from dep_urls_py.yaml
+  :return: boolean
+
+  It downloads files form urls to a temp directory first in order to avoid
+  to deal with any temp files. It helps keep clean final directory.
+  '''
+  if dep in configs.keys():
+config = configs[dep]
+dest_dir = '/'.join([license_dir, dep])
+cur_temp_dir = 'temp_license_' + dep
+os.mkdir(cur_temp_dir)
+try:
+  is_file_available = False
+  # license is required, but not all dependencies have license.
+  # In case we have to skip, print out a message.
+  if config['license'] != 'skip':
+wget.download(config['license'], cur_temp_dir + '/LICENSE')
+is_file_available = True
+  # notice is optional.
+  if 'notice' in config:
+wget.download(config['notice'], cur_temp_dir + '/NOTICE')
+is_file_available = True
+  # copy from temp dir to final dir only when either file is abailable.
+  if is_file_available:
+if os.path.isdir(dest_dir):
+  shutil.rmtree(dest_dir)
+shutil.copytree(cur_temp_dir, dest_dir)
+  result = True
+except Exception as e:
+  print('Error occurred when pull license from url.', 'dependency =',
+dep, 'url =', config, 'error = ', e.decode('utf-8'))
+  result = False
+finally:
+  shutil.rmtree(cur_temp_dir)
+  return result
+  else:
+return False
+
+
+if __name__ == "__main__":
+  cur_dir = os.getcwd()
+  if cur_dir.split('/')[-1] != 'beam':
+raise RuntimeError('This script should run from ~/beam directory.')
+  license_dir = os.getcwd() + '/licenses/python'
+  no_licenses = []
+
+  with open('licenses/scripts/dep_urls_py.yaml') as file:
+dep_config = yaml.load(file)
+
+  install_pip_licenses()
+  dependencies = run_pip_licenses()
+  # add licenses for pip installed packages.
+  # try to pull licenses 

[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=400494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400494
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 10/Mar/20 01:09
Start Date: 10/Mar/20 01:09
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11067: 
[BEAM-9136]Add licenses for dependencies
URL: https://github.com/apache/beam/pull/11067#discussion_r390022574
 
 

 ##
 File path: licenses/scripts/dep_urls_py.yaml
 ##
 @@ -0,0 +1,106 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+pip_dependencies:
 
 Review comment:
   Can you please add a comment here how this list is compiled? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400494)
Time Spent: 1h 10m  (was: 1h)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9056) Staging artifacts from environment

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=400484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400484
 ]

ASF GitHub Bot logged work on BEAM-9056:


Author: ASF GitHub Bot
Created on: 10/Mar/20 00:59
Start Date: 10/Mar/20 00:59
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10621: [BEAM-9056] 
Staging artifacts from environment
URL: https://github.com/apache/beam/pull/10621#issuecomment-596848973
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400484)
Time Spent: 7.5h  (was: 7h 20m)

> Staging artifacts from environment
> --
>
> Key: BEAM-9056
> URL: https://issues.apache.org/jira/browse/BEAM-9056
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> staging artifacts from artifact information embedded in environment proto.
> detail: 
> https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9465) Reshuffle should trigger repeatedly

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9465?focusedWorklogId=400474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400474
 ]

ASF GitHub Bot logged work on BEAM-9465:


Author: ASF GitHub Bot
Created on: 10/Mar/20 00:44
Start Date: 10/Mar/20 00:44
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #11071: [BEAM-9465] Fire 
repeatedly in reshuffle
URL: https://github.com/apache/beam/pull/11071#issuecomment-596845610
 
 
   Failure seems to be unrelated:
   AssertionError: "Failed assert" does not match "Unexpected type for Value 
message."
   Stacktrace
   self = 

   
   def test_assert_that(self):
 # TODO: figure out a way for fn_api_runner to parse and raise the
 # underlying exception.
 with self.assertRaisesRegex(Exception, 'Failed assert'):
   with self.create_pipeline() as p:
   > assert_that(p | beam.Create(['a', 'b']), equal_to(['a']))
   E AssertionError: "Failed assert" does not match "Unexpected type 
for Value message."
   
   apache_beam/runners/portability/fn_api_runner_test.py:105: AssertionError
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400474)
Time Spent: 1h 40m  (was: 1.5h)

> Reshuffle should trigger repeatedly
> ---
>
> Key: BEAM-9465
> URL: https://issues.apache.org/jira/browse/BEAM-9465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py#L516|https://www.google.com/url?q=https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py%23L516=D]
> should fire repeatedly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9478) Update samza runner page to reflect new changes

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9478?focusedWorklogId=400469=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400469
 ]

ASF GitHub Bot logged work on BEAM-9478:


Author: ASF GitHub Bot
Created on: 10/Mar/20 00:17
Start Date: 10/Mar/20 00:17
Worklog Time Spent: 10m 
  Work Description: lhaiesp commented on pull request #11089: [BEAM-9478] 
Update samza runner page to reflect post 1.0 changes
URL: https://github.com/apache/beam/pull/11089
 
 
   Update samza runner page to reflect post 1.0 changes including new pipeline 
options and github examples
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Work started] (BEAM-9478) Update samza runner page to reflect new changes

2020-03-09 Thread Hai Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9478 started by Hai Lu.

> Update samza runner page to reflect new changes
> ---
>
> Key: BEAM-9478
> URL: https://issues.apache.org/jira/browse/BEAM-9478
> Project: Beam
>  Issue Type: Bug
>  Components: runner-samza, website
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9478) Update samza runner page to reflect new changes

2020-03-09 Thread Hai Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hai Lu updated BEAM-9478:
-
Status: Open  (was: Triage Needed)

> Update samza runner page to reflect new changes
> ---
>
> Key: BEAM-9478
> URL: https://issues.apache.org/jira/browse/BEAM-9478
> Project: Beam
>  Issue Type: Bug
>  Components: runner-samza, website
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9478) Update samza runner page to reflect new changes

2020-03-09 Thread Hai Lu (Jira)
Hai Lu created BEAM-9478:


 Summary: Update samza runner page to reflect new changes
 Key: BEAM-9478
 URL: https://issues.apache.org/jira/browse/BEAM-9478
 Project: Beam
  Issue Type: Bug
  Components: runner-samza, website
Reporter: Hai Lu
Assignee: Hai Lu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=400461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400461
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:55
Start Date: 09/Mar/20 23:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-596833856
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400461)
Time Spent: 104h 10m  (was: 104h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 104h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=400458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400458
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:45
Start Date: 09/Mar/20 23:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-596831495
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400458)
Time Spent: 104h  (was: 103h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 104h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=400456=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400456
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:38
Start Date: 09/Mar/20 23:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-596829649
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400456)
Time Spent: 103h 50m  (was: 103h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 103h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=400454=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400454
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:29
Start Date: 09/Mar/20 23:29
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-596827343
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400454)
Time Spent: 103h 40m  (was: 103.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 103h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=400450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400450
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:21
Start Date: 09/Mar/20 23:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-596825273
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400450)
Time Spent: 103h 20m  (was: 103h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 103h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=400451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400451
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:21
Start Date: 09/Mar/20 23:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-596825310
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400451)
Time Spent: 103.5h  (was: 103h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 103.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9477) RowCoder should be hashable and picklable

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9477?focusedWorklogId=400446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400446
 ]

ASF GitHub Bot logged work on BEAM-9477:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:18
Start Date: 09/Mar/20 23:18
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #11088: [BEAM-9477] 
RowCoder should be hashable and picklable
URL: https://github.com/apache/beam/pull/11088#issuecomment-596824360
 
 
   R: @udim are you a good person to review this?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400446)
Time Spent: 20m  (was: 10m)

> RowCoder should be hashable and picklable
> -
>
> Key: BEAM-9477
> URL: https://issues.apache.org/jira/browse/BEAM-9477
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> hashable - because it needs to be placed in PipelineContexts' obj_to_id map
> picklable - sometimes coders get pickled (e.g. when the type they encode is 
> used in beam.Create)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400444
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:16
Start Date: 09/Mar/20 23:16
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #11084: [BEAM-9474] Improve 
robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#issuecomment-596823759
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400444)
Time Spent: 2h 40m  (was: 2.5h)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=400442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400442
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:15
Start Date: 09/Mar/20 23:15
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11005: [BEAM-8335] 
Modify the StreamingCache to subclass the CacheManager 
URL: https://github.com/apache/beam/pull/11005#discussion_r390011895
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -214,11 +214,31 @@ class _TestStreamRootBundleProvider(RootBundleProvider):
   """
   def get_root_bundles(self):
 test_stream = self._applied_ptransform.transform
+
+# The TestStream specification allows for multiple TestStreams in the same
 
 Review comment:
   I'm happy to see this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400442)
Time Spent: 103h  (was: 102h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 103h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=400443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400443
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:15
Start Date: 09/Mar/20 23:15
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-596823393
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400443)
Time Spent: 103h 10m  (was: 103h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 103h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=400441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400441
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:14
Start Date: 09/Mar/20 23:14
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-596823224
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400441)
Time Spent: 102h 50m  (was: 102h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 102h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400440
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:13
Start Date: 09/Mar/20 23:13
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r390011274
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -166,11 +168,20 @@ public static DefaultJobBundleFactory create(
 CacheBuilder.newBuilder()
 .removalListener(
 (RemovalNotification 
notification) -> {
-  int refCount = notification.getValue().unref();
-  LOG.debug(
-  "Removed environment {} with {} remaining bundle 
references.",
-  notification.getKey(),
-  refCount);
+  WrappedSdkHarnessClient client = notification.getValue();
 
 Review comment:
   I've removed the reference counting, PTAL.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400440)
Time Spent: 2.5h  (was: 2h 20m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9477) RowCoder should be hashable and picklable

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9477?focusedWorklogId=400435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400435
 ]

ASF GitHub Bot logged work on BEAM-9477:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:10
Start Date: 09/Mar/20 23:10
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11088: 
[BEAM-9477] RowCoder should be hashable and picklable
URL: https://github.com/apache/beam/pull/11088
 
 
   Defines `__hash__` and `__reduce__` for `RowCoder`. Also adds some 
(previously failing) tests of these functions.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | [![Build 

[jira] [Work logged] (BEAM-9434) Performance improvements processing a large number of Avro files in S3+Spark

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9434?focusedWorklogId=400436=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400436
 ]

ASF GitHub Bot logged work on BEAM-9434:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:10
Start Date: 09/Mar/20 23:10
Worklog Time Spent: 10m 
  Work Description: ecapoccia commented on issue #11037: [BEAM-9434] 
performance improvements reading many Avro files in S3
URL: https://github.com/apache/beam/pull/11037#issuecomment-596822153
 
 
   Unfortunately the problem happens for me, that is why this work started. 
Let's see if we can understand the root cause for it.
   
   > Reshuffle should ensure that there is a repartition between MatchAll and 
ReadMatches, is it missing (it is difficult to tell from your screenshots)? If 
it isn't missing, they why is the following stage only executing on a single 
machine (since repartition shouldn't be restricting output to only a single 
machine)
   
   It's clearly not missing as in the base case I'm using 
withHintMatchesManyFiles().
   Still what happens is that the entire reading is on one machine (see second 
last screenshot "summary metrics for 2 completed tasks"). The impression I have 
is that when the physical plan is created, there is only one task detected that 
is bound to do the entire reading on one executor. Consider that, I am doing 
something really plain, just reading from two buckets, joining the records and 
writing them back to S3. Did you try this yourself to see if you can reproduce 
the issue?
   
   I had a look at the code of Reshuffle.expand() and Reshuffle.ViaRandomKey, 
but I have some doubts on what is the expected behaviour in terms of machines / 
partitions.
   
   How many different partitions shall Reshuffle create? Will there be 1 task 
per partition? and how are the tasks ultimately assigned to the executors?
   Maybe you can help me understand the above / point me to the relevant 
documentation. That should hopefully help me troubleshoot this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400436)
Time Spent: 2.5h  (was: 2h 20m)

> Performance improvements processing a large number of Avro files in S3+Spark
> 
>
> Key: BEAM-9434
> URL: https://issues.apache.org/jira/browse/BEAM-9434
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Emiliano Capoccia
>Assignee: Emiliano Capoccia
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There is a performance issue when processing a large number of small Avro 
> files in Spark on K8S (tens of thousands or more).
> The recommended way of reading a pattern of Avro files in Beam is by means of:
>  
> {code:java}
> PCollection records = p.apply(AvroIO.read(AvroGenClass.class)
> .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles())
> {code}
> However, in the case of many small files, the above results in the entire 
> reading taking place in a single task/node, which is considerably slow and 
> has scalability issues.
> The option of omitting the hint is not viable, as it results in too many 
> tasks being spawn, and the cluster being busy doing coordination of tiny 
> tasks with high overhead.
> There are a few workarounds on the internet which mainly revolve around 
> compacting the input files before processing, so that a reduced number of 
> bulky files is processed in parallel.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9477) RowCoder should be hashable and picklable

2020-03-09 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-9477:
---

 Summary: RowCoder should be hashable and picklable
 Key: BEAM-9477
 URL: https://issues.apache.org/jira/browse/BEAM-9477
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Brian Hulette
Assignee: Brian Hulette
 Fix For: 2.21.0


hashable - because it needs to be placed in PipelineContexts' obj_to_id map

picklable - sometimes coders get pickled (e.g. when the type they encode is 
used in beam.Create)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9465) Reshuffle should trigger repeatedly

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9465?focusedWorklogId=400433=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400433
 ]

ASF GitHub Bot logged work on BEAM-9465:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:07
Start Date: 09/Mar/20 23:07
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #11071: [BEAM-9465] Fire 
repeatedly in reshuffle
URL: https://github.com/apache/beam/pull/11071#issuecomment-596821140
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400433)
Time Spent: 1.5h  (was: 1h 20m)

> Reshuffle should trigger repeatedly
> ---
>
> Key: BEAM-9465
> URL: https://issues.apache.org/jira/browse/BEAM-9465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py#L516|https://www.google.com/url?q=https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py%23L516=D]
> should fire repeatedly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=400432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400432
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 09/Mar/20 23:04
Start Date: 09/Mar/20 23:04
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #11005: 
[BEAM-8335] Modify the StreamingCache to subclass the CacheManager 
URL: https://github.com/apache/beam/pull/11005#discussion_r390008544
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -19,15 +19,298 @@
 
 from __future__ import absolute_import
 
+import itertools
+import os
+import shutil
+import tempfile
+import time
+
+import apache_beam as beam
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileRecord
 from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.runners.interactive.cache_manager import CacheManager
+from apache_beam.runners.interactive.cache_manager import 
SafeFastPrimitivesCoder
+from apache_beam.testing.test_stream import ReverseTestStream
 from apache_beam.utils import timestamp
 
 
-class StreamingCache(object):
+class StreamingCacheSink(beam.PTransform):
+  """A PTransform that writes TestStreamFile(Header|Records)s to file.
+
+  This transform takes in an arbitrary element stream and writes the 
best-effort
+  list of TestStream events (as TestStreamFileRecords) to file.
+
+  Note that this PTransform is assumed to be only run on a single machine where
+  the following assumptions are correct: elements come in ordered, no two
+  transforms are writing to the same file. This PTransform is assumed to only
+  run correctly with the DirectRunner.
+  """
+  def __init__(
+  self,
+  cache_dir,
+  filename,
+  sample_resolution_sec,
+  coder=SafeFastPrimitivesCoder()):
+self._cache_dir = cache_dir
+self._filename = filename
+self._sample_resolution_sec = sample_resolution_sec
+self._coder = coder
+self._path = os.path.join(self._cache_dir, self._filename)
+
+  @property
+  def path(self):
+"""Returns the path the sink leads to."""
+return self._path
+
+  def expand(self, pcoll):
+class StreamingWriteToText(beam.DoFn):
+  """DoFn that performs the writing.
+
+  Note that the other file writing methods cannot be used in streaming
+  contexts.
+  """
+  def __init__(self, full_path, coder=SafeFastPrimitivesCoder()):
+self._full_path = full_path
+self._coder = coder
+
+# Try and make the given path.
+os.makedirs(os.path.dirname(full_path), exist_ok=True)
+
+  def start_bundle(self):
+# Open the file for 'append-mode' and writing 'bytes'.
+self._fh = open(self._full_path, 'ab')
+
+  def finish_bundle(self):
+self._fh.close()
+
+  def process(self, e):
+"""Appends the given element to the file.
+"""
+self._fh.write(self._coder.encode(e))
+self._fh.write(b'\n')
+
+return (
+pcoll
+| ReverseTestStream(
+output_tag=self._filename,
+sample_resolution_sec=self._sample_resolution_sec,
+output_format=ReverseTestStream.Format.
+SERIALIZED_TEST_STREAM_FILE_RECORDS,
+coder=self._coder)
+| beam.ParDo(
+StreamingWriteToText(full_path=self._path, coder=self._coder)))
+
+
+class StreamingCacheSource:
+  """A class that reads and parses TestStreamFile(Header|Reader)s.
+
+  This source operates in the following way:
+
+1. Wait for up to `timeout_secs` for the file to be available.
+2. Read, parse, and emit the entire contents of the file
+3. Wait for more events to come or until `is_cache_complete` returns True
+4. If there are more events, then go to 2
+5. Otherwise, stop emitting.
+
+  This class is used to read from file and send its to the TestStream via the
+  StreamingCacheManager.Reader.
+  """
+  def __init__(
+  self,
+  cache_dir,
+  labels,
+  is_cache_complete=None,
+  coder=SafeFastPrimitivesCoder()):
+self._cache_dir = cache_dir
+self._coder = coder
+self._labels = labels
+self._is_cache_complete = (
+is_cache_complete if is_cache_complete else lambda: True)
+
+  def _wait_until_file_exists(self, timeout_secs=30):
+"""Blocks until the file exists for a maximum of timeout_secs.
+"""
+f = None
+now_secs = time.time()
+timeout_timestamp_secs = now_secs + timeout_secs
+
+# Wait for up to `timeout_secs` for the file to be available.
+while f is None and now_secs < timeout_timestamp_secs:
+  now_secs = time.time()
+  try:
+path = 

[jira] [Work logged] (BEAM-9442) Schema Select does not properly handle nested nullable fields

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9442?focusedWorklogId=400428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400428
 ]

ASF GitHub Bot logged work on BEAM-9442:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:53
Start Date: 09/Mar/20 22:53
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11046: [BEAM-9442] 
Properly handle nullable fields in Select
URL: https://github.com/apache/beam/pull/11046#issuecomment-596817172
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400428)
Time Spent: 1h 50m  (was: 1h 40m)

> Schema Select does not properly handle nested nullable fields
> -
>
> Key: BEAM-9442
> URL: https://issues.apache.org/jira/browse/BEAM-9442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A select of a nested field should be nullable if any of its parents are 
> nullable. So for example, a select of "a.b" should return a field named b 
> that is nullable if _either_ of a or b is nullable. Today we only examine b 
> to see if the selected fields should be nullable.
> Also the Select transform itself does not properly check for null values, and 
> throws NullPointerExceptions when some row values are null.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400426
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:49
Start Date: 09/Mar/20 22:49
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r390003729
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -166,11 +168,20 @@ public static DefaultJobBundleFactory create(
 CacheBuilder.newBuilder()
 .removalListener(
 (RemovalNotification 
notification) -> {
-  int refCount = notification.getValue().unref();
-  LOG.debug(
-  "Removed environment {} with {} remaining bundle 
references.",
-  notification.getKey(),
-  refCount);
+  WrappedSdkHarnessClient client = notification.getValue();
 
 Review comment:
   It doesn't work though if we do not ensure dereferencing under all 
circumstances. We need a safeguard here, also considering other runners may not 
dereference correctly. Generally, it is hard to guarantee dereferencing due to 
the nesting of DoFnRunners which may not even allow closing the bundle in error 
cases. I considered not doing this but I think it is the safer route.
   
   If you take a step back, when would the reference counting really be useful? 
Every restarted job will anyways run in a new classloader, so the environment 
will never be recycled. When we call close we should tear down everything. 
   
   Taking back another step, the reference counting should really be removed 
entirely. It was error prone from the beginning leading to subtle problems with 
dereferencing. If you don't mind, I'd remove it. What do you think?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400426)
Time Spent: 2h 20m  (was: 2h 10m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400422
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:36
Start Date: 09/Mar/20 22:36
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r38133
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -252,14 +263,47 @@ public StageBundleFactory forStage(ExecutableStage 
executableStage) {
   }
 
   @Override
-  public void close() throws Exception {
-// Clear the cache. This closes all active environments.
-// note this may cause open calls to be cancelled by the peer
-for (LoadingCache environmentCache : 
environmentCaches) {
-  environmentCache.invalidateAll();
-  environmentCache.cleanUp();
+  public synchronized void close() throws Exception {
+if (closed) {
+  return;
+}
+Exception exception = null;
 
 Review comment:
   Good point.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400422)
Time Spent: 2h 10m  (was: 2h)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400421
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:36
Start Date: 09/Mar/20 22:36
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r38107
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessManager.java
 ##
 @@ -234,9 +231,7 @@ public void run() {
   // Graceful shutdown period
   Thread.sleep(200);
   break;
-} catch (InterruptedException e) {
-  Thread.currentThread().interrupt();
-  throw new RuntimeException(e);
+} catch (InterruptedException ignored) {
 
 Review comment:
   Will do. Ignoring it because there is not point to throw it here, we want to 
shutdown the processes after the grace period expired here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400421)
Time Spent: 2h  (was: 1h 50m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=400419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400419
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:35
Start Date: 09/Mar/20 22:35
Worklog Time Spent: 10m 
  Work Description: jfarr commented on issue #9765: [BEAM-8382] Add rate 
limit policy to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-596811500
 
 
   Sure, I will take a look.
   
   > On Mar 9, 2020, at 9:29 AM, Alexey Romanenko  
wrote:
   > 
   > 
   > Hmm, there is some flaky test in KinesisIO caused by 
org.apache.beam.sdk.io.kinesis.ShardReadersPoolTest.shouldCallRateLimitPolicy 
introduced in this PR. Could you take a look please on this issue?
   > 
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub, or unsubscribe.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400419)
Time Spent: 15h 40m  (was: 15.5h)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400420
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:35
Start Date: 09/Mar/20 22:35
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r389998961
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessManager.java
 ##
 @@ -186,31 +186,28 @@ private void stopProcess(String id, Process process) {
   LOG.debug("Attempting to stop process with id {}", id);
   // first try to kill gracefully
   process.destroy();
-  long maxTimeToWait = 2000;
-  if (waitForProcessToDie(process, maxTimeToWait)) {
-LOG.debug("Process for worker {} shut down gracefully.", id);
-  } else {
-LOG.info("Process for worker {} still running. Killing.", id);
-process.destroyForcibly();
+  long maxTimeToWait = 500;
 
 Review comment:
   The old timeout was too long.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400420)
Time Spent: 1h 50m  (was: 1h 40m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400418
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:34
Start Date: 09/Mar/20 22:34
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r389998590
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -166,11 +168,20 @@ public static DefaultJobBundleFactory create(
 CacheBuilder.newBuilder()
 .removalListener(
 (RemovalNotification 
notification) -> {
-  int refCount = notification.getValue().unref();
-  LOG.debug(
-  "Removed environment {} with {} remaining bundle 
references.",
-  notification.getKey(),
-  refCount);
+  WrappedSdkHarnessClient client = notification.getValue();
+  int refCount = client.unref();
+  // Double-check to trigger closing of all environments in 
case the "refing" does
+  // not clean them up during operator shutdown. This is 
necessary in some
+  // situations, e.g when the bundle cannot be closed and thus 
the ref cannot be
+  // released. All environment types ensure they can only be 
closed once.
+  if (refCount > 0) {
+LOG.warn(
+"Clearing remaining {} bundle references from 
environment {} to ensure it shuts down.",
+refCount,
+notification.getKey());
+//noinspection StatementWithEmptyBody
 
 Review comment:
   Just to inform when there were still references. The comment is to suppress 
a warning on the while loop which counts down the refs.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400418)
Time Spent: 1h 40m  (was: 1.5h)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400417
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:34
Start Date: 09/Mar/20 22:34
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r389998500
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -252,14 +263,47 @@ public StageBundleFactory forStage(ExecutableStage 
executableStage) {
   }
 
   @Override
-  public void close() throws Exception {
-// Clear the cache. This closes all active environments.
-// note this may cause open calls to be cancelled by the peer
-for (LoadingCache environmentCache : 
environmentCaches) {
-  environmentCache.invalidateAll();
-  environmentCache.cleanUp();
+  public synchronized void close() throws Exception {
+if (closed) {
+  return;
+}
+Exception exception = null;
+try {
+  for (LoadingCache environmentCache 
:
+  environmentCaches) {
+try {
+  // Clear the cache. This closes all active environments.
+  // note this may cause open calls to be cancelled by the peer
+  environmentCache.invalidateAll();
+  environmentCache.cleanUp();
+} catch (Exception e) {
+  if (exception != null) {
+exception.addSuppressed(e);
+  } else {
+exception = e;
+  }
+}
+  }
+  try {
+executor.shutdown();
+  } catch (Exception e) {
+if (exception != null) {
+  exception.addSuppressed(e);
+} else {
+  exception = e;
+}
+  }
+} catch (Exception e) {
 
 Review comment:
   Not strictly, but the iterator could throw. I'm trying to be extra defensive 
here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400417)
Time Spent: 1.5h  (was: 1h 20m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400416
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:34
Start Date: 09/Mar/20 22:34
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r389998456
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironment.java
 ##
 @@ -64,13 +63,32 @@ public InstructionRequestHandler 
getInstructionRequestHandler() {
   }
 
   @Override
-  public void close() throws Exception {
-synchronized (lock) {
-  if (!isClosed) {
-instructionHandler.close();
-processManager.stopProcess(workerId);
-isClosed = true;
+  public synchronized void close() throws Exception {
+if (isClosed) {
+  return;
+}
+Exception exception = null;
+try {
+  processManager.stopProcess(workerId);
+} catch (Exception e) {
+  if (exception != null) {
 
 Review comment:
   I intentionally didn't do that. If you move around the the try block to 
change the order, this will break. I'd prefer to be defensive.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400416)
Time Spent: 1h 20m  (was: 1h 10m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9056) Staging artifacts from environment

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=400415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400415
 ]

ASF GitHub Bot logged work on BEAM-9056:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:32
Start Date: 09/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10621: [BEAM-9056] 
Staging artifacts from environment
URL: https://github.com/apache/beam/pull/10621#issuecomment-596810759
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400415)
Time Spent: 7h 20m  (was: 7h 10m)

> Staging artifacts from environment
> --
>
> Key: BEAM-9056
> URL: https://issues.apache.org/jira/browse/BEAM-9056
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> staging artifacts from artifact information embedded in environment proto.
> detail: 
> https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9056) Staging artifacts from environment

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=400414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400414
 ]

ASF GitHub Bot logged work on BEAM-9056:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:32
Start Date: 09/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10621: [BEAM-9056] 
Staging artifacts from environment
URL: https://github.com/apache/beam/pull/10621#issuecomment-596810633
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400414)
Time Spent: 7h 10m  (was: 7h)

> Staging artifacts from environment
> --
>
> Key: BEAM-9056
> URL: https://issues.apache.org/jira/browse/BEAM-9056
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> staging artifacts from artifact information embedded in environment proto.
> detail: 
> https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9475) New style metrics in portability throw error

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9475?focusedWorklogId=400413=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400413
 ]

ASF GitHub Bot logged work on BEAM-9475:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:31
Start Date: 09/Mar/20 22:31
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11085: [BEAM-9475] 
Fix typos and shore up expectations on type
URL: https://github.com/apache/beam/pull/11085
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400413)
Time Spent: 40m  (was: 0.5h)

> New style metrics in portability throw error
> 
>
> Key: BEAM-9475
> URL: https://issues.apache.org/jira/browse/BEAM-9475
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> for mi in op.monitoring_infos(transform_id).values():
>  File "apache_beam/runners/worker/operations.py", line 815, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  def monitoring_infos(self, transform_id):
>  File "apache_beam/runners/worker/operations.py", line 817, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  with self.lock:
>  File "apache_beam/runners/worker/operations.py", line 822, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  if current_element_progress.completed_work():
> TypeError: 'NoneType' object is not callable



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=400412=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400412
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:31
Start Date: 09/Mar/20 22:31
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11068: [BEAM-2939, 
BEAM-9458] Use deduplication transform for UnboundedSources wrapped as SDFs 
that need dedup
URL: https://github.com/apache/beam/pull/11068#issuecomment-596810543
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400412)
Time Spent: 25h  (was: 24h 50m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 25h
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9476) KinesisIO DescribeStream transient errors are not retried

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9476?focusedWorklogId=400402=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400402
 ]

ASF GitHub Bot logged work on BEAM-9476:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:19
Start Date: 09/Mar/20 22:19
Worklog Time Spent: 10m 
  Work Description: ameihm0912 commented on issue #10973: [BEAM-9476] 
KinesisIO retry LimitExceededException
URL: https://github.com/apache/beam/pull/10973#issuecomment-596806642
 
 
   @aromanenko-dev updated to use `FluentBackoff` and references to Jira issue 
added
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400402)
Remaining Estimate: 0h
Time Spent: 10m

> KinesisIO DescribeStream transient errors are not retried
> -
>
> Key: BEAM-9476
> URL: https://issues.apache.org/jira/browse/BEAM-9476
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Reporter: Aaron Meihm
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> During Kinesis stream setup {{DescribeStream}} is used in {{listShards}}. 
> This API call has quota limits that can become problematic when attempting to 
> configure multiple Kinesis streams in the same AWS account. AWS currently 
> limits this call by default to 10 times per second for a given account. With 
> multiple streams, this can be hit and results in a {{RuntimeException}} being 
> thrown immediately upon the first error.
> Ideally the rate limited call can be retried a number of times before giving 
> up instead of failing immediately.
> I have a work in progress PR that resolves this issue at 
> https://github.com/apache/beam/pull/10973.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400399=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400399
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:18
Start Date: 09/Mar/20 22:18
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r389992620
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -252,14 +263,47 @@ public StageBundleFactory forStage(ExecutableStage 
executableStage) {
   }
 
   @Override
-  public void close() throws Exception {
-// Clear the cache. This closes all active environments.
-// note this may cause open calls to be cancelled by the peer
-for (LoadingCache environmentCache : 
environmentCaches) {
-  environmentCache.invalidateAll();
-  environmentCache.cleanUp();
+  public synchronized void close() throws Exception {
+if (closed) {
+  return;
+}
+Exception exception = null;
 
 Review comment:
   It may be good to add an explanation here for the following elaborate 
cleanup logic as otherwise there is the danger it will be "simplified" in the 
future.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400399)
Time Spent: 1h 10m  (was: 1h)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9379) Upgrade to Calcite 1.22.0

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9379?focusedWorklogId=400398=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400398
 ]

ASF GitHub Bot logged work on BEAM-9379:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:14
Start Date: 09/Mar/20 22:14
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #11058: [BEAM-9379] vendor 
calcite 1.22.0
URL: https://github.com/apache/beam/pull/11058#issuecomment-596804887
 
 
   Since shading/vendoring seems need more knowledge to do it right, I will 
also try to test this PR once I have some time this week.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400398)
Time Spent: 1h 10m  (was: 1h)

> Upgrade to Calcite 1.22.0
> -
>
> Key: BEAM-9379
> URL: https://issues.apache.org/jira/browse/BEAM-9379
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Upgrade to Calcite 1.22.0 after it gets released (expected by end of Feb 
> 2020).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400397
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:13
Start Date: 09/Mar/20 22:13
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r389991139
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessManager.java
 ##
 @@ -234,9 +231,7 @@ public void run() {
   // Graceful shutdown period
   Thread.sleep(200);
   break;
-} catch (InterruptedException e) {
-  Thread.currentThread().interrupt();
-  throw new RuntimeException(e);
+} catch (InterruptedException ignored) {
 
 Review comment:
   Add comment on why the exception is ignored.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400397)
Time Spent: 1h  (was: 50m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400396=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400396
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:11
Start Date: 09/Mar/20 22:11
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r389990103
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessManager.java
 ##
 @@ -186,31 +186,28 @@ private void stopProcess(String id, Process process) {
   LOG.debug("Attempting to stop process with id {}", id);
   // first try to kill gracefully
   process.destroy();
-  long maxTimeToWait = 2000;
-  if (waitForProcessToDie(process, maxTimeToWait)) {
-LOG.debug("Process for worker {} shut down gracefully.", id);
-  } else {
-LOG.info("Process for worker {} still running. Killing.", id);
-process.destroyForcibly();
+  long maxTimeToWait = 500;
 
 Review comment:
   Why the timeout change?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400396)
Time Spent: 50m  (was: 40m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400390=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400390
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:08
Start Date: 09/Mar/20 22:08
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r389980853
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/ProcessEnvironment.java
 ##
 @@ -64,13 +63,32 @@ public InstructionRequestHandler 
getInstructionRequestHandler() {
   }
 
   @Override
-  public void close() throws Exception {
-synchronized (lock) {
-  if (!isClosed) {
-instructionHandler.close();
-processManager.stopProcess(workerId);
-isClosed = true;
+  public synchronized void close() throws Exception {
+if (isClosed) {
+  return;
+}
+Exception exception = null;
+try {
+  processManager.stopProcess(workerId);
+} catch (Exception e) {
+  if (exception != null) {
 
 Review comment:
   Since `exception` is null, this should just be `exception = e`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400390)
Time Spent: 0.5h  (was: 20m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400391=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400391
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:08
Start Date: 09/Mar/20 22:08
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r389984708
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -166,11 +168,20 @@ public static DefaultJobBundleFactory create(
 CacheBuilder.newBuilder()
 .removalListener(
 (RemovalNotification 
notification) -> {
-  int refCount = notification.getValue().unref();
-  LOG.debug(
-  "Removed environment {} with {} remaining bundle 
references.",
-  notification.getKey(),
-  refCount);
+  WrappedSdkHarnessClient client = notification.getValue();
+  int refCount = client.unref();
+  // Double-check to trigger closing of all environments in 
case the "refing" does
+  // not clean them up during operator shutdown. This is 
necessary in some
+  // situations, e.g when the bundle cannot be closed and thus 
the ref cannot be
+  // released. All environment types ensure they can only be 
closed once.
+  if (refCount > 0) {
+LOG.warn(
+"Clearing remaining {} bundle references from 
environment {} to ensure it shuts down.",
+refCount,
+notification.getKey());
+//noinspection StatementWithEmptyBody
 
 Review comment:
   Why is this needed (with the log statement above)? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400391)
Time Spent: 0.5h  (was: 20m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400393=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400393
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:08
Start Date: 09/Mar/20 22:08
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r389988716
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -166,11 +168,20 @@ public static DefaultJobBundleFactory create(
 CacheBuilder.newBuilder()
 .removalListener(
 (RemovalNotification 
notification) -> {
-  int refCount = notification.getValue().unref();
-  LOG.debug(
-  "Removed environment {} with {} remaining bundle 
references.",
-  notification.getKey(),
-  refCount);
+  WrappedSdkHarnessClient client = notification.getValue();
 
 Review comment:
   Having this as part of the removal listener would prematurely close an 
environment that is still referenced. The purpose of the refcount is to be able 
to remove the environment from the cache when it expires but close it only 
after all references are gone. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400393)
Time Spent: 40m  (was: 0.5h)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400392=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400392
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 22:08
Start Date: 09/Mar/20 22:08
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#discussion_r389982978
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultJobBundleFactory.java
 ##
 @@ -252,14 +263,47 @@ public StageBundleFactory forStage(ExecutableStage 
executableStage) {
   }
 
   @Override
-  public void close() throws Exception {
-// Clear the cache. This closes all active environments.
-// note this may cause open calls to be cancelled by the peer
-for (LoadingCache environmentCache : 
environmentCaches) {
-  environmentCache.invalidateAll();
-  environmentCache.cleanUp();
+  public synchronized void close() throws Exception {
+if (closed) {
+  return;
+}
+Exception exception = null;
+try {
+  for (LoadingCache environmentCache 
:
+  environmentCaches) {
+try {
+  // Clear the cache. This closes all active environments.
+  // note this may cause open calls to be cancelled by the peer
+  environmentCache.invalidateAll();
+  environmentCache.cleanUp();
+} catch (Exception e) {
+  if (exception != null) {
+exception.addSuppressed(e);
+  } else {
+exception = e;
+  }
+}
+  }
+  try {
+executor.shutdown();
+  } catch (Exception e) {
+if (exception != null) {
+  exception.addSuppressed(e);
+} else {
+  exception = e;
+}
+  }
+} catch (Exception e) {
 
 Review comment:
   The outer try/catch isn't necessary since you already have it nested.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400392)
Time Spent: 40m  (was: 0.5h)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9476) KinesisIO DescribeStream transient errors are not retried

2020-03-09 Thread Aaron Meihm (Jira)
Aaron Meihm created BEAM-9476:
-

 Summary: KinesisIO DescribeStream transient errors are not retried
 Key: BEAM-9476
 URL: https://issues.apache.org/jira/browse/BEAM-9476
 Project: Beam
  Issue Type: Bug
  Components: io-java-kinesis
Reporter: Aaron Meihm


During Kinesis stream setup {{DescribeStream}} is used in {{listShards}}. This 
API call has quota limits that can become problematic when attempting to 
configure multiple Kinesis streams in the same AWS account. AWS currently 
limits this call by default to 10 times per second for a given account. With 
multiple streams, this can be hit and results in a {{RuntimeException}} being 
thrown immediately upon the first error.

Ideally the rate limited call can be retried a number of times before giving up 
instead of failing immediately.

I have a work in progress PR that resolves this issue at 
https://github.com/apache/beam/pull/10973.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=400386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400386
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 09/Mar/20 21:51
Start Date: 09/Mar/20 21:51
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400386)
Time Spent: 59h 50m  (was: 59h 40m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 59h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9475) New style metrics in portability throw error

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9475?focusedWorklogId=400371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400371
 ]

ASF GitHub Bot logged work on BEAM-9475:


Author: ASF GitHub Bot
Created on: 09/Mar/20 21:06
Start Date: 09/Mar/20 21:06
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #11085: [BEAM-9475] Fix 
typos and shore up expectations on type
URL: https://github.com/apache/beam/pull/11085#issuecomment-596779086
 
 
   LGTM
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400371)
Time Spent: 0.5h  (was: 20m)

> New style metrics in portability throw error
> 
>
> Key: BEAM-9475
> URL: https://issues.apache.org/jira/browse/BEAM-9475
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> for mi in op.monitoring_infos(transform_id).values():
>  File "apache_beam/runners/worker/operations.py", line 815, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  def monitoring_infos(self, transform_id):
>  File "apache_beam/runners/worker/operations.py", line 817, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  with self.lock:
>  File "apache_beam/runners/worker/operations.py", line 822, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  if current_element_progress.completed_work():
> TypeError: 'NoneType' object is not callable



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9475) New style metrics in portability throw error

2020-03-09 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9475:

Status: Open  (was: Triage Needed)

> New style metrics in portability throw error
> 
>
> Key: BEAM-9475
> URL: https://issues.apache.org/jira/browse/BEAM-9475
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> for mi in op.monitoring_infos(transform_id).values():
>  File "apache_beam/runners/worker/operations.py", line 815, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  def monitoring_infos(self, transform_id):
>  File "apache_beam/runners/worker/operations.py", line 817, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  with self.lock:
>  File "apache_beam/runners/worker/operations.py", line 822, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  if current_element_progress.completed_work():
> TypeError: 'NoneType' object is not callable



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=400366=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400366
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 09/Mar/20 20:54
Start Date: 09/Mar/20 20:54
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11068: [BEAM-2939, 
BEAM-9458] Use deduplication transform for UnboundedSources wrapped as SDFs 
that need dedup
URL: https://github.com/apache/beam/pull/11068#issuecomment-596774229
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400366)
Time Spent: 24h 50m  (was: 24h 40m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 24h 50m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8979) protoc-gen-mypy: program not found or is not executable

2020-03-09 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055359#comment-17055359
 ] 

Udi Meiri commented on BEAM-8979:
-

I don't see an error message in your screenshot Boyuan.

> protoc-gen-mypy: program not found or is not executable
> ---
>
> Key: BEAM-8979
> URL: https://issues.apache.org/jira/browse/BEAM-8979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Chad Dombrova
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding 
> protoc-gen-mypy. The following tests are affected (there might be more):
>  * 
> [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
>  * 
> [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/
>  
> |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages
>  (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto 
> but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto 
> but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476:
>  UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in 
> 10:46:32 'mypy': generate_protos_first(mypy),
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py",
>  line 145, in setup
> 10:46:32 return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32 dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in 
> run_commands
> 10:46:32 self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py",
>  line 44, in run
> 10:46:32 self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in 
> run_command
> 10:46:32 self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32 gen_protos.generate_proto_files(log=log)
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py",
>  line 144, in generate_proto_files
> 10:46:32 '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for 
> details): 1
> {code}
>  
> This is what I have tried so far to resolve this (without being successful):
>  * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter 
> to the _protoc_ call ingen_protos.py:131
>  * Appending protoc-gen-mypy's directory to the PATH variable
> I wasn't able to reproduce this error locally.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=400361=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400361
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 09/Mar/20 20:43
Start Date: 09/Mar/20 20:43
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #11005: 
[BEAM-8335] Modify the StreamingCache to subclass the CacheManager 
URL: https://github.com/apache/beam/pull/11005#discussion_r389950042
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -214,11 +214,31 @@ class _TestStreamRootBundleProvider(RootBundleProvider):
   """
   def get_root_bundles(self):
 test_stream = self._applied_ptransform.transform
+
+# The TestStream specification allows for multiple TestStreams in the same
+# pipeline (with only one controlling the clock). Here, we use an array in
+# the global EvaluationContext state to keep track of the iterator for each
+# event stream.
+idx = len(self._evaluation_context._test_stream_events)
+
+# If there was an endpoint defined then get the events from the
+# TestStreamService.
+if self._evaluation_context._test_stream_endpoint:
 
 Review comment:
   Gotcha, I'll set the global context once and move the state to a global on 
the _TestStreamEvaluator. That way the owner is strictly the 
_TestStreamEvaluator.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400361)
Time Spent: 102.5h  (was: 102h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 102.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9475) New style metrics in portability throw error

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9475?focusedWorklogId=400362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400362
 ]

ASF GitHub Bot logged work on BEAM-9475:


Author: ASF GitHub Bot
Created on: 09/Mar/20 20:43
Start Date: 09/Mar/20 20:43
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11085: [BEAM-9475] Fix 
typos and shore up expectations on type
URL: https://github.com/apache/beam/pull/11085#issuecomment-596769513
 
 
   R: @Ardagan @amaliujia 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400362)
Time Spent: 20m  (was: 10m)

> New style metrics in portability throw error
> 
>
> Key: BEAM-9475
> URL: https://issues.apache.org/jira/browse/BEAM-9475
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> for mi in op.monitoring_infos(transform_id).values():
>  File "apache_beam/runners/worker/operations.py", line 815, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  def monitoring_infos(self, transform_id):
>  File "apache_beam/runners/worker/operations.py", line 817, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  with self.lock:
>  File "apache_beam/runners/worker/operations.py", line 822, in 
> apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
>  if current_element_progress.completed_work():
> TypeError: 'NoneType' object is not callable



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9475) New style metrics in portability throw error

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9475?focusedWorklogId=400360=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400360
 ]

ASF GitHub Bot logged work on BEAM-9475:


Author: ASF GitHub Bot
Created on: 09/Mar/20 20:43
Start Date: 09/Mar/20 20:43
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11085: [BEAM-9475] 
Fix typos and shore up expectations on type
URL: https://github.com/apache/beam/pull/11085
 
 
   This fixes a regression caused in pr/10956 that prevented progress updates 
in portable SDFs
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400350
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 20:38
Start Date: 09/Mar/20 20:38
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #11084: [BEAM-9474] Improve 
robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084#issuecomment-596767134
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400350)
Time Spent: 20m  (was: 10m)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9473) Ensure that during vendoring we don't copy over META-INF signing/checksum/index files from other jars.

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9473?focusedWorklogId=400349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400349
 ]

ASF GitHub Bot logged work on BEAM-9473:


Author: ASF GitHub Bot
Created on: 09/Mar/20 20:34
Start Date: 09/Mar/20 20:34
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11082: [BEAM-9473] 
Dont copy over META-INF index/checksum/signing files during vendoring
URL: https://github.com/apache/beam/pull/11082
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400349)
Time Spent: 40m  (was: 0.5h)

> Ensure that during vendoring we don't copy over META-INF 
> signing/checksum/index files from other jars.
> --
>
> Key: BEAM-9473
> URL: https://issues.apache.org/jira/browse/BEAM-9473
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [https://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html#The_META-INF_directory]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9473) Ensure that during vendoring we don't copy over META-INF signing/checksum/index files from other jars.

2020-03-09 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9473:

Fix Version/s: (was: 2.20.0)

> Ensure that during vendoring we don't copy over META-INF 
> signing/checksum/index files from other jars.
> --
>
> Key: BEAM-9473
> URL: https://issues.apache.org/jira/browse/BEAM-9473
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [https://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html#The_META-INF_directory]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9473) Ensure that during vendoring we don't copy over META-INF signing/checksum/index files from other jars.

2020-03-09 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-9473.
-
Fix Version/s: 2.20.0
   Resolution: Fixed

> Ensure that during vendoring we don't copy over META-INF 
> signing/checksum/index files from other jars.
> --
>
> Key: BEAM-9473
> URL: https://issues.apache.org/jira/browse/BEAM-9473
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [https://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html#The_META-INF_directory]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?focusedWorklogId=400345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400345
 ]

ASF GitHub Bot logged work on BEAM-9474:


Author: ASF GitHub Bot
Created on: 09/Mar/20 20:19
Start Date: 09/Mar/20 20:19
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11084: [BEAM-9474] 
Improve robustness of BundleFactory and ProcessEnvironment
URL: https://github.com/apache/beam/pull/11084
 
 
   The cleanup code in DefaultJobBundleFactory and its RemoteEnvironments may 
leak
   resources. This is especially a concern when the execution engines reuses the
   same JVM or underlying machines for multiple runs of a pipeline.
   
   Exceptions encountered during cleanup should not lead to aborting the cleanup
   procedure. Not all code handles this correctly. We should also ensure that 
the
   cleanup succeeds even if the runner does not properly close the bundle,
   e.g. when a exception occurs during closing the bundle.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[jira] [Created] (BEAM-9475) New style metrics in portability throw error

2020-03-09 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-9475:
---

 Summary: New style metrics in portability throw error
 Key: BEAM-9475
 URL: https://issues.apache.org/jira/browse/BEAM-9475
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness
Reporter: Luke Cwik
Assignee: Luke Cwik
 Fix For: 2.20.0


for mi in op.monitoring_infos(transform_id).values():
 File "apache_beam/runners/worker/operations.py", line 815, in 
apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
 def monitoring_infos(self, transform_id):
 File "apache_beam/runners/worker/operations.py", line 817, in 
apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
 with self.lock:
 File "apache_beam/runners/worker/operations.py", line 822, in 
apache_beam.runners.worker.operations.SdfProcessSizedElements.monitoring_infos
 if current_element_progress.completed_work():
TypeError: 'NoneType' object is not callable



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9474:
-
Description: The cleanup code in {{DefaultJobBundleFactory}} and its 
{{RemoteEnvironment}} s may leak resources. This is especially a concern when 
the execution engines reuses the same JVM or underlying machines for multiple 
runs of a pipeline.  (was: The cleanup code in {{DefaultJobBundleFactory}} and 
its {{RemoteEnvironment}}s may leak resources. This is especially a concern 
when the execution engines reuses the same JVM or underlying machines for 
multiple runs of a pipeline.)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>
> The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}} 
> s may leak resources. This is especially a concern when the execution engines 
> reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9474:
-
Status: Open  (was: Triage Needed)

> Environment cleanup is not robust enough and may leak resources
> ---
>
> Key: BEAM-9474
> URL: https://issues.apache.org/jira/browse/BEAM-9474
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>
> The cleanup code in {{DefaultJobBundleFactory}} and its 
> {{RemoteEnvironment}}s may leak resources. This is especially a concern when 
> the execution engines reuses the same JVM or underlying machines for multiple 
> runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=400338=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400338
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 09/Mar/20 20:02
Start Date: 09/Mar/20 20:02
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #11080: [BEAM-2939] 
Mark Deduplicate as experimental
URL: https://github.com/apache/beam/pull/11080
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400338)
Time Spent: 24h 40m  (was: 24.5h)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 24h 40m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=400335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400335
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 09/Mar/20 19:56
Start Date: 09/Mar/20 19:56
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11020: [BEAM-7926] Update 
Data Visualization
URL: https://github.com/apache/beam/pull/11020#issuecomment-596749018
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400335)
Time Spent: 59h 40m  (was: 59.5h)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 59h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5898) Beam Dependency Update Request: io.grpc:grpc-core

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5898?focusedWorklogId=400329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400329
 ]

ASF GitHub Bot logged work on BEAM-5898:


Author: ASF GitHub Bot
Created on: 09/Mar/20 19:38
Start Date: 09/Mar/20 19:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11063: [BEAM-5898] 
Upgrading gRPC to 1.27
URL: https://github.com/apache/beam/pull/11063#issuecomment-596741478
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400329)
Time Spent: 4h 20m  (was: 4h 10m)

> Beam Dependency Update Request: io.grpc:grpc-core
> -
>
> Key: BEAM-5898
> URL: https://issues.apache.org/jira/browse/BEAM-5898
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
>  - 2018-10-29 12:15:30.587081 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-05 12:13:13.811384 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-12 12:13:11.117333 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-19 12:13:49.472228 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-26 12:12:56.623629 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-03 12:13:19.520184 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.16.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-10 12:15:49.586565 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.17.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-17 12:16:08.426850 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. The latest version is 1.17.1 
> cc: [~chamikara], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-31 15:22:26.517526 
> -
> Please consider upgrading the dependency io.grpc:grpc-core. 
> The current version is 1.13.1. 

[jira] [Work logged] (BEAM-9473) Ensure that during vendoring we don't copy over META-INF signing/checksum/index files from other jars.

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9473?focusedWorklogId=400327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400327
 ]

ASF GitHub Bot logged work on BEAM-9473:


Author: ASF GitHub Bot
Created on: 09/Mar/20 19:37
Start Date: 09/Mar/20 19:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11082: [BEAM-9473] Dont 
copy over META-INF index/checksum/signing files during vendoring
URL: https://github.com/apache/beam/pull/11082#issuecomment-596740782
 
 
   Run Website PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400327)
Time Spent: 0.5h  (was: 20m)

> Ensure that during vendoring we don't copy over META-INF 
> signing/checksum/index files from other jars.
> --
>
> Key: BEAM-9473
> URL: https://issues.apache.org/jira/browse/BEAM-9473
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html#The_META-INF_directory]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9413) [beam_PostCommit_Py_ValCont] build failed

2020-03-09 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang resolved BEAM-9413.

Resolution: Fixed

> [beam_PostCommit_Py_ValCont] build failed
> -
>
> Key: BEAM-9413
> URL: https://issues.apache.org/jira/browse/BEAM-9413
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yueyang Qiu
>Assignee: Hannah Jiang
>Priority: Major
>  Labels: currently-failing
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> See [https://builds.apache.org/job/beam_PostCommit_Py_ValCont/5706/]
> Error:
>  
> *16:12:13* The push refers to repository 
> [us.gcr.io/apache-beam-testing/jenkins/python2.7_sdk]*16:12:13* An image does 
> not exist locally with the tag: 
> us.gcr.io/apache-beam-testing/jenkins/python2.7_sdk*16:12:14* Build step 
> 'Execute shell' marked build as failure*16:12:15* Sending e-mails to: 
> bui...@beam.apache.org*16:12:15* Recording test results*16:12:16* ERROR: Step 
> 'Publish JUnit test result report' failed: No test report files were found. 
> Configuration error?*16:12:18* No emails were triggered.*16:12:18* Finished: 
> FAILURE



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8979) protoc-gen-mypy: program not found or is not executable

2020-03-09 Thread Boyuan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055320#comment-17055320
 ] 

Boyuan Zhang commented on BEAM-8979:


Hi, I also hit this issue and after running `pip install -r 
build-requirements.txt` I still got error like 
https://screenshot.googleplex.com/Pvxdmu5UXBW. Any idea what I miss here?

> protoc-gen-mypy: program not found or is not executable
> ---
>
> Key: BEAM-8979
> URL: https://issues.apache.org/jira/browse/BEAM-8979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Chad Dombrova
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding 
> protoc-gen-mypy. The following tests are affected (there might be more):
>  * 
> [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
>  * 
> [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/
>  
> |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages
>  (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto 
> but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto 
> but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476:
>  UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in 
> 10:46:32 'mypy': generate_protos_first(mypy),
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py",
>  line 145, in setup
> 10:46:32 return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32 dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in 
> run_commands
> 10:46:32 self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py",
>  line 44, in run
> 10:46:32 self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in 
> run_command
> 10:46:32 self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32 gen_protos.generate_proto_files(log=log)
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py",
>  line 144, in generate_proto_files
> 10:46:32 '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for 
> details): 1
> {code}
>  
> This is what I have tried so far to resolve this (without being successful):
>  * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter 
> to the _protoc_ call ingen_protos.py:131
>  * Appending protoc-gen-mypy's directory to the PATH variable
> I wasn't able to reproduce this error locally.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9411) Use BigQuery DIRECT_READ by default for SQL

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9411?focusedWorklogId=400320=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400320
 ]

ASF GitHub Bot logged work on BEAM-9411:


Author: ASF GitHub Bot
Created on: 09/Mar/20 19:10
Start Date: 09/Mar/20 19:10
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #11003: [BEAM-9411] Enable 
BigQuery DIRECT_READ by default in SQL
URL: https://github.com/apache/beam/pull/11003#issuecomment-596728268
 
 
   Run SQL Postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400320)
Time Spent: 3.5h  (was: 3h 20m)

> Use BigQuery DIRECT_READ by default for SQL
> ---
>
> Key: BEAM-9411
> URL: https://issues.apache.org/jira/browse/BEAM-9411
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The BigQuery DIRECT_READ mode is available globally as of January 17, we 
> should enable it by default!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9411) Use BigQuery DIRECT_READ by default for SQL

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9411?focusedWorklogId=400319=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400319
 ]

ASF GitHub Bot logged work on BEAM-9411:


Author: ASF GitHub Bot
Created on: 09/Mar/20 19:10
Start Date: 09/Mar/20 19:10
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #11003: [BEAM-9411] 
Enable BigQuery DIRECT_READ by default in SQL
URL: https://github.com/apache/beam/pull/11003#discussion_r389903110
 
 

 ##
 File path: 
sdks/java/extensions/sql/datacatalog/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/DataCatalogBigQueryIT.java
 ##
 @@ -69,9 +73,18 @@
 
 @Test
 public void testRead() throws Exception {
-  bigQuery.insertRows(ID_NAME_SCHEMA, row(1, "name1"), row(2, "name2"), 
row(3, "name3"));
-
   TableReference bqTable = bigQuery.tableReference();
+
+  // Streaming inserts do not work with DIRECT_READ mode, there is a 
several hour lag.
+  PCollection data =
+  writePipeline.apply(Create.of(row(1, "name1"), row(2, "name2"), 
row(3, "name3")));
+  data.apply(
+  BigQueryIO.write()
+  .withSchema(BigQueryUtils.toTableSchema(ID_NAME_SCHEMA))
+  .withFormatFunction(BigQueryUtils.toTableRow())
 
 Review comment:
   Done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400319)
Time Spent: 3h 20m  (was: 3h 10m)

> Use BigQuery DIRECT_READ by default for SQL
> ---
>
> Key: BEAM-9411
> URL: https://issues.apache.org/jira/browse/BEAM-9411
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The BigQuery DIRECT_READ mode is available globally as of January 17, we 
> should enable it by default!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9474) Environment cleanup is not robust enough and may leak resources

2020-03-09 Thread Maximilian Michels (Jira)
Maximilian Michels created BEAM-9474:


 Summary: Environment cleanup is not robust enough and may leak 
resources
 Key: BEAM-9474
 URL: https://issues.apache.org/jira/browse/BEAM-9474
 Project: Beam
  Issue Type: Bug
  Components: java-fn-execution
Reporter: Maximilian Michels
Assignee: Maximilian Michels


The cleanup code in {{DefaultJobBundleFactory}} and its {{RemoteEnvironment}}s 
may leak resources. This is especially a concern when the execution engines 
reuses the same JVM or underlying machines for multiple runs of a pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9465) Reshuffle should trigger repeatedly

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9465?focusedWorklogId=400317=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400317
 ]

ASF GitHub Bot logged work on BEAM-9465:


Author: ASF GitHub Bot
Created on: 09/Mar/20 18:56
Start Date: 09/Mar/20 18:56
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #11071: [BEAM-9465] Fire 
repeatedly in reshuffle
URL: https://github.com/apache/beam/pull/11071#issuecomment-596721489
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400317)
Time Spent: 1h 20m  (was: 1h 10m)

> Reshuffle should trigger repeatedly
> ---
>
> Key: BEAM-9465
> URL: https://issues.apache.org/jira/browse/BEAM-9465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py#L516|https://www.google.com/url?q=https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py%23L516=D]
> should fire repeatedly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9465) Reshuffle should trigger repeatedly

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9465?focusedWorklogId=400315=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400315
 ]

ASF GitHub Bot logged work on BEAM-9465:


Author: ASF GitHub Bot
Created on: 09/Mar/20 18:55
Start Date: 09/Mar/20 18:55
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #11071: [BEAM-9465] Fire 
repeatedly in reshuffle
URL: https://github.com/apache/beam/pull/11071#issuecomment-596721142
 
 
   LGTM as it's just a cherry pick PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400315)
Time Spent: 1h 10m  (was: 1h)

> Reshuffle should trigger repeatedly
> ---
>
> Key: BEAM-9465
> URL: https://issues.apache.org/jira/browse/BEAM-9465
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py#L516|https://www.google.com/url?q=https://github.com/apache/beam/blob/403a08f8b95d13e5381a22c1c032ad22c8848650/sdks/python/apache_beam/transforms/trigger.py%23L516=D]
> should fire repeatedly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9411) Use BigQuery DIRECT_READ by default for SQL

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9411?focusedWorklogId=400313=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400313
 ]

ASF GitHub Bot logged work on BEAM-9411:


Author: ASF GitHub Bot
Created on: 09/Mar/20 18:53
Start Date: 09/Mar/20 18:53
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #11003: 
[BEAM-9411] Enable BigQuery DIRECT_READ by default in SQL
URL: https://github.com/apache/beam/pull/11003#discussion_r389893825
 
 

 ##
 File path: 
sdks/java/extensions/sql/datacatalog/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/DataCatalogBigQueryIT.java
 ##
 @@ -69,9 +73,18 @@
 
 @Test
 public void testRead() throws Exception {
-  bigQuery.insertRows(ID_NAME_SCHEMA, row(1, "name1"), row(2, "name2"), 
row(3, "name3"));
-
   TableReference bqTable = bigQuery.tableReference();
+
+  // Streaming inserts do not work with DIRECT_READ mode, there is a 
several hour lag.
+  PCollection data =
+  writePipeline.apply(Create.of(row(1, "name1"), row(2, "name2"), 
row(3, "name3")));
+  data.apply(
+  BigQueryIO.write()
+  .withSchema(BigQueryUtils.toTableSchema(ID_NAME_SCHEMA))
+  .withFormatFunction(BigQueryUtils.toTableRow())
 
 Review comment:
   I would suggest to add `.withMethod(Method.FILE_LOADS)` to be explicit. It 
connects it better with the comment above. And is protected in case the default 
changes.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400313)
Time Spent: 3h 10m  (was: 3h)

> Use BigQuery DIRECT_READ by default for SQL
> ---
>
> Key: BEAM-9411
> URL: https://issues.apache.org/jira/browse/BEAM-9411
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The BigQuery DIRECT_READ mode is available globally as of January 17, we 
> should enable it by default!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9295) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9295?focusedWorklogId=400310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400310
 ]

ASF GitHub Bot logged work on BEAM-9295:


Author: ASF GitHub Bot
Created on: 09/Mar/20 18:46
Start Date: 09/Mar/20 18:46
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10945: [BEAM-9295] Add Flink 
1.10 build target and Make FlinkRunner compatible with Flink 1.10
URL: https://github.com/apache/beam/pull/10945#issuecomment-596716342
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400310)
Time Spent: 4h  (was: 3h 50m)

> Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
> ---
>
> Key: BEAM-9295
> URL: https://issues.apache.org/jira/browse/BEAM-9295
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Apache Flink 1.10 has completed the final release vote, see [1]. So, I would 
> like to add Flink 1.10 build target and make Flink Runner compatible with 
> Flink 1.10.
> And I appreciate it if you can leave your suggestions or comments!
> [1] 
> https://lists.apache.org/thread.html/r97672d4d1e47372cebf23e6643a6cc30a06bfbdf3f277b0be3695b15%40%3Cdev.flink.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9379) Upgrade to Calcite 1.22.0

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9379?focusedWorklogId=400308=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400308
 ]

ASF GitHub Bot logged work on BEAM-9379:


Author: ASF GitHub Bot
Created on: 09/Mar/20 18:45
Start Date: 09/Mar/20 18:45
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11058: [BEAM-9379] 
vendor calcite 1.22.0
URL: https://github.com/apache/beam/pull/11058#discussion_r389886788
 
 

 ##
 File path: vendor/calcite-1_22_0/build.gradle
 ##
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+plugins { id 'org.apache.beam.vendor-java' }
+
+description = "Apache Beam :: Vendored Dependencies :: Calcite 1.22.0"
+
+group = "org.apache.beam"
+version = "0.1"
+
+def calcite_version = "1.22.0"
+def avatica_version = "1.16.0"
+def prefix = "org.apache.beam.vendor.calcite.v1_22_0"
+
+List packagesToRelocate = [
+"antlr",
+"com.beust",
+"com.esri",
+"com.fasterxml",
+"com.google.common",
+"com.google.gson",
+"com.google.inject",
+"com.google.protobuf",
+"com.google.thirdparty",
+"com.jayway.jsonpath",
+"com.yahoo",
+"javax",
+"net.minidev",
+"net.sf",
+"org.antlr",
+"org.aopalliance",
+"org.apache.calcite",
+"org.apache.commons",
+"org.apache.http",
+"org.apache.tool",
+"org.apiguardian.api",
+"org.codehaus",
+"org.easymock",
+"org.eclipse.jetty",
+"org.json",
+"org.objectweb",
+"org.objenesis",
+"org.pentaho",
+"org.slf4j",
 
 Review comment:
   If you relocate logging, you will lose all logging from this vendored 
library.
   
   Consider not relocating them and marking them as runtime depedencies.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400308)
Time Spent: 50m  (was: 40m)

> Upgrade to Calcite 1.22.0
> -
>
> Key: BEAM-9379
> URL: https://issues.apache.org/jira/browse/BEAM-9379
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Upgrade to Calcite 1.22.0 after it gets released (expected by end of Feb 
> 2020).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9379) Upgrade to Calcite 1.22.0

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9379?focusedWorklogId=400309=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400309
 ]

ASF GitHub Bot logged work on BEAM-9379:


Author: ASF GitHub Bot
Created on: 09/Mar/20 18:45
Start Date: 09/Mar/20 18:45
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11058: [BEAM-9379] 
vendor calcite 1.22.0
URL: https://github.com/apache/beam/pull/11058#discussion_r389887843
 
 

 ##
 File path: vendor/calcite-1_22_0/build.gradle
 ##
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+plugins { id 'org.apache.beam.vendor-java' }
+
+description = "Apache Beam :: Vendored Dependencies :: Calcite 1.22.0"
+
+group = "org.apache.beam"
+version = "0.1"
+
+def calcite_version = "1.22.0"
+def avatica_version = "1.16.0"
+def prefix = "org.apache.beam.vendor.calcite.v1_22_0"
+
+List packagesToRelocate = [
+"antlr",
+"com.beust",
+"com.esri",
+"com.fasterxml",
+"com.google.common",
+"com.google.gson",
+"com.google.inject",
+"com.google.protobuf",
+"com.google.thirdparty",
+"com.jayway.jsonpath",
+"com.yahoo",
+"javax",
+"net.minidev",
+"net.sf",
+"org.antlr",
+"org.aopalliance",
+"org.apache.calcite",
+"org.apache.commons",
+"org.apache.http",
+"org.apache.tool",
+"org.apiguardian.api",
+"org.codehaus",
+"org.easymock",
+"org.eclipse.jetty",
+"org.json",
+"org.objectweb",
+"org.objenesis",
+"org.pentaho",
+"org.slf4j",
+"org.testng",
+"org.yaml",
+]
+
+vendorJava(
+dependencies: [
+"org.apache.calcite:calcite-core:$calcite_version",
+"org.apache.calcite:calcite-linq4j:$calcite_version",
+"org.apache.calcite.avatica:avatica-core:$avatica_version",
+library.java.protobuf_java,
+library.java.slf4j_api,
+
+"cglib:cglib:3.3.0",
+"com.google.code.gson:gson:2.8.6",
+"javax.servlet:javax.servlet-api:3.0.1",
+"javax.transaction:javax.transaction-api:1.2",
+"org.antlr:stringtemplate:3.2.1",
+"org.apache.ant:ant:1.9.2",
+"org.codehaus.jettison:jettison:1.4.0",
+"org.easymock:easymock:4.1",
+"org.eclipse.jetty:jetty-jmx:9.4.27.v20200227",
+"org.json:json:20190722",
+"org.testng:testng:7.1.0",
+],
+runtimeDependencies: [
+"com.jayway.jsonpath:json-path:2.4.0",
 
 Review comment:
   Why are you relocating com.jayway and marking json-path as a runtime 
dependency?
   
   Ditto for other libraries below such as commons-logging, antlr, slf4j
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400309)
Time Spent: 1h  (was: 50m)

> Upgrade to Calcite 1.22.0
> -
>
> Key: BEAM-9379
> URL: https://issues.apache.org/jira/browse/BEAM-9379
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Upgrade to Calcite 1.22.0 after it gets released (expected by end of Feb 
> 2020).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9379) Upgrade to Calcite 1.22.0

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9379?focusedWorklogId=400307=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-400307
 ]

ASF GitHub Bot logged work on BEAM-9379:


Author: ASF GitHub Bot
Created on: 09/Mar/20 18:41
Start Date: 09/Mar/20 18:41
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #11058: [BEAM-9379] vendor 
calcite 1.22.0
URL: https://github.com/apache/beam/pull/11058#issuecomment-596713933
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 400307)
Time Spent: 40m  (was: 0.5h)

> Upgrade to Calcite 1.22.0
> -
>
> Key: BEAM-9379
> URL: https://issues.apache.org/jira/browse/BEAM-9379
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Upgrade to Calcite 1.22.0 after it gets released (expected by end of Feb 
> 2020).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   >