[jira] [Resolved] (BEAM-8558) BigQueryIOIT Jenkins job flakes

2019-11-05 Thread Michal Walenia (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michal Walenia resolved BEAM-8558.
--
Fix Version/s: 2.17.0
   Resolution: Fixed

> BigQueryIOIT Jenkins job flakes
> ---
>
> Key: BEAM-8558
> URL: https://issues.apache.org/jira/browse/BEAM-8558
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Java BigQueryIOIT fails with exception:
>  
> {code:java}
> org.apache.beam.sdk.bigqueryioperftests.BigQueryIOIT > testWriteThenRead FAILED
>     java.lang.IllegalStateException: BigQuery table is not empty: apache-beam-testing:beam_performance.bqio_write_10GB_java.
>         at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:588)
>         at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.verifyTableNotExistOrEmpty(BigQueryHelpers.java:511)
>         at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.validate(BigQueryIO.java:2246)
>         at org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:643)
>         at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:653)
>         at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>         at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
>         at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
>         at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460)
>         at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:579)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:314)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
>         at org.apache.beam.sdk.bigqueryioperftests.BigQueryIOIT.testWrite(BigQueryIOIT.java:146)
>         at org.apache.beam.sdk.bigqueryioperftests.BigQueryIOIT.testWriteThenRead(BigQueryIOIT.java:116)
> {code}
> The fix is to append a unique UUID to the table names in the test.
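
A minimal sketch of the idea behind this fix, assuming the test builds its table name from a fixed base string. The class and method names are hypothetical and are not taken from the actual BigQueryIOIT change. Hyphens in the UUID are replaced with underscores here as a conservative choice for table names.

```java
import java.util.UUID;

class UniqueTableName {
  // Hypothetical helper: appends a random UUID so that each test run
  // targets a fresh table instead of one left over from an earlier run.
  static String uniqueTableName(String baseName) {
    String suffix = UUID.randomUUID().toString().replace('-', '_');
    return baseName + "_" + suffix;
  }

  public static void main(String[] args) {
    // The base name mirrors the table from the stack trace above.
    System.out.println(uniqueTableName("bqio_write_10GB_java"));
  }
}
```

Because every run gets its own table, a "table is not empty" check at pipeline validation time can no longer trip over rows written by a previous or concurrent run.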



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8564) Add LZO compression and decompression support for java sdk

2019-11-05 Thread Amogh Tiwari (Jira)
Amogh Tiwari created BEAM-8564:
--

 Summary: Add LZO compression and decompression support for java sdk
 Key: BEAM-8564
 URL: https://issues.apache.org/jira/browse/BEAM-8564
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-core
Reporter: Amogh Tiwari


LZO is a lossless data compression algorithm that favors compression 
and decompression speed.

This will enable the Apache Beam SDK to compress and decompress files using LZO 
compression. 

This will include the following functionality:
 # compress() : for compressing files into an LZO archive
 # decompress() : for decompressing files archived using LZO compression

Appropriate input and output streams will also be added to enable working with 
LZO files.
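
The JDK ships no LZO codec, so the sketch below swaps in GZIP from java.util.zip purely to illustrate the compress()/decompress() stream round trip described above. The class and method names are hypothetical, and this is not the proposed Beam API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamCodecSketch {
  // Stand-in for compress(): wraps the sink in a compressing output stream.
  static byte[] compress(byte[] raw) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
      out.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  // Stand-in for decompress(): wraps the source in a decompressing input stream.
  static byte[] decompress(byte[] compressed) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buffer = new byte[4096];
      int read;
      while ((read = in.read(buffer)) != -1) {
        sink.write(buffer, 0, read);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    byte[] raw = "beam beam beam".getBytes(StandardCharsets.UTF_8);
    byte[] roundTrip = decompress(compress(raw));
    System.out.println(Arrays.equals(raw, roundTrip));
  }
}
```

An LZO implementation would presumably keep the same shape and replace only the two stream classes.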





[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=339198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339198
 ]

ASF GitHub Bot logged work on BEAM-8442:


Author: ASF GitHub Bot
Created on: 06/Nov/19 07:08
Start Date: 06/Nov/19 07:08
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10004: [BEAM-8442] 
Unify bundle register in Python SDK harness
URL: https://github.com/apache/beam/pull/10004#issuecomment-550176508
 
 
   R: @mxm @ibzib @aaltay @lukecwik 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339198)
Time Spent: 2h 20m  (was: 2h 10m)

> Unify bundle register in Python SDK harness
> ---
>
> Key: BEAM-8442
> URL: https://issues.apache.org/jira/browse/BEAM-8442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are two methods for bundle registration in the Python SDK harness:
> `SdkHarness._request_register` and `SdkWorker.register`. They should be unified.





[jira] [Work logged] (BEAM-8557) Clean up useless null check.

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339186
 ]

ASF GitHub Bot logged work on BEAM-8557:


Author: ASF GitHub Bot
Created on: 06/Nov/19 06:18
Start Date: 06/Nov/19 06:18
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #9991: 
[BEAM-8557] Cleanup the useless null check for ResponseStreamObserver…
URL: https://github.com/apache/beam/pull/9991#discussion_r342931236
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java
 ##
 @@ -100,7 +99,8 @@ public void testRequestError() throws Exception {
   }
 
   @Test
-  public void testUnknownResponseIgnored() throws Exception {
+  public void testUnknownResponse() throws Exception {
+thrown.expect(NullPointerException.class);
 
 Review comment:
   I agree that a meaningful message is better than an NPE.
 



Issue Time Tracking
---

Worklog Id: (was: 339186)
Time Spent: 1h 10m  (was: 1h)

> Clean up useless null check.
> 
>
> Key: BEAM-8557
> URL: https://issues.apache.org/jira/browse/BEAM-8557
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, sdk-java-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I think we do not need the null check here:
> [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]
> because before the `onNext` call, the `Future` has already been put into the queue 
> in the `handle` method.
>  
> I found the following test:
> {code:java}
>  @Test
>  public void testUnknownResponseIgnored() throws Exception{code}
> I do not know why we need to test this case. I think it would be better to 
> throw an exception for an unknown response; otherwise, this may hide a 
> potential bug. 
> Please correct me if there is anything I misunderstand, @kennknowles.
>  





[jira] [Work logged] (BEAM-8557) Clean up useless null check.

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339184
 ]

ASF GitHub Bot logged work on BEAM-8557:


Author: ASF GitHub Bot
Created on: 06/Nov/19 06:17
Start Date: 06/Nov/19 06:17
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #9991: 
[BEAM-8557] Cleanup the useless null check for ResponseStreamObserver…
URL: https://github.com/apache/beam/pull/9991#discussion_r342931236
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java
 ##
 @@ -100,7 +99,8 @@ public void testRequestError() throws Exception {
   }
 
   @Test
-  public void testUnknownResponseIgnored() throws Exception {
+  public void testUnknownResponse() throws Exception {
+thrown.expect(NullPointerException.class);
 
 Review comment:
   I agree that a meaningful message is better than an NPE.
 



Issue Time Tracking
---

Worklog Id: (was: 339184)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (BEAM-8557) Clean up useless null check.

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339183
 ]

ASF GitHub Bot logged work on BEAM-8557:


Author: ASF GitHub Bot
Created on: 06/Nov/19 06:17
Start Date: 06/Nov/19 06:17
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #9991: 
[BEAM-8557] Cleanup the useless null check for ResponseStreamObserver…
URL: https://github.com/apache/beam/pull/9991#discussion_r342931221
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java
 ##
 @@ -100,7 +99,8 @@ public void testRequestError() throws Exception {
   }
 
   @Test
-  public void testUnknownResponseIgnored() throws Exception {
+  public void testUnknownResponse() throws Exception {
+thrown.expect(NullPointerException.class);
 
 Review comment:
   I agree that a meaningful message is better than an NPE.
 



Issue Time Tracking
---

Worklog Id: (was: 339183)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (BEAM-8557) Clean up useless null check.

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339185
 ]

ASF GitHub Bot logged work on BEAM-8557:


Author: ASF GitHub Bot
Created on: 06/Nov/19 06:17
Start Date: 06/Nov/19 06:17
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #9991: 
[BEAM-8557] Cleanup the useless null check for ResponseStreamObserver…
URL: https://github.com/apache/beam/pull/9991#discussion_r342931269
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java
 ##
 @@ -148,16 +148,14 @@ public void onNext(BeamFnApi.InstructionResponse 
response) {
   LOG.debug("Received InstructionResponse {}", response);
   CompletableFuture responseFuture =
   outstandingRequests.remove(response.getInstructionId());
-  if (responseFuture != null) {
-if (response.getError().isEmpty()) {
-  responseFuture.complete(response);
-} else {
-  responseFuture.completeExceptionally(
-  new RuntimeException(
-  String.format(
-  "Error received from SDK harness for instruction %s: %s",
-  response.getInstructionId(), response.getError())));
-}
+  if (response.getError().isEmpty()) {
 
 Review comment:
   Thanks for the quick reply!
   
   Is there some logic for resending the message? I have not found it, so I 
think there is no duplicate message in this case. 
   
   Please correct me if there is anything I misunderstand :)
 



Issue Time Tracking
---

Worklog Id: (was: 339185)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (BEAM-8557) Clean up useless null check.

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339175
 ]

ASF GitHub Bot logged work on BEAM-8557:


Author: ASF GitHub Bot
Created on: 06/Nov/19 06:03
Start Date: 06/Nov/19 06:03
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9991: 
[BEAM-8557] Cleanup the useless null check for ResponseStreamObserver…
URL: https://github.com/apache/beam/pull/9991#discussion_r342927908
 
 

 ##
 File path: 
runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java
 ##
 @@ -100,7 +99,8 @@ public void testRequestError() throws Exception {
   }
 
   @Test
-  public void testUnknownResponseIgnored() throws Exception {
+  public void testUnknownResponse() throws Exception {
+thrown.expect(NullPointerException.class);
 
 Review comment:
   My opinion: an NPE always indicates a programming error. Null should not 
exist anymore in high-level programming. Or in most medium-level or low-level 
programming. The code should prevent an NPE and give a good error message.
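
The point about preferring a clear error over an NPE can be sketched as follows. This is a simplified, hypothetical stand-in for the response handling under review, not the actual FnApiControlClient code: payloads are plain strings, and the choice of IllegalStateException is an assumption.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class ResponseRouterSketch {
  private final Map<String, CompletableFuture<String>> outstandingRequests =
      new ConcurrentHashMap<>();

  // Registers a pending request before it is sent, so a response can find it.
  CompletableFuture<String> register(String instructionId) {
    CompletableFuture<String> future = new CompletableFuture<>();
    outstandingRequests.put(instructionId, future);
    return future;
  }

  // Completes the matching future, or fails loudly with a descriptive message
  // when no request is outstanding (unknown or duplicate response).
  void onResponse(String instructionId, String error, String payload) {
    CompletableFuture<String> future = outstandingRequests.remove(instructionId);
    if (future == null) {
      throw new IllegalStateException(
          "Received response for unknown or already completed instruction " + instructionId);
    }
    if (error.isEmpty()) {
      future.complete(payload);
    } else {
      future.completeExceptionally(
          new RuntimeException(
              String.format(
                  "Error received from SDK harness for instruction %s: %s",
                  instructionId, error)));
    }
  }
}
```

Whether to throw or only log in the null case depends on whether duplicate responses are legitimate in this protocol, which is the open question in this review thread.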
 



Issue Time Tracking
---

Worklog Id: (was: 339175)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (BEAM-8557) Clean up useless null check.

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339176
 ]

ASF GitHub Bot logged work on BEAM-8557:


Author: ASF GitHub Bot
Created on: 06/Nov/19 06:03
Start Date: 06/Nov/19 06:03
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9991: 
[BEAM-8557] Cleanup the useless null check for ResponseStreamObserver…
URL: https://github.com/apache/beam/pull/9991#discussion_r342928301
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java
 ##
 @@ -148,16 +148,14 @@ public void onNext(BeamFnApi.InstructionResponse 
response) {
   LOG.debug("Received InstructionResponse {}", response);
   CompletableFuture responseFuture =
   outstandingRequests.remove(response.getInstructionId());
-  if (responseFuture != null) {
-if (response.getError().isEmpty()) {
-  responseFuture.complete(response);
-} else {
-  responseFuture.completeExceptionally(
-  new RuntimeException(
-  String.format(
-  "Error received from SDK harness for instruction %s: %s",
-  response.getInstructionId(), response.getError())));
-}
+  if (response.getError().isEmpty()) {
 
 Review comment:
   I do not like this change, because of my comment below about NPE.
   
   But also, I do not know the code well enough. Is there a distributed systems 
question? Can a duplicate message be received that should be received?
   
   If the answer is "yes" then the previous code is correct. If the answer is 
"no" then the previous code should crash or at least log an error.
 



Issue Time Tracking
---

Worklog Id: (was: 339176)
Time Spent: 0.5h  (was: 20m)



[jira] [Updated] (BEAM-8557) Clean up useless null check.

2019-11-05 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated BEAM-8557:
--
Description: 
I think we do not need null check here:

[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]

Because before the the `onNext` call, the `Future` already put into the queue 
in `handle` method.

 

I found the test as follows:
{code:java}
 @Test
 public void testUnknownResponseIgnored() throws Exception{code}
I do not know why we need test this case? I think it would be better if we 
throw the Exception for an UnknownResponse, otherwise, this may hidden a 
potential bug. 

Please correct me if there anything I misunderstand @kennknowles

 

  was:
I think we do not need null check here:

[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]

Because before the the `onNext` call, the `Future` already put into the queue 
in `handle` method.

 

I found the test as follows:
{code:java}
 @Test
 public void testUnknownResponseIgnored() throws Exception{code}
I do not know why we need test this case? I think it would be better if we 
throw the Exception for an UnknownResponse, otherwise, this may hidden a 
potential bug. 

What do you think? @kennknowles

 


> Clean up useless null check.
> 
>
> Key: BEAM-8557
> URL: https://issues.apache.org/jira/browse/BEAM-8557
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, sdk-java-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think we do not need null check here:
> [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]
> Because before the the `onNext` call, the `Future` already put into the queue 
> in `handle` method.
>  
> I found the test as follows:
> {code:java}
>  @Test
>  public void testUnknownResponseIgnored() throws Exception{code}
> I do not know why we need test this case? I think it would be better if we 
> throw the Exception for an UnknownResponse, otherwise, this may hidden a 
> potential bug. 
> Please correct me if there anything I misunderstand @kennknowles
>  





[jira] [Commented] (BEAM-3493) Prevent users from "implementing" PipelineOptions

2019-11-05 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968103#comment-16968103
 ] 

Kenneth Knowles commented on BEAM-3493:
---

This bug was filed only 1 year ago, but that line seems to be 3 years old. Can 
you confirm that an implementation of PipelineOptions fails? This should 
actually be a dynamic check, so maybe it can be in a test.
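
One way such a dynamic check could look, assuming options objects are only ever meant to be created as dynamic proxies by a factory. The `Options` interface, `create`, and `validate` below are hypothetical illustrations, not Beam's PipelineOptions API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class ProxyOnlyCheckSketch {
  // Hypothetical options interface standing in for a user-facing options type.
  interface Options {
    String getName();
  }

  // Hypothetical factory: the only supported way to obtain an Options instance.
  static Options create(String name) {
    InvocationHandler handler =
        (proxy, method, args) -> {
          if (method.getName().equals("getName")) {
            return name;
          }
          throw new UnsupportedOperationException(method.getName());
        };
    return (Options)
        Proxy.newProxyInstance(
            Options.class.getClassLoader(), new Class<?>[] {Options.class}, handler);
  }

  // Dynamic check: rejects any instance that is not a java.lang.reflect.Proxy,
  // which includes every hand-written implementation of the interface.
  static void validate(Options options) {
    if (!Proxy.isProxyClass(options.getClass())) {
      throw new IllegalArgumentException(
          "Options must be created through the factory, not implemented directly: "
              + options.getClass().getName());
    }
  }
}
```

A test along these lines would pass a hand-written implementation to `validate` and expect the exception, which is the kind of confirmation requested above.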

> Prevent users from "implementing" PipelineOptions
> -
>
> Key: BEAM-3493
> URL: https://issues.apache.org/jira/browse/BEAM-3493
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Jing Chen
>Priority: Minor
>  Labels: newbie, starter
>
> I've seen a user implement {{PipelineOptions}}. This implies that it is 
> backwards-incompatible to add new options, which is of course not our intent. 
> We should at least document very loudly that it is not to be implemented, and 
> preferably have some automation that will fail on load if they have 
> implemented it. Ideas?





[jira] [Assigned] (BEAM-3493) Prevent users from "implementing" PipelineOptions

2019-11-05 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3493:
-

Assignee: Jing Chen  (was: Luke Cwik)

> Prevent users from "implementing" PipelineOptions
> -
>
> Key: BEAM-3493
> URL: https://issues.apache.org/jira/browse/BEAM-3493
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Jing Chen
>Priority: Minor
>  Labels: newbie, starter
>
> I've seen a user implement {{PipelineOptions}}. This implies that it is 
> backwards-incompatible to add new options, which is of course not our intent. 
> We should at least document very loudly that it is not to be implemented, and 
> preferably have some automation that will fail on load if they have 
> implemented it. Ideas?





[jira] [Assigned] (BEAM-3493) Prevent users from "implementing" PipelineOptions

2019-11-05 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-3493:
-

Assignee: Luke Cwik  (was: Jing Chen)

> Prevent users from "implementing" PipelineOptions
> -
>
> Key: BEAM-3493
> URL: https://issues.apache.org/jira/browse/BEAM-3493
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
>Priority: Minor
>  Labels: newbie, starter
>
> I've seen a user implement {{PipelineOptions}}. This implies that it is 
> backwards-incompatible to add new options, which is of course not our intent. 
> We should at least document very loudly that it is not to be implemented, and 
> preferably have some automation that will fail on load if they have 
> implemented it. Ideas?





[jira] [Updated] (BEAM-8557) Clean up useless null check.

2019-11-05 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated BEAM-8557:
--
Description: 
I think we do not need null check here:

[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]

Because before the the `onNext` call, the `Future` already put into the queue 
in `handle` method.

 

I found the test as follows:
{code:java}
 @Test
 public void testUnknownResponseIgnored() throws Exception{code}
I do not know why we need test this case? I think it would be better if we 
throw the Exception for an UnknownResponse, otherwise, this may hidden a 
potential bug. 

What do you think? @kennknowles

 

  was:
I think we do not need null check here:

[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]

Because before the the `onNext` call, the `Future` already put into the queue 
in `handle` method.

 

I found the test as follows:
{code:java}
 @Test
 public void testUnknownResponseIgnored() throws Exception{code}
I am do not know why we need test this case? I think it would be better if we 
throw the Exception for an UnknownResponse, otherwise, this may hidden a 
potential bug. 

What do you think? @kennknowles

 


> Clean up useless null check.
> 
>
> Key: BEAM-8557
> URL: https://issues.apache.org/jira/browse/BEAM-8557
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, sdk-java-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think we do not need null check here:
> [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]
> Because before the the `onNext` call, the `Future` already put into the queue 
> in `handle` method.
>  
> I found the test as follows:
> {code:java}
>  @Test
>  public void testUnknownResponseIgnored() throws Exception{code}
> I do not know why we need test this case? I think it would be better if we 
> throw the Exception for an UnknownResponse, otherwise, this may hidden a 
> potential bug. 
> What do you think? @kennknowles
>  





[jira] [Updated] (BEAM-8557) Clean up useless null check.

2019-11-05 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated BEAM-8557:
--
Description: 
I think we do not need null check here:

[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]

Because before the the `onNext` call, the `Future` already put into the queue 
in `handle` method.

 

I found the test as follows:
{code:java}
 @Test
 public void testUnknownResponseIgnored() throws Exception{code}
I am do not know why we need test this case? I think it would be better if we 
throw the Exception for an UnknownResponse, otherwise, this may hidden a 
potential bug. 

What do you think? @kennknowles

 

  was:
I think we do not need null check here:

[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]

Because before the the `onNext` call, the `Future` already put into the queue 
in `handle` method.

 

I found the test as follows:
{code:java}
 @Test
 public void testUnknownResponseIgnored() throws Exception{code}

 I am do not know why we need test this case? @kennknowles

What do you think?

 


> Clean up useless null check.
> 
>
> Key: BEAM-8557
> URL: https://issues.apache.org/jira/browse/BEAM-8557
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, sdk-java-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think we do not need null check here:
> [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]
> Because before the the `onNext` call, the `Future` already put into the queue 
> in `handle` method.
>  
> I found the test as follows:
> {code:java}
>  @Test
>  public void testUnknownResponseIgnored() throws Exception{code}
> I am do not know why we need test this case? I think it would be better if we 
> throw the Exception for an UnknownResponse, otherwise, this may hidden a 
> potential bug. 
> What do you think? @kennknowles
>  





[jira] [Commented] (BEAM-3493) Prevent users from "implementing" PipelineOptions

2019-11-05 Thread Jing Chen (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968095#comment-16968095
 ] 

Jing Chen commented on BEAM-3493:
-

The issue should have been fixed by 
[https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsValidator.java#L69]
 and L70.

 

I will close the ticket once either [~kenn] or [~lcwik] confirms it.

 

Thanks

Jing

> Prevent users from "implementing" PipelineOptions
> -
>
> Key: BEAM-3493
> URL: https://issues.apache.org/jira/browse/BEAM-3493
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Jing Chen
>Priority: Minor
>  Labels: newbie, starter
>
> I've seen a user implement {{PipelineOptions}}. This implies that it is 
> backwards-incompatible to add new options, which is of course not our intent. 
> We should at least document very loudly that it is not to be implemented, and 
> preferably have some automation that will fail on load if they have 
> implemented it. Ideas?





[jira] [Created] (BEAM-8563) View.asMap() should reject multiply-firing triggers

2019-11-05 Thread Kenneth Knowles (Jira)
Kenneth Knowles created BEAM-8563:
-

 Summary: View.asMap() should reject multiply-firing triggers
 Key: BEAM-8563
 URL: https://issues.apache.org/jira/browse/BEAM-8563
 Project: Beam
  Issue Type: Bug
  Components: beam-model, sdk-java-core
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles


View.asMap() will always cause a failure with multiple firings, because a key 
will have duplicate elements.
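Why a second firing breaks a map-shaped view can be illustrated with a small sketch. This is plain Python under stated assumptions, not Beam's `View.asMap()` implementation: `build_map_view` is a hypothetical helper that, like a unique-key map, rejects a key it has already seen.

```python
def build_map_view(pairs):
    """Build a dict from (key, value) pairs, rejecting duplicate keys."""
    view = {}
    for key, value in pairs:
        if key in view:
            raise ValueError("duplicate values for key %r" % (key,))
        view[key] = value
    return view


# Hypothetical output of a trigger that fires twice for the same key.
first_firing = [("user", 1)]
second_firing = [("user", 2)]

print(build_map_view(first_firing))
try:
    build_map_view(first_firing + second_firing)
except ValueError as error:
    print("rejected:", error)
```

A single firing yields one element per key, so the view builds; once both firings are present, the key appears twice and construction fails.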





[jira] [Updated] (BEAM-8557) Clean up useless null check.

2019-11-05 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated BEAM-8557:
--
Description: 
I think we do not need null check here:

[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]

Because before the `onNext` call, the `Future` has already been put into the 
queue in the `handle` method.

 

I found the test as follows:
{code:java}
 @Test
 public void testUnknownResponseIgnored() throws Exception{code}

 I do not know why we need to test this case. @kennknowles

What do you think?

 

  was:
I think we do not need null check here:

[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]

Because before the `onNext` call, the `Future` has already been put into the 
queue in the `handle` method.

 

I found the test as follows:
```
@Test
public void testUnknownResponseIgnored() throws Exception {
  String id = "actualInstruction";
  String unknownId = "unknownInstruction";

  CompletionStage responseFuture =
      client.handle(BeamFnApi.InstructionRequest.newBuilder().setInstructionId(id).build());

  // Respond with an instruction id that was never requested.
  client
      .asResponseObserver()
      .onNext(BeamFnApi.InstructionResponse.newBuilder().setInstructionId(unknownId).build());

  assertThat(MoreFutures.isDone(responseFuture), is(false));
  assertThat(MoreFutures.isCancelled(responseFuture), is(false));
}
```
I do not know why we need to test this case. @kennknowles

What do you think?

 


> Clean up useless null check.
> 
>
> Key: BEAM-8557
> URL: https://issues.apache.org/jira/browse/BEAM-8557
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, sdk-java-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think we do not need null check here:
> [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]
> Because before the `onNext` call, the `Future` has already been put into the 
> queue in the `handle` method.
>  
> I found the test as follows:
> {code:java}
>  @Test
>  public void testUnknownResponseIgnored() throws Exception{code}
>  I do not know why we need to test this case. @kennknowles
> What do you think?
>  





[jira] [Updated] (BEAM-8557) Clean up useless null check.

2019-11-05 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated BEAM-8557:
--
Description: 
I think we do not need null check here:

[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]

Because before the `onNext` call, the `Future` has already been put into the 
queue in the `handle` method.

 

I found the test as follows:
```
@Test
public void testUnknownResponseIgnored() throws Exception {
  String id = "actualInstruction";
  String unknownId = "unknownInstruction";

  CompletionStage responseFuture =
      client.handle(BeamFnApi.InstructionRequest.newBuilder().setInstructionId(id).build());

  // Respond with an instruction id that was never requested.
  client
      .asResponseObserver()
      .onNext(BeamFnApi.InstructionResponse.newBuilder().setInstructionId(unknownId).build());

  assertThat(MoreFutures.isDone(responseFuture), is(false));
  assertThat(MoreFutures.isCancelled(responseFuture), is(false));
}
```
I do not know why we need to test this case. @kennknowles

What do you think?

 

  was:
I think we do not need null check here:

[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]

Because before the `onNext` call, the `Future` has already been put into the 
queue in the `handle` method.

What do you think?

 


> Clean up useless null check.
> 
>
> Key: BEAM-8557
> URL: https://issues.apache.org/jira/browse/BEAM-8557
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, sdk-java-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think we do not need null check here:
> [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151]
> Because before the `onNext` call, the `Future` has already been put into the 
> queue in the `handle` method.
>  
> I found the test as follows:
> ```
> @Test
> public void testUnknownResponseIgnored() throws Exception {
>   String id = "actualInstruction";
>   String unknownId = "unknownInstruction";
>
>   CompletionStage responseFuture =
>       client.handle(BeamFnApi.InstructionRequest.newBuilder().setInstructionId(id).build());
>
>   // Respond with an instruction id that was never requested.
>   client
>       .asResponseObserver()
>       .onNext(BeamFnApi.InstructionResponse.newBuilder().setInstructionId(unknownId).build());
>
>   assertThat(MoreFutures.isDone(responseFuture), is(false));
>   assertThat(MoreFutures.isCancelled(responseFuture), is(false));
> }
> ```
> I do not know why we need to test this case. @kennknowles
> What do you think?
>  





[jira] [Commented] (BEAM-6007) Create ClassLoadingStrategy with Java 11 compatible way

2019-11-05 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968080#comment-16968080
 ] 

Kenneth Knowles commented on BEAM-6007:
---

[~reuvenlax] is this the same bug we've been chatting about?

> Create ClassLoadingStrategy with Java 11 compatible way
> ---
>
> Key: BEAM-6007
> URL: https://issues.apache.org/jira/browse/BEAM-6007
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Keisuke Kondo
>Assignee: Keisuke Kondo
>Priority: Minor
>
> Since the sun.misc.Unsafe API was deprecated in Java 11,
>  
> {code:java}
> ClassLoadingStrategy.Default.INJECTION{code}
> does not work with Java 11.
> The author of the byte-buddy library shared a way to make it compatible with 
> Java 11 in his blog post (and a way to keep compatibility with Java 9 and 
> prior versions).
> [http://mydailyjava.blogspot.com/2018/04/jdk-11-and-proxies-in-world-past.html]





[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=339138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339138
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 06/Nov/19 04:16
Start Date: 06/Nov/19 04:16
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9965: [BEAM-8539] Make job 
state transitions well-defined and consistent
URL: https://github.com/apache/beam/pull/9965#issuecomment-550135736
 
 
   R: @lukecwik 
   R: @mxm 
   R: @robertwb 
   
   This is a followup to https://github.com/apache/beam/pull/9969 that corrects 
python-based runners to behave the same as Java-based runners wrt STOPPED: it 
is the initial state and not a terminal state.  
   
   Tests pass, but I'm not sure if this could cause problems with Dataflow. 
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339138)
Time Spent: 3h 10m  (was: 3h)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  
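The utility function proposed at the end of this description might look like the following. It is a minimal sketch under stated assumptions: `JobState` is a hypothetical local enum, not the generated `beam_job_api.proto` class, and the terminal set follows the Java `InMemoryJobService` list above, where STOPPED is the start state rather than a terminal one.

```python
from enum import Enum


class JobState(Enum):
    """Hypothetical stand-in for the job states named in this issue."""
    STOPPED = "STOPPED"
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    DRAINED = "DRAINED"


# Assumption: terminal states as listed for the Java job service.
TERMINAL_STATES = frozenset(
    {JobState.DONE, JobState.FAILED, JobState.CANCELLED, JobState.DRAINED}
)


def is_terminal(state):
    """Return True if no further state transitions are expected."""
    return state in TERMINAL_STATES


print(is_terminal(JobState.DONE))
print(is_terminal(JobState.STOPPED))
```

Keeping the set in one shared place means each runner asks `is_terminal` instead of carrying its own list of final states.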





[jira] [Work logged] (BEAM-8533) test_java_expansion_portable_runner failing

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8533?focusedWorklogId=339135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339135
 ]

ASF GitHub Bot logged work on BEAM-8533:


Author: ASF GitHub Bot
Created on: 06/Nov/19 03:48
Start Date: 06/Nov/19 03:48
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9956: [BEAM-8533] 
Revert "[BEAM-8442] Remove duplicate code for bundle register in Pyth…
URL: https://github.com/apache/beam/pull/9956#issuecomment-550130482
 
 
   @mxm @ibzib Thanks a lot for your analysis. Really sorry that my commit 
broke the post-commit tests. I have investigated this issue and submitted a 
new PR #10004 which addresses it. It includes the following changes:
   
   - Update the client side implementation of the Fn API to make sure registration 
is executed successfully before executing process_bundle. Personally I think 
that the client side implementation of the Fn API should not assume that the 
implementation of registration in the SDK harness is synchronous. The registration 
API itself is asynchronous and the client side implementation should check whether 
the registration succeeded. This decouples the implementation dependency 
between the runner and the SDK harness.
   
   - Remove duplicate registration logic in the Python SDK harness. As you 
know, it changed the registration implementation of Python SDK harness to 
asynchronous.
   
   Looking forward to your feedback. :)
   
   Best,
   Jincheng
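The first change described in this comment (confirm that registration succeeded before sending process_bundle) can be sketched as below. This is an illustrative Python sketch only: `FakeHarness` and its methods are hypothetical, and a thread pool stands in for the asynchronous Fn API channel.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeHarness:
    """Hypothetical SDK harness whose registration completes asynchronously."""

    def __init__(self):
        self._descriptors = {}
        self._pool = ThreadPoolExecutor(max_workers=1)

    def register(self, descriptor_id, descriptor):
        def do_register():
            self._descriptors[descriptor_id] = descriptor
            return descriptor_id

        # Returns a Future; the caller must not assume it is already done.
        return self._pool.submit(do_register)

    def process_bundle(self, descriptor_id):
        # Raises KeyError if the descriptor was never registered.
        return "processed with " + self._descriptors[descriptor_id]

    def close(self):
        self._pool.shutdown(wait=True)


harness = FakeHarness()
registration = harness.register("1-37", "descriptor")
# Block until registration has finished; result() re-raises any failure.
registration.result(timeout=5)
outcome = harness.process_bundle("1-37")
harness.close()
print(outcome)
```

Without the `result()` call, `process_bundle` could run before the descriptor exists and fail with a `KeyError`, which is the kind of failure this issue reports.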
 



Issue Time Tracking
---

Worklog Id: (was: 339135)
Time Spent: 40m  (was: 0.5h)

> test_java_expansion_portable_runner failing
> ---
>
> Key: BEAM-8533
> URL: https://issues.apache.org/jira/browse/BEAM-8533
> Project: Beam
>  Issue Type: Sub-task
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> probable cause: 
> https://github.com/apache/beam/pull/9842#issuecomment-548496295
> 11:13:37 java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> Error received from SDK harness for instruction 72: Traceback (most recent 
> call last):
> 11:13:37   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 173, in _execute
> 11:13:37     response = task()
> 11:13:37   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 196, in 
> 11:13:37     self._execute(lambda: worker.do_instruction(work), work)
> 11:13:37   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 358, in do_instruction
> 11:13:37     request.instruction_id)
> 11:13:37   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 378, in process_bundle
> 11:13:37     instruction_id, request.process_bundle_descriptor_id)
> 11:13:37   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 311, in get
> 11:13:37     self.fns[bundle_descriptor_id],
> 11:13:37 KeyError: u'1-37'





[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=339134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339134
 ]

ASF GitHub Bot logged work on BEAM-8442:


Author: ASF GitHub Bot
Created on: 06/Nov/19 03:46
Start Date: 06/Nov/19 03:46
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9842: [BEAM-8442] 
Remove duplicate code for bundle register in Python SDK harness
URL: https://github.com/apache/beam/pull/9842#issuecomment-550130122
 
 
   @mxm @ibzib @lukecwik @aaltay Thanks a lot for your analysis. Really sorry 
that my commit broke the post-commit tests. I have investigated this issue and 
submitted a new PR #10004 which addresses it. It includes the following 
changes:
   
   1) Update the client side implementation of the Fn API to make sure registration 
is executed successfully before executing process_bundle. Personally I think 
that the client side implementation of the Fn API should not assume that the 
implementation of registration in the SDK harness is synchronous. The registration 
API itself is asynchronous and the client side implementation should check whether 
the registration succeeded. This decouples the implementation dependency 
between the runner and the SDK harness.
   
   2) Remove duplicate registration logic in the Python SDK harness. As you 
know, it changed the registration implementation of Python SDK harness to 
asynchronous.
   
   Looking forward to your feedback. :)
   
   Best,
   Jincheng
 



Issue Time Tracking
---

Worklog Id: (was: 339134)
Time Spent: 2h 10m  (was: 2h)

> Unify bundle register in Python SDK harness
> ---
>
> Key: BEAM-8442
> URL: https://issues.apache.org/jira/browse/BEAM-8442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> There are two methods for bundle register in Python SDK harness:
> `SdkHarness._request_register` and `SdkWorker.register`. They should be unified.





[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=339133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339133
 ]

ASF GitHub Bot logged work on BEAM-8442:


Author: ASF GitHub Bot
Created on: 06/Nov/19 03:41
Start Date: 06/Nov/19 03:41
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #10004: 
[BEAM-8442] Unify bundle register in Python SDK harness
URL: https://github.com/apache/beam/pull/10004
 
 
   Motivation
   
   - Clean up the code in `sdk_worker.py` that registers process bundle descriptors.
   - Correct the implementation of the Fn API, i.e., better follow 
the asynchronous design principles of the Beam API.
   
   Changes
- 6868333: 
Update the client side implementation of the Fn API to make sure 
registration is executed successfully before executing process_bundle. 
Personally I think that the client side implementation of the Fn API should not 
assume that the implementation of registration in the SDK harness is synchronous. 
The registration API itself is asynchronous and the client side implementation 
should check whether the registration succeeded. This decouples the implementation 
dependency between the runner and the SDK harness.
   
- 79d21b1:
Remove duplicate registration logic in the Python SDK harness.  It 
changed the registration implementation of Python SDK harness to asynchronous.
   

[jira] [Updated] (BEAM-8442) Unify bundle register in Python SDK harness

2019-11-05 Thread sunjincheng (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng updated BEAM-8442:
--
Summary: Unify bundle register in Python SDK harness  (was: Unfiy bundle 
register in Python SDK harness)

> Unify bundle register in Python SDK harness
> ---
>
> Key: BEAM-8442
> URL: https://issues.apache.org/jira/browse/BEAM-8442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There are two methods for bundle register in Python SDK harness:
> `SdkHarness._request_register` and `SdkWorker.register`. They should be unified.





[jira] [Closed] (BEAM-8500) PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in README is incorrect

2019-11-05 Thread yoshiki obata (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yoshiki obata closed BEAM-8500.
---

> PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in README is 
> incorrect
> ---
>
> Key: BEAM-8500
> URL: https://issues.apache.org/jira/browse/BEAM-8500
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: yoshiki obata
>Assignee: yoshiki obata
>Priority: Trivial
>  Labels: easyfix, starter
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in 
> ./test-infra/jenkins/README.md is *Run Website PreCommit* [1]
>  But the correct phrase is *Run Website_Stage_GCS PreCommit*
> [1] https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md





[jira] [Resolved] (BEAM-8500) PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in README is incorrect

2019-11-05 Thread yoshiki obata (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yoshiki obata resolved BEAM-8500.
-
Fix Version/s: Not applicable
   Resolution: Resolved

> PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in README is 
> incorrect
> ---
>
> Key: BEAM-8500
> URL: https://issues.apache.org/jira/browse/BEAM-8500
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: yoshiki obata
>Assignee: yoshiki obata
>Priority: Trivial
>  Labels: easyfix, starter
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in 
> ./test-infra/jenkins/README.md is *Run Website PreCommit* [1]
>  But the correct phrase is *Run Website_Stage_GCS PreCommit*
> [1] https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md





[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=339125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339125
 ]

ASF GitHub Bot logged work on BEAM-5878:


Author: ASF GitHub Bot
Created on: 06/Nov/19 02:19
Start Date: 06/Nov/19 02:19
Worklog Time Spent: 10m 
  Work Description: lazylynx commented on pull request #9686: 
[WIP][BEAM-5878] update dill min version to 0.3.1.1 and add test for functions 
with Keyword-only arguments
URL: https://github.com/apache/beam/pull/9686#discussion_r342890040
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -106,8 +106,7 @@ def get_version():
 'avro>=1.8.1,<2.0.0; python_version < "3.0"',
 'avro-python3>=1.8.1,<2.0.0; python_version >= "3.0"',
 'crcmod>=1.7,<2.0',
-# Dill doesn't guarantee comatibility between releases within minor 
version.
-'dill>=0.3.0,<0.3.1',
+'dill>=0.3.1.1,<0.4.0',
 
 Review comment:
   Updated. PTAL.
 



Issue Time Tracking
---

Worklog Id: (was: 339125)
Time Spent: 16h  (was: 15h 50m)

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Time Spent: 16h
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python 3.0 inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new method 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add a Py3-only unit test that covers DoFns with keyword-only 
> arguments once Beam Python 3 tests are in good shape.
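What `inspect.getfullargspec()` reports for a keyword-only argument can be seen in a short Python 3 sketch. The function `process` below is a hypothetical example, not Beam code; only the standard-library `inspect` module is used.

```python
import inspect


def process(element, *, prefix="x"):
    """Hypothetical function with one keyword-only argument (PEP 3102)."""
    return prefix + str(element)


spec = inspect.getfullargspec(process)
print(spec.args)            # positional-or-keyword parameters
print(spec.kwonlyargs)      # keyword-only parameters
print(spec.kwonlydefaults)  # defaults for the keyword-only parameters
```

Here `prefix` shows up in `kwonlyargs` and `kwonlydefaults` rather than in `args`, so any code that only inspects `args` and `defaults` would miss it.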





[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=339122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339122
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 06/Nov/19 02:03
Start Date: 06/Nov/19 02:03
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9965: WIP: [BEAM-8539] Make 
job state transitions well-defined and consistent
URL: https://github.com/apache/beam/pull/9965#issuecomment-550107727
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 339122)
Time Spent: 3h  (was: 2h 50m)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339121
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 02:00
Start Date: 06/Nov/19 02:00
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342886212
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -171,14 +199,17 @@ class StreamCoderImpl(CoderImpl):
   Subclass of CoderImpl implementing encode/decode using stream methods."""
 
   def encode(self, value):
+# type: (Any) -> bytes
 
 Review comment:
   No, but there's an issue for it, and an open pull request:
   
   https://github.com/python/mypy/issues/3903
   https://github.com/python/mypy/pull/7548
   
 



Issue Time Tracking
---

Worklog Id: (was: 339121)
Time Spent: 20h  (was: 19h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=339120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339120
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:56
Start Date: 06/Nov/19 01:56
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-550106200
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 339120)
Time Spent: 15h 10m  (was: 15h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339119
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:53
Start Date: 06/Nov/19 01:53
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342884801
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -136,7 +150,12 @@ def estimate_size(self):
     """
     raise NotImplementedError
 
-  def split(self, desired_bundle_size, start_position=None, stop_position=None):
+  def split(self,
+            desired_bundle_size,  # type: int
+            start_position=None,  # type: Optional[int]
 
 Review comment:
   got it. 
 



Issue Time Tracking
---

Worklog Id: (was: 339119)
Time Spent: 19h 50m  (was: 19h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339117
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:53
Start Date: 06/Nov/19 01:53
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342884673
 
 

 ##
 File path: sdks/python/apache_beam/coders/coders.py
 ##
 @@ -248,7 +281,26 @@ def __ne__(self, other):
   def __hash__(self):
     return hash(type(self))
 
-  _known_urns = {}
+  _known_urns = {}  # type: Dict[str, Tuple[type, ConstructorFn]]
+
+  @classmethod
 
 Review comment:
   I'm confused about what you are confused about :)
   
   It's a dictionary mapping a str (URN) to a tuple of payload type and 
constructor function (a type alias defined at the top of the module).
   
   Or are you talking about the `@overload` decorators just below this?  
   
   Overload decorators work conceptually like an overloaded C++ function.   
Without overloads, we'd have to wrap all the possible arg and result types in 
Unions, but then the picture of which combinations of arg and result type are 
valid is lost (lots of stuff in, lots of stuff out).  Using overloads we can 
clearly define those relationships.  This also lets mypy track the results of 
function invocations more accurately (e.g. when there's no `fn` argument, the 
result will be a `Callable`, but when there is, the result will be `None`).
   
   More docs:
   
   https://docs.python.org/3/library/typing.html#typing.overload
   https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading
   
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 339117)
Time Spent: 19.5h  (was: 19h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339118
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:53
Start Date: 06/Nov/19 01:53
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342884673
 
 

 ##
 File path: sdks/python/apache_beam/coders/coders.py
 ##
 @@ -248,7 +281,26 @@ def __ne__(self, other):
   def __hash__(self):
     return hash(type(self))
 
-  _known_urns = {}
+  _known_urns = {}  # type: Dict[str, Tuple[type, ConstructorFn]]
+
+  @classmethod
 
 Review comment:
   I'm confused about what you are confused about :)
   
   It's a dictionary mapping a str (URN) to a tuple of payload type and 
constructor function (a type alias defined at the top of the module).
   
   Or are you talking about the `@overload` decorators just below this?  
   
   Overload decorators work conceptually like an overloaded C++ function.   
Without overloads, we'd have to wrap all the possible arg and result types in 
Unions, but then the picture of which combinations of arg and result type are 
valid is lost (lots of stuff in, lots of stuff out).  Using overloads we can 
clearly define those relationships.  This also lets mypy track the results of 
function invocations more accurately (e.g. when there's no `fn` argument, the 
result will be a `Callable`, but when there is, the result will be `None`).
   
   More docs:
   
   https://docs.python.org/3/library/typing.html#typing.overload
   https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading
   
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 339118)
Time Spent: 19h 40m  (was: 19.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 19h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339116
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:40
Start Date: 06/Nov/19 01:40
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342882155
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -74,53 +86,66 @@
   else:
     is_compiled = False
     fits_in_64_bits = lambda x: -(1 << 63) <= x <= (1 << 63) - 1
+
 # pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports
 
 
 _TIME_SHIFT = 1 << 63
 MIN_TIMESTAMP_micros = MIN_TIMESTAMP.micros
 MAX_TIMESTAMP_micros = MAX_TIMESTAMP.micros
 
+IterableStateReader = Callable[[bytes, 'CoderImpl'], Iterable]
+IterableStateWriter = Callable[[Iterable, 'CoderImpl'], bytes]
+Observables = List[Tuple[observable.ObservableMixin, 'CoderImpl']]
 
 class CoderImpl(object):
   """For internal use only; no backwards-compatibility guarantees."""
 
   def encode_to_stream(self, value, stream, nested):
+    # type: (Any, create_OutputStream, bool) -> None
 
 Review comment:
   My intent was that mypy would use the slow_stream implementation of these, 
which _are_ types.  But I realized when looking at this that 
`create_InputStream` is being treated as `Any` because mypy can't inspect the 
`stream` c-extension module (mypy always uses the first import location of an 
object, unless `TYPE_CHECKING` is used). 
   
   The simple/obvious approach works, but is pretty verbose:
   
   ```python
   if TYPE_CHECKING:
     from apache_beam.transforms.window import IntervalWindow
     from .slow_stream import InputStream as create_InputStream
     from .slow_stream import OutputStream as create_OutputStream
     from .slow_stream import ByteCountingOutputStream
     from .slow_stream import get_varint_size
   else:
     # pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports
     try:
       from .stream import InputStream as create_InputStream
       from .stream import OutputStream as create_OutputStream
       from .stream import ByteCountingOutputStream
       from .stream import get_varint_size
       # Make it possible to import create_InputStream and other cdef-classes
       # from apache_beam.coders.coder_impl when Cython codepath is used.
       globals()['create_InputStream'] = create_InputStream
       globals()['create_OutputStream'] = create_OutputStream
       globals()['ByteCountingOutputStream'] = ByteCountingOutputStream
     except ImportError:
       from .slow_stream import InputStream as create_InputStream
       from .slow_stream import OutputStream as create_OutputStream
       from .slow_stream import ByteCountingOutputStream
       from .slow_stream import get_varint_size
       if False:  # pylint: disable=using-constant-test
         # This clause is interpreted by the compiler.
         from cython import compiled as is_compiled
       else:
         is_compiled = False
         fits_in_64_bits = lambda x: -(1 << 63) <= x <= (1 << 63) - 1
   ```
   
   Here's another approach (still verbose but less redundant):
   
   ```python
   try:
     from . import stream
   except ImportError:
     SLOW_STREAM = True
   else:
     SLOW_STREAM = False
   
   if TYPE_CHECKING or SLOW_STREAM:
     from .slow_stream import InputStream as create_InputStream
     from .slow_stream import OutputStream as create_OutputStream
     from .slow_stream import ByteCountingOutputStream
     from .slow_stream import get_varint_size
   
     if False:  # pylint: disable=using-constant-test
       # This clause is interpreted by the compiler.
       from cython import compiled as is_compiled
     else:
       is_compiled = False
       fits_in_64_bits = lambda x: -(1 << 63) <= x <= (1 << 63) - 1
   
   else:
     # pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports
     from .stream import InputStream as create_InputStream
     from .stream import OutputStream as create_OutputStream
     from .stream import ByteCountingOutputStream
     from .stream import get_varint_size
     # Make it possible to import create_InputStream and other cdef-classes
     # from apache_beam.coders.coder_impl when Cython codepath is used.
     globals()['create_InputStream'] = create_InputStream
     globals()['create_OutputStream'] = create_OutputStream
     globals()['ByteCountingOutputStream'] = ByteCountingOutputStream
     # pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports
   ```
   
   Opinion?
   
   
   
 


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339115
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:25
Start Date: 06/Nov/19 01:25
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342878935
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -724,10 +790,12 @@ def _construct_from_components(self, components):
 
 class _ConcatSequence(object):
   def __init__(self, head, tail):
+    # type: (Iterable[Any], Iterable[Any]) -> None
 
 Review comment:
   Yeah, I avoided introducing generics in this MR.  I'll do that in a 
follow-up. 
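For reference, a generic version could look roughly like the sketch below. It is a guess at the shape of such a follow-up, not code from it: the class name `ConcatSequence`, the `TypeVar` name `T`, and the `__iter__` body are assumptions for the example.

```python
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar('T')


class ConcatSequence(Generic[T]):
  """Iterates over head, then tail, with a shared element type T."""

  def __init__(self, head, tail):
    # type: (Iterable[T], Iterable[T]) -> None
    self._head = head
    self._tail = tail

  def __iter__(self):
    # type: () -> Iterator[T]
    for elem in self._head:
      yield elem
    for elem in self._tail:
      yield elem
```

The gain over `Iterable[Any]` is that mypy can infer the element type of the concatenation from its two inputs, so iterating a `ConcatSequence` built from two `List[int]` values yields `int` rather than `Any`.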
   
 



Issue Time Tracking
---

Worklog Id: (was: 339115)
Time Spent: 19h 10m  (was: 19h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 19h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339114
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:24
Start Date: 06/Nov/19 01:24
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342878626
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1275,6 +1306,7 @@ def defer_remainder(self, watermark=None):
     raise NotImplementedError
 
   def deferred_status(self):
+    # type: () -> Optional[Tuple[restriction_trackers.OffsetRange, Timestamp]]
 
 Review comment:
   Roger
 



Issue Time Tracking
---

Worklog Id: (was: 339114)
Time Spent: 19h  (was: 18h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 19h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=339112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339112
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:17
Start Date: 06/Nov/19 01:17
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9954: [BEAM-8335] 
Add the PTuple
URL: https://github.com/apache/beam/pull/9954#discussion_r342876345
 
 

 ##
 File path: sdks/python/apache_beam/pvalue.py
 ##
 @@ -201,6 +201,43 @@ class PDone(PValue):
   pass
 
 
+class PTuple(object):
+  """An object grouping multiple PCollections.
+
+  This class is useful for returning a named tuple of PCollections from a
+  composite.
+  """
+
+  def __init__(self, pcoll_dict):
+    """Initializes this named tuple with a dictionary of tagged PCollections.
+    """
+    self._pcolls = pcoll_dict
+
+  def __str__(self):
+    return '<%s>' % self._str_internal()
+
+  def __repr__(self):
+    return '<%s at %s>' % (self._str_internal(), hex(id(self)))
 
 Review comment:
   The representation should be PTuple(key1=pc1, key2=pc2, ...)
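One way to get that representation is sketched below. This is a stripped-down stand-in for the class in the diff, keeping only the constructor; sorting the tags is an extra assumption made here so that the output is deterministic.

```python
class PTuple(object):
  """Minimal stand-in for the PTuple in the diff above."""

  def __init__(self, pcoll_dict):
    self._pcolls = pcoll_dict

  def __repr__(self):
    # Render as PTuple(key1=pc1, key2=pc2, ...); tags are sorted so the
    # output does not depend on dict ordering.
    fields = ', '.join(
        '%s=%r' % (tag, pcoll)
        for tag, pcoll in sorted(self._pcolls.items()))
    return '%s(%s)' % (self.__class__.__name__, fields)
```

Because no `__str__` is defined, `str()` falls back to `__repr__`, so a single method covers both.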
 



Issue Time Tracking
---

Worklog Id: (was: 339112)
Time Spent: 19h  (was: 18h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 19h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=339111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339111
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:17
Start Date: 06/Nov/19 01:17
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9954: [BEAM-8335] 
Add the PTuple
URL: https://github.com/apache/beam/pull/9954#discussion_r342876590
 
 

 ##
 File path: sdks/python/apache_beam/pvalue.py
 ##
 @@ -201,6 +201,43 @@ class PDone(PValue):
   pass
 
 
+class PTuple(object):
+  """An object grouping multiple PCollections.
+
+  This class is useful for returning a named tuple of PCollections from a
+  composite.
+  """
+
+  def __init__(self, pcoll_dict):
+    """Initializes this named tuple with a dictionary of tagged PCollections.
+    """
+    self._pcolls = pcoll_dict
+
+  def __str__(self):
+    return '<%s>' % self._str_internal()
+
+  def __repr__(self):
+    return '<%s at %s>' % (self._str_internal(), hex(id(self)))
+
+  def _str_internal(self):
+    return '%s pcollections=%s' % (
+        self.__class__.__name__, self._pcolls)
+
+  def __iter__(self):
+    for tag in self._pcolls:
+      yield self[tag]
+
+  def __getattr__(self, tag):
+    # Special methods which may be accessed before the object is
 
 Review comment:
   We shouldn't support pickling these. 
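If the goal were to reject pickling outright rather than just drop the special-casing in `__getattr__`, one possible shape is sketched below. This is an assumption about how it could be done, not what the pull request does: overriding `__reduce_ex__` and the error message are choices made for the example.

```python
import pickle


class PTuple(object):
  """Minimal stand-in for the PTuple in the diff above."""

  def __init__(self, pcoll_dict):
    self._pcolls = pcoll_dict

  def __getattr__(self, tag):
    # Only reached when normal attribute lookup fails. Underscore-prefixed
    # names are never treated as tags, which also avoids recursing on
    # self._pcolls if it has not been set yet.
    if tag.startswith('_'):
      raise AttributeError(tag)
    try:
      return self._pcolls[tag]
    except KeyError:
      raise AttributeError(tag)

  def __reduce_ex__(self, protocol):
    # pickle asks the object how to serialize itself through this hook;
    # refusing here makes pickle.dumps fail loudly.
    raise TypeError('%s is not picklable' % self.__class__.__name__)
```

Note that `copy.copy` also goes through `__reduce_ex__` for plain objects, so this approach would block shallow copies as well.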
 



Issue Time Tracking
---

Worklog Id: (was: 339111)
Time Spent: 18h 50m  (was: 18h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=339108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339108
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:13
Start Date: 06/Nov/19 01:13
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9954: [BEAM-8335] 
Add the PTuple
URL: https://github.com/apache/beam/pull/9954#discussion_r342876169
 
 

 ##
 File path: sdks/python/apache_beam/pvalue.py
 ##
 @@ -201,6 +201,43 @@ class PDone(PValue):
   pass
 
 
+class PTuple(object):
+  """An object grouping multiple PCollections.
+
+  This class is useful for returning a named tuple of PCollections from a
+  composite.
+  """
+
+  def __init__(self, pcoll_dict):
+    """Initializes this named tuple with a dictionary of tagged PCollections.
+    """
+    self._pcolls = pcoll_dict
+
+  def __str__(self):
+    return '<%s>' % self._str_internal()
 
 Review comment:
   Remove and let it default to `__repr__`.
 



Issue Time Tracking
---

Worklog Id: (was: 339108)
Time Spent: 18.5h  (was: 18h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 18.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=339109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339109
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:13
Start Date: 06/Nov/19 01:13
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9954: [BEAM-8335] 
Add the PTuple
URL: https://github.com/apache/beam/pull/9954#discussion_r342876169
 
 

 ##
 File path: sdks/python/apache_beam/pvalue.py
 ##
 @@ -201,6 +201,43 @@ class PDone(PValue):
   pass
 
 
+class PTuple(object):
+  """An object grouping multiple PCollections.
+
+  This class is useful for returning a named tuple of PCollections from a
+  composite.
+  """
+
+  def __init__(self, pcoll_dict):
+    """Initializes this named tuple with a dictionary of tagged PCollections.
+    """
+    self._pcolls = pcoll_dict
+
+  def __str__(self):
+    return '<%s>' % self._str_internal()
 
 Review comment:
   Remove and let it default to `__repr__` (and inline _str_internal). 
 



Issue Time Tracking
---

Worklog Id: (was: 339109)
Time Spent: 18h 40m  (was: 18.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 18h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Updated] (BEAM-8562) source packages don't include gradlew

2019-11-05 Thread Robert Lugg (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Lugg updated BEAM-8562:
--
Description: 
I want to run using Flink.  Procedure says source code is required, but gradlew 
isn't there
h3. Executing a Beam pipeline on a Flink Cluster

As of now you will need a copy of Apache Beam’s source code. You can download 
it on the [Downloads page|https://beam.apache.org/get-started/downloads/].

Following a couple of links, I get to:

[http://www.apache.org/dyn/closer.cgi/beam/2.16.0/apache-beam-2.16.0-source-release.zip]

But I don't see any gradlew wrappers where they should be.  If I grab current 
using 'git clone', then the files are there.

--- additional information ---

I manually copied gradlew from the git clone to the 2.16 source release.  That 
gradlew appears? to work.

 

  was:
I want to run using Flink.  Procedure says source code is required, but gradlew 
isn't there
h3. Executing a Beam pipeline on a Flink Cluster

As of now you will need a copy of Apache Beam’s source code. You can download 
it on the [Downloads page|https://beam.apache.org/get-started/downloads/].

Following a couple of links, I get to:

[http://www.apache.org/dyn/closer.cgi/beam/2.16.0/apache-beam-2.16.0-source-release.zip]

But I don't see any gradlew wrappers where they should be.  If I grab current 
using 'git clone', then the files are there.

Either the instructions should be modified or the source releases should 
contain all the tools.

 


> source packages don't include gradlew
> -
>
> Key: BEAM-8562
> URL: https://issues.apache.org/jira/browse/BEAM-8562
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.16.0
> Environment: linux
>Reporter: Robert Lugg
>Priority: Minor
>
> I want to run using Flink.  Procedure says source code is required, but 
> gradlew isn't there
> h3. Executing a Beam pipeline on a Flink Cluster
> As of now you will need a copy of Apache Beam’s source code. You can download 
> it on the [Downloads page|https://beam.apache.org/get-started/downloads/].
> Following a couple of links, I get to:
> [http://www.apache.org/dyn/closer.cgi/beam/2.16.0/apache-beam-2.16.0-source-release.zip]
> But I don't see any gradlew wrappers where they should be.  If I grab current 
> using 'git clone', then the files are there.
> --- additional information ---
> I manually copied gradlew from the git clone to the 2.16 source release.  
> That gradlew appears? to work.
>  





[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339103
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:03
Start Date: 06/Nov/19 01:03
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9943: [BEAM-8508] [SQL] 
Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#issuecomment-550093576
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 339103)
Time Spent: 3h 20m  (was: 3h 10m)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.





[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339102
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:03
Start Date: 06/Nov/19 01:03
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9943: [BEAM-8508] [SQL] 
Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#issuecomment-550093576
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 339102)
Time Spent: 3h 10m  (was: 3h)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.





[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339101
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:59
Start Date: 06/Nov/19 00:59
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9943: [BEAM-8508] [SQL] 
Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#issuecomment-550092685
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 339101)
Time Spent: 3h  (was: 2h 50m)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.





[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339100
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:59
Start Date: 06/Nov/19 00:59
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9943: [BEAM-8508] [SQL] 
Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#issuecomment-550092685
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 339100)
Time Spent: 2h 50m  (was: 2h 40m)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.





[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=339098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339098
 ]

ASF GitHub Bot logged work on BEAM-8402:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:44
Start Date: 06/Nov/19 00:44
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9811: [BEAM-8402] Create a 
class hierarchy to represent environments
URL: https://github.com/apache/beam/pull/9811#issuecomment-550089460
 
 
   Yes, this is mostly about the public API. This does create a module of 
environments, with several pre-defined, which gets us a step closer to just 
using these in the public API, which I don't think is a scalable/portable way 
forward. We should at least add a caveat to that module that these are intended 
for internal use only. 
 



Issue Time Tracking
---

Worklog Id: (was: 339098)
Time Spent: 2h  (was: 1h 50m)

> Create a class hierarchy to represent environments
> --
>
> Key: BEAM-8402
> URL: https://issues.apache.org/jira/browse/BEAM-8402
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> As a first step towards making it possible to assign different environments 
> to sections of a pipeline, we first need to expose environment classes to the 
> pipeline API.  Unlike PTransforms, PCollections, Coders, and Windowings,  
> environments exist solely in the portability framework as protobuf objects.  
>  By creating a hierarchy of "native" classes that represent the various 
> environment types -- external, docker, process, etc -- users will be able to 
> instantiate these and assign them to parts of the pipeline.  The assignment 
> portion will be covered in a follow-up issue/PR.
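
A minimal sketch of what such a hierarchy could look like. This is only an
illustration under stated assumptions: the class names, constructor arguments,
and the `to_spec()` method are hypothetical, and a plain dict stands in for
the protobuf message used by the portability framework.

```python
class Environment(object):
  """Hypothetical base class for a "native" environment description."""

  def to_spec(self):
    # type: () -> dict
    # A dict stands in for the portability protobuf in this sketch.
    raise NotImplementedError


class DockerEnvironment(Environment):
  """Environment that runs user code inside a container image."""

  def __init__(self, container_image):
    # type: (str) -> None
    self.container_image = container_image

  def to_spec(self):
    return {'type': 'docker', 'container_image': self.container_image}


class ProcessEnvironment(Environment):
  """Environment that runs user code by starting a local process."""

  def __init__(self, command):
    # type: (str) -> None
    self.command = command

  def to_spec(self):
    return {'type': 'process', 'command': self.command}


class ExternalEnvironment(Environment):
  """Environment that connects to an already running worker service."""

  def __init__(self, url):
    # type: (str) -> None
    self.url = url

  def to_spec(self):
    return {'type': 'external', 'url': self.url}
```

With classes like these, a user could write `DockerEnvironment('my/image')`
instead of building the protobuf object by hand. How an instance gets attached
to part of a pipeline is left out, as in the description above.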





[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=339095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339095
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:41
Start Date: 06/Nov/19 00:41
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9789: [BEAM-8348] set 
job_name in portable_runner.py job request
URL: https://github.com/apache/beam/pull/9789#issuecomment-550088785
 
 
   I haven't made any progress on this. It's not a terribly important feature.
 



Issue Time Tracking
---

Worklog Id: (was: 339095)
Time Spent: 2h 50m  (was: 2h 40m)

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> See [1]. `job_name` is already taken by Google Cloud options [2], so I guess 
> we should create a new option (maybe `portable_job_name` to avoid disruption).
> [[1] 
> https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294|https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294]
> [2] 
> [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438]
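
A rough sketch of the suggested option, using only the standard library. The
flag name `--portable_job_name` comes from the description above; the helper
`parse_job_name` and the default of "job" (mirroring the currently hard-coded
name) are assumptions for illustration, not Beam's actual option handling.

```python
import argparse


def parse_job_name(argv):
  """Returns the job name requested on the command line, or "job"."""
  parser = argparse.ArgumentParser()
  # Hypothetical flag; a separate name avoids clashing with `job_name`.
  parser.add_argument('--portable_job_name', default='job')
  # Ignore flags that belong to other option groups.
  known, _ = parser.parse_known_args(argv)
  return known.portable_job_name
```

The returned value would then replace the hard-coded string in the job
request.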





[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=339093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339093
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:35
Start Date: 06/Nov/19 00:35
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9789: [BEAM-8348] set 
job_name in portable_runner.py job request
URL: https://github.com/apache/beam/pull/9789#issuecomment-550087290
 
 
   Any update on this?
 



Issue Time Tracking
---

Worklog Id: (was: 339093)
Time Spent: 2h 40m  (was: 2.5h)

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> See [1]. `job_name` is already taken by Google Cloud options [2], so I guess 
> we should create a new option (maybe `portable_job_name` to avoid disruption).
> [[1] 
> https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294|https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294]
> [2] 
> [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438]





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339090
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:27
Start Date: 06/Nov/19 00:27
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342865760
 
 

 ##
 File path: sdks/python/apache_beam/options/value_provider.py
 ##
 @@ -23,6 +23,7 @@
 
 from builtins import object
 from functools import wraps
+from typing import Set
 
 Review comment:
   I don't follow.  Are you seeing an error?
 



Issue Time Tracking
---

Worklog Id: (was: 339090)
Time Spent: 18h 50m  (was: 18h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339089
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:26
Start Date: 06/Nov/19 00:26
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342865507
 
 

 ##
 File path: sdks/python/apache_beam/io/textio.py
 ##
 @@ -92,7 +93,7 @@ def __init__(self,
min_bundle_size,
compression_type,
strip_trailing_newlines,
-   coder,
+   coder,  # type: coders.Coder
 
 Review comment:
   Yeah, the latter.  It'll be good to go back and fill in the blanks, but I 
was trying to maximize my time. 
   
 



Issue Time Tracking
---

Worklog Id: (was: 339089)
Time Spent: 18h 40m  (was: 18.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 18h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339087
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:24
Start Date: 06/Nov/19 00:24
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342865083
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -838,11 +911,12 @@ def encode_to_stream(self, value, out, nested):
 out.write_var_int64(0)
 
   def decode_from_stream(self, in_stream, nested):
+# type: (create_InputStream, bool) -> Sequence
 size = in_stream.read_bigendian_int32()
 
 if size >= 0:
   elements = [self._elem_coder.decode_from_stream(in_stream, True)
-  for _ in range(size)]
+  for _ in range(size)]  # type: Iterable[Any]
 
 Review comment:
   This is a bit less obvious than that.  mypy knows that this is a 
`List[Any]`, but below this `elements` is rebound to `_ConcatSequence`.  Without 
this annotation, we get this error:
   
   ```
   apache_beam/coders/coder_impl.py:935: error: Incompatible types in 
assignment (expression has type "_ConcatSequence", variable has type 
"List[Any]")  [assignment]
   ```
   
   Here's the full context:
   
   ```python
   if size >= 0:
     elements = [self._elem_coder.decode_from_stream(in_stream, True)
                 for _ in range(size)]  # type: Iterable[Any]
   else:
     elements = []
     count = in_stream.read_var_int64()
     while count > 0:
       for _ in range(count):
         elements.append(self._elem_coder.decode_from_stream(in_stream, True))
       count = in_stream.read_var_int64()
   
     if count == -1:
       if self._read_state is None:
         raise ValueError(
             'Cannot read state-written iterable without state reader.')
   
       state_token = in_stream.read_all(True)
       elements = _ConcatSequence(
           elements, self._read_state(state_token, self._elem_coder))
   ```
   
   The first time that a variable appears in the code we can take that 
opportunity to relax the type definition, in this case from `List` to 
`Iterable`.  `_ConcatSequence` is also an `Iterable`, so it meets the type 
requirement and is allowed to be assigned to `elements`.
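
The same idea in a small, self-contained sketch. `ConcatSketch` and
`decode_sketch` are hypothetical stand-ins written for this illustration; they
are not Beam code. The point is only that a wider type comment on the first
assignment lets a later rebinding to a different iterable type-check.

```python
from itertools import chain
from typing import Any, Iterable, Iterator, List, Optional


class ConcatSketch(object):
  """Hypothetical stand-in for `_ConcatSequence`: yields head, then tail."""

  def __init__(self, head, tail):
    # type: (Iterable[Any], Iterable[Any]) -> None
    self._head = head
    self._tail = tail

  def __iter__(self):
    # type: () -> Iterator[Any]
    return chain(self._head, self._tail)


def decode_sketch(values, extra=None):
  # type: (List[Any], Optional[Iterable[Any]]) -> Iterable[Any]
  # Without the type comment, the inferred type would be List[Any] and the
  # rebinding below would be reported as an incompatible assignment.
  elements = list(values)  # type: Iterable[Any]
  if extra is not None:
    elements = ConcatSketch(elements, extra)
  return elements
```

At runtime the type comments have no effect; they only change what a static
checker such as mypy accepts.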
   
 



Issue Time Tracking
---

Worklog Id: (was: 339087)
Time Spent: 18.5h  (was: 18h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 18.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339085
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:22
Start Date: 06/Nov/19 00:22
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check whether the currently interpreted code is running in a notebook. This 
branch will need a rebase against master to pick up those changes.
   
   To clarify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   I think there is a bit of a trade-off here:
   1.  What we have here: determining whether the job is started from a pipeline 
that was originally bundled with an Interactive Runner.
   Doing it with a string comparison, we don't introduce a new dependency into 
DataflowRunner.
   2. Deducing whether the job is started from a notebook environment.
   We'll introduce [interactive] dependencies, including at least ipython, into 
DataflowRunner.
   This will label Dataflow jobs from any pipeline originally bundled with 
an arbitrary runner in any kind of ipython notebook, as long as the 
`interactive_environment` module in the `interactive` package has been 
(transitively) imported, even if it is not actually used.
   
 



Issue Time Tracking
---

Worklog Id: (was: 339085)
Time Spent: 7h 40m  (was: 7.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).





[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339084
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:20
Start Date: 06/Nov/19 00:20
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check whether the currently interpreted code is running in a notebook. This 
branch will need a rebase against master to pick up those changes.
   
   To clarify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   I think there is a bit of a trade-off here:
   1.  What we have here: determining whether the job is started from a pipeline 
that was originally bundled with an Interactive Runner.
   Doing it with a string comparison, we don't introduce a new dependency into 
DataflowRunner.
   2. Deducing whether the job is started from a notebook environment.
   We'll introduce [interactive] dependencies, including at least ipython, into 
DataflowRunner.
   This will label Dataflow jobs from any pipeline originally bundled with 
an arbitrary runner in any kind of ipython notebook, as long as the 
`interactive_environment` module in the `interactive` package has been 
(transitively) imported.
   
 



Issue Time Tracking
---

Worklog Id: (was: 339084)
Time Spent: 7.5h  (was: 7h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).





[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339083
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:17
Start Date: 06/Nov/19 00:17
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342855200
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None):
 # If a transform is applied and the full label is already in the set
 # then the transform will have to be cloned with a new label.
 self.applied_labels = set()
+# A boolean value indicating whether the pipeline is created in an
+# interactive environment such as interactive notebooks. Initialized as
+# None. The value is set ad hoc when `pipeline.run()` is invoked.
+self.interactive = None
 
 Review comment:
   Thanks!
   If we track `interactive` as a property of the runner, we cannot implicitly 
pass the property along from runner to runner.
   
   And if we deduce `interactive` from the environment, we'll introduce new 
dependencies into DataflowRunner. See the comment below.
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 339083)
Time Spent: 7h 20m  (was: 7h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).





[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339082
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:16
Start Date: 06/Nov/19 00:16
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check whether the currently interpreted code is running in a notebook. This 
branch will need a rebase against master to pick up those changes.
   
   To clarify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   I think there is a bit of a trade-off here:
   1.  What we have here: determining whether the job is started from a pipeline 
that was originally bundled with an Interactive Runner.
   Doing it with a string comparison, we don't introduce a new dependency into 
DataflowRunner.
   2. Deducing whether the job is started from a notebook environment.
   We'll introduce [interactive] dependencies, including at least ipython, into 
DataflowRunner.
   This will label Dataflow jobs from any pipeline originally bundled with 
an arbitrary runner in any kind of ipython notebook, as long as the 
`interactive_environment` module in the `interactive` package has been 
(transitively) imported.
   
 



Issue Time Tracking
---

Worklog Id: (was: 339082)
Time Spent: 7h 10m  (was: 7h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from a notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339081
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:14
Start Date: 06/Nov/19 00:14
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check whether the currently interpreted code is running in a notebook. This 
branch will need a rebase against master to pick up those changes.
   
   To clarify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   I think there is a trade-off here:
   1. What we have now: determine whether the job is started from a pipeline 
that was originally bundled with an Interactive Runner.
   This is done with a string comparison.
   2. Deduce whether the job is started from a notebook environment.
   This introduces the [interactive] dependencies, including ipython, into 
DataflowRunner.
   
   It would label Dataflow jobs from any pipeline, originally bundled with an 
arbitrary runner, in any kind of ipython notebook, as long as the 
`interactive_environment` module in the `interactive` package has been 
(transitively) imported.
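
   Option 1 could look roughly like the sketch below. This is an assumption about what "string comparison" means here, namely comparing the runner's class name against the literal `InteractiveRunner` so that the class itself never has to be imported. The helper name and the two stand-in classes are hypothetical:

```python
def bundled_with_interactive_runner(runner):
    """Returns True if the runner's class is named 'InteractiveRunner'.

    Only the class name is compared, so no interactive module is imported.
    """
    return type(runner).__name__ == 'InteractiveRunner'


# Stand-in classes for illustration only; they are not the Beam runners.
class InteractiveRunner(object):
    pass


class DirectRunner(object):
    pass
```

   The cost of this approach is that any unrelated class that happens to share the name would also match.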
   
 



Issue Time Tracking
---

Worklog Id: (was: 339081)
Time Spent: 7h  (was: 6h 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from a notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).





[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339080
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:12
Start Date: 06/Nov/19 00:12
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342855200
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None):
 # If a transform is applied and the full label is already in the set
 # then the transform will have to be cloned with a new label.
 self.applied_labels = set()
+# A boolean value indicating whether the pipeline is created in an
+# interactive environment such as interactive notebooks. Initialized as
+# None. The value is set ad hoc when `pipeline.run()` is invoked.
+self.interactive = None
 
 Review comment:
   Thanks!
   If we track `interactive` as a property of runner, we cannot implicitly pass 
along the property from runner to runner.
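
   A toy sketch of why the flag lives on the pipeline. None of these classes are the Beam ones; `is_interactive` and the `Toy*` names are hypothetical. Only the `interactive` attribute, its `None` default, and the `goog-dataflow-notebook` label come from the diffs under review:

```python
class Pipeline(object):
    """Toy stand-in for a pipeline that carries the `interactive` flag."""

    def __init__(self, runner):
        self.runner = runner
        # None until run() is invoked, mirroring the diff above.
        self.interactive = None

    def run(self, runner=None):
        if self.interactive is None:
            # Decided once, from the runner the pipeline was built with.
            self.interactive = getattr(self.runner, 'is_interactive', False)
        # A different runner may execute the pipeline; the flag travels
        # with the pipeline object rather than with either runner.
        return (runner or self.runner).run_pipeline(self)


class ToyInteractiveRunner(object):
    is_interactive = True

    def run_pipeline(self, pipeline):
        return []


class ToyDataflowRunner(object):
    def run_pipeline(self, pipeline):
        # The second runner reads the flag without knowing the first runner.
        return ['goog-dataflow-notebook'] if pipeline.interactive else []
```

   Here `Pipeline(ToyInteractiveRunner()).run(runner=ToyDataflowRunner())` yields the notebook label, whereas a flag stored on the first runner would be invisible to the second.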
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 339080)
Time Spent: 6h 50m  (was: 6h 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from a notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).





[jira] [Reopened] (BEAM-8193) Python PostCommit Tests are Flaky

2019-11-05 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw reopened BEAM-8193:
---

> Python PostCommit Tests are Flaky
> -
>
> Key: BEAM-8193
> URL: https://issues.apache.org/jira/browse/BEAM-8193
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Major
> Fix For: Not applicable
>
>
> I will use this as an umbrella bug, create sub-tasks and assign them to folks 
> working on Python in general. I will try to do a round-robin assignment.
> The goal is to improve stability of Py2, 3.5, 3.6, 3.7 postcommits to be 
> working >80% of the time [1].
> /cc (you might get issues assigned) [~pabloem] [~udim] [~chamikara] 
> [~tvalentyn]
> [1] http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1





[jira] [Resolved] (BEAM-8416) ZipFileArtifactServiceTest.test_concurrent_requests flaky

2019-11-05 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw resolved BEAM-8416.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> ZipFileArtifactServiceTest.test_concurrent_requests flaky
> -
>
> Key: BEAM-8416
> URL: https://issues.apache.org/jira/browse/BEAM-8416
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> {code}
> Traceback (most recent call last):
>   File "/usr/lib/python3.7/unittest/case.py", line 59, in testPartExecutor
> yield
>   File "/usr/lib/python3.7/unittest/case.py", line 615, in run
> testMethod()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py",
>  line 215, in test_concurrent_requests
> _ = list(pool.map(check, range(100)))
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in 
> result_iterator
> yield fs.pop().result()
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 425, in result
> return self.__get_result()
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in 
> __get_result
> raise self._exception
>   File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
> result = self.fn(*self.args, **self.kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py",
>  line 208, in check
> self._service, tokens[session(index)], name(index)))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py",
>  line 73, in retrieve_artifact
> name=name)))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py",
>  line 70, in 
> return b''.join(chunk.data for chunk in retrieval_service.GetArtifact(
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service.py",
>  line 133, in GetArtifact
> chunk = fin.read(self._chunk_size)
>   File "/usr/lib/python3.7/zipfile.py", line 899, in read
> data = self._read1(n)
>   File "/usr/lib/python3.7/zipfile.py", line 989, in _read1
> self._update_crc(data)
>   File "/usr/lib/python3.7/zipfile.py", line 917, in _update_crc
> raise BadZipFile("Bad CRC-32 for file %r" % self.name)
> zipfile.BadZipFile: Bad CRC-32 for file 
> '/3b2b55eb92de23535010b7ac80d553ec2d4bae872ac5606bc3042ce9313dff87/e1d492628cc0c1d0c1b736184f689be54fa03a996de918268ad834560e77305f'
> {code}
> and:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib/python3.7/unittest/case.py", line 59, in testPartExecutor
> yield
>   File "/usr/lib/python3.7/unittest/case.py", line 615, in run
> testMethod()
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py",
>  line 215, in test_concurrent_requests
> _ = list(pool.map(check, range(100)))
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in 
> result_iterator
> yield fs.pop().result()
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 425, in result
> return self.__get_result()
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in 
> __get_result
> raise self._exception
>   File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
> result = self.fn(*self.args, **self.kwargs)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py",
>  line 208, in check
> self._service, tokens[session(index)], name(index)))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py",
>  line 73, in retrieve_artifact
> name=name)))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/por

[jira] [Resolved] (BEAM-8193) Python PostCommit Tests are Flaky

2019-11-05 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw resolved BEAM-8193.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Python PostCommit Tests are Flaky
> -
>
> Key: BEAM-8193
> URL: https://issues.apache.org/jira/browse/BEAM-8193
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Major
> Fix For: Not applicable
>
>
> I will use this as an umbrella bug, create sub-tasks and assign them to folks 
> working on Python in general. I will try to do a round-robin assignment.
> The goal is to improve stability of Py2, 3.5, 3.6, 3.7 postcommits to be 
> working >80% of the time [1].
> /cc (you might get issues assigned) [~pabloem] [~udim] [~chamikara] 
> [~tvalentyn]
> [1] http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1





[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339076
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:03
Start Date: 06/Nov/19 00:03
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] 
[SQL] Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#discussion_r342859843
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java
 ##
 @@ -128,22 +128,13 @@ public BeamTableStatistics 
getTableStatistics(PipelineOptions options) {
 FieldAccessDescriptor.withFieldNames(fieldNames).resolve(getSchema());
 final Schema newSchema = SelectHelpers.getOutputSchema(getSchema(), 
resolved);
 
-TypedRead builder = getBigQueryReadBuilder(newSchema);
+TypedRead builder = 
getBigQueryReadBuilder(newSchema).withSelectedFields(fieldNames);
 
 Review comment:
   Will add `if (!fieldNames.isEmpty())` back later.
 



Issue Time Tracking
---

Worklog Id: (was: 339076)
Time Spent: 2h 40m  (was: 2.5h)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> To lift that restriction, some checks need to be added so that certain Calc 
> and IO manipulations are skipped when only filter push-down is needed.





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339072
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:02
Start Date: 06/Nov/19 00:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342846522
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -136,7 +150,12 @@ def estimate_size(self):
 """
 raise NotImplementedError
 
-  def split(self, desired_bundle_size, start_position=None, 
stop_position=None):
+  def split(self,
+desired_bundle_size,  # type: int
+start_position=None,  # type: Optional[int]
 
 Review comment:
   These should be Optional[Any]. E.g. the start and end could be (str) keys. 
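
   To illustrate the point, here is a sketch of a source whose positions are string keys, annotated with the comment-style hints used in this PR. `KeyRangeSource` is a hypothetical class, not part of Beam, and its splitting logic is deliberately naive:

```python
from typing import Any, Iterator, Optional, Tuple


class KeyRangeSource(object):
    """Hypothetical source whose positions are string keys, not ints."""

    def __init__(self, keys):
        # type: (list) -> None
        self._keys = sorted(keys)

    def split(self,
              desired_bundle_size,  # type: int
              start_position=None,  # type: Optional[Any]
              stop_position=None,  # type: Optional[Any]
             ):
        # type: (...) -> Iterator[Tuple[Any, Any]]
        # Keep keys in [start_position, stop_position); None means unbounded.
        keys = [k for k in self._keys
                if (start_position is None or k >= start_position)
                and (stop_position is None or k < stop_position)]
        # Yield (first_key, last_key) for each bundle of the requested size.
        for i in range(0, len(keys), desired_bundle_size):
            chunk = keys[i:i + desired_bundle_size]
            yield chunk[0], chunk[-1]
```

   With `Optional[int]`, a call such as `split(2, start_position='b')` would be rejected by a type checker even though it is a legitimate use.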
 



Issue Time Tracking
---

Worklog Id: (was: 339072)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339071
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:02
Start Date: 06/Nov/19 00:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342849862
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1275,6 +1306,7 @@ def defer_remainder(self, watermark=None):
 raise NotImplementedError
 
   def deferred_status(self):
+# type: () -> Optional[Tuple[restriction_trackers.OffsetRange, Timestamp]]
 
 Review comment:
   Optional[Tuple[Any, Timestamp]]
 



Issue Time Tracking
---

Worklog Id: (was: 339071)
Time Spent: 18h  (was: 17h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339074
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:02
Start Date: 06/Nov/19 00:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342850742
 
 

 ##
 File path: sdks/python/apache_beam/io/textio.py
 ##
 @@ -92,7 +93,7 @@ def __init__(self,
min_bundle_size,
compression_type,
strip_trailing_newlines,
-   coder,
+   coder,  # type: coders.Coder
 
 Review comment:
   Was there a reason to only type some of these arguments? Enough to make mypy 
happy? Only ones that were not obvious? (I'm fine with this, just trying to 
understand the reasoning.)
 



Issue Time Tracking
---

Worklog Id: (was: 339074)
Time Spent: 18h 20m  (was: 18h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 18h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339065
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:02
Start Date: 06/Nov/19 00:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342837849
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -74,53 +86,66 @@
   else:
 is_compiled = False
 fits_in_64_bits = lambda x: -(1 << 63) <= x <= (1 << 63) - 1
+
 # pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports
 
 
 _TIME_SHIFT = 1 << 63
 MIN_TIMESTAMP_micros = MIN_TIMESTAMP.micros
 MAX_TIMESTAMP_micros = MAX_TIMESTAMP.micros
 
+IterableStateReader = Callable[[bytes, 'CoderImpl'], Iterable]
+IterableStateWriter = Callable[[Iterable, 'CoderImpl'], bytes]
+Observables = List[Tuple[observable.ObservableMixin, 'CoderImpl']]
 
 class CoderImpl(object):
   """For internal use only; no backwards-compatibility guarantees."""
 
   def encode_to_stream(self, value, stream, nested):
+# type: (Any, create_OutputStream, bool) -> None
 
 Review comment:
   Technically create_OutputStream is a constructor, not the type itself (due 
to some hacking around working in both compiled and non-compiled mode), but if 
this makes the type checker happy I guess that's OK. 
 



Issue Time Tracking
---

Worklog Id: (was: 339065)
Time Spent: 17.5h  (was: 17h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339068
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:02
Start Date: 06/Nov/19 00:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342842027
 
 

 ##
 File path: sdks/python/apache_beam/coders/coders.py
 ##
 @@ -248,7 +281,26 @@ def __ne__(self, other):
   def __hash__(self):
 return hash(type(self))
 
-  _known_urns = {}
+  _known_urns = {}  # type: Dict[str, Tuple[type, ConstructorFn]]
+
+  @classmethod
 
 Review comment:
   I'm not quite following what you're trying to declare here. 
 



Issue Time Tracking
---

Worklog Id: (was: 339068)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339067
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:02
Start Date: 06/Nov/19 00:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342840025
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -838,11 +911,12 @@ def encode_to_stream(self, value, out, nested):
 out.write_var_int64(0)
 
   def decode_from_stream(self, in_stream, nested):
+# type: (create_InputStream, bool) -> Sequence
 size = in_stream.read_bigendian_int32()
 
 if size >= 0:
   elements = [self._elem_coder.decode_from_stream(in_stream, True)
-  for _ in range(size)]
+  for _ in range(size)]  # type: Iterable[Any]
 
 Review comment:
   mypy doesn't know a list comprehension is at least an `Iterable[Any]`?
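
   One plausible reason for such an annotation, offered as an assumption rather than a statement about this diff: mypy does infer a list type for the comprehension, and the explicit comment widens the variable so that it can later be rebound to some other iterable. A self-contained sketch with hypothetical names:

```python
import itertools
from typing import Any, Iterable, Optional


def decode(size, remainder=None):
    # type: (int, Optional[Iterable[Any]]) -> Iterable[Any]
    # Without the annotation the variable would be inferred as a list type,
    # and the rebinding below would be flagged as an incompatible assignment.
    elements = [i * i for i in range(size)]  # type: Iterable[Any]
    if remainder is not None:
        # Rebind to a non-list iterable; allowed because of the wider type.
        elements = itertools.chain(elements, remainder)
    return elements
```

   If the variable is never rebound, the annotation should be unnecessary.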
 



Issue Time Tracking
---

Worklog Id: (was: 339067)
Time Spent: 17h 40m  (was: 17.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339066
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:02
Start Date: 06/Nov/19 00:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342839770
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -724,10 +790,12 @@ def _construct_from_components(self, components):
 
 class _ConcatSequence(object):
   def __init__(self, head, tail):
+# type: (Iterable[Any], Iterable[Any]) -> None
 
 Review comment:
   Should we use some type variable here? I suppose for that to be useful 
_ConcatSequence would have to be a generic type. 
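
   A sketch of what the generic version might look like, using the same comment-style hints. `ConcatSequence` here is a simplified, hypothetical stand-in, not the class from `coder_impl.py`:

```python
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar('T')


class ConcatSequence(Generic[T]):
    """Iterates over `head` and then `tail`, preserving the element type."""

    def __init__(self, head, tail):
        # type: (Iterable[T], Iterable[T]) -> None
        self._head = head
        self._tail = tail

    def __iter__(self):
        # type: () -> Iterator[T]
        for elem in self._head:
            yield elem
        for elem in self._tail:
            yield elem
```

   With this, a checker can tell that concatenating two iterables of `int` yields `int` elements; with `Iterable[Any]` that information is lost.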
 



Issue Time Tracking
---

Worklog Id: (was: 339066)
Time Spent: 17h 40m  (was: 17.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339073
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:02
Start Date: 06/Nov/19 00:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342851318
 
 

 ##
 File path: sdks/python/apache_beam/options/value_provider.py
 ##
 @@ -23,6 +23,7 @@
 
 from builtins import object
 from functools import wraps
+from typing import Set
 
 Review comment:
   pylint ignore?
 



Issue Time Tracking
---

Worklog Id: (was: 339073)
Time Spent: 18h 10m  (was: 18h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 18h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339070
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:02
Start Date: 06/Nov/19 00:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342838959
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -171,14 +199,17 @@ class StreamCoderImpl(CoderImpl):
   Subclass of CoderImpl implementing encode/decode using stream methods."""
 
   def encode(self, value):
+# type: (Any) -> bytes
 
 Review comment:
   Does mypy have a mode to assume inherited methods have the same type 
signature? I'm seeing a lot of DRY violations here...
 



Issue Time Tracking
---

Worklog Id: (was: 339070)
Time Spent: 17h 50m  (was: 17h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339069
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:02
Start Date: 06/Nov/19 00:02
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r342846545
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -153,7 +172,11 @@ def split(self, desired_bundle_size, start_position=None, 
stop_position=None):
 """
 raise NotImplementedError
 
-  def get_range_tracker(self, start_position, stop_position):
+  def get_range_tracker(self,
+start_position,  # type: Optional[int]
 
 Review comment:
   Same.
 



Issue Time Tracking
---

Worklog Id: (was: 339069)
Time Spent: 17h 50m  (was: 17h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339064
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:01
Start Date: 06/Nov/19 00:01
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check if current interpreted code is in a notebook or not. This branch will 
need a rebase against master to take those changes.
   
   To clarify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   This will label Dataflow jobs from any pipeline originally bundled with 
an arbitrary runner in any kind of IPython notebook, as long as the 
`interactive_environment` module in the `interactive` package has been 
(transitively) imported.
   
 



Issue Time Tracking
---

Worklog Id: (was: 339064)
Time Spent: 6h 40m  (was: 6.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339061
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 23:57
Start Date: 05/Nov/19 23:57
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check if current interpreted code is in a notebook or not. This branch will 
need a rebase against master to take those changes.
   
   To clarify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 339061)
Time Spent: 6.5h  (was: 6h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8123) Support CloudPickle as pickler for Apache Beam.

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8123?focusedWorklogId=339057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339057
 ]

ASF GitHub Bot logged work on BEAM-8123:


Author: ASF GitHub Bot
Created on: 05/Nov/19 23:49
Start Date: 05/Nov/19 23:49
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #8978: [BEAM-8123]  [DO 
NOT MERGE] POC: use cloudpickle for pickling.
URL: https://github.com/apache/beam/pull/8978#issuecomment-550075988
 
 
   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   
 



Issue Time Tracking
---

Worklog Id: (was: 339057)
Time Spent: 50m  (was: 40m)

> Support CloudPickle as pickler for Apache Beam.
> ---
>
> Key: BEAM-8123
> URL: https://issues.apache.org/jira/browse/BEAM-8123
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8123) Support CloudPickle as pickler for Apache Beam.

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8123?focusedWorklogId=339058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339058
 ]

ASF GitHub Bot logged work on BEAM-8123:


Author: ASF GitHub Bot
Created on: 05/Nov/19 23:49
Start Date: 05/Nov/19 23:49
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #8978: [BEAM-8123] 
 [DO NOT MERGE] POC: use cloudpickle for pickling.
URL: https://github.com/apache/beam/pull/8978
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 339058)
Time Spent: 1h  (was: 50m)

> Support CloudPickle as pickler for Apache Beam.
> ---
>
> Key: BEAM-8123
> URL: https://issues.apache.org/jira/browse/BEAM-8123
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339054
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 23:48
Start Date: 05/Nov/19 23:48
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342855715
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +400,57 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,
 
 Review comment:
   IIRC, we want to allow the user to switch to `DataflowRunner` using the 
`p.run()` pattern instead of limiting the user to `Runner().run_pipeline(p, 
options)`.
   
   Do you think we should put it into a separate PR, or simply not support 
it at all?
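
For illustration, the two calling patterns being compared might look like the toy sketch below. `ToyPipeline` and `ToyRunner` are hypothetical stand-ins, not Beam classes; only the `run(runner=None, options=None)` shape is taken from the diff above, and the fallback behavior is an assumption.

```python
class ToyRunner(object):
  """Hypothetical stand-in for a runner; not a Beam class."""

  def __init__(self, name):
    self.name = name

  def run_pipeline(self, pipeline, options):
    return '%s ran with %r' % (self.name, options)


class ToyPipeline(object):
  """Hypothetical stand-in for a pipeline bundled with a default runner."""

  def __init__(self, runner, options=None):
    self.runner = runner
    self.options = options

  def run(self, runner=None, options=None):
    # Fall back to the runner and options given at construction time, so
    # existing p.run() calls keep their current behavior.
    chosen_runner = self.runner if runner is None else runner
    chosen_options = self.options if options is None else options
    return chosen_runner.run_pipeline(self, chosen_options)


p = ToyPipeline(ToyRunner('interactive'), {'project': 'demo'})
default_result = p.run()                                # bundled runner
switched_result = p.run(runner=ToyRunner('dataflow'))   # p.run() pattern
explicit_result = ToyRunner('dataflow').run_pipeline(p, p.options)
```

In this sketch the last two calls are equivalent; the question in the comment is only whether the shorter `p.run(runner=...)` form belongs in this PR.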
 



Issue Time Tracking
---

Worklog Id: (was: 339054)
Time Spent: 6h 20m  (was: 6h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8562) source packages don't include gradlew

2019-11-05 Thread Robert Lugg (Jira)
Robert Lugg created BEAM-8562:
-

 Summary: source packages don't include gradlew
 Key: BEAM-8562
 URL: https://issues.apache.org/jira/browse/BEAM-8562
 Project: Beam
  Issue Type: Bug
  Components: build-system
Affects Versions: 2.16.0
 Environment: linux
Reporter: Robert Lugg


I want to run using Flink.  The procedure says the source code is required, but 
gradlew isn't there.
h3. Executing a Beam pipeline on a Flink Cluster

As of now you will need a copy of Apache Beam’s source code. You can download 
it on the [Downloads page|https://beam.apache.org/get-started/downloads/].

Following a couple of links, I get to:

[http://www.apache.org/dyn/closer.cgi/beam/2.16.0/apache-beam-2.16.0-source-release.zip]

But I don't see any gradlew wrappers where they should be.  If I grab the 
current source using 'git clone', then the files are there.

Either the instructions should be modified or the source releases should 
contain all the tools.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339053
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 23:46
Start Date: 05/Nov/19 23:46
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342855200
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None):
 # If a transform is applied and the full label is already in the set
 # then the transform will have to be cloned with a new label.
 self.applied_labels = set()
+# A boolean value indicating whether the pipeline is created in an
+# interactive environment such as interactive notebooks. Initialized as
+# None. The value is set ad hoc when `pipeline.run()` is invoked.
+self.interactive = None
 
 Review comment:
   Thanks! I'll go with your suggestion below to deduce if the job is started 
from a notebook environment instead of determining if the job is started from a 
pipeline that was originally bundled with an Interactive Runner.
 



Issue Time Tracking
---

Worklog Id: (was: 339053)
Time Spent: 6h 10m  (was: 6h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-11-05 Thread Chris Larsen (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Larsen updated BEAM-8561:
---
Summary: Add ThriftIO to Support IO for Thrift Files  (was: Add ThriftIO to 
Support for )

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Priority: Minor
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8561) Add ThriftIO to Support for

2019-11-05 Thread Chris Larsen (Jira)
Chris Larsen created BEAM-8561:
--

 Summary: Add ThriftIO to Support for 
 Key: BEAM-8561
 URL: https://issues.apache.org/jira/browse/BEAM-8561
 Project: Beam
  Issue Type: New Feature
  Components: io-java-files
Reporter: Chris Larsen


Similar to AvroIO it would be very useful to support reading and writing 
to/from Thrift files with a native connector. 

Functionality would include:
 # read() - Reading from one or more Thrift files.
 # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=339049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339049
 ]

ASF GitHub Bot logged work on BEAM-8294:


Author: ASF GitHub Bot
Created on: 05/Nov/19 23:24
Start Date: 05/Nov/19 23:24
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #: [BEAM-8294] run Spark 
portable validates runner tests in parallel
URL: https://github.com/apache/beam/pull/#issuecomment-550030660
 
 
   The tests are [failing](https://issues.apache.org/jira/browse/BEAM-8556), 
which is an improvement over timing out and being aborted.
 



Issue Time Tracking
---

Worklog Id: (was: 339049)
Time Spent: 1.5h  (was: 1h 20m)

> Spark portable validates runner tests timing out
> 
>
> Key: BEAM-8294
> URL: https://issues.apache.org/jira/browse/BEAM-8294
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, test-failures, testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: currently-failing, portability-spark
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This postcommit has been timing out for 11 days. 
> [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. 
> Coincidence? I think NOT! ... Seriously, though, I wonder what about the SDK 
> worker management stack caused this to slow down.
> [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339038
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 05/Nov/19 22:56
Start Date: 05/Nov/19 22:56
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] 
[SQL] Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#discussion_r342840896
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -123,7 +129,12 @@ public void onMatch(RelOptRuleCall call) {
 // 1. Calc only does projects and renames.
 //And
 // 2. Predicate can be completely pushed-down to IO level.
-if (isProjectRenameOnlyProgram(program) && 
tableFilter.getNotSupported().isEmpty()) {
+//And
+// 3. And IO supports project push-down OR all fields are projected by a 
Calc.
+if (isProjectRenameOnlyProgram(program)
+&& tableFilter.getNotSupported().isEmpty()
+&& (beamSqlTable.supportsProjects()
+|| calc.getRowType().getFieldCount() == 
calcInputRowType.getFieldCount())) {
 
 Review comment:
   Moved this check into a separate method to keep things readable.
   Updated the check to compare a list of projected field names to the list 
passed to a `Calc` as input.
 



Issue Time Tracking
---

Worklog Id: (was: 339038)
Time Spent: 2.5h  (was: 2h 20m)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339037
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 05/Nov/19 22:53
Start Date: 05/Nov/19 22:53
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] 
[SQL] Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#discussion_r342839842
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -124,11 +126,20 @@ public void onMatch(RelOptRuleCall call) {
 RelDataType calcInputType =
 CalciteUtils.toCalciteRowType(newSchema, 
ioSourceRel.getCluster().getTypeFactory());
 
+// TODO: Check if an IO supports field reordering and drop a Calc when it 
does (1).
 // Check if the calc can be dropped:
-// 1. Calc only does projects and renames.
+// 1. Calc only does projects and renames of fields in the same order.
 //And
 // 2. Predicate can be completely pushed-down to IO level.
-if (isProjectRenameOnlyProgram(program) && 
tableFilter.getNotSupported().isEmpty()) {
+//And
+// 3. And IO supports project push-down OR all fields are projected by a 
Calc.
+if (isProjectRenameOnlyProgram(program, beamSqlTable.supportsProjects())
 
 Review comment:
   This should be resolved now: IOs can specify whether they support field 
reordering. When reordering is not supported and fields are projected in a 
different order (from the Schema), the Calc should not be dropped.
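
As a rough illustration of the rule described above (written in Python for brevity, although the rule itself lives in Java), the decision could be modeled as below. The function and parameter names are hypothetical, and this is one reading of the three numbered conditions in the diff, not the actual `BeamIOPushDownRule` logic.

```python
def in_schema_order(projected, schema):
  """True if projected names keep the relative order they have in schema.

  Assumes every projected name is present in schema.
  """
  positions = [schema.index(name) for name in projected]
  return positions == sorted(positions)


def can_drop_calc(projected, schema, project_rename_only,
                  unsupported_filters, supports_projects,
                  supports_reordering):
  """Hypothetical model of when the Calc can be dropped."""
  # 1. The Calc must only project and rename fields.
  if not project_rename_only:
    return False
  # 2. The predicate must be pushed down to the IO completely.
  if unsupported_filters:
    return False
  # Reordered fields need an IO that can reorder; otherwise keep the Calc.
  if not supports_reordering and not in_schema_order(projected, schema):
    return False
  # 3. Either the IO can project, or the Calc passes every field through.
  return supports_projects or list(projected) == list(schema)
```

For a schema `['id', 'name', 'ts']`, projecting `['ts', 'id']` on an IO that supports projects but not reordering keeps the Calc, while projecting `['id', 'ts']` on the same IO lets it be dropped.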
 



Issue Time Tracking
---

Worklog Id: (was: 339037)
Time Spent: 2h 20m  (was: 2h 10m)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339031
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 22:32
Start Date: 05/Nov/19 22:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342831745
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +400,57 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,
 
 Review comment:
   Why are we adding runner and options parameters here?
 



Issue Time Tracking
---

Worklog Id: (was: 339031)
Time Spent: 5h 50m  (was: 5h 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339030&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339030
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 22:32
Start Date: 05/Nov/19 22:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342830295
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None):
     # If a transform is applied and the full label is already in the set
     # then the transform will have to be cloned with a new label.
     self.applied_labels = set()
+    # A boolean value indicating whether the pipeline is created in an
+    # interactive environment such as interactive notebooks. Initialized as
+    # None. The value is set ad hoc when `pipeline.run()` is invoked.
+    self.interactive = None
 
 Review comment:
   I believe one more suggestion from the previous PR was to not keep track 
of interactivity as part of the Pipeline, but to make it a property of the runner.
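
A minimal sketch of what this suggestion could look like, assuming the runner answers the question instead of the pipeline carrying a flag. The classes `BaseRunner`, `NotebookRunner` and `Pipeline` and the method `is_interactive` are hypothetical stand-ins for illustration, not Beam's actual API or the code in this PR.

```python
class BaseRunner:
    """Hypothetical stand-in for a pipeline runner base class."""

    def is_interactive(self):
        # Runners are assumed non-interactive unless they say otherwise.
        return False


class NotebookRunner(BaseRunner):
    """Hypothetical runner used inside a notebook environment."""

    def is_interactive(self):
        return True


class Pipeline:
    """Hypothetical pipeline that keeps no interactivity flag of its own."""

    def __init__(self, runner):
        self.runner = runner


def launched_from_notebook(pipeline):
    # The pipeline stores nothing; the runner it was built with answers.
    return pipeline.runner.is_interactive()
```

With this shape, nothing has to be set ad hoc on the pipeline when `run()` is invoked.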
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339030)
Time Spent: 5h 40m  (was: 5.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled with an interactive runner can 
> be directly run by other runners from a notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).





[jira] [Assigned] (BEAM-8460) Portable Flink runner fails UsesStrictTimerOrdering category tests

2019-11-05 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver reassigned BEAM-8460:
-

Assignee: Kyle Weaver

> Portable Flink runner fails UsesStrictTimerOrdering category tests
> --
>
> Key: BEAM-8460
> URL: https://issues.apache.org/jira/browse/BEAM-8460
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.17.0
>Reporter: Jan Lukavský
>Assignee: Kyle Weaver
>Priority: Major
>
> BEAM-7520 introduced a new set of validatesRunner tests that check that timers 
> are fired exactly in order of increasing timestamp. The portable Flink runner 
> fails these added tests (they are currently ignored).
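
The ordering property these tests check can be illustrated with a toy model. The sketch below is not the Flink runner's timer code or the test code from BEAM-7520; the function names are hypothetical and timers are assumed to be plain `(timestamp, name)` tuples.

```python
import heapq


def fire_in_timestamp_order(timers):
    """Yield (timestamp, name) pairs in order of increasing timestamp.

    `timers` is an iterable of (timestamp, name) pairs in arbitrary order.
    """
    heap = list(timers)
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)


def fired_in_order(fired):
    """Return True if no timer fired after one with a larger timestamp."""
    timestamps = [ts for ts, _ in fired]
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))
```

A runner that fires timers in the order they were set, rather than in timestamp order, would fail a check like `fired_in_order`.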





[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339032
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 22:32
Start Date: 05/Nov/19 22:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342832459
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
     """Remotely executes entire pipeline or parts reachable from node."""
+    # Label goog-dataflow-notebook if pipeline is initiated from interactive
+    # runner.
+    if pipeline.interactive:
 
 Review comment:
   The change could be limited to:
   - Detect here whether we are in an interactive environment or not (for 
example, by checking whether certain imports are loaded). This could also be a 
utility method somewhere that checks whether this is an interactive environment.
   - If yes, add the labels.
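
One way to read the "check whether certain imports are loaded" idea is sketched below. It assumes that IPython is the import to look for; the helper name `is_in_ipython` is hypothetical and this is not necessarily how the PR ended up detecting the environment.

```python
import sys


def is_in_ipython():
    """Best-effort check for an IPython or Jupyter session."""
    # If IPython was never imported, we cannot be inside an IPython shell.
    ipython_module = sys.modules.get('IPython')
    if ipython_module is None:
        return False
    # get_ipython() returns the active shell, or None outside of one.
    get_ipython = getattr(ipython_module, 'get_ipython', None)
    return get_ipython is not None and get_ipython() is not None
```

Looking the module up in `sys.modules` avoids importing IPython just to perform the check.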
 



Issue Time Tracking
---

Worklog Id: (was: 339032)
Time Spent: 6h  (was: 5h 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled with an interactive runner can 
> be directly run by other runners from a notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).
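
The "implicitly add source information through user labels" step might be sketched as follows. This assumes labels are kept as a list of `key=value` strings; the helper `with_notebook_label` and the value `true` are hypothetical, and only the key `goog-dataflow-notebook` comes from the diff quoted above.

```python
NOTEBOOK_LABEL = 'goog-dataflow-notebook'  # key taken from the diff comment


def with_notebook_label(labels, interactive, value='true'):
    """Return a copy of `labels`, adding the notebook label when interactive.

    `labels` is assumed to be a list of 'key=value' strings (or None).
    """
    result = list(labels or [])
    if not interactive:
        return result
    already_set = any(
        label == NOTEBOOK_LABEL or label.startswith(NOTEBOOK_LABEL + '=')
        for label in result)
    if not already_set:
        result.append('%s=%s' % (NOTEBOOK_LABEL, value))
    return result
```

Returning a copy keeps the user's own label list untouched, and the duplicate check makes repeated runs idempotent.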





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339029&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339029
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 05/Nov/19 22:25
Start Date: 05/Nov/19 22:25
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9056: [BEAM-7746] Add 
python type hints
URL: https://github.com/apache/beam/pull/9056#issuecomment-550050927
 
 
   Hello reviewers,
   Thanks for the feedback so far!  There are a few questions that I've 
responded to that are now in your court.  I'd love to get these addressed soon 
so that I can minimize the merge conflicts that are piling up. 
   
   Once I have those answers, your feedback will be integrated into the smaller 
"part 1" PR here: https://github.com/apache/beam/pull/9915
   
   thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 339029)
Time Spent: 17h 20m  (was: 17h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  
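
For readers unfamiliar with PEP 484, a small illustration of what such hints look like follows. The functions are invented for this example and are not taken from the Beam code base; the sketch uses Python 3 annotation syntax, which is an assumption about style rather than a statement about what the PR uses.

```python
from typing import Dict, Iterable, List


def count_words(lines: Iterable[str]) -> Dict[str, int]:
    """Count whitespace-separated words across all lines."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def top_words(counts: Dict[str, int], n: int) -> List[str]:
    """Return the n most frequent words, ties broken alphabetically."""
    return sorted(counts, key=lambda w: (-counts[w], w))[:n]
```

A static analyzer such as mypy can then flag a call like `top_words(["a", "b"], 2)`, where a list is passed in place of the expected dict.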





[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=339011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339011
 ]

ASF GitHub Bot logged work on BEAM-8294:


Author: ASF GitHub Bot
Created on: 05/Nov/19 21:33
Start Date: 05/Nov/19 21:33
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #: [BEAM-8294] run Spark 
portable validates runner tests in parallel
URL: https://github.com/apache/beam/pull/#issuecomment-550030660
 
 
   The tests are failing, which is an improvement over timing out and being 
aborted.
 



Issue Time Tracking
---

Worklog Id: (was: 339011)
Time Spent: 1h 20m  (was: 1h 10m)

> Spark portable validates runner tests timing out
> 
>
> Key: BEAM-8294
> URL: https://issues.apache.org/jira/browse/BEAM-8294
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, test-failures, testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: currently-failing, portability-spark
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This postcommit has been timing out for 11 days. 
> [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. 
> Coincidence? I think NOT! ... Seriously, though, I wonder what about the SDK 
> worker management stack caused this to slow down.
> [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/]
>  





[jira] [Work logged] (BEAM-8452) TriggerLoadJobs.process in bigquery_file_loads schema is type str

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8452?focusedWorklogId=339006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339006
 ]

ASF GitHub Bot logged work on BEAM-8452:


Author: ASF GitHub Bot
Created on: 05/Nov/19 21:27
Start Date: 05/Nov/19 21:27
Worklog Time Spent: 10m 
  Work Description: noah-goodrich commented on pull request #1: 
BEAM-8452 - TriggerLoadJobs.process in bigquery_file_loads schema is type str
URL: https://github.com/apache/beam/pull/1
 
 
   When trying to run a BigQueryFileLoads job, I receive the following error:
   
   ```
   ValidationError: Expected type 
 for field schema, found {"fields": [{"name": "id", "type": "INTEGER", "mode": 
"required"}, {"name": "description", "type"
   : "STRING", "mode": "nullable"}]} (type )
   ```
   R: @chamikaramj
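
The error above reports a schema that arrived as a JSON string where a structured value was expected. The sketch below only illustrates that kind of mismatch using the standard library; `normalize_schema` is a hypothetical helper, not the change made in this pull request and not part of the BigQuery client API.

```python
import json


def normalize_schema(schema):
    """Return `schema` as a dict, parsing it first if it is a JSON string."""
    if isinstance(schema, str):
        schema = json.loads(schema)
    if not isinstance(schema, dict) or 'fields' not in schema:
        raise ValueError('schema must be a dict with a "fields" key')
    return schema
```

Normalizing once at the boundary means later code can rely on a single representation instead of checking the type at every use.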
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ X] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [X ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ X] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=338988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338988
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 05/Nov/19 20:37
Start Date: 05/Nov/19 20:37
Worklog Time Spent: 10m 
  Work Description: je-ik commented on pull request #9960: [BEAM-3288] 
Guard against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#discussion_r342784933
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByKey.java
 ##
 @@ -162,6 +164,45 @@ public static void applicableTo(PCollection input) {
   throw new IllegalStateException(
   "GroupByKey must have a valid Window merge function.  " + "Invalid because: " + cause);
 }
+
+// Validate that the trigger does not finish before garbage collection time
+if (!triggerIsSafe(windowingStrategy)) {
 
 Review comment:
   Correction - stateful ParDo doesn't use triggers. But Combine might still be 
affected by this?
 



Issue Time Tracking
---

Worklog Id: (was: 338988)
Time Spent: 2h 10m  (was: 2h)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There are multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 element in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: a continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto a coarser key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (which is what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark trigger. This will definitely increase safety, but lead to more 
> eager firing of downstream aggregations. It also will violate a user's 
> expectation that a fire-once trigger fires everything downstream only once, 
> but that expectation appears impossible to satisfy safely.
> - Make the continuation trigger of some triggers be the "invalid" trigger, 
> i.e. require the user to set it explicitly: there's in general no good and 
> safe way to infer what a trigger on a second GBK "truly" should be, based on 
> 
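
The silent data drop described for terminating triggers can be shown with a toy simulation. This is a deliberately simplified model of a single window with a non-repeating "after n elements" trigger; the function name is hypothetical and nothing here is Beam's trigger implementation.

```python
def run_terminating_count_trigger(elements, n):
    """Simulate a non-repeating "after n elements" trigger on one window.

    Returns (emitted_pane, dropped_elements). The pane is None if the
    trigger never fired in this toy model.
    """
    buffered = []
    pane = None
    dropped = []
    for element in elements:
        if pane is not None:
            # The trigger has finished, so later input is silently discarded.
            dropped.append(element)
            continue
        buffered.append(element)
        if len(buffered) >= n:
            pane = list(buffered)
    return pane, dropped
```

With `n=1` and four inputs, only the first element is emitted and the other three vanish without any error, which is the behavior the issue argues should be rejected at construction time.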

[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=338987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338987
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 05/Nov/19 20:35
Start Date: 05/Nov/19 20:35
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #9987: [BEAM-8504] Fix a 
bug related to zero-row responses
URL: https://github.com/apache/beam/pull/9987#issuecomment-550008537
 
 
   Thanks. It looks good!
 



Issue Time Tracking
---

Worklog Id: (was: 338987)
Time Spent: 1h 40m  (was: 1.5h)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}
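
The message in the trace suggests a strict "previous is less than current" precondition that fails when two consecutive responses both report 0.0, which fits the pull request title's mention of zero-row responses. The sketch below contrasts a strict and a non-strict check; `check_progress` and its `strict` flag are hypothetical, and this is neither the code in BigQueryStorageStreamSource nor the actual fix.

```python
def check_progress(previous_fraction, current_fraction, strict):
    """Validate that reported progress does not move backwards.

    With strict=True the current fraction must be greater than the previous
    one; with strict=False equal values are also accepted.
    """
    if strict:
        ok = previous_fraction < current_fraction
    else:
        ok = previous_fraction <= current_fraction
    if not ok:
        raise ValueError(
            'Fraction consumed from previous response (%s) is not %s fraction '
            'consumed from current response (%s).' % (
                previous_fraction,
                'less than' if strict else 'at most',
                current_fraction))
    return current_fraction
```

Under the strict variant, a response that carries no rows and leaves the fraction unchanged is enough to raise, even though no progress was lost.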





[jira] [Commented] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-05 Thread Gleb Kanterov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967844#comment-16967844
 ] 

Gleb Kanterov commented on BEAM-8504:
-

[~aryann] thanks!

[~Ardagan] the change looks very self-contained. Given that it is a regression, 
is there any chance we can have it as a part of 2.17.0?

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.





[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=338984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338984
 ]

ASF GitHub Bot logged work on BEAM-8294:


Author: ASF GitHub Bot
Created on: 05/Nov/19 20:26
Start Date: 05/Nov/19 20:26
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #: [BEAM-8294] run Spark 
portable validates runner tests in parallel
URL: https://github.com/apache/beam/pull/#issuecomment-550005046
 
 
   Run Java Spark PortableValidatesRunner Batch
 



Issue Time Tracking
---

Worklog Id: (was: 338984)
Time Spent: 1h 10m  (was: 1h)

> Spark portable validates runner tests timing out
> 
>
> Key: BEAM-8294
> URL: https://issues.apache.org/jira/browse/BEAM-8294
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, test-failures, testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: currently-failing, portability-spark
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This postcommit has been timing out for 11 days. 
> [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. 
> Coincidence? I think NOT! ... Seriously, though, I wonder what about the SDK 
> worker management stack caused this to slow down.
> [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/]
>  





[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=338983&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338983
 ]

ASF GitHub Bot logged work on BEAM-8294:


Author: ASF GitHub Bot
Created on: 05/Nov/19 20:22
Start Date: 05/Nov/19 20:22
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #: [BEAM-8294] run Spark 
portable validates runner tests in parallel
URL: https://github.com/apache/beam/pull/#issuecomment-550003251
 
 
   Run Java Spark PortableValidatesRunner Batch
   
 



Issue Time Tracking
---

Worklog Id: (was: 338983)
Time Spent: 1h  (was: 50m)

> Spark portable validates runner tests timing out
> 
>
> Key: BEAM-8294
> URL: https://issues.apache.org/jira/browse/BEAM-8294
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, test-failures, testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: currently-failing, portability-spark
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This postcommit has been timing out for 11 days. 
> [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. 
> Coincidence? I think NOT! ... Seriously, though, I wonder what about the SDK 
> worker management stack caused this to slow down.
> [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/]
>  





[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=338982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338982
 ]

ASF GitHub Bot logged work on BEAM-8294:


Author: ASF GitHub Bot
Created on: 05/Nov/19 20:21
Start Date: 05/Nov/19 20:21
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #: [BEAM-8294] run 
Spark portable validates runner tests in parallel
URL: https://github.com/apache/beam/pull/
 
 
 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/

[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=338978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338978
 ]

ASF GitHub Bot logged work on BEAM-8512:


Author: ASF GitHub Bot
Created on: 05/Nov/19 20:08
Start Date: 05/Nov/19 20:08
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9998: [BEAM-8512] Add 
integration tests for flink_runner.py
URL: https://github.com/apache/beam/pull/9998
 
 
   This is just a very basic smoke test to ensure that FlinkRunner works as 
intended with the default settings. This does _not_ test behavior with an 
actual Flink cluster.
   

[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=338958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338958
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 05/Nov/19 19:36
Start Date: 05/Nov/19 19:36
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker 
pull before docker run
URL: https://github.com/apache/beam/pull/9972#issuecomment-549623033
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338958)
Time Spent: 1h 40m  (was: 1.5h)

> don't docker pull before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
> Issue Type: Bug
> Components: java-fn-execution
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: Major
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Since 'docker run' automatically pulls an image that doesn't exist locally, 
> I think it's safe to remove the explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag), but that shouldn't be a problem in prod, because a unique 
> tag is assumed for each released version.
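
The change described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Beam code: the function name `docker_commands`, the `explicit_pull` flag, and the image name `example/sdk-harness:2.17.0` are all hypothetical. It builds the command lines without executing them, so the difference between the old and new behaviour is easy to see.

```python
from typing import List, Optional


def docker_commands(image: str,
                    args: Optional[List[str]] = None,
                    explicit_pull: bool = False) -> List[List[str]]:
    """Return the docker invocations needed to start `image`.

    Hypothetical helper: nothing is executed, the commands are only built.
    With explicit_pull=False we rely on 'docker run' pulling the image
    itself when it is missing locally.
    """
    commands: List[List[str]] = []
    if explicit_pull:
        # Old behaviour: always refresh the local image first.
        commands.append(["docker", "pull", image])
    # 'docker run' pulls on its own if the image is not present locally.
    commands.append(["docker", "run", "-d", image] + list(args or []))
    return commands


if __name__ == "__main__":
    # Hypothetical image name, assuming one unique tag per released version.
    for cmd in docker_commands("example/sdk-harness:2.17.0", ["--id=1"]):
        print(" ".join(cmd))
```

The trade-off is the one stated in the description: without the explicit pull, a locally cached image is never refreshed for the same tag, which only matters if tags are reused.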



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=338957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338957
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 05/Nov/19 19:36
Start Date: 05/Nov/19 19:36
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker 
pull before docker run
URL: https://github.com/apache/beam/pull/9972#issuecomment-549984848
 
 
   Run Python PreCommit
 


Issue Time Tracking
---

Worklog Id: (was: 338957)
Time Spent: 1.5h  (was: 1h 20m)

> don't docker pull before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
> Issue Type: Bug
> Components: java-fn-execution
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: Major
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Since 'docker run' automatically pulls an image that doesn't exist locally, 
> I think it's safe to remove the explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag), but that shouldn't be a problem in prod, because a unique 
> tag is assumed for each released version.




