[jira] [Resolved] (BEAM-8558) BigQueryIOIT Jenkins job flakes
[ https://issues.apache.org/jira/browse/BEAM-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michal Walenia resolved BEAM-8558. -- Fix Version/s: 2.17.0 Resolution: Fixed > BigQueryIOIT Jenkins job flakes > --- > > Key: BEAM-8558 > URL: https://issues.apache.org/jira/browse/BEAM-8558 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Michal Walenia >Assignee: Michal Walenia >Priority: Minor > Fix For: 2.17.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Java BigQueryIOIT fails with exception: > > {code:java} > org.apache.beam.sdk.bigqueryioperftests.BigQueryIOIT > testWriteThenRead > FAILED java.lang.IllegalStateException: BigQuery table is not empty: > apache-beam-testing:beam_performance.bqio_write_10GB_java. at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:588) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.verifyTableNotExistOrEmpty(BigQueryHelpers.java:511) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.validate(BigQueryIO.java:2246) > at > org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:643) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:653) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317) > at > org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251) > at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460) at > org.apache.beam.sdk.Pipeline.validate(Pipeline.java:579) at > org.apache.beam.sdk.Pipeline.run(Pipeline.java:314) at > org.apache.beam.sdk.Pipeline.run(Pipeline.java:301) at > org.apache.beam.sdk.bigqueryioperftests.BigQueryIOIT.testWrite(BigQueryIOIT.java:146) > at > org.apache.beam.sdk.bigqueryioperftests.BigQueryIOIT.testWriteThenRead(BigQueryIOIT.java:116) > {code} > The fix is to append unique UUID to the table names in the test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8564) Add LZO compression and decompression support for java sdk
Amogh Tiwari created BEAM-8564: -- Summary: Add LZO compression and decompression support for java sdk Key: BEAM-8564 URL: https://issues.apache.org/jira/browse/BEAM-8564 Project: Beam Issue Type: New Feature Components: sdk-java-core Reporter: Amogh Tiwari LZO is a lossless data compression algorithm which is focused on compression and decompression speeds. This will enable Apache Beam sdk to compress/decompress files using LZO compression. This will include the following functionalities: # compress() : for compressing files into an LZO archive # decompress() : for decompressing files archived using LZO compression Appropriate Input and Output stream will also be added to enable working with LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=339198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339198 ] ASF GitHub Bot logged work on BEAM-8442: Author: ASF GitHub Bot Created on: 06/Nov/19 07:08 Start Date: 06/Nov/19 07:08 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #10004: [BEAM-8442] Unify bundle register in Python SDK harness URL: https://github.com/apache/beam/pull/10004#issuecomment-550176508 R: @mxm @ibzib @aaltay @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339198) Time Spent: 2h 20m (was: 2h 10m) > Unify bundle register in Python SDK harness > --- > > Key: BEAM-8442 > URL: https://issues.apache.org/jira/browse/BEAM-8442 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > There are two methods for bundle register in Python SDK harness: > `SdkHarness._request_register` and `SdkWorker.register.` It should be unfied. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339186 ] ASF GitHub Bot logged work on BEAM-8557: Author: ASF GitHub Bot Created on: 06/Nov/19 06:18 Start Date: 06/Nov/19 06:18 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #9991: [BEAM-8557] Cleanup the useless null check for ResponseStreamObserver… URL: https://github.com/apache/beam/pull/9991#discussion_r342931236 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java ## @@ -100,7 +99,8 @@ public void testRequestError() throws Exception { } @Test - public void testUnknownResponseIgnored() throws Exception { + public void testUnknownResponse() throws Exception { +thrown.expect(NullPointerException.class); Review comment: I agree that meaningful message is better than NPE. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339186) Time Spent: 1h 10m (was: 1h) > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > I think we do not need null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Because before the the `onNext` call, the `Future` already put into the queue > in `handle` method. > > I found the test as follows: > {code:java} > @Test > public void testUnknownResponseIgnored() throws Exception{code} > I do not know why we need test this case? I think it would be better if we > throw the Exception for an UnknownResponse, otherwise, this may hidden a > potential bug. > Please correct me if there anything I misunderstand @kennknowles > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339184 ] ASF GitHub Bot logged work on BEAM-8557: Author: ASF GitHub Bot Created on: 06/Nov/19 06:17 Start Date: 06/Nov/19 06:17 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #9991: [BEAM-8557] Cleanup the useless null check for ResponseStreamObserver… URL: https://github.com/apache/beam/pull/9991#discussion_r342931236 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java ## @@ -100,7 +99,8 @@ public void testRequestError() throws Exception { } @Test - public void testUnknownResponseIgnored() throws Exception { + public void testUnknownResponse() throws Exception { +thrown.expect(NullPointerException.class); Review comment: I agree that meaningful message is better than NPE. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339184) Time Spent: 50m (was: 40m) > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > I think we do not need null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Because before the the `onNext` call, the `Future` already put into the queue > in `handle` method. > > I found the test as follows: > {code:java} > @Test > public void testUnknownResponseIgnored() throws Exception{code} > I do not know why we need test this case? I think it would be better if we > throw the Exception for an UnknownResponse, otherwise, this may hidden a > potential bug. > Please correct me if there anything I misunderstand @kennknowles > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339183 ] ASF GitHub Bot logged work on BEAM-8557: Author: ASF GitHub Bot Created on: 06/Nov/19 06:17 Start Date: 06/Nov/19 06:17 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #9991: [BEAM-8557] Cleanup the useless null check for ResponseStreamObserver… URL: https://github.com/apache/beam/pull/9991#discussion_r342931221 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java ## @@ -100,7 +99,8 @@ public void testRequestError() throws Exception { } @Test - public void testUnknownResponseIgnored() throws Exception { + public void testUnknownResponse() throws Exception { +thrown.expect(NullPointerException.class); Review comment: I agree that meaningful message is better than NPE. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339183) Time Spent: 40m (was: 0.5h) > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > I think we do not need null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Because before the the `onNext` call, the `Future` already put into the queue > in `handle` method. > > I found the test as follows: > {code:java} > @Test > public void testUnknownResponseIgnored() throws Exception{code} > I do not know why we need test this case? I think it would be better if we > throw the Exception for an UnknownResponse, otherwise, this may hidden a > potential bug. > Please correct me if there anything I misunderstand @kennknowles > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339185 ] ASF GitHub Bot logged work on BEAM-8557: Author: ASF GitHub Bot Created on: 06/Nov/19 06:17 Start Date: 06/Nov/19 06:17 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #9991: [BEAM-8557] Cleanup the useless null check for ResponseStreamObserver… URL: https://github.com/apache/beam/pull/9991#discussion_r342931269 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java ## @@ -148,16 +148,14 @@ public void onNext(BeamFnApi.InstructionResponse response) { LOG.debug("Received InstructionResponse {}", response); CompletableFuture responseFuture = outstandingRequests.remove(response.getInstructionId()); - if (responseFuture != null) { -if (response.getError().isEmpty()) { - responseFuture.complete(response); -} else { - responseFuture.completeExceptionally( - new RuntimeException( - String.format( - "Error received from SDK harness for instruction %s: %s", - response.getInstructionId(), response.getError(; -} + if (response.getError().isEmpty()) { Review comment: Thanks for the quick reply! Is there some logic for the resent the message? I have not find it, so I think in this case there is no duplicate message. Please correct me if there anything I misunderstand :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339185) Time Spent: 1h (was: 50m) > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > I think we do not need null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Because before the the `onNext` call, the `Future` already put into the queue > in `handle` method. > > I found the test as follows: > {code:java} > @Test > public void testUnknownResponseIgnored() throws Exception{code} > I do not know why we need test this case? I think it would be better if we > throw the Exception for an UnknownResponse, otherwise, this may hidden a > potential bug. > Please correct me if there anything I misunderstand @kennknowles > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339175 ] ASF GitHub Bot logged work on BEAM-8557: Author: ASF GitHub Bot Created on: 06/Nov/19 06:03 Start Date: 06/Nov/19 06:03 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9991: [BEAM-8557] Cleanup the useless null check for ResponseStreamObserver… URL: https://github.com/apache/beam/pull/9991#discussion_r342927908 ## File path: runners/java-fn-execution/src/test/java/org/apache/beam/runners/fnexecution/control/FnApiControlClientTest.java ## @@ -100,7 +99,8 @@ public void testRequestError() throws Exception { } @Test - public void testUnknownResponseIgnored() throws Exception { + public void testUnknownResponse() throws Exception { +thrown.expect(NullPointerException.class); Review comment: My opinion: an NPE always indicates a programming error. Null should not exist anymore in high-level programming. Or in most medium-level or low-level programming. The code should prevent an NPE and give a good error message. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339175) Time Spent: 20m (was: 10m) > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > I think we do not need null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Because before the the `onNext` call, the `Future` already put into the queue > in `handle` method. > > I found the test as follows: > {code:java} > @Test > public void testUnknownResponseIgnored() throws Exception{code} > I do not know why we need test this case? I think it would be better if we > throw the Exception for an UnknownResponse, otherwise, this may hidden a > potential bug. > Please correct me if there anything I misunderstand @kennknowles > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?focusedWorklogId=339176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339176 ] ASF GitHub Bot logged work on BEAM-8557: Author: ASF GitHub Bot Created on: 06/Nov/19 06:03 Start Date: 06/Nov/19 06:03 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9991: [BEAM-8557] Cleanup the useless null check for ResponseStreamObserver… URL: https://github.com/apache/beam/pull/9991#discussion_r342928301 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java ## @@ -148,16 +148,14 @@ public void onNext(BeamFnApi.InstructionResponse response) { LOG.debug("Received InstructionResponse {}", response); CompletableFuture responseFuture = outstandingRequests.remove(response.getInstructionId()); - if (responseFuture != null) { -if (response.getError().isEmpty()) { - responseFuture.complete(response); -} else { - responseFuture.completeExceptionally( - new RuntimeException( - String.format( - "Error received from SDK harness for instruction %s: %s", - response.getInstructionId(), response.getError(; -} + if (response.getError().isEmpty()) { Review comment: I do not like this change, because of my comment below about NPE. But also, I do not know the code well enough. Is there a distributed systems question? Can a duplicate message be received that should be recieved? If the answer is "yes" then the previous code is correct. If the answer is "no" then the previous code should crash or at least log an error. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339176) Time Spent: 0.5h (was: 20m) > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > I think we do not need null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Because before the the `onNext` call, the `Future` already put into the queue > in `handle` method. > > I found the test as follows: > {code:java} > @Test > public void testUnknownResponseIgnored() throws Exception{code} > I do not know why we need test this case? I think it would be better if we > throw the Exception for an UnknownResponse, otherwise, this may hidden a > potential bug. > Please correct me if there anything I misunderstand @kennknowles > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-8557: -- Description: I think we do not need null check here: [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] Because before the the `onNext` call, the `Future` already put into the queue in `handle` method. I found the test as follows: {code:java} @Test public void testUnknownResponseIgnored() throws Exception{code} I do not know why we need test this case? I think it would be better if we throw the Exception for an UnknownResponse, otherwise, this may hidden a potential bug. Please correct me if there anything I misunderstand @kennknowles was: I think we do not need null check here: [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] Because before the the `onNext` call, the `Future` already put into the queue in `handle` method. I found the test as follows: {code:java} @Test public void testUnknownResponseIgnored() throws Exception{code} I do not know why we need test this case? I think it would be better if we throw the Exception for an UnknownResponse, otherwise, this may hidden a potential bug. What do you think? @kennknowles > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I think we do not need null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Because before the the `onNext` call, the `Future` already put into the queue > in `handle` method. > > I found the test as follows: > {code:java} > @Test > public void testUnknownResponseIgnored() throws Exception{code} > I do not know why we need test this case? I think it would be better if we > throw the Exception for an UnknownResponse, otherwise, this may hidden a > potential bug. > Please correct me if there anything I misunderstand @kennknowles > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3493) Prevent users from "implementing" PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968103#comment-16968103 ] Kenneth Knowles commented on BEAM-3493: --- This bug was filed only 1 year ago, but that line seems to be 3 years old. Can you confirm that an implementation of PipelineOptions fails? This should actually be a dynamic check, so maybe it can be in a test. > Prevent users from "implementing" PipelineOptions > - > > Key: BEAM-3493 > URL: https://issues.apache.org/jira/browse/BEAM-3493 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Jing Chen >Priority: Minor > Labels: newbie, starter > > I've seen a user implement \{{PipelineOptions}}. This implies that it is > backwards-incompatible to add new options, which is of course not our intent. > We should at least document very loudly that it is not to be implemented, and > preferably have some automation that will fail on load if they have > implemented it. Ideas? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-3493) Prevent users from "implementing" PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-3493: - Assignee: Jing Chen (was: Luke Cwik) > Prevent users from "implementing" PipelineOptions > - > > Key: BEAM-3493 > URL: https://issues.apache.org/jira/browse/BEAM-3493 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Jing Chen >Priority: Minor > Labels: newbie, starter > > I've seen a user implement \{{PipelineOptions}}. This implies that it is > backwards-incompatible to add new options, which is of course not our intent. > We should at least document very loudly that it is not to be implemented, and > preferably have some automation that will fail on load if they have > implemented it. Ideas? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-3493) Prevent users from "implementing" PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-3493: - Assignee: Luke Cwik (was: Jing Chen) > Prevent users from "implementing" PipelineOptions > - > > Key: BEAM-3493 > URL: https://issues.apache.org/jira/browse/BEAM-3493 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Luke Cwik >Priority: Minor > Labels: newbie, starter > > I've seen a user implement \{{PipelineOptions}}. This implies that it is > backwards-incompatible to add new options, which is of course not our intent. > We should at least document very loudly that it is not to be implemented, and > preferably have some automation that will fail on load if they have > implemented it. Ideas? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-8557: -- Description: I think we do not need null check here: [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] Because before the the `onNext` call, the `Future` already put into the queue in `handle` method. I found the test as follows: {code:java} @Test public void testUnknownResponseIgnored() throws Exception{code} I do not know why we need test this case? I think it would be better if we throw the Exception for an UnknownResponse, otherwise, this may hidden a potential bug. What do you think? @kennknowles was: I think we do not need null check here: [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] Because before the the `onNext` call, the `Future` already put into the queue in `handle` method. I found the test as follows: {code:java} @Test public void testUnknownResponseIgnored() throws Exception{code} I am do not know why we need test this case? I think it would be better if we throw the Exception for an UnknownResponse, otherwise, this may hidden a potential bug. What do you think? @kennknowles > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I think we do not need null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Because before the the `onNext` call, the `Future` already put into the queue > in `handle` method. > > I found the test as follows: > {code:java} > @Test > public void testUnknownResponseIgnored() throws Exception{code} > I do not know why we need test this case? I think it would be better if we > throw the Exception for an UnknownResponse, otherwise, this may hidden a > potential bug. > What do you think? @kennknowles > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-8557: -- Description: I think we do not need null check here: [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] Because before the the `onNext` call, the `Future` already put into the queue in `handle` method. I found the test as follows: {code:java} @Test public void testUnknownResponseIgnored() throws Exception{code} I am do not know why we need test this case? I think it would be better if we throw the Exception for an UnknownResponse, otherwise, this may hidden a potential bug. What do you think? @kennknowles was: I think we do not need null check here: [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] Because before the the `onNext` call, the `Future` already put into the queue in `handle` method. I found the test as follows: {code:java} @Test public void testUnknownResponseIgnored() throws Exception{code} I am do not know why we need test this case? @kennknowles What do you think? > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I think we do not need null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Because before the the `onNext` call, the `Future` already put into the queue > in `handle` method. > > I found the test as follows: > {code:java} > @Test > public void testUnknownResponseIgnored() throws Exception{code} > I am do not know why we need test this case? I think it would be better if we > throw the Exception for an UnknownResponse, otherwise, this may hidden a > potential bug. > What do you think? @kennknowles > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3493) Prevent users from "implementing" PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968095#comment-16968095 ] Jing Chen commented on BEAM-3493: - The issue should have been fixed by [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsValidator.java#L69] and L70. I am about to close the ticket once either [~kenn] or [~lcwik] could confirm it. Thanks Jing > Prevent users from "implementing" PipelineOptions > - > > Key: BEAM-3493 > URL: https://issues.apache.org/jira/browse/BEAM-3493 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Jing Chen >Priority: Minor > Labels: newbie, starter > > I've seen a user implement \{{PipelineOptions}}. This implies that it is > backwards-incompatible to add new options, which is of course not our intent. > We should at least document very loudly that it is not to be implemented, and > preferably have some automation that will fail on load if they have > implemented it. Ideas? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8563) View.asMap() should reject multiply-firing triggers
Kenneth Knowles created BEAM-8563: - Summary: View.asMap() should reject multiply-firing triggers Key: BEAM-8563 URL: https://issues.apache.org/jira/browse/BEAM-8563 Project: Beam Issue Type: Bug Components: beam-model, sdk-java-core Reporter: Kenneth Knowles Assignee: Kenneth Knowles View.asMap() will always cause a failure with multiple firings, because a key will have duplicate elements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-8557: -- Description: I think we do not need null check here: [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] Because before the the `onNext` call, the `Future` already put into the queue in `handle` method. I found the test as follows: {code:java} @Test public void testUnknownResponseIgnored() throws Exception{code} I am do not know why we need test this case? @kennknowles What do you think? was: I think we do not need null check here: [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] Because before the the `onNext` call, the `Future` already put into the queue in `handle` method. I found the test as follows: ``` @Test public void testUnknownResponseIgnored() throws Exception { String id = "actualInstruction"; String unknownId = "unknownInstruction"; CompletionStage responseFuture = client.handle(BeamFnApi.InstructionRequest.newBuilder().setInstructionId(id).build()); client .asResponseObserver() .onNext(BeamFnApi.InstructionResponse.newBuilder().setInstructionId(unknownId).build()); assertThat(MoreFutures.isDone(responseFuture), is(false)); assertThat(MoreFutures.isCancelled(responseFuture), is(false)); } ``` I am do not know why we need test this case? @kennknowles What do you think? > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I think we do not need null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Because before the the `onNext` call, the `Future` already put into the queue > in `handle` method. > > I found the test as follows: > {code:java} > @Test > public void testUnknownResponseIgnored() throws Exception{code} > I am do not know why we need test this case? @kennknowles > What do you think? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8557) Clean up useless null check.
[ https://issues.apache.org/jira/browse/BEAM-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-8557: -- Description: I think we do not need null check here: [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] Because before the the `onNext` call, the `Future` already put into the queue in `handle` method. I found the test as follows: ``` @Test public void testUnknownResponseIgnored() throws Exception { String id = "actualInstruction"; String unknownId = "unknownInstruction"; CompletionStage responseFuture = client.handle(BeamFnApi.InstructionRequest.newBuilder().setInstructionId(id).build()); client .asResponseObserver() .onNext(BeamFnApi.InstructionResponse.newBuilder().setInstructionId(unknownId).build()); assertThat(MoreFutures.isDone(responseFuture), is(false)); assertThat(MoreFutures.isCancelled(responseFuture), is(false)); } ``` I am do not know why we need test this case? @kennknowles What do you think? was: I think we do not need null check here: [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] Because before the the `onNext` call, the `Future` already put into the queue in `handle` method. What do you think? > Clean up useless null check. > > > Key: BEAM-8557 > URL: https://issues.apache.org/jira/browse/BEAM-8557 > Project: Beam > Issue Type: Sub-task > Components: runner-core, sdk-java-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I think we do not need null check here: > [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L151] > Because before the the `onNext` call, the `Future` already put into the queue > in `handle` method. > > I found the test as follows: > ``` > @Test > public void testUnknownResponseIgnored() throws Exception { > String id = "actualInstruction"; > String unknownId = "unknownInstruction"; > CompletionStage responseFuture = > > client.handle(BeamFnApi.InstructionRequest.newBuilder().setInstructionId(id).build()); > client > .asResponseObserver() > > .onNext(BeamFnApi.InstructionResponse.newBuilder().setInstructionId(unknownId).build()); > assertThat(MoreFutures.isDone(responseFuture), is(false)); > assertThat(MoreFutures.isCancelled(responseFuture), is(false)); > } > ``` > I am do not know why we need test this case? @kennknowles > What do you think? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-6007) Create ClassLoadingStrategy with Java 11 compatible way
[ https://issues.apache.org/jira/browse/BEAM-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968080#comment-16968080 ] Kenneth Knowles commented on BEAM-6007: --- [~reuvenlax] is this the same bug we've been chatting about? > Create ClassLoadingStrategy with Java 11 compatible way > --- > > Key: BEAM-6007 > URL: https://issues.apache.org/jira/browse/BEAM-6007 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Keisuke Kondo >Assignee: Keisuke Kondo >Priority: Minor > > Since sun.misc.Unsafe API was deprecated in Java 11, > > {code:java} > ClassLoadingStrategy.Default.INJECTION{code} > is not work with Java 11. > Author of byte-buddy library shared way to make it compatible with Java 11 on > his blog post. (and way to keep compatibility with java 9 and prior version) > [http://mydailyjava.blogspot.com/2018/04/jdk-11-and-proxies-in-world-past.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=339138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339138 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 06/Nov/19 04:16 Start Date: 06/Nov/19 04:16 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9965: [BEAM-8539] Make job state transitions well-defined and consistent URL: https://github.com/apache/beam/pull/9965#issuecomment-550135736 R: @lukecwik R: @mxm R: @robertwb This is a followup to https://github.com/apache/beam/pull/9969 that corrects python-based runners to behave the same as Java-based runners wrt STOPPED: it is the initial state and not a terminal state. Tests pass, but I'm not sure if this could cause problems with Dataflow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339138) Time Spent: 3h 10m (was: 3h) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8533) test_java_expansion_portable_runner failing
[ https://issues.apache.org/jira/browse/BEAM-8533?focusedWorklogId=339135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339135 ] ASF GitHub Bot logged work on BEAM-8533: Author: ASF GitHub Bot Created on: 06/Nov/19 03:48 Start Date: 06/Nov/19 03:48 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #9956: [BEAM-8533] Revert "[BEAM-8442] Remove duplicate code for bundle register in Pyth… URL: https://github.com/apache/beam/pull/9956#issuecomment-550130482 @mxm @ibzib Thanks a lot for your analysis. Really sorry that my commit breaks the post commit tests. I have investigated this issue and submitted a new PR #10004 which addressed this issue. It includes the following changes: - Update the client side implementation of Fn API to make sure registration is executed successfully before executing process_bundle. Personally I think that the client side implementation of Fn API should not assume that the implementation of registration in SDK harness is synchronous. The registration API itself is asynchronous and the client side implementation should check if the registration succeed. This will decouple the implementation dependency between the runner and the SDK harness. - Remove duplicate registration logic in the Python SDK harness. As you know, it changed the registration implementation of Python SDK harness to asynchronous. Looking forward to your feedback. :) Best, Jincheng This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339135) Time Spent: 40m (was: 0.5h) > test_java_expansion_portable_runner failing > --- > > Key: BEAM-8533 > URL: https://issues.apache.org/jira/browse/BEAM-8533 > Project: Beam > Issue Type: Sub-task > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > probable cause: > https://github.com/apache/beam/pull/9842#issuecomment-548496295 > 11:13:37 java.util.concurrent.ExecutionException: java.lang.RuntimeException: > Error received from SDK harness for instruction 72: Traceback (most recent > call last):11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 173, in _execute*11:13:37* response = task()11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 196, in 11:13:37 self._execute(lambda: > worker.do_instruction(work), work)11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 358, in do_instruction*11:13:37* request.instruction_id)11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 378, in process_bundle*11:13:37* instruction_id, > request.process_bundle_descriptor_id)11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 311, in get*11:13:37* self.fns[bundle_descriptor_id],11:13:37 KeyError: > u'1-37' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=339134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339134 ] ASF GitHub Bot logged work on BEAM-8442: Author: ASF GitHub Bot Created on: 06/Nov/19 03:46 Start Date: 06/Nov/19 03:46 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #9842: [BEAM-8442] Remove duplicate code for bundle register in Python SDK harness URL: https://github.com/apache/beam/pull/9842#issuecomment-550130122 @mxm @ibzib @lukecwik @aaltay Thanks a lot for your analysis. Really sorry that my commit breaks the post commit tests. I have investigated this issue and submitted a new PR #10004 which addressed this issue. It includes the following changes: 1) Update the client side implementation of Fn API to make sure registration is executed successfully before executing process_bundle. Personally I think that the client side implementation of Fn API should not assume that the implementation of registration in SDK harness is synchronous. The registration API itself is asynchronous and the client side implementation should check if the registration succeed. This will decouple the implementation dependency between the runner and the SDK harness. 2) Remove duplicate registration logic in the Python SDK harness. As you know, it changed the registration implementation of Python SDK harness to asynchronous. Looking forward to your feedback. :) Best, Jincheng This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339134) Time Spent: 2h 10m (was: 2h) > Unify bundle register in Python SDK harness > --- > > Key: BEAM-8442 > URL: https://issues.apache.org/jira/browse/BEAM-8442 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > There are two methods for bundle register in Python SDK harness: > `SdkHarness._request_register` and `SdkWorker.register.` It should be unfied. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=339133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339133 ] ASF GitHub Bot logged work on BEAM-8442: Author: ASF GitHub Bot Created on: 06/Nov/19 03:41 Start Date: 06/Nov/19 03:41 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #10004: [BEAM-8442] Unify bundle register in Python SDK harness URL: https://github.com/apache/beam/pull/10004 Motivation - Cleanup the code of `sdk_worker.py` on register process boundle descriptor. - Correct the implementation of Fn API, i.e., would be better to cater to the asynchronous design principles of the Beam API. Changes - 6868333: Update the client side implemetation of Fn API to make sure registration is executed sucessfully before executing process_bundle. Personally I think that the client side implemetation of Fn API should not assume that the implementation of registration in SDK harness is synchourous. The registration API itself is asynchronous and the client side implementation should check if the registration succeed. This will decouple the implementation dependency between the runner and the SDK harness. - 79d21b1: Remove duplicate registration logic in the Python SDK harness. It changed the registration implementation of Python SDK harness to asynchronous. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://b
[jira] [Updated] (BEAM-8442) Unify bundle register in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated BEAM-8442: -- Summary: Unify bundle register in Python SDK harness (was: Unfiy bundle register in Python SDK harness) > Unify bundle register in Python SDK harness > --- > > Key: BEAM-8442 > URL: https://issues.apache.org/jira/browse/BEAM-8442 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > There are two methods for bundle register in Python SDK harness: > `SdkHarness._request_register` and `SdkWorker.register.` It should be unfied. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-8500) PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in README is incorrect
[ https://issues.apache.org/jira/browse/BEAM-8500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yoshiki obata closed BEAM-8500. --- > PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in README is > incorrect > --- > > Key: BEAM-8500 > URL: https://issues.apache.org/jira/browse/BEAM-8500 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: yoshiki obata >Assignee: yoshiki obata >Priority: Trivial > Labels: easyfix, starter > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in > ./test-infra/jenkins/README.md is *Run Website PreCommit* [1] > But correct phrase is *Run Website_Stage_GCS PreCommit* > [1] https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8500) PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in README is incorrect
[ https://issues.apache.org/jira/browse/BEAM-8500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yoshiki obata resolved BEAM-8500. - Fix Version/s: Not applicable Resolution: Resolved > PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in README is > incorrect > --- > > Key: BEAM-8500 > URL: https://issues.apache.org/jira/browse/BEAM-8500 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: yoshiki obata >Assignee: yoshiki obata >Priority: Trivial > Labels: easyfix, starter > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in > ./test-infra/jenkins/README.md is *Run Website PreCommit* [1] > But correct phrase is *Run Website_Stage_GCS PreCommit* > [1] https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.
[ https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=339125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339125 ] ASF GitHub Bot logged work on BEAM-5878: Author: ASF GitHub Bot Created on: 06/Nov/19 02:19 Start Date: 06/Nov/19 02:19 Worklog Time Spent: 10m Work Description: lazylynx commented on pull request #9686: [WIP][BEAM-5878] update dill min version to 0.3.1.1 and add test for functions with Keyword-only arguments URL: https://github.com/apache/beam/pull/9686#discussion_r342890040 ## File path: sdks/python/setup.py ## @@ -106,8 +106,7 @@ def get_version(): 'avro>=1.8.1,<2.0.0; python_version < "3.0"', 'avro-python3>=1.8.1,<2.0.0; python_version >= "3.0"', 'crcmod>=1.7,<2.0', -# Dill doesn't guarantee comatibility between releases within minor version. -'dill>=0.3.0,<0.3.1', +'dill>=0.3.1.1,<0.4.0', Review comment: Updated. PTAL. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339125) Time Spent: 16h (was: 15h 50m) > Support DoFns with Keyword-only arguments in Python 3. > -- > > Key: BEAM-5878 > URL: https://issues.apache.org/jira/browse/BEAM-5878 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: yoshiki obata >Priority: Minor > Time Spent: 16h > Remaining Estimate: 0h > > Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to > define functions with keyword-only arguments. > Currently Beam does not handle them correctly. [~ruoyu] pointed out [one > place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118] > in our codebase that we should fix: in Python in 3.0 inspect.getargspec() > will fail on functions with keyword-only arguments, but a new method > [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec] > supports them. > There may be implications for our (best-effort) type-hints machinery. > We should also add a Py3-only unit tests that covers DoFn's with keyword-only > arguments once Beam Python 3 tests are in a good shape. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=339122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339122 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 06/Nov/19 02:03 Start Date: 06/Nov/19 02:03 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9965: WIP: [BEAM-8539] Make job state transitions well-defined and consistent URL: https://github.com/apache/beam/pull/9965#issuecomment-550107727 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339122) Time Spent: 3h (was: 2h 50m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339121 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 02:00 Start Date: 06/Nov/19 02:00 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342886212 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -171,14 +199,17 @@ class StreamCoderImpl(CoderImpl): Subclass of CoderImpl implementing encode/decode using stream methods.""" def encode(self, value): +# type: (Any) -> bytes Review comment: No, but there's an issue for it, and an open pull request: https://github.com/python/mypy/issues/3903 https://github.com/python/mypy/pull/7548 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339121) Time Spent: 20h (was: 19h 50m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 20h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=339120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339120 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 06/Nov/19 01:56 Start Date: 06/Nov/19 01:56 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#issuecomment-550106200 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339120) Time Spent: 15h 10m (was: 15h) > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-java-core, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 15h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339119 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 01:53 Start Date: 06/Nov/19 01:53 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342884801 ## File path: sdks/python/apache_beam/io/iobase.py ## @@ -136,7 +150,12 @@ def estimate_size(self): """ raise NotImplementedError - def split(self, desired_bundle_size, start_position=None, stop_position=None): + def split(self, +desired_bundle_size, # type: int +start_position=None, # type: Optional[int] Review comment: got it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339119) Time Spent: 19h 50m (was: 19h 40m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 19h 50m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339117 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 01:53 Start Date: 06/Nov/19 01:53 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342884673 ## File path: sdks/python/apache_beam/coders/coders.py ## @@ -248,7 +281,26 @@ def __ne__(self, other): def __hash__(self): return hash(type(self)) - _known_urns = {} + _known_urns = {} # type: Dict[str, Tuple[type, ConstructorFn]] + + @classmethod Review comment: I'm confused what you are confused about :) It's a dictionary of str (URN) to tuple of payload type and constructor function (type alias defined a the top of the module). Or are you talking about the `@overloads` just below this? Overload decorators work conceptually like an overloaded C function. Without overloads, we'd have to use a bunch of Unions around all the possible arg and result types, but then the picture of which combinations of arg and result type are valid is lost (lots of stuff in, lots of stuff out). Using overloads we can clearly define those relationships. This also lets mypy more accurately track the results of function invocations as well (e.g. when there's no `fn` argument, the result will be a `Callable`, but when there is, the result will be `None`). More docs: https://docs.python.org/3/library/typing.html#typing.overload https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339117) Time Spent: 19.5h (was: 19h 20m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 19.5h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339118 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 01:53 Start Date: 06/Nov/19 01:53 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342884673 ## File path: sdks/python/apache_beam/coders/coders.py ## @@ -248,7 +281,26 @@ def __ne__(self, other): def __hash__(self): return hash(type(self)) - _known_urns = {} + _known_urns = {} # type: Dict[str, Tuple[type, ConstructorFn]] + + @classmethod Review comment: I'm confused about what you are confused about :) It's a dictionary of str (URN) to tuple of payload type and constructor function (type alias defined a the top of the module). Or are you talking about the `@overloads` just below this? Overload decorators work conceptually like an overloaded C function. Without overloads, we'd have to use a bunch of Unions around all the possible arg and result types, but then the picture of which combinations of arg and result type are valid is lost (lots of stuff in, lots of stuff out). Using overloads we can clearly define those relationships. This also lets mypy more accurately track the results of function invocations as well (e.g. when there's no `fn` argument, the result will be a `Callable`, but when there is, the result will be `None`). More docs: https://docs.python.org/3/library/typing.html#typing.overload https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339118) Time Spent: 19h 40m (was: 19.5h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 19h 40m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339116 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 01:40 Start Date: 06/Nov/19 01:40 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342882155 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -74,53 +86,66 @@ else: is_compiled = False fits_in_64_bits = lambda x: -(1 << 63) <= x <= (1 << 63) - 1 + # pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports _TIME_SHIFT = 1 << 63 MIN_TIMESTAMP_micros = MIN_TIMESTAMP.micros MAX_TIMESTAMP_micros = MAX_TIMESTAMP.micros +IterableStateReader = Callable[[bytes, 'CoderImpl'], Iterable] +IterableStateWriter = Callable[[Iterable, 'CoderImpl'], bytes] +Observables = List[Tuple[observable.ObservableMixin, 'CoderImpl']] class CoderImpl(object): """For internal use only; no backwards-compatibility guarantees.""" def encode_to_stream(self, value, stream, nested): +# type: (Any, create_OutputStream, bool) -> None Review comment: My intent was that mypy would use the slow_stream implementation of these, which _are_ types. But I realized when looking at this that `create_InputStream` is being treated as `Any` because mypy can't inspect the `stream` c-extension module (mypy always uses the first import location of an object, unless `TYPE_CHECKING` is used). The simple/obvious approach works, but is pretty verbose: ```python if TYPE_CHECKING: from apache_beam.transforms.window import IntervalWindow from .slow_stream import InputStream as create_InputStream from .slow_stream import OutputStream as create_OutputStream from .slow_stream import ByteCountingOutputStream from .slow_stream import get_varint_size else: # pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports try: from .stream import InputStream as create_InputStream from .stream import OutputStream as create_OutputStream from .stream import ByteCountingOutputStream from .stream import get_varint_size # Make it possible to import create_InputStream and other cdef-classes # from apache_beam.coders.coder_impl when Cython codepath is used. globals()['create_InputStream'] = create_InputStream globals()['create_OutputStream'] = create_OutputStream globals()['ByteCountingOutputStream'] = ByteCountingOutputStream except ImportError: from .slow_stream import InputStream as create_InputStream from .slow_stream import OutputStream as create_OutputStream from .slow_stream import ByteCountingOutputStream from .slow_stream import get_varint_size if False: # pylint: disable=using-constant-test # This clause is interpreted by the compiler. from cython import compiled as is_compiled else: is_compiled = False fits_in_64_bits = lambda x: -(1 << 63) <= x <= (1 << 63) - 1 ``` Here's another approach (still verbose but less redundant): ```python try: from . import stream except ImportError: SLOW_STREAM = True else: SLOW_STREAM = False if TYPE_CHECKING or SLOW_STREAM: from .slow_stream import InputStream as create_InputStream from .slow_stream import OutputStream as create_OutputStream from .slow_stream import ByteCountingOutputStream from .slow_stream import get_varint_size if False: # pylint: disable=using-constant-test # This clause is interpreted by the compiler. from cython import compiled as is_compiled else: is_compiled = False fits_in_64_bits = lambda x: -(1 << 63) <= x <= (1 << 63) - 1 else: # pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports from .stream import InputStream as create_InputStream from .stream import OutputStream as create_OutputStream from .stream import ByteCountingOutputStream from .stream import get_varint_size # Make it possible to import create_InputStream and other cdef-classes # from apache_beam.coders.coder_impl when Cython codepath is used. globals()['create_InputStream'] = create_InputStream globals()['create_OutputStream'] = create_OutputStream globals()['ByteCountingOutputStream'] = ByteCountingOutputStream # pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports ``` Opinion? This is an automated message from the Apache Git Service. To respond
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339115 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 01:25 Start Date: 06/Nov/19 01:25 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342878935 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -724,10 +790,12 @@ def _construct_from_components(self, components): class _ConcatSequence(object): def __init__(self, head, tail): +# type: (Iterable[Any], Iterable[Any]) -> None Review comment: Yeah, I avoided introducing generics in this MR. I'll do that in a follow-up. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339115) Time Spent: 19h 10m (was: 19h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 19h 10m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339114 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 01:24 Start Date: 06/Nov/19 01:24 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342878626 ## File path: sdks/python/apache_beam/io/iobase.py ## @@ -1275,6 +1306,7 @@ def defer_remainder(self, watermark=None): raise NotImplementedError def deferred_status(self): +# type: () -> Optional[Tuple[restriction_trackers.OffsetRange, Timestamp]] Review comment: Roger This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339114) Time Spent: 19h (was: 18h 50m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 19h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=339112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339112 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 06/Nov/19 01:17 Start Date: 06/Nov/19 01:17 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#discussion_r342876345 ## File path: sdks/python/apache_beam/pvalue.py ## @@ -201,6 +201,43 @@ class PDone(PValue): pass +class PTuple(object): + """An object grouping multiple PCollections. + + This class is useful for returning a named tuple of PCollections from a + composite. + """ + + def __init__(self, pcoll_dict): +"""Initializes this named tuple with a dictionary of tagged PCollections. +""" +self._pcolls = pcoll_dict + + def __str__(self): +return '<%s>' % self._str_internal() + + def __repr__(self): +return '<%s at %s>' % (self._str_internal(), hex(id(self))) Review comment: The representation should be PTuple(key1=pc1, key2=pc2, ...) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339112) Time Spent: 19h (was: 18h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 19h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=339111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339111 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 06/Nov/19 01:17 Start Date: 06/Nov/19 01:17 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#discussion_r342876590 ## File path: sdks/python/apache_beam/pvalue.py ## @@ -201,6 +201,43 @@ class PDone(PValue): pass +class PTuple(object): + """An object grouping multiple PCollections. + + This class is useful for returning a named tuple of PCollections from a + composite. + """ + + def __init__(self, pcoll_dict): +"""Initializes this named tuple with a dictionary of tagged PCollections. +""" +self._pcolls = pcoll_dict + + def __str__(self): +return '<%s>' % self._str_internal() + + def __repr__(self): +return '<%s at %s>' % (self._str_internal(), hex(id(self))) + + def _str_internal(self): +return '%s pcollections=%s' % ( +self.__class__.__name__, self._pcolls) + + def __iter__(self): +for tag in self._pcolls: + yield self[tag] + + def __getattr__(self, tag): +# Special methods which may be accessed before the object is Review comment: We shouldn't support pickling these. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339111) Time Spent: 18h 50m (was: 18h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 18h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=339108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339108 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 06/Nov/19 01:13 Start Date: 06/Nov/19 01:13 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#discussion_r342876169 ## File path: sdks/python/apache_beam/pvalue.py ## @@ -201,6 +201,43 @@ class PDone(PValue): pass +class PTuple(object): + """An object grouping multiple PCollections. + + This class is useful for returning a named tuple of PCollections from a + composite. + """ + + def __init__(self, pcoll_dict): +"""Initializes this named tuple with a dictionary of tagged PCollections. +""" +self._pcolls = pcoll_dict + + def __str__(self): +return '<%s>' % self._str_internal() Review comment: Remove and let it default to `__repr__`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339108) Time Spent: 18.5h (was: 18h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 18.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=339109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339109 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 06/Nov/19 01:13 Start Date: 06/Nov/19 01:13 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#discussion_r342876169 ## File path: sdks/python/apache_beam/pvalue.py ## @@ -201,6 +201,43 @@ class PDone(PValue): pass +class PTuple(object): + """An object grouping multiple PCollections. + + This class is useful for returning a named tuple of PCollections from a + composite. + """ + + def __init__(self, pcoll_dict): +"""Initializes this named tuple with a dictionary of tagged PCollections. +""" +self._pcolls = pcoll_dict + + def __str__(self): +return '<%s>' % self._str_internal() Review comment: Remove and let it default to `__repr__` (and inline _str_internal). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339109) Time Spent: 18h 40m (was: 18.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 18h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8562) source packages don't include gradlew
[ https://issues.apache.org/jira/browse/BEAM-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Lugg updated BEAM-8562: -- Description: I want to run using Flink. Procedure says source code is required, but gradlew isn't there h3. Executing a Beam pipeline on a Flink Cluster As of now you will need a copy of Apache Beam’s source code. You can download it on the [Downloads page|https://beam.apache.org/get-started/downloads/]. Following a couple of links, I get to: [http://www.apache.org/dyn/closer.cgi/beam/2.16.0/apache-beam-2.16.0-source-release.zip] But I don't see any gradlew wrappers where they should be. If I grab current using 'git clone', then the files are there. --- additional information --- I manually copied gradlew from the git clone to the 2.16 source release. That gradlew appears? to work. was: I want to run using Flink. Procedure says source code is required, but gradlew isn't there h3. Executing a Beam pipeline on a Flink Cluster As of now you will need a copy of Apache Beam’s source code. You can download it on the [Downloads page|https://beam.apache.org/get-started/downloads/]. Following a couple of links, I get to: [http://www.apache.org/dyn/closer.cgi/beam/2.16.0/apache-beam-2.16.0-source-release.zip] But I don't see any gradlew wrappers where they should be. If I grab current using 'git clone', then the files are there. Either the instructions should be modified or the source releases should contain all the tools. > source packages don't include gradlew > - > > Key: BEAM-8562 > URL: https://issues.apache.org/jira/browse/BEAM-8562 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.16.0 > Environment: linux >Reporter: Robert Lugg >Priority: Minor > > I want to run using Flink. Procedure says source code is required, but > gradlew isn't there > h3. Executing a Beam pipeline on a Flink Cluster > As of now you will need a copy of Apache Beam’s source code. You can download > it on the [Downloads page|https://beam.apache.org/get-started/downloads/]. > Following a couple of links, I get to: > [http://www.apache.org/dyn/closer.cgi/beam/2.16.0/apache-beam-2.16.0-source-release.zip] > But I don't see any gradlew wrappers where they should be. If I grab current > using 'git clone', then the files are there. > --- additional information --- > I manually copied gradlew from the git clone to the 2.16 source release. > That gradlew appears? to work. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339103 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 06/Nov/19 01:03 Start Date: 06/Nov/19 01:03 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#issuecomment-550093576 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339103) Time Spent: 3h 20m (was: 3h 10m) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339102 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 06/Nov/19 01:03 Start Date: 06/Nov/19 01:03 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#issuecomment-550093576 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339102) Time Spent: 3h 10m (was: 3h) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339101 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 06/Nov/19 00:59 Start Date: 06/Nov/19 00:59 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#issuecomment-550092685 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339101) Time Spent: 3h (was: 2h 50m) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339100 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 06/Nov/19 00:59 Start Date: 06/Nov/19 00:59 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#issuecomment-550092685 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339100) Time Spent: 2h 50m (was: 2h 40m) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8402) Create a class hierarchy to represent environments
[ https://issues.apache.org/jira/browse/BEAM-8402?focusedWorklogId=339098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339098 ] ASF GitHub Bot logged work on BEAM-8402: Author: ASF GitHub Bot Created on: 06/Nov/19 00:44 Start Date: 06/Nov/19 00:44 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9811: [BEAM-8402] Create a class hierarchy to represent environments URL: https://github.com/apache/beam/pull/9811#issuecomment-550089460 Yes, this is mostly about the public API. This does create a module of environments, with several pre-defined, which gets us a step closer to just using these in the public API, which is not I think a scalable/portable way forward. We should at least add a caveat to that module that these are intended for internal use only. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339098) Time Spent: 2h (was: 1h 50m) > Create a class hierarchy to represent environments > -- > > Key: BEAM-8402 > URL: https://issues.apache.org/jira/browse/BEAM-8402 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > As a first step towards making it possible to assign different environments > to sections of a pipeline, we first need to expose environment classes to the > pipeline API. Unlike PTransforms, PCollections, Coders, and Windowings, > environments exists solely in the portability framework as protobuf objects. > By creating a hierarchy of "native" classes that represent the various > environment types -- external, docker, process, etc -- users will be able to > instantiate these and assign them to parts of the pipeline. The assignment > portion will be covered in a follow-up issue/PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"
[ https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=339095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339095 ] ASF GitHub Bot logged work on BEAM-8348: Author: ASF GitHub Bot Created on: 06/Nov/19 00:41 Start Date: 06/Nov/19 00:41 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9789: [BEAM-8348] set job_name in portable_runner.py job request URL: https://github.com/apache/beam/pull/9789#issuecomment-550088785 I haven't made any progress on this. It's not a terribly important feature. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339095) Time Spent: 2h 50m (was: 2h 40m) > Portable Python job name hard-coded to "job" > > > Key: BEAM-8348 > URL: https://issues.apache.org/jira/browse/BEAM-8348 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Minor > Time Spent: 2h 50m > Remaining Estimate: 0h > > See [1]. `job_name` is already taken by Google Cloud options [2], so I guess > we should create a new option (maybe `portable_job_name` to avoid disruption). > [[1] > https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294|https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294] > [2] > [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"
[ https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=339093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339093 ] ASF GitHub Bot logged work on BEAM-8348: Author: ASF GitHub Bot Created on: 06/Nov/19 00:35 Start Date: 06/Nov/19 00:35 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9789: [BEAM-8348] set job_name in portable_runner.py job request URL: https://github.com/apache/beam/pull/9789#issuecomment-550087290 Any update on this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339093) Time Spent: 2h 40m (was: 2.5h) > Portable Python job name hard-coded to "job" > > > Key: BEAM-8348 > URL: https://issues.apache.org/jira/browse/BEAM-8348 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > > See [1]. `job_name` is already taken by Google Cloud options [2], so I guess > we should create a new option (maybe `portable_job_name` to avoid disruption). > [[1] > https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294|https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294] > [2] > [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339090 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:27 Start Date: 06/Nov/19 00:27 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342865760 ## File path: sdks/python/apache_beam/options/value_provider.py ## @@ -23,6 +23,7 @@ from builtins import object from functools import wraps +from typing import Set Review comment: I don't follow. Are you seeing an error? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339090) Time Spent: 18h 50m (was: 18h 40m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 18h 50m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339089 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:26 Start Date: 06/Nov/19 00:26 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342865507 ## File path: sdks/python/apache_beam/io/textio.py ## @@ -92,7 +93,7 @@ def __init__(self, min_bundle_size, compression_type, strip_trailing_newlines, - coder, + coder, # type: coders.Coder Review comment: Yeah, the latter. It'll be good to go back and fill in the blanks, but I was trying to maximize my time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339089) Time Spent: 18h 40m (was: 18.5h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 18h 40m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339087 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:24 Start Date: 06/Nov/19 00:24 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342865083 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -838,11 +911,12 @@ def encode_to_stream(self, value, out, nested): out.write_var_int64(0) def decode_from_stream(self, in_stream, nested): +# type: (create_InputStream, bool) -> Sequence size = in_stream.read_bigendian_int32() if size >= 0: elements = [self._elem_coder.decode_from_stream(in_stream, True) - for _ in range(size)] + for _ in range(size)] # type: Iterable[Any] Review comment: This is a bit less obvious than that. mypy knows that this is a `List[Any]`, but below this `element` is rebound to `_ConcatSequence`. Without this annotation, we get this error: ``` apache_beam/coders/coder_impl.py:935: error: Incompatible types in assignment (expression has type "_ConcatSequence", variable has type "List[Any]") [assignment] ``` Here's the full context: ```python if size >= 0: elements = [self._elem_coder.decode_from_stream(in_stream, True) for _ in range(size)] # type: Iterable[Any] else: elements = [] count = in_stream.read_var_int64() while count > 0: for _ in range(count): elements.append(self._elem_coder.decode_from_stream(in_stream, True)) count = in_stream.read_var_int64() if count == -1: if self._read_state is None: raise ValueError( 'Cannot read state-written iterable without state reader.') state_token = in_stream.read_all(True) elements = _ConcatSequence( elements, self._read_state(state_token, self._elem_coder)) ``` The first time that a variable appears in the code we can take that opportunity to relax the type definition, in this case from `List` to `Iterable`. `_ConcatSequence` is also an `Iterable`, so it meets the type requirement is allowed to be assigned to `elements`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339087) Time Spent: 18.5h (was: 18h 20m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 18.5h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339085 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:22 Start Date: 06/Nov/19 00:22 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clarify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. I think we have a little bit trade off here: 1. What we have here: Determining if the job is started from a pipeline that was originally bundled with an Interactive Runner. Doing it with string comparison, we don't introduce new dependency into DataflowRunner. 2. Deduce if the job is started from a notebook environment. We'll introduce [interactive] dependencies including at least ipython into DataflowRunner. This will label Dataflow jobs from any pipeline originally bundled with arbitrary runner in any kind of ipython-notebook as long as `interactive_environment` module in `interactive` package has been (transitively) imported but not necessarily used. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339085) Time Spent: 7h 40m (was: 7.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 7h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339084 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:20 Start Date: 06/Nov/19 00:20 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clarify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. I think we have a little bit trade off here: 1. What we have here: Determining if the job is started from a pipeline that was originally bundled with an Interactive Runner. Doing it with string comparison, we don't introduce new dependency into DataflowRunner. 2. Deduce if the job is started from a notebook environment. We'll introduce [interactive] dependencies including at least ipython into DataflowRunner. This will label Dataflow jobs from any pipeline originally bundled with arbitrary runner in any kind of ipython-notebook as long as `interactive_environment` module in `interactive` package has been (transitively) imported. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339084) Time Spent: 7.5h (was: 7h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 7.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339083 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:17 Start Date: 06/Nov/19 00:17 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342855200 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None): # If a transform is applied and the full label is already in the set # then the transform will have to be cloned with a new label. self.applied_labels = set() +# A boolean value indicating whether the pipeline is created in an +# interactive environment such as interactive notebooks. Initialized as +# None. The value is set ad hoc when `pipeline.run()` is invoked. +self.interactive = None Review comment: Thanks! If we track `interactive` as a property of runner, we cannot implicitly pass along the property from runner to runner. And if we deduce `interactive` from the environment, we'll introduce new dependencies into DataflowRunner. See below comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339083) Time Spent: 7h 20m (was: 7h 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 7h 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339082 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:16 Start Date: 06/Nov/19 00:16 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clartify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. I think we have a little bit trade off here: 1. What we have here: Determining if the job is started from a pipeline that was originally bundled with an Interactive Runner. Doing it with string comparison, we don't introduce new dependency into DataflowRunner. 2. Deduce if the job is started from a notebook environment. We'll introduce [interactive] dependencies including at least ipython into DataflowRunner. This will label Dataflow jobs from any pipeline originally bundled with arbitrary runner in any kind of ipython-notebook as long as `interactive_environment` module in `interactive` package has been (transitively) imported. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339082) Time Spent: 7h 10m (was: 7h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339081 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:14 Start Date: 06/Nov/19 00:14 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clartify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. I think we have a little bit trade off here: 1. What we have here: Determining if the job is started from a pipeline that was originally bundled with an Interactive Runner. Doing it with string comparison 2. Deduce if the job is started from a notebook environment. We'll introduce [interactive] dependencies including ipython into DataflowRunner. This will label Dataflow jobs from any pipeline originally bundled with arbitrary runner in any kind of ipython-notebook as long as `interactive_environment` module in `interactive` package has been (transitively) imported. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339081) Time Spent: 7h (was: 6h 50m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 7h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339080 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:12 Start Date: 06/Nov/19 00:12 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342855200 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None): # If a transform is applied and the full label is already in the set # then the transform will have to be cloned with a new label. self.applied_labels = set() +# A boolean value indicating whether the pipeline is created in an +# interactive environment such as interactive notebooks. Initialized as +# None. The value is set ad hoc when `pipeline.run()` is invoked. +self.interactive = None Review comment: Thanks! If we track `interactive` as a property of runner, we cannot implicitly pass along the property from runner to runner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339080) Time Spent: 6h 50m (was: 6h 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (BEAM-8193) Python PostCommit Tests are Flaky
[ https://issues.apache.org/jira/browse/BEAM-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Bradshaw reopened BEAM-8193: --- > Python PostCommit Tests are Flaky > - > > Key: BEAM-8193 > URL: https://issues.apache.org/jira/browse/BEAM-8193 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, testing >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Major > Fix For: Not applicable > > > I will use this as an umbrella bug, create sub task and assign to folks > working on Python in general. I will try to do a round-robin assignment. > The goal is to improve stability of Py2, 3.5, 3.6, 3.7 postcommits to be > working >80% of the time [1]. > /cc (you might get issues assigned) [~pabloem] [~udim] [~chamikara] > [~tvalentyn] > [1] http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8416) ZipFileArtifactServiceTest.test_concurrent_requests flaky
[ https://issues.apache.org/jira/browse/BEAM-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Bradshaw resolved BEAM-8416. --- Fix Version/s: Not applicable Resolution: Fixed > ZipFileArtifactServiceTest.test_concurrent_requests flaky > - > > Key: BEAM-8416 > URL: https://issues.apache.org/jira/browse/BEAM-8416 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Robert Bradshaw >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 10m > Remaining Estimate: 0h > > {code} > Traceback (most recent call last): > File "/usr/lib/python3.7/unittest/case.py", line 59, in testPartExecutor > yield > File "/usr/lib/python3.7/unittest/case.py", line 615, in run > testMethod() > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py", > line 215, in test_concurrent_requests > _ = list(pool.map(check, range(100))) > File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in > result_iterator > yield fs.pop().result() > File "/usr/lib/python3.7/concurrent/futures/_base.py", line 425, in result > return self.__get_result() > File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in > __get_result > raise self._exception > File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run > result = self.fn(*self.args, **self.kwargs) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py", > line 208, in check > self._service, tokens[session(index)], name(index))) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py", > line 73, in retrieve_artifact > name=name))) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py", > line 70, in > return b''.join(chunk.data for chunk in retrieval_service.GetArtifact( > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service.py", > line 133, in GetArtifact > chunk = fin.read(self._chunk_size) > File "/usr/lib/python3.7/zipfile.py", line 899, in read > data = self._read1(n) > File "/usr/lib/python3.7/zipfile.py", line 989, in _read1 > self._update_crc(data) > File "/usr/lib/python3.7/zipfile.py", line 917, in _update_crc > raise BadZipFile("Bad CRC-32 for file %r" % self.name) > zipfile.BadZipFile: Bad CRC-32 for file > '/3b2b55eb92de23535010b7ac80d553ec2d4bae872ac5606bc3042ce9313dff87/e1d492628cc0c1d0c1b736184f689be54fa03a996de918268ad834560e77305f' > {code} > and: > {code} > Traceback (most recent call last): > File "/usr/lib/python3.7/unittest/case.py", line 59, in testPartExecutor > yield > File "/usr/lib/python3.7/unittest/case.py", line 615, in run > testMethod() > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py", > line 215, in test_concurrent_requests > _ = list(pool.map(check, range(100))) > File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in > result_iterator > yield fs.pop().result() > File "/usr/lib/python3.7/concurrent/futures/_base.py", line 425, in result > return self.__get_result() > File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in > __get_result > raise self._exception > File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run > result = self.fn(*self.args, **self.kwargs) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py", > line 208, in check > self._service, tokens[session(index)], name(index))) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/artifact_service_test.py", > line 73, in retrieve_artifact > name=name))) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/por
[jira] [Resolved] (BEAM-8193) Python PostCommit Tests are Flaky
[ https://issues.apache.org/jira/browse/BEAM-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Bradshaw resolved BEAM-8193. --- Fix Version/s: Not applicable Resolution: Fixed > Python PostCommit Tests are Flaky > - > > Key: BEAM-8193 > URL: https://issues.apache.org/jira/browse/BEAM-8193 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, testing >Reporter: Ahmet Altay >Assignee: Ahmet Altay >Priority: Major > Fix For: Not applicable > > > I will use this as an umbrella bug, create sub task and assign to folks > working on Python in general. I will try to do a round-robin assignment. > The goal is to improve stability of Py2, 3.5, 3.6, 3.7 postcommits to be > working >80% of the time [1]. > /cc (you might get issues assigned) [~pabloem] [~udim] [~chamikara] > [~tvalentyn] > [1] http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339076 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 06/Nov/19 00:03 Start Date: 06/Nov/19 00:03 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r342859843 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java ## @@ -128,22 +128,13 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) { FieldAccessDescriptor.withFieldNames(fieldNames).resolve(getSchema()); final Schema newSchema = SelectHelpers.getOutputSchema(getSchema(), resolved); -TypedRead builder = getBigQueryReadBuilder(newSchema); +TypedRead builder = getBigQueryReadBuilder(newSchema).withSelectedFields(fieldNames); Review comment: Will add `if (!fieldNames.isEmpty())` back later. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339076) Time Spent: 2h 40m (was: 2.5h) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339072 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:02 Start Date: 06/Nov/19 00:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342846522 ## File path: sdks/python/apache_beam/io/iobase.py ## @@ -136,7 +150,12 @@ def estimate_size(self): """ raise NotImplementedError - def split(self, desired_bundle_size, start_position=None, stop_position=None): + def split(self, +desired_bundle_size, # type: int +start_position=None, # type: Optional[int] Review comment: These should be Optional[Any]. E.g. the start and end could be (str) keys. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339072) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 18h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339071 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:02 Start Date: 06/Nov/19 00:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342849862 ## File path: sdks/python/apache_beam/io/iobase.py ## @@ -1275,6 +1306,7 @@ def defer_remainder(self, watermark=None): raise NotImplementedError def deferred_status(self): +# type: () -> Optional[Tuple[restriction_trackers.OffsetRange, Timestamp]] Review comment: Optional[Tuple[Any, Timestamp]] This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339071) Time Spent: 18h (was: 17h 50m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 18h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339074 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:02 Start Date: 06/Nov/19 00:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342850742 ## File path: sdks/python/apache_beam/io/textio.py ## @@ -92,7 +93,7 @@ def __init__(self, min_bundle_size, compression_type, strip_trailing_newlines, - coder, + coder, # type: coders.Coder Review comment: Was there a reason to only type some of these arguments? Enough to make mypy happy? Only ones that were not obvious? (I'm fine with this, just trying to understand the reasoning.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339074) Time Spent: 18h 20m (was: 18h 10m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 18h 20m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339065 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:02 Start Date: 06/Nov/19 00:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342837849 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -74,53 +86,66 @@ else: is_compiled = False fits_in_64_bits = lambda x: -(1 << 63) <= x <= (1 << 63) - 1 + # pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports _TIME_SHIFT = 1 << 63 MIN_TIMESTAMP_micros = MIN_TIMESTAMP.micros MAX_TIMESTAMP_micros = MAX_TIMESTAMP.micros +IterableStateReader = Callable[[bytes, 'CoderImpl'], Iterable] +IterableStateWriter = Callable[[Iterable, 'CoderImpl'], bytes] +Observables = List[Tuple[observable.ObservableMixin, 'CoderImpl']] class CoderImpl(object): """For internal use only; no backwards-compatibility guarantees.""" def encode_to_stream(self, value, stream, nested): +# type: (Any, create_OutputStream, bool) -> None Review comment: Technically create_OutputStream is a constructor, not the type itself (due to some hacking around working in both compiled and non-compiled mode), but if this makes the type checker happy I guess that's OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339065) Time Spent: 17.5h (was: 17h 20m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 17.5h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339068 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:02 Start Date: 06/Nov/19 00:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342842027 ## File path: sdks/python/apache_beam/coders/coders.py ## @@ -248,7 +281,26 @@ def __ne__(self, other): def __hash__(self): return hash(type(self)) - _known_urns = {} + _known_urns = {} # type: Dict[str, Tuple[type, ConstructorFn]] + + @classmethod Review comment: I'm not quite following what you're trying to declare here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339068) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 17h 40m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339067 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:02 Start Date: 06/Nov/19 00:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342840025 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -838,11 +911,12 @@ def encode_to_stream(self, value, out, nested): out.write_var_int64(0) def decode_from_stream(self, in_stream, nested): +# type: (create_InputStream, bool) -> Sequence size = in_stream.read_bigendian_int32() if size >= 0: elements = [self._elem_coder.decode_from_stream(in_stream, True) - for _ in range(size)] + for _ in range(size)] # type: Iterable[Any] Review comment: mypy doesn't know a list comprehension is at least an `Iterable[Any]`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339067) Time Spent: 17h 40m (was: 17.5h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 17h 40m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339066 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:02 Start Date: 06/Nov/19 00:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342839770 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -724,10 +790,12 @@ def _construct_from_components(self, components): class _ConcatSequence(object): def __init__(self, head, tail): +# type: (Iterable[Any], Iterable[Any]) -> None Review comment: Should we use some time variable here? I suppose for that to be useful _ConcatSequence would have to be a generic type. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339066) Time Spent: 17h 40m (was: 17.5h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 17h 40m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339073 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:02 Start Date: 06/Nov/19 00:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342851318 ## File path: sdks/python/apache_beam/options/value_provider.py ## @@ -23,6 +23,7 @@ from builtins import object from functools import wraps +from typing import Set Review comment: pylint ignore? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339073) Time Spent: 18h 10m (was: 18h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 18h 10m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339070 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:02 Start Date: 06/Nov/19 00:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342838959 ## File path: sdks/python/apache_beam/coders/coder_impl.py ## @@ -171,14 +199,17 @@ class StreamCoderImpl(CoderImpl): Subclass of CoderImpl implementing encode/decode using stream methods.""" def encode(self, value): +# type: (Any) -> bytes Review comment: Does mypy have a mode to assume inherited methods have the same type signature? I'm seeing a lot of DRY violation here... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339070) Time Spent: 17h 50m (was: 17h 40m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 17h 50m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339069 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Nov/19 00:02 Start Date: 06/Nov/19 00:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r342846545 ## File path: sdks/python/apache_beam/io/iobase.py ## @@ -153,7 +172,11 @@ def split(self, desired_bundle_size, start_position=None, stop_position=None): """ raise NotImplementedError - def get_range_tracker(self, start_position, stop_position): + def get_range_tracker(self, +start_position, # type: Optional[int] Review comment: Same. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339069) Time Spent: 17h 50m (was: 17h 40m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 17h 50m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339064 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:01 Start Date: 06/Nov/19 00:01 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clartify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. This will label Dataflow jobs from any pipeline originally bundled with arbitrary runner in any kind of ipython-notebook as long as `interactive_environment` module in `interactive` package has been (transitively) imported. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339064) Time Spent: 6h 40m (was: 6.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339061 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 23:57 Start Date: 05/Nov/19 23:57 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clartify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339061) Time Spent: 6.5h (was: 6h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8123) Support CloudPickle as pickler for Apache Beam.
[ https://issues.apache.org/jira/browse/BEAM-8123?focusedWorklogId=339057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339057 ] ASF GitHub Bot logged work on BEAM-8123: Author: ASF GitHub Bot Created on: 05/Nov/19 23:49 Start Date: 05/Nov/19 23:49 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #8978: [BEAM-8123] [DO NOT MERGE] POC: use cloudpickle for pickling. URL: https://github.com/apache/beam/pull/8978#issuecomment-550075988 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339057) Time Spent: 50m (was: 40m) > Support CloudPickle as pickler for Apache Beam. > --- > > Key: BEAM-8123 > URL: https://issues.apache.org/jira/browse/BEAM-8123 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8123) Support CloudPickle as pickler for Apache Beam.
[ https://issues.apache.org/jira/browse/BEAM-8123?focusedWorklogId=339058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339058 ] ASF GitHub Bot logged work on BEAM-8123: Author: ASF GitHub Bot Created on: 05/Nov/19 23:49 Start Date: 05/Nov/19 23:49 Worklog Time Spent: 10m Work Description: stale[bot] commented on pull request #8978: [BEAM-8123] [DO NOT MERGE] POC: use cloudpickle for pickling. URL: https://github.com/apache/beam/pull/8978 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339058) Time Spent: 1h (was: 50m) > Support CloudPickle as pickler for Apache Beam. > --- > > Key: BEAM-8123 > URL: https://issues.apache.org/jira/browse/BEAM-8123 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339054 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 23:48 Start Date: 05/Nov/19 23:48 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342855715 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -396,28 +400,57 @@ def replace_all(self, replacements): for override in replacements: self._check_replacement(override) - def run(self, test_runner_api=True): -"""Runs the pipeline. Returns whatever our runner returns after running.""" - + def run(self, test_runner_api=True, runner=None, options=None, Review comment: IIRC, we want to allow the user to switch to `DataflowRunner` using the `p.run()` pattern instead of limiting the user to `Runner().run_pipeline(p, options)`. Do you think we should put it into a separate PR, or simply not supporting it at all? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339054) Time Spent: 6h 20m (was: 6h 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8562) source packages don't include gradlew
Robert Lugg created BEAM-8562: - Summary: source packages don't include gradlew Key: BEAM-8562 URL: https://issues.apache.org/jira/browse/BEAM-8562 Project: Beam Issue Type: Bug Components: build-system Affects Versions: 2.16.0 Environment: linux Reporter: Robert Lugg I want to run using Flink. Procedure says source code is required, but gradlew isn't there h3. Executing a Beam pipeline on a Flink Cluster As of now you will need a copy of Apache Beam’s source code. You can download it on the [Downloads page|https://beam.apache.org/get-started/downloads/]. Following a couple of links, I get to: [http://www.apache.org/dyn/closer.cgi/beam/2.16.0/apache-beam-2.16.0-source-release.zip] But I don't see any gradlew wrappers where they should be. If I grab current using 'git clone', then the files are there. Either the instructions should be modified or the source releases should contain all the tools. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339053 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 23:46 Start Date: 05/Nov/19 23:46 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342855200 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None): # If a transform is applied and the full label is already in the set # then the transform will have to be cloned with a new label. self.applied_labels = set() +# A boolean value indicating whether the pipeline is created in an +# interactive environment such as interactive notebooks. Initialized as +# None. The value is set ad hoc when `pipeline.run()` is invoked. +self.interactive = None Review comment: Thanks! I'll go with your suggestion below to deduce if the job is started from a notebook environment instead of determining if the job is started from a pipeline that was originally bundled with an Interactive Runner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339053) Time Spent: 6h 10m (was: 6h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Larsen updated BEAM-8561: --- Summary: Add ThriftIO to Support IO for Thrift Files (was: Add ThriftIO to Support for ) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Priority: Minor > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8561) Add ThriftIO to Support for
Chris Larsen created BEAM-8561: -- Summary: Add ThriftIO to Support for Key: BEAM-8561 URL: https://issues.apache.org/jira/browse/BEAM-8561 Project: Beam Issue Type: New Feature Components: io-java-files Reporter: Chris Larsen Similar to AvroIO it would be very useful to support reading and writing to/from Thrift files with a native connector. Functionality would include: # read() - Reading from one or more Thrift files. # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out
[ https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=339049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339049 ] ASF GitHub Bot logged work on BEAM-8294: Author: ASF GitHub Bot Created on: 05/Nov/19 23:24 Start Date: 05/Nov/19 23:24 Worklog Time Spent: 10m Work Description: ibzib commented on issue #: [BEAM-8294] run Spark portable validates runner tests in parallel URL: https://github.com/apache/beam/pull/#issuecomment-550030660 The tests are [failing](https://issues.apache.org/jira/browse/BEAM-8556), which as improvement over timing out and being aborted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339049) Time Spent: 1.5h (was: 1h 20m) > Spark portable validates runner tests timing out > > > Key: BEAM-8294 > URL: https://issues.apache.org/jira/browse/BEAM-8294 > Project: Beam > Issue Type: Improvement > Components: runner-spark, test-failures, testing >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing, portability-spark > Time Spent: 1.5h > Remaining Estimate: 0h > > This postcommit has been timing out for 11 days. > [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. > Coincidence? I think NOT! .. .Seriously, though, I wonder what about the SDK > worker management stack caused this to slow down. > [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339038 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 05/Nov/19 22:56 Start Date: 05/Nov/19 22:56 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r342840896 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java ## @@ -123,7 +129,12 @@ public void onMatch(RelOptRuleCall call) { // 1. Calc only does projects and renames. //And // 2. Predicate can be completely pushed-down to IO level. -if (isProjectRenameOnlyProgram(program) && tableFilter.getNotSupported().isEmpty()) { +//And +// 3. And IO supports project push-down OR all fields are projected by a Calc. +if (isProjectRenameOnlyProgram(program) +&& tableFilter.getNotSupported().isEmpty() +&& (beamSqlTable.supportsProjects() +|| calc.getRowType().getFieldCount() == calcInputRowType.getFieldCount())) { Review comment: Moved this check into a separate method to keep things readable. Updated the check to compare a list of projected field names to the list passed to a `Calc` as input. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339038) Time Spent: 2.5h (was: 2h 20m) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=339037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339037 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 05/Nov/19 22:53 Start Date: 05/Nov/19 22:53 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r342839842 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java ## @@ -124,11 +126,20 @@ public void onMatch(RelOptRuleCall call) { RelDataType calcInputType = CalciteUtils.toCalciteRowType(newSchema, ioSourceRel.getCluster().getTypeFactory()); +// TODO: Check if an IO supports field reordering and drop a Calc when it does (1). // Check if the calc can be dropped: -// 1. Calc only does projects and renames. +// 1. Calc only does projects and renames of fields in the same order. //And // 2. Predicate can be completely pushed-down to IO level. -if (isProjectRenameOnlyProgram(program) && tableFilter.getNotSupported().isEmpty()) { +//And +// 3. And IO supports project push-down OR all fields are projected by a Calc. +if (isProjectRenameOnlyProgram(program, beamSqlTable.supportsProjects()) Review comment: This should be resolved now, IOs can specify whether they support field reordering. When reordering in not supported and fields are projected in a different order (from Schema) - Calc should not be dropped. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339037) Time Spent: 2h 20m (was: 2h 10m) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339031 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 22:32 Start Date: 05/Nov/19 22:32 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342831745 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -396,28 +400,57 @@ def replace_all(self, replacements): for override in replacements: self._check_replacement(override) - def run(self, test_runner_api=True): -"""Runs the pipeline. Returns whatever our runner returns after running.""" - + def run(self, test_runner_api=True, runner=None, options=None, Review comment: Why are we adding runner and options parameters here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339031) Time Spent: 5h 50m (was: 5h 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339030&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339030 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 22:32 Start Date: 05/Nov/19 22:32 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342830295 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None): # If a transform is applied and the full label is already in the set # then the transform will have to be cloned with a new label. self.applied_labels = set() +# A boolean value indicating whether the pipeline is created in an +# interactive environment such as interactive notebooks. Initialized as +# None. The value is set ad hoc when `pipeline.run()` is invoked. +self.interactive = None Review comment: I believe one more suggestion from the previous PR was to not keepting track of interactivity as part of the Pipeline but making it a property of the runner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339030) Time Spent: 5h 40m (was: 5.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8460) Portable Flink runner fails UsesStrictTimerOrdering category tests
[ https://issues.apache.org/jira/browse/BEAM-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver reassigned BEAM-8460: - Assignee: Kyle Weaver > Portable Flink runner fails UsesStrictTimerOrdering category tests > -- > > Key: BEAM-8460 > URL: https://issues.apache.org/jira/browse/BEAM-8460 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.17.0 >Reporter: Jan Lukavský >Assignee: Kyle Weaver >Priority: Major > > BEAM-7520 introduced new set of validatesRunner tests that test that timers > are fired exactly in order of increasing timestamp. Portable Flink runner > fails these added tests (are currently ignored). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339032 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 22:32 Start Date: 05/Nov/19 22:32 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342832459 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: The change could be limited to: - here detect, whether we are in interactive environment or not. (For example, check whether certain imports are loaded?) Could also be a utility method somewhere, to check whether this in an interactive method. - If yes, add the labels. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339032) Time Spent: 6h (was: 5h 50m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=339029&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339029 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 05/Nov/19 22:25 Start Date: 05/Nov/19 22:25 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9056: [BEAM-7746] Add python type hints URL: https://github.com/apache/beam/pull/9056#issuecomment-550050927 Hello reviewers, Thanks for the feedback so far! There are a few questions that I've responded to that are now in your court. I'd love to get these addressed soon so that I can minimize the merge conflicts that are piling up. Once I have those answers your feedback will be integrated into the smaller "part 1" PR here: https://github.com/apache/beam/pull/9915 thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339029) Time Spent: 17h 20m (was: 17h 10m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 17h 20m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out
[ https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=339011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339011 ] ASF GitHub Bot logged work on BEAM-8294: Author: ASF GitHub Bot Created on: 05/Nov/19 21:33 Start Date: 05/Nov/19 21:33 Worklog Time Spent: 10m Work Description: ibzib commented on issue #: [BEAM-8294] run Spark portable validates runner tests in parallel URL: https://github.com/apache/beam/pull/#issuecomment-550030660 The tests are failing, which as improvement over timing out and being aborted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339011) Time Spent: 1h 20m (was: 1h 10m) > Spark portable validates runner tests timing out > > > Key: BEAM-8294 > URL: https://issues.apache.org/jira/browse/BEAM-8294 > Project: Beam > Issue Type: Improvement > Components: runner-spark, test-failures, testing >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing, portability-spark > Time Spent: 1h 20m > Remaining Estimate: 0h > > This postcommit has been timing out for 11 days. > [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. > Coincidence? I think NOT! .. .Seriously, though, I wonder what about the SDK > worker management stack caused this to slow down. > [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8452) TriggerLoadJobs.process in bigquery_file_loads schema is type str
[ https://issues.apache.org/jira/browse/BEAM-8452?focusedWorklogId=339006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339006 ] ASF GitHub Bot logged work on BEAM-8452: Author: ASF GitHub Bot Created on: 05/Nov/19 21:27 Start Date: 05/Nov/19 21:27 Worklog Time Spent: 10m Work Description: noah-goodrich commented on pull request #1: BEAM-8452 - TriggerLoadJobs.process in bigquery_file_loads schema is type str URL: https://github.com/apache/beam/pull/1 When trying to run a BigQueryFileLoads job, I receive the following error: ``` ValidationError: Expected type for field schema, found {"fields": [{"name": "id", "type": "INTEGER", "mode": "required"}, {"name": "description", "type" : "STRING", "mode": "nullable"}]} (type ) ``` R: @chamikaramj Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ X] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [X ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ X] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https
[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time
[ https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=338988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338988 ] ASF GitHub Bot logged work on BEAM-3288: Author: ASF GitHub Bot Created on: 05/Nov/19 20:37 Start Date: 05/Nov/19 20:37 Worklog Time Spent: 10m Work Description: je-ik commented on pull request #9960: [BEAM-3288] Guard against unsafe triggers at construction time URL: https://github.com/apache/beam/pull/9960#discussion_r342784933 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByKey.java ## @@ -162,6 +164,45 @@ public static void applicableTo(PCollection input) { throw new IllegalStateException( "GroupByKey must have a valid Window merge function. " + "Invalid because: " + cause); } + +// Validate that the trigger does not finish before garbage collection time +if (!triggerIsSafe(windowingStrategy)) { Review comment: Correction - stateful ParDo doesn't use triggers. But Combine might still be affected by this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338988) Time Spent: 2h 10m (was: 2h) > Guard against unsafe triggers at construction time > --- > > Key: BEAM-3288 > URL: https://issues.apache.org/jira/browse/BEAM-3288 > Project: Beam > Issue Type: Task > Components: sdk-java-core, sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Current Beam trigger semantics are rather confusing and in some cases > extremely unsafe, especially if the pipeline includes multiple chained GBKs. > One example of that is https://issues.apache.org/jira/browse/BEAM-3169 . > There's multiple issues: > The API allows users to specify terminating top-level triggers (e.g. "trigger > a pane after receiving 1 elements in the window, and that's it"), but > experience from user support shows that this is nearly always a mistake and > the user did not intend to drop all further data. > In general, triggers are the only place in Beam where data is being dropped > without making a lot of very loud noise about it - a practice for which the > PTransform style guide uses the language: "never, ever, ever do this". > Continuation triggers are still worse. For context: continuation trigger is > the trigger that's set on the output of a GBK and controls further > aggregation of the results of this aggregation by downstream GBKs. The output > shouldn't just use the same trigger as the input, because e.g. if the input > trigger said "wait for an hour before emitting a pane", that doesn't mean > that we should wait for another hour before emitting a result of aggregating > the result of the input trigger. Continuation triggers try to simulate the > behavior "as if a pane of the input propagated through the entire pipeline", > but the implementation of individual continuation triggers doesn't do that. > E.g. the continuation of "first N elements in pane" trigger is "first 1 > element in pane", and if the results of a first GBK are further grouped by a > second GBK onto more coarse key (e.g. if everything is grouped onto the same > key), that effectively means that, of the keys of the first GBK, only one > survives and all others are dropped (what happened in the data loss bug). > The ultimate fix to all of these things is > https://s.apache.org/beam-sink-triggers . However, it is a huge model change, > and meanwhile we have to do something. The options are, in order of > increasing backward incompatibility (but incompatibility in a "rejecting > something that previously was accepted but extremely dangerous" kind of way): > - Make the continuation trigger of most triggers be the "always-fire" > trigger. Seems that this should be the case for all triggers except the > watermark trigger. This will definitely increase safety, but lead to more > eager firing of downstream aggregations. It also will violate a user's > expectation that a fire-once trigger fires everything downstream only once, > but that expectation appears impossible to satisfy safely. > - Make the continuation trigger of some triggers be the "invalid" trigger, > i.e. require the user to set it explicitly: there's in general no good and > safe way to infer what a trigger on a second GBK "truly" should be, based on >
[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken
[ https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=338987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338987 ] ASF GitHub Bot logged work on BEAM-8504: Author: ASF GitHub Bot Created on: 05/Nov/19 20:35 Start Date: 05/Nov/19 20:35 Worklog Time Spent: 10m Work Description: kanterov commented on issue #9987: [BEAM-8504] Fix a bug related to zero-row responses URL: https://github.com/apache/beam/pull/9987#issuecomment-550008537 Thanks. It looks good! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338987) Time Spent: 1h 40m (was: 1.5h) > BigQueryIO DIRECT_READ is broken > > > Key: BEAM-8504 > URL: https://issues.apache.org/jira/browse/BEAM-8504 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.16.0, 2.17.0 >Reporter: Gleb Kanterov >Assignee: Aryan Naraghi >Priority: Major > Fix For: 2.17.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT > (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with > 2.15.0. > {code} > java.io.IOException: Failed to start reading from source: name: > "projects//locations/eu/streams/" > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Fraction consumed from > previous response (0.0) is not less than fraction consumed from current > response (0.0). > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206) > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601) > ... 14 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8504) BigQueryIO DIRECT_READ is broken
[ https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967844#comment-16967844 ] Gleb Kanterov commented on BEAM-8504: - [~aryann] thanks! [~Ardagan] the change looks very self-contained, given that it is a regression, is there any chance we can have it as a part of 2.17.0? > BigQueryIO DIRECT_READ is broken > > > Key: BEAM-8504 > URL: https://issues.apache.org/jira/browse/BEAM-8504 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.16.0, 2.17.0 >Reporter: Gleb Kanterov >Assignee: Aryan Naraghi >Priority: Major > Fix For: 2.17.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT > (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with > 2.15.0. > {code} > java.io.IOException: Failed to start reading from source: name: > "projects//locations/eu/streams/" > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Fraction consumed from > previous response (0.0) is not less than fraction consumed from current > response (0.0). > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206) > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601) > ... 14 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out
[ https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=338984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338984 ] ASF GitHub Bot logged work on BEAM-8294: Author: ASF GitHub Bot Created on: 05/Nov/19 20:26 Start Date: 05/Nov/19 20:26 Worklog Time Spent: 10m Work Description: ibzib commented on issue #: [BEAM-8294] run Spark portable validates runner tests in parallel URL: https://github.com/apache/beam/pull/#issuecomment-550005046 Run Java Spark PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338984) Time Spent: 1h 10m (was: 1h) > Spark portable validates runner tests timing out > > > Key: BEAM-8294 > URL: https://issues.apache.org/jira/browse/BEAM-8294 > Project: Beam > Issue Type: Improvement > Components: runner-spark, test-failures, testing >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing, portability-spark > Time Spent: 1h 10m > Remaining Estimate: 0h > > This postcommit has been timing out for 11 days. > [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. > Coincidence? I think NOT! .. .Seriously, though, I wonder what about the SDK > worker management stack caused this to slow down. > [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out
[ https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=338983&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338983 ] ASF GitHub Bot logged work on BEAM-8294: Author: ASF GitHub Bot Created on: 05/Nov/19 20:22 Start Date: 05/Nov/19 20:22 Worklog Time Spent: 10m Work Description: ibzib commented on issue #: [BEAM-8294] run Spark portable validates runner tests in parallel URL: https://github.com/apache/beam/pull/#issuecomment-550003251 Run Java Spark PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338983) Time Spent: 1h (was: 50m) > Spark portable validates runner tests timing out > > > Key: BEAM-8294 > URL: https://issues.apache.org/jira/browse/BEAM-8294 > Project: Beam > Issue Type: Improvement > Components: runner-spark, test-failures, testing >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing, portability-spark > Time Spent: 1h > Remaining Estimate: 0h > > This postcommit has been timing out for 11 days. > [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. > Coincidence? I think NOT! .. .Seriously, though, I wonder what about the SDK > worker management stack caused this to slow down. > [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out
[ https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=338982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338982 ] ASF GitHub Bot logged work on BEAM-8294: Author: ASF GitHub Bot Created on: 05/Nov/19 20:21 Start Date: 05/Nov/19 20:21 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #: [BEAM-8294] run Spark portable validates runner tests in parallel URL: https://github.com/apache/beam/pull/ Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=338978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338978 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 05/Nov/19 20:08 Start Date: 05/Nov/19 20:08 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998 This is just a very basic smoke test to ensure that FlinkRunner works as intended with the default settings. This does _not_ test behavior with an actual Flink cluster. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.or
[jira] [Work logged] (BEAM-8545) don't docker pull before docker run
[ https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=338958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338958 ] ASF GitHub Bot logged work on BEAM-8545: Author: ASF GitHub Bot Created on: 05/Nov/19 19:36 Start Date: 05/Nov/19 19:36 Worklog Time Spent: 10m Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker pull before docker run URL: https://github.com/apache/beam/pull/9972#issuecomment-549623033 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338958) Time Spent: 1h 40m (was: 1.5h) > don't docker pull before docker run > > > Key: BEAM-8545 > URL: https://issues.apache.org/jira/browse/BEAM-8545 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Since 'docker run' automatically pulls when the image doesn't exist locally, > I think it's safe to remove explicit 'docker pull' before 'docker run'. > Without 'docker pull', we won't update the local image with the remote image > (for the same tag) but it shouldn't be a problem in prod that the unique tag > is assumed for each released version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8545) don't docker pull before docker run
[ https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=338957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338957 ] ASF GitHub Bot logged work on BEAM-8545: Author: ASF GitHub Bot Created on: 05/Nov/19 19:36 Start Date: 05/Nov/19 19:36 Worklog Time Spent: 10m Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker pull before docker run URL: https://github.com/apache/beam/pull/9972#issuecomment-549984848 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338957) Time Spent: 1.5h (was: 1h 20m) > don't docker pull before docker run > > > Key: BEAM-8545 > URL: https://issues.apache.org/jira/browse/BEAM-8545 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Since 'docker run' automatically pulls when the image doesn't exist locally, > I think it's safe to remove explicit 'docker pull' before 'docker run'. > Without 'docker pull', we won't update the local image with the remote image > (for the same tag) but it shouldn't be a problem in prod that the unique tag > is assumed for each released version. -- This message was sent by Atlassian Jira (v8.3.4#803005)