[jira] [Commented] (BEAM-9759) Pipeline creation with large number of shards/streams takes long time
[ https://issues.apache.org/jira/browse/BEAM-9759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095092#comment-17095092 ] Reuven Lax commented on BEAM-9759: -- The best solution would be to create a new SplittableDoFn version of the Kinesis runner. This would have several advantages: # It could support dynamic changes (at run time) of the list of Kinesis topics. I believe this is a major reason that you currently need to update the pipeline so often, and this would remove that need. 2. The splitting could then happen at run time instead of graph-construction time, and could also be parallelized. > Pipeline creation with large number of shards/streams takes long time > - > > Key: BEAM-9759 > URL: https://issues.apache.org/jira/browse/BEAM-9759 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis, runner-dataflow >Affects Versions: 2.19.0 >Reporter: Sebastian Graca >Priority: Major > > We are processing multiple Kinesis streams using pipelines running on > {{DataflowRunner}}. The time needed to start such pipeline from a pipeline > definition (execution of {{org.apache.beam.sdk.Pipeline.run()}} method) takes > considerable amount of time. In our case: > * a pipeline that consumes data from 196 streams (237 shards in total) > starts in 7 minutes > * a pipeline that consumes data from 111 streams (261 shards in total) > starts in 4 minutes > I've been investigating this and found out that when {{Pipeline.run}} is > invoked, the whole pipeline graph is traversed and serialized so it can be > passed to the Dataflow backend. Here's part of the stacktrace that shows this > traversal: > {code:java} > at > com.amazonaws.services.kinesis.AmazonKinesisClient.getRecords(AmazonKinesisClient.java:1252) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.lambda$getRecords$2(SimplifiedKinesisClient.java:137) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:210) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getRecords(SimplifiedKinesisClient.java:134) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getRecords(SimplifiedKinesisClient.java:119) > at > org.apache.beam.sdk.io.kinesis.StartingPointShardsFinder.validateShards(StartingPointShardsFinder.java:195) > at > org.apache.beam.sdk.io.kinesis.StartingPointShardsFinder.findShardsAtStartingPoint(StartingPointShardsFinder.java:115) > at > org.apache.beam.sdk.io.kinesis.DynamicCheckpointGenerator.generate(DynamicCheckpointGenerator.java:59) > at org.apache.beam.sdk.io.kinesis.KinesisSource.split(KinesisSource.java:88) > at > org.apache.beam.runners.dataflow.internal.CustomSources.serializeToCloudSource(CustomSources.java:87) > at > org.apache.beam.runners.dataflow.ReadTranslator.translateReadHelper(ReadTranslator.java:51) > at > org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1630) > at > org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1627) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:494) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317) > at > org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251) > at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:433) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:192) > at > org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:795) > at > org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:186) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301) > {code} > As you can see, during serialization, > {{org.apache.beam.sdk.io.kinesis.KinesisSource.split}} method is called. This > method finds all shards for the stream and also validates each shard by > reading from it. As this process is sequential it takes considerable time > that is dependent both on the number of streams
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428516 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 29/Apr/20 05:07 Start Date: 29/Apr/20 05:07 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11548: URL: https://github.com/apache/beam/pull/11548#issuecomment-620990823 @tvalentyn, I changed it to pull licenses only with Jenkins test. License pulling is skipped by default. Can you please take a look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428516) Time Spent: 26h (was: 25h 50m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 26h > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428505 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 29/Apr/20 04:03 Start Date: 29/Apr/20 04:03 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11548: URL: https://github.com/apache/beam/pull/11548#issuecomment-620977492 Run PythonDocker PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428505) Time Spent: 25h 50m (was: 25h 40m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 25h 50m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428504=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428504 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 29/Apr/20 04:02 Start Date: 29/Apr/20 04:02 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11548: URL: https://github.com/apache/beam/pull/11548#issuecomment-620977305 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428504) Time Spent: 25h 40m (was: 25.5h) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 25h 40m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9418) Support ANY_VALUE aggregation functions
[ https://issues.apache.org/jira/browse/BEAM-9418?focusedWorklogId=428503=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428503 ] ASF GitHub Bot logged work on BEAM-9418: Author: ASF GitHub Bot Created on: 29/Apr/20 03:56 Start Date: 29/Apr/20 03:56 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11333: URL: https://github.com/apache/beam/pull/11333#issuecomment-620976106 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428503) Time Spent: 1h 40m (was: 1.5h) > Support ANY_VALUE aggregation functions > --- > > Key: BEAM-9418 > URL: https://issues.apache.org/jira/browse/BEAM-9418 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Assignee: John Mora >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Support the following functionality in BeamSQL: > {code:java} > "select t.key, ANY_VALUE(t.column) from t group by t.key"; > {code} > Spec link: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9418) Support ANY_VALUE aggregation functions
[ https://issues.apache.org/jira/browse/BEAM-9418?focusedWorklogId=428502=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428502 ] ASF GitHub Bot logged work on BEAM-9418: Author: ASF GitHub Bot Created on: 29/Apr/20 03:53 Start Date: 29/Apr/20 03:53 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11333: URL: https://github.com/apache/beam/pull/11333#issuecomment-620975620 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428502) Time Spent: 1.5h (was: 1h 20m) > Support ANY_VALUE aggregation functions > --- > > Key: BEAM-9418 > URL: https://issues.apache.org/jira/browse/BEAM-9418 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Assignee: John Mora >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Support the following functionality in BeamSQL: > {code:java} > "select t.key, ANY_VALUE(t.column) from t group by t.key"; > {code} > Spec link: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428501=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428501 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 29/Apr/20 03:48 Start Date: 29/Apr/20 03:48 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11548: URL: https://github.com/apache/beam/pull/11548#issuecomment-620974622 Run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428501) Time Spent: 25.5h (was: 25h 20m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 25.5h > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428492 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 29/Apr/20 03:25 Start Date: 29/Apr/20 03:25 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11552: URL: https://github.com/apache/beam/pull/11552#issuecomment-620970095 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428492) Time Spent: 25h 20m (was: 25h 10m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 25h 20m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7885) DoFn.setup() don't run for streaming jobs on DirectRunner.
[ https://issues.apache.org/jira/browse/BEAM-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada resolved BEAM-7885. - Fix Version/s: 2.22.0 Resolution: Fixed > DoFn.setup() don't run for streaming jobs on DirectRunner. > --- > > Key: BEAM-7885 > URL: https://issues.apache.org/jira/browse/BEAM-7885 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.14.0 > Environment: Python >Reporter: niklas Hansson >Assignee: Pablo Estrada >Priority: Minor > Fix For: 2.22.0 > > > From version 2.14.0 Python have introduced setup and teardown for DoFn in > order to "Called to prepare an instance for processing bundles of > elements.This is a good place to initialize transient in-memory resources, > such as network connections." > However when trying to use it for a unbounded job (pubsub source) it seams > like the DoFn.setup() is never called and the resources are never initialize. > [UPDATE] it is working for Dataflow runner but not for DirectRunner. For the > Dataflow runner the DoFn.Setup seams to be called multiple times but then > never again when the pipeline is processing elements [UPDATE] . For the > direct runner I get: > > AttributeError: 'NoneType' object has no attribute 'predict' [while running > 'transform the data'] > """ > My source code: [https://github.com/NikeNano/DataflowSklearnStreaming] > > I am happy to contribute with example code for how to use setup as soon as I > get it running :) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-7885) DoFn.setup() don't run for streaming jobs on DirectRunner.
[ https://issues.apache.org/jira/browse/BEAM-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada reassigned BEAM-7885: --- Assignee: Pablo Estrada > DoFn.setup() don't run for streaming jobs on DirectRunner. > --- > > Key: BEAM-7885 > URL: https://issues.apache.org/jira/browse/BEAM-7885 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.14.0 > Environment: Python >Reporter: niklas Hansson >Assignee: Pablo Estrada >Priority: Minor > > From version 2.14.0 Python have introduced setup and teardown for DoFn in > order to "Called to prepare an instance for processing bundles of > elements.This is a good place to initialize transient in-memory resources, > such as network connections." > However when trying to use it for a unbounded job (pubsub source) it seams > like the DoFn.setup() is never called and the resources are never initialize. > [UPDATE] it is working for Dataflow runner but not for DirectRunner. For the > Dataflow runner the DoFn.Setup seams to be called multiple times but then > never again when the pipeline is processing elements [UPDATE] . For the > direct runner I get: > > AttributeError: 'NoneType' object has no attribute 'predict' [while running > 'transform the data'] > """ > My source code: [https://github.com/NikeNano/DataflowSklearnStreaming] > > I am happy to contribute with example code for how to use setup as soon as I > get it running :) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7885) DoFn.setup() don't run for streaming jobs on DirectRunner.
[ https://issues.apache.org/jira/browse/BEAM-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095009#comment-17095009 ] Pablo Estrada commented on BEAM-7885: - This is fixed in PR 11547 > DoFn.setup() don't run for streaming jobs on DirectRunner. > --- > > Key: BEAM-7885 > URL: https://issues.apache.org/jira/browse/BEAM-7885 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.14.0 > Environment: Python >Reporter: niklas Hansson >Priority: Minor > > From version 2.14.0 Python have introduced setup and teardown for DoFn in > order to "Called to prepare an instance for processing bundles of > elements.This is a good place to initialize transient in-memory resources, > such as network connections." > However when trying to use it for a unbounded job (pubsub source) it seams > like the DoFn.setup() is never called and the resources are never initialize. > [UPDATE] it is working for Dataflow runner but not for DirectRunner. For the > Dataflow runner the DoFn.Setup seams to be called multiple times but then > never again when the pipeline is processing elements [UPDATE] . For the > direct runner I get: > > AttributeError: 'NoneType' object has no attribute 'predict' [while running > 'transform the data'] > """ > My source code: [https://github.com/NikeNano/DataflowSklearnStreaming] > > I am happy to contribute with example code for how to use setup as soon as I > get it running :) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.
[ https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094957#comment-17094957 ] Pablo Estrada commented on BEAM-9745: - I am trying to figure out whether this is a regression or not. I'll post an update by tomorrow morning. So far the tests aren't passing on 2.20.0, but they throw a different error, so I guess it is hard to just dismiss as a regression: {code:java} java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -177: java.lang.UnsupportedOperationException: BigQuery source must be split before being read at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173) at org.apache.beam.fn.harness.BoundedSourceRunner.runReadLoop(BoundedSourceRunner.java:159) at org.apache.beam.fn.harness.BoundedSourceRunner.start(BoundedSourceRunner.java:146) at org.apache.beam.fn.harness.data.PTransformFunctionRegistry.lambda$register$0(PTransformFunctionRegistry.java:108) at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:282) at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160) at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57) at org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:332) at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85) at org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125) at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:412) at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:381) at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:306) at org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:230) at org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:138) {code} > [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to > deserialize Custom DoFns and Custom Coders. > - > > Key: BEAM-9745 > URL: https://issues.apache.org/jira/browse/BEAM-9745 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, java-fn-execution, sdk-java-harness, > test-failures >Reporter: Daniel Oliveira >Assignee: Pablo Estrada >Priority: Blocker > Labels: currently-failing > Fix For: 2.21.0 > > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4657/] > * [Gradle Build > Scan|https://scans.gradle.com/s/c3izncsa4u24k/tests/by-project] > Initial investigation: > The bug appears to be popping up on BigQuery tests mostly, but also a > BigTable and a Datastore test. > Here's an example stacktrace of the two errors, showing _only_ the error > messages themselves. Source: > [https://scans.gradle.com/s/c3izncsa4u24k/tests/efn4wciuamvqq-ccxt3jvofvqbe] > {noformat} > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction -191: > java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With > Execution Info > ... > Caused by: java.lang.ClassNotFoundException: > org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3 > ... > Caused by: java.lang.RuntimeException: Error received from SDK harness for > instruction -191: java.lang.IllegalArgumentException: unable to deserialize > Custom DoFn With Execution Info > ... > Caused by: java.lang.ClassNotFoundException: > org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3 > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error > received from SDK harness for instruction -206: >
[jira] [Work logged] (BEAM-9802) Provide a way to customize automatically started services.
[ https://issues.apache.org/jira/browse/BEAM-9802?focusedWorklogId=428453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428453 ] ASF GitHub Bot logged work on BEAM-9802: Author: ASF GitHub Bot Created on: 29/Apr/20 00:51 Start Date: 29/Apr/20 00:51 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11495: URL: https://github.com/apache/beam/pull/11495#issuecomment-620929683 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428453) Time Spent: 1h 20m (was: 1h 10m) > Provide a way to customize automatically started services. > -- > > Key: BEAM-9802 > URL: https://issues.apache.org/jira/browse/BEAM-9802 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > This can be useful for testing and alternative production environments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python
[ https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=428451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428451 ] ASF GitHub Bot logged work on BEAM-8949: Author: ASF GitHub Bot Created on: 29/Apr/20 00:33 Start Date: 29/Apr/20 00:33 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11210: URL: https://github.com/apache/beam/pull/11210#issuecomment-620924876 @chamikaramj - could you review this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428451) Time Spent: 7h 50m (was: 7h 40m) > Add Spanner IO Integration Test for Python > -- > > Key: BEAM-8949 > URL: https://issues.apache.org/jira/browse/BEAM-8949 > Project: Beam > Issue Type: Test > Components: io-py-gcp >Reporter: Shoaib Zafar >Assignee: Shoaib Zafar >Priority: Major > Time Spent: 7h 50m > Remaining Estimate: 0h > > Spanner IO (Python SDK) contains PTransform which uses the BatchAPI to read > from the spanner. Currently, it only contains direct runner unit tests. In > order to make this functionality available for the users, integration tests > also need to be added. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work stopped] (BEAM-9117) Clustering coder does not get used for BQ multi-partitioned tables
[ https://issues.apache.org/jira/browse/BEAM-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9117 stopped by Chamikara Madhusanka Jayalath. --- > Clustering coder does not get used for BQ multi-partitioned tables > -- > > Key: BEAM-9117 > URL: https://issues.apache.org/jira/browse/BEAM-9117 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-9117) Clustering coder does not get used for BQ multi-partitioned tables
[ https://issues.apache.org/jira/browse/BEAM-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9117 started by Chamikara Madhusanka Jayalath. --- > Clustering coder does not get used for BQ multi-partitioned tables > -- > > Key: BEAM-9117 > URL: https://issues.apache.org/jira/browse/BEAM-9117 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9847) Verify If Triggering allows emitting eager results when processing a single element in HL7v2IO.
[ https://issues.apache.org/jira/browse/BEAM-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094947#comment-17094947 ] Jacob Ferriero commented on BEAM-9847: -- This test seems to confirm it is not possible to emit outputs of processElement call before it has completed. [https://gist.github.com/jaketf/d3c2e70dde781bbb0ef1993446e34b71] > Verify If Triggering allows emitting eager results when processing a single > element in HL7v2IO. > --- > > Key: BEAM-9847 > URL: https://issues.apache.org/jira/browse/BEAM-9847 > Project: Beam > Issue Type: Task > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > > Due to the nature of the HL7v2 API, HL7v2IO.ListHL7v2Messages follows the > pattern of [paginating through all ListMessages results in a single > ProcessElement call.|#diff-9cd595f078378218ccc01ce7e19ca766R447]] > > Upon testing with customer against HL7v2 store with 350k messages we > observed that the ListMessages transform was not outputting any elements > "hanging" for a long time (which was assumed to be the single thread > paginating through all the results). > > We added the following triggering in hopes that it would emit early results: > {code:java} > .apply( > Window.into(new GlobalWindows()) > .triggering( > AfterWatermark.pastEndOfWindow() > .withEarlyFirings( > AfterProcessingTime.pastFirstElementInPane() > .plusDelayOf(Duration.standardSeconds(1 > .discardingFiredPanes()) > {code} > Our tests with this triggering seemed to indicate that it did not "hang" like > the first test and seemed to output a more steady stream of elements. > Reviewer states that bundles must be committed atomically so no output > elements of a (single process element call) can proceed to downstream stages > until all output elements for that process element call are ready. > There may be other things at play here. Will seek to reproduce in a way that > definitively confirms output elements can be eagerly output during the > execution of a single process element call before it completes. > CC: [~pabloem] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements
[ https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428446 ] ASF GitHub Bot logged work on BEAM-9831: Author: ASF GitHub Bot Created on: 29/Apr/20 00:18 Start Date: 29/Apr/20 00:18 Worklog Time Spent: 10m Work Description: jaketf commented on a change in pull request #11538: URL: https://github.com/apache/beam/pull/11538#discussion_r417000696 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java ## @@ -437,6 +444,20 @@ private Message fetchMessage(HealthcareApiClient client, String msgId) .apply(Create.of(this.hl7v2Stores)) .apply(ParDo.of(new ListHL7v2MessagesFn(this.filter))) .setCoder(new HL7v2MessageCoder()) + // Listing takes a long time for each input element (HL7v2 store) because it has to + // paginate through results in a single thread / ProcessElement call in order to keep + // track of page token. + // Eagerly emit data on 1 second intervals so downstream processing can get started before + // all of the list results have been paginated through. Review comment: @pabloem does this mean that all of a single element's output must be buffered in memory? or will runner be smart enough to spill to disk? Based on my initial investigation I was not able to reproduce the behavior reported by customer in a unit test. summarized in this [gist](https://gist.github.com/jaketf/d3c2e70dde781bbb0ef1993446e34b71) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428446) Time Spent: 1h 20m (was: 1h 10m) > HL7v2IO Improvements > > > Key: BEAM-9831 > URL: https://issues.apache.org/jira/browse/BEAM-9831 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > # HL7v2MessageCoder constructor should be public for use by end users > # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list > messages results before emitting any output data elements (due to high fan > out from a single input element). We should add early firings so that > downstream processing can proceed on early pages while later pages are still > being scrolled through. > # We should drop all output only fields of HL7v2Message and only keep data > and labels when calling ingestMessages, rather than expecting the user to do > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9849) Caching license files for license pulling
Hannah Jiang created BEAM-9849: -- Summary: Caching license files for license pulling Key: BEAM-9849 URL: https://issues.apache.org/jira/browse/BEAM-9849 Project: Beam Issue Type: Task Components: build-system Reporter: Hannah Jiang Licenses are pulled every time a docker image is created. We need to come up with a caching approach to cache the files so that the same file is pulled only once ever. This caching appraoch should be useable by all images release by Beam, including SDK docker images, Flink & Spark job server images etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9802) Provide a way to customize automatically started services.
[ https://issues.apache.org/jira/browse/BEAM-9802?focusedWorklogId=428439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428439 ] ASF GitHub Bot logged work on BEAM-9802: Author: ASF GitHub Bot Created on: 28/Apr/20 23:55 Start Date: 28/Apr/20 23:55 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11495: URL: https://github.com/apache/beam/pull/11495#issuecomment-620913946 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428439) Time Spent: 1h 10m (was: 1h) > Provide a way to customize automatically started services. > -- > > Key: BEAM-9802 > URL: https://issues.apache.org/jira/browse/BEAM-9802 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This can be useful for testing and alternative production environments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-58) Support Google Cloud Storage encryption keys
[ https://issues.apache.org/jira/browse/BEAM-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094931#comment-17094931 ] Udi Meiri commented on BEAM-58: --- I have not worked on this, only customer *managed* keys, not *supplied* ones. > Support Google Cloud Storage encryption keys > > > Key: BEAM-58 > URL: https://issues.apache.org/jira/browse/BEAM-58 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Dan Halperin >Assignee: Udi Meiri >Priority: Minor > > Customer-supplied encryption keys are now in Beta. > https://cloud.google.com/compute/docs/disks/customer-supplied-encryption -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.
[ https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=428436=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428436 ] ASF GitHub Bot logged work on BEAM-9577: Author: ASF GitHub Bot Created on: 28/Apr/20 23:51 Start Date: 28/Apr/20 23:51 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11521: URL: https://github.com/apache/beam/pull/11521#issuecomment-620913082 R: @ihji This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428436) Time Spent: 20h (was: 19h 50m) > Update artifact staging and retrieval protocols to be dependency aware. > --- > > Key: BEAM-9577 > URL: https://issues.apache.org/jira/browse/BEAM-9577 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 20h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9802) Provide a way to customize automatically started services.
[ https://issues.apache.org/jira/browse/BEAM-9802?focusedWorklogId=428435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428435 ] ASF GitHub Bot logged work on BEAM-9802: Author: ASF GitHub Bot Created on: 28/Apr/20 23:49 Start Date: 28/Apr/20 23:49 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11495: URL: https://github.com/apache/beam/pull/11495#issuecomment-620912558 Thanks. Rebased. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428435) Time Spent: 1h (was: 50m) > Provide a way to customize automatically started services. > -- > > Key: BEAM-9802 > URL: https://issues.apache.org/jira/browse/BEAM-9802 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > This can be useful for testing and alternative production environments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=428433=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428433 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 28/Apr/20 23:32 Start Date: 28/Apr/20 23:32 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11452: URL: https://github.com/apache/beam/pull/11452#issuecomment-620907966 R: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428433) Time Spent: 1h 50m (was: 1h 40m) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python
[ https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=428432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428432 ] ASF GitHub Bot logged work on BEAM-8949: Author: ASF GitHub Bot Created on: 28/Apr/20 23:30 Start Date: 28/Apr/20 23:30 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11210: URL: https://github.com/apache/beam/pull/11210#issuecomment-620907421 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428432) Time Spent: 7h 40m (was: 7.5h) > Add Spanner IO Integration Test for Python > -- > > Key: BEAM-8949 > URL: https://issues.apache.org/jira/browse/BEAM-8949 > Project: Beam > Issue Type: Test > Components: io-py-gcp >Reporter: Shoaib Zafar >Assignee: Shoaib Zafar >Priority: Major > Time Spent: 7h 40m > Remaining Estimate: 0h > > Spanner IO (Python SDK) contains PTransform which uses the BatchAPI to read > from the spanner. Currently, it only contains direct runner unit tests. In > order to make this functionality available for the users, integration tests > also need to be added. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-9449) Consider passing pipeline options for expansion service.
[ https://issues.apache.org/jira/browse/BEAM-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9449 started by Brian Hulette. --- > Consider passing pipeline options for expansion service. > > > Key: BEAM-9449 > URL: https://issues.apache.org/jira/browse/BEAM-9449 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Brian Hulette >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9449) Consider passing pipeline options for expansion service.
[ https://issues.apache.org/jira/browse/BEAM-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-9449: Status: Open (was: Triage Needed) > Consider passing pipeline options for expansion service. > > > Key: BEAM-9449 > URL: https://issues.apache.org/jira/browse/BEAM-9449 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Brian Hulette >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9449) Consider passing pipeline options for expansion service.
[ https://issues.apache.org/jira/browse/BEAM-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette reassigned BEAM-9449: --- Assignee: Brian Hulette > Consider passing pipeline options for expansion service. > > > Key: BEAM-9449 > URL: https://issues.apache.org/jira/browse/BEAM-9449 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Robert Bradshaw >Assignee: Brian Hulette >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-9848) Pass caller pipeline options to expansion service
[ https://issues.apache.org/jira/browse/BEAM-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette closed BEAM-9848. --- Fix Version/s: 2.22.0 Resolution: Duplicate > Pass caller pipeline options to expansion service > - > > Key: BEAM-9848 > URL: https://issues.apache.org/jira/browse/BEAM-9848 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.22.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9841) PortableRunner does not support wait_until_finish(duration=...)
[ https://issues.apache.org/jira/browse/BEAM-9841?focusedWorklogId=428429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428429 ] ASF GitHub Bot logged work on BEAM-9841: Author: ASF GitHub Bot Created on: 28/Apr/20 23:15 Start Date: 28/Apr/20 23:15 Worklog Time Spent: 10m Work Description: ibzib commented on a change in pull request #11556: URL: https://github.com/apache/beam/pull/11556#discussion_r416974908 ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -433,13 +435,12 @@ def run_pipeline(self, pipeline, options): state_stream, cleanup_callbacks) if cleanup_callbacks: - # We wait here to ensure that we run the cleanup callbacks. + # Register an exit handler to ensure cleanup on exit. + atexit.register(functools.partial(result._cleanup, on_exit=True)) _LOGGER.info( - 'Waiting until the pipeline has finished because the ' - 'environment "%s" has started a component necessary for the ' - 'execution.', + 'Environment "%s" has started a component necessary for the ' + 'execution. Be sure to call wait_until_finish()', Review comment: We prefer the use of Python's `with` syntax instead of calling `wait_until_finish` explicitly. ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -535,7 +537,7 @@ def _last_error_message(self): else: return 'unknown error' - def wait_until_finish(self): + def wait_until_finish(self, duration=None): Review comment: A comment explaining the units for duration (milliseconds?) and that `duration=None` actually means "wait forever" would be helpful. (I find this naming somewhat counter-intuitive, but it's too late to change now.) ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -557,23 +559,47 @@ def read_messages(): t.daemon = True t.start() +if duration: + t2 = threading.Thread( Review comment: Please give `t` and `t2` descriptive variable names. ## File path: sdks/python/apache_beam/runners/portability/portable_runner.py ## @@ -557,23 +559,47 @@ def read_messages(): t.daemon = True t.start() +if duration: + t2 = threading.Thread( + target=functools.partial(self._observe, t), + name='wait_until_finish_state_observer') + t2.daemon = True + t2.start() + start_time = time.time() + duration_secs = duration / 1000 + while time.time() - start_time < duration_secs and t2.is_alive(): +time.sleep(1) +else: + self._observe(t) + +if self._runtime_exception: + raise self._runtime_exception + +return self._state + + def _observe(self, message_thread): try: for state_response in self._state_stream: self._state = self._runner_api_state_to_pipeline_state( state_response.state) if state_response.state in TERMINAL_STATES: # Wait for any last messages. - t.join(10) + message_thread.join(10) break if self._state != runner.PipelineState.DONE: -raise RuntimeError( +self._runtime_exception = RuntimeError( 'Pipeline %s failed in state %s: %s' % (self._job_id, self._state, self._last_error_message())) - return self._state +except Exception as e: + self._runtime_exception = e finally: self._cleanup() - def _cleanup(self): + def _cleanup(self, on_exit=False): +if on_exit and self._cleanup_callbacks: + _LOGGER.info( + 'Running cleanup on exit. If your local pipeline should continue ' + 'running, be sure to call pipeline.run().wait_until_finish().') Review comment: `with` (see above comment) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428429) Time Spent: 20m (was: 10m) > PortableRunner does not support wait_until_finish(duration=...) > --- > > Key: BEAM-9841 > URL: https://issues.apache.org/jira/browse/BEAM-9841 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Other runners in the Python SDK support waiting for a finite amount of time
[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python
[ https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=428430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428430 ] ASF GitHub Bot logged work on BEAM-8949: Author: ASF GitHub Bot Created on: 28/Apr/20 23:15 Start Date: 28/Apr/20 23:15 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11210: URL: https://github.com/apache/beam/pull/11210#issuecomment-620903075 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428430) Time Spent: 7.5h (was: 7h 20m) > Add Spanner IO Integration Test for Python > -- > > Key: BEAM-8949 > URL: https://issues.apache.org/jira/browse/BEAM-8949 > Project: Beam > Issue Type: Test > Components: io-py-gcp >Reporter: Shoaib Zafar >Assignee: Shoaib Zafar >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h > > Spanner IO (Python SDK) contains PTransform which uses the BatchAPI to read > from the spanner. Currently, it only contains direct runner unit tests. In > order to make this functionality available for the users, integration tests > also need to be added. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9848) Pass caller pipeline options to expansion service
Brian Hulette created BEAM-9848: --- Summary: Pass caller pipeline options to expansion service Key: BEAM-9848 URL: https://issues.apache.org/jira/browse/BEAM-9848 Project: Beam Issue Type: Improvement Components: beam-model Reporter: Brian Hulette Assignee: Brian Hulette -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO
[ https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=428428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428428 ] ASF GitHub Bot logged work on BEAM-9008: Author: ASF GitHub Bot Created on: 28/Apr/20 23:05 Start Date: 28/Apr/20 23:05 Worklog Time Spent: 10m Work Description: iemejia commented on a change in pull request #10546: URL: https://github.com/apache/beam/pull/10546#discussion_r416976043 ## File path: sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java ## @@ -391,36 +393,57 @@ static Cluster getCluster( .withSplitCount(splitCount) .withMapperFactoryFn(this.mapperFactoryFn()); -if (isMurmur3Partitioner(cluster)) { - LOG.info("Murmur3Partitioner detected, splitting"); - - List tokens = - cluster.getMetadata().getTokenRanges().stream() - .map(tokenRange -> new BigInteger(tokenRange.getEnd().getValue().toString())) - .collect(Collectors.toList()); - - SplitGenerator splitGenerator = - new SplitGenerator(cluster.getMetadata().getPartitioner()); - - List> splits = - splitGenerator.generateSplits(splitCount, tokens).stream() - .map(rr -> CassandraIO.read().withRingRange(rr)) - .collect(Collectors.toList()); +return input +.apply(Create.of(this)) +.apply(ParDo.of(new SplitFn())) +.setCoder(SerializableCoder.of(new TypeDescriptor>() {})) +.apply(Reshuffle.viaRandomKey()) +.apply(readAll); Review comment: Now we need to tackle this in two parts maybe, one is to implement the read with a `ReadFn` like method and as a next step to get rid of all the methods on `ReadAll` to simplify it to its core. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428428) Time Spent: 12h (was: 11h 50m) > Add readAll() method to CassandraIO > --- > > Key: BEAM-9008 > URL: https://issues.apache.org/jira/browse/BEAM-9008 > Project: Beam > Issue Type: New Feature > Components: io-java-cassandra >Affects Versions: 2.16.0 >Reporter: vincent marquez >Assignee: vincent marquez >Priority: Minor > Time Spent: 12h > Remaining Estimate: 0h > > When querying a large cassandra database, it's often *much* more useful to > programatically generate the queries needed to to be run rather than reading > all partitions and attempting some filtering. > As an example: > {code:java} > public class Event { >@PartitionKey(0) public UUID accountId; >@PartitionKey(1)public String yearMonthDay; >@ClusteringKey public UUID eventId; >//other data... > }{code} > If there is ten years worth of data, you may want to only query one year's > worth. Here each token range would represent one 'token' but all events for > the day. > {code:java} > Set accounts = getRelevantAccounts(); > Set dateRange = generateDateRange("2018-01-01", "2019-01-01"); > PCollection tokens = generateTokens(accounts, dateRange); > {code} > > I propose an additional _readAll()_ PTransform that can take a PCollection > of token ranges and can return a PCollection of what the query would > return. > *Question: How much code should be in common between both methods?* > Currently the read connector already groups all partitions into a List of > Token Ranges, so it would be simple to refactor the current read() based > method to a 'ParDo' based one and have them both share the same function. > Reasons against sharing code between read and readAll > * Not having the read based method return a BoundedSource connector would > mean losing the ability to know the size of the data returned > * Currently the CassandraReader executes all the grouped TokenRange queries > *asynchronously* which is (maybe?) fine when all that's happening is > splitting up all the partition ranges but terrible for executing potentially > millions of queries. > Reasons _for_ sharing code would be simplified code base and that both of > the above issues would most likely have a negligable performance impact. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428426 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 28/Apr/20 23:05 Start Date: 28/Apr/20 23:05 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on a change in pull request #11548: URL: https://github.com/apache/beam/pull/11548#discussion_r416975896 ## File path: sdks/java/container/build.gradle ## @@ -101,16 +84,44 @@ docker { project.rootProject["docker-tag"] : project.sdk_version) dockerfile project.file("./${dockerfileName}") files "./build/" + buildArgs(['pull_licenses': !project.rootProject.hasProperty(["no-licenses"])]) Review comment: The purpose of checking urls was to make sure files can be pulled from the urls when create images. If we set it to skip, no url checking process happened for the PRs and we may see many license issues when we create release images. Then release managers need to fix the issues. If we add the checking urls for each PR, contributors should fix the issues if any and will not have to fix them during release. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428426) Time Spent: 25h (was: 24h 50m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 25h > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428427 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 28/Apr/20 23:05 Start Date: 28/Apr/20 23:05 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on a change in pull request #11548: URL: https://github.com/apache/beam/pull/11548#discussion_r416976074 ## File path: sdks/java/container/build.gradle ## @@ -101,16 +84,44 @@ docker { project.rootProject["docker-tag"] : project.sdk_version) dockerfile project.file("./${dockerfileName}") files "./build/" + buildArgs(['pull_licenses': !project.rootProject.hasProperty(["no-licenses"])]) Review comment: There are some more discussion at the thread today, in case you haven't seen it yet. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428427) Time Spent: 25h 10m (was: 25h) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 25h 10m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO
[ https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=428425=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428425 ] ASF GitHub Bot logged work on BEAM-9008: Author: ASF GitHub Bot Created on: 28/Apr/20 23:04 Start Date: 28/Apr/20 23:04 Worklog Time Spent: 10m Work Description: iemejia commented on a change in pull request #10546: URL: https://github.com/apache/beam/pull/10546#discussion_r416975426 ## File path: sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java ## @@ -391,36 +393,57 @@ static Cluster getCluster( .withSplitCount(splitCount) .withMapperFactoryFn(this.mapperFactoryFn()); -if (isMurmur3Partitioner(cluster)) { - LOG.info("Murmur3Partitioner detected, splitting"); - - List tokens = - cluster.getMetadata().getTokenRanges().stream() - .map(tokenRange -> new BigInteger(tokenRange.getEnd().getValue().toString())) - .collect(Collectors.toList()); - - SplitGenerator splitGenerator = - new SplitGenerator(cluster.getMetadata().getPartitioner()); - - List> splits = - splitGenerator.generateSplits(splitCount, tokens).stream() - .map(rr -> CassandraIO.read().withRingRange(rr)) - .collect(Collectors.toList()); +return input +.apply(Create.of(this)) +.apply(ParDo.of(new SplitFn())) +.setCoder(SerializableCoder.of(new TypeDescriptor>() {})) +.apply(Reshuffle.viaRandomKey()) +.apply(readAll); + } +} - return input.apply("Creating splits", Create.of(splits)).apply("readAll", readAll); +private static class SplitFn extends DoFn, Read> { Review comment: This looks perfect! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428425) Time Spent: 11h 50m (was: 11h 40m) > Add readAll() method to CassandraIO > --- > > Key: BEAM-9008 > URL: https://issues.apache.org/jira/browse/BEAM-9008 > Project: Beam > Issue Type: New Feature > Components: io-java-cassandra >Affects Versions: 2.16.0 >Reporter: vincent marquez >Assignee: vincent marquez >Priority: Minor > Time Spent: 11h 50m > Remaining Estimate: 0h > > When querying a large cassandra database, it's often *much* more useful to > programatically generate the queries needed to to be run rather than reading > all partitions and attempting some filtering. > As an example: > {code:java} > public class Event { >@PartitionKey(0) public UUID accountId; >@PartitionKey(1)public String yearMonthDay; >@ClusteringKey public UUID eventId; >//other data... > }{code} > If there is ten years worth of data, you may want to only query one year's > worth. Here each token range would represent one 'token' but all events for > the day. > {code:java} > Set accounts = getRelevantAccounts(); > Set dateRange = generateDateRange("2018-01-01", "2019-01-01"); > PCollection tokens = generateTokens(accounts, dateRange); > {code} > > I propose an additional _readAll()_ PTransform that can take a PCollection > of token ranges and can return a PCollection of what the query would > return. > *Question: How much code should be in common between both methods?* > Currently the read connector already groups all partitions into a List of > Token Ranges, so it would be simple to refactor the current read() based > method to a 'ParDo' based one and have them both share the same function. > Reasons against sharing code between read and readAll > * Not having the read based method return a BoundedSource connector would > mean losing the ability to know the size of the data returned > * Currently the CassandraReader executes all the grouped TokenRange queries > *asynchronously* which is (maybe?) fine when all that's happening is > splitting up all the partition ranges but terrible for executing potentially > millions of queries. > Reasons _for_ sharing code would be simplified code base and that both of > the above issues would most likely have a negligable performance impact. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9847) Verify If Triggering allows emitting eager results when processing a single element in HL7v2IO.
Jacob Ferriero created BEAM-9847: Summary: Verify If Triggering allows emitting eager results when processing a single element in HL7v2IO. Key: BEAM-9847 URL: https://issues.apache.org/jira/browse/BEAM-9847 Project: Beam Issue Type: Task Components: io-java-gcp Reporter: Jacob Ferriero Assignee: Jacob Ferriero Due to the nature of the HL7v2 API, HL7v2IO.ListHL7v2Messages follows the pattern of [paginating through all ListMessages results in a single ProcessElement call.|#diff-9cd595f078378218ccc01ce7e19ca766R447]] Upon testing with customer against HL7v2 store with 350k messages we observed that the ListMessages transform was not outputting any elements "hanging" for a long time (which was assumed to be the single thread paginating through all the results). We added the following triggering in hopes that it would emit early results: {code:java} .apply( Window.into(new GlobalWindows()) .triggering( AfterWatermark.pastEndOfWindow() .withEarlyFirings( AfterProcessingTime.pastFirstElementInPane() .plusDelayOf(Duration.standardSeconds(1 .discardingFiredPanes()) {code} Our tests with this triggering seemed to indicate that it did not "hang" like the first test and seemed to output a more steady stream of elements. Reviewer states that bundles must be committed atomically so no output elements of a (single process element call) can proceed to downstream stages until all output elements for that process element call are ready. There may be other things at play here. Will seek to reproduce in a way that definitively confirms output elements can be eagerly output during the execution of a single process element call before it completes. CC: [~pabloem] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9285) Add Postcommit ValidatesRunner CI Job for Flink on Java 11
[ https://issues.apache.org/jira/browse/BEAM-9285?focusedWorklogId=428418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428418 ] ASF GitHub Bot logged work on BEAM-9285: Author: ASF GitHub Bot Created on: 28/Apr/20 22:40 Start Date: 28/Apr/20 22:40 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #11480: URL: https://github.com/apache/beam/pull/11480#issuecomment-620892620 Run Flink ValidatesRunner Java 11 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428418) Time Spent: 3h 50m (was: 3h 40m) > Add Postcommit ValidatesRunner CI Job for Flink on Java 11 > -- > > Key: BEAM-9285 > URL: https://issues.apache.org/jira/browse/BEAM-9285 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Fix For: Not applicable > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Flink 1.10 runner introduces support for Java 11. We need to add a job in the > CI that covers the complete suite of Flink 1.10 runner tests on Java 11. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9285) Add Postcommit ValidatesRunner CI Job for Flink on Java 11
[ https://issues.apache.org/jira/browse/BEAM-9285?focusedWorklogId=428417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428417 ] ASF GitHub Bot logged work on BEAM-9285: Author: ASF GitHub Bot Created on: 28/Apr/20 22:40 Start Date: 28/Apr/20 22:40 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #11480: URL: https://github.com/apache/beam/pull/11480#issuecomment-620892562 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428417) Time Spent: 3h 40m (was: 3.5h) > Add Postcommit ValidatesRunner CI Job for Flink on Java 11 > -- > > Key: BEAM-9285 > URL: https://issues.apache.org/jira/browse/BEAM-9285 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Fix For: Not applicable > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Flink 1.10 runner introduces support for Java 11. We need to add a job in the > CI that covers the complete suite of Flink 1.10 runner tests on Java 11. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428415 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 28/Apr/20 22:30 Start Date: 28/Apr/20 22:30 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #11548: URL: https://github.com/apache/beam/pull/11548#discussion_r416962452 ## File path: sdks/java/container/build.gradle ## @@ -101,16 +84,44 @@ docker { project.rootProject["docker-tag"] : project.sdk_version) dockerfile project.file("./${dockerfileName}") files "./build/" + buildArgs(['pull_licenses': !project.rootProject.hasProperty(["no-licenses"])]) Review comment: Right, so to produce lightweight images we can skip license download. Why would we ever want to 'check' urls but not downloading the licenses? Quoting your comment on the mailing list: > I tried to check if urls are valid instead of pulling the files and it > reduced only 1 min of running time. So, it's not an option here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428415) Time Spent: 24h 50m (was: 24h 40m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 24h 50m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428414 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 28/Apr/20 22:29 Start Date: 28/Apr/20 22:29 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #11548: URL: https://github.com/apache/beam/pull/11548#discussion_r416962452 ## File path: sdks/java/container/build.gradle ## @@ -101,16 +84,44 @@ docker { project.rootProject["docker-tag"] : project.sdk_version) dockerfile project.file("./${dockerfileName}") files "./build/" + buildArgs(['pull_licenses': !project.rootProject.hasProperty(["no-licenses"])]) Review comment: Right, so to produce lightweight images we can skip license download. Why would we ever want to 'check' urls but not downloading the licenses? From your comment: ``` I tried to check if urls are valid instead of pulling the files and it reduced only 1 min of running time. So, it's not an option here. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428414) Time Spent: 24h 40m (was: 24.5h) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 24h 40m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=428409=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428409 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 28/Apr/20 22:18 Start Date: 28/Apr/20 22:18 Worklog Time Spent: 10m Work Description: jaketf commented on a change in pull request #11339: URL: https://github.com/apache/beam/pull/11339#discussion_r416957719 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java ## @@ -0,0 +1,977 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.gcp.healthcare; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.api.services.healthcare.v1beta1.model.HttpBody; +import com.google.api.services.healthcare.v1beta1.model.Operation; +import com.google.auto.value.AutoValue; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.UUID; +import java.util.concurrent.ThreadLocalRandom; +import javax.annotation.Nullable; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.GroupIntoBatches; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.WithKeys; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.util.Sleeper; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionTuple; +import org.apache.beam.sdk.values.PInput; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.PValue; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.sdk.values.TupleTagList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap; +import org.codehaus.jackson.JsonProcessingException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link FhirIO} provides an API for reading and writing resources to https://cloud.google.com/healthcare/docs/concepts/fhir;>Google Cloud Healthcare Fhir API. + * + * + * Read + * + * FHIR resources can be read with {@link FhirIO.Read} supports use cases where you have a + * ${@link PCollection} of message IDS. This is appropriate for reading the Fhir notifications from + * a Pub/Sub subscription with {@link PubsubIO#readStrings()} or in cases where you have a manually + * prepared list of messages that you need to process (e.g. in a text file read with {@link + * org.apache.beam.sdk.io.TextIO}*) . + * + * Fetch Resource contents from Fhir Store based on the {@link PCollection} of message ID strings + * {@link FhirIO.Read.Result} where one can call {@link Read.Result#getResources()} to retrieved a + * {@link PCollection} containing the successfully fetched {@link HttpBody}s and/or {@link + * FhirIO.Read.Result#getFailedReads()}* to retrieve a {@link PCollection} of {@link + * HealthcareIOError}* containing the resource ID that could not be fetched and the exception as a + * {@link HealthcareIOError}, this can be used to write to the dead letter storage system of your + * choosing. This error handling is mainly to transparently
[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements
[ https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428408=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428408 ] ASF GitHub Bot logged work on BEAM-9831: Author: ASF GitHub Bot Created on: 28/Apr/20 22:16 Start Date: 28/Apr/20 22:16 Worklog Time Spent: 10m Work Description: jaketf commented on a change in pull request #11538: URL: https://github.com/apache/beam/pull/11538#discussion_r416957276 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java ## @@ -437,6 +444,20 @@ private Message fetchMessage(HealthcareApiClient client, String msgId) .apply(Create.of(this.hl7v2Stores)) .apply(ParDo.of(new ListHL7v2MessagesFn(this.filter))) .setCoder(new HL7v2MessageCoder()) + // Listing takes a long time for each input element (HL7v2 store) because it has to + // paginate through results in a single thread / ProcessElement call in order to keep + // track of page token. + // Eagerly emit data on 1 second intervals so downstream processing can get started before + // all of the list results have been paginated through. Review comment: Each "page" of responses is a collection of messages. It don't think it make sense to page through all the pages (dropping the real data) to then re-fetch it in the downstream parallelized step. In testing w/ customer when pointing at an HL7v2 store with many, many messages (and therefore pages) they reported before this change: there was a long time before any elements were output. so long that they gave up and killed the pipeline. after this change: there was data coming out more regularly. This could have been a misunderstanding or a bad test scenario. I will try to come up with a test that reproduces this behavior. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428408) Time Spent: 1h 10m (was: 1h) > HL7v2IO Improvements > > > Key: BEAM-9831 > URL: https://issues.apache.org/jira/browse/BEAM-9831 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > # HL7v2MessageCoder constructor should be public for use by end users > # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list > messages results before emitting any output data elements (due to high fan > out from a single input element). We should add early firings so that > downstream processing can proceed on early pages while later pages are still > being scrolled through. > # We should drop all output only fields of HL7v2Message and only keep data > and labels when calling ingestMessages, rather than expecting the user to do > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9699) Add ability to use ZetaSQL in Python SqlTransform
[ https://issues.apache.org/jira/browse/BEAM-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette reassigned BEAM-9699: --- Assignee: Brian Hulette > Add ability to use ZetaSQL in Python SqlTransform > - > > Key: BEAM-9699 > URL: https://issues.apache.org/jira/browse/BEAM-9699 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > > This may just work when the [plannerName pipeline > option|https://github.com/apache/beam/blob/1e52e4298085eda8e88e1215c7a73d52658b31f1/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L29] > is exposed to Python -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-9699) Add ability to use ZetaSQL in Python SqlTransform
[ https://issues.apache.org/jira/browse/BEAM-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9699 started by Brian Hulette. --- > Add ability to use ZetaSQL in Python SqlTransform > - > > Key: BEAM-9699 > URL: https://issues.apache.org/jira/browse/BEAM-9699 > Project: Beam > Issue Type: Improvement > Components: dsl-sql, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > > This may just work when the [plannerName pipeline > option|https://github.com/apache/beam/blob/1e52e4298085eda8e88e1215c7a73d52658b31f1/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L29] > is exposed to Python -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=428401=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428401 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 28/Apr/20 21:57 Start Date: 28/Apr/20 21:57 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #11477: URL: https://github.com/apache/beam/pull/11477#issuecomment-620876614 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428401) Time Spent: 8.5h (was: 8h 20m) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support
[ https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=428402=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428402 ] ASF GitHub Bot logged work on BEAM-9650: Author: ASF GitHub Bot Created on: 28/Apr/20 21:57 Start Date: 28/Apr/20 21:57 Worklog Time Spent: 10m Work Description: Ardagan removed a comment on pull request #11477: URL: https://github.com/apache/beam/pull/11477#issuecomment-620876614 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428402) Time Spent: 8h 40m (was: 8.5h) > Add consistent slowly changing side inputs support > -- > > Key: BEAM-9650 > URL: https://issues.apache.org/jira/browse/BEAM-9650 > Project: Beam > Issue Type: Bug > Components: io-ideas >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Time Spent: 8h 40m > Remaining Estimate: 0h > > Add implementation for slowly changing dimentions based on [design > doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9643) Add user-facing Go SDF documentation.
[ https://issues.apache.org/jira/browse/BEAM-9643?focusedWorklogId=428400=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428400 ] ASF GitHub Bot logged work on BEAM-9643: Author: ASF GitHub Bot Created on: 28/Apr/20 21:54 Start Date: 28/Apr/20 21:54 Worklog Time Spent: 10m Work Description: lostluck commented on a change in pull request #11517: URL: https://github.com/apache/beam/pull/11517#discussion_r416930600 ## File path: sdks/go/examples/stringsplit/offsetrange/offsetrange.go ## @@ -89,21 +89,23 @@ func (tracker *Tracker) GetError() error { } // TrySplit splits at the nearest integer greater than the given fraction of the remainder. -func (tracker *Tracker) TrySplit(fraction float64) (interface{}, error) { +func (tracker *Tracker) TrySplit(fraction float64) (interface{}, interface{}, error) { if tracker.Stopped || tracker.IsDone() { - return nil, nil + return tracker.Rest, nil, nil } - if fraction < 0 || fraction > 1 { - return nil, errors.New("fraction must be in range [0, 1]") + if fraction < 0 { + fraction = 0 Review comment: Might be worth documenting this behavior in the comment. ## File path: sdks/go/examples/stringsplit/offsetrange/offsetrange.go ## @@ -89,21 +89,23 @@ func (tracker *Tracker) GetError() error { } // TrySplit splits at the nearest integer greater than the given fraction of the remainder. -func (tracker *Tracker) TrySplit(fraction float64) (interface{}, error) { +func (tracker *Tracker) TrySplit(fraction float64) (interface{}, interface{}, error) { Review comment: Given this is the example, using named return values (primary, residual ...) is appropriate here for documentation purposes (but not so one can use an empty return.) ## File path: sdks/go/pkg/beam/pardo.go ## @@ -222,6 +222,87 @@ func ParDo0(s Scope, dofn interface{}, col PCollection, opts ...Option) { // DoFn instance via output PCollections, in the absence of external // communication mechanisms written by user code. // +// Splittable DoFns (Experimental) +// +// Warning: Splittable DoFns are still experimental, largely untested, and +// likely to have bugs. +// +// Splittable DoFns are DoFns that are able to split work within an element, +// as opposed to only at element boundaries like normal DoFns. This is useful +// for DoFns that emit many outputs per input element and can distribute that +// work among multiple workers. The most common examples of this are sources. +// +// In order to split work within an element, splittable DoFns use the concept of +// restrictions, which are objects that are associated with an element and +// describe a portion of work on that element. For example, a restriction +// associated with a filename might describe what byte range within that file to +// process. In addition to restrictions, splittable DoFns also rely on +// restriction trackers to track progress and perform splits on a restriction +// currently being processed. See the `RTracker` interface in core/sdf/sdf.go +// for more details. +// +// Splitting +// +// Splitting means taking one restriction and splitting into two or more that +// cover the entire input space of the original one. In other words, processing +// all the split restrictions should produce identical output to processing +// the original one. +// +// Splitting occurs in two stages. The initial splitting occurs before any +// restrictions have started processing. This step is used to split large +// restrictions into smaller ones that can then be distributed among multiple +// workers for processing. Initial splitting is user-defined and optional. +// +// Dynamic splitting occurs during the processing of a restriction in runners +// that have implemented it. If there are available workers, runners may split +// the unprocessed portion of work from a busy worker and shard it to available +// workers in order to better distribute work. With unsplittable DoFns this can +// only occur on element boundaries, but for splittable DoFns this split +// can land within a restriction and will require splitting that restriction. +// +// * Note: The Go SDK currently does not support dynamic splitting for SDFs, +// only initial splitting. Only initially split restrictions can be +// distributed by liquid sharding. Stragglers will not be split during +// execution with dynamic splitting. +// +// Splittable DoFn Methods +// +// Making a splittable DoFn requires the following methods to be implemented on +// a DoFn in addition to the usual DoFn requirements. In the following +// method signatures `elem` represents the main input elements to the DoFn, and +// should match the types used in ProcessElement. `restriction` represents the +// user-defined restriction, and can
[jira] [Work logged] (BEAM-9393) support schemas in state API
[ https://issues.apache.org/jira/browse/BEAM-9393?focusedWorklogId=428397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428397 ] ASF GitHub Bot logged work on BEAM-9393: Author: ASF GitHub Bot Created on: 28/Apr/20 21:51 Start Date: 28/Apr/20 21:51 Worklog Time Spent: 10m Work Description: dpmills commented on pull request #10983: URL: https://github.com/apache/beam/pull/10983#issuecomment-620874181 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428397) Time Spent: 50m (was: 40m) > support schemas in state API > > > Key: BEAM-9393 > URL: https://issues.apache.org/jira/browse/BEAM-9393 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9393) support schemas in state API
[ https://issues.apache.org/jira/browse/BEAM-9393?focusedWorklogId=428394=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428394 ] ASF GitHub Bot logged work on BEAM-9393: Author: ASF GitHub Bot Created on: 28/Apr/20 21:40 Start Date: 28/Apr/20 21:40 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #10983: URL: https://github.com/apache/beam/pull/10983#issuecomment-620869944 @dpmills any comments on this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428394) Time Spent: 0.5h (was: 20m) > support schemas in state API > > > Key: BEAM-9393 > URL: https://issues.apache.org/jira/browse/BEAM-9393 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=428389=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428389 ] ASF GitHub Bot logged work on BEAM-9383: Author: ASF GitHub Bot Created on: 28/Apr/20 21:28 Start Date: 28/Apr/20 21:28 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11039: URL: https://github.com/apache/beam/pull/11039#issuecomment-620864937 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428389) Time Spent: 6.5h (was: 6h 20m) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 6.5h > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=428387=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428387 ] ASF GitHub Bot logged work on BEAM-9383: Author: ASF GitHub Bot Created on: 28/Apr/20 21:27 Start Date: 28/Apr/20 21:27 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11039: URL: https://github.com/apache/beam/pull/11039#issuecomment-620864665 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428387) Time Spent: 6h 10m (was: 6h) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=428388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428388 ] ASF GitHub Bot logged work on BEAM-9383: Author: ASF GitHub Bot Created on: 28/Apr/20 21:27 Start Date: 28/Apr/20 21:27 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11039: URL: https://github.com/apache/beam/pull/11039#issuecomment-620864801 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428388) Time Spent: 6h 20m (was: 6h 10m) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 6h 20m > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6661) FnApi gRPC setup/teardown glitch
[ https://issues.apache.org/jira/browse/BEAM-6661?focusedWorklogId=428386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428386 ] ASF GitHub Bot logged work on BEAM-6661: Author: ASF GitHub Bot Created on: 28/Apr/20 21:26 Start Date: 28/Apr/20 21:26 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11537: URL: https://github.com/apache/beam/pull/11537#issuecomment-620864491 > @ibzib Maybe also worth backporting? Might as well. `SEVERE: *~*~*~ Beam is horribly broken! *~*~*~` definitely tends to frighten new users. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428386) Time Spent: 1h 10m (was: 1h) > FnApi gRPC setup/teardown glitch > > > Key: BEAM-6661 > URL: https://issues.apache.org/jira/browse/BEAM-6661 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Affects Versions: 2.11.0 >Reporter: Heejong Lee >Priority: Major > Fix For: 2.22.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Multiple exceptions are observed during FnApi gRPC setup/teardown. The > examples are > {noformat} > 14:53:22 [grpc-default-executor-1] WARN > org.apache.beam.runners.fnexecution.logging.GrpcLoggingService - Logging > client failed unexpectedly. > 14:53:22 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusRuntimeException: > CANCELLED: cancelled before receiving half close > 14:53:22 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asRuntimeException(Status.java:517) > 14:53:22 at{noformat} > {noformat} > > 14:52:56 [main] INFO org.apache.beam.runners.flink.FlinkJobServerDriver - > JobService started on localhost:58179 > 14:52:57 [grpc-default-executor-0] ERROR > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService - > Encountered Unexpected Exception for Invocation > job_bfb7df0e-408e-4bfd-bb3c-432e946ca819 > 14:52:57 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusException: > NOT_FOUND > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asException(Status.java:534) > 14:52:57 at > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getInvocation(InMemoryJobService.java:341) > 14:52:57 at > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getStateStream(InMemoryJobService.java:262) > 14:52:57 at > org.apache.beam.model.jobmanagement.v1.JobServiceGrpc$MethodHandlers.invoke(JobServiceGrpc.java:693) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > 14:52:57 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 14:52:57 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > {noformat} > {noformat} > 14:54:50 Feb 07, 2019 10:54:50 PM > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference > cleanQueue > 14:54:50 SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=628, > target=localhost:41409} was not shutdown properly!!! ~*~*~* > 14:54:50 Make sure to call shutdown()/shutdownNow() and wait until > awaitTermination() returns true. > 14:54:50 java.lang.RuntimeException: ManagedChannel allocation site > 14:54:50
[jira] [Work logged] (BEAM-9802) Provide a way to customize automatically started services.
[ https://issues.apache.org/jira/browse/BEAM-9802?focusedWorklogId=428385=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428385 ] ASF GitHub Bot logged work on BEAM-9802: Author: ASF GitHub Bot Created on: 28/Apr/20 21:24 Start Date: 28/Apr/20 21:24 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11495: URL: https://github.com/apache/beam/pull/11495#issuecomment-620863418 Modulo the conflict. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428385) Time Spent: 50m (was: 40m) > Provide a way to customize automatically started services. > -- > > Key: BEAM-9802 > URL: https://issues.apache.org/jira/browse/BEAM-9802 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > This can be useful for testing and alternative production environments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements
[ https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428384 ] ASF GitHub Bot logged work on BEAM-9831: Author: ASF GitHub Bot Created on: 28/Apr/20 21:23 Start Date: 28/Apr/20 21:23 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11538: URL: https://github.com/apache/beam/pull/11538#issuecomment-620862736 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428384) Time Spent: 1h (was: 50m) > HL7v2IO Improvements > > > Key: BEAM-9831 > URL: https://issues.apache.org/jira/browse/BEAM-9831 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > # HL7v2MessageCoder constructor should be public for use by end users > # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list > messages results before emitting any output data elements (due to high fan > out from a single input element). We should add early firings so that > downstream processing can proceed on early pages while later pages are still > being scrolled through. > # We should drop all output only fields of HL7v2Message and only keep data > and labels when calling ingestMessages, rather than expecting the user to do > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-4087) Gradle build does not allow to overwrite versions of provided dependencies
[ https://issues.apache.org/jira/browse/BEAM-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094883#comment-17094883 ] Ismaël Mejía edited comment on BEAM-4087 at 4/28/20, 9:20 PM: -- This is still an issue even if the PR / approach was not merged. We still do not have a practical way to test multiple provided versions of a dependency, and this is probably the only issue bugging me still after 2 years of the move to gradle, we still work by faith based validation for things like multi version compatibility of Kafka and Spark. It is a pity because this is so simple to do in maven that still surprises me that it demands so much effort to get it done in gradle. I still think it is worth to explore a general way to tackle this because multiple modules may require it, or is there some new simpler way to do it now that I missed? was (Author: iemejia): This is still an issue even if the PR / approach was not merged. We still do not have a practical way to test multiple provided versions of a dependency, and this is probably the only issue bugging me still after 2 years of the move to gradle, we still work by faith based validation for things like multi version compatibility of Kafka and Spark. It is a pity because this is so simple to do in maven that still surprises me that it demands so much effort to get it done in gradle. I still think it is worth to explore a way to tackle this, or is there some new simpler way to do it now that I missed? > Gradle build does not allow to overwrite versions of provided dependencies > -- > > Key: BEAM-4087 > URL: https://issues.apache.org/jira/browse/BEAM-4087 > Project: Beam > Issue Type: Improvement > Components: build-system >Affects Versions: 2.5.0 >Reporter: Ismaël Mejía >Priority: Major > Labels: gradle > Fix For: Not applicable > > Time Spent: 4h 50m > Remaining Estimate: 0h > > In order to test modules with provided dependencies in maven we can execute > for example for Kafka `mvn verify -Prelease -Dkafka.clients.version=0.9.0.1 > -pl 'sdks/java/io/kafka'` However we don't have an equivalent way to do this > with gradle because the version of the dependencies are defined locally and > not in the gradle.properties. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (BEAM-4087) Gradle build does not allow to overwrite versions of provided dependencies
[ https://issues.apache.org/jira/browse/BEAM-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reopened BEAM-4087: This is still an issue even if the PR / approach was not merged. We still do not have a practical way to test multiple provided versions of a dependency, and this is probably the only issue bugging me still after 2 years of the move to gradle, we still work by faith based validation for things like multi version compatibility of Kafka and Spark. It is a pity because this is so simple to do in maven that still surprises me that it demands so much effort to get it done in gradle. I still think it is worth to explore a way to tackle this, or is there some new simpler way to do it now that I missed? > Gradle build does not allow to overwrite versions of provided dependencies > -- > > Key: BEAM-4087 > URL: https://issues.apache.org/jira/browse/BEAM-4087 > Project: Beam > Issue Type: Improvement > Components: build-system >Affects Versions: 2.5.0 >Reporter: Ismaël Mejía >Priority: Major > Labels: gradle > Fix For: Not applicable > > Time Spent: 4h 50m > Remaining Estimate: 0h > > In order to test modules with provided dependencies in maven we can execute > for example for Kafka `mvn verify -Prelease -Dkafka.clients.version=0.9.0.1 > -pl 'sdks/java/io/kafka'` However we don't have an equivalent way to do this > with gradle because the version of the dependencies are defined locally and > not in the gradle.properties. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-4087) Gradle build does not allow to overwrite versions of provided dependencies
[ https://issues.apache.org/jira/browse/BEAM-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-4087: --- Status: Open (was: Triage Needed) > Gradle build does not allow to overwrite versions of provided dependencies > -- > > Key: BEAM-4087 > URL: https://issues.apache.org/jira/browse/BEAM-4087 > Project: Beam > Issue Type: Improvement > Components: build-system >Affects Versions: 2.5.0 >Reporter: Ismaël Mejía >Priority: Major > Labels: gradle > Fix For: Not applicable > > Time Spent: 4h 50m > Remaining Estimate: 0h > > In order to test modules with provided dependencies in maven we can execute > for example for Kafka `mvn verify -Prelease -Dkafka.clients.version=0.9.0.1 > -pl 'sdks/java/io/kafka'` However we don't have an equivalent way to do this > with gradle because the version of the dependencies are defined locally and > not in the gradle.properties. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?focusedWorklogId=428381=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428381 ] ASF GitHub Bot logged work on BEAM-9815: Author: ASF GitHub Bot Created on: 28/Apr/20 21:18 Start Date: 28/Apr/20 21:18 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11524: URL: https://github.com/apache/beam/pull/11524#issuecomment-620860334 "No artifacts staged" state has been verified. Merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428381) Time Spent: 2h 40m (was: 2.5h) > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > Time Spent: 2h 40m > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements
[ https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428383 ] ASF GitHub Bot logged work on BEAM-9831: Author: ASF GitHub Bot Created on: 28/Apr/20 21:18 Start Date: 28/Apr/20 21:18 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11538: URL: https://github.com/apache/beam/pull/11538#issuecomment-620860467 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428383) Time Spent: 50m (was: 40m) > HL7v2IO Improvements > > > Key: BEAM-9831 > URL: https://issues.apache.org/jira/browse/BEAM-9831 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > # HL7v2MessageCoder constructor should be public for use by end users > # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list > messages results before emitting any output data elements (due to high fan > out from a single input element). We should add early firings so that > downstream processing can proceed on early pages while later pages are still > being scrolled through. > # We should drop all output only fields of HL7v2Message and only keep data > and labels when calling ingestMessages, rather than expecting the user to do > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements
[ https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428380=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428380 ] ASF GitHub Bot logged work on BEAM-9831: Author: ASF GitHub Bot Created on: 28/Apr/20 21:17 Start Date: 28/Apr/20 21:17 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11538: URL: https://github.com/apache/beam/pull/11538#issuecomment-620859719 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428380) Time Spent: 40m (was: 0.5h) > HL7v2IO Improvements > > > Key: BEAM-9831 > URL: https://issues.apache.org/jira/browse/BEAM-9831 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > # HL7v2MessageCoder constructor should be public for use by end users > # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list > messages results before emitting any output data elements (due to high fan > out from a single input element). We should add early firings so that > downstream processing can proceed on early pages while later pages are still > being scrolled through. > # We should drop all output only fields of HL7v2Message and only keep data > and labels when calling ingestMessages, rather than expecting the user to do > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements
[ https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428379=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428379 ] ASF GitHub Bot logged work on BEAM-9831: Author: ASF GitHub Bot Created on: 28/Apr/20 21:16 Start Date: 28/Apr/20 21:16 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #11538: URL: https://github.com/apache/beam/pull/11538#discussion_r416060393 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java ## @@ -475,7 +497,14 @@ public void initClient() throws IOException { public void listMessages(ProcessContext context) throws IOException { String hl7v2Store = context.element(); // Output all elements of all pages. - this.client.getHL7v2MessageStream(hl7v2Store, this.filter).forEach(context::output); + HttpHealthcareApiClient.HL7v2MessagePages pages = + new HttpHealthcareApiClient.HL7v2MessagePages(client, hl7v2Store, this.filter); + long reqestTime = Instant.now().getMillis(); Review comment: `requestTime`? ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java ## @@ -437,6 +444,20 @@ private Message fetchMessage(HealthcareApiClient client, String msgId) .apply(Create.of(this.hl7v2Stores)) .apply(ParDo.of(new ListHL7v2MessagesFn(this.filter))) .setCoder(new HL7v2MessageCoder()) + // Listing takes a long time for each input element (HL7v2 store) because it has to + // paginate through results in a single thread / ProcessElement call in order to keep + // track of page token. + // Eagerly emit data on 1 second intervals so downstream processing can get started before + // all of the list results have been paginated through. Review comment: Unfortunately, this is not possible. If you are paginating from inside the single DoFn `processelement` call, the data coming out of it will only go downstream after the element is done being processed, so this windowing is not changing that in the execution. This is because bundle execution is committed atomically, so the whole bundle executes before data can go downstream. You do touch on an interesting example, which is one of the reasons that we came up with SplittableDoFn. Something you could try to do is: ``` PColll pages = hl7v2Stores.apply(ParDo.of(new RetrieveAndOutputPagesFn())) pages.apply(Reshuffle.viaRandomKey()).apply(ParDo.of(new FetchEachPageFn()) ``` Though I don't know if you can actually do that : ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428379) Time Spent: 0.5h (was: 20m) > HL7v2IO Improvements > > > Key: BEAM-9831 > URL: https://issues.apache.org/jira/browse/BEAM-9831 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > # HL7v2MessageCoder constructor should be public for use by end users > # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list > messages results before emitting any output data elements (due to high fan > out from a single input element). We should add early firings so that > downstream processing can proceed on early pages while later pages are still > being scrolled through. > # We should drop all output only fields of HL7v2Message and only keep data > and labels when calling ingestMessages, rather than expecting the user to do > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9844) ParDoTest.KeyTests failing for Spark
[ https://issues.apache.org/jira/browse/BEAM-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094766#comment-17094766 ] Rehman Murad Ali edited comment on BEAM-9844 at 4/28/20, 9:14 PM: -- Fixed here https://github.com/apache/beam/pull/11559 was (Author: rehmanmuradali): Fixed here https://github.com/apache/beam/pull/11154 > ParDoTest.KeyTests failing for Spark > - > > Key: BEAM-9844 > URL: https://issues.apache.org/jira/browse/BEAM-9844 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > Fix For: Not applicable > > > Seems like these tests were added recently and > beam_PostCommit_Java_ValidatesRunner_Spark is perma red due to this. > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7365/] > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7364/] > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/] > I see two different errors. > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimer/] > rg.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NumberFormatException: null > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimerWithGenericKey/] > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NullPointerException > Rehman, can you take a look ? > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-4830) Determine why go vet failures invoked by ./gradlew check were not caught be jenkins build build
[ https://issues.apache.org/jira/browse/BEAM-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-4830: --- Status: Open (was: Triage Needed) > Determine why go vet failures invoked by ./gradlew check were not caught be > jenkins build build > > > Key: BEAM-4830 > URL: https://issues.apache.org/jira/browse/BEAM-4830 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Alex Amato >Assignee: Robert Burke >Priority: Major > Labels: gradle, jenkins > > The purpose of this is to catch errors developers see when they first start > contributing to beam. Let's ensure we run the same commands in the > [contributing guide|https://beam.apache.org/contribute/]. > > Note: check runs more than build, so we are not catching these problems in > the continuous Jenkins testing. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?focusedWorklogId=428378=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428378 ] ASF GitHub Bot logged work on BEAM-9815: Author: ASF GitHub Bot Created on: 28/Apr/20 21:07 Start Date: 28/Apr/20 21:07 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11524: URL: https://github.com/apache/beam/pull/11524#issuecomment-620855224 I'll merge once I sanity check that we're back to "No artifacts" for the Dataflow tests, and otherwise passed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428378) Time Spent: 2.5h (was: 2h 20m) > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > Time Spent: 2.5h > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9845) Stage dependencies over the expansion service.
[ https://issues.apache.org/jira/browse/BEAM-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9845: --- Status: Open (was: Triage Needed) > Stage dependencies over the expansion service. > -- > > Key: BEAM-9845 > URL: https://issues.apache.org/jira/browse/BEAM-9845 > Project: Beam > Issue Type: Bug > Components: sdk-java-core, sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > This will obviate the need for jar_packages. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?focusedWorklogId=428377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428377 ] ASF GitHub Bot logged work on BEAM-9815: Author: ASF GitHub Bot Created on: 28/Apr/20 21:06 Start Date: 28/Apr/20 21:06 Worklog Time Spent: 10m Work Description: lostluck commented on a change in pull request #11524: URL: https://github.com/apache/beam/pull/11524#discussion_r416922273 ## File path: sdks/go/test/run_integration_tests.sh ## @@ -102,9 +102,8 @@ case $key in esac done -if [[ "$RUNNER" != "universal" ]]; then - PUSH_CONTAINER_TO_GCR='yes' -else +PUSH_CONTAINER_TO_GCR='yes' Review comment: A fair observation. I ended up with reversing the if clause anyway since the positive check is clearer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428377) Time Spent: 2h 20m (was: 2h 10m) > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > Time Spent: 2h 20m > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9844) ParDoTest.KeyTests failing for Spark
[ https://issues.apache.org/jira/browse/BEAM-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-9844. Fix Version/s: Not applicable Resolution: Duplicate > ParDoTest.KeyTests failing for Spark > - > > Key: BEAM-9844 > URL: https://issues.apache.org/jira/browse/BEAM-9844 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > Fix For: Not applicable > > > Seems like these tests were added recently and > beam_PostCommit_Java_ValidatesRunner_Spark is perma red due to this. > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7365/] > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7364/] > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/] > I see two different errors. > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimer/] > rg.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NumberFormatException: null > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimerWithGenericKey/] > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NullPointerException > Rehman, can you take a look ? > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?focusedWorklogId=428376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428376 ] ASF GitHub Bot logged work on BEAM-9815: Author: ASF GitHub Bot Created on: 28/Apr/20 21:05 Start Date: 28/Apr/20 21:05 Worklog Time Spent: 10m Work Description: lostluck commented on a change in pull request #11524: URL: https://github.com/apache/beam/pull/11524#discussion_r416922004 ## File path: sdks/go/test/run_integration_tests.sh ## @@ -118,7 +117,7 @@ test -d sdks/go/test command -v docker docker -v -if [[ PUSH_CONTAINER_TO_GCR == 'yes' ]]; then +if [ "$PUSH_CONTAINER_TO_GCR" = "yes" ]; then Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428376) Time Spent: 2h 10m (was: 2h) > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > Time Spent: 2h 10m > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9844) ParDoTest.KeyTests failing for Spark
[ https://issues.apache.org/jira/browse/BEAM-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-9844: -- Assignee: (was: Rehman Murad Ali) > ParDoTest.KeyTests failing for Spark > - > > Key: BEAM-9844 > URL: https://issues.apache.org/jira/browse/BEAM-9844 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > > Seems like these tests were added recently and > beam_PostCommit_Java_ValidatesRunner_Spark is perma red due to this. > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7365/] > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7364/] > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/] > I see two different errors. > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimer/] > rg.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NumberFormatException: null > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimerWithGenericKey/] > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NullPointerException > Rehman, can you take a look ? > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container
[ https://issues.apache.org/jira/browse/BEAM-9815?focusedWorklogId=428374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428374 ] ASF GitHub Bot logged work on BEAM-9815: Author: ASF GitHub Bot Created on: 28/Apr/20 21:05 Start Date: 28/Apr/20 21:05 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11524: URL: https://github.com/apache/beam/pull/11524#issuecomment-620854348 Run Go Postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428374) Time Spent: 2h (was: 1h 50m) > beam_PostCommit_Go perma red due to failing to start container > -- > > Key: BEAM-9815 > URL: https://issues.apache.org/jira/browse/BEAM-9815 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Chamikara Madhusanka Jayalath >Assignee: Robert Bradshaw >Priority: Critical > Labels: currently-failing > Time Spent: 2h > Remaining Estimate: 0h > > For example, > [https://builds.apache.org/job/beam_PostCommit_Go/6847/] > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9801) Setting a timer from a timer callback fails
[ https://issues.apache.org/jira/browse/BEAM-9801?focusedWorklogId=428375=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428375 ] ASF GitHub Bot logged work on BEAM-9801: Author: ASF GitHub Bot Created on: 28/Apr/20 21:05 Start Date: 28/Apr/20 21:05 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #11492: URL: https://github.com/apache/beam/pull/11492#discussion_r416921999 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py ## @@ -845,10 +845,12 @@ def process_bundle(self, (result_future.is_done() and result_future.get().error)): if isinstance(output, beam_fn_api_pb2.Elements.Timers) and not dry_run: with BundleManager._lock: -self.bundle_context_manager.get_buffer( +timer_buffer = self.bundle_context_manager.get_buffer( expected_output_timers[( output.transform_id, output.timer_family_id)], -output.transform_id).append(output.timers) +output.transform_id) +timer_buffer.cleared = False Review comment: Did you run into these errors when setting a timer from the timer call, Max? I think it would be good to explicitly reset the buffer, rather than manipulate its flag (e.g. write a timer_buffer.reset function). Or at least check `if timer_buffer.cleared: timer_buffer.cleared = False`, to confirm that the rest of the internal context in `ListBuffer` is cleared. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428375) Time Spent: 5h 50m (was: 5h 40m) > Setting a timer from a timer callback fails > --- > > Key: BEAM-9801 > URL: https://issues.apache.org/jira/browse/BEAM-9801 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Maximilian Michels >Priority: Critical > Fix For: 2.21.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Hi, > I'm trying to set a timer from a timer callback in the Python SDK: > {code:Python} > class GenerateLoad(beam.DoFn): > timer_spec = userstate.TimerSpec('timer', userstate.TimeDomain.WATERMARK) > def process(self, element, timer=beam.DoFn.TimerParam(timer_spec)): > self.key = element[0] > timer.set(0) > @userstate.on_timer(timer_spec) > def process_timer(self, timer=beam.DoFn.TimerParam(timer_spec)): > timer.set(0) > {code} > This yields the following Python stack trace: > {noformat} > INFO:apache_beam.utils.subprocess_server:Caused by: > java.lang.RuntimeException: Error received from SDK harness for instruction > 4: Traceback (most recent call last): > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 245, in _execute > INFO:apache_beam.utils.subprocess_server: response = task() > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 302, in > INFO:apache_beam.utils.subprocess_server: lambda: > self.create_worker().do_instruction(request), request) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 471, in do_instruction > INFO:apache_beam.utils.subprocess_server: getattr(request, request_type), > request.instruction_id) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 506, in process_bundle > INFO:apache_beam.utils.subprocess_server: > bundle_processor.process_bundle(instruction_id)) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/bundle_processor.py", line 910, in process_bundle > INFO:apache_beam.utils.subprocess_server: element.timer_family_id, timer_data) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/operations.py", line 688, in process_timer > INFO:apache_beam.utils.subprocess_server: timer_data.fire_timestamp) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 990, in process_user_timer > INFO:apache_beam.utils.subprocess_server: self._reraise_augmented(exn) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 1043, in _reraise_augmented > INFO:apache_beam.utils.subprocess_server: raise_with_traceback(new_exn) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 988, in process_user_timer >
[jira] [Updated] (BEAM-9844) ParDoTest.KeyTests failing for Spark
[ https://issues.apache.org/jira/browse/BEAM-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9844: --- Status: Open (was: Triage Needed) > ParDoTest.KeyTests failing for Spark > - > > Key: BEAM-9844 > URL: https://issues.apache.org/jira/browse/BEAM-9844 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Chamikara Madhusanka Jayalath >Assignee: Rehman Murad Ali >Priority: Major > > Seems like these tests were added recently and > beam_PostCommit_Java_ValidatesRunner_Spark is perma red due to this. > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7365/] > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7364/] > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/] > I see two different errors. > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimer/] > rg.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NumberFormatException: null > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimerWithGenericKey/] > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NullPointerException > Rehman, can you take a look ? > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test
[ https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428370 ] ASF GitHub Bot logged work on BEAM-8742: Author: ASF GitHub Bot Created on: 28/Apr/20 20:50 Start Date: 28/Apr/20 20:50 Worklog Time Spent: 10m Work Description: mxm removed a comment on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-620847201 Run Python Load Tests ParDo Flink Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428370) Time Spent: 3h 40m (was: 3.5h) > Add stateful processing to ParDo load test > -- > > Key: BEAM-8742 > URL: https://issues.apache.org/jira/browse/BEAM-8742 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > So far, the ParDo load test is not stateful. We should add a basic counter to > test the stateful processing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test
[ https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428369 ] ASF GitHub Bot logged work on BEAM-8742: Author: ASF GitHub Bot Created on: 28/Apr/20 20:50 Start Date: 28/Apr/20 20:50 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-620847201 Run Python Load Tests ParDo Flink Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428369) Time Spent: 3.5h (was: 3h 20m) > Add stateful processing to ParDo load test > -- > > Key: BEAM-8742 > URL: https://issues.apache.org/jira/browse/BEAM-8742 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > So far, the ParDo load test is not stateful. We should add a basic counter to > test the stateful processing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test
[ https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428367=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428367 ] ASF GitHub Bot logged work on BEAM-8742: Author: ASF GitHub Bot Created on: 28/Apr/20 20:49 Start Date: 28/Apr/20 20:49 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-620846828 Run Python Load Tests ParDo Flink Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428367) Time Spent: 3h 20m (was: 3h 10m) > Add stateful processing to ParDo load test > -- > > Key: BEAM-8742 > URL: https://issues.apache.org/jira/browse/BEAM-8742 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > So far, the ParDo load test is not stateful. We should add a basic counter to > test the stateful processing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9561) Run pandas tests with Beam Dataframe API
[ https://issues.apache.org/jira/browse/BEAM-9561?focusedWorklogId=428364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428364 ] ASF GitHub Bot logged work on BEAM-9561: Author: ASF GitHub Bot Created on: 28/Apr/20 20:47 Start Date: 28/Apr/20 20:47 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11419: URL: https://github.com/apache/beam/pull/11419#issuecomment-620845876 Thanks. Merging as soon as tests pass. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428364) Time Spent: 0.5h (was: 20m) > Run pandas tests with Beam Dataframe API > > > Key: BEAM-9561 > URL: https://issues.apache.org/jira/browse/BEAM-9561 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8414) Cleanup Python codebase to enable some of the excluded Python lint checks.
[ https://issues.apache.org/jira/browse/BEAM-8414?focusedWorklogId=428362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428362 ] ASF GitHub Bot logged work on BEAM-8414: Author: ASF GitHub Bot Created on: 28/Apr/20 20:41 Start Date: 28/Apr/20 20:41 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11523: URL: https://github.com/apache/beam/pull/11523#issuecomment-620842927 Hm lint and autoformat are complaining : ( Lint: ``` 13:08:40 apache_beam/runners/interactive/interactive_runner.py:30:0: W0611: Unused import sys (unused-import) ``` you can run the autoformatter via `tox -e py3-yapf` to format your code and ensure that test will pass. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428362) Time Spent: 4h 40m (was: 4.5h) > Cleanup Python codebase to enable some of the excluded Python lint checks. > --- > > Key: BEAM-8414 > URL: https://issues.apache.org/jira/browse/BEAM-8414 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Stephen O'Kennedy >Priority: Minor > Labels: beginner, easy, easy-fix, easyfix, newbie, starter > Time Spent: 4h 40m > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/9725 upgraded lint checker, however Beam > codebase is not fully compliant with some of the checks new linter supports, > so we excluded such checks. We would like to have some checks permanently > excluded (see discussion on the PR), however we would like to re-enable the > following checks: > consider-using-set-comprehension > chained-comparison > consider-using-sys-exit > To reenable these checks, we should: > 1) remove them from disabled checks in .pylintrc [1] > https://github.com/apache/beam/blob/master/sdks/python/.pylintrc and > 2) cleanup the codebase to make it compliant. > [1] > https://github.com/apache/beam/blob/3330069291d8168c56c77acfef84c2566af05ec6/sdks/python/.pylintrc#L81 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9801) Setting a timer from a timer callback fails
[ https://issues.apache.org/jira/browse/BEAM-9801?focusedWorklogId=428360=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428360 ] ASF GitHub Bot logged work on BEAM-9801: Author: ASF GitHub Bot Created on: 28/Apr/20 20:40 Start Date: 28/Apr/20 20:40 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11492: URL: https://github.com/apache/beam/pull/11492#issuecomment-620842616 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428360) Time Spent: 5h 40m (was: 5.5h) > Setting a timer from a timer callback fails > --- > > Key: BEAM-9801 > URL: https://issues.apache.org/jira/browse/BEAM-9801 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Maximilian Michels >Priority: Critical > Fix For: 2.21.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > Hi, > I'm trying to set a timer from a timer callback in the Python SDK: > {code:Python} > class GenerateLoad(beam.DoFn): > timer_spec = userstate.TimerSpec('timer', userstate.TimeDomain.WATERMARK) > def process(self, element, timer=beam.DoFn.TimerParam(timer_spec)): > self.key = element[0] > timer.set(0) > @userstate.on_timer(timer_spec) > def process_timer(self, timer=beam.DoFn.TimerParam(timer_spec)): > timer.set(0) > {code} > This yields the following Python stack trace: > {noformat} > INFO:apache_beam.utils.subprocess_server:Caused by: > java.lang.RuntimeException: Error received from SDK harness for instruction > 4: Traceback (most recent call last): > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 245, in _execute > INFO:apache_beam.utils.subprocess_server: response = task() > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 302, in > INFO:apache_beam.utils.subprocess_server: lambda: > self.create_worker().do_instruction(request), request) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 471, in do_instruction > INFO:apache_beam.utils.subprocess_server: getattr(request, request_type), > request.instruction_id) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 506, in process_bundle > INFO:apache_beam.utils.subprocess_server: > bundle_processor.process_bundle(instruction_id)) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/bundle_processor.py", line 910, in process_bundle > INFO:apache_beam.utils.subprocess_server: element.timer_family_id, timer_data) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/operations.py", line 688, in process_timer > INFO:apache_beam.utils.subprocess_server: timer_data.fire_timestamp) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 990, in process_user_timer > INFO:apache_beam.utils.subprocess_server: self._reraise_augmented(exn) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 1043, in _reraise_augmented > INFO:apache_beam.utils.subprocess_server: raise_with_traceback(new_exn) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 988, in process_user_timer > INFO:apache_beam.utils.subprocess_server: > self.do_fn_invoker.invoke_user_timer(timer_spec, key, window, timestamp) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 517, in invoke_user_timer > INFO:apache_beam.utils.subprocess_server: self.user_state_context, key, > window, timestamp)) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 1093, in process_outputs > INFO:apache_beam.utils.subprocess_server: for result in results: > INFO:apache_beam.utils.subprocess_server: File > "/Users/max/Dev/beam/sdks/python/apache_beam/testing/load_tests/pardo_test.py", > line 185, in process_timer > INFO:apache_beam.utils.subprocess_server: timer.set(0) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/bundle_processor.py", line 589, in set > INFO:apache_beam.utils.subprocess_server: > self._timer_coder_impl.encode_to_stream(timer, self._output_stream, True) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/coders/coder_impl.py", line 651, in encode_to_stream >
[jira] [Work logged] (BEAM-9801) Setting a timer from a timer callback fails
[ https://issues.apache.org/jira/browse/BEAM-9801?focusedWorklogId=428359=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428359 ] ASF GitHub Bot logged work on BEAM-9801: Author: ASF GitHub Bot Created on: 28/Apr/20 20:40 Start Date: 28/Apr/20 20:40 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11492: URL: https://github.com/apache/beam/pull/11492#issuecomment-620842521 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428359) Time Spent: 5.5h (was: 5h 20m) > Setting a timer from a timer callback fails > --- > > Key: BEAM-9801 > URL: https://issues.apache.org/jira/browse/BEAM-9801 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Maximilian Michels >Priority: Critical > Fix For: 2.21.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Hi, > I'm trying to set a timer from a timer callback in the Python SDK: > {code:Python} > class GenerateLoad(beam.DoFn): > timer_spec = userstate.TimerSpec('timer', userstate.TimeDomain.WATERMARK) > def process(self, element, timer=beam.DoFn.TimerParam(timer_spec)): > self.key = element[0] > timer.set(0) > @userstate.on_timer(timer_spec) > def process_timer(self, timer=beam.DoFn.TimerParam(timer_spec)): > timer.set(0) > {code} > This yields the following Python stack trace: > {noformat} > INFO:apache_beam.utils.subprocess_server:Caused by: > java.lang.RuntimeException: Error received from SDK harness for instruction > 4: Traceback (most recent call last): > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 245, in _execute > INFO:apache_beam.utils.subprocess_server: response = task() > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 302, in > INFO:apache_beam.utils.subprocess_server: lambda: > self.create_worker().do_instruction(request), request) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 471, in do_instruction > INFO:apache_beam.utils.subprocess_server: getattr(request, request_type), > request.instruction_id) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 506, in process_bundle > INFO:apache_beam.utils.subprocess_server: > bundle_processor.process_bundle(instruction_id)) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/bundle_processor.py", line 910, in process_bundle > INFO:apache_beam.utils.subprocess_server: element.timer_family_id, timer_data) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/operations.py", line 688, in process_timer > INFO:apache_beam.utils.subprocess_server: timer_data.fire_timestamp) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 990, in process_user_timer > INFO:apache_beam.utils.subprocess_server: self._reraise_augmented(exn) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 1043, in _reraise_augmented > INFO:apache_beam.utils.subprocess_server: raise_with_traceback(new_exn) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 988, in process_user_timer > INFO:apache_beam.utils.subprocess_server: > self.do_fn_invoker.invoke_user_timer(timer_spec, key, window, timestamp) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 517, in invoke_user_timer > INFO:apache_beam.utils.subprocess_server: self.user_state_context, key, > window, timestamp)) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 1093, in process_outputs > INFO:apache_beam.utils.subprocess_server: for result in results: > INFO:apache_beam.utils.subprocess_server: File > "/Users/max/Dev/beam/sdks/python/apache_beam/testing/load_tests/pardo_test.py", > line 185, in process_timer > INFO:apache_beam.utils.subprocess_server: timer.set(0) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/bundle_processor.py", line 589, in set > INFO:apache_beam.utils.subprocess_server: > self._timer_coder_impl.encode_to_stream(timer, self._output_stream, True) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/coders/coder_impl.py", line 651, in encode_to_stream >
[jira] [Work logged] (BEAM-6661) FnApi gRPC setup/teardown glitch
[ https://issues.apache.org/jira/browse/BEAM-6661?focusedWorklogId=428358=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428358 ] ASF GitHub Bot logged work on BEAM-6661: Author: ASF GitHub Bot Created on: 28/Apr/20 20:39 Start Date: 28/Apr/20 20:39 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11537: URL: https://github.com/apache/beam/pull/11537#issuecomment-620842028 @ibzib Maybe also worth backporting? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428358) Time Spent: 1h (was: 50m) > FnApi gRPC setup/teardown glitch > > > Key: BEAM-6661 > URL: https://issues.apache.org/jira/browse/BEAM-6661 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Affects Versions: 2.11.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Multiple exceptions are observed during FnApi gRPC setup/teardown. The > examples are > {noformat} > 14:53:22 [grpc-default-executor-1] WARN > org.apache.beam.runners.fnexecution.logging.GrpcLoggingService - Logging > client failed unexpectedly. > 14:53:22 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusRuntimeException: > CANCELLED: cancelled before receiving half close > 14:53:22 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asRuntimeException(Status.java:517) > 14:53:22 at{noformat} > {noformat} > > 14:52:56 [main] INFO org.apache.beam.runners.flink.FlinkJobServerDriver - > JobService started on localhost:58179 > 14:52:57 [grpc-default-executor-0] ERROR > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService - > Encountered Unexpected Exception for Invocation > job_bfb7df0e-408e-4bfd-bb3c-432e946ca819 > 14:52:57 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusException: > NOT_FOUND > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asException(Status.java:534) > 14:52:57 at > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getInvocation(InMemoryJobService.java:341) > 14:52:57 at > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getStateStream(InMemoryJobService.java:262) > 14:52:57 at > org.apache.beam.model.jobmanagement.v1.JobServiceGrpc$MethodHandlers.invoke(JobServiceGrpc.java:693) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > 14:52:57 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 14:52:57 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > {noformat} > {noformat} > 14:54:50 Feb 07, 2019 10:54:50 PM > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference > cleanQueue > 14:54:50 SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=628, > target=localhost:41409} was not shutdown properly!!! ~*~*~* > 14:54:50 Make sure to call shutdown()/shutdownNow() and wait until > awaitTermination() returns true. > 14:54:50 java.lang.RuntimeException: ManagedChannel allocation site > 14:54:50 at >
[jira] [Resolved] (BEAM-6661) FnApi gRPC setup/teardown glitch
[ https://issues.apache.org/jira/browse/BEAM-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved BEAM-6661. -- Fix Version/s: 2.22.0 Assignee: (was: Heejong Lee) Resolution: Fixed > FnApi gRPC setup/teardown glitch > > > Key: BEAM-6661 > URL: https://issues.apache.org/jira/browse/BEAM-6661 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Affects Versions: 2.11.0 >Reporter: Heejong Lee >Priority: Major > Fix For: 2.22.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Multiple exceptions are observed during FnApi gRPC setup/teardown. The > examples are > {noformat} > 14:53:22 [grpc-default-executor-1] WARN > org.apache.beam.runners.fnexecution.logging.GrpcLoggingService - Logging > client failed unexpectedly. > 14:53:22 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusRuntimeException: > CANCELLED: cancelled before receiving half close > 14:53:22 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asRuntimeException(Status.java:517) > 14:53:22 at{noformat} > {noformat} > > 14:52:56 [main] INFO org.apache.beam.runners.flink.FlinkJobServerDriver - > JobService started on localhost:58179 > 14:52:57 [grpc-default-executor-0] ERROR > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService - > Encountered Unexpected Exception for Invocation > job_bfb7df0e-408e-4bfd-bb3c-432e946ca819 > 14:52:57 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusException: > NOT_FOUND > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asException(Status.java:534) > 14:52:57 at > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getInvocation(InMemoryJobService.java:341) > 14:52:57 at > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getStateStream(InMemoryJobService.java:262) > 14:52:57 at > org.apache.beam.model.jobmanagement.v1.JobServiceGrpc$MethodHandlers.invoke(JobServiceGrpc.java:693) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > 14:52:57 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 14:52:57 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > {noformat} > {noformat} > 14:54:50 Feb 07, 2019 10:54:50 PM > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference > cleanQueue > 14:54:50 SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=628, > target=localhost:41409} was not shutdown properly!!! ~*~*~* > 14:54:50 Make sure to call shutdown()/shutdownNow() and wait until > awaitTermination() returns true. > 14:54:50 java.lang.RuntimeException: ManagedChannel allocation site > 14:54:50 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:103) > 14:54:50 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:53) > 14:54:50 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:44) > 14:54:50 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:410) > 14:54:50 at > org.apache.beam.sdk.fn.channel.ManagedChannelFactory.forDescriptor(ManagedChannelFactory.java:44) > 14:54:50 at >
[jira] [Work logged] (BEAM-6661) FnApi gRPC setup/teardown glitch
[ https://issues.apache.org/jira/browse/BEAM-6661?focusedWorklogId=428357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428357 ] ASF GitHub Bot logged work on BEAM-6661: Author: ASF GitHub Bot Created on: 28/Apr/20 20:38 Start Date: 28/Apr/20 20:38 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11537: URL: https://github.com/apache/beam/pull/11537#issuecomment-620841780 Merging, we can follow-up with the nits if we feel like it, since they are very minor. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428357) Time Spent: 50m (was: 40m) > FnApi gRPC setup/teardown glitch > > > Key: BEAM-6661 > URL: https://issues.apache.org/jira/browse/BEAM-6661 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Affects Versions: 2.11.0 >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Multiple exceptions are observed during FnApi gRPC setup/teardown. The > examples are > {noformat} > 14:53:22 [grpc-default-executor-1] WARN > org.apache.beam.runners.fnexecution.logging.GrpcLoggingService - Logging > client failed unexpectedly. > 14:53:22 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusRuntimeException: > CANCELLED: cancelled before receiving half close > 14:53:22 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asRuntimeException(Status.java:517) > 14:53:22 at{noformat} > {noformat} > > 14:52:56 [main] INFO org.apache.beam.runners.flink.FlinkJobServerDriver - > JobService started on localhost:58179 > 14:52:57 [grpc-default-executor-0] ERROR > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService - > Encountered Unexpected Exception for Invocation > job_bfb7df0e-408e-4bfd-bb3c-432e946ca819 > 14:52:57 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusException: > NOT_FOUND > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asException(Status.java:534) > 14:52:57 at > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getInvocation(InMemoryJobService.java:341) > 14:52:57 at > org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getStateStream(InMemoryJobService.java:262) > 14:52:57 at > org.apache.beam.model.jobmanagement.v1.JobServiceGrpc$MethodHandlers.invoke(JobServiceGrpc.java:693) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > 14:52:57 at > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > 14:52:57 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 14:52:57 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > {noformat} > {noformat} > 14:54:50 Feb 07, 2019 10:54:50 PM > org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference > cleanQueue > 14:54:50 SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=628, > target=localhost:41409} was not shutdown properly!!! ~*~*~* > 14:54:50 Make sure to call shutdown()/shutdownNow() and wait until > awaitTermination() returns true. > 14:54:50 java.lang.RuntimeException: ManagedChannel allocation site > 14:54:50 at >
[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test
[ https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428356=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428356 ] ASF GitHub Bot logged work on BEAM-8742: Author: ASF GitHub Bot Created on: 28/Apr/20 20:35 Start Date: 28/Apr/20 20:35 Worklog Time Spent: 10m Work Description: mxm removed a comment on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-620839972 Run Website_Stage_GCS PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428356) Time Spent: 3h 10m (was: 3h) > Add stateful processing to ParDo load test > -- > > Key: BEAM-8742 > URL: https://issues.apache.org/jira/browse/BEAM-8742 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > So far, the ParDo load test is not stateful. We should add a basic counter to > test the stateful processing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test
[ https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428355 ] ASF GitHub Bot logged work on BEAM-8742: Author: ASF GitHub Bot Created on: 28/Apr/20 20:34 Start Date: 28/Apr/20 20:34 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-620839972 Run Website_Stage_GCS PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428355) Time Spent: 3h (was: 2h 50m) > Add stateful processing to ParDo load test > -- > > Key: BEAM-8742 > URL: https://issues.apache.org/jira/browse/BEAM-8742 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > So far, the ParDo load test is not stateful. We should add a basic counter to > test the stateful processing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9801) Setting a timer from a timer callback fails
[ https://issues.apache.org/jira/browse/BEAM-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094864#comment-17094864 ] Maximilian Michels commented on BEAM-9801: -- I'm not sure if it was working at some point and broke due to introducing new features. It appears to be untested, so it likely never worked. It was relatively easy to fix though, see the PR. > Setting a timer from a timer callback fails > --- > > Key: BEAM-9801 > URL: https://issues.apache.org/jira/browse/BEAM-9801 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Maximilian Michels >Priority: Critical > Fix For: 2.21.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Hi, > I'm trying to set a timer from a timer callback in the Python SDK: > {code:Python} > class GenerateLoad(beam.DoFn): > timer_spec = userstate.TimerSpec('timer', userstate.TimeDomain.WATERMARK) > def process(self, element, timer=beam.DoFn.TimerParam(timer_spec)): > self.key = element[0] > timer.set(0) > @userstate.on_timer(timer_spec) > def process_timer(self, timer=beam.DoFn.TimerParam(timer_spec)): > timer.set(0) > {code} > This yields the following Python stack trace: > {noformat} > INFO:apache_beam.utils.subprocess_server:Caused by: > java.lang.RuntimeException: Error received from SDK harness for instruction > 4: Traceback (most recent call last): > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 245, in _execute > INFO:apache_beam.utils.subprocess_server: response = task() > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 302, in > INFO:apache_beam.utils.subprocess_server: lambda: > self.create_worker().do_instruction(request), request) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 471, in do_instruction > INFO:apache_beam.utils.subprocess_server: getattr(request, request_type), > request.instruction_id) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/sdk_worker.py", line 506, in process_bundle > INFO:apache_beam.utils.subprocess_server: > bundle_processor.process_bundle(instruction_id)) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/bundle_processor.py", line 910, in process_bundle > INFO:apache_beam.utils.subprocess_server: element.timer_family_id, timer_data) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/operations.py", line 688, in process_timer > INFO:apache_beam.utils.subprocess_server: timer_data.fire_timestamp) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 990, in process_user_timer > INFO:apache_beam.utils.subprocess_server: self._reraise_augmented(exn) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 1043, in _reraise_augmented > INFO:apache_beam.utils.subprocess_server: raise_with_traceback(new_exn) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 988, in process_user_timer > INFO:apache_beam.utils.subprocess_server: > self.do_fn_invoker.invoke_user_timer(timer_spec, key, window, timestamp) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 517, in invoke_user_timer > INFO:apache_beam.utils.subprocess_server: self.user_state_context, key, > window, timestamp)) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/common.py", line 1093, in process_outputs > INFO:apache_beam.utils.subprocess_server: for result in results: > INFO:apache_beam.utils.subprocess_server: File > "/Users/max/Dev/beam/sdks/python/apache_beam/testing/load_tests/pardo_test.py", > line 185, in process_timer > INFO:apache_beam.utils.subprocess_server: timer.set(0) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/runners/worker/bundle_processor.py", line 589, in set > INFO:apache_beam.utils.subprocess_server: > self._timer_coder_impl.encode_to_stream(timer, self._output_stream, True) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/coders/coder_impl.py", line 651, in encode_to_stream > INFO:apache_beam.utils.subprocess_server: value.hold_timestamp, out, True) > INFO:apache_beam.utils.subprocess_server: File > "apache_beam/coders/coder_impl.py", line 608, in encode_to_stream > INFO:apache_beam.utils.subprocess_server: millis = value.micros // 1000 > INFO:apache_beam.utils.subprocess_server:AttributeError: 'NoneType' object > has no attribute 'micros' [while running 'GenerateLoad'] > {noformat} > Looking at the code base, I'm not sure we have tests for timer output > timestamps. Am I missing something? -- This message was sent by Atlassian
[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test
[ https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428352 ] ASF GitHub Bot logged work on BEAM-8742: Author: ASF GitHub Bot Created on: 28/Apr/20 20:29 Start Date: 28/Apr/20 20:29 Worklog Time Spent: 10m Work Description: mxm commented on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-620837611 Run Python Load Tests ParDo Flink Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428352) Time Spent: 2h 50m (was: 2h 40m) > Add stateful processing to ParDo load test > -- > > Key: BEAM-8742 > URL: https://issues.apache.org/jira/browse/BEAM-8742 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > So far, the ParDo load test is not stateful. We should add a basic counter to > test the stateful processing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-9546) Support for batching a schema-aware PCollection and processing as a Dataframe
[ https://issues.apache.org/jira/browse/BEAM-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9546 started by Brian Hulette. --- > Support for batching a schema-aware PCollection and processing as a Dataframe > - > > Key: BEAM-9546 > URL: https://issues.apache.org/jira/browse/BEAM-9546 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8414) Cleanup Python codebase to enable some of the excluded Python lint checks.
[ https://issues.apache.org/jira/browse/BEAM-8414?focusedWorklogId=428342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428342 ] ASF GitHub Bot logged work on BEAM-8414: Author: ASF GitHub Bot Created on: 28/Apr/20 20:04 Start Date: 28/Apr/20 20:04 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11523: URL: https://github.com/apache/beam/pull/11523#issuecomment-620825431 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428342) Time Spent: 4.5h (was: 4h 20m) > Cleanup Python codebase to enable some of the excluded Python lint checks. > --- > > Key: BEAM-8414 > URL: https://issues.apache.org/jira/browse/BEAM-8414 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Stephen O'Kennedy >Priority: Minor > Labels: beginner, easy, easy-fix, easyfix, newbie, starter > Time Spent: 4.5h > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/9725 upgraded lint checker, however Beam > codebase is not fully compliant with some of the checks new linter supports, > so we excluded such checks. We would like to have some checks permanently > excluded (see discussion on the PR), however we would like to re-enable the > following checks: > consider-using-set-comprehension > chained-comparison > consider-using-sys-exit > To reenable these checks, we should: > 1) remove them from disabled checks in .pylintrc [1] > https://github.com/apache/beam/blob/master/sdks/python/.pylintrc and > 2) cleanup the codebase to make it compliant. > [1] > https://github.com/apache/beam/blob/3330069291d8168c56c77acfef84c2566af05ec6/sdks/python/.pylintrc#L81 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9846) Remove references to native Java BQ source and sink
[ https://issues.apache.org/jira/browse/BEAM-9846?focusedWorklogId=428341=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428341 ] ASF GitHub Bot logged work on BEAM-9846: Author: ASF GitHub Bot Created on: 28/Apr/20 20:00 Start Date: 28/Apr/20 20:00 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11562: URL: https://github.com/apache/beam/pull/11562#issuecomment-620823343 R: @yifanzou This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428341) Time Spent: 20m (was: 10m) > Remove references to native Java BQ source and sink > --- > > Key: BEAM-9846 > URL: https://issues.apache.org/jira/browse/BEAM-9846 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Remove references to enable_custom_bigquery_source and > enable_custom_bigquery_sink experiments. > These experiments have not been used to enable the native Dataflow BQ > source/sink for 10+ releases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9846) Remove references to native Java BQ source and sink
[ https://issues.apache.org/jira/browse/BEAM-9846?focusedWorklogId=428340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428340 ] ASF GitHub Bot logged work on BEAM-9846: Author: ASF GitHub Bot Created on: 28/Apr/20 19:58 Start Date: 28/Apr/20 19:58 Worklog Time Spent: 10m Work Description: lukecwik opened a new pull request #11562: URL: https://github.com/apache/beam/pull/11562 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[jira] [Updated] (BEAM-9846) Remove references to native Java BQ source and sink
[ https://issues.apache.org/jira/browse/BEAM-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik updated BEAM-9846: Status: Open (was: Triage Needed) > Remove references to native Java BQ source and sink > --- > > Key: BEAM-9846 > URL: https://issues.apache.org/jira/browse/BEAM-9846 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > > Remove references to enable_custom_bigquery_source and > enable_custom_bigquery_sink experiments. > These experiments have not been used to enable the native Dataflow BQ > source/sink for 10+ releases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9846) Remove references to native Java BQ source and sink
Luke Cwik created BEAM-9846: --- Summary: Remove references to native Java BQ source and sink Key: BEAM-9846 URL: https://issues.apache.org/jira/browse/BEAM-9846 Project: Beam Issue Type: Bug Components: io-java-gcp Reporter: Luke Cwik Assignee: Luke Cwik Remove references to enable_custom_bigquery_source and enable_custom_bigquery_sink experiments. These experiments have not been used to enable the native Dataflow BQ source/sink for 10+ releases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428338=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428338 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 28/Apr/20 19:40 Start Date: 28/Apr/20 19:40 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on a change in pull request #11548: URL: https://github.com/apache/beam/pull/11548#discussion_r416874101 ## File path: sdks/java/container/build.gradle ## @@ -101,16 +84,44 @@ docker { project.rootProject["docker-tag"] : project.sdk_version) dockerfile project.file("./${dockerfileName}") files "./build/" + buildArgs(['pull_licenses': !project.rootProject.hasProperty(["no-licenses"])]) Review comment: In short, users want to create a lightweight images, without adding licenses. Lightweight images are welcomed for Jenkins test as well, it reduces image size by 85MB for Java image. More discussion can be found at [here](https://lists.apache.org/thread.html/rff9f05e08de6adf7c39c0f4c59f97ae1a2f3602768480fe9e31e0428%40%3Cdev.beam.apache.org%3E) Naming suggestion sounds good to me, thank you. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428338) Time Spent: 24.5h (was: 24h 20m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 24.5h > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8542) Add async write to AWS SNS IO & remove retry logic
[ https://issues.apache.org/jira/browse/BEAM-8542?focusedWorklogId=428335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428335 ] ASF GitHub Bot logged work on BEAM-8542: Author: ASF GitHub Bot Created on: 28/Apr/20 19:17 Start Date: 28/Apr/20 19:17 Worklog Time Spent: 10m Work Description: Akshay-Iyangar commented on pull request #10078: URL: https://github.com/apache/beam/pull/10078#issuecomment-620803551 @aromanenko-dev - could you also have a look at this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428335) Time Spent: 8h 50m (was: 8h 40m) > Add async write to AWS SNS IO & remove retry logic > -- > > Key: BEAM-8542 > URL: https://issues.apache.org/jira/browse/BEAM-8542 > Project: Beam > Issue Type: Improvement > Components: io-java-aws >Reporter: Ajo Thomas >Assignee: Ajo Thomas >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > > - While working with SNS IO for one of my work-related projects, I found that > the IO uses synchronous publishes during writes. I had a simple mock pipeline > where I was reading from a kinesis stream and publishing it to SNS using > Beam's SNS IO. For comparison, I also had a lamdba which did the same using > asynchronous publishes but was about 5x faster. Changing the SNS IO to use > async publishes would improve publish latencies. > - SNS IO also has some retry logic which isn't required as SNS clients can > handle retries. The retry logic in the SNS client is user-configurable and > therefore, an explicit retry logic in SNS IO is not required > I have a working version of the IO with these changes, will create a PR > linking this ticket to it once I get some feedback here. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428332=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428332 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 28/Apr/20 19:08 Start Date: 28/Apr/20 19:08 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #11548: URL: https://github.com/apache/beam/pull/11548#discussion_r416855737 ## File path: sdks/java/container/build.gradle ## @@ -101,16 +84,44 @@ docker { project.rootProject["docker-tag"] : project.sdk_version) dockerfile project.file("./${dockerfileName}") files "./build/" + buildArgs(['pull_licenses': !project.rootProject.hasProperty(["no-licenses"])]) Review comment: Sounds good. Sorry if this was covered in the discussion that I didn't read, but why would we want to check urls but not pull? Feel free to point me to the discussion. Naming suggestion: licenses-in-docker-images=add licenses-in-docker-images=skip licenses-in-docker-images=check_urls This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428332) Time Spent: 24h 20m (was: 24h 10m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 24h 20m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9720) Add custom AWS Http Client Configuration capability for AWS client 1.0/2.0
[ https://issues.apache.org/jira/browse/BEAM-9720?focusedWorklogId=428330=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428330 ] ASF GitHub Bot logged work on BEAM-9720: Author: ASF GitHub Bot Created on: 28/Apr/20 19:05 Start Date: 28/Apr/20 19:05 Worklog Time Spent: 10m Work Description: Akshay-Iyangar commented on a change in pull request #11341: URL: https://github.com/apache/beam/pull/11341#discussion_r416853789 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/AwsOptions.java ## @@ -103,6 +103,34 @@ public AWSCredentialsProvider create(PipelineOptions options) { void setClientConfiguration(ClientConfiguration clientConfiguration); Review comment: Yes, you are right doesn't bring in any breaking change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428330) Time Spent: 3.5h (was: 3h 20m) > Add custom AWS Http Client Configuration capability for AWS client 1.0/2.0 > -- > > Key: BEAM-9720 > URL: https://issues.apache.org/jira/browse/BEAM-9720 > Project: Beam > Issue Type: Improvement > Components: io-java-aws >Reporter: Akshay Iyangar >Assignee: Akshay Iyangar >Priority: Minor > Time Spent: 3.5h > Remaining Estimate: 0h > > Currently, there is no way to set custom client configuration abilities to > AWS client service. > Enable a way to pass these custom client configuration options as pipeline > options. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7923) Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=428324=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428324 ] ASF GitHub Bot logged work on BEAM-7923: Author: ASF GitHub Bot Created on: 28/Apr/20 18:53 Start Date: 28/Apr/20 18:53 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11507: URL: https://github.com/apache/beam/pull/11507#issuecomment-620791900 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428324) Time Spent: 22h 50m (was: 22h 40m) > Interactive Beam > > > Key: BEAM-7923 > URL: https://issues.apache.org/jira/browse/BEAM-7923 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 22h 50m > Remaining Estimate: 0h > > This is the top level ticket for all efforts leveraging [interactive > Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]] > As the development goes, blocking tickets will be added to this one. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9802) Provide a way to customize automatically started services.
[ https://issues.apache.org/jira/browse/BEAM-9802?focusedWorklogId=428323=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428323 ] ASF GitHub Bot logged work on BEAM-9802: Author: ASF GitHub Bot Created on: 28/Apr/20 18:53 Start Date: 28/Apr/20 18:53 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11495: URL: https://github.com/apache/beam/pull/11495#issuecomment-620792019 Thanks looking. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 428323) Time Spent: 40m (was: 0.5h) > Provide a way to customize automatically started services. > -- > > Key: BEAM-9802 > URL: https://issues.apache.org/jira/browse/BEAM-9802 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > This can be useful for testing and alternative production environments. -- This message was sent by Atlassian Jira (v8.3.4#803005)