[jira] [Commented] (BEAM-9759) Pipeline creation with large number of shards/streams takes long time

2020-04-28 Thread Reuven Lax (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095092#comment-17095092
 ] 

Reuven Lax commented on BEAM-9759:
--

The best solution would be to create a new SplittableDoFn version of the 
Kinesis source. This would have several advantages:

 1. It could support dynamic changes (at run time) to the list of Kinesis 
streams. I believe this is a major reason that you currently need to update the 
pipeline so often, and this would remove that need.

 2. The splitting could then happen at run time instead of at 
graph-construction time, and could also be parallelized.
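
To illustrate only the "parallelized" part of the second point, here is a minimal plain-Java sketch. It is not Beam code and not how a SplittableDoFn would be written. The class name, the isReadable helper, the shard IDs and the 20 ms delay are all hypothetical stand-ins for the per-shard read that the issue description says happens sequentially today.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ParallelShardValidation {

  // Hypothetical stand-in for the per-shard check. The real check reads from
  // Kinesis; here a short sleep simulates the network round trip.
  static boolean isReadable(String shardId) {
    try {
      Thread.sleep(20);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !shardId.endsWith("-closed");
  }

  // Sequential validation: total time grows linearly with the shard count.
  static List<String> validateSequentially(List<String> shardIds) {
    List<String> valid = new ArrayList<>();
    for (String id : shardIds) {
      if (isReadable(id)) {
        valid.add(id);
      }
    }
    return valid;
  }

  // Parallel validation: all checks are submitted to a fixed-size pool first,
  // then the results are collected in the original shard order.
  static List<String> validateInParallel(List<String> shardIds, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<CompletableFuture<Boolean>> checks =
          shardIds.stream()
              .map(id -> CompletableFuture.supplyAsync(() -> isReadable(id), pool))
              .collect(Collectors.toList());
      List<String> valid = new ArrayList<>();
      for (int i = 0; i < shardIds.size(); i++) {
        if (checks.get(i).join()) {
          valid.add(shardIds.get(i));
        }
      }
      return valid;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> shards = Arrays.asList("shard-0", "shard-1-closed", "shard-2", "shard-3");
    System.out.println(validateSequentially(shards));
    System.out.println(validateInParallel(shards, 4));
  }
}
```

Both methods return the same list. With a few hundred shards, the sequential loop pays for every round trip in turn, while the pooled version overlaps them. A SplittableDoFn would go further by moving this work off the machine that submits the pipeline.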

 

> Pipeline creation with large number of shards/streams takes long time
> -
>
> Key: BEAM-9759
> URL: https://issues.apache.org/jira/browse/BEAM-9759
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis, runner-dataflow
>Affects Versions: 2.19.0
>Reporter: Sebastian Graca
>Priority: Major
>
> We are processing multiple Kinesis streams using pipelines running on 
> {{DataflowRunner}}. Starting such a pipeline from a pipeline definition 
> (executing the {{org.apache.beam.sdk.Pipeline.run()}} method) takes a 
> considerable amount of time. In our case:
>  * a pipeline that consumes data from 196 streams (237 shards in total) 
> starts in 7 minutes
>  * a pipeline that consumes data from 111 streams (261 shards in total) 
> starts in 4 minutes
> I've been investigating this and found out that when {{Pipeline.run}} is 
> invoked, the whole pipeline graph is traversed and serialized so it can be 
> passed to the Dataflow backend. Here's part of the stacktrace that shows this 
> traversal:
> {code:java}
> at com.amazonaws.services.kinesis.AmazonKinesisClient.getRecords(AmazonKinesisClient.java:1252)
> at org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.lambda$getRecords$2(SimplifiedKinesisClient.java:137)
> at org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:210)
> at org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getRecords(SimplifiedKinesisClient.java:134)
> at org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getRecords(SimplifiedKinesisClient.java:119)
> at org.apache.beam.sdk.io.kinesis.StartingPointShardsFinder.validateShards(StartingPointShardsFinder.java:195)
> at org.apache.beam.sdk.io.kinesis.StartingPointShardsFinder.findShardsAtStartingPoint(StartingPointShardsFinder.java:115)
> at org.apache.beam.sdk.io.kinesis.DynamicCheckpointGenerator.generate(DynamicCheckpointGenerator.java:59)
> at org.apache.beam.sdk.io.kinesis.KinesisSource.split(KinesisSource.java:88)
> at org.apache.beam.runners.dataflow.internal.CustomSources.serializeToCloudSource(CustomSources.java:87)
> at org.apache.beam.runners.dataflow.ReadTranslator.translateReadHelper(ReadTranslator.java:51)
> at org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1630)
> at org.apache.beam.runners.dataflow.DataflowRunner$StreamingUnboundedRead$ReadWithIdsTranslator.translate(DataflowRunner.java:1627)
> at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:494)
> at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
> at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
> at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
> at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460)
> at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:433)
> at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:192)
> at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:795)
> at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:186)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
> {code}
> As you can see, during serialization the 
> {{org.apache.beam.sdk.io.kinesis.KinesisSource.split}} method is called. This 
> method finds all shards for the stream and also validates each shard by 
> reading from it. Because this process is sequential, it takes considerable time 
> that depends both on the number of streams 

[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428516
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 29/Apr/20 05:07
Start Date: 29/Apr/20 05:07
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11548:
URL: https://github.com/apache/beam/pull/11548#issuecomment-620990823


   @tvalentyn, I changed it to pull licenses only in the Jenkins test. License 
pulling is skipped by default. Can you please take a look?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 428516)
Time Spent: 26h  (was: 25h 50m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 26h
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428505
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 29/Apr/20 04:03
Start Date: 29/Apr/20 04:03
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11548:
URL: https://github.com/apache/beam/pull/11548#issuecomment-620977492


   Run PythonDocker PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428505)
Time Spent: 25h 50m  (was: 25h 40m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 25h 50m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428504
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 29/Apr/20 04:02
Start Date: 29/Apr/20 04:02
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11548:
URL: https://github.com/apache/beam/pull/11548#issuecomment-620977305


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428504)
Time Spent: 25h 40m  (was: 25.5h)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 25h 40m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9418) Support ANY_VALUE aggregation functions

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9418?focusedWorklogId=428503&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428503
 ]

ASF GitHub Bot logged work on BEAM-9418:


Author: ASF GitHub Bot
Created on: 29/Apr/20 03:56
Start Date: 29/Apr/20 03:56
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #11333:
URL: https://github.com/apache/beam/pull/11333#issuecomment-620976106


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 428503)
Time Spent: 1h 40m  (was: 1.5h)

> Support ANY_VALUE aggregation functions
> ---
>
> Key: BEAM-9418
> URL: https://issues.apache.org/jira/browse/BEAM-9418
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: John Mora
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Support the following functionality in BeamSQL:
> {code:java}
> "select t.key, ANY_VALUE(t.column) from t group by t.key";
> {code}
> Spec link: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value





[jira] [Work logged] (BEAM-9418) Support ANY_VALUE aggregation functions

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9418?focusedWorklogId=428502&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428502
 ]

ASF GitHub Bot logged work on BEAM-9418:


Author: ASF GitHub Bot
Created on: 29/Apr/20 03:53
Start Date: 29/Apr/20 03:53
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #11333:
URL: https://github.com/apache/beam/pull/11333#issuecomment-620975620


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 428502)
Time Spent: 1.5h  (was: 1h 20m)

> Support ANY_VALUE aggregation functions
> ---
>
> Key: BEAM-9418
> URL: https://issues.apache.org/jira/browse/BEAM-9418
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: John Mora
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Support the following functionality in BeamSQL:
> {code:java}
> "select t.key, ANY_VALUE(t.column) from t group by t.key";
> {code}
> Spec link: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428501&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428501
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 29/Apr/20 03:48
Start Date: 29/Apr/20 03:48
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11548:
URL: https://github.com/apache/beam/pull/11548#issuecomment-620974622


   Run seed job





Issue Time Tracking
---

Worklog Id: (was: 428501)
Time Spent: 25.5h  (was: 25h 20m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 25.5h
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428492
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 29/Apr/20 03:25
Start Date: 29/Apr/20 03:25
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11552:
URL: https://github.com/apache/beam/pull/11552#issuecomment-620970095


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428492)
Time Spent: 25h 20m  (was: 25h 10m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 25h 20m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Resolved] (BEAM-7885) DoFn.setup() don't run for streaming jobs on DirectRunner.

2020-04-28 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-7885.
-
Fix Version/s: 2.22.0
   Resolution: Fixed

> DoFn.setup() don't run for streaming jobs on DirectRunner. 
> ---
>
> Key: BEAM-7885
> URL: https://issues.apache.org/jira/browse/BEAM-7885
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.14.0
> Environment: Python
>Reporter: niklas Hansson
>Assignee: Pablo Estrada
>Priority: Minor
> Fix For: 2.22.0
>
>
> Since version 2.14.0, the Python SDK has provided setup and teardown methods 
> for DoFn. The setup method is described as: "Called to prepare an instance 
> for processing bundles of elements. This is a good place to initialize 
> transient in-memory resources, such as network connections."
> However, when trying to use it for an unbounded job (Pub/Sub source), it seems 
> like DoFn.setup() is never called and the resources are never initialized. 
> [UPDATE] It works on the Dataflow runner but not on DirectRunner. On the 
> Dataflow runner, DoFn.setup seems to be called multiple times but then 
> never again while the pipeline is processing elements. [UPDATE] On the 
> direct runner I get:
>  
> AttributeError: 'NoneType' object has no attribute 'predict' [while running 
> 'transform the data']
>  
> My source code: [https://github.com/NikeNano/DataflowSklearnStreaming]
>  
> I am happy to contribute example code showing how to use setup as soon as I 
> get it running :)  
>  





[jira] [Assigned] (BEAM-7885) DoFn.setup() don't run for streaming jobs on DirectRunner.

2020-04-28 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada reassigned BEAM-7885:
---

Assignee: Pablo Estrada

> DoFn.setup() don't run for streaming jobs on DirectRunner. 
> ---
>
> Key: BEAM-7885
> URL: https://issues.apache.org/jira/browse/BEAM-7885
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.14.0
> Environment: Python
>Reporter: niklas Hansson
>Assignee: Pablo Estrada
>Priority: Minor
>





[jira] [Commented] (BEAM-7885) DoFn.setup() don't run for streaming jobs on DirectRunner.

2020-04-28 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095009#comment-17095009
 ] 

Pablo Estrada commented on BEAM-7885:
-

This is fixed in PR 11547

> DoFn.setup() don't run for streaming jobs on DirectRunner. 
> ---
>
> Key: BEAM-7885
> URL: https://issues.apache.org/jira/browse/BEAM-7885
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.14.0
> Environment: Python
>Reporter: niklas Hansson
>Priority: Minor
>





[jira] [Commented] (BEAM-9745) [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to deserialize Custom DoFns and Custom Coders.

2020-04-28 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094957#comment-17094957
 ] 

Pablo Estrada commented on BEAM-9745:
-

I am trying to figure out whether this is a regression or not. I'll post an 
update by tomorrow morning.

So far the tests aren't passing on 2.20.0, but they throw a different error, so 
I guess it is hard to just dismiss as a regression:
{code:java}
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -177: java.lang.UnsupportedOperationException: BigQuery source must be split before being read
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173)
at org.apache.beam.fn.harness.BoundedSourceRunner.runReadLoop(BoundedSourceRunner.java:159)
at org.apache.beam.fn.harness.BoundedSourceRunner.start(BoundedSourceRunner.java:146)
at org.apache.beam.fn.harness.data.PTransformFunctionRegistry.lambda$register$0(PTransformFunctionRegistry.java:108)
at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:282)
at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
at org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:332)
at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
at org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:412)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:381)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:306)
at org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:230)
at org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:138)
{code}

> [beam_PostCommit_Java_PortabilityApi] Various GCP IO tests failing, unable to 
> deserialize Custom DoFns and Custom Coders.
> -
>
> Key: BEAM-9745
> URL: https://issues.apache.org/jira/browse/BEAM-9745
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, java-fn-execution, sdk-java-harness, 
> test-failures
>Reporter: Daniel Oliveira
>Assignee: Pablo Estrada
>Priority: Blocker
>  Labels: currently-failing
> Fix For: 2.21.0
>
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4657/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/c3izncsa4u24k/tests/by-project]
> Initial investigation:
> The bug appears to be popping up on BigQuery tests mostly, but also a 
> BigTable and a Datastore test.
> Here's an example stacktrace of the two errors, showing _only_ the error 
> messages themselves. Source: 
> [https://scans.gradle.com/s/c3izncsa4u24k/tests/efn4wciuamvqq-ccxt3jvofvqbe]
> {noformat}
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error 
> received from SDK harness for instruction -191: 
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With 
> Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> ...
> Caused by: java.lang.RuntimeException: Error received from SDK harness for 
> instruction -191: java.lang.IllegalArgumentException: unable to deserialize 
> Custom DoFn With Execution Info
> ...
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error 
> received from SDK harness for instruction -206: 
> 

[jira] [Work logged] (BEAM-9802) Provide a way to customize automatically started services.

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9802?focusedWorklogId=428453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428453
 ]

ASF GitHub Bot logged work on BEAM-9802:


Author: ASF GitHub Bot
Created on: 29/Apr/20 00:51
Start Date: 29/Apr/20 00:51
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11495:
URL: https://github.com/apache/beam/pull/11495#issuecomment-620929683


   Run Python PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428453)
Time Spent: 1h 20m  (was: 1h 10m)

> Provide a way to customize automatically started services.
> --
>
> Key: BEAM-9802
> URL: https://issues.apache.org/jira/browse/BEAM-9802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This can be useful for testing and alternative production environments. 





[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=428451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428451
 ]

ASF GitHub Bot logged work on BEAM-8949:


Author: ASF GitHub Bot
Created on: 29/Apr/20 00:33
Start Date: 29/Apr/20 00:33
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11210:
URL: https://github.com/apache/beam/pull/11210#issuecomment-620924876


   @chamikaramj - could you review this PR?





Issue Time Tracking
---

Worklog Id: (was: 428451)
Time Spent: 7h 50m  (was: 7h 40m)

> Add Spanner IO Integration Test for Python
> --
>
> Key: BEAM-8949
> URL: https://issues.apache.org/jira/browse/BEAM-8949
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shoaib Zafar
>Assignee: Shoaib Zafar
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Spanner IO (Python SDK) contains a PTransform that uses the Batch API to read 
> from Spanner. Currently, it only has DirectRunner unit tests. In 
> order to make this functionality available to users, integration tests 
> also need to be added.





[jira] [Work stopped] (BEAM-9117) Clustering coder does not get used for BQ multi-partitioned tables

2020-04-28 Thread Chamikara Madhusanka Jayalath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9117 stopped by Chamikara Madhusanka Jayalath.
---
> Clustering coder does not get used for BQ multi-partitioned tables
> --
>
> Key: BEAM-9117
> URL: https://issues.apache.org/jira/browse/BEAM-9117
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work started] (BEAM-9117) Clustering coder does not get used for BQ multi-partitioned tables

2020-04-28 Thread Chamikara Madhusanka Jayalath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9117 started by Chamikara Madhusanka Jayalath.
---
> Clustering coder does not get used for BQ multi-partitioned tables
> --
>
> Key: BEAM-9117
> URL: https://issues.apache.org/jira/browse/BEAM-9117
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-9847) Verify If Triggering allows emitting eager results when processing a single element in HL7v2IO.

2020-04-28 Thread Jacob Ferriero (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094947#comment-17094947
 ] 

Jacob Ferriero commented on BEAM-9847:
--

This test seems to confirm that it is not possible to emit the outputs of a 
processElement call before the call has completed.

[https://gist.github.com/jaketf/d3c2e70dde781bbb0ef1993446e34b71]
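
As a toy model of that behavior (plain Java, not Beam code and not the test in the gist), the sketch below buffers everything a process call emits and hands it downstream only after the call returns. The class, the ProcessFn interface, the runBundle method and the "hl7v2-store" value are hypothetical names chosen for illustration; real runners may buffer and commit differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class AtomicBundleSketch {

  // Hypothetical stand-in for a DoFn's process-element method: it receives one
  // input and an output receiver, and may emit any number of outputs.
  interface ProcessFn {
    void process(String element, Consumer<Integer> out);
  }

  // Toy "runner": outputs are buffered while process() runs and handed to the
  // downstream stage in one step, only after process() has returned.
  static void runBundle(String element, ProcessFn fn, Consumer<List<Integer>> downstream) {
    List<Integer> buffer = new ArrayList<>();
    fn.process(element, buffer::add);
    downstream.accept(buffer);
  }

  // Records the order of events for a process call that emits three "pages".
  static List<String> trace() {
    List<String> events = new ArrayList<>();
    runBundle(
        "hl7v2-store",
        (store, out) -> {
          for (int page = 1; page <= 3; page++) {
            events.add("emit page " + page);
            out.accept(page);
          }
        },
        committed -> events.add("downstream receives " + committed));
    return events;
  }

  public static void main(String[] args) {
    trace().forEach(System.out::println);
  }
}
```

In this model the downstream stage sees all three pages at once, after the last one has been emitted. That matches the symptom of a transform that appears to hang while it paginates.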

> Verify If Triggering allows emitting eager results when processing a single 
> element in HL7v2IO.
> ---
>
> Key: BEAM-9847
> URL: https://issues.apache.org/jira/browse/BEAM-9847
> Project: Beam
>  Issue Type: Task
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>
> Due to the nature of the HL7v2 API, HL7v2IO.ListHL7v2Messages follows the 
> pattern of [paginating through all ListMessages results in a single 
> ProcessElement call.|#diff-9cd595f078378218ccc01ce7e19ca766R447]
>  
> When testing with a customer against an HL7v2 store with 350k messages, we 
> observed that the ListMessages transform did not output any elements and 
> appeared to hang for a long time (which we assumed was the single thread 
> paginating through all the results).
>  
> We added the following triggering in the hope that it would emit early results:
> {code:java}
>   .apply(
>       Window.into(new GlobalWindows())
>           .triggering(
>               AfterWatermark.pastEndOfWindow()
>                   .withEarlyFirings(
>                       AfterProcessingTime.pastFirstElementInPane()
>                           .plusDelayOf(Duration.standardSeconds(1))))
>           .discardingFiredPanes())
> {code}
> Our tests with this triggering seemed to indicate that the pipeline did not 
> "hang" as in the first test and instead produced a steadier stream of elements.
> A reviewer states that bundles must be committed atomically, so no output 
> elements of a single process element call can proceed to downstream stages 
> until all output elements for that call are ready.
> There may be other things at play here. I will try to reproduce this in a way 
> that definitively confirms whether output elements can be emitted eagerly 
> during the execution of a single process element call, before it completes.
> CC: [~pabloem]
>  
>  





[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428446
 ]

ASF GitHub Bot logged work on BEAM-9831:


Author: ASF GitHub Bot
Created on: 29/Apr/20 00:18
Start Date: 29/Apr/20 00:18
Worklog Time Spent: 10m 
  Work Description: jaketf commented on a change in pull request #11538:
URL: https://github.com/apache/beam/pull/11538#discussion_r417000696



##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
##
@@ -437,6 +444,20 @@ private Message fetchMessage(HealthcareApiClient client, 
String msgId)
   .apply(Create.of(this.hl7v2Stores))
   .apply(ParDo.of(new ListHL7v2MessagesFn(this.filter)))
   .setCoder(new HL7v2MessageCoder())
+  // Listing takes a long time for each input element (HL7v2 store) 
because it has to
+  // paginate through results in a single thread / ProcessElement call 
in order to keep
+  // track of page token.
+  // Eagerly emit data on 1 second intervals so downstream processing 
can get started before
+  // all of the list results have been paginated through.

Review comment:
   @pabloem does this mean that all of a single element's output must be 
buffered in memory? Or will the runner be smart enough to spill to disk?
   
   Based on my initial investigation, I was not able to reproduce the behavior 
reported by the customer in a unit test. The investigation is summarized in this 
[gist](https://gist.github.com/jaketf/d3c2e70dde781bbb0ef1993446e34b71).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 428446)
Time Spent: 1h 20m  (was: 1h 10m)

> HL7v2IO Improvements
> 
>
> Key: BEAM-9831
> URL: https://issues.apache.org/jira/browse/BEAM-9831
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> # HL7v2MessageCoder constructor should be public for use by end users
>  # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list 
> messages results before emitting any output data elements (due to high fan 
> out from a single input element). We should add early firings so that 
> downstream processing can proceed on early pages while later pages are still 
> being scrolled through.
>  # We should drop all output only fields of HL7v2Message and only keep data 
> and labels when calling ingestMessages, rather than expecting the user to do 
> this.





[jira] [Created] (BEAM-9849) Caching license files for license pulling

2020-04-28 Thread Hannah Jiang (Jira)
Hannah Jiang created BEAM-9849:
--

 Summary: Caching license files for license pulling
 Key: BEAM-9849
 URL: https://issues.apache.org/jira/browse/BEAM-9849
 Project: Beam
  Issue Type: Task
  Components: build-system
Reporter: Hannah Jiang


Licenses are pulled every time a Docker image is created.
We need to come up with a caching approach so that the same file is pulled 
only once.
This caching approach should be usable by all images released by Beam, 
including SDK Docker images, Flink and Spark job server images, etc.
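
One possible shape for such a cache, sketched in Python under stated assumptions: files are stored in a shared directory under a hash of their URL, and `fetch` is a hypothetical callable standing in for whatever actually downloads a license. None of these names come from Beam's build scripts.

```python
import hashlib
import os


def cached_license(url, cache_dir, fetch):
    """Returns the path of the cached license file for `url`.

    `fetch` (hypothetical) takes a URL and returns the file content as bytes.
    It is only called when the file is not in the cache yet.
    """
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key)
    if not os.path.exists(path):
        content = fetch(url)
        # Write to a temporary name first so a failed download never
        # leaves a partial file under the final cache key.
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(content)
        os.replace(tmp_path, path)
    return path
```

If the cache directory were shared between image builds (for example through a mounted volume), each license would be downloaded only on the first build that needs it.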





[jira] [Work logged] (BEAM-9802) Provide a way to customize automatically started services.

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9802?focusedWorklogId=428439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428439
 ]

ASF GitHub Bot logged work on BEAM-9802:


Author: ASF GitHub Bot
Created on: 28/Apr/20 23:55
Start Date: 28/Apr/20 23:55
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11495:
URL: https://github.com/apache/beam/pull/11495#issuecomment-620913946


   LGTM





Issue Time Tracking
---

Worklog Id: (was: 428439)
Time Spent: 1h 10m  (was: 1h)

> Provide a way to customize automatically started services.
> --
>
> Key: BEAM-9802
> URL: https://issues.apache.org/jira/browse/BEAM-9802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This can be useful for testing and alternative production environments. 





[jira] [Commented] (BEAM-58) Support Google Cloud Storage encryption keys

2020-04-28 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094931#comment-17094931
 ] 

Udi Meiri commented on BEAM-58:
---

I have not worked on this; I have only worked on customer-*managed* keys, not customer-*supplied* ones.

> Support Google Cloud Storage encryption keys
> 
>
> Key: BEAM-58
> URL: https://issues.apache.org/jira/browse/BEAM-58
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Dan Halperin
>Assignee: Udi Meiri
>Priority: Minor
>
> Customer-supplied encryption keys are now in Beta. 
> https://cloud.google.com/compute/docs/disks/customer-supplied-encryption





[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=428436=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428436
 ]

ASF GitHub Bot logged work on BEAM-9577:


Author: ASF GitHub Bot
Created on: 28/Apr/20 23:51
Start Date: 28/Apr/20 23:51
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11521:
URL: https://github.com/apache/beam/pull/11521#issuecomment-620913082


   R: @ihji 





Issue Time Tracking
---

Worklog Id: (was: 428436)
Time Spent: 20h  (was: 19h 50m)

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 20h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9802) Provide a way to customize automatically started services.

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9802?focusedWorklogId=428435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428435
 ]

ASF GitHub Bot logged work on BEAM-9802:


Author: ASF GitHub Bot
Created on: 28/Apr/20 23:49
Start Date: 28/Apr/20 23:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11495:
URL: https://github.com/apache/beam/pull/11495#issuecomment-620912558


   Thanks. Rebased. 





Issue Time Tracking
---

Worklog Id: (was: 428435)
Time Spent: 1h  (was: 50m)

> Provide a way to customize automatically started services.
> --
>
> Key: BEAM-9802
> URL: https://issues.apache.org/jira/browse/BEAM-9802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This can be useful for testing and alternative production environments. 





[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=428433=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428433
 ]

ASF GitHub Bot logged work on BEAM-9692:


Author: ASF GitHub Bot
Created on: 28/Apr/20 23:32
Start Date: 28/Apr/20 23:32
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11452:
URL: https://github.com/apache/beam/pull/11452#issuecomment-620907966


   R: @robertwb 





Issue Time Tracking
---

Worklog Id: (was: 428433)
Time Spent: 1h 50m  (was: 1h 40m)

> Clean Python DataflowRunner to use portable pipelines
> -
>
> Key: BEAM-9692
> URL: https://issues.apache.org/jira/browse/BEAM-9692
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=428432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428432
 ]

ASF GitHub Bot logged work on BEAM-8949:


Author: ASF GitHub Bot
Created on: 28/Apr/20 23:30
Start Date: 28/Apr/20 23:30
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11210:
URL: https://github.com/apache/beam/pull/11210#issuecomment-620907421


   Run Python PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428432)
Time Spent: 7h 40m  (was: 7.5h)

> Add Spanner IO Integration Test for Python
> --
>
> Key: BEAM-8949
> URL: https://issues.apache.org/jira/browse/BEAM-8949
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shoaib Zafar
>Assignee: Shoaib Zafar
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Spanner IO (Python SDK) contains a PTransform which uses the BatchAPI to read 
> from Spanner. Currently, it only has direct runner unit tests. In 
> order to make this functionality available to users, integration tests 
> also need to be added.





[jira] [Work started] (BEAM-9449) Consider passing pipeline options for expansion service.

2020-04-28 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9449 started by Brian Hulette.
---
> Consider passing pipeline options for expansion service.
> 
>
> Key: BEAM-9449
> URL: https://issues.apache.org/jira/browse/BEAM-9449
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Brian Hulette
>Priority: Major
>






[jira] [Updated] (BEAM-9449) Consider passing pipeline options for expansion service.

2020-04-28 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9449:

Status: Open  (was: Triage Needed)

> Consider passing pipeline options for expansion service.
> 
>
> Key: BEAM-9449
> URL: https://issues.apache.org/jira/browse/BEAM-9449
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Brian Hulette
>Priority: Major
>






[jira] [Assigned] (BEAM-9449) Consider passing pipeline options for expansion service.

2020-04-28 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-9449:
---

Assignee: Brian Hulette

> Consider passing pipeline options for expansion service.
> 
>
> Key: BEAM-9449
> URL: https://issues.apache.org/jira/browse/BEAM-9449
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Brian Hulette
>Priority: Major
>






[jira] [Closed] (BEAM-9848) Pass caller pipeline options to expansion service

2020-04-28 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette closed BEAM-9848.
---
Fix Version/s: 2.22.0
   Resolution: Duplicate

> Pass caller pipeline options to expansion service
> -
>
> Key: BEAM-9848
> URL: https://issues.apache.org/jira/browse/BEAM-9848
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.22.0
>
>






[jira] [Work logged] (BEAM-9841) PortableRunner does not support wait_until_finish(duration=...)

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9841?focusedWorklogId=428429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428429
 ]

ASF GitHub Bot logged work on BEAM-9841:


Author: ASF GitHub Bot
Created on: 28/Apr/20 23:15
Start Date: 28/Apr/20 23:15
Worklog Time Spent: 10m 
  Work Description: ibzib commented on a change in pull request #11556:
URL: https://github.com/apache/beam/pull/11556#discussion_r416974908



##
File path: sdks/python/apache_beam/runners/portability/portable_runner.py
##
@@ -433,13 +435,12 @@ def run_pipeline(self, pipeline, options):
 state_stream,
 cleanup_callbacks)
 if cleanup_callbacks:
-  # We wait here to ensure that we run the cleanup callbacks.
+  # Register an exit handler to ensure cleanup on exit.
+  atexit.register(functools.partial(result._cleanup, on_exit=True))
   _LOGGER.info(
-  'Waiting until the pipeline has finished because the '
-  'environment "%s" has started a component necessary for the '
-  'execution.',
+  'Environment "%s" has started a component necessary for the '
+  'execution. Be sure to call wait_until_finish()',

Review comment:
   We prefer the use of Python's `with` syntax instead of calling 
`wait_until_finish` explicitly.
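
For context, a minimal sketch of what the `with` form buys: leaving the block runs the pipeline and waits for it, so the caller cannot forget to wait. `FakePipeline` and `FakeResult` are hypothetical stand-ins, not Beam classes; they only mimic the pattern.

```python
class FakeResult:
    """Hypothetical stand-in for a pipeline result."""

    def __init__(self):
        self.finished = False

    def wait_until_finish(self):
        self.finished = True
        return "DONE"


class FakePipeline:
    """Hypothetical stand-in for a pipeline usable in a `with` block."""

    def __init__(self):
        self.result = None

    def run(self):
        self.result = FakeResult()
        return self.result

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Only run the pipeline if the body of the `with` block succeeded.
        if exc_type is None:
            self.run().wait_until_finish()
        return False  # never swallow exceptions


with FakePipeline() as p:
    pass  # transforms would be applied to `p` here
```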

##
File path: sdks/python/apache_beam/runners/portability/portable_runner.py
##
@@ -535,7 +537,7 @@ def _last_error_message(self):
 else:
   return 'unknown error'
 
-  def wait_until_finish(self):
+  def wait_until_finish(self, duration=None):

Review comment:
   A comment explaining the units for duration (milliseconds?) and that 
`duration=None` actually means "wait forever" would be helpful. (I find this 
naming somewhat counter-intuitive, but it's too late to change now.)
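
A sketch of the semantics described in this comment, assuming (as the `duration / 1000` conversion in the patch suggests) that `duration` is given in milliseconds. `wait_for_thread` is a hypothetical helper, not part of the PR.

```python
import threading


def wait_for_thread(thread, duration_ms=None):
    """Waits for `thread` to finish and reports whether it did.

    duration_ms is the maximum time to wait in milliseconds;
    None means wait without a time limit.
    """
    timeout_secs = None if duration_ms is None else duration_ms / 1000.0
    thread.join(timeout_secs)
    return not thread.is_alive()
```

Comparing against `None` instead of relying on truthiness means that `duration_ms=0` is treated as "do not wait" rather than "wait forever".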

##
File path: sdks/python/apache_beam/runners/portability/portable_runner.py
##
@@ -557,23 +559,47 @@ def read_messages():
 t.daemon = True
 t.start()
 
+if duration:
+  t2 = threading.Thread(

Review comment:
   Please give `t` and `t2` descriptive variable names.

##
File path: sdks/python/apache_beam/runners/portability/portable_runner.py
##
@@ -557,23 +559,47 @@ def read_messages():
 t.daemon = True
 t.start()
 
+if duration:
+  t2 = threading.Thread(
+  target=functools.partial(self._observe, t),
+  name='wait_until_finish_state_observer')
+  t2.daemon = True
+  t2.start()
+  start_time = time.time()
+  duration_secs = duration / 1000
+  while time.time() - start_time < duration_secs and t2.is_alive():
+time.sleep(1)
+else:
+  self._observe(t)
+
+if self._runtime_exception:
+  raise self._runtime_exception
+
+return self._state
+
+  def _observe(self, message_thread):
 try:
   for state_response in self._state_stream:
 self._state = self._runner_api_state_to_pipeline_state(
 state_response.state)
 if state_response.state in TERMINAL_STATES:
   # Wait for any last messages.
-  t.join(10)
+  message_thread.join(10)
   break
   if self._state != runner.PipelineState.DONE:
-raise RuntimeError(
+self._runtime_exception = RuntimeError(
 'Pipeline %s failed in state %s: %s' %
 (self._job_id, self._state, self._last_error_message()))
-  return self._state
+except Exception as e:
+  self._runtime_exception = e
 finally:
   self._cleanup()
 
-  def _cleanup(self):
+  def _cleanup(self, on_exit=False):
+if on_exit and self._cleanup_callbacks:
+  _LOGGER.info(
+  'Running cleanup on exit. If your local pipeline should continue '
+  'running, be sure to call pipeline.run().wait_until_finish().')

Review comment:
   `with` (see above comment)







Issue Time Tracking
---

Worklog Id: (was: 428429)
Time Spent: 20m  (was: 10m)

> PortableRunner does not support wait_until_finish(duration=...)
> ---
>
> Key: BEAM-9841
> URL: https://issues.apache.org/jira/browse/BEAM-9841
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Other runners in the Python SDK support waiting for a finite amount of time 

[jira] [Work logged] (BEAM-8949) Add Spanner IO Integration Test for Python

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8949?focusedWorklogId=428430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428430
 ]

ASF GitHub Bot logged work on BEAM-8949:


Author: ASF GitHub Bot
Created on: 28/Apr/20 23:15
Start Date: 28/Apr/20 23:15
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11210:
URL: https://github.com/apache/beam/pull/11210#issuecomment-620903075


   Run Python PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428430)
Time Spent: 7.5h  (was: 7h 20m)

> Add Spanner IO Integration Test for Python
> --
>
> Key: BEAM-8949
> URL: https://issues.apache.org/jira/browse/BEAM-8949
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shoaib Zafar
>Assignee: Shoaib Zafar
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Spanner IO (Python SDK) contains a PTransform which uses the BatchAPI to read 
> from Spanner. Currently, it only has direct runner unit tests. In 
> order to make this functionality available to users, integration tests 
> also need to be added.





[jira] [Created] (BEAM-9848) Pass caller pipeline options to expansion service

2020-04-28 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-9848:
---

 Summary: Pass caller pipeline options to expansion service
 Key: BEAM-9848
 URL: https://issues.apache.org/jira/browse/BEAM-9848
 Project: Beam
  Issue Type: Improvement
  Components: beam-model
Reporter: Brian Hulette
Assignee: Brian Hulette








[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=428428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428428
 ]

ASF GitHub Bot logged work on BEAM-9008:


Author: ASF GitHub Bot
Created on: 28/Apr/20 23:05
Start Date: 28/Apr/20 23:05
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #10546:
URL: https://github.com/apache/beam/pull/10546#discussion_r416976043



##
File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java
##
@@ -391,36 +393,57 @@ static Cluster getCluster(
 .withSplitCount(splitCount)
 .withMapperFactoryFn(this.mapperFactoryFn());
 
-if (isMurmur3Partitioner(cluster)) {
-  LOG.info("Murmur3Partitioner detected, splitting");
-
-  List tokens =
-  cluster.getMetadata().getTokenRanges().stream()
-  .map(tokenRange -> new 
BigInteger(tokenRange.getEnd().getValue().toString()))
-  .collect(Collectors.toList());
-
-  SplitGenerator splitGenerator =
-  new SplitGenerator(cluster.getMetadata().getPartitioner());
-
-  List> splits =
-  splitGenerator.generateSplits(splitCount, tokens).stream()
-  .map(rr -> CassandraIO.read().withRingRange(rr))
-  .collect(Collectors.toList());
+return input
+.apply(Create.of(this))
+.apply(ParDo.of(new SplitFn()))
+.setCoder(SerializableCoder.of(new TypeDescriptor>() {}))
+.apply(Reshuffle.viaRandomKey())
+.apply(readAll);

Review comment:
   Maybe we need to tackle this in two parts now: first, implement the read 
with a `ReadFn`-like method; then, as a next step, get rid of all the methods 
on `ReadAll` to simplify it to its core.







Issue Time Tracking
---

Worklog Id: (was: 428428)
Time Spent: 12h  (was: 11h 50m)

> Add readAll() method to CassandraIO
> ---
>
> Key: BEAM-9008
> URL: https://issues.apache.org/jira/browse/BEAM-9008
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-cassandra
>Affects Versions: 2.16.0
>Reporter: vincent marquez
>Assignee: vincent marquez
>Priority: Minor
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> When querying a large Cassandra database, it's often *much* more useful to 
> programmatically generate the queries that need to be run rather than reading 
> all partitions and attempting some filtering.  
> As an example:
> {code:java}
> public class Event { 
>@PartitionKey(0) public UUID accountId;
>@PartitionKey(1) public String yearMonthDay; 
>@ClusteringKey public UUID eventId;  
>//other data...
> }{code}
> If there are ten years' worth of data, you may want to query only one year's 
> worth.  Here each token range would represent one 'token' but all events for 
> the day. 
> {code:java}
> Set accounts = getRelevantAccounts();
> Set dateRange = generateDateRange("2018-01-01", "2019-01-01");
> PCollection tokens = generateTokens(accounts, dateRange); 
> {code}
>  
>  I propose an additional _readAll()_ PTransform that can take a PCollection 
> of token ranges and can return a PCollection of what the query would 
> return. 
> *Question: How much code should be in common between both methods?* 
> Currently the read connector already groups all partitions into a List of 
> Token Ranges, so it would be simple to refactor the current read() based 
> method to a 'ParDo' based one and have them both share the same function.  
> Reasons against sharing code between read and readAll:
>  * Not having the read-based method return a BoundedSource connector would 
> mean losing the ability to know the size of the data returned.
>  * Currently the CassandraReader executes all the grouped TokenRange queries 
> *asynchronously*, which is (maybe?) fine when all that's happening is 
> splitting up all the partition ranges but terrible for executing potentially 
> millions of queries. 
>  Reasons _for_ sharing code would be a simplified code base and that both of 
> the above issues would most likely have a negligible performance impact. 
>  
>  
>  
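
The query generation described in the quoted issue could look roughly like the following Python sketch. `generate_date_range` and `generate_partition_keys` are hypothetical helpers loosely named after the pseudocode in the issue; the end date is assumed to be exclusive, and `yearMonthDay` is assumed to use the ISO `YYYY-MM-DD` format.

```python
from datetime import date, timedelta
from itertools import product


def generate_date_range(start, end):
    """Returns every day from `start` up to, but excluding, `end`."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    days = (last - first).days
    return [(first + timedelta(days=i)).isoformat() for i in range(days)]


def generate_partition_keys(accounts, date_range):
    """Pairs every account with every day: one partition key per query."""
    return list(product(accounts, date_range))
```

Each resulting `(accountId, yearMonthDay)` pair identifies one partition, so only the partitions of interest are queried instead of scanning and filtering the whole table.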





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428426
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 28/Apr/20 23:05
Start Date: 28/Apr/20 23:05
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on a change in pull request 
#11548:
URL: https://github.com/apache/beam/pull/11548#discussion_r416975896



##
File path: sdks/java/container/build.gradle
##
@@ -101,16 +84,44 @@ docker {
   project.rootProject["docker-tag"] : project.sdk_version)
   dockerfile project.file("./${dockerfileName}")
   files "./build/"
+  buildArgs(['pull_licenses': 
!project.rootProject.hasProperty(["no-licenses"])])

Review comment:
   The purpose of checking the URLs was to make sure the files can be pulled 
from them when images are created. If we skip it, no URL checking happens 
for PRs, and we may see many license issues when we create release 
images, which release managers would then need to fix. If we check the 
URLs for each PR, contributors should fix any issues themselves, and nobody 
will have to fix them during release. 







Issue Time Tracking
---

Worklog Id: (was: 428426)
Time Spent: 25h  (was: 24h 50m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 25h
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428427
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 28/Apr/20 23:05
Start Date: 28/Apr/20 23:05
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on a change in pull request 
#11548:
URL: https://github.com/apache/beam/pull/11548#discussion_r416976074



##
File path: sdks/java/container/build.gradle
##
@@ -101,16 +84,44 @@ docker {
   project.rootProject["docker-tag"] : project.sdk_version)
   dockerfile project.file("./${dockerfileName}")
   files "./build/"
+  buildArgs(['pull_licenses': 
!project.rootProject.hasProperty(["no-licenses"])])

Review comment:
   There is some more discussion on the thread today, in case you haven't 
seen it yet.







Issue Time Tracking
---

Worklog Id: (was: 428427)
Time Spent: 25h 10m  (was: 25h)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 25h 10m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=428425=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428425
 ]

ASF GitHub Bot logged work on BEAM-9008:


Author: ASF GitHub Bot
Created on: 28/Apr/20 23:04
Start Date: 28/Apr/20 23:04
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #10546:
URL: https://github.com/apache/beam/pull/10546#discussion_r416975426



##
File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java
##
@@ -391,36 +393,57 @@ static Cluster getCluster(
 .withSplitCount(splitCount)
 .withMapperFactoryFn(this.mapperFactoryFn());
 
-if (isMurmur3Partitioner(cluster)) {
-  LOG.info("Murmur3Partitioner detected, splitting");
-
-  List tokens =
-  cluster.getMetadata().getTokenRanges().stream()
-  .map(tokenRange -> new 
BigInteger(tokenRange.getEnd().getValue().toString()))
-  .collect(Collectors.toList());
-
-  SplitGenerator splitGenerator =
-  new SplitGenerator(cluster.getMetadata().getPartitioner());
-
-  List> splits =
-  splitGenerator.generateSplits(splitCount, tokens).stream()
-  .map(rr -> CassandraIO.read().withRingRange(rr))
-  .collect(Collectors.toList());
+return input
+.apply(Create.of(this))
+.apply(ParDo.of(new SplitFn()))
+.setCoder(SerializableCoder.of(new TypeDescriptor>() {}))
+.apply(Reshuffle.viaRandomKey())
+.apply(readAll);
+  }
+}
 
-  return input.apply("Creating splits", 
Create.of(splits)).apply("readAll", readAll);
+private static class SplitFn extends DoFn, Read> {

Review comment:
   This looks perfect!







Issue Time Tracking
---

Worklog Id: (was: 428425)
Time Spent: 11h 50m  (was: 11h 40m)

> Add readAll() method to CassandraIO
> ---
>
> Key: BEAM-9008
> URL: https://issues.apache.org/jira/browse/BEAM-9008
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-cassandra
>Affects Versions: 2.16.0
>Reporter: vincent marquez
>Assignee: vincent marquez
>Priority: Minor
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> When querying a large Cassandra database, it's often *much* more useful to 
> programmatically generate the queries that need to be run rather than reading 
> all partitions and attempting some filtering.  
> As an example:
> {code:java}
> public class Event { 
>@PartitionKey(0) public UUID accountId;
>@PartitionKey(1) public String yearMonthDay; 
>@ClusteringKey public UUID eventId;  
>//other data...
> }{code}
> If there are ten years' worth of data, you may want to query only one year's 
> worth.  Here each token range would represent one 'token' but all events for 
> the day. 
> {code:java}
> Set accounts = getRelevantAccounts();
> Set dateRange = generateDateRange("2018-01-01", "2019-01-01");
> PCollection tokens = generateTokens(accounts, dateRange); 
> {code}
>  
>  I propose an additional _readAll()_ PTransform that can take a PCollection 
> of token ranges and can return a PCollection of what the query would 
> return. 
> *Question: How much code should be in common between both methods?* 
> Currently the read connector already groups all partitions into a List of 
> Token Ranges, so it would be simple to refactor the current read() based 
> method to a 'ParDo' based one and have them both share the same function.  
> Reasons against sharing code between read and readAll
>  * Not having the read based method return a BoundedSource connector would 
> mean losing the ability to know the size of the data returned
>  * Currently the CassandraReader executes all the grouped TokenRange queries 
> *asynchronously* which is (maybe?) fine when all that's happening is 
> splitting up all the partition ranges but terrible for executing potentially 
> millions of queries. 
>  Reasons _for_ sharing code would be a simplified code base and that both of 
> the above issues would most likely have a negligible performance impact. 
>  
>  
>  
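The date-range step in the BEAM-9008 example above can be sketched in plain Java. This is only an illustration of how the query inputs might be generated: `generateDateRange` is named in the issue but never defined there, so the body below is an assumption, and `DateRangeSketch` and `partitionKeys` are hypothetical names that are not part of CassandraIO. It returns a `List` rather than the `Set` used in the issue, purely to keep the ordering visible.

```java
import java.time.LocalDate;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper, not part of CassandraIO: builds the partition keys to query. */
final class DateRangeSketch {
    private DateRangeSketch() {}

    /** Returns one yyyy-MM-dd string per day in [startInclusive, endExclusive). */
    static List<String> generateDateRange(String startInclusive, String endExclusive) {
        LocalDate end = LocalDate.parse(endExclusive);
        List<String> days = new ArrayList<>();
        for (LocalDate day = LocalDate.parse(startInclusive); day.isBefore(end); day = day.plusDays(1)) {
            days.add(day.toString()); // LocalDate.toString() is ISO-8601, i.e. yyyy-MM-dd
        }
        return days;
    }

    /** Pairs every account with every day: one (accountId, yearMonthDay) key per query. */
    static List<Map.Entry<UUID, String>> partitionKeys(Collection<UUID> accountIds, List<String> days) {
        List<Map.Entry<UUID, String>> keys = new ArrayList<>();
        for (UUID accountId : accountIds) {
            for (String day : days) {
                keys.add(new AbstractMap.SimpleImmutableEntry<>(accountId, day));
            }
        }
        return keys;
    }
}
```

The number of keys grows as accounts times days, which is why the issue worries about "potentially millions of queries" being executed asynchronously.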



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9847) Verify If Triggering allows emitting eager results when processing a single element in HL7v2IO.

2020-04-28 Thread Jacob Ferriero (Jira)
Jacob Ferriero created BEAM-9847:


 Summary: Verify If Triggering allows emitting eager results when 
processing a single element in HL7v2IO.
 Key: BEAM-9847
 URL: https://issues.apache.org/jira/browse/BEAM-9847
 Project: Beam
  Issue Type: Task
  Components: io-java-gcp
Reporter: Jacob Ferriero
Assignee: Jacob Ferriero


Due to the nature of the HL7v2 API, HL7v2IO.ListHL7v2Messages follows the 
pattern of [paginating through all ListMessages results in a single 
ProcessElement call.|#diff-9cd595f078378218ccc01ce7e19ca766R447]]

 

When testing with a customer against an HL7v2 store with 350k messages, we 
observed that the ListMessages transform output no elements and appeared to 
"hang" for a long time (which we assumed was the single thread paginating 
through all the results).

 
We added the following triggering in hopes that it would emit early results:

{code:java}
  .apply(
      Window.into(new GlobalWindows())
          .triggering(
              AfterWatermark.pastEndOfWindow()
                  .withEarlyFirings(
                      AfterProcessingTime.pastFirstElementInPane()
                          .plusDelayOf(Duration.standardSeconds(1))))
          .discardingFiredPanes())
{code}


Our tests with this triggering seemed to indicate that it did not "hang" like 
the first test and seemed to output a more steady stream of elements.

A reviewer states that bundles must be committed atomically, so no output 
elements of a single ProcessElement call can proceed to downstream stages until 
all output elements for that call are ready.

There may be other things at play here. Will seek to reproduce in a way that 
definitively confirms output elements can be eagerly output during the 
execution of a single process element call before it completes.
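
The single-threaded pagination described above can be sketched in plain Java. This is a minimal illustration, not the HL7v2IO code: `PaginationSketch`, `Page`, and `PageFetcher` are hypothetical names standing in for the list API client. It shows why one call is inherently sequential (each request needs the token returned by the previous one). It does not model bundle commits, so it says nothing about when a runner makes the emitted elements visible downstream.

```java
import java.util.List;
import java.util.function.Consumer;

final class PaginationSketch {
    private PaginationSketch() {}

    /** One page of results plus the token for the next page (null on the last page). */
    static final class Page {
        final List<String> messages;
        final String nextPageToken;

        Page(List<String> messages, String nextPageToken) {
            this.messages = messages;
            this.nextPageToken = nextPageToken;
        }
    }

    /** Hypothetical stand-in for a list API client; pageToken is null for the first page. */
    interface PageFetcher {
        Page fetch(String pageToken);
    }

    /** Walks all pages in order, emitting each message as soon as its page arrives. */
    static int listAll(PageFetcher fetcher, Consumer<String> output) {
        int pagesFetched = 0;
        String token = null;
        do {
            Page page = fetcher.fetch(token);
            page.messages.forEach(output);
            token = page.nextPageToken; // the next request cannot start without this
            pagesFetched++;
        } while (token != null);
        return pagesFetched;
    }
}
```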

CC: [~pabloem]

 

 





[jira] [Work logged] (BEAM-9285) Add Postcommit ValidatesRunner CI Job for Flink on Java 11

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9285?focusedWorklogId=428418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428418
 ]

ASF GitHub Bot logged work on BEAM-9285:


Author: ASF GitHub Bot
Created on: 28/Apr/20 22:40
Start Date: 28/Apr/20 22:40
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #11480:
URL: https://github.com/apache/beam/pull/11480#issuecomment-620892620


   Run Flink ValidatesRunner Java 11





Issue Time Tracking
---

Worklog Id: (was: 428418)
Time Spent: 3h 50m  (was: 3h 40m)

> Add Postcommit ValidatesRunner CI Job for Flink on Java 11
> --
>
> Key: BEAM-9285
> URL: https://issues.apache.org/jira/browse/BEAM-9285
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Flink 1.10 runner introduces support for Java 11. We need to add a job in the 
> CI that covers the complete suite of Flink 1.10 runner tests on Java 11.





[jira] [Work logged] (BEAM-9285) Add Postcommit ValidatesRunner CI Job for Flink on Java 11

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9285?focusedWorklogId=428417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428417
 ]

ASF GitHub Bot logged work on BEAM-9285:


Author: ASF GitHub Bot
Created on: 28/Apr/20 22:40
Start Date: 28/Apr/20 22:40
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #11480:
URL: https://github.com/apache/beam/pull/11480#issuecomment-620892562


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428417)
Time Spent: 3h 40m  (was: 3.5h)

> Add Postcommit ValidatesRunner CI Job for Flink on Java 11
> --
>
> Key: BEAM-9285
> URL: https://issues.apache.org/jira/browse/BEAM-9285
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Flink 1.10 runner introduces support for Java 11. We need to add a job in the 
> CI that covers the complete suite of Flink 1.10 runner tests on Java 11.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428415
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 28/Apr/20 22:30
Start Date: 28/Apr/20 22:30
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #11548:
URL: https://github.com/apache/beam/pull/11548#discussion_r416962452



##
File path: sdks/java/container/build.gradle
##
@@ -101,16 +84,44 @@ docker {
   project.rootProject["docker-tag"] : project.sdk_version)
   dockerfile project.file("./${dockerfileName}")
   files "./build/"
+  buildArgs(['pull_licenses': 
!project.rootProject.hasProperty(["no-licenses"])])

Review comment:
   Right, so to produce lightweight images we can skip the license download. 
Why would we ever want to 'check' URLs without downloading the licenses? 
Quoting your comment on the mailing list:
   
   > I tried to check if urls are valid instead of pulling the files and it
   > reduced only 1 min of running time. So, it's not an option here.







Issue Time Tracking
---

Worklog Id: (was: 428415)
Time Spent: 24h 50m  (was: 24h 40m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 24h 50m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428414&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428414
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 28/Apr/20 22:29
Start Date: 28/Apr/20 22:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #11548:
URL: https://github.com/apache/beam/pull/11548#discussion_r416962452



##
File path: sdks/java/container/build.gradle
##
@@ -101,16 +84,44 @@ docker {
   project.rootProject["docker-tag"] : project.sdk_version)
   dockerfile project.file("./${dockerfileName}")
   files "./build/"
+  buildArgs(['pull_licenses': 
!project.rootProject.hasProperty(["no-licenses"])])

Review comment:
   Right, so to produce lightweight images we can skip the license download. 
Why would we ever want to 'check' URLs without downloading the licenses? From 
your comment:
   ```
   I tried to check if urls are valid instead of pulling the files and it
   reduced only 1 min of running time. So, it's not an option here.
   ``` 







Issue Time Tracking
---

Worklog Id: (was: 428414)
Time Spent: 24h 40m  (was: 24.5h)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 24h 40m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=428409&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428409
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 28/Apr/20 22:18
Start Date: 28/Apr/20 22:18
Worklog Time Spent: 10m 
  Work Description: jaketf commented on a change in pull request #11339:
URL: https://github.com/apache/beam/pull/11339#discussion_r416957719



##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java
##
@@ -0,0 +1,977 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.healthcare;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.api.services.healthcare.v1beta1.model.HttpBody;
+import com.google.api.services.healthcare.v1beta1.model.Operation;
+import com.google.auto.value.AutoValue;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.UUID;
+import java.util.concurrent.ThreadLocalRandom;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.ResolveOptions.StandardResolveOptions;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.GroupIntoBatches;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.WithKeys;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.Sleeper;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionTuple;
+import org.apache.beam.sdk.values.PInput;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.PValue;
+import org.apache.beam.sdk.values.TupleTag;
+import org.apache.beam.sdk.values.TupleTagList;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Throwables;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+import org.codehaus.jackson.JsonProcessingException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link FhirIO} provides an API for reading and writing resources to <a href="https://cloud.google.com/healthcare/docs/concepts/fhir">Google Cloud 
Healthcare Fhir API</a>.
+ * 
+ *
+ * Read
+ *
+ * FHIR resources can be read with {@link FhirIO.Read} supports use cases 
where you have a
+ * ${@link PCollection} of message IDS. This is appropriate for reading the 
Fhir notifications from
+ * a Pub/Sub subscription with {@link PubsubIO#readStrings()} or in cases 
where you have a manually
+ * prepared list of messages that you need to process (e.g. in a text file 
read with {@link
+ * org.apache.beam.sdk.io.TextIO}*) .
+ *
+ * Fetch Resource contents from Fhir Store based on the {@link PCollection} 
of message ID strings
+ * {@link FhirIO.Read.Result} where one can call {@link 
Read.Result#getResources()} to retrieved a
+ * {@link PCollection} containing the successfully fetched {@link HttpBody}s 
and/or {@link
+ * FhirIO.Read.Result#getFailedReads()}* to retrieve a {@link PCollection} of 
{@link
+ * HealthcareIOError}* containing the resource ID that could not be fetched 
and the exception as a
+ * {@link HealthcareIOError}, this can be used to write to the dead letter 
storage system of your
+ * choosing. This error handling is mainly to transparently 

[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428408
 ]

ASF GitHub Bot logged work on BEAM-9831:


Author: ASF GitHub Bot
Created on: 28/Apr/20 22:16
Start Date: 28/Apr/20 22:16
Worklog Time Spent: 10m 
  Work Description: jaketf commented on a change in pull request #11538:
URL: https://github.com/apache/beam/pull/11538#discussion_r416957276



##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
##
@@ -437,6 +444,20 @@ private Message fetchMessage(HealthcareApiClient client, 
String msgId)
   .apply(Create.of(this.hl7v2Stores))
   .apply(ParDo.of(new ListHL7v2MessagesFn(this.filter)))
   .setCoder(new HL7v2MessageCoder())
+  // Listing takes a long time for each input element (HL7v2 store) 
because it has to
+  // paginate through results in a single thread / ProcessElement call 
in order to keep
+  // track of page token.
+  // Eagerly emit data on 1 second intervals so downstream processing 
can get started before
+  // all of the list results have been paginated through.

Review comment:
   Each "page" of responses is a collection of messages. I don't think it 
makes sense to page through all the pages (dropping the real data) and then 
re-fetch it in the downstream parallelized step. 
   
   In testing with a customer, pointing at an HL7v2 store with many, many 
messages (and therefore pages), they reported: 
   before this change, 
   there was a long time before any elements were output, so long that they 
gave up and killed the pipeline; 
   after this change, 
   data came out more regularly.
   
   This could have been a misunderstanding or a bad test scenario.
   I will try to come up with a test that reproduces this behavior.







Issue Time Tracking
---

Worklog Id: (was: 428408)
Time Spent: 1h 10m  (was: 1h)

> HL7v2IO Improvements
> 
>
> Key: BEAM-9831
> URL: https://issues.apache.org/jira/browse/BEAM-9831
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> # HL7v2MessageCoder constructor should be public for use by end users
>  # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list 
> messages results before emitting any output data elements (due to high fan 
> out from a single input element). We should add early firings so that 
> downstream processing can proceed on early pages while later pages are still 
> being scrolled through.
>  # We should drop all output only fields of HL7v2Message and only keep data 
> and labels when calling ingestMessages, rather than expecting the user to do 
> this.





[jira] [Assigned] (BEAM-9699) Add ability to use ZetaSQL in Python SqlTransform

2020-04-28 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-9699:
---

Assignee: Brian Hulette

> Add ability to use ZetaSQL in Python SqlTransform
> -
>
> Key: BEAM-9699
> URL: https://issues.apache.org/jira/browse/BEAM-9699
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>
> This may just work when the [plannerName pipeline 
> option|https://github.com/apache/beam/blob/1e52e4298085eda8e88e1215c7a73d52658b31f1/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L29]
>  is exposed to Python





[jira] [Work started] (BEAM-9699) Add ability to use ZetaSQL in Python SqlTransform

2020-04-28 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9699 started by Brian Hulette.
---
> Add ability to use ZetaSQL in Python SqlTransform
> -
>
> Key: BEAM-9699
> URL: https://issues.apache.org/jira/browse/BEAM-9699
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>
> This may just work when the [plannerName pipeline 
> option|https://github.com/apache/beam/blob/1e52e4298085eda8e88e1215c7a73d52658b31f1/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L29]
>  is exposed to Python





[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=428401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428401
 ]

ASF GitHub Bot logged work on BEAM-9650:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:57
Start Date: 28/Apr/20 21:57
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on pull request #11477:
URL: https://github.com/apache/beam/pull/11477#issuecomment-620876614


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428401)
Time Spent: 8.5h  (was: 8h 20m)

> Add consistent slowly changing side inputs support
> --
>
> Key: BEAM-9650
> URL: https://issues.apache.org/jira/browse/BEAM-9650
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Add implementation for slowly changing dimensions based on [design 
> doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit)





[jira] [Work logged] (BEAM-9650) Add consistent slowly changing side inputs support

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9650?focusedWorklogId=428402&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428402
 ]

ASF GitHub Bot logged work on BEAM-9650:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:57
Start Date: 28/Apr/20 21:57
Worklog Time Spent: 10m 
  Work Description: Ardagan removed a comment on pull request #11477:
URL: https://github.com/apache/beam/pull/11477#issuecomment-620876614


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428402)
Time Spent: 8h 40m  (was: 8.5h)

> Add consistent slowly changing side inputs support
> --
>
> Key: BEAM-9650
> URL: https://issues.apache.org/jira/browse/BEAM-9650
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Add implementation for slowly changing dimensions based on [design 
> doc](https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit)





[jira] [Work logged] (BEAM-9643) Add user-facing Go SDF documentation.

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9643?focusedWorklogId=428400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428400
 ]

ASF GitHub Bot logged work on BEAM-9643:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:54
Start Date: 28/Apr/20 21:54
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #11517:
URL: https://github.com/apache/beam/pull/11517#discussion_r416930600



##
File path: sdks/go/examples/stringsplit/offsetrange/offsetrange.go
##
@@ -89,21 +89,23 @@ func (tracker *Tracker) GetError() error {
 }
 
 // TrySplit splits at the nearest integer greater than the given fraction of 
the remainder.
-func (tracker *Tracker) TrySplit(fraction float64) (interface{}, error) {
+func (tracker *Tracker) TrySplit(fraction float64) (interface{}, interface{}, 
error) {
if tracker.Stopped || tracker.IsDone() {
-   return nil, nil
+   return tracker.Rest, nil, nil
}
-   if fraction < 0 || fraction > 1 {
-   return nil, errors.New("fraction must be in range [0, 1]")
+   if fraction < 0 {
+   fraction = 0

Review comment:
   Might be worth documenting this behavior in the comment.

##
File path: sdks/go/examples/stringsplit/offsetrange/offsetrange.go
##
@@ -89,21 +89,23 @@ func (tracker *Tracker) GetError() error {
 }
 
 // TrySplit splits at the nearest integer greater than the given fraction of 
the remainder.
-func (tracker *Tracker) TrySplit(fraction float64) (interface{}, error) {
+func (tracker *Tracker) TrySplit(fraction float64) (interface{}, interface{}, 
error) {

Review comment:
   Given this is the example, using named return values (primary, residual 
...) is appropriate here for documentation purposes (but not so one can use an 
empty return.)
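
The clamping behavior discussed in the first review comment can be sketched as follows. This is an illustrative Java rendering, not the Go SDK code: `OffsetSplitSketch`, `splitPoint`, and `nextUnclaimed` are hypothetical names. The diff shown only clamps fractions below 0, so the matching clamp above 1 is an assumption, as is the use of a ceiling to round the split position up.

```java
/** Hypothetical sketch of choosing a split position inside an offset range. */
final class OffsetSplitSketch {
    private OffsetSplitSketch() {}

    /**
     * Returns the offset at which the unprocessed range [nextUnclaimed, end) would be cut.
     * The primary keeps [nextUnclaimed, splitPoint); the residual gets [splitPoint, end).
     */
    static long splitPoint(long nextUnclaimed, long end, double fraction) {
        // Clamp out-of-range fractions instead of returning an error.
        // The upper bound is an assumption; only the lower clamp appears in the diff.
        double clamped = Math.max(0.0, Math.min(1.0, fraction));
        long remaining = end - nextUnclaimed;
        // Round up so the split lands on the first integer at or above the requested fraction.
        return nextUnclaimed + (long) Math.ceil(clamped * remaining);
    }
}
```

Under these assumptions a fraction of 0 hands all unclaimed work to the residual, while a fraction of 1 leaves nothing to hand off, which a caller would need to treat as "no split".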

##
File path: sdks/go/pkg/beam/pardo.go
##
@@ -222,6 +222,87 @@ func ParDo0(s Scope, dofn interface{}, col PCollection, 
opts ...Option) {
 // DoFn instance via output PCollections, in the absence of external
 // communication mechanisms written by user code.
 //
+// Splittable DoFns (Experimental)
+//
+// Warning: Splittable DoFns are still experimental, largely untested, and
+// likely to have bugs.
+//
+// Splittable DoFns are DoFns that are able to split work within an element,
+// as opposed to only at element boundaries like normal DoFns. This is useful
+// for DoFns that emit many outputs per input element and can distribute that
+// work among multiple workers. The most common examples of this are sources.
+//
+// In order to split work within an element, splittable DoFns use the concept 
of
+// restrictions, which are objects that are associated with an element and
+// describe a portion of work on that element. For example, a restriction
+// associated with a filename might describe what byte range within that file 
to
+// process. In addition to restrictions, splittable DoFns also rely on
+// restriction trackers to track progress and perform splits on a restriction
+// currently being processed. See the `RTracker` interface in core/sdf/sdf.go
+// for more details.
+//
+// Splitting
+//
+// Splitting means taking one restriction and splitting into two or more that
+// cover the entire input space of the original one. In other words, processing
+// all the split restrictions should produce identical output to processing
+// the original one.
+//
+// Splitting occurs in two stages. The initial splitting occurs before any
+// restrictions have started processing. This step is used to split large
+// restrictions into smaller ones that can then be distributed among multiple
+// workers for processing. Initial splitting is user-defined and optional.
+//
+// Dynamic splitting occurs during the processing of a restriction in runners
+// that have implemented it. If there are available workers, runners may split
+// the unprocessed portion of work from a busy worker and shard it to available
+// workers in order to better distribute work. With unsplittable DoFns this can
+// only occur on element boundaries, but for splittable DoFns this split
+// can land within a restriction and will require splitting that restriction.
+//
+// * Note: The Go SDK currently does not support dynamic splitting for SDFs,
+//   only initial splitting. Only initially split restrictions can be
+//   distributed by liquid sharding. Stragglers will not be split during
+//   execution with dynamic splitting.
+//
+// Splittable DoFn Methods
+//
+// Making a splittable DoFn requires the following methods to be implemented on
+// a DoFn in addition to the usual DoFn requirements. In the following
+// method signatures `elem` represents the main input elements to the DoFn, and
+// should match the types used in ProcessElement. `restriction` represents the
+// user-defined restriction, and can 

[jira] [Work logged] (BEAM-9393) support schemas in state API

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9393?focusedWorklogId=428397&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428397
 ]

ASF GitHub Bot logged work on BEAM-9393:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:51
Start Date: 28/Apr/20 21:51
Worklog Time Spent: 10m 
  Work Description: dpmills commented on pull request #10983:
URL: https://github.com/apache/beam/pull/10983#issuecomment-620874181


   LGTM





Issue Time Tracking
---

Worklog Id: (was: 428397)
Time Spent: 50m  (was: 40m)

> support schemas in state API
> 
>
> Key: BEAM-9393
> URL: https://issues.apache.org/jira/browse/BEAM-9393
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9393) support schemas in state API

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9393?focusedWorklogId=428394&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428394
 ]

ASF GitHub Bot logged work on BEAM-9393:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:40
Start Date: 28/Apr/20 21:40
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #10983:
URL: https://github.com/apache/beam/pull/10983#issuecomment-620869944


   @dpmills any comments on this PR?





Issue Time Tracking
---

Worklog Id: (was: 428394)
Time Spent: 0.5h  (was: 20m)

> support schemas in state API
> 
>
> Key: BEAM-9393
> URL: https://issues.apache.org/jira/browse/BEAM-9393
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=428389&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428389
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:28
Start Date: 28/Apr/20 21:28
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11039:
URL: https://github.com/apache/beam/pull/11039#issuecomment-620864937









Issue Time Tracking
---

Worklog Id: (was: 428389)
Time Spent: 6.5h  (was: 6h 20m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment





[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=428387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428387
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:27
Start Date: 28/Apr/20 21:27
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11039:
URL: https://github.com/apache/beam/pull/11039#issuecomment-620864665


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428387)
Time Spent: 6h 10m  (was: 6h)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment





[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=428388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428388
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:27
Start Date: 28/Apr/20 21:27
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11039:
URL: https://github.com/apache/beam/pull/11039#issuecomment-620864801


   Run Python PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428388)
Time Spent: 6h 20m  (was: 6h 10m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment





[jira] [Work logged] (BEAM-6661) FnApi gRPC setup/teardown glitch

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6661?focusedWorklogId=428386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428386
 ]

ASF GitHub Bot logged work on BEAM-6661:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:26
Start Date: 28/Apr/20 21:26
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11537:
URL: https://github.com/apache/beam/pull/11537#issuecomment-620864491


   > @ibzib Maybe also worth backporting?
   
   Might as well. `SEVERE: *~*~*~ Beam is horribly broken! *~*~*~` definitely 
tends to frighten new users.





Issue Time Tracking
---

Worklog Id: (was: 428386)
Time Spent: 1h 10m  (was: 1h)

> FnApi gRPC setup/teardown glitch
> 
>
> Key: BEAM-6661
> URL: https://issues.apache.org/jira/browse/BEAM-6661
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Affects Versions: 2.11.0
>Reporter: Heejong Lee
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Multiple exceptions are observed during FnApi gRPC setup/teardown. The 
> examples are
> {noformat}
> 14:53:22 [grpc-default-executor-1] WARN 
> org.apache.beam.runners.fnexecution.logging.GrpcLoggingService - Logging 
> client failed unexpectedly.
> 14:53:22 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusRuntimeException: 
> CANCELLED: cancelled before receiving half close
> 14:53:22 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asRuntimeException(Status.java:517)
> 14:53:22 at{noformat}
> {noformat}
>  
> 14:52:56 [main] INFO org.apache.beam.runners.flink.FlinkJobServerDriver - 
> JobService started on localhost:58179
> 14:52:57 [grpc-default-executor-0] ERROR 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService - 
> Encountered Unexpected Exception for Invocation 
> job_bfb7df0e-408e-4bfd-bb3c-432e946ca819
> 14:52:57 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusException: 
> NOT_FOUND
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asException(Status.java:534)
> 14:52:57 at 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getInvocation(InMemoryJobService.java:341)
> 14:52:57 at 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getStateStream(InMemoryJobService.java:262)
> 14:52:57 at 
> org.apache.beam.model.jobmanagement.v1.JobServiceGrpc$MethodHandlers.invoke(JobServiceGrpc.java:693)
>  14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>  14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> 14:52:57 at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 14:52:57 at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {noformat}
> {noformat}
> 14:54:50 Feb 07, 2019 10:54:50 PM 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
>  cleanQueue
> 14:54:50 SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=628, 
> target=localhost:41409} was not shutdown properly!!! ~*~*~*
> 14:54:50 Make sure to call shutdown()/shutdownNow() and wait until 
> awaitTermination() returns true.
> 14:54:50 java.lang.RuntimeException: ManagedChannel allocation site
> 14:54:50 

[jira] [Work logged] (BEAM-9802) Provide a way to customize automatically started services.

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9802?focusedWorklogId=428385=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428385
 ]

ASF GitHub Bot logged work on BEAM-9802:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:24
Start Date: 28/Apr/20 21:24
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11495:
URL: https://github.com/apache/beam/pull/11495#issuecomment-620863418


   Modulo the conflict.





Issue Time Tracking
---

Worklog Id: (was: 428385)
Time Spent: 50m  (was: 40m)

> Provide a way to customize automatically started services.
> --
>
> Key: BEAM-9802
> URL: https://issues.apache.org/jira/browse/BEAM-9802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This can be useful for testing and alternative production environments. 





[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428384
 ]

ASF GitHub Bot logged work on BEAM-9831:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:23
Start Date: 28/Apr/20 21:23
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11538:
URL: https://github.com/apache/beam/pull/11538#issuecomment-620862736


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 428384)
Time Spent: 1h  (was: 50m)

> HL7v2IO Improvements
> 
>
> Key: BEAM-9831
> URL: https://issues.apache.org/jira/browse/BEAM-9831
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> # HL7v2MessageCoder constructor should be public for use by end users
>  # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list 
> messages results before emitting any output data elements (due to high fan 
> out from a single input element). We should add early firings so that 
> downstream processing can proceed on early pages while later pages are still 
> being scrolled through.
>  # We should drop all output only fields of HL7v2Message and only keep data 
> and labels when calling ingestMessages, rather than expecting the user to do 
> this.





[jira] [Comment Edited] (BEAM-4087) Gradle build does not allow to overwrite versions of provided dependencies

2020-04-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094883#comment-17094883
 ] 

Ismaël Mejía edited comment on BEAM-4087 at 4/28/20, 9:20 PM:
--

This is still an issue even if the PR / approach was not merged. We still do 
not have a practical way to test multiple provided versions of a dependency, 
and this is probably the only issue still bugging me 2 years after the move 
to Gradle. We still rely on faith-based validation for things like 
multi-version compatibility of Kafka and Spark.

It is a pity, because this is so simple to do in Maven that it still surprises 
me how much effort it takes to get it done in Gradle.

I still think it is worth exploring a general way to tackle this because 
multiple modules may require it. Or is there some new, simpler way to do it now 
that I missed?


was (Author: iemejia):
This is still an issue even if the PR / approach was not merged. We still do 
not have a practical way to test multiple provided versions of a dependency, 
and this is probably the only issue still bugging me 2 years after the move 
to Gradle. We still rely on faith-based validation for things like 
multi-version compatibility of Kafka and Spark.

It is a pity, because this is so simple to do in Maven that it still surprises 
me how much effort it takes to get it done in Gradle.

I still think it is worth exploring a way to tackle this. Or is there some new, 
simpler way to do it now that I missed?

> Gradle build does not allow to overwrite versions of provided dependencies
> --
>
> Key: BEAM-4087
> URL: https://issues.apache.org/jira/browse/BEAM-4087
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.5.0
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: gradle
> Fix For: Not applicable
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> In order to test modules with provided dependencies in Maven we can execute, 
> for example for Kafka, `mvn verify -Prelease -Dkafka.clients.version=0.9.0.1 
> -pl 'sdks/java/io/kafka'`. However, we don't have an equivalent way to do this 
> with Gradle because the versions of the dependencies are defined locally and 
> not in gradle.properties.





[jira] [Reopened] (BEAM-4087) Gradle build does not allow to overwrite versions of provided dependencies

2020-04-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reopened BEAM-4087:


This is still an issue even if the PR / approach was not merged. We still do 
not have a practical way to test multiple provided versions of a dependency, 
and this is probably the only issue still bugging me 2 years after the move 
to Gradle. We still rely on faith-based validation for things like 
multi-version compatibility of Kafka and Spark.

It is a pity, because this is so simple to do in Maven that it still surprises 
me how much effort it takes to get it done in Gradle.

I still think it is worth exploring a way to tackle this. Or is there some new, 
simpler way to do it now that I missed?

> Gradle build does not allow to overwrite versions of provided dependencies
> --
>
> Key: BEAM-4087
> URL: https://issues.apache.org/jira/browse/BEAM-4087
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.5.0
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: gradle
> Fix For: Not applicable
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> In order to test modules with provided dependencies in Maven we can execute, 
> for example for Kafka, `mvn verify -Prelease -Dkafka.clients.version=0.9.0.1 
> -pl 'sdks/java/io/kafka'`. However, we don't have an equivalent way to do this 
> with Gradle because the versions of the dependencies are defined locally and 
> not in gradle.properties.





[jira] [Updated] (BEAM-4087) Gradle build does not allow to overwrite versions of provided dependencies

2020-04-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-4087:
---
Status: Open  (was: Triage Needed)

> Gradle build does not allow to overwrite versions of provided dependencies
> --
>
> Key: BEAM-4087
> URL: https://issues.apache.org/jira/browse/BEAM-4087
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 2.5.0
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: gradle
> Fix For: Not applicable
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> In order to test modules with provided dependencies in Maven we can execute, 
> for example for Kafka, `mvn verify -Prelease -Dkafka.clients.version=0.9.0.1 
> -pl 'sdks/java/io/kafka'`. However, we don't have an equivalent way to do this 
> with Gradle because the versions of the dependencies are defined locally and 
> not in gradle.properties.





[jira] [Work logged] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9815?focusedWorklogId=428381=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428381
 ]

ASF GitHub Bot logged work on BEAM-9815:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:18
Start Date: 28/Apr/20 21:18
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11524:
URL: https://github.com/apache/beam/pull/11524#issuecomment-620860334


   "No artifacts staged" state has been verified. Merging.





Issue Time Tracking
---

Worklog Id: (was: 428381)
Time Spent: 2h 40m  (was: 2.5h)

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428383
 ]

ASF GitHub Bot logged work on BEAM-9831:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:18
Start Date: 28/Apr/20 21:18
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11538:
URL: https://github.com/apache/beam/pull/11538#issuecomment-620860467


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 428383)
Time Spent: 50m  (was: 40m)

> HL7v2IO Improvements
> 
>
> Key: BEAM-9831
> URL: https://issues.apache.org/jira/browse/BEAM-9831
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> # HL7v2MessageCoder constructor should be public for use by end users
>  # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list 
> messages results before emitting any output data elements (due to high fan 
> out from a single input element). We should add early firings so that 
> downstream processing can proceed on early pages while later pages are still 
> being scrolled through.
>  # We should drop all output only fields of HL7v2Message and only keep data 
> and labels when calling ingestMessages, rather than expecting the user to do 
> this.





[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428380=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428380
 ]

ASF GitHub Bot logged work on BEAM-9831:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:17
Start Date: 28/Apr/20 21:17
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11538:
URL: https://github.com/apache/beam/pull/11538#issuecomment-620859719


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 428380)
Time Spent: 40m  (was: 0.5h)

> HL7v2IO Improvements
> 
>
> Key: BEAM-9831
> URL: https://issues.apache.org/jira/browse/BEAM-9831
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> # HL7v2MessageCoder constructor should be public for use by end users
>  # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list 
> messages results before emitting any output data elements (due to high fan 
> out from a single input element). We should add early firings so that 
> downstream processing can proceed on early pages while later pages are still 
> being scrolled through.
>  # We should drop all output only fields of HL7v2Message and only keep data 
> and labels when calling ingestMessages, rather than expecting the user to do 
> this.





[jira] [Work logged] (BEAM-9831) HL7v2IO Improvements

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9831?focusedWorklogId=428379=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428379
 ]

ASF GitHub Bot logged work on BEAM-9831:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:16
Start Date: 28/Apr/20 21:16
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #11538:
URL: https://github.com/apache/beam/pull/11538#discussion_r416060393



##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
##
@@ -475,7 +497,14 @@ public void initClient() throws IOException {
 public void listMessages(ProcessContext context) throws IOException {
   String hl7v2Store = context.element();
   // Output all elements of all pages.
-  this.client.getHL7v2MessageStream(hl7v2Store, 
this.filter).forEach(context::output);
+  HttpHealthcareApiClient.HL7v2MessagePages pages =
+  new HttpHealthcareApiClient.HL7v2MessagePages(client, hl7v2Store, 
this.filter);
+  long reqestTime = Instant.now().getMillis();

Review comment:
   `requestTime`?

##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/HL7v2IO.java
##
@@ -437,6 +444,20 @@ private Message fetchMessage(HealthcareApiClient client, 
String msgId)
   .apply(Create.of(this.hl7v2Stores))
   .apply(ParDo.of(new ListHL7v2MessagesFn(this.filter)))
   .setCoder(new HL7v2MessageCoder())
+  // Listing takes a long time for each input element (HL7v2 store) 
because it has to
+  // paginate through results in a single thread / ProcessElement call 
in order to keep
+  // track of page token.
+  // Eagerly emit data on 1 second intervals so downstream processing 
can get started before
+  // all of the list results have been paginated through.

Review comment:
   Unfortunately, this is not possible. If you are paginating from inside 
a single DoFn `processElement` call, the data coming out of it will only go 
downstream after the element is done being processed, so this windowing does 
not change that at execution time.
   This is because bundle execution is committed atomically, so the whole 
bundle executes before data can go downstream. You do touch on an interesting 
example, which is one of the reasons that we came up with SplittableDoFn.
   
   Something you could try to do is:
   ```
   // Element type is assumed here: one page descriptor per element.
   PCollection<String> pages = hl7v2Stores.apply(ParDo.of(new 
RetrieveAndOutputPagesFn()));
   
   pages.apply(Reshuffle.viaRandomKey()).apply(ParDo.of(new FetchEachPageFn()));
   ```
   Though I don't know if you can actually do that : )
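
For illustration only, the two-stage shape suggested above can be sketched in plain Java, without any Beam classes: a first stage emits one cheap descriptor per page, and a second stage fetches each page independently, so the pages can be processed in parallel. Everything here is hypothetical (`PagedFetchSketch`, `PAGE_SIZE`, the in-memory `store`, offset-based page descriptors). In particular it assumes page descriptors can be enumerated without fetching the pages themselves, which may not hold for an API that hands out page tokens one at a time, and that is the open question the comment raises.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Plain-Java illustration only; no Beam or Healthcare API classes are used.
class PagedFetchSketch {
    // Hypothetical page size, chosen only for the example.
    static final int PAGE_SIZE = 3;

    // Stage 1: walk the listing sequentially and emit one cheap
    // descriptor (the start offset) per page.
    static List<Integer> listPageStarts(int totalMessages) {
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start < totalMessages; start += PAGE_SIZE) {
            starts.add(start);
        }
        return starts;
    }

    // Stage 2: fetch a single page. Each call is independent of the
    // others, which is what allows the work to be spread out.
    static List<String> fetchPage(List<String> store, int start) {
        int end = Math.min(start + PAGE_SIZE, store.size());
        return new ArrayList<>(store.subList(start, end));
    }

    // Both stages combined: pages are fetched in parallel, and the
    // collected result keeps the encounter order of the page starts.
    static List<String> fetchAll(List<String> store) {
        return listPageStarts(store.size()).parallelStream()
            .flatMap(start -> fetchPage(store, start).stream())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // In-memory stand-in for a message store (assumption for the demo).
        List<String> store = IntStream.range(0, 8)
            .mapToObj(i -> "msg-" + i)
            .collect(Collectors.toList());
        System.out.println(listPageStarts(store.size()));
        System.out.println(fetchAll(store));
    }
}
```

In a pipeline, the step between the two stages would be where a redistribution such as the `Reshuffle.viaRandomKey()` call above sits; the parallel stream here only stands in for that idea.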







Issue Time Tracking
---

Worklog Id: (was: 428379)
Time Spent: 0.5h  (was: 20m)

> HL7v2IO Improvements
> 
>
> Key: BEAM-9831
> URL: https://issues.apache.org/jira/browse/BEAM-9831
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> # HL7v2MessageCoder constructor should be public for use by end users
>  # Currently HL7v2IO.ListHL7v2Messages blocks on pagination through list 
> messages results before emitting any output data elements (due to high fan 
> out from a single input element). We should add early firings so that 
> downstream processing can proceed on early pages while later pages are still 
> being scrolled through.
>  # We should drop all output only fields of HL7v2Message and only keep data 
> and labels when calling ingestMessages, rather than expecting the user to do 
> this.





[jira] [Comment Edited] (BEAM-9844) ParDoTest.KeyTests failing for Spark

2020-04-28 Thread Rehman Murad Ali (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094766#comment-17094766
 ] 

Rehman Murad Ali edited comment on BEAM-9844 at 4/28/20, 9:14 PM:
--

Fixed here  https://github.com/apache/beam/pull/11559


was (Author: rehmanmuradali):
Fixed here  https://github.com/apache/beam/pull/11154

> ParDoTest.KeyTests failing for Spark 
> -
>
> Key: BEAM-9844
> URL: https://issues.apache.org/jira/browse/BEAM-9844
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Chamikara Madhusanka Jayalath
>Priority: Major
> Fix For: Not applicable
>
>
> Seems like these tests were added recently and 
> beam_PostCommit_Java_ValidatesRunner_Spark is perma red due to this.
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7365/]
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7364/]
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/]
> I see two different errors.
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimer/]
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NumberFormatException: null
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimerWithGenericKey/]
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NullPointerException
> Rehman, can you take a look ?
>  
>  
>  
>  





[jira] [Updated] (BEAM-4830) Determine why go vet failures invoked by ./gradlew check were not caught by jenkins build build

2020-04-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-4830:
---
Status: Open  (was: Triage Needed)

> Determine why go vet failures invoked by ./gradlew check were not caught by 
> jenkins build build
> 
>
> Key: BEAM-4830
> URL: https://issues.apache.org/jira/browse/BEAM-4830
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Alex Amato
>Assignee: Robert Burke
>Priority: Major
>  Labels: gradle, jenkins
>
> The purpose of this is to catch errors developers see when they first start 
> contributing to beam. Let's ensure we run the same commands in the 
> [contributing guide|https://beam.apache.org/contribute/].
>  
> Note: check runs more than build, so we are not catching these problems in 
> the continuous Jenkins testing.
>  
>  





[jira] [Work logged] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9815?focusedWorklogId=428378=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428378
 ]

ASF GitHub Bot logged work on BEAM-9815:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:07
Start Date: 28/Apr/20 21:07
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11524:
URL: https://github.com/apache/beam/pull/11524#issuecomment-620855224


   I'll merge once I sanity check that we're back to "No artifacts" for the 
Dataflow tests, and otherwise passed.





Issue Time Tracking
---

Worklog Id: (was: 428378)
Time Spent: 2.5h  (was: 2h 20m)

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Updated] (BEAM-9845) Stage dependencies over the expansion service.

2020-04-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9845:
---
Status: Open  (was: Triage Needed)

> Stage dependencies over the expansion service.
> --
>
> Key: BEAM-9845
> URL: https://issues.apache.org/jira/browse/BEAM-9845
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This will obviate the need for jar_packages. 





[jira] [Work logged] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9815?focusedWorklogId=428377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428377
 ]

ASF GitHub Bot logged work on BEAM-9815:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:06
Start Date: 28/Apr/20 21:06
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #11524:
URL: https://github.com/apache/beam/pull/11524#discussion_r416922273



##
File path: sdks/go/test/run_integration_tests.sh
##
@@ -102,9 +102,8 @@ case $key in
 esac
 done
 
-if [[ "$RUNNER" != "universal" ]]; then
-  PUSH_CONTAINER_TO_GCR='yes'
-else
+PUSH_CONTAINER_TO_GCR='yes'

Review comment:
   A fair observation. I ended up reversing the if clause anyway since 
the positive check is clearer.







Issue Time Tracking
---

Worklog Id: (was: 428377)
Time Spent: 2h 20m  (was: 2h 10m)

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Resolved] (BEAM-9844) ParDoTest.KeyTests failing for Spark

2020-04-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-9844.

Fix Version/s: Not applicable
   Resolution: Duplicate

> ParDoTest.KeyTests failing for Spark 
> -
>
> Key: BEAM-9844
> URL: https://issues.apache.org/jira/browse/BEAM-9844
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Chamikara Madhusanka Jayalath
>Priority: Major
> Fix For: Not applicable
>
>
> Seems like these tests were added recently and 
> beam_PostCommit_Java_ValidatesRunner_Spark is perma red due to this.
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7365/]
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7364/]
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/]
> I see two different errors.
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimer/]
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NumberFormatException: null
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimerWithGenericKey/]
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NullPointerException
> Rehman, can you take a look?





[jira] [Work logged] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9815?focusedWorklogId=428376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428376
 ]

ASF GitHub Bot logged work on BEAM-9815:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:05
Start Date: 28/Apr/20 21:05
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #11524:
URL: https://github.com/apache/beam/pull/11524#discussion_r416922004



##
File path: sdks/go/test/run_integration_tests.sh
##
@@ -118,7 +117,7 @@ test -d sdks/go/test
 command -v docker
 docker -v
 
-if [[ PUSH_CONTAINER_TO_GCR == 'yes' ]]; then
+if [ "$PUSH_CONTAINER_TO_GCR" = "yes" ]; then

Review comment:
   Done







Issue Time Tracking
---

Worklog Id: (was: 428376)
Time Spent: 2h 10m  (was: 2h)

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Assigned] (BEAM-9844) ParDoTest.KeyTests failing for Spark

2020-04-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9844:
--

Assignee: (was: Rehman Murad Ali)

> ParDoTest.KeyTests failing for Spark 
> -
>
> Key: BEAM-9844
> URL: https://issues.apache.org/jira/browse/BEAM-9844
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Chamikara Madhusanka Jayalath
>Priority: Major
>
> Seems like these tests were added recently and 
> beam_PostCommit_Java_ValidatesRunner_Spark is perma red due to this.
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7365/]
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7364/]
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/]
> I see two different errors.
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimer/]
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NumberFormatException: null
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimerWithGenericKey/]
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NullPointerException
> Rehman, can you take a look?





[jira] [Work logged] (BEAM-9815) beam_PostCommit_Go perma red due to failing to start container

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9815?focusedWorklogId=428374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428374
 ]

ASF GitHub Bot logged work on BEAM-9815:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:05
Start Date: 28/Apr/20 21:05
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11524:
URL: https://github.com/apache/beam/pull/11524#issuecomment-620854348


   Run Go Postcommit





Issue Time Tracking
---

Worklog Id: (was: 428374)
Time Spent: 2h  (was: 1h 50m)

> beam_PostCommit_Go perma red due to failing to start container
> --
>
> Key: BEAM-9815
> URL: https://issues.apache.org/jira/browse/BEAM-9815
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go, test-failures
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Robert Bradshaw
>Priority: Critical
>  Labels: currently-failing
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> For example,
> [https://builds.apache.org/job/beam_PostCommit_Go/6847/]
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=500=false=2020-04-24T15:09:13.45500Z==true=NO_LIMIT=dataflow_step%2Fjob_id%2F2020-04-24_05_03_49-5495819388067192698=2020-04-24T13:03:38.313084000Z]





[jira] [Work logged] (BEAM-9801) Setting a timer from a timer callback fails

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9801?focusedWorklogId=428375=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428375
 ]

ASF GitHub Bot logged work on BEAM-9801:


Author: ASF GitHub Bot
Created on: 28/Apr/20 21:05
Start Date: 28/Apr/20 21:05
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #11492:
URL: https://github.com/apache/beam/pull/11492#discussion_r416921999



##
File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
##
@@ -845,10 +845,12 @@ def process_bundle(self,
   (result_future.is_done() and result_future.get().error)):
 if isinstance(output, beam_fn_api_pb2.Elements.Timers) and not dry_run:
   with BundleManager._lock:
-self.bundle_context_manager.get_buffer(
+timer_buffer = self.bundle_context_manager.get_buffer(
 expected_output_timers[(
 output.transform_id, output.timer_family_id)],
-output.transform_id).append(output.timers)
+output.transform_id)
+timer_buffer.cleared = False

Review comment:
   Did you run into these errors when setting a timer from the timer call, 
Max? I think it would be good to explicitly reset the buffer, rather than 
manipulate its flag (e.g. write a timer_buffer.reset function). Or at least 
check `if timer_buffer.cleared: timer_buffer.cleared = False`, to confirm that 
the rest of the internal context in `ListBuffer` is cleared.
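
   The explicit reset suggested above could look roughly like the sketch 
below. `SketchListBuffer` is a hypothetical stand-in written for illustration, 
not Beam's actual `ListBuffer`: only the `cleared` flag and the idea of a 
`reset` function come from this comment, and every other attribute and method 
name is an assumption.

```python
class SketchListBuffer:
    """Hypothetical stand-in for a runner-side buffer; not Beam's ListBuffer."""

    def __init__(self):
        self._inputs = []     # chunks appended so far (assumed field name)
        self.cleared = False  # the flag named in the review comment

    def append(self, chunk):
        # Refuse writes after clear() so stale use is caught early.
        if self.cleared:
            raise RuntimeError('Trying to append to a cleared buffer.')
        self._inputs.append(chunk)

    def clear(self):
        # Drop the contents and block further writes until reset().
        self._inputs = []
        self.cleared = True

    def reset(self):
        # Return the buffer to a known-empty, writable state in one place,
        # instead of having callers flip `cleared` by hand.
        self._inputs = []
        self.cleared = False

    def __iter__(self):
        return iter(self._inputs)
```

   With something like this, the runner code in the diff would call 
`timer_buffer.reset()` rather than assigning `timer_buffer.cleared = False` 
directly.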







Issue Time Tracking
---

Worklog Id: (was: 428375)
Time Spent: 5h 50m  (was: 5h 40m)

> Setting a timer from a timer callback fails
> ---
>
> Key: BEAM-9801
> URL: https://issues.apache.org/jira/browse/BEAM-9801
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Priority: Critical
> Fix For: 2.21.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Hi,
> I'm trying to set a timer from a timer callback in the Python SDK:
> {code:Python}
> class GenerateLoad(beam.DoFn):
>   timer_spec = userstate.TimerSpec('timer', userstate.TimeDomain.WATERMARK)
>   def process(self, element, timer=beam.DoFn.TimerParam(timer_spec)):
>     self.key = element[0]
>     timer.set(0)
>   @userstate.on_timer(timer_spec)
>   def process_timer(self, timer=beam.DoFn.TimerParam(timer_spec)):
>     timer.set(0)
> {code}
> This yields the following Python stack trace:
> {noformat}
> INFO:apache_beam.utils.subprocess_server:Caused by: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> 4: Traceback (most recent call last):
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 245, in _execute
> INFO:apache_beam.utils.subprocess_server: response = task()
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 302, in 
> INFO:apache_beam.utils.subprocess_server: lambda: 
> self.create_worker().do_instruction(request), request)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 471, in do_instruction
> INFO:apache_beam.utils.subprocess_server: getattr(request, request_type), 
> request.instruction_id)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 506, in process_bundle
> INFO:apache_beam.utils.subprocess_server: 
> bundle_processor.process_bundle(instruction_id))
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/bundle_processor.py", line 910, in process_bundle
> INFO:apache_beam.utils.subprocess_server: element.timer_family_id, timer_data)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/operations.py", line 688, in process_timer
> INFO:apache_beam.utils.subprocess_server: timer_data.fire_timestamp)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 990, in process_user_timer
> INFO:apache_beam.utils.subprocess_server: self._reraise_augmented(exn)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 1043, in _reraise_augmented
> INFO:apache_beam.utils.subprocess_server: raise_with_traceback(new_exn)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 988, in process_user_timer
> 

[jira] [Updated] (BEAM-9844) ParDoTest.KeyTests failing for Spark

2020-04-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9844:
---
Status: Open  (was: Triage Needed)

> ParDoTest.KeyTests failing for Spark 
> -
>
> Key: BEAM-9844
> URL: https://issues.apache.org/jira/browse/BEAM-9844
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Rehman Murad Ali
>Priority: Major
>
> Seems like these tests were added recently and 
> beam_PostCommit_Java_ValidatesRunner_Spark is perma red due to this.
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7365/]
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7364/]
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/]
> I see two different errors.
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimer/]
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NumberFormatException: null
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7363/testReport/junit/org.apache.beam.sdk.transforms/ParDoTest$KeyTests/testKeyInOnTimerWithGenericKey/]
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NullPointerException
> Rehman, can you take a look?





[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428370
 ]

ASF GitHub Bot logged work on BEAM-8742:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:50
Start Date: 28/Apr/20 20:50
Worklog Time Spent: 10m 
  Work Description: mxm removed a comment on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-620847201


   Run Python Load Tests ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 428370)
Time Spent: 3h 40m  (was: 3.5h)

> Add stateful processing to ParDo load test
> --
>
> Key: BEAM-8742
> URL: https://issues.apache.org/jira/browse/BEAM-8742
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> So far, the ParDo load test is not stateful. We should add a basic counter to 
> test the stateful processing.





[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428369
 ]

ASF GitHub Bot logged work on BEAM-8742:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:50
Start Date: 28/Apr/20 20:50
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-620847201


   Run Python Load Tests ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 428369)
Time Spent: 3.5h  (was: 3h 20m)

> Add stateful processing to ParDo load test
> --
>
> Key: BEAM-8742
> URL: https://issues.apache.org/jira/browse/BEAM-8742
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> So far, the ParDo load test is not stateful. We should add a basic counter to 
> test the stateful processing.





[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428367=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428367
 ]

ASF GitHub Bot logged work on BEAM-8742:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:49
Start Date: 28/Apr/20 20:49
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-620846828


   Run Python Load Tests ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 428367)
Time Spent: 3h 20m  (was: 3h 10m)

> Add stateful processing to ParDo load test
> --
>
> Key: BEAM-8742
> URL: https://issues.apache.org/jira/browse/BEAM-8742
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> So far, the ParDo load test is not stateful. We should add a basic counter to 
> test the stateful processing.





[jira] [Work logged] (BEAM-9561) Run pandas tests with Beam Dataframe API

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9561?focusedWorklogId=428364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428364
 ]

ASF GitHub Bot logged work on BEAM-9561:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:47
Start Date: 28/Apr/20 20:47
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11419:
URL: https://github.com/apache/beam/pull/11419#issuecomment-620845876


   Thanks. Merging as soon as tests pass. 





Issue Time Tracking
---

Worklog Id: (was: 428364)
Time Spent: 0.5h  (was: 20m)

> Run pandas tests with Beam Dataframe API
> 
>
> Key: BEAM-9561
> URL: https://issues.apache.org/jira/browse/BEAM-9561
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8414) Cleanup Python codebase to enable some of the excluded Python lint checks.

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8414?focusedWorklogId=428362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428362
 ]

ASF GitHub Bot logged work on BEAM-8414:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:41
Start Date: 28/Apr/20 20:41
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11523:
URL: https://github.com/apache/beam/pull/11523#issuecomment-620842927


   Hm, lint and autoformat are complaining :(
   Lint:
   ```
   13:08:40 apache_beam/runners/interactive/interactive_runner.py:30:0: W0611: 
Unused import sys (unused-import)
   ```
   
   You can run the autoformatter via `tox -e py3-yapf` to format your code and 
ensure that the tests pass.





Issue Time Tracking
---

Worklog Id: (was: 428362)
Time Spent: 4h 40m  (was: 4.5h)

> Cleanup Python codebase to enable some of the excluded Python lint checks.
> ---
>
> Key: BEAM-8414
> URL: https://issues.apache.org/jira/browse/BEAM-8414
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Stephen O'Kennedy
>Priority: Minor
>  Labels: beginner, easy, easy-fix, easyfix, newbie, starter
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/9725 upgraded the lint checker; however, 
> the Beam codebase is not fully compliant with some of the checks the new 
> linter supports, so we excluded those checks. We would like to keep some 
> checks permanently excluded (see the discussion on the PR), but we would like 
> to re-enable the following checks:
> consider-using-set-comprehension
> chained-comparison
> consider-using-sys-exit
> To re-enable these checks, we should:
> 1) remove them from the disabled checks in .pylintrc [1] 
> https://github.com/apache/beam/blob/master/sdks/python/.pylintrc and 
> 2) clean up the codebase to make it compliant.
> [1] 
> https://github.com/apache/beam/blob/3330069291d8168c56c77acfef84c2566af05ec6/sdks/python/.pylintrc#L81
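
For readers unfamiliar with the three pylint checks listed above, here is a 
minimal sketch of the kind of cleanup each one asks for. The function names 
are made up for illustration and do not come from the Beam codebase; the 
comments describe the pattern each check flags.

```python
import sys


def unique_lengths(words):
    # consider-using-set-comprehension: write a set comprehension
    # instead of set([len(w) for w in words]).
    return {len(w) for w in words}


def in_range(low, value, high):
    # chained-comparison: write low <= value < high
    # instead of low <= value and value < high.
    return low <= value < high


def main(argv):
    # consider-using-sys-exit: call sys.exit() instead of the exit()/quit()
    # helpers, which the site module provides for interactive sessions.
    if len(argv) < 2:
        sys.exit(2)
    return 0
```

Each rewrite keeps the behavior the same; the point of step 2) in the issue is 
to apply changes of this shape wherever the re-enabled checks fire.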





[jira] [Work logged] (BEAM-9801) Setting a timer from a timer callback fails

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9801?focusedWorklogId=428360=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428360
 ]

ASF GitHub Bot logged work on BEAM-9801:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:40
Start Date: 28/Apr/20 20:40
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11492:
URL: https://github.com/apache/beam/pull/11492#issuecomment-620842616


   Run Python PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428360)
Time Spent: 5h 40m  (was: 5.5h)

> Setting a timer from a timer callback fails
> ---
>
> Key: BEAM-9801
> URL: https://issues.apache.org/jira/browse/BEAM-9801
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Priority: Critical
> Fix For: 2.21.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Hi,
> I'm trying to set a timer from a timer callback in the Python SDK:
> {code:Python}
> class GenerateLoad(beam.DoFn):
>   timer_spec = userstate.TimerSpec('timer', userstate.TimeDomain.WATERMARK)
>   def process(self, element, timer=beam.DoFn.TimerParam(timer_spec)):
>     self.key = element[0]
>     timer.set(0)
>   @userstate.on_timer(timer_spec)
>   def process_timer(self, timer=beam.DoFn.TimerParam(timer_spec)):
>     timer.set(0)
> {code}
> This yields the following Python stack trace:
> {noformat}
> INFO:apache_beam.utils.subprocess_server:Caused by: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> 4: Traceback (most recent call last):
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 245, in _execute
> INFO:apache_beam.utils.subprocess_server: response = task()
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 302, in 
> INFO:apache_beam.utils.subprocess_server: lambda: 
> self.create_worker().do_instruction(request), request)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 471, in do_instruction
> INFO:apache_beam.utils.subprocess_server: getattr(request, request_type), 
> request.instruction_id)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 506, in process_bundle
> INFO:apache_beam.utils.subprocess_server: 
> bundle_processor.process_bundle(instruction_id))
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/bundle_processor.py", line 910, in process_bundle
> INFO:apache_beam.utils.subprocess_server: element.timer_family_id, timer_data)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/operations.py", line 688, in process_timer
> INFO:apache_beam.utils.subprocess_server: timer_data.fire_timestamp)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 990, in process_user_timer
> INFO:apache_beam.utils.subprocess_server: self._reraise_augmented(exn)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 1043, in _reraise_augmented
> INFO:apache_beam.utils.subprocess_server: raise_with_traceback(new_exn)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 988, in process_user_timer
> INFO:apache_beam.utils.subprocess_server: 
> self.do_fn_invoker.invoke_user_timer(timer_spec, key, window, timestamp)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 517, in invoke_user_timer
> INFO:apache_beam.utils.subprocess_server: self.user_state_context, key, 
> window, timestamp))
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 1093, in process_outputs
> INFO:apache_beam.utils.subprocess_server: for result in results:
> INFO:apache_beam.utils.subprocess_server: File 
> "/Users/max/Dev/beam/sdks/python/apache_beam/testing/load_tests/pardo_test.py",
>  line 185, in process_timer
> INFO:apache_beam.utils.subprocess_server: timer.set(0)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/bundle_processor.py", line 589, in set
> INFO:apache_beam.utils.subprocess_server: 
> self._timer_coder_impl.encode_to_stream(timer, self._output_stream, True)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/coders/coder_impl.py", line 651, in encode_to_stream
> 

[jira] [Work logged] (BEAM-9801) Setting a timer from a timer callback fails

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9801?focusedWorklogId=428359=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428359
 ]

ASF GitHub Bot logged work on BEAM-9801:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:40
Start Date: 28/Apr/20 20:40
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11492:
URL: https://github.com/apache/beam/pull/11492#issuecomment-620842521


   Run Python2_PVR_Flink PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428359)
Time Spent: 5.5h  (was: 5h 20m)

> Setting a timer from a timer callback fails
> ---
>
> Key: BEAM-9801
> URL: https://issues.apache.org/jira/browse/BEAM-9801
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Priority: Critical
> Fix For: 2.21.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Hi,
> I'm trying to set a timer from a timer callback in the Python SDK:
> {code:Python}
> class GenerateLoad(beam.DoFn):
>   timer_spec = userstate.TimerSpec('timer', userstate.TimeDomain.WATERMARK)
>   def process(self, element, timer=beam.DoFn.TimerParam(timer_spec)):
>     self.key = element[0]
>     timer.set(0)
>   @userstate.on_timer(timer_spec)
>   def process_timer(self, timer=beam.DoFn.TimerParam(timer_spec)):
>     timer.set(0)
> {code}
> This yields the following Python stack trace:
> {noformat}
> INFO:apache_beam.utils.subprocess_server:Caused by: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> 4: Traceback (most recent call last):
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 245, in _execute
> INFO:apache_beam.utils.subprocess_server: response = task()
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 302, in 
> INFO:apache_beam.utils.subprocess_server: lambda: 
> self.create_worker().do_instruction(request), request)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 471, in do_instruction
> INFO:apache_beam.utils.subprocess_server: getattr(request, request_type), 
> request.instruction_id)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 506, in process_bundle
> INFO:apache_beam.utils.subprocess_server: 
> bundle_processor.process_bundle(instruction_id))
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/bundle_processor.py", line 910, in process_bundle
> INFO:apache_beam.utils.subprocess_server: element.timer_family_id, timer_data)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/operations.py", line 688, in process_timer
> INFO:apache_beam.utils.subprocess_server: timer_data.fire_timestamp)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 990, in process_user_timer
> INFO:apache_beam.utils.subprocess_server: self._reraise_augmented(exn)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 1043, in _reraise_augmented
> INFO:apache_beam.utils.subprocess_server: raise_with_traceback(new_exn)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 988, in process_user_timer
> INFO:apache_beam.utils.subprocess_server: 
> self.do_fn_invoker.invoke_user_timer(timer_spec, key, window, timestamp)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 517, in invoke_user_timer
> INFO:apache_beam.utils.subprocess_server: self.user_state_context, key, 
> window, timestamp))
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 1093, in process_outputs
> INFO:apache_beam.utils.subprocess_server: for result in results:
> INFO:apache_beam.utils.subprocess_server: File 
> "/Users/max/Dev/beam/sdks/python/apache_beam/testing/load_tests/pardo_test.py",
>  line 185, in process_timer
> INFO:apache_beam.utils.subprocess_server: timer.set(0)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/bundle_processor.py", line 589, in set
> INFO:apache_beam.utils.subprocess_server: 
> self._timer_coder_impl.encode_to_stream(timer, self._output_stream, True)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/coders/coder_impl.py", line 651, in encode_to_stream
> 

[jira] [Work logged] (BEAM-6661) FnApi gRPC setup/teardown glitch

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6661?focusedWorklogId=428358=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428358
 ]

ASF GitHub Bot logged work on BEAM-6661:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:39
Start Date: 28/Apr/20 20:39
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11537:
URL: https://github.com/apache/beam/pull/11537#issuecomment-620842028


   @ibzib Maybe also worth backporting?





Issue Time Tracking
---

Worklog Id: (was: 428358)
Time Spent: 1h  (was: 50m)

> FnApi gRPC setup/teardown glitch
> 
>
> Key: BEAM-6661
> URL: https://issues.apache.org/jira/browse/BEAM-6661
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Affects Versions: 2.11.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Multiple exceptions are observed during FnApi gRPC setup/teardown. 
> Examples:
> {noformat}
> 14:53:22 [grpc-default-executor-1] WARN 
> org.apache.beam.runners.fnexecution.logging.GrpcLoggingService - Logging 
> client failed unexpectedly.
> 14:53:22 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusRuntimeException: 
> CANCELLED: cancelled before receiving half close
> 14:53:22 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asRuntimeException(Status.java:517)
> 14:53:22 at{noformat}
> {noformat}
>  
> 14:52:56 [main] INFO org.apache.beam.runners.flink.FlinkJobServerDriver - 
> JobService started on localhost:58179
> 14:52:57 [grpc-default-executor-0] ERROR 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService - 
> Encountered Unexpected Exception for Invocation 
> job_bfb7df0e-408e-4bfd-bb3c-432e946ca819
> 14:52:57 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusException: 
> NOT_FOUND
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asException(Status.java:534)
> 14:52:57 at 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getInvocation(InMemoryJobService.java:341)
> 14:52:57 at 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getStateStream(InMemoryJobService.java:262)
> 14:52:57 at 
> org.apache.beam.model.jobmanagement.v1.JobServiceGrpc$MethodHandlers.invoke(JobServiceGrpc.java:693)
>  14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>  14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> 14:52:57 at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 14:52:57 at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {noformat}
> {noformat}
> 14:54:50 Feb 07, 2019 10:54:50 PM 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
>  cleanQueue
> 14:54:50 SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=628, 
> target=localhost:41409} was not shutdown properly!!! ~*~*~*
> 14:54:50 Make sure to call shutdown()/shutdownNow() and wait until 
> awaitTermination() returns true.
> 14:54:50 java.lang.RuntimeException: ManagedChannel allocation site
> 14:54:50 at 
> 

[jira] [Resolved] (BEAM-6661) FnApi gRPC setup/teardown glitch

2020-04-28 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved BEAM-6661.
--
Fix Version/s: 2.22.0
 Assignee: (was: Heejong Lee)
   Resolution: Fixed

> FnApi gRPC setup/teardown glitch
> 
>
> Key: BEAM-6661
> URL: https://issues.apache.org/jira/browse/BEAM-6661
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Affects Versions: 2.11.0
>Reporter: Heejong Lee
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Multiple exceptions are observed during FnApi gRPC setup/teardown. The 
> examples are
> {noformat}
> 14:53:22 [grpc-default-executor-1] WARN 
> org.apache.beam.runners.fnexecution.logging.GrpcLoggingService - Logging 
> client failed unexpectedly.
> 14:53:22 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusRuntimeException: 
> CANCELLED: cancelled before receiving half close
> 14:53:22 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asRuntimeException(Status.java:517)
> 14:53:22 at{noformat}
> {noformat}
>  
> 14:52:56 [main] INFO org.apache.beam.runners.flink.FlinkJobServerDriver - 
> JobService started on localhost:58179
> 14:52:57 [grpc-default-executor-0] ERROR 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService - 
> Encountered Unexpected Exception for Invocation 
> job_bfb7df0e-408e-4bfd-bb3c-432e946ca819
> 14:52:57 org.apache.beam.vendor.grpc.v1p13p1.io.grpc.StatusException: 
> NOT_FOUND
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Status.asException(Status.java:534)
> 14:52:57 at 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getInvocation(InMemoryJobService.java:341)
> 14:52:57 at 
> org.apache.beam.runners.fnexecution.jobsubmission.InMemoryJobService.getStateStream(InMemoryJobService.java:262)
> 14:52:57 at 
> org.apache.beam.model.jobmanagement.v1.JobServiceGrpc$MethodHandlers.invoke(JobServiceGrpc.java:693)
>  14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> 14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>  14:52:57 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> 14:52:57 at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 14:52:57 at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {noformat}
> {noformat}
> 14:54:50 Feb 07, 2019 10:54:50 PM 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference
>  cleanQueue
> 14:54:50 SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=628, 
> target=localhost:41409} was not shutdown properly!!! ~*~*~*
> 14:54:50 Make sure to call shutdown()/shutdownNow() and wait until 
> awaitTermination() returns true.
> 14:54:50 java.lang.RuntimeException: ManagedChannel allocation site
> 14:54:50 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:103)
> 14:54:50 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:53)
> 14:54:50 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:44)
> 14:54:50 at 
> org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:410)
> 14:54:50 at 
> org.apache.beam.sdk.fn.channel.ManagedChannelFactory.forDescriptor(ManagedChannelFactory.java:44)
> 14:54:50 at 
> 

[jira] [Work logged] (BEAM-6661) FnApi gRPC setup/teardown glitch

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6661?focusedWorklogId=428357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428357
 ]

ASF GitHub Bot logged work on BEAM-6661:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:38
Start Date: 28/Apr/20 20:38
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11537:
URL: https://github.com/apache/beam/pull/11537#issuecomment-620841780


   Merging, we can follow-up with the nits if we feel like it, since they are 
very minor.





Issue Time Tracking
---

Worklog Id: (was: 428357)
Time Spent: 50m  (was: 40m)

> FnApi gRPC setup/teardown glitch
> 
>
> Key: BEAM-6661
> URL: https://issues.apache.org/jira/browse/BEAM-6661
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Affects Versions: 2.11.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428356=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428356
 ]

ASF GitHub Bot logged work on BEAM-8742:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:35
Start Date: 28/Apr/20 20:35
Worklog Time Spent: 10m 
  Work Description: mxm removed a comment on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-620839972


   Run Website_Stage_GCS PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428356)
Time Spent: 3h 10m  (was: 3h)

> Add stateful processing to ParDo load test
> --
>
> Key: BEAM-8742
> URL: https://issues.apache.org/jira/browse/BEAM-8742
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> So far, the ParDo load test is not stateful. We should add a basic counter to 
> test the stateful processing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428355
 ]

ASF GitHub Bot logged work on BEAM-8742:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:34
Start Date: 28/Apr/20 20:34
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-620839972


   Run Website_Stage_GCS PreCommit





Issue Time Tracking
---

Worklog Id: (was: 428355)
Time Spent: 3h  (was: 2h 50m)

> Add stateful processing to ParDo load test
> --
>
> Key: BEAM-8742
> URL: https://issues.apache.org/jira/browse/BEAM-8742
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> So far, the ParDo load test is not stateful. We should add a basic counter to 
> test the stateful processing.





[jira] [Commented] (BEAM-9801) Setting a timer from a timer callback fails

2020-04-28 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094864#comment-17094864
 ] 

Maximilian Michels commented on BEAM-9801:
--

I'm not sure whether it worked at some point and later broke when new features 
were introduced. It appears to be untested, so it likely never worked. It was 
relatively easy to fix, though; see the PR.

> Setting a timer from a timer callback fails
> ---
>
> Key: BEAM-9801
> URL: https://issues.apache.org/jira/browse/BEAM-9801
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Priority: Critical
> Fix For: 2.21.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Hi,
> I'm trying to set a timer from a timer callback in the Python SDK:
> {code:Python}
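> import apache_beam as beam
> from apache_beam.transforms import userstate
>
> class GenerateLoad(beam.DoFn):
>   timer_spec = userstate.TimerSpec('timer', userstate.TimeDomain.WATERMARK)
>
>   def process(self, element, timer=beam.DoFn.TimerParam(timer_spec)):
>     self.key = element[0]
>     # Set the initial timer from the element-processing path.
>     timer.set(0)
>
>   @userstate.on_timer(timer_spec)
>   def process_timer(self, timer=beam.DoFn.TimerParam(timer_spec)):
>     # Re-arming the timer from inside its own callback triggers the failure.
>     timer.set(0)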
> {code}
> This yields the following Python stack trace:
> {noformat}
> INFO:apache_beam.utils.subprocess_server:Caused by: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> 4: Traceback (most recent call last):
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 245, in _execute
> INFO:apache_beam.utils.subprocess_server: response = task()
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 302, in 
> INFO:apache_beam.utils.subprocess_server: lambda: 
> self.create_worker().do_instruction(request), request)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 471, in do_instruction
> INFO:apache_beam.utils.subprocess_server: getattr(request, request_type), 
> request.instruction_id)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/sdk_worker.py", line 506, in process_bundle
> INFO:apache_beam.utils.subprocess_server: 
> bundle_processor.process_bundle(instruction_id))
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/bundle_processor.py", line 910, in process_bundle
> INFO:apache_beam.utils.subprocess_server: element.timer_family_id, timer_data)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/operations.py", line 688, in process_timer
> INFO:apache_beam.utils.subprocess_server: timer_data.fire_timestamp)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 990, in process_user_timer
> INFO:apache_beam.utils.subprocess_server: self._reraise_augmented(exn)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 1043, in _reraise_augmented
> INFO:apache_beam.utils.subprocess_server: raise_with_traceback(new_exn)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 988, in process_user_timer
> INFO:apache_beam.utils.subprocess_server: 
> self.do_fn_invoker.invoke_user_timer(timer_spec, key, window, timestamp)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 517, in invoke_user_timer
> INFO:apache_beam.utils.subprocess_server: self.user_state_context, key, 
> window, timestamp))
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/common.py", line 1093, in process_outputs
> INFO:apache_beam.utils.subprocess_server: for result in results:
> INFO:apache_beam.utils.subprocess_server: File 
> "/Users/max/Dev/beam/sdks/python/apache_beam/testing/load_tests/pardo_test.py",
>  line 185, in process_timer
> INFO:apache_beam.utils.subprocess_server: timer.set(0)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/runners/worker/bundle_processor.py", line 589, in set
> INFO:apache_beam.utils.subprocess_server: 
> self._timer_coder_impl.encode_to_stream(timer, self._output_stream, True)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/coders/coder_impl.py", line 651, in encode_to_stream
> INFO:apache_beam.utils.subprocess_server: value.hold_timestamp, out, True)
> INFO:apache_beam.utils.subprocess_server: File 
> "apache_beam/coders/coder_impl.py", line 608, in encode_to_stream
> INFO:apache_beam.utils.subprocess_server: millis = value.micros // 1000
> INFO:apache_beam.utils.subprocess_server:AttributeError: 'NoneType' object 
> has no attribute 'micros' [while running 'GenerateLoad']
> {noformat}
> Looking at the code base, I'm not sure we have tests for timer output 
> timestamps. Am I missing something?
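
The last frames of the trace suggest that the timer's hold timestamp was None
when it reached the coder, which then evaluated value.micros // 1000. A minimal,
self-contained sketch of that failure mode follows. The Timestamp class and both
functions are hypothetical stand-ins, not Beam APIs, and falling back to the
fire timestamp is only one possible guard; it is not necessarily what the PR
does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical stand-in for a timestamp type that stores microseconds."""
    micros: int


def encode_millis(value: Timestamp) -> int:
    # Same arithmetic as the failing line in the trace: micros -> millis.
    # Passing None here raises AttributeError, as in the report.
    return value.micros // 1000


def encode_hold_timestamp(hold: Optional[Timestamp], fire: Timestamp) -> int:
    # Assumed guard: when no hold timestamp was provided, use the fire
    # timestamp instead of handing None to the encoder.
    effective = hold if hold is not None else fire
    return encode_millis(effective)
```

With this guard, a timer set without an explicit hold timestamp encodes to the
fire time in milliseconds instead of raising.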




[jira] [Work logged] (BEAM-8742) Add stateful processing to ParDo load test

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8742?focusedWorklogId=428352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428352
 ]

ASF GitHub Bot logged work on BEAM-8742:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:29
Start Date: 28/Apr/20 20:29
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-620837611


   Run Python Load Tests ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 428352)
Time Spent: 2h 50m  (was: 2h 40m)

> Add stateful processing to ParDo load test
> --
>
> Key: BEAM-8742
> URL: https://issues.apache.org/jira/browse/BEAM-8742
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> So far, the ParDo load test is not stateful. We should add a basic counter to 
> test the stateful processing.





[jira] [Work started] (BEAM-9546) Support for batching a schema-aware PCollection and processing as a Dataframe

2020-04-28 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9546 started by Brian Hulette.
---
> Support for batching a schema-aware PCollection and processing as a Dataframe
> -
>
> Key: BEAM-9546
> URL: https://issues.apache.org/jira/browse/BEAM-9546
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>






[jira] [Work logged] (BEAM-8414) Cleanup Python codebase to enable some of the excluded Python lint checks.

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8414?focusedWorklogId=428342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428342
 ]

ASF GitHub Bot logged work on BEAM-8414:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:04
Start Date: 28/Apr/20 20:04
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11523:
URL: https://github.com/apache/beam/pull/11523#issuecomment-620825431


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 428342)
Time Spent: 4.5h  (was: 4h 20m)

> Cleanup Python codebase to enable some of the excluded Python lint checks.
> ---
>
> Key: BEAM-8414
> URL: https://issues.apache.org/jira/browse/BEAM-8414
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Stephen O'Kennedy
>Priority: Minor
>  Labels: beginner, easy, easy-fix, easyfix, newbie, starter
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/9725 upgraded the lint checker. However, 
> the Beam codebase is not fully compliant with some of the checks the new 
> linter supports, so we excluded those checks. We would like to keep some 
> checks permanently excluded (see the discussion on the PR), but we would like 
> to re-enable the following checks:
> consider-using-set-comprehension
> chained-comparison
> consider-using-sys-exit
> To re-enable these checks, we should:
> 1) remove them from the disabled checks in .pylintrc [1] 
> https://github.com/apache/beam/blob/master/sdks/python/.pylintrc and 
> 2) clean up the codebase to make it compliant.
> [1] 
> https://github.com/apache/beam/blob/3330069291d8168c56c77acfef84c2566af05ec6/sdks/python/.pylintrc#L81
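
As a rough illustration of what the cleanup in step 2 involves, the sketch
below shows one compliant form for each of the three checks named above. The
function names are made up for this example and do not come from the Beam
codebase; the non-compliant form is described in each comment.

```python
import sys


def unique_lengths(words):
    # consider-using-set-comprehension: write a set comprehension
    # instead of set([len(w) for w in words]).
    return {len(w) for w in words}


def in_range(low, value, high):
    # chained-comparison: write low <= value < high
    # instead of (low <= value and value < high).
    return low <= value < high


def main(argv):
    if not argv:
        # consider-using-sys-exit: call sys.exit()
        # instead of the exit() or quit() builtins.
        sys.exit(2)
    return unique_lengths(argv)
```

Each rewrite keeps the behavior unchanged, so this kind of cleanup can be done
check by check before the corresponding entry is removed from .pylintrc.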





[jira] [Work logged] (BEAM-9846) Remove references to native Java BQ source and sink

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9846?focusedWorklogId=428341=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428341
 ]

ASF GitHub Bot logged work on BEAM-9846:


Author: ASF GitHub Bot
Created on: 28/Apr/20 20:00
Start Date: 28/Apr/20 20:00
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11562:
URL: https://github.com/apache/beam/pull/11562#issuecomment-620823343


   R: @yifanzou 





Issue Time Tracking
---

Worklog Id: (was: 428341)
Time Spent: 20m  (was: 10m)

> Remove references to native Java BQ source and sink
> ---
>
> Key: BEAM-9846
> URL: https://issues.apache.org/jira/browse/BEAM-9846
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Remove references to enable_custom_bigquery_source and 
> enable_custom_bigquery_sink experiments.
> These experiments have not been used to enable the native Dataflow BQ 
> source/sink for 10+ releases.





[jira] [Work logged] (BEAM-9846) Remove references to native Java BQ source and sink

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9846?focusedWorklogId=428340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428340
 ]

ASF GitHub Bot logged work on BEAM-9846:


Author: ASF GitHub Bot
Created on: 28/Apr/20 19:58
Start Date: 28/Apr/20 19:58
Worklog Time Spent: 10m 
  Work Description: lukecwik opened a new pull request #11562:
URL: https://github.com/apache/beam/pull/11562


   
   
   

[jira] [Updated] (BEAM-9846) Remove references to native Java BQ source and sink

2020-04-28 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9846:

Status: Open  (was: Triage Needed)

> Remove references to native Java BQ source and sink
> ---
>
> Key: BEAM-9846
> URL: https://issues.apache.org/jira/browse/BEAM-9846
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>
> Remove references to enable_custom_bigquery_source and 
> enable_custom_bigquery_sink experiments.
> These experiments have not been used to enable the native Dataflow BQ 
> source/sink for 10+ releases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9846) Remove references to native Java BQ source and sink

2020-04-28 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-9846:
---

 Summary: Remove references to native Java BQ source and sink
 Key: BEAM-9846
 URL: https://issues.apache.org/jira/browse/BEAM-9846
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Reporter: Luke Cwik
Assignee: Luke Cwik


Remove references to enable_custom_bigquery_source and 
enable_custom_bigquery_sink experiments.

These experiments have not been used to enable the native Dataflow BQ 
source/sink for 10+ releases.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428338
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 28/Apr/20 19:40
Start Date: 28/Apr/20 19:40
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on a change in pull request 
#11548:
URL: https://github.com/apache/beam/pull/11548#discussion_r416874101



##
File path: sdks/java/container/build.gradle
##
@@ -101,16 +84,44 @@ docker {
   project.rootProject["docker-tag"] : project.sdk_version)
   dockerfile project.file("./${dockerfileName}")
   files "./build/"
+  buildArgs(['pull_licenses': 
!project.rootProject.hasProperty(["no-licenses"])])

Review comment:
   In short, users want to create lightweight images without adding 
licenses.
   Lightweight images are also welcome for Jenkins tests; skipping licenses 
reduces the Java image size by 85MB. More discussion can be found 
[here](https://lists.apache.org/thread.html/rff9f05e08de6adf7c39c0f4c59f97ae1a2f3602768480fe9e31e0428%40%3Cdev.beam.apache.org%3E).
   
   The naming suggestion sounds good to me, thank you.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 428338)
Time Spent: 24.5h  (was: 24h 20m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 24.5h
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-8542) Add async write to AWS SNS IO & remove retry logic

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8542?focusedWorklogId=428335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428335
 ]

ASF GitHub Bot logged work on BEAM-8542:


Author: ASF GitHub Bot
Created on: 28/Apr/20 19:17
Start Date: 28/Apr/20 19:17
Worklog Time Spent: 10m 
  Work Description: Akshay-Iyangar commented on pull request #10078:
URL: https://github.com/apache/beam/pull/10078#issuecomment-620803551


   @aromanenko-dev - could you also have a look at this?
   





Issue Time Tracking
---

Worklog Id: (was: 428335)
Time Spent: 8h 50m  (was: 8h 40m)

> Add async write to AWS SNS IO & remove retry logic
> --
>
> Key: BEAM-8542
> URL: https://issues.apache.org/jira/browse/BEAM-8542
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Ajo Thomas
>Assignee: Ajo Thomas
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
>  - While working with SNS IO for one of my work-related projects, I found that 
> the IO uses synchronous publishes during writes. I had a simple mock pipeline 
> that read from a Kinesis stream and published the records to SNS using 
> Beam's SNS IO. For comparison, I also had a lambda that did the same using 
> asynchronous publishes and was about 5x faster. Changing the SNS IO to use 
> async publishes would improve publish latencies.
>  - SNS IO also has some retry logic, which isn't required because SNS clients 
> can handle retries. The retry logic in the SNS client is user-configurable, so 
> explicit retry logic in SNS IO is not required.
>
> I have a working version of the IO with these changes and will create a PR 
> linking this ticket to it once I get some feedback here.
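
The difference between synchronous and asynchronous publishing described in the ticket can be illustrated with a minimal sketch. It uses neither Beam nor the AWS SDK: `fake_publish` is a hypothetical stand-in that only simulates network latency, and every name below is an assumption made for illustration. It is not the SNS IO implementation and does not reproduce the 5x figure reported above.

```python
import asyncio
import time


async def fake_publish(message: str, latency_s: float = 0.01) -> str:
    """Hypothetical stand-in for a publish call; it only simulates latency."""
    await asyncio.sleep(latency_s)
    return f"published:{message}"


async def publish_sequentially(messages):
    # Synchronous style: wait for each publish to finish before starting the next.
    return [await fake_publish(m) for m in messages]


async def publish_concurrently(messages):
    # Asynchronous style: start every publish, then wait for all of them.
    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(fake_publish(m) for m in messages))


def timed(coro_fn, messages):
    """Run one of the coroutines above and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(messages))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    msgs = [f"record-{i}" for i in range(20)]
    _, sequential_s = timed(publish_sequentially, msgs)
    _, concurrent_s = timed(publish_concurrently, msgs)
    print(f"sequential: {sequential_s:.3f}s, concurrent: {concurrent_s:.3f}s")
```

In the sequential version the total time grows with the number of messages, while in the concurrent version the waits overlap. The actual gain in a real pipeline depends on the client, batching, and service limits.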





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=428332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428332
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 28/Apr/20 19:08
Start Date: 28/Apr/20 19:08
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #11548:
URL: https://github.com/apache/beam/pull/11548#discussion_r416855737



##
File path: sdks/java/container/build.gradle
##
@@ -101,16 +84,44 @@ docker {
   project.rootProject["docker-tag"] : project.sdk_version)
   dockerfile project.file("./${dockerfileName}")
   files "./build/"
+  buildArgs(['pull_licenses': 
!project.rootProject.hasProperty(["no-licenses"])])

Review comment:
   Sounds good. Sorry if this was covered in the discussion that I didn't 
read, but why would we want to check urls but not pull? Feel free to point me 
to the discussion.
   
   Naming suggestion:
   licenses-in-docker-images=add
   licenses-in-docker-images=skip
   licenses-in-docker-images=check_urls
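
A three-valued property like the one suggested above could be validated roughly as follows. This is only a sketch in Python rather than the Gradle build script under review: `LicenseMode`, `parse_license_mode`, and the choice of `add` as the default when the property is absent are all assumptions made for illustration, not part of the pull request.

```python
from enum import Enum

PROPERTY_NAME = "licenses-in-docker-images"


class LicenseMode(Enum):
    """Hypothetical modes mirroring the three suggested property values."""
    ADD = "add"
    SKIP = "skip"
    CHECK_URLS = "check_urls"


def parse_license_mode(properties: dict,
                       default: LicenseMode = LicenseMode.ADD) -> LicenseMode:
    """Return the mode named by the property, or the (assumed) default if unset."""
    raw = properties.get(PROPERTY_NAME)
    if raw is None:
        return default
    try:
        return LicenseMode(raw)
    except ValueError:
        allowed = ", ".join(mode.value for mode in LicenseMode)
        raise ValueError(
            f"{PROPERTY_NAME} must be one of: {allowed}; got {raw!r}"
        ) from None
```

A single property with named values rejects typos explicitly, whereas a presence-only flag such as `no-licenses` cannot express the third `check_urls` case.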







Issue Time Tracking
---

Worklog Id: (was: 428332)
Time Spent: 24h 20m  (was: 24h 10m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 24h 20m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9720) Add custom AWS Http Client Configuration capability for AWS client 1.0/2.0

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9720?focusedWorklogId=428330&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428330
 ]

ASF GitHub Bot logged work on BEAM-9720:


Author: ASF GitHub Bot
Created on: 28/Apr/20 19:05
Start Date: 28/Apr/20 19:05
Worklog Time Spent: 10m 
  Work Description: Akshay-Iyangar commented on a change in pull request 
#11341:
URL: https://github.com/apache/beam/pull/11341#discussion_r416853789



##
File path: 
sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/AwsOptions.java
##
@@ -103,6 +103,34 @@ public AWSCredentialsProvider create(PipelineOptions 
options) {
 
   void setClientConfiguration(ClientConfiguration clientConfiguration);

Review comment:
   Yes, you are right, it doesn't bring in any breaking change.







Issue Time Tracking
---

Worklog Id: (was: 428330)
Time Spent: 3.5h  (was: 3h 20m)

> Add custom AWS Http Client Configuration capability for AWS client 1.0/2.0
> --
>
> Key: BEAM-9720
> URL: https://issues.apache.org/jira/browse/BEAM-9720
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Akshay Iyangar
>Assignee: Akshay Iyangar
>Priority: Minor
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently, there is no way to set a custom client configuration for the 
> AWS service clients.
> Add a way to pass these custom client configuration options as pipeline 
> options.





[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=428324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428324
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 28/Apr/20 18:53
Start Date: 28/Apr/20 18:53
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11507:
URL: https://github.com/apache/beam/pull/11507#issuecomment-620791900









Issue Time Tracking
---

Worklog Id: (was: 428324)
Time Spent: 22h 50m  (was: 22h 40m)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 22h 50m
>  Remaining Estimate: 0h
>
> This is the top-level ticket for all efforts leveraging [interactive 
> Beam|https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive].
> As development proceeds, blocking tickets will be added to this one.
>  
>  





[jira] [Work logged] (BEAM-9802) Provide a way to customize automatically started services.

2020-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9802?focusedWorklogId=428323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-428323
 ]

ASF GitHub Bot logged work on BEAM-9802:


Author: ASF GitHub Bot
Created on: 28/Apr/20 18:53
Start Date: 28/Apr/20 18:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11495:
URL: https://github.com/apache/beam/pull/11495#issuecomment-620792019


   Thanks looking.





Issue Time Tracking
---

Worklog Id: (was: 428323)
Time Spent: 40m  (was: 0.5h)

> Provide a way to customize automatically started services.
> --
>
> Key: BEAM-9802
> URL: https://issues.apache.org/jira/browse/BEAM-9802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This can be useful for testing and alternative production environments. 




