[jira] [Commented] (BEAM-11754) KafkaIO.Write EOS documentation outdated

2021-05-03 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338289#comment-17338289
 ] 

Maximilian Michels commented on BEAM-11754:
---

We still need to update the docs. PR: https://github.com/apache/beam/pull/14705
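For reference, a minimal sketch of the user-facing EOS write configuration that the linked javadoc describes (broker address, topic, and key/value types below are placeholders, not taken from the issue); whether a given runner actually supports it is exactly what the documentation update needs to clarify:

{code:java}
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.LongSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

class EosWriteSketch {
  // Writes a keyed PCollection to Kafka with exactly-once semantics enabled.
  static void writeExactlyOnce(PCollection<KV<Long, String>> records) {
    records.apply(
        KafkaIO.<Long, String>write()
            .withBootstrapServers("localhost:9092")   // placeholder broker address
            .withTopic("results")                     // placeholder topic
            .withKeySerializer(LongSerializer.class)
            .withValueSerializer(StringSerializer.class)
            // numShards bounds the number of concurrent EOS writers; the sink group id
            // must uniquely identify this sink within the pipeline.
            .withEOS(1, "eos-sink-group-id"));
  }
}
{code}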

> KafkaIO.Write EOS documentation outdated
> 
>
> Key: BEAM-11754
> URL: https://issues.apache.org/jira/browse/BEAM-11754
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka
>Affects Versions: 2.27.0
>Reporter: Marek Pikulski
>Assignee: Maximilian Michels
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Documentation states that exactly-once semantics (EOS) is not supported on 
> Flink:
> [https://beam.apache.org/releases/javadoc/2.27.0/org/apache/beam/sdk/io/kafka/KafkaIO.WriteRecords.html#withEOS-int-java.lang.String-]
> Meanwhile, the feature appears to have been implemented in:
> [https://github.com/apache/beam/pull/7991/files]
> This is very confusing for new/potential users, so it is worth fixing IMHO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8218) Implement Apache PulsarIO

2021-01-27 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned BEAM-8218:


Assignee: Maximilian Michels

> Implement Apache PulsarIO
> -
>
> Key: BEAM-8218
> URL: https://issues.apache.org/jira/browse/BEAM-8218
> Project: Beam
>  Issue Type: Task
>  Components: io-ideas
>Reporter: Alex Van Boxel
>Assignee: Maximilian Michels
>Priority: P3
>
> Apache Pulsar is starting to gain popularity. Having a native Beam PulsarIO 
> could be beneficial.
> [https://pulsar.apache.org/|https://pulsar.apache.org/en/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8218) Implement Apache PulsarIO

2021-01-27 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272801#comment-17272801
 ] 

Maximilian Michels commented on BEAM-8218:
--

I don't think so. It would be good to have at least a basic integration with 
Pulsar. I drafted something for this a while ago; let me see if I can bring 
it into good enough shape. Assigning this to myself for now.

> Implement Apache PulsarIO
> -
>
> Key: BEAM-8218
> URL: https://issues.apache.org/jira/browse/BEAM-8218
> Project: Beam
>  Issue Type: Task
>  Components: io-ideas
>Reporter: Alex Van Boxel
>Priority: P3
>
> Apache Pulsar is starting to gain popularity. Having a native Beam PulsarIO 
> could be beneficial.
> [https://pulsar.apache.org/|https://pulsar.apache.org/en/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-11419) Add flink 1.12 build target

2020-12-09 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-11419:
--
Fix Version/s: 2.27.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Add flink 1.12 build target
> ---
>
> Key: BEAM-11419
> URL: https://issues.apache.org/jira/browse/BEAM-11419
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: David Morávek
>Assignee: David Morávek
>Priority: P2
> Fix For: 2.27.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add support for flink 1.12



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-11251) Unfair lock/monitor in UnboundedSourceWrapper can starve checkpointing

2020-11-13 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231282#comment-17231282
 ] 

Maximilian Michels commented on BEAM-11251:
---

In the meantime, do we have a Flink issue for changing the lock to a fair one?

> Unfair lock/monitor in UnboundedSourceWrapper can starve checkpointing
> --
>
> Key: BEAM-11251
> URL: https://issues.apache.org/jira/browse/BEAM-11251
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: P2
> Attachments: 
> 0001-BEAM-11251-Don-t-chain-sources-to-avoid-checkpoint-s.patch
>
>
> From my email on this Flink ML thread: 
> https://lists.apache.org/thread.html/rca7eb85183ac17be478b4d6c59c5ca348077689e5ba4a42c87b71bf4%40%3Cuser.flink.apache.org%3E:
> We need the synchronized block in the source because the call to 
> {{reader.advance()}} (via the invoker) and {{reader.getCurrent()}} (via 
> {{emitElement()}}) must be atomic with respect to state. We cannot advance 
> the reader state, fail to emit that record, and yet still checkpoint the new 
> reader state. The monitor ensures that no checkpoint can happen in between 
> those two calls.
> The basic problem is now that we can starve checkpointing because the 
> monitor/lock is not fair. This could be solved by using a fair lock but that 
> would require Flink proper to be changed to use a fair lock instead of a 
> monitor/synchronized. I don't see this as an immediate solution.
> One thing that exacerbates this problem is that too many things are happening 
> "under" the synchronized block. All the transforms before a 
> shuffle/rebalance/keyBy are chained to the source, which means that they are 
> invoked from the {{emitElement()}} call.
> A possible mitigation would be to disable chaining globally by inserting a 
> {{flinkStreamEnv.disableOperatorChaining()}} in [1].
> A more surgical version would be to only disable chaining for sources but 
> this can also have an impact on performance since without chaining we 
> potentially have more serialization between tasks/operators.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-11251) Unfair lock/monitor in UnboundedSourceWrapper can starve checkpointing

2020-11-13 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231280#comment-17231280
 ] 

Maximilian Michels commented on BEAM-11251:
---

It will definitely increase the serialization costs. Whether that affects 
performance depends on your streaming job.

> Unfair lock/monitor in UnboundedSourceWrapper can starve checkpointing
> --
>
> Key: BEAM-11251
> URL: https://issues.apache.org/jira/browse/BEAM-11251
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: P2
> Attachments: 
> 0001-BEAM-11251-Don-t-chain-sources-to-avoid-checkpoint-s.patch
>
>
> From my email on this Flink ML thread: 
> https://lists.apache.org/thread.html/rca7eb85183ac17be478b4d6c59c5ca348077689e5ba4a42c87b71bf4%40%3Cuser.flink.apache.org%3E:
> We need the synchronized block in the source because the call to 
> {{reader.advance()}} (via the invoker) and {{reader.getCurrent()}} (via 
> {{emitElement()}}) must be atomic with respect to state. We cannot advance 
> the reader state, fail to emit that record, and yet still checkpoint the new 
> reader state. The monitor ensures that no checkpoint can happen in between 
> those two calls.
> The basic problem is now that we can starve checkpointing because the 
> monitor/lock is not fair. This could be solved by using a fair lock but that 
> would require Flink proper to be changed to use a fair lock instead of a 
> monitor/synchronized. I don't see this as an immediate solution.
> One thing that exacerbates this problem is that too many things are happening 
> "under" the synchronized block. All the transforms before a 
> shuffle/rebalance/keyBy are chained to the source, which means that they are 
> invoked from the {{emitElement()}} call.
> A possible mitigation would be to disable chaining globally by inserting a 
> {{flinkStreamEnv.disableOperatorChaining()}} in [1].
> A more surgical version would be to only disable chaining for sources but 
> this can also have an impact on performance since without chaining we 
> potentially have more serialization between tasks/operators.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-11251) Unfair lock/monitor in UnboundedSourceWrapper can starve checkpointing

2020-11-12 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230993#comment-17230993
 ] 

Maximilian Michels commented on BEAM-11251:
---

Good observation! I'd say disabling chaining for sources is a fair enough 
solution. Often there is a shuffle right after reading from the source, so it 
won't always hurt.
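For illustration, a minimal Flink sketch of the global variant of this mitigation (disabling all chaining on the execution environment, as suggested in the description below); disabling chaining only for sources would need a more targeted change inside the runner:

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

class DisableChainingSketch {
  static StreamExecutionEnvironment newEnvironment() {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // With chaining disabled, downstream operators are no longer invoked from inside
    // emitElement(), so the source holds the checkpoint lock for shorter periods.
    env.disableOperatorChaining();
    return env;
  }
}
{code}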

> Unfair lock/monitor in UnboundedSourceWrapper can starve checkpointing
> --
>
> Key: BEAM-11251
> URL: https://issues.apache.org/jira/browse/BEAM-11251
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: P2
> Attachments: 
> 0001-BEAM-11251-Don-t-chain-sources-to-avoid-checkpoint-s.patch
>
>
> From my email on this Flink ML thread: 
> https://lists.apache.org/thread.html/rca7eb85183ac17be478b4d6c59c5ca348077689e5ba4a42c87b71bf4%40%3Cuser.flink.apache.org%3E:
> We need the synchronized block in the source because the call to 
> {{reader.advance()}} (via the invoker) and {{reader.getCurrent()}} (via 
> {{emitElement()}}) must be atomic with respect to state. We cannot advance 
> the reader state, fail to emit that record, and yet still checkpoint the new 
> reader state. The monitor ensures that no checkpoint can happen in between 
> those two calls.
> The basic problem is now that we can starve checkpointing because the 
> monitor/lock is not fair. This could be solved by using a fair lock but that 
> would require Flink proper to be changed to use a fair lock instead of a 
> monitor/synchronized. I don't see this as an immediate solution.
> One thing that exacerbates this problem is that too many things are happening 
> "under" the synchronized block. All the transforms before a 
> shuffle/rebalance/keyBy are chained to the source, which means that they are 
> invoked from the {{emitElement()}} call.
> A possible mitigation would be to disable chaining globally by inserting a 
> {{flinkStreamEnv.disableOperatorChaining()}} in [1].
> A more surgical version would be to only disable chaining for sources but 
> this can also have an impact on performance since without chaining we 
> potentially have more serialization between tasks/operators.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-11210) ExecutableStageDoFnOperator should fire processing time timers properly instead of draining it directly when close() happens

2020-11-11 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229888#comment-17229888
 ] 

Maximilian Michels commented on BEAM-11210:
---

Processing timers have to be manually drained after the operator shuts down and 
{{close()}} is called. This does not happen if checkpointing is enabled or the 
{{shutdownSourcesAfterIdleMs}} option is used. Context: 
https://github.com/apache/beam/pull/13105#discussion_r521252916

> ExecutableStageDoFnOperator should fire processing time timers properly 
> instead of draining it directly when close() happens
> 
>
> Key: BEAM-11210
> URL: https://issues.apache.org/jira/browse/BEAM-11210
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Boyuan Zhang
>Priority: P2
>
> We hit this case when processing time timer holds the watermark but at the 
> same time we are closing the operator.
> One typical case is when SDF sets timer for rescheduling residuals.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9855) Make it easier to configure a Flink state backend

2020-11-10 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9855:
-
Fix Version/s: 2.26.0

> Make it easier to configure a Flink state backend
> -
>
> Key: BEAM-9855
> URL: https://issues.apache.org/jira/browse/BEAM-9855
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P3
> Fix For: 2.26.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We should make it easier to configure a Flink state backend. At the moment, 
> users have to either (1) configure the default state backend in their Flink 
> cluster, or make sure (2a) they include the dependency in their Gradle/Maven 
> project (e.g. 
> {{"org.apache.flink:flink-statebackend-rocksdb_2.11:$flink_version"}} for 
> RocksDB), and (2b) set the state backend factory in the {{FlinkPipelineOptions}}.
> The drawback of option (2) is that it only works in Java due to the factory 
> specification being in Java.
> We can make it easier by simply adding pipeline options for the state backend 
> name and the checkpoint directory, which will be enough for configuring the 
> state backend. We can add the RocksDB state backend as a default dependency.
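For illustration, a sketch of how such first-class options could be passed on the command line; the flag names and storage path below are illustrative only and may not match the option names that were eventually released:

{code:java}
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

class StateBackendOptionsSketch {
  public static void main(String[] ignored) {
    // Proposed flat options: a state backend name plus a checkpoint/storage directory.
    // Flag names and the storage path are illustrative, not authoritative.
    String[] args = {
      "--state_backend=rocksdb",
      "--state_backend_storage_path=hdfs:///flink/checkpoints"
    };
    // withoutStrictParsing() lets this run with only the core SDK on the classpath;
    // in a real pipeline the flags would be parsed as FlinkPipelineOptions.
    PipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withoutStrictParsing().create();
    System.out.println(options);
  }
}
{code}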



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9855) Make it easier to configure a Flink state backend

2020-11-10 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9855:
-
Status: Resolved  (was: Open)

> Make it easier to configure a Flink state backend
> -
>
> Key: BEAM-9855
> URL: https://issues.apache.org/jira/browse/BEAM-9855
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P3
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We should make it easier to configure a Flink state backend. At the moment, 
> users have to either (1) configure the default state backend in their Flink 
> cluster, or make sure (2a) they include the dependency in their Gradle/Maven 
> project (e.g. 
> {{"org.apache.flink:flink-statebackend-rocksdb_2.11:$flink_version"}} for 
> RocksDB), and (2b) set the state backend factory in the {{FlinkPipelineOptions}}.
> The drawback of option (2) is that it only works in Java due to the factory 
> specification being in Java.
> We can make it easier by simply adding pipeline options for the state backend 
> name and the checkpoint directory, which will be enough for configuring the 
> state backend. We can add the RocksDB state backend as a default dependency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9855) Make it easier to configure a Flink state backend

2020-10-14 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned BEAM-9855:


Assignee: Maximilian Michels

> Make it easier to configure a Flink state backend
> -
>
> Key: BEAM-9855
> URL: https://issues.apache.org/jira/browse/BEAM-9855
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P3
>
> We should make it easier to configure a Flink state backend. At the moment, 
> users have to either (1) configure the default state backend in their Flink 
> cluster, or make sure (2a) they include the dependency in their Gradle/Maven 
> project (e.g. 
> {{"org.apache.flink:flink-statebackend-rocksdb_2.11:$flink_version"}} for 
> RocksDB), and (2b) set the state backend factory in the {{FlinkPipelineOptions}}.
> The drawback of option (2) is that it only works in Java due to the factory 
> specification being in Java.
> We can make it easier by simply adding pipeline options for the state backend 
> name and the checkpoint directory, which will be enough for configuring the 
> state backend. We can add the RocksDB state backend as a default dependency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9631) Flink 1.10 test execution is broken due to premature test cluster shutdown

2020-10-14 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9631:
-
Status: Resolved  (was: Open)

> Flink 1.10 test execution is broken due to premature test cluster shutdown
> --
>
> Key: BEAM-9631
> URL: https://issues.apache.org/jira/browse/BEAM-9631
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Priority: P1
>
> Due to a race condition with the test cluster shutdown code, tests may fail 
> because the Flink job result cannot be retrieved when the cluster already has 
> been shut down. This is a Flink 1.10.0 bug which is addressed upstream via 
> FLINK-16705. 
> If this doesn't get addressed upstream, we may also be able to work around 
> this. 
> {noformat}
> java.lang.RuntimeException: Pipeline execution failed
>   at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:115)
>   at 
> org.apache.beam.runners.flink.TestFlinkRunner.run(TestFlinkRunner.java:61)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.sdk.transforms.ViewTest.testMultimapSideInput(ViewTest.java:543)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:266)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorke

[jira] [Updated] (BEAM-9631) Flink 1.10 test execution is broken due to premature test cluster shutdown

2020-10-14 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9631:
-
Fix Version/s: 2.21.0

> Flink 1.10 test execution is broken due to premature test cluster shutdown
> --
>
> Key: BEAM-9631
> URL: https://issues.apache.org/jira/browse/BEAM-9631
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Priority: P1
> Fix For: 2.21.0
>
>
> Due to a race condition with the test cluster shutdown code, tests may fail 
> because the Flink job result cannot be retrieved when the cluster already has 
> been shut down. This is a Flink 1.10.0 bug which is addressed upstream via 
> FLINK-16705. 
> If this doesn't get addressed upstream, we may also be able to work around 
> this. 
> {noformat}
> java.lang.RuntimeException: Pipeline execution failed
>   at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:115)
>   at 
> org.apache.beam.runners.flink.TestFlinkRunner.run(TestFlinkRunner.java:61)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.sdk.transforms.ViewTest.testMultimapSideInput(ViewTest.java:543)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:266)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.pr

[jira] [Updated] (BEAM-9631) Flink 1.10 test execution is broken due to premature test cluster shutdown

2020-10-14 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9631:
-
Status: Open  (was: Triage Needed)

> Flink 1.10 test execution is broken due to premature test cluster shutdown
> --
>
> Key: BEAM-9631
> URL: https://issues.apache.org/jira/browse/BEAM-9631
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Priority: P1
>
> Due to a race condition with the test cluster shutdown code, tests may fail 
> because the Flink job result cannot be retrieved when the cluster already has 
> been shut down. This is a Flink 1.10.0 bug which is addressed upstream via 
> FLINK-16705. 
> If this doesn't get addressed upstream, we may also be able to work around 
> this. 
> {noformat}
> java.lang.RuntimeException: Pipeline execution failed
>   at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:115)
>   at 
> org.apache.beam.runners.flink.TestFlinkRunner.run(TestFlinkRunner.java:61)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.sdk.transforms.ViewTest.testMultimapSideInput(ViewTest.java:543)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:266)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(Test

[jira] [Updated] (BEAM-9631) Flink 1.10 test execution is broken due to premature test cluster shutdown

2020-10-14 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9631:
-
Status: Resolved  (was: Open)

> Flink 1.10 test execution is broken due to premature test cluster shutdown
> --
>
> Key: BEAM-9631
> URL: https://issues.apache.org/jira/browse/BEAM-9631
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Priority: P1
>
> Due to a race condition with the test cluster shutdown code, tests may fail 
> because the Flink job result cannot be retrieved when the cluster already has 
> been shut down. This is a Flink 1.10.0 bug which is addressed upstream via 
> FLINK-16705. 
> If this doesn't get addressed upstream, we may also be able to work around 
> this. 
> {noformat}
> java.lang.RuntimeException: Pipeline execution failed
>   at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:115)
>   at 
> org.apache.beam.runners.flink.TestFlinkRunner.run(TestFlinkRunner.java:61)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.sdk.transforms.ViewTest.testMultimapSideInput(ViewTest.java:543)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:266)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorke

[jira] [Reopened] (BEAM-9631) Flink 1.10 test execution is broken due to premature test cluster shutdown

2020-10-14 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reopened BEAM-9631:
--

> Flink 1.10 test execution is broken due to premature test cluster shutdown
> --
>
> Key: BEAM-9631
> URL: https://issues.apache.org/jira/browse/BEAM-9631
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Priority: P1
>
> Due to a race condition with the test cluster shutdown code, tests may fail 
> because the Flink job result cannot be retrieved when the cluster already has 
> been shut down. This is a Flink 1.10.0 bug which is addressed upstream via 
> FLINK-16705. 
> If this doesn't get addressed upstream, we may also be able to work around 
> this. 
> {noformat}
> java.lang.RuntimeException: Pipeline execution failed
>   at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:115)
>   at 
> org.apache.beam.runners.flink.TestFlinkRunner.run(TestFlinkRunner.java:61)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.sdk.transforms.ViewTest.testMultimapSideInput(ViewTest.java:543)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:266)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
>   at sun.refle

[jira] [Updated] (BEAM-10940) Portable Flink runner should handle DelayedBundleApplication from ProcessBundleResponse.

2020-10-06 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10940:
--
Status: Open  (was: Triage Needed)

> Portable Flink runner should handle DelayedBundleApplication from 
> ProcessBundleResponse.
> 
>
> Key: BEAM-10940
> URL: https://issues.apache.org/jira/browse/BEAM-10940
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Boyuan Zhang
>Priority: P2
>
> SDF can produce residuals by self-checkpoint, which will be returned to 
> runner by ProcessBundleResponse.DelayedBundleApplication. The portable runner 
> should be able to handle the DelayedBundleApplication and reschedule it based 
> on the timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-10848) Gauge metrics error when setting timers

2020-10-05 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned BEAM-10848:
-

Assignee: maghamravikiran

> Gauge metrics error when setting timers
> ---
>
> Key: BEAM-10848
> URL: https://issues.apache.org/jira/browse/BEAM-10848
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: maghamravikiran
>Priority: P2
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Gauges are affected by setting timers leading to {{None}} values:
> {noformat}
> ERROR:apache_beam.runners.worker.sdk_worker:Error processing instruction 147. 
> Original traceback is
> Traceback (most recent call last):
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 253, in _execute
> response = task()
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 310, in 
> lambda: self.create_worker().do_instruction(request), request)
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 480, in do_instruction
> getattr(request, request_type), request.instruction_id)
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 516, in process_bundle
> monitoring_infos = bundle_processor.monitoring_infos()
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 1107, in monitoring_infos
> op.monitoring_infos(transform_id, dict(tag_to_pcollection_id)))
>   File "apache_beam/runners/worker/operations.py", line 340, in 
> apache_beam.runners.worker.operations.Operation.monitoring_infos
>   File "apache_beam/runners/worker/operations.py", line 347, in 
> apache_beam.runners.worker.operations.Operation.monitoring_infos
>   File "apache_beam/runners/worker/operations.py", line 386, in 
> apache_beam.runners.worker.operations.Operation.user_monitoring_infos
>   File "apache_beam/metrics/execution.py", line 261, in 
> apache_beam.metrics.execution.MetricsContainer.to_runner_api_monitoring_infos
>   File "apache_beam/metrics/cells.py", line 222, in 
> apache_beam.metrics.cells.GaugeCell.to_runner_api_monitoring_info
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/metrics/monitoring_infos.py",
>  line 222, in int64_user_gauge
> payload = _encode_gauge(coder, timestamp, value)
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/metrics/monitoring_infos.py",
>  line 397, in _encode_gauge
> coder.get_impl().encode_to_stream(value, stream, True)
>   File "apache_beam/coders/coder_impl.py", line 690, in 
> apache_beam.coders.coder_impl.VarIntCoderImpl.encode_to_stream
>   File "apache_beam/coders/coder_impl.py", line 692, in 
> apache_beam.coders.coder_impl.VarIntCoderImpl.encode_to_stream
> TypeError: an integer is required
> {noformat}
> The transform has the following structure and errors when the lines following 
> {{TODO}} have been uncommented:
> {code:python}
> class StatefulOperation(beam.DoFn):
>   def __init__(self, state_size_per_key_bytes, use_processing_timer=False):
> self.state_size_per_key_bytes = state_size_per_key_bytes
> self.str_coder = StrUtf8Coder().get_impl()
> self.bytes_gauge = Metrics.gauge('synthetic', 'state_bytes')
> self.elements_gauge = Metrics.gauge('synthetic', 'state_elements')
> self.use_processing_timer = use_processing_timer
>   state_spec = userstate.BagStateSpec('state', StrUtf8Coder())
>   state_spec2 = userstate.CombiningValueStateSpec('state_size_bytes', 
> combine_fn=sum)
>   state_spec3 = userstate.CombiningValueStateSpec('num_state_entries', 
> combine_fn=sum)
>   event_timer_spec = userstate.TimerSpec('event_timer', 
> beam.TimeDomain.WATERMARK)
>   processing_timer_spec = userstate.TimerSpec('proc_timer', 
> beam.TimeDomain.REAL_TIME)
>   def process(self,
>   element,
>   timestamp=beam.DoFn.TimestampParam,
>   state=beam.DoFn.StateParam(state_spec),
>   state_num_bytes=beam.DoFn.StateParam(state_spec2),
>   state_num_entries=beam.DoFn.StateParam(state_spec3),
>   event_timer=beam.DoFn.TimerParam(event_timer_spec),
>   processing_timer=beam.DoFn.TimerParam(processing_timer_spec)):
> # Append stringified element to state until the 

[jira] [Updated] (BEAM-10527) Python2_PVR_Flink precommit should publish test results to Jenkins

2020-10-01 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10527:
--
Fix Version/s: Not applicable

> Python2_PVR_Flink precommit should publish test results to Jenkins
> --
>
> Key: BEAM-10527
> URL: https://issues.apache.org/jira/browse/BEAM-10527
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Right now we only have the logs, which often require scrolling up to see the 
> failure (which itself often requires curl'ing the logs because they are too 
> large for a browser to load comfortably). This causes frequent 
> misunderstandings. For example, folks often mistake errors printed by 
> pipelines that are meant to fail (e.g. test_error_message_includes_stage) for 
> actual test failures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10527) Python2_PVR_Flink precommit should publish test results to Jenkins

2020-10-01 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10527:
--
Status: Resolved  (was: Open)

> Python2_PVR_Flink precommit should publish test results to Jenkins
> --
>
> Key: BEAM-10527
> URL: https://issues.apache.org/jira/browse/BEAM-10527
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Right now we only have the logs, which often require scrolling up to see the 
> failure (which itself often requires curl'ing the logs because they are too 
> large for a browser to load comfortably). This causes frequent 
> misunderstandings. For example, folks often mistake errors printed by 
> pipelines that are meant to fail (e.g. test_error_message_includes_stage) for 
> actual test failures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10602) Display Python streaming metrics in Grafana dashboard

2020-09-30 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10602:
--
Status: Resolved  (was: Open)

> Display Python streaming metrics in Grafana dashboard
> -
>
> Key: BEAM-10602
> URL: https://issues.apache.org/jira/browse/BEAM-10602
> Project: Beam
>  Issue Type: Task
>  Components: build-system, testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Labels: stale-assigned
> Attachments: screenshot-1.png
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> The Grafana dashboard is currently missing Python load test metrics: 
> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10602) Display Python streaming metrics in Grafana dashboard

2020-09-30 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10602:
--
Fix Version/s: Not applicable

> Display Python streaming metrics in Grafana dashboard
> -
>
> Key: BEAM-10602
> URL: https://issues.apache.org/jira/browse/BEAM-10602
> Project: Beam
>  Issue Type: Task
>  Components: build-system, testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Labels: stale-assigned
> Fix For: Not applicable
>
> Attachments: screenshot-1.png
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> The Grafana dashboard is currently missing Python load test metrics: 
> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10671) Add environment configuration fields as first-class pipeline options

2020-09-30 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204746#comment-17204746
 ] 

Maximilian Michels commented on BEAM-10671:
---

Implemented in Python with backwards compatibility via 
[https://github.com/apache/beam/pull/12576]. Still needs Java and Go support.

> Add environment configuration fields as first-class pipeline options
> 
>
> Key: BEAM-10671
> URL: https://issues.apache.org/jira/browse/BEAM-10671
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> The pipeline option --environment_config has completely different usages 
> depending on the value of --environment_type. This is confusing for the user 
> and hard to check. Additionally, --environment_config is a JSON blob for 
> --environment_type=PROCESS. This JSON blob is a pain to escape and pass 
> around compared to a collection of flat strings.
> We should replace --environment_config with first-class / top-level pipeline 
> options for each environment type:
> DOCKER
> --environment_container_image
> PROCESS
> --environment_os
> --environment_architecture
> --environment_variables
> EXTERNAL
> --environment_service_address
> LOOPBACK
> (none)
> This way we can validate that the user is configuring these options correctly 
> (ie give a warning or error if they use options that do not apply to their 
> chosen --environment_type).
> We can deprecate the --environment_config option, logging a warning until 
> removing this option altogether in a future Beam release.
> [https://beam.apache.org/documentation/runtime/sdk-harness-config/]
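For illustration, a sketch contrasting today's JSON blob with the proposed flat flags; the proposed flag names are taken from the list above, while the JSON contents, image tag, and command path are placeholders:

{code:java}
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

class EnvironmentOptionsSketch {
  public static void main(String[] ignored) {
    // Today: one JSON blob whose meaning depends on --environment_type.
    String[] current = {
      "--environment_type=PROCESS",
      "--environment_config={\"command\": \"/opt/apache/beam/boot\"}"  // placeholder command
    };
    // Proposed: one flat, individually validatable flag per setting.
    String[] proposed = {
      "--environment_type=DOCKER",
      "--environment_container_image=apache/beam_python3.7_sdk:2.25.0"  // placeholder image
    };
    // withoutStrictParsing() keeps this runnable without the portability options on the classpath.
    PipelineOptions a = PipelineOptionsFactory.fromArgs(current).withoutStrictParsing().create();
    PipelineOptions b = PipelineOptionsFactory.fromArgs(proposed).withoutStrictParsing().create();
    System.out.println(a);
    System.out.println(b);
  }
}
{code}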



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10974) Flink PortableValidatesRunner test failure: GroupByKeyTest$BasicTests.testLargeKeys10MB

2020-09-28 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203064#comment-17203064
 ] 

Maximilian Michels commented on BEAM-10974:
---

This one can be fixed by adjusting Flink's managed memory size: 
[https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#taskmanager-memory-managed-size]

In recent releases, this was changed to be configured as a fraction of the 
total memory by default. This is why we see the regression.
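For reference, a minimal sketch of pinning the managed memory size for a local test environment; the 512m value is an arbitrary example, not a recommendation. On a cluster the same key would go into flink-conf.yaml instead.

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

class ManagedMemorySketch {
  static StreamExecutionEnvironment localEnvWithFixedManagedMemory() {
    Configuration conf = new Configuration();
    // Use a fixed managed memory size instead of the default fraction of total memory,
    // which is what changed in recent Flink releases and caused the regression.
    conf.setString("taskmanager.memory.managed.size", "512m");
    return StreamExecutionEnvironment.createLocalEnvironment(1, conf);
  }
}
{code}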

> Flink PortableValidatesRunner test failure: 
> GroupByKeyTest$BasicTests.testLargeKeys10MB
> ---
>
> Key: BEAM-10974
> URL: https://issues.apache.org/jira/browse/BEAM-10974
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Robin Qiu
>Assignee: Kyle Weaver
>Priority: P0
> Fix For: 2.25.0
>
>
> h3. Error Message
> java.lang.RuntimeException: The Runner experienced the following error during 
> execution: java.io.IOException: Cannot write record to fresh sort buffer. 
> Record too large.
> h3. Stacktrace
> java.lang.RuntimeException: The Runner experienced the following error during 
> execution: java.io.IOException: Cannot write record to fresh sort buffer. 
> Record too large. at 
> org.apache.beam.runners.portability.JobServicePipelineResult.propagateErrors(JobServicePipelineResult.java:165)
>  at 
> org.apache.beam.runners.portability.JobServicePipelineResult.waitUntilFinish(JobServicePipelineResult.java:110)
>  at 
> org.apache.beam.runners.portability.testing.TestPortableRunner.run(TestPortableRunner.java:83)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317) at 
> org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350) at 
> org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331) at 
> org.apache.beam.sdk.transforms.GroupByKeyTest.runLargeKeysTest(GroupByKeyTest.java:741)
>  at 
> org.apache.beam.sdk.transforms.GroupByKeyTest.access$200(GroupByKeyTest.java:92)
>  at 
> org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testLargeKeys10MB(GroupByKeyTest.java:473)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319) at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:266)
>  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328) at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:65) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292) at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:412) at 
> org.junit.runners.Suite.runChild(Suite.java:128) at 
> org.junit.runners.Suite.runChild(Suite.java:27) at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:330) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328) at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:65) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292) at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:412) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecuto
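
For context, the "Cannot write record to fresh sort buffer. Record too large." message is what Flink's batch sort path reports when a single record does not fit into its in-memory sort buffer. A rough, illustrative Python sketch of the kind of input the test builds (a single ~10 MB key shared by all elements); the names and values here are illustrative, not the actual Java test:

{code:python}
import apache_beam as beam

KEY_SIZE_BYTES = 10 * 1024 * 1024  # ~10 MB key, mirroring testLargeKeys10MB

def to_large_key_kv(i):
  # Every element carries the same ~10 MB string key.
  return ('x' * KEY_SIZE_BYTES, i)

with beam.Pipeline() as p:
  _ = (p
       | beam.Create(range(5))
       | beam.Map(to_large_key_kv)
       | beam.GroupByKey()
       | beam.Map(lambda kv: (len(kv[0]), sorted(kv[1]))))
{code}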

[jira] [Commented] (BEAM-10760) Cleanup timers lead to unbounded state accumulation in global window

2020-09-16 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196842#comment-17196842
 ] 

Maximilian Michels commented on BEAM-10760:
---

Certainly possible, but release candidate #3 for 2.24.0 is already underway. 
I'm afraid it's too late for 2.24.0.

> Cleanup timers lead to unbounded state accumulation in global window
> 
>
> Key: BEAM-10760
> URL: https://issues.apache.org/jira/browse/BEAM-10760
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-flink
>Affects Versions: 2.21.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: P2
> Fix For: 2.25.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> For each key, the runner sets a cleanup timer that is designed to garbage 
> collect state at the end of a window. For a global window, these timers will 
> stay around until the pipeline terminates. Depending on the key cardinality, 
> this can lead to unbounded state growth, which in the case of the Flink 
> runner is observable in the growth of checkpoint size.
> https://lists.apache.org/thread.html/rae268806035688b77646195505e5b7a56568a38feb1e52d6341feedd%40%3Cdev.beam.apache.org%3E
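
To illustrate the kind of per-key state and cleanup timer involved, here is a minimal Python sketch of a stateful DoFn in the global window that clears its own state via an event-time timer. It is illustrative only (class and spec names are made up) and is not the runner's internal cleanup mechanism:

{code:python}
import apache_beam as beam
from apache_beam.transforms import userstate

class CountPerKey(beam.DoFn):
  # In the global window, per-key state and its cleanup timer live until
  # the pipeline terminates unless the state is explicitly cleared.
  COUNT_SPEC = userstate.CombiningValueStateSpec('count', combine_fn=sum)
  GC_TIMER = userstate.TimerSpec('gc', beam.TimeDomain.WATERMARK)

  def process(self,
              element,
              timestamp=beam.DoFn.TimestampParam,
              count=beam.DoFn.StateParam(COUNT_SPEC),
              gc=beam.DoFn.TimerParam(GC_TIMER)):
    count.add(1)
    # Re-arm a cleanup timer some time after the latest element for this key.
    gc.set(timestamp + 3600)

  @userstate.on_timer(GC_TIMER)
  def on_gc(self, count=beam.DoFn.StateParam(COUNT_SPEC)):
    # Dropping the state bounds its growth for keys that stop receiving data.
    count.clear()
{code}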



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9456) Upgrade to gradle 6.6.1

2020-09-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9456:
-
Fix Version/s: (was: 2.24.0)
   2.25.0

> Upgrade to gradle 6.6.1
> ---
>
> Key: BEAM-9456
> URL: https://issues.apache.org/jira/browse/BEAM-9456
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Alex Van Boxel
>Assignee: Kamil Gałuszka
>Priority: P2
> Fix For: 2.25.0
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9456) Upgrade to gradle 6.6.1

2020-09-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9456:
-
Status: Resolved  (was: Open)

> Upgrade to gradle 6.6.1
> ---
>
> Key: BEAM-9456
> URL: https://issues.apache.org/jira/browse/BEAM-9456
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Alex Van Boxel
>Assignee: Kamil Gałuszka
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9456) Upgrade to gradle 6.6.1

2020-09-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9456:
-
Fix Version/s: 2.24.0

> Upgrade to gradle 6.6.1
> ---
>
> Key: BEAM-9456
> URL: https://issues.apache.org/jira/browse/BEAM-9456
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Alex Van Boxel
>Assignee: Kamil Gałuszka
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9456) Upgrade to gradle 6.6.1

2020-09-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9456:
-
Summary: Upgrade to gradle 6.6.1  (was: Upgrade to gradle 6.6.0)

> Upgrade to gradle 6.6.1
> ---
>
> Key: BEAM-9456
> URL: https://issues.apache.org/jira/browse/BEAM-9456
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Alex Van Boxel
>Assignee: Kamil Gałuszka
>Priority: P2
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9456) Upgrade to gradle 6.6.1

2020-09-11 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194090#comment-17194090
 ] 

Maximilian Michels commented on BEAM-9456:
--

Upgrade to 6.6.1 has been performed here: 
https://github.com/apache/beam/pull/12776

> Upgrade to gradle 6.6.1
> ---
>
> Key: BEAM-9456
> URL: https://issues.apache.org/jira/browse/BEAM-9456
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Alex Van Boxel
>Assignee: Kamil Gałuszka
>Priority: P2
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10819) Update Gradle to 6.1.1

2020-09-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10819:
--
Status: Open  (was: Triage Needed)

> Update Gradle to 6.1.1
> --
>
> Key: BEAM-10819
> URL: https://issues.apache.org/jira/browse/BEAM-10819
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Adam Newman
>Assignee: Adam Newman
>Priority: P2
>  Labels: gradle, gradle-wrapper
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update Gradle build wrapper to Gradle 6.1.1.
> Update build plugins to fix incompatibility:
> org.nosphere.apache.rat:0.4.0 -> 0.5.2
> com.diffplug.gradle.spotless:3.24.0 -> 3.24.2
> net.ltgt.gradle:gradle-apt-plugin:0.20 -> 0.21
> com.google.protobuf:protobuf-gradle-plugin:0.8.5 -> 0.8.8
> Replace 
> [deprecated|https://docs.gradle.com/enterprise/gradle-plugin/#gradle_6_x_and_later]
>  build scan with new version:
> id 'com.gradle.build-scan' version '2.3'
> moved from build.gradle to settings.gradle and switch to new plugin
> id "com.gradle.enterprise" version "3.4.1"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10819) Update Gradle to 6.1.1

2020-09-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10819:
--
Status: Resolved  (was: Resolved)

> Update Gradle to 6.1.1
> --
>
> Key: BEAM-10819
> URL: https://issues.apache.org/jira/browse/BEAM-10819
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Adam Newman
>Assignee: Adam Newman
>Priority: P2
>  Labels: gradle, gradle-wrapper
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update Gradle build wrapper to Gradle 6.1.1.
> Update build plugins to fix incompatibility:
> org.nosphere.apache.rat:0.4.0 -> 0.5.2
> com.diffplug.gradle.spotless:3.24.0 -> 3.24.2
> net.ltgt.gradle:gradle-apt-plugin:0.20 -> 0.21
> com.google.protobuf:protobuf-gradle-plugin:0.8.5 -> 0.8.8
> Replace 
> [deprecated|https://docs.gradle.com/enterprise/gradle-plugin/#gradle_6_x_and_later]
>  build scan with new version:
> id 'com.gradle.build-scan' version '2.3'
> moved from build.gradle to settings.gradle and switch to new plugin
> id "com.gradle.enterprise" version "3.4.1"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10819) Update Gradle to 6.1.1

2020-09-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10819:
--
Status: Resolved  (was: Open)

> Update Gradle to 6.1.1
> --
>
> Key: BEAM-10819
> URL: https://issues.apache.org/jira/browse/BEAM-10819
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Adam Newman
>Assignee: Adam Newman
>Priority: P2
>  Labels: gradle, gradle-wrapper
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update Gradle build wrapper to Gradle 6.1.1.
> Update build plugins to fix incompatibility:
> org.nosphere.apache.rat:0.4.0 -> 0.5.2
> com.diffplug.gradle.spotless:3.24.0 -> 3.24.2
> net.ltgt.gradle:gradle-apt-plugin:0.20 -> 0.21
> com.google.protobuf:protobuf-gradle-plugin:0.8.5 -> 0.8.8
> Replace 
> [deprecated|https://docs.gradle.com/enterprise/gradle-plugin/#gradle_6_x_and_later]
>  build scan with new version:
> id 'com.gradle.build-scan' version '2.3'
> moved from build.gradle to settings.gradle and switch to new plugin
> id "com.gradle.enterprise" version "3.4.1"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-10819) Update Gradle to 6.1.1

2020-09-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reopened BEAM-10819:
---

> Update Gradle to 6.1.1
> --
>
> Key: BEAM-10819
> URL: https://issues.apache.org/jira/browse/BEAM-10819
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Adam Newman
>Assignee: Adam Newman
>Priority: P2
>  Labels: gradle, gradle-wrapper
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update Gradle build wrapper to Gradle 6.1.1.
> Update build plugins to fix incompatibility:
> org.nosphere.apache.rat:0.4.0 -> 0.5.2
> com.diffplug.gradle.spotless:3.24.0 -> 3.24.2
> net.ltgt.gradle:gradle-apt-plugin:0.20 -> 0.21
> com.google.protobuf:protobuf-gradle-plugin:0.8.5 -> 0.8.8
> Replace 
> [deprecated|https://docs.gradle.com/enterprise/gradle-plugin/#gradle_6_x_and_later]
>  build scan with new version:
> id 'com.gradle.build-scan' version '2.3'
> moved from build.gradle to settings.gradle and switch to new plugin
> id "com.gradle.enterprise" version "3.4.1"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10819) Update Gradle to 6.1.1

2020-09-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10819:
--
Fix Version/s: (was: 2.24.0)
   Not applicable

> Update Gradle to 6.1.1
> --
>
> Key: BEAM-10819
> URL: https://issues.apache.org/jira/browse/BEAM-10819
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Adam Newman
>Assignee: Adam Newman
>Priority: P2
>  Labels: gradle, gradle-wrapper
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update Gradle build wrapper to Gradle 6.1.1.
> Update build plugins to fix incompatibility:
> org.nosphere.apache.rat:0.4.0 -> 0.5.2
> com.diffplug.gradle.spotless:3.24.0 -> 3.24.2
> net.ltgt.gradle:gradle-apt-plugin:0.20 -> 0.21
> com.google.protobuf:protobuf-gradle-plugin:0.8.5 -> 0.8.8
> Replace 
> [deprecated|https://docs.gradle.com/enterprise/gradle-plugin/#gradle_6_x_and_later]
>  build scan with new version:
> id 'com.gradle.build-scan' version '2.3'
> moved from build.gradle to settings.gradle and switch to new plugin
> id "com.gradle.enterprise" version "3.4.1"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10819) Update Gradle to 6.1.1

2020-09-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10819:
--
Status: Resolved  (was: Open)

> Update Gradle to 6.1.1
> --
>
> Key: BEAM-10819
> URL: https://issues.apache.org/jira/browse/BEAM-10819
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Adam Newman
>Assignee: Adam Newman
>Priority: P2
>  Labels: gradle, gradle-wrapper
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update Gradle build wrapper to Gradle 6.1.1.
> Update build plugins to fix incompatibility:
> org.nosphere.apache.rat:0.4.0 -> 0.5.2
> com.diffplug.gradle.spotless:3.24.0 -> 3.24.2
> net.ltgt.gradle:gradle-apt-plugin:0.20 -> 0.21
> com.google.protobuf:protobuf-gradle-plugin:0.8.5 -> 0.8.8
> Replace 
> [deprecated|https://docs.gradle.com/enterprise/gradle-plugin/#gradle_6_x_and_later]
>  build scan with new version:
> id 'com.gradle.build-scan' version '2.3'
> moved from build.gradle to settings.gradle and switch to new plugin
> id "com.gradle.enterprise" version "3.4.1"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10819) Update Gradle to 6.1.1

2020-09-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10819:
--
Fix Version/s: (was: Not applicable)
   2.24.0

> Update Gradle to 6.1.1
> --
>
> Key: BEAM-10819
> URL: https://issues.apache.org/jira/browse/BEAM-10819
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Adam Newman
>Assignee: Adam Newman
>Priority: P2
>  Labels: gradle, gradle-wrapper
> Fix For: 2.24.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Update Gradle build wrapper to Gradle 6.1.1.
> Update build plugins to fix incompatibility:
> org.nosphere.apache.rat:0.4.0 -> 0.5.2
> com.diffplug.gradle.spotless:3.24.0 -> 3.24.2
> net.ltgt.gradle:gradle-apt-plugin:0.20 -> 0.21
> com.google.protobuf:protobuf-gradle-plugin:0.8.5 -> 0.8.8
> Replace 
> [deprecated|https://docs.gradle.com/enterprise/gradle-plugin/#gradle_6_x_and_later]
>  build scan with new version:
> id 'com.gradle.build-scan' version '2.3'
> moved from build.gradle to settings.gradle and switch to new plugin
> id "com.gradle.enterprise" version "3.4.1"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10760) Cleanup timers lead to unbounded state accumulation in global window

2020-09-08 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192247#comment-17192247
 ] 

Maximilian Michels commented on BEAM-10760:
---

Fixed via 5fd0010e52c5efee36760c302b03181e429da02f and 
e010b15ba16e7867cb1acd36d9215674606a7d2a.

> Cleanup timers lead to unbounded state accumulation in global window
> 
>
> Key: BEAM-10760
> URL: https://issues.apache.org/jira/browse/BEAM-10760
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-flink
>Affects Versions: 2.21.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: P2
> Fix For: 2.25.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> For each key, the runner sets a cleanup timer that is designed to garbage 
> collect state at the end of a window. For a global window, these timers will 
> stay around until the pipeline terminates. Depending on the key cardinality, 
> this can lead to unbounded state growth, which in the case of the Flink 
> runner is observable in the growth of checkpoint size.
> https://lists.apache.org/thread.html/rae268806035688b77646195505e5b7a56568a38feb1e52d6341feedd%40%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10760) Cleanup timers lead to unbounded state accumulation in global window

2020-09-08 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10760:
--
Status: Resolved  (was: Open)

> Cleanup timers lead to unbounded state accumulation in global window
> 
>
> Key: BEAM-10760
> URL: https://issues.apache.org/jira/browse/BEAM-10760
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-flink
>Affects Versions: 2.21.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: P2
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> For each key, the runner sets a cleanup timer that is designed to garbage 
> collect state at the end of a window. For a global window, these timers will 
> stay around until the pipeline terminates. Depending on the key cardinality, 
> this can lead to unbounded state growth, which in the case of the Flink 
> runner is observable in the growth of checkpoint size.
> https://lists.apache.org/thread.html/rae268806035688b77646195505e5b7a56568a38feb1e52d6341feedd%40%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10760) Cleanup timers lead to unbounded state accumulation in global window

2020-09-08 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10760:
--
Fix Version/s: 2.25.0

> Cleanup timers lead to unbounded state accumulation in global window
> 
>
> Key: BEAM-10760
> URL: https://issues.apache.org/jira/browse/BEAM-10760
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-flink
>Affects Versions: 2.21.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: P2
> Fix For: 2.25.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> For each key, the runner sets a cleanup timer that is designed to garbage 
> collect state at the end of a window. For a global window, these timers will 
> stay around until the pipeline terminates. Depending on the key cardinality, 
> this can lead to unbounded state growth, which in the case of the Flink 
> runner is observable in the growth of checkpoint size.
> https://lists.apache.org/thread.html/rae268806035688b77646195505e5b7a56568a38feb1e52d6341feedd%40%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10630) Include data from load tests in the release process

2020-09-07 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10630:
--
Status: Resolved  (was: Open)

> Include data from load tests in the release process
> ---
>
> Key: BEAM-10630
> URL: https://issues.apache.org/jira/browse/BEAM-10630
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-community
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In the past, we have seen performance regressions in releases. We should make 
> sure that the release guide includes checking available performance 
> measurements.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10848) Gauge metrics error when setting timers

2020-09-02 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10848:
--
Status: Open  (was: Triage Needed)

> Gauge metrics error when setting timers
> ---
>
> Key: BEAM-10848
> URL: https://issues.apache.org/jira/browse/BEAM-10848
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Priority: P2
>
> Gauges are affected by setting timers leading to {{None}} values:
> {noformat}
> ERROR:apache_beam.runners.worker.sdk_worker:Error processing instruction 147. 
> Original traceback is
> Traceback (most recent call last):
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 253, in _execute
> response = task()
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 310, in 
> lambda: self.create_worker().do_instruction(request), request)
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 480, in do_instruction
> getattr(request, request_type), request.instruction_id)
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 516, in process_bundle
> monitoring_infos = bundle_processor.monitoring_infos()
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 1107, in monitoring_infos
> op.monitoring_infos(transform_id, dict(tag_to_pcollection_id)))
>   File "apache_beam/runners/worker/operations.py", line 340, in 
> apache_beam.runners.worker.operations.Operation.monitoring_infos
>   File "apache_beam/runners/worker/operations.py", line 347, in 
> apache_beam.runners.worker.operations.Operation.monitoring_infos
>   File "apache_beam/runners/worker/operations.py", line 386, in 
> apache_beam.runners.worker.operations.Operation.user_monitoring_infos
>   File "apache_beam/metrics/execution.py", line 261, in 
> apache_beam.metrics.execution.MetricsContainer.to_runner_api_monitoring_infos
>   File "apache_beam/metrics/cells.py", line 222, in 
> apache_beam.metrics.cells.GaugeCell.to_runner_api_monitoring_info
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/metrics/monitoring_infos.py",
>  line 222, in int64_user_gauge
> payload = _encode_gauge(coder, timestamp, value)
>   File 
> "/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/metrics/monitoring_infos.py",
>  line 397, in _encode_gauge
> coder.get_impl().encode_to_stream(value, stream, True)
>   File "apache_beam/coders/coder_impl.py", line 690, in 
> apache_beam.coders.coder_impl.VarIntCoderImpl.encode_to_stream
>   File "apache_beam/coders/coder_impl.py", line 692, in 
> apache_beam.coders.coder_impl.VarIntCoderImpl.encode_to_stream
> TypeError: an integer is required
> {noformat}
> The transform has the following structure and errors when the lines following 
> {{TODO}} have been uncommented:
> {code:python}
> class StatefulOperation(beam.DoFn):
>   def __init__(self, state_size_per_key_bytes, use_processing_timer=False):
> self.state_size_per_key_bytes = state_size_per_key_bytes
> self.str_coder = StrUtf8Coder().get_impl()
> self.bytes_gauge = Metrics.gauge('synthetic', 'state_bytes')
> self.elements_gauge = Metrics.gauge('synthetic', 'state_elements')
> self.use_processing_timer = use_processing_timer
>   state_spec = userstate.BagStateSpec('state', StrUtf8Coder())
>   state_spec2 = userstate.CombiningValueStateSpec('state_size_bytes', 
> combine_fn=sum)
>   state_spec3 = userstate.CombiningValueStateSpec('num_state_entries', 
> combine_fn=sum)
>   event_timer_spec = userstate.TimerSpec('event_timer', 
> beam.TimeDomain.WATERMARK)
>   processing_timer_spec = userstate.TimerSpec('proc_timer', 
> beam.TimeDomain.REAL_TIME)
>   def process(self,
>   element,
>   timestamp=beam.DoFn.TimestampParam,
>   state=beam.DoFn.StateParam(state_spec),
>   state_num_bytes=beam.DoFn.StateParam(state_spec2),
>   state_num_entries=beam.DoFn.StateParam(state_spec3),
>   event_timer=beam.DoFn.TimerParam(event_timer_spec),
>   processing_timer=beam.DoFn.TimerParam(processing_timer_spec)):
> # Append stringified element to state until the threshold has been reached
> # The cleanup timer will then clean up and the process wil

[jira] [Updated] (BEAM-10848) Gauge metrics error when setting timers

2020-09-02 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10848:
--
Description: 
Gauges are affected by setting timers leading to {{None}} values:

{noformat}
ERROR:apache_beam.runners.worker.sdk_worker:Error processing instruction 147. 
Original traceback is
Traceback (most recent call last):
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 253, in _execute
response = task()
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 310, in 
lambda: self.create_worker().do_instruction(request), request)
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 480, in do_instruction
getattr(request, request_type), request.instruction_id)
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 516, in process_bundle
monitoring_infos = bundle_processor.monitoring_infos()
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 1107, in monitoring_infos
op.monitoring_infos(transform_id, dict(tag_to_pcollection_id)))
  File "apache_beam/runners/worker/operations.py", line 340, in 
apache_beam.runners.worker.operations.Operation.monitoring_infos
  File "apache_beam/runners/worker/operations.py", line 347, in 
apache_beam.runners.worker.operations.Operation.monitoring_infos
  File "apache_beam/runners/worker/operations.py", line 386, in 
apache_beam.runners.worker.operations.Operation.user_monitoring_infos
  File "apache_beam/metrics/execution.py", line 261, in 
apache_beam.metrics.execution.MetricsContainer.to_runner_api_monitoring_infos
  File "apache_beam/metrics/cells.py", line 222, in 
apache_beam.metrics.cells.GaugeCell.to_runner_api_monitoring_info
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/metrics/monitoring_infos.py",
 line 222, in int64_user_gauge
payload = _encode_gauge(coder, timestamp, value)
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/metrics/monitoring_infos.py",
 line 397, in _encode_gauge
coder.get_impl().encode_to_stream(value, stream, True)
  File "apache_beam/coders/coder_impl.py", line 690, in 
apache_beam.coders.coder_impl.VarIntCoderImpl.encode_to_stream
  File "apache_beam/coders/coder_impl.py", line 692, in 
apache_beam.coders.coder_impl.VarIntCoderImpl.encode_to_stream
TypeError: an integer is required
{noformat}

The transform has the following structure and errors when the lines following 
{{TODO}} have been uncommented:

{code:python}
class StatefulOperation(beam.DoFn):
  def __init__(self, state_size_per_key_bytes, use_processing_timer=False):
self.state_size_per_key_bytes = state_size_per_key_bytes
self.str_coder = StrUtf8Coder().get_impl()
self.bytes_gauge = Metrics.gauge('synthetic', 'state_bytes')
self.elements_gauge = Metrics.gauge('synthetic', 'state_elements')
self.use_processing_timer = use_processing_timer

  state_spec = userstate.BagStateSpec('state', StrUtf8Coder())

  state_spec2 = userstate.CombiningValueStateSpec('state_size_bytes', 
combine_fn=sum)

  state_spec3 = userstate.CombiningValueStateSpec('num_state_entries', 
combine_fn=sum)

  event_timer_spec = userstate.TimerSpec('event_timer', 
beam.TimeDomain.WATERMARK)
  processing_timer_spec = userstate.TimerSpec('proc_timer', 
beam.TimeDomain.REAL_TIME)

  def process(self,
  element,
  timestamp=beam.DoFn.TimestampParam,
  state=beam.DoFn.StateParam(state_spec),
  state_num_bytes=beam.DoFn.StateParam(state_spec2),
  state_num_entries=beam.DoFn.StateParam(state_spec3),
  event_timer=beam.DoFn.TimerParam(event_timer_spec),
  processing_timer=beam.DoFn.TimerParam(processing_timer_spec)):
# Append stringified element to state until the threshold has been reached
# The cleanup timer will then clean up and the process will repeat.
if state_num_bytes.read() <= self.state_size_per_key_bytes:
  state_element = str(element)
  state.add(state_element)
  bytes_added = len(self.str_coder.encode_nested(state_element))
  state_num_bytes.add(bytes_added)
  state_num_entries.add(1)
  timer = processing_timer if self.use_processing_timer else event_timer
  # Set a timer which will clear the state if it grows too large
  timer.set(timestamp.micros // 100 + 5)
# Metrics
# TODO Unfortunately buggy with t

[jira] [Created] (BEAM-10848) Gauge metrics error when setting timers

2020-09-02 Thread Maximilian Michels (Jira)
Maximilian Michels created BEAM-10848:
-

 Summary: Gauge metrics error when setting timers
 Key: BEAM-10848
 URL: https://issues.apache.org/jira/browse/BEAM-10848
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness
Reporter: Maximilian Michels


Gauges are affected by setting timers leading to {{None}} values:

{noformat}
ERROR:apache_beam.runners.worker.sdk_worker:Error processing instruction 147. 
Original traceback is
Traceback (most recent call last):
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 253, in _execute
response = task()
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 310, in 
lambda: self.create_worker().do_instruction(request), request)
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 480, in do_instruction
getattr(request, request_type), request.instruction_id)
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 516, in process_bundle
monitoring_infos = bundle_processor.monitoring_infos()
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 1107, in monitoring_infos
op.monitoring_infos(transform_id, dict(tag_to_pcollection_id)))
  File "apache_beam/runners/worker/operations.py", line 340, in 
apache_beam.runners.worker.operations.Operation.monitoring_infos
  File "apache_beam/runners/worker/operations.py", line 347, in 
apache_beam.runners.worker.operations.Operation.monitoring_infos
  File "apache_beam/runners/worker/operations.py", line 386, in 
apache_beam.runners.worker.operations.Operation.user_monitoring_infos
  File "apache_beam/metrics/execution.py", line 261, in 
apache_beam.metrics.execution.MetricsContainer.to_runner_api_monitoring_infos
  File "apache_beam/metrics/cells.py", line 222, in 
apache_beam.metrics.cells.GaugeCell.to_runner_api_monitoring_info
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/metrics/monitoring_infos.py",
 line 222, in int64_user_gauge
payload = _encode_gauge(coder, timestamp, value)
  File 
"/Users/max/Consulting/Lyft/dev/streamperfbench/beamperfk8s/venv/lib/python3.7/site-packages/apache_beam/metrics/monitoring_infos.py",
 line 397, in _encode_gauge
coder.get_impl().encode_to_stream(value, stream, True)
  File "apache_beam/coders/coder_impl.py", line 690, in 
apache_beam.coders.coder_impl.VarIntCoderImpl.encode_to_stream
  File "apache_beam/coders/coder_impl.py", line 692, in 
apache_beam.coders.coder_impl.VarIntCoderImpl.encode_to_stream
TypeError: an integer is required
{noformat}

The application has the following structure:

{code:python}
class StatefulOperation(beam.DoFn):
  def __init__(self, state_size_per_key_bytes, use_processing_timer=False):
self.state_size_per_key_bytes = state_size_per_key_bytes
self.str_coder = StrUtf8Coder().get_impl()
self.bytes_gauge = Metrics.gauge('synthetic', 'state_bytes')
self.elements_gauge = Metrics.gauge('synthetic', 'state_elements')
self.use_processing_timer = use_processing_timer

  state_spec = userstate.BagStateSpec('state', StrUtf8Coder())

  state_spec2 = userstate.CombiningValueStateSpec('state_size_bytes', 
combine_fn=sum)

  state_spec3 = userstate.CombiningValueStateSpec('num_state_entries', 
combine_fn=sum)

  event_timer_spec = userstate.TimerSpec('event_timer', 
beam.TimeDomain.WATERMARK)
  processing_timer_spec = userstate.TimerSpec('proc_timer', 
beam.TimeDomain.REAL_TIME)

  def process(self,
  element,
  timestamp=beam.DoFn.TimestampParam,
  state=beam.DoFn.StateParam(state_spec),
  state_num_bytes=beam.DoFn.StateParam(state_spec2),
  state_num_entries=beam.DoFn.StateParam(state_spec3),
  event_timer=beam.DoFn.TimerParam(event_timer_spec),
  processing_timer=beam.DoFn.TimerParam(processing_timer_spec)):
# Append stringified element to state until the threshold has been reached
# The cleanup timer will then clean up and the process will repeat.
if state_num_bytes.read() <= self.state_size_per_key_bytes:
  state_element = str(element)
  state.add(state_element)
  bytes_added = len(self.str_coder.encode_nested(state_element))
  state_num_bytes.add(bytes_added)
  state_num_entries.add(1)
  timer = processing_timer if self.use_processing_timer else event_timer
  # Set a timer which will clear the state if it grows too large
  time

[jira] [Commented] (BEAM-8742) Add stateful processing to ParDo load test

2020-08-24 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183282#comment-17183282
 ] 

Maximilian Michels commented on BEAM-8742:
--

You're right, seems like this is now possible thanks to BEAM-3742.

> Add stateful processing to ParDo load test
> --
>
> Key: BEAM-8742
> URL: https://issues.apache.org/jira/browse/BEAM-8742
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> So far, the ParDo load test is not stateful. We should add a basic counter to 
> test the stateful processing.
> The test should work in streaming mode and with checkpointing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10612) Add support for Flink 1.11.0

2020-08-15 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10612:
--
Fix Version/s: 2.25.0

> Add support for Flink 1.11.0
> 
>
> Key: BEAM-10612
> URL: https://issues.apache.org/jira/browse/BEAM-10612
> Project: Beam
>  Issue Type: Wish
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Neville Li
>Priority: P2
> Fix For: 2.25.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Flink 1.11.0 was released on July 6, 2020: 
> [https://flink.apache.org/news/2020/07/06/release-1.11.0.html]
> We should add a Flink 1.11 module to Beam.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10612) Add support for Flink 1.11.0

2020-08-15 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10612:
--
Status: Resolved  (was: Triage Needed)

> Add support for Flink 1.11.0
> 
>
> Key: BEAM-10612
> URL: https://issues.apache.org/jira/browse/BEAM-10612
> Project: Beam
>  Issue Type: Wish
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Neville Li
>Priority: P2
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Flink 1.11.0 was released on July 6, 2020: 
> [https://flink.apache.org/news/2020/07/06/release-1.11.0.html]
> We should add a Flink 1.11 module to Beam.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10691) FlinkRunner: pipeline slows down due to expensive output timestamp queue

2020-08-14 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10691:
--
Summary: FlinkRunner: pipeline slows down due to expensive output timestamp 
queue   (was: FlinkRunner: pipeline slows down due to slow output timetamp 
queue )

> FlinkRunner: pipeline slows down due to expensive output timestamp queue 
> -
>
> Key: BEAM-10691
> URL: https://issues.apache.org/jira/browse/BEAM-10691
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.23.0, 2.24.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: P1
> Fix For: 2.25.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Pipeline might stop progressing watermark in certain cases due to timer 
> output timestamp not being released from 
> FlinkTimerInternals#outputTimestampQueue. The pipeline has to be restarted 
> from checkpoint to reload the cache and free watermark hold.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10676) Timers use the input timestamp as the timer output timestamp which prevents watermark progress

2020-08-14 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10676:
--
Status: Resolved  (was: Resolved)

> Timers use the input timestamp as the timer output timestamp which prevents 
> watermark progress
> --
>
> Key: BEAM-10676
> URL: https://issues.apache.org/jira/browse/BEAM-10676
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> By default, the Python SDK adds a timer output timestamp equal to the current 
> timestamp of an element. This is problematic because
> 1. We hold back the output watermark on the current element's timestamp for 
> every timer
> 2. It doesn't match the behavior in the Java SDK which defaults to using the 
> fire timestamp as the timer output timestamp (and adds a hold on it)
> 3. There is no way for the user to influence this behavior because there is 
> no user-facing API 
> https://github.com/apache/beam/blob/dfadde2d3ee0a0487362dbcca80388fdc2ef2302/sdks/python/apache_beam/runners/worker/bundle_processor.py#L650
> We should use the fire timestamp as the default output timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10691) FlinkRunner: pipeline slows down due to slow output timetamp queue

2020-08-14 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10691:
--
Summary: FlinkRunner: pipeline slows down due to slow output timetamp queue 
  (was: FlinkRunner: pipeline might get stuck due to timer watermark hold not 
being released)

> FlinkRunner: pipeline slows down due to slow output timetamp queue 
> ---
>
> Key: BEAM-10691
> URL: https://issues.apache.org/jira/browse/BEAM-10691
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.23.0, 2.24.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: P1
> Fix For: 2.25.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Pipeline might stop progressing watermark in certain cases due to timer 
> output timestamp not being released from 
> FlinkTimerInternals#outputTimestampQueue. The pipeline has to be restarted 
> from checkpoint to reload the cache and free watermark hold.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10691) FlinkRunner: pipeline might get stuck due to timer watermark hold not being released

2020-08-14 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177647#comment-17177647
 ] 

Maximilian Michels commented on BEAM-10691:
---

Also see the related BEAM-10676 bug for Python.

> FlinkRunner: pipeline might get stuck due to timer watermark hold not being 
> released
> 
>
> Key: BEAM-10691
> URL: https://issues.apache.org/jira/browse/BEAM-10691
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.23.0, 2.24.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: P1
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Pipeline might stop progressing watermark in certain cases due to timer 
> output timestamp not being released from 
> FlinkTimerInternals#outputTimestampQueue. The pipeline has to be restarted 
> from checkpoint to reload the cache and free watermark hold.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10691) FlinkRunner: pipeline might get stuck due to timer watermark hold not being released

2020-08-14 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177645#comment-17177645
 ] 

Maximilian Michels commented on BEAM-10691:
---

Putting in triage mode again because there is not enough information here yet 
to indicate this is actually a bug. As discussed in 
https://github.com/apache/beam/pull/12551, there might be a timer set that is 
yet to fire in the future because the input watermark has not reached the fire 
timestamp yet. If this timer has an output timestamp configured, the output 
watermark will be held back. This is a common source of confusion when it comes 
to the output timestamp feature.

> FlinkRunner: pipeline might get stuck due to timer watermark hold not being 
> released
> 
>
> Key: BEAM-10691
> URL: https://issues.apache.org/jira/browse/BEAM-10691
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.23.0, 2.24.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: P1
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Pipeline might stop progressing watermark in certain cases due to timer 
> output timestamp not being released from 
> FlinkTimerInternals#outputTimestampQueue. The pipeline has to be restarted 
> from checkpoint to reload the cache and free watermark hold.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10691) FlinkRunner: pipeline might get stuck due to timer watermark hold not being released

2020-08-14 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10691:
--
Status: Triage Needed  (was: Open)

> FlinkRunner: pipeline might get stuck due to timer watermark hold not being 
> released
> 
>
> Key: BEAM-10691
> URL: https://issues.apache.org/jira/browse/BEAM-10691
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.23.0, 2.24.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: P1
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Pipeline might stop progressing watermark in certain cases due to timer 
> output timestamp not being released from 
> FlinkTimerInternals#outputTimestampQueue. The pipeline has to be restarted 
> from checkpoint to reload the cache and free watermark hold.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10676) Timers use the input timestamp as the timer output timestamp which prevents watermark progress

2020-08-13 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10676:
--
Status: Resolved  (was: Open)

> Timers use the input timestamp as the timer output timestamp which prevents 
> watermark progress
> --
>
> Key: BEAM-10676
> URL: https://issues.apache.org/jira/browse/BEAM-10676
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> By default, the Python SDK adds a timer output timestamp equal to the current 
> timestamp of an element. This is problematic because
> 1. We hold back the output watermark on the current element's timestamp for 
> every timer
> 2. It doesn't match the behavior in the Java SDK which defaults to using the 
> fire timestamp as the timer output timestamp (and adds a hold on it)
> 3. There is no way for the user to influence this behavior because there is 
> no user-facing API 
> https://github.com/apache/beam/blob/dfadde2d3ee0a0487362dbcca80388fdc2ef2302/sdks/python/apache_beam/runners/worker/bundle_processor.py#L650
> We should use the fire timestamp as the default output timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10676) Timers use the input timestamp as the timer output timestamp which prevents watermark progress

2020-08-13 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176814#comment-17176814
 ] 

Maximilian Michels commented on BEAM-10676:
---

Yes, PR has been merged. So closing this issue.

> Timers use the input timestamp as the timer output timestamp which prevents 
> watermark progress
> --
>
> Key: BEAM-10676
> URL: https://issues.apache.org/jira/browse/BEAM-10676
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> By default, the Python SDK adds a timer output timestamp equal to the current 
> timestamp of an element. This is problematic because
> 1. We hold back the output watermark on the current element's timestamp for 
> every timer
> 2. It doesn't match the behavior in the Java SDK which defaults to using the 
> fire timestamp as the timer output timestamp (and adds a hold on it)
> 3. There is no way for the user to influence this behavior because there is 
> no user-facing API 
> https://github.com/apache/beam/blob/dfadde2d3ee0a0487362dbcca80388fdc2ef2302/sdks/python/apache_beam/runners/worker/bundle_processor.py#L650
> We should use the fire timestamp as the default output timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10676) Timers use the input timestamp as the timer output timestamp which prevents watermark progress

2020-08-13 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176813#comment-17176813
 ] 

Maximilian Michels commented on BEAM-10676:
---

Small addition: By design, processing timers set the timer output timestamp to 
the current input timestamp. In this issue, we adapted the Python SDK to the 
behavior of the Java SDK, which is to set the output timestamp of event time 
timers to the fire timestamp. 

> Timers use the input timestamp as the timer output timestamp which prevents 
> watermark progress
> --
>
> Key: BEAM-10676
> URL: https://issues.apache.org/jira/browse/BEAM-10676
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> By default, the Python SDK adds a timer output timestamp equal to the current 
> timestamp of an element. This is problematic because
> 1. We hold back the output watermark on the current element's timestamp for 
> every timer
> 2. It doesn't match the behavior in the Java SDK which defaults to using the 
> fire timestamp as the timer output timestamp (and adds a hold on it)
> 3. There is no way for the user to influence this behavior because there is 
> no user-facing API 
> https://github.com/apache/beam/blob/dfadde2d3ee0a0487362dbcca80388fdc2ef2302/sdks/python/apache_beam/runners/worker/bundle_processor.py#L650
> We should use the fire timestamp as the default output timestamp.
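
As a reference point, here is a minimal Python sketch of an event-time timer in a stateful DoFn; with this change its default output timestamp is the firing timestamp rather than the input element's timestamp, so the output watermark is no longer held at each element. The class and timer names are illustrative only:

{code:python}
import apache_beam as beam
from apache_beam.transforms import userstate

class EmitOneSecondLater(beam.DoFn):
  FLUSH_TIMER = userstate.TimerSpec('flush', beam.TimeDomain.WATERMARK)

  def process(self,
              element,
              timestamp=beam.DoFn.TimestampParam,
              flush=beam.DoFn.TimerParam(FLUSH_TIMER)):
    # Fire one second past the element's event time. With this fix the
    # timer's output timestamp defaults to that firing time, matching Java.
    flush.set(timestamp + 1)

  @userstate.on_timer(FLUSH_TIMER)
  def on_flush(self):
    yield 'flushed'
{code}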



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10671) Add environment configuration fields as first-class pipeline options

2020-08-11 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175704#comment-17175704
 ] 

Maximilian Michels commented on BEAM-10671:
---

Good idea. Not sure whether we need to prefix all the parameters with 
{{environment_}}. Maybe use the environment name as the prefix, e.g. 
{{docker_container_image}} vs {{environment_container_image}}.

> Add environment configuration fields as first-class pipeline options
> 
>
> Key: BEAM-10671
> URL: https://issues.apache.org/jira/browse/BEAM-10671
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P2
>
> The pipeline option --environment_config has completely different usages 
> depending on the value of --environment_type. This is confusing for the user 
> and hard to check. Additionally, --environment_config is a JSON blob for 
> --environment_type=PROCESS. This JSON blob is a pain to escape and pass 
> around compared to a collection of flat strings.
> We should replace --environment_config with first-class / top-level pipeline 
> options for each environment type:
> DOCKER
> --environment_container_image
> PROCESS
> --environment_os
> --environment_architecture
> --environment_variables
> EXTERNAL
> --environment_service_address
> LOOPBACK
> (none)
> This way we can validate that the user is configuring these options correctly 
> (ie give a warning or error if they use options that do not apply to their 
> chosen --environment_type).
> We can deprecate the --environment_config option, logging a warning until 
> removing this option altogether in a future Beam release.
> [https://beam.apache.org/documentation/runtime/sdk-harness-config/]
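
For comparison, this is roughly how the existing options are passed from Python today: a single --environment_config whose meaning depends on --environment_type (a container image name for DOCKER, a JSON blob for PROCESS). A minimal sketch; the job endpoint address and image tag below are assumed values:

{code:python}
from apache_beam.options.pipeline_options import PipelineOptions

# Current style: one overloaded --environment_config flag.
options = PipelineOptions([
    '--runner=PortableRunner',
    '--job_endpoint=localhost:8099',       # assumed local job server address
    '--environment_type=DOCKER',
    '--environment_config=apache/beam_python3.7_sdk:2.24.0',
])
{code}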



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10676) Timers use the input timestamp as the timer output timestamp which prevents watermark progress

2020-08-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10676:
--
Fix Version/s: 2.24.0

> Timers use the input timestamp as the timer output timestamp which prevents 
> watermark progress
> --
>
> Key: BEAM-10676
> URL: https://issues.apache.org/jira/browse/BEAM-10676
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> By default, the Python SDK adds a timer output timestamp equal to the current 
> timestamp of an element. This is problematic because
> 1. We hold back the output watermark on the current element's timestamp for 
> every timer
> 2. It doesn't match the behavior in the Java SDK which defaults to using the 
> fire timestamp as the timer output timestamp (and adds a hold on it)
> 3. There is no way for the user to influence this behavior because there is 
> no user-facing API 
> https://github.com/apache/beam/blob/dfadde2d3ee0a0487362dbcca80388fdc2ef2302/sdks/python/apache_beam/runners/worker/bundle_processor.py#L650
> We should use the fire timestamp as the default output timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10676) Timers use the input timestamp as the timer output timestamp which prevents watermark progress

2020-08-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10676:
--
Summary: Timers use the input timestamp as the timer output timestamp which 
prevents watermark progress  (was: Timers by default add a hold on the input 
timestamp)

> Timers use the input timestamp as the timer output timestamp which prevents 
> watermark progress
> --
>
> Key: BEAM-10676
> URL: https://issues.apache.org/jira/browse/BEAM-10676
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>
> By default, the Python SDK adds a timer output timestamp equal to the current 
> timestamp of an element. This is problematic because
> 1. We hold back the output watermark on the current element's timestamp for 
> every timer
> 2. It doesn't match the behavior in the Java SDK which defaults to using the 
> fire timestamp as the timer output timestamp (and adds a hold on it)
> 3. There is no way for the user to influence this behavior because there is 
> no user-facing API 
> https://github.com/apache/beam/blob/dfadde2d3ee0a0487362dbcca80388fdc2ef2302/sdks/python/apache_beam/runners/worker/bundle_processor.py#L650
> We should use the fire timestamp as the default output timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10676) Timers by default add a hold on the input timestamp

2020-08-11 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10676:
--
Description: 
By default, the Python SDK adds a timer output timestamp equal to the current 
timestamp of an element. This is problematic because

1. We hold back the output watermark on the current element's timestamp for 
every timer
2. It doesn't match the behavior in the Java SDK which defaults to using the 
fire timestamp as the timer output timestamp (and adds a hold on it)
3. There is no way for the user to influence this behavior because there is no 
user-facing API 

https://github.com/apache/beam/blob/dfadde2d3ee0a0487362dbcca80388fdc2ef2302/sdks/python/apache_beam/runners/worker/bundle_processor.py#L650

We should use the fire timestamp as the default output timestamp.

  was:
By default, the Python SDK adds a timer output timestamp equal to the current 
timestamp of an element. This is problematic because

1. We hold back the output watermark on the current element's timestamp for 
every timer
2. It doesn't match the behavior in the Java SDK which defaults to using the 
fire timestamp as the timer output timestamp (and adds a hold on it)
3. There is no way for the user to influence this behavior because there is no 
user-facing API 

https://github.com/apache/beam/blob/dfadde2d3ee0a0487362dbcca80388fdc2ef2302/sdks/python/apache_beam/runners/worker/bundle_processor.py#L650


> Timers by default add a hold on the input timestamp
> ---
>
> Key: BEAM-10676
> URL: https://issues.apache.org/jira/browse/BEAM-10676
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>
> By default, the Python SDK adds a timer output timestamp equal to the current 
> timestamp of an element. This is problematic because
> 1. We hold back the output watermark on the current element's timestamp for 
> every timer
> 2. It doesn't match the behavior in the Java SDK which defaults to using the 
> fire timestamp as the timer output timestamp (and adds a hold on it)
> 3. There is no way for the user to influence this behavior because there is 
> no user-facing API 
> https://github.com/apache/beam/blob/dfadde2d3ee0a0487362dbcca80388fdc2ef2302/sdks/python/apache_beam/runners/worker/bundle_processor.py#L650
> We should use the fire timestamp as the default output timestamp.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10676) Timers by default add a hold on the input timestamp

2020-08-11 Thread Maximilian Michels (Jira)
Maximilian Michels created BEAM-10676:
-

 Summary: Timers by default add a hold on the input timestamp
 Key: BEAM-10676
 URL: https://issues.apache.org/jira/browse/BEAM-10676
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core, sdk-py-harness
Reporter: Maximilian Michels
Assignee: Maximilian Michels


By default, the Python SDK adds a timer output timestamp equal to the current 
timestamp of an element. This is problematic because

1. We hold back the output watermark on the current element's timestamp for 
every timer
2. It doesn't match the behavior in the Java SDK which defaults to using the 
fire timestamp as the timer output timestamp (and adds a hold on it)
3. There is no way for the user to influence this behavior because there is no 
user-facing API 

https://github.com/apache/beam/blob/dfadde2d3ee0a0487362dbcca80388fdc2ef2302/sdks/python/apache_beam/runners/worker/bundle_processor.py#L650
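
To illustrate (a minimal, hypothetical DoFn, not SDK-internal code): setting a 
watermark timer today implicitly uses the element's timestamp as the timer output 
timestamp and holds the output watermark on it; there is no parameter to pick the 
fire timestamp instead.

{code:python}
# Hypothetical sketch only, not code from the SDK or this issue.
import apache_beam as beam
from apache_beam.transforms.userstate import TimerSpec, on_timer
from apache_beam.transforms.timeutil import TimeDomain


class DelayedEmit(beam.DoFn):
  FLUSH = TimerSpec('flush', TimeDomain.WATERMARK)

  def process(self,
              element,  # stateful DoFns require (key, value) input
              ts=beam.DoFn.TimestampParam,
              flush=beam.DoFn.TimerParam(FLUSH)):
    # Fire one minute after the element's event time. Today the SDK uses `ts`
    # as the timer's output timestamp, holding the output watermark there.
    flush.set(ts + 60)

  @on_timer(FLUSH)
  def on_flush(self):
    yield 'flush fired'
{code}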



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5521) Cache execution trees in SDK worker

2020-08-11 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175440#comment-17175440
 ] 

Maximilian Michels commented on BEAM-5521:
--

If I'm not mistaken, this has already been fixed by introducing a cache for 
process bundle descriptors. 

CC [~robertwb]
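
As a rough sketch of the idea (class and argument names are made up, this is not 
the actual Beam code): the expensive proto-to-operator construction happens once 
per descriptor id and is reused for subsequent bundles.

{code:python}
# Hypothetical sketch of the caching idea, not Beam's implementation.
class ExecutionTreeCache(object):

  def __init__(self, build_fn):
    # build_fn: the expensive proto -> operator-graph construction step.
    self._build_fn = build_fn
    self._trees = {}

  def get(self, descriptor_id, descriptor_proto):
    # Reuse the tree built for this descriptor instead of rebuilding it for
    # every (possibly single-element) bundle.
    if descriptor_id not in self._trees:
      self._trees[descriptor_id] = self._build_fn(descriptor_proto)
    return self._trees[descriptor_id]
{code}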

> Cache execution trees in SDK worker
> ---
>
> Key: BEAM-5521
> URL: https://issues.apache.org/jira/browse/BEAM-5521
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Priority: P2
>  Labels: portability-flink, stale-P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently they are re-constructed from the protos for every bundle, which is 
> expensive (especially for 1-element bundles in streaming flink). 
> Care should be taken to ensure the objects can be re-usued. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9976) FlinkSavepointTest timeout flake

2020-08-06 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172628#comment-17172628
 ] 

Maximilian Michels commented on BEAM-9976:
--

Oh, that's neat. I didn't know about this feature. Cool!

> FlinkSavepointTest timeout flake
> 
>
> Key: BEAM-9976
> URL: https://issues.apache.org/jira/browse/BEAM-9976
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Kyle Weaver
>Priority: P1
>  Labels: beam-fixit, flake
> Fix For: 2.24.0
>
> Attachments: multiple_savepoint_tests.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable
> org.junit.runners.model.TestTimedOutException: test timed out after 60 seconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.runSavepointAndRestore(FlinkSavepointTest.java:188)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable(FlinkSavepointTest.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9976) FlinkSavepointTest timeout flake

2020-08-06 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-9976:
-
Status: Resolved  (was: Resolved)

> FlinkSavepointTest timeout flake
> 
>
> Key: BEAM-9976
> URL: https://issues.apache.org/jira/browse/BEAM-9976
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Kyle Weaver
>Priority: P1
>  Labels: beam-fixit, flake
> Fix For: 2.24.0
>
> Attachments: multiple_savepoint_tests.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable
> org.junit.runners.model.TestTimedOutException: test timed out after 60 seconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.runSavepointAndRestore(FlinkSavepointTest.java:188)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable(FlinkSavepointTest.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9976) FlinkSavepointTest timeout flake

2020-08-06 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172609#comment-17172609
 ] 

Maximilian Michels commented on BEAM-9976:
--

If we haven't seen any new failures, I'd say yes.

> FlinkSavepointTest timeout flake
> 
>
> Key: BEAM-9976
> URL: https://issues.apache.org/jira/browse/BEAM-9976
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Kyle Weaver
>Priority: P1
>  Labels: beam-fixit, flake
> Attachments: multiple_savepoint_tests.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable
> org.junit.runners.model.TestTimedOutException: test timed out after 60 seconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.runSavepointAndRestore(FlinkSavepointTest.java:188)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable(FlinkSavepointTest.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10622) Prefix Gradle paths with a colon for user-facing output

2020-08-04 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10622:
--
Fix Version/s: 2.24.0

> Prefix Gradle paths with a colon for user-facing output
> ---
>
> Key: BEAM-10622
> URL: https://issues.apache.org/jira/browse/BEAM-10622
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P3
> Fix For: 2.24.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When using the FlinkRunner with master, the job server needs to be built 
> first. Beam prints out instructions for that, e.g.
> {noformat}
> RuntimeError: 
> /Users/max/Dev/beam/runners/flink/1.10/job-server/build/libs/beam-runners-flink-1.10-job-server-2.24.0-SNAPSHOT.jar
>  not found. Please build the server with
>   cd /Users/max/Dev/beam; ./gradlew runners:flink:1.10:job-server:shadowJar
> {noformat}
> Note that the Gradle path printed is not valid because it is missing a colon 
> before "runners".
> Internally, the Gradle path is used to resolve the jar locally or from Maven 
> Central. Gradle paths always start with a colon. We should stick to that 
> format although we can also parse Gradle paths which do not start with a 
> colon.
> In order for the printed Gradle path to be valid, we should prefix the Gradle 
> path with a colon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10622) Prefix Gradle paths with a colon for user-facing output

2020-08-04 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10622:
--
Status: Resolved  (was: Open)

> Prefix Gradle paths with a colon for user-facing output
> ---
>
> Key: BEAM-10622
> URL: https://issues.apache.org/jira/browse/BEAM-10622
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P3
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using the FlinkRunner with master, the job server needs to be built 
> first. Beam prints out instructions for that, e.g.
> {noformat}
> RuntimeError: 
> /Users/max/Dev/beam/runners/flink/1.10/job-server/build/libs/beam-runners-flink-1.10-job-server-2.24.0-SNAPSHOT.jar
>  not found. Please build the server with
>   cd /Users/max/Dev/beam; ./gradlew runners:flink:1.10:job-server:shadowJar
> {noformat}
> Note that the Gradle path printed is not valid because it is missing a colon 
> before "runners".
> Internally, the Gradle path is used to resolve the jar locally or from Maven 
> Central. Gradle paths always start with a colon. We should stick to that 
> format although we can also parse Gradle paths which do not start with a 
> colon.
> In order for the printed Gradle path to be valid, we should prefix the Gradle 
> path with a colon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10630) Include data from load tests in the release process

2020-08-03 Thread Maximilian Michels (Jira)
Maximilian Michels created BEAM-10630:
-

 Summary: Include data from load tests in the release process
 Key: BEAM-10630
 URL: https://issues.apache.org/jira/browse/BEAM-10630
 Project: Beam
  Issue Type: Improvement
  Components: beam-community
Reporter: Maximilian Michels
Assignee: Maximilian Michels


In the past, we have seen performance regressions in releases. We should make 
sure that the release guide includes checking available performance 
measurements.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10622) Prefix Gradle paths with a colon for user-facing output

2020-08-03 Thread Maximilian Michels (Jira)
Maximilian Michels created BEAM-10622:
-

 Summary: Prefix Gradle paths with a colon for user-facing output
 Key: BEAM-10622
 URL: https://issues.apache.org/jira/browse/BEAM-10622
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Maximilian Michels
Assignee: Maximilian Michels


When using the FlinkRunner with master, the job server needs to be built first. 
Beam prints out instructions for that, e.g.

{noformat}
RuntimeError: 
/Users/max/Dev/beam/runners/flink/1.10/job-server/build/libs/beam-runners-flink-1.10-job-server-2.24.0-SNAPSHOT.jar
 not found. Please build the server with
  cd /Users/max/Dev/beam; ./gradlew runners:flink:1.10:job-server:shadowJar
{noformat}

Note that the Gradle path printed is not valid because it is missing a colon 
before "runners".

Internally, the Gradle path is used to resolve the jar locally or from Maven 
Central. Gradle paths always start with a colon. We should stick to that format 
although we can also parse Gradle paths which do not start with a colon.

In order for the printed Gradle path to be valid, we should prefix the Gradle 
path with a colon.
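
A minimal sketch of the intended normalization (the helper name is hypothetical; 
the real change belongs in the Python SDK's job-server jar resolution):

{code:python}
def normalize_gradle_target(target):
  # Gradle task paths are rooted at ':'; accept both forms but always print the
  # rooted one so the command can be copy-pasted.
  return target if target.startswith(':') else ':' + target


assert normalize_gradle_target('runners:flink:1.10:job-server:shadowJar') == \
    ':runners:flink:1.10:job-server:shadowJar'
{code}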



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10602) Display Python streaming metrics in Grafana dashboard

2020-07-29 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10602:
--
Description: 
The Grafana dashboard is currently missing Python load test metrics: 
http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python

 !screenshot-1.png! 

  was:
The Grafana dashboard is currently missing Python load test metrics: 
http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python

 !image-2020-07-29-17-19-43-696.png! 


> Display Python streaming metrics in Grafana dashboard
> -
>
> Key: BEAM-10602
> URL: https://issues.apache.org/jira/browse/BEAM-10602
> Project: Beam
>  Issue Type: Task
>  Components: build-system, testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Attachments: screenshot-1.png
>
>
> The Grafana dashboard is currently missing Python load test metrics: 
> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Moved] (BEAM-10602) Display Python streaming metrics in Grafana dashboard

2020-07-29 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels moved FLINK-18754 to BEAM-10602:
---


[jira] [Updated] (BEAM-10602) Display Python streaming metrics in Grafana dashboard

2020-07-29 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10602:
--
Attachment: screenshot-1.png

> Display Python streaming metrics in Grafana dashboard
> -
>
> Key: BEAM-10602
> URL: https://issues.apache.org/jira/browse/BEAM-10602
> Project: Beam
>  Issue Type: Task
>  Components: build-system, testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Attachments: screenshot-1.png
>
>
> The Grafana dashboard is currently missing Python load test metrics: 
> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
>  !image-2020-07-29-17-19-43-696.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10542) Investigate a possible Nexmark performance regression around 06/16

2020-07-28 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166455#comment-17166455
 ] 

Maximilian Michels commented on BEAM-10542:
---

Awesome, thanks! I didn't see that the ticket was already marked as resolved.

> Investigate a possible Nexmark performance regression around 06/16  
> 
>
> Key: BEAM-10542
> URL: https://issues.apache.org/jira/browse/BEAM-10542
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Valentyn Tymofieiev
>Assignee: Damian Gadomski
>Priority: P2
> Attachments: image-2020-07-22-12-56-30-138.png, nexmark.png
>
>
> There is a jump in benchmark metric visible on the Dashboard:
> http://104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1&from=now-90d&to=now



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10542) Investigate a possible Nexmark performance regression around 06/16

2020-07-28 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166437#comment-17166437
 ] 

Maximilian Michels commented on BEAM-10542:
---

Thanks everyone for the investigation here. Great findings. Are we sure that 
the regression is caused by the Jenkins migration?

> Investigate a possible Nexmark performance regression around 06/16  
> 
>
> Key: BEAM-10542
> URL: https://issues.apache.org/jira/browse/BEAM-10542
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Valentyn Tymofieiev
>Assignee: Damian Gadomski
>Priority: P2
> Attachments: image-2020-07-22-12-56-30-138.png, nexmark.png
>
>
> There is a jump in benchmark metric visible on the Dashboard:
> http://104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1&from=now-90d&to=now



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8742) Add stateful processing to ParDo load test

2020-07-28 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166436#comment-17166436
 ] 

Maximilian Michels commented on BEAM-8742:
--

[~kasiak] We wanted to have support for streaming, which we can't have until SDF 
is implemented in streaming mode. The StatefulLoadGenerator uses timers to 
generate the load, which works in streaming.
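
Roughly, the idea looks like the following Python sketch (names are made up; the 
actual implementation is the Java StatefulLoadGenerator): an event-time timer 
re-arms itself so that records keep being produced in streaming mode.

{code:python}
# Hypothetical Python sketch of a timer-driven load generator.
import apache_beam as beam
from apache_beam.transforms.userstate import TimerSpec, on_timer
from apache_beam.transforms.timeutil import TimeDomain


class TimerDrivenLoad(beam.DoFn):
  EMIT = TimerSpec('emit', TimeDomain.WATERMARK)

  def process(self,
              element,  # keyed input, e.g. (worker_id, seed)
              ts=beam.DoFn.TimestampParam,
              emit=beam.DoFn.TimerParam(EMIT)):
    # Schedule the first firing; the records are produced by the callback.
    emit.set(ts)

  @on_timer(EMIT)
  def emit_records(self,
                   ts=beam.DoFn.TimestampParam,
                   emit=beam.DoFn.TimerParam(EMIT)):
    yield b'x' * 1024  # one synthetic record per firing
    emit.set(ts + 1)   # re-arm so the load keeps flowing in streaming mode
{code}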

> Add stateful processing to ParDo load test
> --
>
> Key: BEAM-8742
> URL: https://issues.apache.org/jira/browse/BEAM-8742
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> So far, the ParDo load test is not stateful. We should add a basic counter to 
> test the stateful processing.
> The test should work in streaming mode and with checkpointing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9976) FlinkSavepointTest timeout flake

2020-07-27 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165896#comment-17165896
 ] 

Maximilian Michels commented on BEAM-9976:
--

This scan has the log: 
https://scans.gradle.com/s/jb5tbdedsu5xs/tests/:runners:flink:1.8:test/org.apache.beam.runners.flink.FlinkSavepointTest/testSavepointRestorePortable#1

The problem seems to come from the embedded cluster, which does not allow the 
job to start from a savepoint due to missing resources. The job vertices are 
still in the {{SCHEDULED}} phase (not {{RUNNING}}) and thus can't produce 
output. This may be resolvable by simply increasing the timeout: 
https://github.com/apache/beam/pull/12378

> FlinkSavepointTest timeout flake
> 
>
> Key: BEAM-9976
> URL: https://issues.apache.org/jira/browse/BEAM-9976
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Kyle Weaver
>Priority: P1
>  Labels: beam-fixit, flake
> Attachments: multiple_savepoint_tests.png
>
>
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable
> org.junit.runners.model.TestTimedOutException: test timed out after 60 seconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.runSavepointAndRestore(FlinkSavepointTest.java:188)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable(FlinkSavepointTest.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9976) FlinkSavepointTest timeout flake

2020-07-27 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165784#comment-17165784
 ] 

Maximilian Michels commented on BEAM-9976:
--

Doesn't seem to be the case because the test is still flaky: 
https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/3042/

> FlinkSavepointTest timeout flake
> 
>
> Key: BEAM-9976
> URL: https://issues.apache.org/jira/browse/BEAM-9976
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Kyle Weaver
>Priority: P1
>  Labels: beam-fixit, flake
> Attachments: multiple_savepoint_tests.png
>
>
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable
> org.junit.runners.model.TestTimedOutException: test timed out after 60 seconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.runSavepointAndRestore(FlinkSavepointTest.java:188)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable(FlinkSavepointTest.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10306) Add latency measurement to Python benchmarks

2020-07-27 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10306:
--
Fix Version/s: 2.24.0

> Add latency measurement to Python benchmarks
> 
>
> Key: BEAM-10306
> URL: https://issues.apache.org/jira/browse/BEAM-10306
> Project: Beam
>  Issue Type: Task
>  Components: benchmarking-py, build-system, runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> There are currently no latency metrics in the load tests which makes it 
> impossible to monitor latency regressions.
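
One possible way to collect such a signal (a sketch only; the metric namespace 
and name are made up) is a Beam distribution metric that records the 
event-time-to-processing-time lag per element:

{code:python}
# Hedged sketch: record per-element latency as a Beam distribution metric.
import time

import apache_beam as beam
from apache_beam.metrics.metric import Metrics


class MeasureLatency(beam.DoFn):
  # A distribution gives min/max/mean/count of the observed latencies.
  LATENCY_MS = Metrics.distribution('loadtest', 'latency_ms')

  def process(self, element, ts=beam.DoFn.TimestampParam):
    now_ms = time.time() * 1000.0
    event_ms = ts.micros / 1000.0
    self.LATENCY_MS.update(int(now_ms - event_ms))
    yield element
{code}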



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10306) Add latency measurement to Python benchmarks

2020-07-27 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10306:
--
Status: Resolved  (was: Open)

> Add latency measurement to Python benchmarks
> 
>
> Key: BEAM-10306
> URL: https://issues.apache.org/jira/browse/BEAM-10306
> Project: Beam
>  Issue Type: Task
>  Components: benchmarking-py, build-system, runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> There are currently no latency metrics in the load tests which makes it 
> impossible to monitor latency regressions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6868) Flink runner supports Bundle Finalization

2020-07-23 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163591#comment-17163591
 ] 

Maximilian Michels commented on BEAM-6868:
--

Just curious, how could users of the external KafkaIO run into the problem of 
missing BundleFinalize then? It appears that the finalization will still be 
performed, even if you switch off auto-commit in KafkaIO.

> Flink runner supports Bundle Finalization
> -
>
> Key: BEAM-6868
> URL: https://issues.apache.org/jira/browse/BEAM-6868
> Project: Beam
>  Issue Type: New Feature
>  Components: cross-language, runner-flink
>Reporter: Boyuan Zhang
>Priority: P1
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Status: Resolved  (was: Open)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.
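
For intuition, a small Python sketch (made-up names, not the DoFnOperator code) 
of the buffering behaviour described above: elements produced while snapshotting 
are parked and only flushed when the next bundle starts, so without another 
element or timer nothing flushes them.

{code:python}
# Hypothetical sketch of the buffering/flush logic, not the actual operator.
class BufferingOperator(object):

  def __init__(self):
    self.snapshotting = False
    self.buffer = []

  def emit(self, element, downstream):
    if self.snapshotting:
      self.buffer.append(element)  # cannot emit across the checkpoint barrier
    else:
      downstream(element)

  def start_bundle(self, downstream):
    # Flushing only happens here; if no new bundle starts, the buffer stalls.
    for element in self.buffer:
      downstream(element)
    self.buffer = []
{code}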



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Status: Open  (was: Triage Needed)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reopened BEAM-10558:
---

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Fix Version/s: (was: 2.24.0)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Fix Version/s: 2.24.0

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reopened BEAM-10558:
---

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Status: Open  (was: Triage Needed)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Status: Open  (was: Triage Needed)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reopened BEAM-10558:
---

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Status: Resolved  (was: Open)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Status: Closed  (was: Open)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Status: Resolved  (was: Closed)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Status: Resolved  (was: Resolved)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Fix Version/s: 2.24.0

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
> Fix For: 2.24.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started due to another 
> element arriving or a timer firing, the element will not be flushed from the 
> buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Status: Closed  (was: Resolved)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started by another element 
> arriving or a timer firing, the element will not be flushed from the buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Status: Resolved  (was: Resolved)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started by another element 
> arriving or a timer firing, the element will not be flushed from the buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-23 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-10558:
--
Status: Resolved  (was: Open)

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started by another element 
> arriving or a timer firing, the element will not be flushed from the buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9976) FlinkSavepointTest timeout flake

2020-07-22 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163118#comment-17163118
 ] 

Maximilian Michels commented on BEAM-9976:
--

Could be caused by BEAM-10558. We will see after the fix is merged.

> FlinkSavepointTest timeout flake
> 
>
> Key: BEAM-9976
> URL: https://issues.apache.org/jira/browse/BEAM-9976
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Kyle Weaver
>Priority: P1
>  Labels: beam-fixit, flake
> Attachments: multiple_savepoint_tests.png
>
>
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable
> org.junit.runners.model.TestTimedOutException: test timed out after 60 seconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.runSavepointAndRestore(FlinkSavepointTest.java:188)
>   at 
> org.apache.beam.runners.flink.FlinkSavepointTest.testSavepointRestorePortable(FlinkSavepointTest.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
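
For context on the shape of this trace: JUnit 4 executes a timed test body in a 
separate thread via {{FailOnTimeout}}, so a test blocked on a latch surfaces 
with exactly the frames above. A minimal, Beam-free reproduction, assuming a 
plain JUnit 4 timeout of 60 seconds (hypothetical test name):

{code:java}
import java.util.concurrent.CountDownLatch;
import org.junit.Test;

public class LatchTimeoutExample {

  @Test(timeout = 60_000)
  public void timesOutWaitingOnLatch() throws InterruptedException {
    CountDownLatch latch = new CountDownLatch(1);
    // Nothing ever counts the latch down, so the test thread parks in
    // latch.await() until the 60 second timeout fires and JUnit reports a
    // TestTimedOutException rooted in FailOnTimeout$CallableStatement.
    latch.await();
  }
}
{code}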



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-22 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163117#comment-17163117
 ] 

Maximilian Michels commented on BEAM-10558:
---

Could be causing BEAM-9976.

> Flushing of buffered elements during checkpoint can stall
> -
>
> Key: BEAM-10558
> URL: https://issues.apache.org/jira/browse/BEAM-10558
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: P2
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Elements are buffered during {{DoFnOperator#snapshotState}}, called as part 
> of a Flink checkpoint. This is necessary because flushing out elements in 
> this method call would alter the checkpoint barrier alignment. Optionally, 
> elements can be flushed out before the method call via the 
> {{finishBundleBeforeCheckpointing}} option which is turned off by default 
> because it can affect the checkpoint duration.
> The buffer is flushed as part of starting a new bundle. A problem arises if 
> no new bundle will be started. For example, this can be the case if only a 
> single element (e.g. Impulse) is produced as part of a bundle during 
> checkpointing. Afterwards, unless a new bundle is started by another element 
> arriving or a timer firing, the element will not be flushed from the buffer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10558) Flushing of buffered elements during checkpoint can stall

2020-07-22 Thread Maximilian Michels (Jira)
Maximilian Michels created BEAM-10558:
-

 Summary: Flushing of buffered elements during checkpoint can stall
 Key: BEAM-10558
 URL: https://issues.apache.org/jira/browse/BEAM-10558
 Project: Beam
  Issue Type: Bug
  Components: runner-flink
Reporter: Maximilian Michels
Assignee: Maximilian Michels


Elements are buffered during {{DoFnOperator#snapshotState}}, called as part of 
a Flink checkpoint. This is necessary because flushing out elements in this 
method call would alter the checkpoint barrier alignment. Optionally, elements 
can be flushed out before the method call via the 
{{finishBundleBeforeCheckpointing}} option which is turned off by default 
because it can affect the checkpoint duration.

The buffer is flushed as part of starting a new bundle. A problem arises if no 
new bundle will be started. For example, this can be the case if only a single 
element (e.g. Impulse) is produced as part of a bundle during checkpointing. 
Afterwards, unless a new bundle is started by another element arriving or a 
timer firing, the element will not be flushed from the buffer.
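
To make the failure mode concrete, the following is a deliberately simplified, 
hypothetical model of the buffering logic described above. It is not the actual 
{{DoFnOperator}} code; all names are illustrative:

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical model of the push-back buffer; not the real DoFnOperator. */
class BufferingOperatorModel<T> {

  private final Queue<T> pushedBackElements = new ArrayDeque<>();

  /** Invoked while the checkpoint barrier is being processed. */
  void snapshotState(T elementArrivingDuringCheckpoint) {
    // Emitting here would break checkpoint barrier alignment, so the
    // element is parked instead of being sent downstream.
    pushedBackElements.add(elementArrivingDuringCheckpoint);
  }

  /** Invoked only when a new bundle is started by an element or a timer. */
  void startBundle() {
    // The buffer is drained only here. If no further element arrives and no
    // timer fires, startBundle() is never called again and the buffered
    // element stays stuck, which is the stall described in this issue.
    while (!pushedBackElements.isEmpty()) {
      emit(pushedBackElements.poll());
    }
  }

  private void emit(T element) {
    System.out.println("emit: " + element);
  }
}
{code}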



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

