Hi,

We recently saw an increase in latency when migrating from Beam 2.18.0 to 2.21.0 (Python SDK with Flink Runner). This proved very hard to debug, and it looks like each version in between the two led to increased latency.

This is not the first time we have seen issues when migrating. Another time, we had a decline in checkpointing performance and thus added a checkpointing test [1] and dashboard [2] (see the checkpointing widget).

That makes me wonder if we should monitor performance (throughput / latency) for basic use cases as part of the release testing. Currently, our release guide [3] mentions running examples but not evaluating their performance. I think it would be good practice to check the relevant charts with performance measurements as part of the release process. The release guide should reflect that.

WDYT?

-Max

PS: Of course, this requires tests and metrics to be available. This PR adds latency measurements to the load tests [4].


[1] https://github.com/apache/beam/pull/11558
[2] https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
[3] https://beam.apache.org/contribute/release-guide/
[4] https://github.com/apache/beam/pull/12065
