Hi,
We recently saw an increase in latency migrating from Beam 2.18.0 to
2.21.0 (Python SDK with Flink Runner). This proved very hard to debug,
and it looks like each version in between the two led to
increased latency.
This is not the first time we have seen issues when migrating. Another
time we had a decline in checkpointing performance and thus added a
checkpointing test [1] and dashboard [2] (see checkpointing widget).
That makes me wonder if we should monitor performance (throughput /
latency) for basic use cases as part of the release testing. Currently,
our release guide [3] mentions running examples but not evaluating the
performance. I think it would be good practice to check relevant charts
with performance measurements as part of the release process. The
release guide should reflect that.
WDYT?
-Max
PS: Of course, this requires tests and metrics to be available. This PR
adds latency measurements to the load tests [4].
[1] https://github.com/apache/beam/pull/11558
[2] https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
[3] https://beam.apache.org/contribute/release-guide/
[4] https://github.com/apache/beam/pull/12065