I have to admit I still have some qualms about tying the detection and
fixing of performance regressions to the release process (which is
onerous enough as it is). Instead, I think we'd be better off with a
separate process to detect and triage performance issues, which, when
they occur, may merit filing a blocker that would require fixing before
the release just like any other blocker. Hopefully this would result in
issues being detected (and resolved) sooner.

That being said, if a release is known to have performance regressions,
that should be called out when the RCs are cut, and if not resolved,
probably as part of the release notes as well.

On Mon, Aug 3, 2020 at 9:40 AM Maximilian Michels <[email protected]> wrote:

> Here a first version of the updated release guide:
> https://github.com/apache/beam/pull/12455
>
> Feel free to comment.
>
> -Max
>
> On 29.07.20 17:27, Maximilian Michels wrote:
> > Thanks! I'm following up with this PR to display the Flink ParDo
> > streaming data: https://github.com/apache/beam/pull/12408
> >
> > Streaming data appears to be missing for Dataflow. We can revise the
> > Jenkins jobs to add it.
> >
> > -Max
> >
> > On 29.07.20 17:01, Tyson Hamilton wrote:
> >> Max,
> >>
> >> The runner dimensions are present when hovering over a particular
> >> graph. For some more info, the load test configurations can be found
> >> here [1]. I didn't get a chance to look into them, but there are tests
> >> for all the runners there, though possibly not for every load test.
> >>
> >> [1]: https://github.com/apache/beam/tree/master/.test-infra/jenkins
> >>
> >> -Tyson
> >>
> >> On Wed, Jul 29, 2020 at 3:46 AM Maximilian Michels <[email protected]> wrote:
> >>
> >>     Looks like the permissions won't be necessary because backup data
> >>     gets loaded into the local InfluxDb instance, which makes writing
> >>     queries locally possible.
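> >>
> >>     (For reference, a local query might then look like the Python
> >>     sketch below, using the influxdb client package; the database and
> >>     measurement names here are assumptions, not the actual schema:)
> >>
> >>     from influxdb import InfluxDBClient
> >>
> >>     # Connect to the locally restored InfluxDb instance.
> >>     client = InfluxDBClient(host="localhost", port=8086,
> >>                             database="beam_test_metrics")
> >>
> >>     # Hypothetical measurement name -- adjust to the real schema.
> >>     result = client.query(
> >>         'SELECT mean("value") FROM "python_batch_pardo_1" '
> >>         'WHERE time > now() - 30d GROUP BY time(1d)')
> >>     for point in result.get_points():
> >>         print(point)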
> >>
> >>     On 29.07.20 12:21, Maximilian Michels wrote:
> >>      > Thanks Michał!
> >>      >
> >>      > It is a bit tricky to verify the exported query works if I
> >>      > don't have access to the data stored in InfluxDb.
> >>      >
> >>      > ==> Could somebody give me permissions to [email protected] for
> >>      > apache-beam-testing such that I can set up SSH port-forwarding
> >>      > from the InfluxDb pod to my machine? I do have access to see the
> >>      > pods, but that is not enough.
> >>      >
> >>      >> I think that the only test data is from Python streaming tests,
> >>      >> which are not implemented right now (check out
> >>      >> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=batch&var-sdk=python )
> >>      >
> >>      > Additionally, there is an entire dimension missing: Runners. I'm
> >>      > assuming this data is for Dataflow?
> >>      >
> >>      > -Max
> >>      >
> >>      > On 29.07.20 11:55, Michał Walenia wrote:
> >>      >> Hi there,
> >>      >>
> >>      >>  > Indeed the Python load test data appears to be missing:
> >>      >>  > http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
> >>      >>
> >>      >> I think that the only test data is from Python streaming tests,
> >>      >> which are not implemented right now (check out
> >>      >> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=batch&var-sdk=python )
> >>      >>
> >>      >> As for updating the dashboards, the manual for doing this is here:
> >>      >> https://cwiki.apache.org/confluence/display/BEAM/Community+Metrics#CommunityMetrics-UpdatingDashboards
> >>      >>
> >>      >> I hope this helps,
> >>      >>
> >>      >> Michal
> >>      >>
> >>      >> On Mon, Jul 27, 2020 at 4:31 PM Maximilian Michels
> >>      >> <[email protected]> wrote:
> >>      >>
> >>      >>     Indeed the Python load test data appears to be missing:
> >>      >>     http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
> >>      >>
> >>      >>     How do we typically modify the dashboards?
> >>      >>
> >>      >>     It looks like we need to edit this json file:
> >>      >>     https://github.com/apache/beam/blob/8d460db620d2ff1257b0e092218294df15b409a1/.test-infra/metrics/grafana/dashboards/perftests_metrics/ParDo_Load_Tests.json#L81
> >>      >>
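> >>      >>     (To see which template variables the dashboard currently
> >>      >>     defines before editing, something like the Python sketch
> >>      >>     below could help; the templating fields reflect my
> >>      >>     assumption about Grafana's dashboard JSON layout:)
> >>      >>
> >>      >>     import json
> >>      >>
> >>      >>     path = (".test-infra/metrics/grafana/dashboards/"
> >>      >>             "perftests_metrics/ParDo_Load_Tests.json")
> >>      >>     with open(path) as f:
> >>      >>         dashboard = json.load(f)
> >>      >>
> >>      >>     # Grafana keeps template variables under templating.list;
> >>      >>     # print each variable and its options to see what the
> >>      >>     # dashboard exposes (e.g. sdk, processingType).
> >>      >>     for var in dashboard.get("templating", {}).get("list", []):
> >>      >>         names = [o.get("text") for o in var.get("options", [])]
> >>      >>         print(var.get("name"), names)
> >>      >>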
> >>      >>     I found some documentation on the deployment:
> >>      >>     https://cwiki.apache.org/confluence/display/BEAM/Test+Results+Monitoring
> >>      >>
> >>      >>     +1 for alerting or weekly emails including performance
> >>      >>     numbers for fixed intervals (1d, 1w, 1m, previous release).
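> >>      >>
> >>      >>     (For illustration, such a periodic check could be as simple
> >>      >>     as the sketch below, with the Python influxdb client; the
> >>      >>     database, measurement, field, and threshold are all
> >>      >>     assumptions:)
> >>      >>
> >>      >>     from influxdb import InfluxDBClient
> >>      >>
> >>      >>     client = InfluxDBClient(host="localhost", port=8086,
> >>      >>                             database="beam_test_metrics")
> >>      >>
> >>      >>     def mean_runtime(where):
> >>      >>         # Mean of a hypothetical runtime field over the window.
> >>      >>         q = ('SELECT mean("runtime") FROM "python_batch_pardo_1" '
> >>      >>              'WHERE ' + where)
> >>      >>         points = list(client.query(q).get_points())
> >>      >>         return points[0]["mean"] if points else None
> >>      >>
> >>      >>     last_day = mean_runtime("time > now() - 1d")
> >>      >>     prev_week = mean_runtime(
> >>      >>         "time > now() - 8d AND time <= now() - 1d")
> >>      >>     if last_day and prev_week and last_day > 1.2 * prev_week:
> >>      >>         print("Possible regression: %.1f vs %.1f"
> >>      >>               % (last_day, prev_week))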
> >>      >>
> >>      >>     +1 for linking the dashboards in the release guide to allow
> >>      >>     for a comparison as part of the release process.
> >>      >>
> >>      >>     As a first step, consolidating all the data seems like the
> >>      >>     most pressing problem to solve.
> >>      >>
> >>      >>     @Kamil I could use some advice on how to proceed with
> >>      >>     updating the dashboards.
> >>      >>
> >>      >>     -Max
> >>      >>
> >>      >>     On 22.07.20 20:20, Robert Bradshaw wrote:
> >>      >>      > On Tue, Jul 21, 2020 at 9:58 AM Thomas Weise
> >>      >>      > <[email protected]> wrote:
> >>      >>      >
> >>      >>      >     It appears that there is coverage missing in the
> >>      >>      >     Grafana dashboards (it could also be that I just
> >>      >>      >     don't find it).
> >>      >>      >
> >>      >>      >     For example:
> >>      >>      >     https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
> >>      >>      >
> >>      >>      >     The GBK and ParDo tests have a selection for
> >>      >>      >     {batch, streaming} and SDK. No coverage for streaming
> >>      >>      >     and Python? There is also no runner option currently.
> >>      >>      >
> >>      >>      >     We have seen repeated regressions with streaming,
> >>      >>      >     Python, and Flink. The test has been contributed. It
> >>      >>      >     would be great if the results could be covered as
> >>      >>      >     part of release verification.
> >>      >>      >
> >>      >>      >
> >>      >>      > Even better would be if we can use these dashboards
> >>      >>      > (plus alerting or similar?) to find issues before release
> >>      >>      > verification. It's much easier to fix things earlier.
> >>      >>      >
> >>      >>      >
> >>      >>      >     Thomas
> >>      >>      >
> >>      >>      >
> >>      >>      >
> >>      >>      >     On Tue, Jul 21, 2020 at 7:55 AM Kamil Wasilewski
> >>      >>      >     <[email protected]> wrote:
> >>      >>      >
> >>      >>      >             The prerequisite is that we have all the
> >>      >>      >             stats in one place. They seem to be scattered
> >>      >>      >             across http://metrics.beam.apache.org and
> >>      >>      >             https://apache-beam-testing.appspot.com.
> >>      >>      >
> >>      >>      >             Would it be possible to consolidate the two,
> >>      >>      >             i.e. use the Grafana-based dashboard to load
> >>      >>      >             the legacy stats?
> >>      >>      >
> >>      >>      >
> >>      >>      >         I'm pretty sure that all dashboards have been
> >>      >>      >         moved to http://metrics.beam.apache.org. Let me
> >>      >>      >         know if I missed something during the migration.
> >>      >>      >
> >>      >>      >         I think we should turn off
> >>      >>      >         https://apache-beam-testing.appspot.com in the
> >>      >>      >         near future. New Grafana-based dashboards have
> >>      >>      >         been working seamlessly for some time now and
> >>      >>      >         there's no point in maintaining the older
> >>      >>      >         solution. We'd also avoid ambiguity about where
> >>      >>      >         the stats should be looked for.
> >>      >>      >
> >>      >>      >         Kamil
> >>      >>      >
> >>      >>      >         On Tue, Jul 21, 2020 at 4:17 PM Maximilian
> >>      >>      >         Michels <[email protected]> wrote:
> >>      >>      >
> >>      >>      >              > It doesn't support https. I had to add
> >>      >>      >              > an exception to the HTTPS Everywhere
> >>      >>      >              > extension for "metrics.beam.apache.org".
> >>      >>      >
> >>      >>      >             *facepalm* Thanks Udi! It would always hang
> >>      >>      >             on me because I use HTTPS Everywhere.
> >>      >>      >
> >>      >>      >              > To be explicit, I am supporting the idea
> >>      >>      >              > of reviewing the release guide but not
> >>      >>      >              > changing the release process for the
> >>      >>      >              > already in-progress release.
> >>      >>      >
> >>      >>      >             I consider the release guide immutable for
> >>      >>      >             the duration of a release. Thus, a change to
> >>      >>      >             the release guide can only affect new
> >>      >>      >             upcoming releases, not an in-progress
> >>      >>      >             release.
> >>      >>      >
> >>      >>      >              > +1 and I think we can also evaluate
> >>      >>      >              > whether flaky tests should be reviewed as
> >>      >>      >              > release blockers or not. Some flaky tests
> >>      >>      >              > could be hiding real issues our users
> >>      >>      >              > could face.
> >>      >>      >
> >>      >>      >             Flaky tests are also worth taking into
> >>      >>      >             account when releasing, but a little harder
> >>      >>      >             to find because they may just happen to pass
> >>      >>      >             while building the release. It is possible
> >>      >>      >             though if we strictly capture flaky tests
> >>      >>      >             via JIRA and mark them with the Fix Version
> >>      >>      >             for the release.
> >>      >>      >
> >>      >>      >              > We keep accumulating dashboards and tests
> >>      >>      >              > that few people care about, so it is
> >>      >>      >              > probably worth making sure we use them or
> >>      >>      >              > get a way to alert us of regressions
> >>      >>      >              > during the release cycle to catch this
> >>      >>      >              > even before the RCs.
> >>      >>      >
> >>      >>      >             +1 The release guide should be explicit
> >>      >>      >             about which performance test results to
> >>      >>      >             evaluate.
> >>      >>      >
> >>      >>      >             The prerequisite is that we have all the
> >>      >>      >             stats in one place. They seem to be scattered
> >>      >>      >             across http://metrics.beam.apache.org and
> >>      >>      >             https://apache-beam-testing.appspot.com.
> >>      >>      >
> >>      >>      >             Would it be possible to consolidate the two,
> >>      >>      >             i.e. use the Grafana-based dashboard to load
> >>      >>      >             the legacy stats?
> >>      >>      >
> >>      >>      >             For the evaluation during the release
> >>      >>      >             process, I suggest using a standardized set
> >>      >>      >             of performance tests for all runners, e.g.:
> >>      >>      >
> >>      >>      >             - Nexmark
> >>      >>      >             - ParDo (Classic/Portable)
> >>      >>      >             - GroupByKey
> >>      >>      >             - IO
> >>      >>      >
> >>      >>      >
> >>      >>      >             -Max
> >>      >>      >
> >>      >>      >             On 21.07.20 01:23, Ahmet Altay wrote:
> >>      >>      >              >
> >>      >>      >              > On Mon, Jul 20, 2020 at 3:07 PM Ismaël
> >>      >>      >              > Mejía <[email protected]> wrote:
> >>      >>      >              >
> >>      >>      >              >     +1
> >>      >>      >              >
> >>      >>      >              >     This is not in the release guide and
> >>      >>      >              >     we should probably re-evaluate
> >>      >>      >              >     whether this should be a
> >>      >>      >              >     release-blocking reason. Of course,
> >>      >>      >              >     exceptionally, a performance
> >>      >>      >              >     regression could be motivated by a
> >>      >>      >              >     correctness fix or a worthwhile
> >>      >>      >              >     refactor, so we should consider this.
> >>      >>      >              >
> >>      >>      >              >
> >>      >>      >              > +1 and I think we can also evaluate
> >>      >>      >              > whether flaky tests should be reviewed as
> >>      >>      >              > release blockers or not. Some flaky tests
> >>      >>      >              > could be hiding real issues our users
> >>      >>      >              > could face.
> >>      >>      >              >
> >>      >>      >              > To be explicit, I am supporting the idea
> >>      >>      >              > of reviewing the release guide but not
> >>      >>      >              > changing the release process for the
> >>      >>      >              > already in-progress release.
> >>      >>      >              >
> >>      >>      >              >
> >>      >>      >              >     We have been tracking and fixing
> >>      >>      >              >     performance regressions multiple
> >>      >>      >              >     times, found simply by checking the
> >>      >>      >              >     Nexmark tests, including on the
> >>      >>      >              >     ongoing 2.23.0 release, so the value
> >>      >>      >              >     is there. Nexmark does not yet cover
> >>      >>      >              >     Python and portable runners, so we
> >>      >>      >              >     are probably still missing many
> >>      >>      >              >     issues and it is worth working on
> >>      >>      >              >     this. In any case we should probably
> >>      >>      >              >     decide what validations matter. We
> >>      >>      >              >     keep accumulating dashboards and
> >>      >>      >              >     tests that few people care about, so
> >>      >>      >              >     it is probably worth making sure we
> >>      >>      >              >     use them or get a way to alert us of
> >>      >>      >              >     regressions during the release cycle
> >>      >>      >              >     to catch this even before the RCs.
> >>      >>      >              >
> >>      >>      >              >
> >>      >>      >              > I agree. And if we cannot use
> >>      >>      >              > dashboards/tests in a meaningful way, IMO
> >>      >>      >              > we can remove them. There is not much
> >>      >>      >              > value in maintaining them if they do not
> >>      >>      >              > provide important signals.
> >>      >>      >              >
> >>      >>      >              >
> >>      >>      >              >     On Fri, Jul 10, 2020 at 9:30 PM Udi
> >>      >>      >              >     Meiri <[email protected]> wrote:
> >>      >>      >              >      >
> >>      >>      >              >      > On Thu, Jul 9, 2020 at 12:48 PM
> >>      >>      >              >      > Maximilian Michels
> >>      >>      >              >      > <[email protected]> wrote:
> >>      >>      >              >      >>
> >>      >>      >              >      >> Not yet, I just learned about
> >>      >>      >              >      >> the migration to a new frontend,
> >>      >>      >              >      >> including a new backend (InfluxDB
> >>      >>      >              >      >> instead of BigQuery).
> >>      >>      >              >      >>
> >>      >>      >              >      >> >  - Are the metrics available
> >>      >>      >              >      >> >    on metrics.beam.apache.org?
> >>      >>      >              >      >>
> >>      >>      >              >      >> Is http://metrics.beam.apache.org
> >>      >>      >              >      >> online? I was never able to
> >>      >>      >              >      >> access it.
> >>      >>      >              >      >
> >>      >>      >              >      >
> >>      >>      >              >      > It doesn't support https. I had
> >>      >>      >              >      > to add an exception to the HTTPS
> >>      >>      >              >      > Everywhere extension for
> >>      >>      >              >      > "metrics.beam.apache.org".
> >>      >>      >              >      >
> >>      >>      >              >      >>
> >>      >>      >              >      >>
> >>      >>      >              >      >> >  - What is the feature delta
> >>      >>      >              >      >> >    between using
> >>      >>      >              >      >> >    metrics.beam.apache.org
> >>      >>      >              >      >> >    (much better UI) and using
> >>      >>      >              >      >> >    apache-beam-testing.appspot.com?
> >>      >>      >              >      >>
> >>      >>      >              >      >> AFAIK it is an ongoing migration
> >>      >>      >              >      >> and the delta appears to be high.
> >>      >>      >              >      >>
> >>      >>      >              >      >> >  - Can we notice regressions
> >>      >>      >              >      >> >    faster than release cadence?
> >>      >>      >              >      >>
> >>      >>      >              >      >> Absolutely! A report with the
> >>      >>      >              >      >> latest numbers, including
> >>      >>      >              >      >> statistics about the growth of
> >>      >>      >              >      >> metrics, would be useful.
> >>      >>      >              >      >>
> >>      >>      >              >      >> >  - Can we get automated alerts?
> >>      >>      >              >      >>
> >>      >>      >              >      >> I think we could set up a
> >>      >>      >              >      >> Jenkins job to do this.
> >>      >>      >              >      >>
> >>      >>      >              >      >> -Max
> >>      >>      >              >      >>
> >>      >>      >              >      >> On 09.07.20 20:26, Kenneth
> >>      >>      >              >      >> Knowles wrote:
> >>      >>      >              >      >> > Questions:
> >>      >>      >              >      >> >
> >>      >>      >              >      >> >   - Are the metrics available
> >>      >>      >              >      >> >     on metrics.beam.apache.org?
> >>      >>      >              >      >> >   - What is the feature delta
> >>      >>      >              >      >> >     between using
> >>      >>      >              >      >> >     metrics.beam.apache.org
> >>      >>      >              >      >> >     (much better UI) and using
> >>      >>      >              >      >> >     apache-beam-testing.appspot.com?
> >>      >>      >              >      >> >   - Can we notice regressions
> >>      >>      >              >      >> >     faster than release cadence?
> >>      >>      >              >      >> >   - Can we get automated alerts?
> >>      >>      >              >      >> >
> >>      >>      >              >      >> > Kenn
> >>      >>      >              >      >> >
> >>      >>      >              >      >> > On Thu, Jul 9, 2020 at 10:21
> >>      >>      >              >      >> > AM Maximilian Michels
> >>      >>      >              >      >> > <[email protected]> wrote:
> >>      >>      >              >      >> >
> >>      >>      >              >      >> >     Hi,
> >>      >>      >              >      >> >
> >>      >>      >              >      >> >     We recently saw an increase
> >>      >>      >              >      >> >     in latency migrating from
> >>      >>      >              >      >> >     Beam 2.18.0 to 2.21.0
> >>      >>      >              >      >> >     (Python SDK with Flink
> >>      >>      >              >      >> >     Runner). This proved very
> >>      >>      >              >      >> >     hard to debug, and it looks
> >>      >>      >              >      >> >     like each version in between
> >>      >>      >              >      >> >     the two led to increased
> >>      >>      >              >      >> >     latency.
> >>      >>      >              >      >> >
> >>      >>      >              >      >> >     This is not the first time
> >>      >>      >              >      >> >     we saw issues when
> >>      >>      >              >      >> >     migrating; another time we
> >>      >>      >              >      >> >     had a decline in
> >>      >>      >              >      >> >     checkpointing performance
> >>      >>      >              >      >> >     and thus added a
> >>      >>      >              >      >> >     checkpointing test [1] and
> >>      >>      >              >      >> >     dashboard [2] (see the
> >>      >>      >              >      >> >     checkpointing widget).
> >>      >>      >              >      >> >
> >>      >>      >              >      >> >     That makes me wonder if we
> >>      >>      >              >      >> >     should monitor performance
> >>      >>      >              >      >> >     (throughput / latency) for
> >>      >>      >              >      >> >     basic use cases as part of
> >>      >>      >              >      >> >     the release testing.
> >>      >>      >              >      >> >     Currently, our release
> >>      >>      >              >      >> >     guide [3] mentions running
> >>      >>      >              >      >> >     examples but not evaluating
> >>      >>      >              >      >> >     the performance. I think it
> >>      >>      >              >      >> >     would be good practice to
> >>      >>      >              >      >> >     check relevant charts with
> >>      >>      >              >      >> >     performance measurements as
> >>      >>      >              >      >> >     part of the release
> >>      >>      >              >      >> >     process. The release guide
> >>      >>      >              >      >> >     should reflect that.
> >>      >>      >              >      >> >
> >>      >>      >              >      >> >     WDYT?
> >>      >>      >              >      >> >
> >>      >>      >              >      >> >     -Max
> >>      >>      >              >      >> >
> >>      >>      >              >      >> >     PS: Of course, this
> >>      >>      >              >      >> >     requires tests and metrics
> >>      >>      >              >      >> >     to be available. This PR
> >>      >>      >              >      >> >     adds latency measurements
> >>      >>      >              >      >> >     to the load tests [4].
> >>      >>      >              >      >> >
> >>      >>      >              >      >> >     [1] https://github.com/apache/beam/pull/11558
> >>      >>      >              >      >> >     [2] https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
> >>      >>      >              >      >> >     [3] https://beam.apache.org/contribute/release-guide/
> >>      >>      >              >      >> >     [4] https://github.com/apache/beam/pull/12065
> >>      >>      >              >      >> >
> >>      >>      >              >
> >>      >>      >
> >>      >>
> >>      >>
> >>      >>
> >>      >> --
> >>      >>
> >>      >> Michał Walenia
> >>      >> Polidea <https://www.polidea.com/> | Software Engineer
> >>      >>
> >>      >> M: +48 791 432 002
> >>      >> E: [email protected]
> >>      >>
> >>      >> Unique Tech
> >>      >> Check out our projects! <https://www.polidea.com/our-work>
> >>      >>
> >>
>
