Max,

The runner dimension is present when hovering over a particular graph. For
more info, the load test configurations can be found here [1]. I didn't
get a chance to look into them, but there are tests for all the runners
there, though possibly not for every load test.

[1]: https://github.com/apache/beam/tree/master/.test-infra/jenkins
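
For example, a quick way to get an overview of what is covered (a rough
sketch; the job_LoadTests_* file naming is my assumption about how the
job definitions are organized there):

    git clone https://github.com/apache/beam.git
    ls beam/.test-infra/jenkins/ | grep -i loadtests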

-Tyson

On Wed, Jul 29, 2020 at 3:46 AM Maximilian Michels <[email protected]> wrote:

> Looks like the permissions won't be necessary: backup data gets loaded
> into the local InfluxDb instance, which makes writing queries locally
> possible.
>
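> A minimal sketch of a local query, assuming the influxdb Python client,
> the default port 8086, and a database named beam_test_metrics (both
> names are assumptions on my part and may differ):
>
>     from influxdb import InfluxDBClient
>
>     # Local (or port-forwarded) InfluxDB on the default port.
>     client = InfluxDBClient(host='localhost', port=8086,
>                             database='beam_test_metrics')
>     # List what is there before writing real queries.
>     print(client.query('SHOW MEASUREMENTS'))
>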
> On 29.07.20 12:21, Maximilian Michels wrote:
> > Thanks Michał!
> >
> > It is a bit tricky to verify that the exported query works when I don't
> > have access to the data stored in InfluxDb.
> >
> > ==> Could somebody give me permissions to [email protected] for
> > apache-beam-testing such that I can set up SSH port forwarding from the
> > InfluxDb pod to my machine? I can see the pods, but that is not enough.
> >
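> > For reference, the forwarding I have in mind is roughly the following
> > (pod name and namespace are placeholders):
> >
> >     kubectl port-forward pod/<influxdb-pod> 8086:8086 -n <namespace>
> >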
> >> I think that the only missing test data is from the Python streaming
> >> tests, which are not implemented right now (check out
> >> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=batch&var-sdk=python
> >> )
> >
> > Additionally, there is an entire dimension missing: Runners. I'm
> > assuming this data is for Dataflow?
> >
> > -Max
> >
> > On 29.07.20 11:55, Michał Walenia wrote:
> >> Hi there,
> >>
> >>  > Indeed the Python load test data appears to be missing:
> >>  >
> >>  > http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
> >>
> >>
> >> I think that the only missing test data is from the Python streaming
> >> tests, which are not implemented right now (check out
> >> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=batch&var-sdk=python
> >> )
> >>
> >> As for updating the dashboards, the manual for doing this is here:
> >>
> >> https://cwiki.apache.org/confluence/display/BEAM/Community+Metrics#CommunityMetrics-UpdatingDashboards
> >>
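> >> (A possible way to preview dashboard changes locally, assuming the
> >> docker-compose setup under .test-infra/metrics works like a standard
> >> local Grafana deployment:
> >>
> >>     cd beam/.test-infra/metrics && docker-compose up
> >>
> >> Grafana should then be reachable on localhost.)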
> >>
> >> I hope this helps,
> >>
> >> Michal
> >>
> >> On Mon, Jul 27, 2020 at 4:31 PM Maximilian Michels <[email protected]> wrote:
> >>
> >>     Indeed the Python load test data appears to be missing:
> >>
> >>     http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
> >>
> >>
> >>     How do we typically modify the dashboards?
> >>
> >>     It looks like we need to edit this JSON file:
> >>
> >>
> >>     https://github.com/apache/beam/blob/8d460db620d2ff1257b0e092218294df15b409a1/.test-infra/metrics/grafana/dashboards/perftests_metrics/ParDo_Load_Tests.json#L81
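> >>
> >>     A panel's query lives in its "targets" entry, roughly like this
> >>     (field names follow the Grafana dashboard JSON schema; the
> >>     measurement name below is made up for illustration):
> >>
> >>       "targets": [
> >>         {
> >>           "query": "SELECT mean(\"value\") FROM \"python_streaming_pardo_1\" WHERE $timeFilter",
> >>           "rawQuery": true,
> >>           "refId": "A"
> >>         }
> >>       ]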
> >>
> >>
> >>     I found some documentation on the deployment:
> >>
> >>
> >>     https://cwiki.apache.org/confluence/display/BEAM/Test+Results+Monitoring
> >>
> >>
> >>     +1 for alerting or weekly emails including performance numbers for
> >>     fixed
> >>     intervals (1d, 1w, 1m, previous release).
> >>
> >>     +1 for linking the dashboards in the release guide to allow for a
> >>     comparison as part of the release process.
> >>
> >>     As a first step, consolidating all the data seems like the most
> >>     pressing
> >>     problem to solve.
> >>
> >>     @Kamil I could use some advice on how to proceed with updating the
> >>     dashboards.
> >>
> >>     -Max
> >>
> >>     On 22.07.20 20:20, Robert Bradshaw wrote:
> >>      > On Tue, Jul 21, 2020 at 9:58 AM Thomas Weise <[email protected]> wrote:
> >>      >
> >>      >     It appears that there is coverage missing in the Grafana
> >>      >     dashboards (it could also be that I just can't find it).
> >>      >
> >>      >     For example:
> >>      >
> >>      >     https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
> >>      >
> >>      >     The GBK and ParDo tests have a selection for {batch, streaming}
> >>      >     and SDK. No coverage for streaming and Python? There is also no
> >>      >     runner option currently.
> >>      >
> >>      >     We have seen repeated regressions with streaming, Python, and
> >>      >     Flink. The test has been contributed. It would be great if the
> >>      >     results could be covered as part of release verification.
> >>      >
> >>      >
> >>      > Even better would be if we could use these dashboards (plus
> >>      > alerting or similar?) to find issues before release verification.
> >>      > It's much easier to fix things earlier.
> >>      >
> >>      >
> >>      >     Thomas
> >>      >
> >>      >
> >>      >
> >>      >     On Tue, Jul 21, 2020 at 7:55 AM Kamil Wasilewski
> >>      >     <[email protected]> wrote:
> >>      >
> >>      >             The prerequisite is that we have all the stats in one
> >>      >             place. They seem to be scattered across
> >>      >             http://metrics.beam.apache.org and
> >>      >             https://apache-beam-testing.appspot.com.
> >>      >
> >>      >             Would it be possible to consolidate the two, i.e. use
> >>      >             the Grafana-based dashboard to load the legacy stats?
> >>      >
> >>      >
> >>      >         I'm pretty sure that all dashboards have been moved to
> >>      >         http://metrics.beam.apache.org. Let me know if I missed
> >>      >         something during the migration.
> >>      >
> >>      >         I think we should turn off
> >>      >         https://apache-beam-testing.appspot.com in the near future.
> >>      >         The new Grafana-based dashboards have been working
> >>      >         seamlessly for some time now and there's no point in
> >>      >         maintaining the older solution. We'd also avoid ambiguity
> >>      >         about where to look for the stats.
> >>      >
> >>      >         Kamil
> >>      >
> >>      >         On Tue, Jul 21, 2020 at 4:17 PM Maximilian Michels
> >>      >         <[email protected]> wrote:
> >>      >
> >>      >              > It doesn't support https. I had to add an
> >>      >              > exception to the HTTPS Everywhere extension for
> >>      >              > "metrics.beam.apache.org".
> >>      >
> >>      >             *facepalm* Thanks Udi! It would always hang on me
> >>      >             because I use HTTPS Everywhere.
> >>      >
> >>      >              > To be explicit, I am supporting the idea of
> >>      >              > reviewing the release guide but not changing the
> >>      >              > release process for the already in-progress release.
> >>      >
> >>      >             I consider the release guide immutable for the process
> >>      >             of a release. Thus, a change to the release guide can
> >>      >             only affect new upcoming releases, not an in-progress
> >>      >             release.
> >>      >
> >>      >              > +1 and I think we can also evaluate whether flaky
> >>      >              > tests should be reviewed as release blockers or not.
> >>      >              > Some flaky tests might be hiding real issues our
> >>      >              > users could face.
> >>      >
> >>      >             Flaky tests are also worth taking into account when
> >>      >             releasing, but they are a little harder to find because
> >>      >             they may just happen to pass while building the
> >>      >             release. It is possible though if we strictly capture
> >>      >             flaky tests via JIRA and mark them with the Fix Version
> >>      >             for the release.
> >>      >
> >>      >              > We keep accumulating dashboards and tests that few
> >>      >              > people care about, so it is probably worth using
> >>      >              > them or getting a way to alert us of regressions
> >>      >              > during the release cycle, to catch these even
> >>      >              > before the RCs.
> >>      >
> >>      >             +1 The release guide should be explicit about which
> >>      >             performance test
> >>      >             results to evaluate.
> >>      >
> >>      >             The prerequisite is that we have all the stats in one
> >>      >             place. They seem to be scattered across
> >>      >             http://metrics.beam.apache.org and
> >>      >             https://apache-beam-testing.appspot.com.
> >>      >
> >>      >             Would it be possible to consolidate the two, i.e. use
> >>      >             the Grafana-based dashboard to load the legacy stats?
> >>      >
> >>      >             For the evaluation during the release process, I
> >>      >             suggest using a standardized set of performance tests
> >>      >             for all runners, e.g.:
> >>      >
> >>      >             - Nexmark
> >>      >             - ParDo (Classic/Portable)
> >>      >             - GroupByKey
> >>      >             - IO
> >>      >
> >>      >
> >>      >             -Max
> >>      >
> >>      >             On 21.07.20 01:23, Ahmet Altay wrote:
> >>      >              >
> >>      >              > On Mon, Jul 20, 2020 at 3:07 PM Ismaël Mejía
> >>      >              > <[email protected]> wrote:
> >>      >              >
> >>      >              >     +1
> >>      >              >
> >>      >              >     This is not in the release guide and we should
> >>      >              >     probably re-evaluate whether this should be a
> >>      >              >     release-blocking reason. Of course, exceptionally,
> >>      >              >     a performance regression could be motivated by a
> >>      >              >     correctness fix or a worthwhile refactor, so we
> >>      >              >     should consider this.
> >>      >              >
> >>      >              >
> >>      >              > +1 and I think we can also evaluate whether flaky
> >>      >              > tests should be reviewed as release blockers or not.
> >>      >              > Some flaky tests might be hiding real issues our
> >>      >              > users could face.
> >>      >              >
> >>      >              > To be explicit, I am supporting the idea of
> >>      >              > reviewing the release guide but not changing the
> >>      >              > release process for the already in-progress release.
> >>      >              >
> >>      >              >
> >>      >              >     We have been tracking and fixing performance
> >>      >              >     regressions multiple times, found simply by
> >>      >              >     checking the Nexmark tests, including on the
> >>      >              >     ongoing 2.23.0 release, so the value is there.
> >>      >              >     Nexmark does not yet cover Python and portable
> >>      >              >     runners, so we are probably still missing many
> >>      >              >     issues, and it is worth working on this. In any
> >>      >              >     case we should probably decide which validations
> >>      >              >     matter. We keep accumulating dashboards and tests
> >>      >              >     that few people care about, so it is probably
> >>      >              >     worth using them or getting a way to alert us of
> >>      >              >     regressions during the release cycle, to catch
> >>      >              >     these even before the RCs.
> >>      >              >
> >>      >              >
> >>      >              > I agree. And if we cannot use dashboards/tests in a
> >>      >              > meaningful way, IMO we can remove them. There is not
> >>      >              > much value in maintaining them if they do not
> >>      >              > provide important signals.
> >>      >              >
> >>      >              >
> >>      >              >     On Fri, Jul 10, 2020 at 9:30 PM Udi Meiri
> >>      >              >     <[email protected]> wrote:
> >>      >              >      >
> >>      >              >      > On Thu, Jul 9, 2020 at 12:48 PM Maximilian
> >>      >              >      > Michels <[email protected]> wrote:
> >>      >              >      >>
> >>      >              >      >> Not yet, I just learned about the migration
> >>      >              >      >> to a new frontend, including a new backend
> >>      >              >      >> (InfluxDB instead of BigQuery).
> >>      >              >      >>
> >>      >              >      >> >  - Are the metrics available on
> >>      >              >      >> > metrics.beam.apache.org?
> >>      >              >      >>
> >>      >              >      >> Is http://metrics.beam.apache.org online?
> >>      >              >      >> I was never able to access it.
> >>      >              >      >
> >>      >              >      >
> >>      >              >      > It doesn't support https. I had to add an
> >>      >              >      > exception to the HTTPS Everywhere extension
> >>      >              >      > for "metrics.beam.apache.org".
> >>      >              >      >
> >>      >              >      >>
> >>      >              >      >>
> >>      >              >      >> >  - What is the feature delta between using
> >>      >              >      >> > metrics.beam.apache.org (much better UI)
> >>      >              >      >> > and using apache-beam-testing.appspot.com?
> >>      >              >      >>
> >>      >              >      >> AFAIK it is an ongoing migration and the
> >>      >              >      >> delta appears to be high.
> >>      >              >      >>
> >>      >              >      >> >  - Can we notice regressions faster than
> >>      >             release cadence?
> >>      >              >      >>
> >>      >              >      >> Absolutely! A report with the latest numbers,
> >>      >              >      >> including statistics about the growth of the
> >>      >              >      >> metrics, would be useful.
> >>      >              >      >>
> >>      >              >      >> >  - Can we get automated alerts?
> >>      >              >      >>
> >>      >              >      >> I think we could set up a Jenkins job to
> >>      >              >      >> do this.
> >>      >              >      >>
> >>      >              >      >> -Max
> >>      >              >      >>
> >>      >              >      >> On 09.07.20 20:26, Kenneth Knowles wrote:
> >>      >              >      >> > Questions:
> >>      >              >      >> >
> >>      >              >      >> >   - Are the metrics available on
> >>      >              >      >> > metrics.beam.apache.org?
> >>      >              >      >> >   - What is the feature delta between using
> >>      >              >      >> > metrics.beam.apache.org (much better UI)
> >>      >              >      >> > and using apache-beam-testing.appspot.com?
> >>      >              >      >> >   - Can we notice regressions faster than
> >>      >              >      >> > release cadence?
> >>      >              >      >> >   - Can we get automated alerts?
> >>      >              >      >> >
> >>      >              >      >> > Kenn
> >>      >              >      >> >
> >>      >              >      >> > On Thu, Jul 9, 2020 at 10:21 AM
> >>      >              >      >> > Maximilian Michels <[email protected]> wrote:
> >>      >              >      >> >
> >>      >              >      >> >     Hi,
> >>      >              >      >> >
> >>      >              >      >> >     We recently saw an increase in latency
> >>      >              >      >> >     migrating from Beam 2.18.0 to 2.21.0
> >>      >              >      >> >     (Python SDK with Flink Runner). This
> >>      >              >      >> >     proved very hard to debug, and it looks
> >>      >              >      >> >     like each version between the two
> >>      >              >      >> >     led to increased latency.
> >>      >              >      >> >
> >>      >              >      >> >     This is not the first time we have seen
> >>      >              >      >> >     issues when migrating; another time we
> >>      >              >      >> >     had a decline in checkpointing
> >>      >              >      >> >     performance and thus added a
> >>      >              >      >> >     checkpointing test [1] and dashboard [2]
> >>      >              >      >> >     (see the checkpointing widget).
> >>      >              >      >> >
> >>      >              >      >> >     That makes me wonder if we should
> >>      >              >      >> >     monitor performance (throughput /
> >>      >              >      >> >     latency) for basic use cases as part of
> >>      >              >      >> >     the release testing. Currently, our
> >>      >              >      >> >     release guide [3] mentions running
> >>      >              >      >> >     examples but not evaluating the
> >>      >              >      >> >     performance. I think it would be good
> >>      >              >      >> >     practice to check the relevant charts
> >>      >              >      >> >     with performance measurements as part
> >>      >              >      >> >     of the release process. The release
> >>      >              >      >> >     guide should reflect that.
> >>      >              >      >> >
> >>      >              >      >> >     WDYT?
> >>      >              >      >> >
> >>      >              >      >> >     -Max
> >>      >              >      >> >
> >>      >              >      >> >     PS: Of course, this requires tests and
> >>      >              >      >> >     metrics to be available. This PR adds
> >>      >              >      >> >     latency measurements to the load tests
> >>      >              >      >> >     [4].
> >>      >              >      >> >
> >>      >              >      >> >
> >>      >              >      >> >     [1] https://github.com/apache/beam/pull/11558
> >>      >              >      >> >     [2] https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
> >>      >              >      >> >     [3] https://beam.apache.org/contribute/release-guide/
> >>      >              >      >> >     [4] https://github.com/apache/beam/pull/12065
> >>      >              >      >> >
> >>      >              >
> >>      >
> >>
> >>
> >>
> >> --
> >>
> >> Michał Walenia
> >> Polidea <https://www.polidea.com/> | Software Engineer
> >>
> >> M: +48 791 432 002
> >> E: [email protected]
> >>
> >> Unique Tech
> >> Check out our projects! <https://www.polidea.com/our-work>
> >>
>
