Max,
The runner dimension is present when hovering over a particular
graph. For some more info, the load test configurations can be found
here [1]. I didn't get a chance to look into them, but there are tests
for all the runners there, though possibly not for every load test.
[1]: https://github.com/apache/beam/tree/master/.test-infra/jenkins
-Tyson
On Wed, Jul 29, 2020 at 3:46 AM Maximilian Michels <[email protected]> wrote:
Looks like the permissions won't be necessary because the backup data
gets loaded into the local InfluxDB instance, which makes it possible
to write queries locally.
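
For example, once the backup is loaded, checking an exported panel query
could look roughly like this (just a sketch; the database and measurement
names below are guesses and need to be replaced with whatever the load
tests actually write):

    from influxdb import InfluxDBClient

    # Assumes the local metrics stack exposes InfluxDB on the default port.
    client = InfluxDBClient(host="localhost", port=8086,
                            database="beam_test_metrics")

    # Paste the query exported from the Grafana panel here.
    query = ('SELECT mean("value") FROM "python_batch_pardo_1" '
             'WHERE time > now() - 30d GROUP BY time(1d)')
    for point in client.query(query).get_points():
        print(point)
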
On 29.07.20 12:21, Maximilian Michels wrote:
> Thanks Michał!
>
> It is a bit tricky to verify the exported query works if I don't have
> access to the data stored in InfluxDb.
>
> ==> Could somebody give me permissions to [email protected] for
> apache-beam-testing so that I can set up SSH port-forwarding from the
> InfluxDb pod to my machine? I do have access to see the pods, but that is
> not enough.
>
>> I think that the only missing test data is from Python streaming tests,
>> which are not implemented right now (check out
>> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=batch&var-sdk=python )
>
> Additionally, there is an entire dimension missing: Runners. I'm
> assuming this data is for Dataflow?
>
> -Max
>
> On 29.07.20 11:55, Michał Walenia wrote:
>> Hi there,
>>
>> > Indeed the Python load test data appears to be missing:
>> > http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
>>
>> I think that the only missing test data is from Python streaming tests,
>> which are not implemented right now (check out
>> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=batch&var-sdk=python )
>>
>> As for updating the dashboards, the manual for doing this is here:
>> https://cwiki.apache.org/confluence/display/BEAM/Community+Metrics#CommunityMetrics-UpdatingDashboards
>>
>>
>> I hope this helps,
>>
>> Michal
>>
>> On Mon, Jul 27, 2020 at 4:31 PM Maximilian Michels <[email protected]> wrote:
>>
>> Indeed the Python load test data appears to be missing:
>> http://metrics.beam.apache.org/d/MOi-kf3Zk/pardo-load-tests?orgId=1&var-processingType=streaming&var-sdk=python
>>
>>
>> How do we typically modify the dashboards?
>>
>> It looks like we need to edit this json file:
>> https://github.com/apache/beam/blob/8d460db620d2ff1257b0e092218294df15b409a1/.test-infra/metrics/grafana/dashboards/perftests_metrics/ParDo_Load_Tests.json#L81
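>>
>> If that is the right place, adding a runner variable could even be
>> scripted, roughly along these lines (only a sketch; the exact variable
>> schema should be copied from an existing templating entry in the file,
>> and the runner names are guesses):
>>
>>     import json
>>
>>     path = (".test-infra/metrics/grafana/dashboards/"
>>             "perftests_metrics/ParDo_Load_Tests.json")
>>     with open(path) as f:
>>         dashboard = json.load(f)
>>
>>     runners = ["dataflow", "flink", "spark"]  # guesses
>>     runner_var = {
>>         "name": "runner",
>>         "type": "custom",
>>         "query": ",".join(runners),
>>         "options": [{"text": r, "value": r, "selected": r == runners[0]}
>>                     for r in runners],
>>         "current": {"text": runners[0], "value": runners[0]},
>>     }
>>     dashboard.setdefault("templating", {}).setdefault("list", []).append(runner_var)
>>
>>     with open(path, "w") as f:
>>         json.dump(dashboard, f, indent=2)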
>>
>>
>> I found some documentation on the deployment:
>> https://cwiki.apache.org/confluence/display/BEAM/Test+Results+Monitoring
>>
>>
>> +1 for alerting or weekly emails including performance numbers for
>> fixed intervals (1d, 1w, 1m, previous release).
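>>
>> As a rough idea of what such a report could compute (only a sketch; the
>> database, measurement and field names here are made up and would have to
>> match the actual schema):
>>
>>     from influxdb import InfluxDBClient
>>
>>     client = InfluxDBClient(host="localhost", port=8086,
>>                             database="beam_test_metrics")
>>
>>     def mean_runtime(since, until="now()"):
>>         q = ('SELECT mean("runtime_sec") FROM "pardo_python_batch" '
>>              'WHERE time > {} AND time <= {}'.format(since, until))
>>         points = list(client.query(q).get_points())
>>         return points[0]["mean"] if points else None
>>
>>     current = mean_runtime("now() - 1d")
>>     for label, days in [("1d", 2), ("1w", 8), ("1m", 31)]:
>>         baseline = mean_runtime("now() - {}d".format(days),
>>                                 "now() - {}d".format(days - 1))
>>         if current and baseline:
>>             change = 100.0 * (current - baseline) / baseline
>>             print("{} ago: {:+.1f}%".format(label, change))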
>>
>> +1 for linking the dashboards in the release guide to allow for a
>> comparison as part of the release process.
>>
>> As a first step, consolidating all the data seems like the most
>> pressing problem to solve.
>>
>> @Kamil I could use some advice on how to proceed with updating the
>> dashboards.
>>
>> -Max
>>
>> On 22.07.20 20:20, Robert Bradshaw wrote:
>> > On Tue, Jul 21, 2020 at 9:58 AM Thomas Weise <[email protected]> wrote:
>> >
>> >     It appears that there is coverage missing in the Grafana dashboards
>> >     (it could also be that I just don't find it).
>> >
>> >     For example:
>> >     https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
>> >
>> >     The GBK and ParDo tests have a selection for {batch, streaming} and
>> >     SDK. No coverage for streaming and python? There is also no runner
>> >     option currently.
>> >
>> >     We have seen repeated regressions with streaming, Python, Flink. The
>> >     test has been contributed. It would be great if the results can be
>> >     covered as part of release verification.
>> >
>> >
>> > Even better would be if we can use these dashboards (plus alerting or
>> > similar?) to find issues before release verification. It's much easier
>> > to fix things earlier.
>> >
>> >
>> > Thomas
>> >
>> >
>> >
>> >     On Tue, Jul 21, 2020 at 7:55 AM Kamil Wasilewski <[email protected]> wrote:
>> >
>> >         The prerequisite is that we have all the stats in one place.
>> >         They seem to be scattered across http://metrics.beam.apache.org
>> >         and https://apache-beam-testing.appspot.com.
>> >
>> >         Would it be possible to consolidate the two, i.e. use the
>> >         Grafana-based dashboard to load the legacy stats?
>> >
>> >
>> >     I'm pretty sure that all dashboards have been moved to
>> >     http://metrics.beam.apache.org. Let me know if I missed
>> >     something during the migration.
>> >
>> >     I think we should turn off
>> >     https://apache-beam-testing.appspot.com in the near future. New
>> >     Grafana-based dashboards have been working seamlessly for some
>> >     time now and there's no point in maintaining the older solution.
>> >     We'd also avoid ambiguity about where the stats should be looked for.
>> >
>> > Kamil
>> >
>> >     On Tue, Jul 21, 2020 at 4:17 PM Maximilian Michels <[email protected]> wrote:
>> >
>> >         > It doesn't support https. I had to add an exception to
>> >         > the HTTPS Everywhere extension for "metrics.beam.apache.org".
>> >
>> >         *facepalm* Thanks Udi! It would always hang on me because I
>> >         use HTTPS Everywhere.
>> >
>> >         > To be explicit, I am supporting the idea of reviewing the
>> >         > release guide but not changing the release process for the
>> >         > already in-progress release.
>> >
>> >         I consider the release guide immutable for the process of a
>> >         release. Thus, a change to the release guide can only affect
>> >         new upcoming releases, not an in-progress release.
>> >
>> >         > +1 and I think we can also evaluate whether flaky tests
>> >         > should be reviewed as release blockers or not. Some flaky
>> >         > tests would be hiding real issues our users could face.
>> >
>> >         Flaky tests are also worth taking into account when releasing,
>> >         but they are a little harder to find because they may just happen
>> >         to pass while building the release. It is possible though if we
>> >         strictly capture flaky tests via JIRA and mark them with the Fix
>> >         Version for the release.
>> >
>> >         > We keep accumulating dashboards and
>> >         > tests that few people care about, so it is probably worth
>> >         > that we use them or get a way to alert us of regressions
>> >         > during the release cycle to catch this even before the RCs.
>> >
>> >         +1 The release guide should be explicit about which
>> >         performance test results to evaluate.
>> >
>> >         The prerequisite is that we have all the stats in one place.
>> >         They seem to be scattered across http://metrics.beam.apache.org
>> >         and https://apache-beam-testing.appspot.com.
>> >
>> >         Would it be possible to consolidate the two, i.e. use the
>> >         Grafana-based dashboard to load the legacy stats?
>> >
>> >         For the evaluation during the release process, I suggest using a
>> >         standardized set of performance tests for all runners, e.g.:
>> >
>> > - Nexmark
>> > - ParDo (Classic/Portable)
>> > - GroupByKey
>> > - IO
>> >
>> >
>> > -Max
>> >
>> > On 21.07.20 01:23, Ahmet Altay wrote:
>> > >
>> > > On Mon, Jul 20, 2020 at 3:07 PM Ismaël Mejía <[email protected]> wrote:
>> > >
>> > > +1
>> > >
>> > >     This is not in the release guide and we should probably
>> > >     re-evaluate if this should be a release-blocking reason.
>> > >     Of course, exceptionally, a performance regression could be
>> > >     motivated by a correctness fix or a worthwhile refactor, so we
>> > >     should consider this.
>> > >
>> > >
>> > > +1 and I think we can also evaluate whether flaky tests should be
>> > > reviewed as release blockers or not. Some flaky tests would be hiding
>> > > real issues our users could face.
>> > >
>> > > To be explicit, I am supporting the idea of reviewing the release
>> > > guide but not changing the release process for the already
>> > > in-progress release.
>> > >
>> > >
>> > >     We have been tracking and fixing performance regressions multiple
>> > >     times, found simply by checking the Nexmark tests, including on the
>> > >     ongoing 2.23.0 release, so the value is there. Nexmark does not yet
>> > >     cover Python and portable runners, so we are probably still missing
>> > >     many issues and it is worth working on this. In any case we should
>> > >     probably decide what validations matter. We keep accumulating
>> > >     dashboards and tests that few people care about, so it is probably
>> > >     worth that we use them or get a way to alert us of regressions
>> > >     during the release cycle to catch this even before the RCs.
>> > >
>> > >
>> > > I agree. And if we cannot use dashboards/tests in a meaningful way,
>> > > IMO we can remove them. There is not much value in maintaining them if
>> > > they do not provide important signals.
>> > >
>> > >
>> > > On Fri, Jul 10, 2020 at 9:30 PM Udi Meiri <[email protected]> wrote:
>> > > >
>> > > > On Thu, Jul 9, 2020 at 12:48 PM Maximilian Michels <[email protected]> wrote:
>> > > >>
>> > > >> Not yet, I just learned about the migration to a new frontend,
>> > > >> including a new backend (InfluxDB instead of BigQuery).
>> > > >>
>> > > >> > - Are the metrics available on metrics.beam.apache.org?
>> > > >>
>> > > >> Is http://metrics.beam.apache.org online? I was never able to
>> > > >> access it.
>> > > >
>> > > >
>> > > > It doesn't support https. I had to add an exception to the HTTPS
>> > > > Everywhere extension for "metrics.beam.apache.org".
>> > > >
>> > > >>
>> > > >>
>> > > >> > - What is the feature delta between using metrics.beam.apache.org
>> > > >> > (much better UI) and using apache-beam-testing.appspot.com?
>> > > >>
>> > > >> AFAIK it is an ongoing migration and the delta appears to be high.
>> > > >>
>> > > >> > - Can we notice regressions faster than release cadence?
>> > > >>
>> > > >> Absolutely! A report with the latest numbers including statistics
>> > > >> about the growth of metrics would be useful.
>> > > >>
>> > > >> > - Can we get automated alerts?
>> > > >>
>> > > >> I think we could set up a Jenkins job to do this.
>> > > >>
>> > > >> -Max
>> > > >>
>> > > >> On 09.07.20 20:26, Kenneth Knowles wrote:
>> > > >> > Questions:
>> > > >> >
>> > > >> > - Are the metrics available on metrics.beam.apache.org?
>> > > >> > - What is the feature delta between using metrics.beam.apache.org
>> > > >> > (much better UI) and using apache-beam-testing.appspot.com?
>> > > >> > - Can we notice regressions faster than release cadence?
>> > > >> > - Can we get automated alerts?
>> > > >> >
>> > > >> > Kenn
>> > > >> >
>> > > >> > On Thu, Jul 9, 2020 at 10:21 AM Maximilian Michels <[email protected]> wrote:
>> > > >> >
>> > > >> > Hi,
>> > > >> >
>> > > >> >     We recently saw an increase in latency migrating from Beam
>> > > >> >     2.18.0 to 2.21.0 (Python SDK with Flink Runner). This proved
>> > > >> >     very hard to debug and it looks like each version in between
>> > > >> >     the two versions led to increased latency.
>> > > >> >
>> > > >> >     This is not the first time we saw issues when migrating;
>> > > >> >     another time we had a decline in checkpointing performance and
>> > > >> >     thus added a checkpointing test [1] and dashboard [2] (see the
>> > > >> >     checkpointing widget).
>> > > >> >
>> > > >> >     That makes me wonder if we should monitor performance
>> > > >> >     (throughput / latency) for basic use cases as part of the
>> > > >> >     release testing. Currently, our release guide [3] mentions
>> > > >> >     running examples but not evaluating the performance. I think it
>> > > >> >     would be good practice to check relevant charts with
>> > > >> >     performance measurements as part of the release process. The
>> > > >> >     release guide should reflect that.
>> > > >> >
>> > > >> > WDYT?
>> > > >> >
>> > > >> > -Max
>> > > >> >
>> > > >> >     PS: Of course, this requires tests and metrics to be
>> > > >> >     available. This PR adds latency measurements to the load
>> > > >> >     tests [4].
>> > > >> >
>> > > >> >
>> > > >> >     [1] https://github.com/apache/beam/pull/11558
>> > > >> >     [2] https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
>> > > >> >     [3] https://beam.apache.org/contribute/release-guide/
>> > > >> >     [4] https://github.com/apache/beam/pull/12065
>> > > >> >
>> > >
>> >
>>
>>
>>
>> --
>>
>> Michał Walenia
>> Polidea <https://www.polidea.com/> | Software Engineer
>>
>> M: +48 791 432 002
>> E: [email protected]
>>
>> Unique Tech
>> Check out our projects! <https://www.polidea.com/our-work>
>>