Hi all,

Related to Matthias' email, I've checked the notifications in the Slack
channel and noticed three major benchmark regressions. In the end, I've
decided to create Jira tickets for them [1] [2] [3], but I do agree that this
work needs to be formalized as soon as possible to avoid regressions. It
would also be great to include a process for how these regressions will be
fixed, because I have no idea whom to ping/notify when these regressions
occur.

Best regards,

Martijn

[1] https://issues.apache.org/jira/browse/FLINK-30623
[2] https://issues.apache.org/jira/browse/FLINK-30624
[3] https://issues.apache.org/jira/browse/FLINK-30625
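
The quoted discussion below notes that the regression script can raise false
positives when a benchmark has a small absolute value, and that for some
benchmarks a lower value is better. A minimal sketch of such a check follows.
It is not the logic of regression_report.py; the function name, the
parameters, the thresholds and the sample numbers are all assumptions made up
for illustration.

```python
from statistics import median


def is_regression(baseline, recent, rel_threshold=0.05, min_abs_change=0.0,
                  lower_is_better=False):
    """Hypothetical check: is the median of `recent` worse than the median
    of `baseline` by more than `rel_threshold` (relative) and by at least
    `min_abs_change` (absolute)?"""
    base = median(baseline)
    cur = median(recent)
    # A positive `change` always means "got worse", whichever direction
    # the benchmark is scored in.
    change = (cur - base) if lower_is_better else (base - cur)
    if base == 0:
        # The relative change is undefined; do not flag anything.
        return False
    return change / abs(base) > rel_threshold and change >= min_abs_change


# Invented sample scores for a throughput-style benchmark (higher is better).
baseline_scores = [170, 172, 168, 171]
recent_scores = [160, 161, 159]

# Flagged by the relative threshold alone ...
print(is_regression(baseline_scores, recent_scores))
# ... but suppressed once an absolute guard is required as well.
print(is_regression(baseline_scores, recent_scores, min_abs_change=15))
```

An absolute guard like min_abs_change is one possible way to reduce noise on
benchmarks with small values; whether it suits a given benchmark would have
to be decided per benchmark.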

On Tue, Jan 10, 2023 at 1:56 PM Matthias Pohl
<matthias.p...@aiven.io.invalid> wrote:

> Hi Yanfei,
> any updates on the performance tests? ...or more specifically, any updates
> on the script for alerting on performance regressions?
>
> Does it make sense to formalize/document the process? Currently, the
> release management doesn't do anything in terms of performance
> test monitoring. Therefore, performance regressions are not necessarily
> identified actively (in contrast to CI instabilities). Or is this covered
> by the PMC? It would be interesting to know whether there's someone to
> reach out to who's monitoring the regression tests regularly. Would it make
> sense for this person to join the release calls?
>
> Or shall we work on formalizing/documenting the process and integrating
> this responsibility into what the release manager(s) are in charge of? My
> concern with that approach is that contributors might be less willing to
> volunteer in the release management if we collect everything in one role.
> Alternatively, we could split the release manager role up into sub-roles
> that contributors can volunteer for in a release (e.g. CI monitoring,
> performance test monitoring, Jira maintenance, ... just coming up with
> random tasks here).
>
> Alternatively, we could leave everything as is and just respond if there's
> some complaint. I'm curious about your (and others') opinions.
>
> Matthias
>
> On Tue, Nov 29, 2022 at 2:13 PM Yanfei Lei <fredia...@gmail.com> wrote:
>
> > Hi Martijn,
> >
> > Thanks for bringing this up.
> >
> > In the past two months, this channel has helped us find many benchmark
> > failures, like FLINK-29883
> > <https://issues.apache.org/jira/browse/FLINK-29883>[1],
> > FLINK-29886 <https://issues.apache.org/jira/browse/FLINK-29886>[2],
> > FLINK-30015 <https://issues.apache.org/jira/browse/FLINK-30015>[3] and
> > FLINK-30181 <https://issues.apache.org/jira/browse/FLINK-30181>[4]. I have
> > also investigated several of the frequently reported regressions and
> > replied under the notifications in the Slack channel (copied here):
> >
> >    1. serializerHeavyString
> >    <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=serializerHeavyString&extr=on&quarts=on&equid=off&env=2&revs=200>:
> >    It has been unstable for a long time; see [5]
> >    https://issues.apache.org/jira/browse/FLINK-27165 for possible reasons.
> >    2. Regressions are detected by a simple script which may have false
> >    positives and false negatives. Especially for benchmarks with small
> >    absolute values, small value changes cause large percentage changes;
> >    see [6] for details.
> >
> >      Maybe slidingWindow
> >      <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=slidingWindow&extr=on&quarts=on&equid=off&env=2&revs=200>
> >      (value~=600), stateBackends.ROCKS
> >      <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=stateBackends.ROCKS&extr=on&quarts=on&equid=off&env=2&revs=200>
> >      (value~=260) and serializerHeavyString
> >      <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=serializerHeavyString&extr=on&quarts=on&equid=off&env=2&revs=200>
> >      (value~=170) are not true regressions.
> >
> >    3. For deployAllTasks.STREAMING
> >    <http://codespeed.dak8s.net:8000/timeline/#/?exe=8&ben=deployAllTasks.STREAMING&extr=on&quarts=on&equid=off&env=2&revs=200>,
> >    the benchmark result is the time it takes to deploy a job, so the lower
> >    the value, the better the performance; see [7] for details. FLINK-27571
> >    <https://issues.apache.org/jira/browse/FLINK-27571>[8] would fix this
> >    problem.
> >
> >
> > As mentioned before, regressions are detected by a simple script that is
> > not very stable; FLINK-29825
> > <https://issues.apache.org/jira/browse/FLINK-29825>[9]
> > was created to improve the benchmark's stability. I planned to invite more
> > volunteers to monitor it once the regression check became more
> > stable, but I've been stuck with something else lately; sorry for the late
> > response. Any suggestions on handling benchmark regressions/failures are
> > welcome.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-29883
> >
> > [2] https://issues.apache.org/jira/browse/FLINK-29886
> >
> > [3] https://issues.apache.org/jira/browse/FLINK-30015
> >
> > [4] https://issues.apache.org/jira/browse/FLINK-30181
> >
> > [5] https://issues.apache.org/jira/browse/FLINK-27165
> >
> > [6] https://github.com/apache/flink-benchmarks/blob/master/regression_report.py#L132-L136
> >
> > [7] https://github.com/apache/flink-benchmarks/blob/master/src/main/java/org/apache/flink/scheduler/benchmark/deploying/DeployingTasksInStreamingJobBenchmarkExecutor.java#L58
> >
> > [8] https://issues.apache.org/jira/browse/FLINK-27571
> >
> > [9] https://issues.apache.org/jira/browse/FLINK-29825
> >
> >
> > Best,
> >
> > Yanfei
> >
> > Martijn Visser <martijnvis...@apache.org> wrote on Tue, Nov 29, 2022 at 15:54:
> >
> > > Hi,
> > >
> > > Is there any update to be expected on the benchmark? I see results of the
> > > benchmark being posted to Slack, but it appears that it's not being
> > > monitored and no follow-up actions are being taken. I think it's currently
> > > lacking a process on how to interpret the results and what action should
> > > be taken and by whom.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Thu, Nov 3, 2022 at 12:22 PM Jing Ge <j...@ververica.com> wrote:
> > >
> > > > Thanks yanfei for driving this!
> > > >
> > > > Looking forward to further discussion w.r.t. the workflow.
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Mon, Oct 31, 2022 at 6:04 PM Mason Chen <mas.chen6...@gmail.com> wrote:
> > > >
> > > > > +1, thanks for driving this!
> > > > >
> > > > > On a side note, can we also ensure that a performance summary report for
> > > > > Flink major version upgrades is in release notes, once this infrastructure
> > > > > becomes mature? From the user perspective, it would be nice to know what
> > > > > the expected (or unexpected) regressions in a major version upgrade are.
> > > > > I've seen the community do something like this before (e.g. the major
> > > > > rocksdb version bump in 1.14?) and it was quite valuable to know that
> > > > > upfront!
> > > > >
> > > > > Best,
> > > > > Mason
> > > > >
> > > > > On Fri, Oct 28, 2022 at 1:46 AM weijie guo <guoweijieres...@gmail.com> wrote:
> > > > >
> > > > > > Thanks Yanfei for driving this.
> > > > > >
> > > > > > It allows us to easily find performance regressions.
> > > > > > Especially since I have recently made some improvements to the
> > > > > > scheduling-related parts, your work is very important to ensure that these
> > > > > > changes do not cause unexpected problems.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Weijie
> > > > > >
> > > > > >
> > > > > > Congxian Qiu <qcx978132...@gmail.com> wrote on Fri, Oct 28, 2022 at 16:03:
> > > > > >
> > > > > > > Thanks for driving this and making the performance monitoring public;
> > > > > > > this lets us notice and resolve performance problems quickly.
> > > > > > >
> > > > > > > Looking forward to the workflow and detailed descriptions of
> > > > > > > flink-dev-benchmarks.
> > > > > > >
> > > > > > > Best,
> > > > > > > Congxian
> > > > > > >
> > > > > > >
> > > > > > > Yun Tang <myas...@live.com> wrote on Thu, Oct 27, 2022 at 12:41:
> > > > > > >
> > > > > > > > Thanks, Yanfei, for driving this to monitor the performance in the
> > > > > > > > Apache Flink Slack Channel.
> > > > > > > >
> > > > > > > > Looking forward to the workflow and detailed descriptions of
> > > > > > > > flink-dev-benchmarks.
> > > > > > > >
> > > > > > > > Best
> > > > > > > > Yun Tang
> > > > > > > > ________________________________
> > > > > > > > From: Hangxiang Yu <master...@gmail.com>
> > > > > > > > Sent: Thursday, October 27, 2022 10:59
> > > > > > > > To: dev@flink.apache.org <dev@flink.apache.org>
> > > > > > > > Subject: Re: [ANNOUNCE] Performance Daily Monitoring Moved from Ververica
> > > > > > > > to Apache Flink Slack Channel
> > > > > > > >
> > > > > > > > Hi, Yanfei.
> > > > > > > > Thanks for driving this.
> > > > > > > > It could help us to detect and resolve the regression problem quickly and
> > > > > > > > officially.
> > > > > > > > I'd like to join as a maintainer.
> > > > > > > > Looking forward to the workflow.
> > > > > > > >
> > > > > > > > On Wed, Oct 26, 2022 at 5:18 PM Yuan Mei <yuanmei.w...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Thanks, Yanfei, for driving this and making the performance monitoring
> > > > > > > > > publicly available.
> > > > > > > > >
> > > > > > > > > Looking forward to seeing the workflow, and more details as Martijn
> > > > > > > > > mentioned.
> > > > > > > > >
> > > > > > > > > Best
> > > > > > > > > Yuan
> > > > > > > > >
> > > > > > > > > On Wed, Oct 26, 2022 at 2:59 PM Martijn Visser <martijnvis...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Yanfei Lei,
> > > > > > > > > >
> > > > > > > > > > Thanks for setting this up! It would be interesting to also know which
> > > > > > > > > > aspects of Flink are monitored for "performance". I'm assuming there are
> > > > > > > > > > specific pieces of functionality that are performance tested, but it would
> > > > > > > > > > be great if this would be written down somewhere (next to a procedure how
> > > > > > > > > > to detect a regression and what should be next steps).
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > >
> > > > > > > > > > Martijn
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 26, 2022 at 8:21 AM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi yanfei,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for driving this! It's a great help.
> > > > > > > > > > >
> > > > > > > > > > > I would like to join as a maintainer.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Zakelly
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 26, 2022 at 11:32 AM yanfei lei <fredia...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > As discussed earlier, we planned to create a benchmark channel in the
> > > > > > > > > > > > Apache Flink Slack [1], but the plan was shelved for a while [2]. So I
> > > > > > > > > > > > went on with this work, and created the #flink-dev-benchmarks channel
> > > > > > > > > > > > for performance regression notifications.
> > > > > > > > > > > >
> > > > > > > > > > > > We have a regression report script [3] that runs daily, and a
> > > > > > > > > > > > notification is sent to the Slack channel when the last few benchmark
> > > > > > > > > > > > results are significantly worse than the baseline.
> > > > > > > > > > > > Note that regressions are detected by a simple script, which may produce
> > > > > > > > > > > > false positives and false negatives. Also, all benchmarks are executed
> > > > > > > > > > > > on one physical machine [4] provided by Ververica (Alibaba) [5], so
> > > > > > > > > > > > hardware issues may affect performance, as in "[FLINK-18614
> > > > > > > > > > > > <https://issues.apache.org/jira/browse/FLINK-18614>] Performance
> > > > > > > > > > > > regression 2020.07.13" [6].
> > > > > > > > > > > >
> > > > > > > > > > > > After the migration, we need a procedure to watch over the entire
> > > > > > > > > > > > performance of Flink code together. For example, if a regression
> > > > > > > > > > > > occurs, someone needs to investigate the cause and resolve the problem.
> > > > > > > > > > > > In the past, this procedure was maintained internally within Ververica,
> > > > > > > > > > > > but we think making the procedure public would benefit all. I volunteer
> > > > > > > > > > > > to serve as one of the initial maintainers, and would be glad if more
> > > > > > > > > > > > contributors can join me. I'll also prepare some guidelines to help
> > > > > > > > > > > > others get familiar with the workflow. I will start a new thread to
> > > > > > > > > > > > discuss the workflow soon.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
> > > > > > > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-28468
> > > > > > > > > > > > [3] https://github.com/apache/flink-benchmarks/blob/master/regression_report.py
> > > > > > > > > > > > [4] http://codespeed.dak8s.net:8080
> > > > > > > > > > > > [5] https://lists.apache.org/thread/jzljp4233799vwwqnr0vc9wgqs0xj1ro
> > > > > > > > > > > > [6] https://issues.apache.org/jira/browse/FLINK-18614
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best,
> > > > > > > > Hangxiang.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
