Hi all,

Related to Matthias' email, I checked the notifications in the Slack channel and noticed three major benchmark regressions. In the end, I decided to create Jira tickets for them [1] [2] [3], but I do agree that this monitoring work needs to be formalized as soon as possible so that regressions don't go unnoticed. It would also be great to include a process for how these regressions get fixed, because I have no idea whom to ping/notify when one occurs.

Best regards,

Martijn

[1] https://issues.apache.org/jira/browse/FLINK-30623
[2] https://issues.apache.org/jira/browse/FLINK-30624
[3] https://issues.apache.org/jira/browse/FLINK-30625

On Tue, Jan 10, 2023 at 1:56 PM Matthias Pohl <matthias.p...@aiven.io.invalid> wrote:

> Hi Yanfei,
> any updates on the performance tests? ...or more specifically, any updates
> on the script for alerting on performance regressions?
>
> Does it make sense to formalize/document the process? Currently, the
> release management doesn't do anything in terms of performance
> test monitoring. Therefore, performance regressions are not necessarily
> identified actively (in contrast to CI instabilities). Or is this covered
> by the PMC? It would be interesting to know whether there's someone to
> reach out to who's monitoring the regression tests regularly. Would it make
> sense for this person to join the release calls?
>
> Or shall we work on formalizing/documenting the process and integrating
> this responsibility into what the release manager(s) are in charge of? My
> concern with that approach is that contributors might be less willing to
> volunteer in the release management if we collect everything in one role.
> Alternatively, we could split the release manager role up into sub-roles
> that contributors can volunteer for in a release (e.g. CI monitoring,
> performance test monitoring, Jira maintenance, ... just coming up with
> random tasks here).
>
> Alternatively, we could leave everything as is and just respond if there's
> some complaint. I'm curious about your (and others') opinions.
>
> Matthias
>
> On Tue, Nov 29, 2022 at 2:13 PM Yanfei Lei <fredia...@gmail.com> wrote:
>
> > Hi Martijn,
> >
> > Thanks for bringing this up.
> >
> > In the past two months, this channel has helped us find many benchmark
> > failures, like FLINK-29883 [1], FLINK-29886 [2], FLINK-30015 [3] and
> > FLINK-30181 [4]. I have also tried investigating several of the frequently
> > reported regressions and replied under the notifications in the Slack
> > channel (copied here):
> >
> > 1. serializerHeavyString
> >    <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=serializerHeavyString&extr=on&quarts=on&equid=off&env=2&revs=200>:
> >    It has been unstable for a long time, see FLINK-27165 [5] for possible
> >    reasons.
> > 2. Regressions are detected by a simple script which may have false
> >    positives and false negatives. Especially for benchmarks with small
> >    absolute values, small value changes cause large percentage changes,
> >    see [6] for details.
> >
> >    Maybe slidingWindow
> >    <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=slidingWindow&extr=on&quarts=on&equid=off&env=2&revs=200>
> >    (value~=600), stateBackends.ROCKS
> >    <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=stateBackends.ROCKS&extr=on&quarts=on&equid=off&env=2&revs=200>
> >    (value~=260) and serializerHeavyString
> >    <http://codespeed.dak8s.net:8000/timeline/#/?exe=6&ben=serializerHeavyString&extr=on&quarts=on&equid=off&env=2&revs=200>
> >    (value~=170) are not true regressions.
> > 3. For deployAllTasks.STREAMING
> >    <http://codespeed.dak8s.net:8000/timeline/#/?exe=8&ben=deployAllTasks.STREAMING&extr=on&quarts=on&equid=off&env=2&revs=200>,
> >    the benchmark result is how much time it takes to deploy a job; the
> >    lower the value, the better the performance, see [7] for details.
> >    FLINK-27571 [8] would fix this problem.
> >
> > As mentioned before, regressions are detected by a simple script that is
> > not very stable; FLINK-29825 [9] was created to improve the benchmark's
> > stability. I planned to invite more volunteers to monitor it after the
> > regression check became more stable, but I've been stuck with something
> > else lately, sorry for the late response. Any suggestions on handling
> > benchmark regressions/failures are welcome.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-29883
> > [2] https://issues.apache.org/jira/browse/FLINK-29886
> > [3] https://issues.apache.org/jira/browse/FLINK-30015
> > [4] https://issues.apache.org/jira/browse/FLINK-30181
> > [5] https://issues.apache.org/jira/browse/FLINK-27165
> > [6] https://github.com/apache/flink-benchmarks/blob/master/regression_report.py#L132-L136
> > [7] https://github.com/apache/flink-benchmarks/blob/master/src/main/java/org/apache/flink/scheduler/benchmark/deploying/DeployingTasksInStreamingJobBenchmarkExecutor.java#L58
> > [8] https://issues.apache.org/jira/browse/FLINK-27571
> > [9] https://issues.apache.org/jira/browse/FLINK-29825
> >
> > Best,
> > Yanfei
> >
> > Martijn Visser <martijnvis...@apache.org> wrote on Tue, Nov 29, 2022 at 15:54:
> >
> > > Hi,
> > >
> > > Is there any update to be expected on the benchmark? I see results of the
> > > benchmark being posted to Slack, but it appears that it's not being
> > > monitored and no follow-up actions are being taken. I think it's currently
> > > lacking a process on how to interpret the results and what action should
> > > be taken and by whom.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Thu, Nov 3, 2022 at 12:22 PM Jing Ge <j...@ververica.com> wrote:
> > >
> > > > Thanks yanfei for driving this!
> > > >
> > > > Looking forward to further discussion w.r.t. the workflow.
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Mon, Oct 31, 2022 at 6:04 PM Mason Chen <mas.chen6...@gmail.com> wrote:
> > > >
> > > > > +1, thanks for driving this!
> > > > >
> > > > > On a side note, can we also ensure that a performance summary report for
> > > > > Flink major version upgrades is in release notes, once this infrastructure
> > > > > becomes mature? From the user perspective, it would be nice to know what
> > > > > the expected (or unexpected) regressions in a major version upgrade are.
> > > > > I've seen the community do something like this before (e.g. the major
> > > > > rocksdb version bump in 1.14?) and it was quite valuable to know that
> > > > > upfront!
> > > > >
> > > > > Best,
> > > > > Mason
> > > > >
> > > > > On Fri, Oct 28, 2022 at 1:46 AM weijie guo <guoweijieres...@gmail.com> wrote:
> > > > >
> > > > > > Thanks Yanfei for driving this.
> > > > > >
> > > > > > It allows us to easily find the problem of performance regression.
> > > > > > Especially recently, I have made some improvements to the scheduling
> > > > > > related parts, your work is very important to ensure that these changes do
> > > > > > not cause some unexpected problems.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Weijie
> > > > > >
> > > > > >
> > > > > > Congxian Qiu <qcx978132...@gmail.com> wrote on Fri, Oct 28, 2022 at 16:03:
> > > > > >
> > > > > > > Thanks for driving this and making the performance monitoring public,
> > > > > > > this can make us know and resolve the performance problem quickly.
> > > > > > >
> > > > > > > Looking forward to the workflow and detailed descriptions of
> > > > > > > flink-dev-benchmarks.
> > > > > > >
> > > > > > > Best,
> > > > > > > Congxian
> > > > > > >
> > > > > > >
> > > > > > > Yun Tang <myas...@live.com> wrote on Thu, Oct 27, 2022 at 12:41:
> > > > > > >
> > > > > > > > Thanks, Yanfei for driving this to monitor the performance in the Apache
> > > > > > > > Flink Slack Channel.
> > > > > > > >
> > > > > > > > Look forward to the workflow and detailed descriptions of
> > > > > > > > flink-dev-benchmarks.
> > > > > > > >
> > > > > > > > Best
> > > > > > > > Yun Tang
> > > > > > > > ________________________________
> > > > > > > > From: Hangxiang Yu <master...@gmail.com>
> > > > > > > > Sent: Thursday, October 27, 2022 10:59
> > > > > > > > To: dev@flink.apache.org <dev@flink.apache.org>
> > > > > > > > Subject: Re: [ANNOUNCE] Performance Daily Monitoring Moved from Ververica to Apache Flink Slack Channel
> > > > > > > >
> > > > > > > > Hi, Yanfei.
> > > > > > > > Thanks for driving this.
> > > > > > > > It could help us to detect and resolve the regression problem quickly and
> > > > > > > > officially.
> > > > > > > > I'd like to join as a maintainer.
> > > > > > > > Looking forward to the workflow.
> > > > > > > >
> > > > > > > > On Wed, Oct 26, 2022 at 5:18 PM Yuan Mei <yuanmei.w...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Thanks, Yanfei, for driving this and making the performance monitoring
> > > > > > > > > publicly available.
> > > > > > > > >
> > > > > > > > > Looking forward to seeing the workflow, and more details as Martijn
> > > > > > > > > mentioned.
> > > > > > > > >
> > > > > > > > > Best
> > > > > > > > > Yuan
> > > > > > > > >
> > > > > > > > > On Wed, Oct 26, 2022 at 2:59 PM Martijn Visser <martijnvis...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Yanfei Lei,
> > > > > > > > > >
> > > > > > > > > > Thanks for setting this up!
> > > > > > > > > > It would be interesting to also know which
> > > > > > > > > > aspects of Flink are monitored for "performance". I'm assuming there are
> > > > > > > > > > specific pieces of functionality that are performance tested, but it would
> > > > > > > > > > be great if this would be written down somewhere (next to a procedure how
> > > > > > > > > > to detect a regression and what should be next steps).
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > >
> > > > > > > > > > Martijn
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 26, 2022 at 8:21 AM Zakelly Lan <zakelly....@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi yanfei,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for driving this! It's a great help.
> > > > > > > > > > >
> > > > > > > > > > > I would like to join as a maintainer.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Zakelly
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 26, 2022 at 11:32 AM yanfei lei <fredia...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > As discussed earlier, we plan to create a benchmark channel in Apache Flink
> > > > > > > > > > > > slack[1], but the plan was shelved for a while[2]. So I went on with this
> > > > > > > > > > > > work, and created the #flink-dev-benchmarks channel for performance
> > > > > > > > > > > > regression notifications.
> > > > > > > > > > > >
> > > > > > > > > > > > We have a regression report script[3] that runs daily, and a notification
> > > > > > > > > > > > is sent to the Slack channel when the last few benchmark results are
> > > > > > > > > > > > significantly worse than the baseline.
> > > > > > > > > > > > Note, regressions are detected by a simple script which may have false
> > > > > > > > > > > > positives and false negatives. Also, all benchmarks are executed on one
> > > > > > > > > > > > physical machine[4] which is provided by Ververica(Alibaba)[5], so it might
> > > > > > > > > > > > happen that hardware issues affect performance, like "[FLINK-18614]
> > > > > > > > > > > > Performance regression 2020.07.13"[6].
> > > > > > > > > > > >
> > > > > > > > > > > > After the migration, we need a procedure to watch over the entire
> > > > > > > > > > > > performance of Flink code together. For example, if a regression
> > > > > > > > > > > > occurs, investigating the cause and resolving the problem are needed. In
> > > > > > > > > > > > the past, this procedure was maintained internally within Ververica, but we
> > > > > > > > > > > > think making the procedure public would benefit all. I volunteer to serve
> > > > > > > > > > > > as one of the initial maintainers, and would be glad if more contributors
> > > > > > > > > > > > can join me.
> > > > > > > > > > > > I'd also prepare some guidelines to help others get familiar
> > > > > > > > > > > > with the workflow. I will start a new thread to discuss the workflow soon.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
> > > > > > > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-28468
> > > > > > > > > > > > [3] https://github.com/apache/flink-benchmarks/blob/master/regression_report.py
> > > > > > > > > > > > [4] http://codespeed.dak8s.net:8080
> > > > > > > > > > > > [5] https://lists.apache.org/thread/jzljp4233799vwwqnr0vc9wgqs0xj1ro
> > > > > > > > > > > > [6] https://issues.apache.org/jira/browse/FLINK-18614
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best,
> > > > > > > > Hangxiang.
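
As a reading aid for the detection rule described in the original announcement (a notification when the last few benchmark results are significantly worse than the baseline) and for the two caveats Yanfei raised (percentage changes on small values, and benchmarks such as deployAllTasks.STREAMING where a lower value is better), here is a minimal Python sketch. It is not the logic of regression_report.py: the function name, the window sizes, the use of medians, and the 4% threshold are all assumptions chosen for illustration, and the sample numbers are made up.

```python
from statistics import median


def detect_regression(results, recent=3, baseline=20, threshold=0.04,
                      lower_is_better=False):
    """Hypothetical check: compare the median of the last `recent` results
    with the median of up to `baseline` results before them.

    Returns (flagged, relative_change). All defaults are illustrative
    assumptions, not values taken from the Flink benchmark tooling.
    """
    if len(results) < recent + 1:
        raise ValueError("need at least one baseline result plus `recent` new ones")
    recent_values = results[-recent:]
    baseline_values = results[-(recent + baseline):-recent]
    base = median(baseline_values)
    current = median(recent_values)
    if base == 0:
        raise ValueError("baseline median is zero; relative change is undefined")
    change = (current - base) / base
    # For throughput-style scores a drop is bad; for time-style scores
    # (for example, time to deploy a job) a rise is bad.
    worsening = change if lower_is_better else -change
    return worsening > threshold, change


if __name__ == "__main__":
    # Made-up throughput series: higher is better, so the drop is flagged.
    print(detect_regression([600.0] * 10 + [570.0, 565.0, 560.0]))
    # Made-up small-valued series: a 0.1 absolute change is already 5%.
    print(detect_regression([2.0] * 10 + [1.9, 1.9, 1.9]))
    # Made-up deployment-time series: lower is better, so the rise is flagged.
    print(detect_regression([100.0] * 10 + [110.0] * 3, lower_is_better=True))
```

A fixed relative threshold like this one is easy to reason about, but it treats a noisy benchmark and a stable one the same way, which is one plausible source of the false positives and false negatives mentioned in the thread.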