After completing the side quest
<https://github.com/apache/flink-benchmarks/pull/90>[1] of enabling
async-profiler when running the JMH benchmarks, I've been unable to reproduce
the performance change between the last known good run and the first run
highlighted as a regression.
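
For reference, an invocation along these lines is what I mean by running the
JMH benchmarks with async-profiler attached (the jar path, profiler options
and output location here are illustrative rather than the exact command from
the PR):

  java -jar target/benchmarks.jar \
    org.apache.flink.benchmark.SerializationFrameworkMiniBenchmarks.serializerHeavyString \
    -prof "async:libPath=/path/to/libasyncProfiler.so;output=flamegraph;event=wall" \
    -rf csv -rff jmh-result.csv
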
Results from my Fedora 40 workstation, using:

# JMH version: 1.37
# VM version: JDK 11.0.23, OpenJDK 64-Bit Server VM, 11.0.23+9
# VM invoker: /home/sam/.sdkman/candidates/java/11.0.23-tem/bin/java
# VM options: -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.ssl
# Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
File: /tmp/profile-results/163b9cca6d2/jmh-result.csv
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SerializationFrameworkMiniBenchmarks.serializerHeavyString","thrpt",1,30,179.453066,5.725733,"ops/ms"
"org.apache.flink.benchmark.SerializationFrameworkMiniBenchmarks.serializerHeavyString:async","thrpt",1,1,NaN,NaN,"---"

File: /tmp/profile-results/f38d8ca43f6/jmh-result.csv
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SerializationFrameworkMiniBenchmarks.serializerHeavyString","thrpt",1,30,178.861842,6.711582,"ops/ms"
"org.apache.flink.benchmark.SerializationFrameworkMiniBenchmarks.serializerHeavyString:async","thrpt",1,1,NaN,NaN,"---"

Here f38d8ca43f6 is the last known good run and 163b9cca6d2 is the first run
flagged as a regression.

One question I have from comparing my local results to those on flink-speed
<https://flink-speed.xyz/timeline/#/?exe=6&ben=serializerHeavyString&extr=on&quarts=on&equid=off&env=3&revs=200>[2]:
is it possible the JDK version changed between the runs? (I don't see the
actual JDK build listed anywhere, so I can't check versions or
distributions.)

I've also tried comparing a Flink build using the java11-target profile with
the default JDK 8 build, and that does not change the performance.
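(In other words, a build along the lines of "mvn clean install -DskipTests
-Pjava11-target" compared against a default "mvn clean install -DskipTests";
the exact flags here are illustrative.)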

Sam

[1] https://github.com/apache/flink-benchmarks/pull/90
[2]
https://flink-speed.xyz/timeline/#/?exe=6&ben=serializerHeavyString&extr=on&quarts=on&equid=off&env=3&revs=200

On Wed, 29 May 2024 at 16:53, Sam Barker <s...@quadrocket.co.uk> wrote:

> > I guess that improvement is a fluctuation. You can double-check the
> > performance results [1] of the last few days. The performance hasn't
> > recovered.
>
> Hmm, yeah, the improvement was a fluctuation, and smaller than I remembered
> seeing (maybe I had zoomed in on the timeline too much).
>
> > I fixed an issue related to kryo serialization in FLINK-35215. IIUC,
> > serializerHeavyString doesn't use kryo serialization. I tried to
> > run the serializerHeavyString demo locally and didn't see the
> > kryo-serialization-related code being called.
>
> I don't see it either, but then again I don't see commons-io in the call
> stacks either despite the regression...
>
> I'm continuing to investigate the regression.
>
> On Mon, 27 May 2024 at 20:15, Rui Fan <1996fan...@gmail.com> wrote:
>
>> Thanks Sam for the comment!
>>
>> > It looks like the most recent run of JDK 11 saw a big improvement in the
>> > performance of the test.
>>
>> I guess that improvement is a fluctuation. You can double-check the
>> performance results [1] of the last few days. The performance hasn't
>> recovered.
>>
>>
>> > That improvement seems related to the fix for FLINK-35215.
>>
>> I fixed an issue related to kryo serialization in FLINK-35215. IIUC,
>> serializerHeavyString doesn't use kryo serialization. I tried to
>> run the serializerHeavyString demo locally and didn't see the
>> kryo-serialization-related code being called.
>>
>> Please correct me if I'm wrong, thanks~
>>
>> [1]
>>
>> http://flink-speed.xyz/timeline/#/?exe=6&ben=serializerHeavyString&extr=on&quarts=on&equid=off&env=3&revs=200
>>
>> Best,
>> Rui
>>
>> On Thu, May 23, 2024 at 1:27 PM Sam Barker <s...@quadrocket.co.uk> wrote:
>>
>> > It looks like the most recent run of JDK 11 saw a big improvement [1] in
>> > the performance of the test. That improvement seems related to [2], which
>> > is a fix for FLINK-35215 [3]. That suggests to me that the test isn't as
>> > well isolated to the code it's trying to test as would be ideal. However,
>> > I've only just started looking at the test suite and trying to run it
>> > locally, so I'm not very well placed to judge.
>> >
>> > It does, however, suggest that this shouldn't be a blocker for the
>> > release.
>> >
>> >
>> >
>> > [1] http://flink-speed.xyz/changes/?rev=c1baf07d76&exe=6&env=3
>> > [2] https://github.com/apache/flink/commit/c1baf07d7601a683f42997dc35dfaef4e41bc928
>> > [3] https://issues.apache.org/jira/browse/FLINK-35215
>> >
>> > On Wed, 22 May 2024 at 00:15, Piotr Nowojski <pnowoj...@apache.org> wrote:
>> >
>> > > Hi,
>> > >
>> > > Given what you wrote, that you have investigated the issue and couldn't
>> > > find any easy explanation, I would suggest closing this ticket as "Won't
>> > > do" or "Cannot reproduce" and ignoring the problem.
>> > >
>> > > In the past there have been quite a few cases where some benchmark
>> > > detected a performance regression. Sometimes those cannot be reproduced;
>> > > other times (as is the case here) some seemingly unrelated change is
>> > > causing the regression. The same thing has happened in this benchmark
>> > > many times in the past [1], [2], [3], [4]. Generally speaking, this
>> > > benchmark has been in the spotlight a couple of times [5].
>> > >
>> > > Note that there have been cases where this benchmark did detect a
>> > > performance regression :)
>> > >
>> > > My personal suspicion is that after that commons-io version bump,
>> > > something poked the JVM/JIT into compiling the string serialization code
>> > > a bit differently, causing this regression. We have a couple of
>> > > benchmarks that seem to be prone to such semi-intermittent issues. For
>> > > example, the same benchmark was subject to this annoying pattern, which
>> > > I've spotted in quite a few benchmarks over the years [6]:
>> > >
>> > > [image: image.png]
>> > > (https://imgur.com/a/AoygmWS)
>> > >
>> > > Benchmark results are very stable within a single JVM fork, but between
>> > > two forks they can reach two different "stable" levels. Here it looks
>> > > like a 50% chance of getting a stable "200 records/ms" and a 50% chance
>> > > of "250 records/ms".
>> > >
>> > > A small interlude: each of our benchmarks runs in 3 different JVM forks,
>> > > with 10 warm-up iterations and 10 measurement iterations. Each iteration
>> > > invokes the benchmark method for at least one second. So by "very stable"
>> > > results, I mean that, for example, after the 2nd or 3rd warm-up
>> > > iteration, the results stabilize to within +/-1% and stay at that level
>> > > for the whole duration of the fork.
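>> > >
>> > > (As an illustration only, and not the actual flink-benchmarks code, a
>> > > JMH benchmark with that setup would be declared roughly like this; the
>> > > class name and benchmark body are made up, only the fork, warm-up and
>> > > measurement settings reflect the configuration described above:
>> > >
>> > >   import java.util.concurrent.TimeUnit;
>> > >   import org.openjdk.jmh.annotations.*;
>> > >
>> > >   @BenchmarkMode(Mode.Throughput)
>> > >   @OutputTimeUnit(TimeUnit.MILLISECONDS)
>> > >   @Fork(3)                                // 3 separate JVM forks
>> > >   @Warmup(iterations = 10, time = 1)      // 10 warm-up iterations, >= 1s each
>> > >   @Measurement(iterations = 10, time = 1) // 10 measurement iterations, >= 1s each
>> > >   @State(Scope.Thread)
>> > >   public class HeavyStringSketchBenchmark {
>> > >
>> > >       private String payload;
>> > >
>> > >       @Setup
>> > >       public void setUp() {
>> > >           payload = "x".repeat(1024); // made-up payload, not the real test data
>> > >       }
>> > >
>> > >       @Benchmark
>> > >       public int serialize() {
>> > >           // Stand-in for the real serialization work; returning a value
>> > >           // keeps the JIT from dead-code-eliminating it.
>> > >           return payload.getBytes().length;
>> > >       }
>> > >   }
>> > > )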
>> > >
>> > > Given that we are repeating the same benchmark in 3 different forks, we
>> > > can have, by pure chance:
>> > > - 3 slow forks - total average 200 records/ms
>> > > - 2 slow forks, 1 fast fork - average 216 r/ms
>> > > - 1 slow fork, 2 fast forks - average 233 r/ms
>> > > - 3 fast forks - average 250 r/ms
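>> > > (For instance, two slow forks and one fast fork average out to
>> > > (200 + 200 + 250) / 3 ≈ 216 records/ms.)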
>> > >
>> > > So this benchmark is susceptible to entering different semi-stable
>> > > states. As I wrote above, I guess something with the commons-io version
>> > > bump just swayed it into a different semi-stable state :( I have never
>> > > gotten desperate enough to dig further into what exactly causes this
>> > > kind of issue.
>> > >
>> > > Best,
>> > > Piotrek
>> > >
>> > > [1] https://issues.apache.org/jira/browse/FLINK-18684
>> > > [2] https://issues.apache.org/jira/browse/FLINK-27133
>> > > [3] https://issues.apache.org/jira/browse/FLINK-27165
>> > > [4] https://issues.apache.org/jira/browse/FLINK-31745
>> > > [5] https://issues.apache.org/jira/browse/FLINK-35040?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20Resolved%2C%20Closed)%20AND%20text%20~%20%22serializerHeavyString%22
>> > > [6] http://flink-speed.xyz/timeline/#/?exe=1&ben=serializerHeavyString&extr=on&quarts=on&equid=off&env=2&revs=1000
>> > >
>> > > On Tue, 21 May 2024 at 12:50, Rui Fan <1996fan...@gmail.com> wrote:
>> > >
>> > >> Hi devs:
>> > >>
>> > >> We (the release managers of Flink 1.20) want to report one performance
>> > >> regression to the Flink dev mailing list.
>> > >>
>> > >> # Background:
>> > >>
>> > >> The performance of serializerHeavyString started to regress on April 3,
>> > >> and we created FLINK-35040[1] to track it.
>> > >>
>> > >> In brief:
>> > >> - The performance only regresses on JDK 11; Java 8 and Java 17 are fine.
>> > >> - The regression is caused by upgrading the commons-io version from
>> > >>   2.11.0 to 2.15.1.
>> > >>   - The upgrade was done in FLINK-34955[2].
>> > >>   - The performance recovers after reverting the commons-io version
>> > >>     to 2.11.0.
>> > >>
>> > >> You can get more details from FLINK-35040[1].
>> > >>
>> > >> # Problem
>> > >>
>> > >> We tried to generate flame graphs (wall mode) to analyze why upgrading
>> > >> the commons-io version affects the performance. These flame graphs can
>> > >> be found in FLINK-35040[1]. (Many thanks to Zakelly for generating these
>> > >> flame graphs on the benchmark server.)
>> > >>
>> > >> Unfortunately, we cannot find any commons-io code being called.
>> > >> We also analyzed whether any other dependencies changed while upgrading
>> > >> the commons-io version. The result is no; the other dependencies are
>> > >> exactly the same.
>> > >>
>> > >> # Request
>> > >>
>> > >> After the above analysis, we still cannot find why the performance of
>> > >> serializerHeavyString started to regress on JDK 11.
>> > >>
>> > >> We look forward to hearing valuable suggestions from the Flink
>> > >> community. Thanks to everyone in advance.
>> > >>
>> > >> Note:
>> > >> 1. I cannot reproduce the regression on my Mac with JDK 11, and we
>> > >>    suspect this regression may be caused by the benchmark environment.
>> > >> 2. We will accept this regression if the issue still cannot be solved.
>> > >>
>> > >> [1] https://issues.apache.org/jira/browse/FLINK-35040
>> > >> [2] https://issues.apache.org/jira/browse/FLINK-34955
>> > >>
>> > >> Best,
>> > >> Weijie, Ufuk, Robert and Rui
>> > >>
>> > >
>> >
>>
>
