[
https://issues.apache.org/jira/browse/FLINK-37435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935021#comment-17935021
]
Kurt Ostfeld edited comment on FLINK-37435 at 3/13/25 1:03 AM:
---------------------------------------------------------------
I created a new benchmark in the flink-benchmarks project with two files:
[https://gist.github.com/kurtostfeld/1a6a6cf1a73d85f238fe0522be6f2d43]
[https://gist.github.com/kurtostfeld/a7e7bdc36a26bfb793c9d01b1a8520d4]
I'm not checking this in. You can copy these two source files into the source
tree and run the benchmark via:
```
mvn package
java -jar target/benchmarks.jar -rf csv "org.apache.flink.benchmark.full.KryoBenchmark"
```
It results in (on my laptop with the Temurin OpenJDK 17 distribution):
Benchmark                        Mode   Cnt     Score     Error  Units
KryoBenchmark.readKryoBaseline  thrpt    25   534.628 ±   6.197  ops/ms
KryoBenchmark.readKryoVersionB  thrpt    25   542.362 ±   7.574  ops/ms
KryoBenchmark.readKryoVersionC  thrpt    25   537.827 ±   8.429  ops/ms
KryoBenchmark.readKryoVersionD  thrpt    25   816.206 ±  11.167  ops/ms
KryoBenchmark.readKryoVersionE  thrpt    25  1255.128 ±  49.761  ops/ms
KryoBenchmark.readKryoVersionF  thrpt    25  2251.305 ±  99.973  ops/ms
KryoBenchmark.readKryoVersionG  thrpt    25  4069.846 ± 820.285  ops/ms
To explain the results, going from the slowest benchmark (the baseline, which
mirrors PojoSerializationBenchmark.readKryo) to the fastest:
- KryoBenchmark.readKryoBaseline (534.628 ops/ms). This simply mirrors the
official PojoSerializationBenchmark.readKryo benchmark.
- KryoBenchmark.readKryoVersionB (542.362 ops/ms). This is a version of the
baseline benchmark expanded for clarity, with nearly identical results.
- KryoBenchmark.readKryoVersionC (537.827 ops/ms). This version removes
unnecessary layers of InputStream wrappers. This provides no performance
improvement.
- KryoBenchmark.readKryoVersionD (816.206 ops/ms). This version switches from
NoFetchingInput to OldNoFetchInput which is a near copy/paste of
NoFetchingInput from before the Kryo upgrade.
- KryoBenchmark.readKryoVersionE (1255.128 ops/ms). This version switches from
OldNoFetchInput to Input.
- KryoBenchmark.readKryoVersionF (2251.305 ops/ms). This switches from the
heavily customized Kryo created by Flink KryoSerializer to a much simpler Kryo
configuration.
- KryoBenchmark.readKryoVersionG (4069.846 ops/ms). This does Input -> byte[]
where the previous benchmarks do Input -> ByteArrayInputStream -> byte[].
To summarize, that's a ~8x (more precisely, about 7.6x) throughput difference
between the way PojoSerializationBenchmark.readKryo works and a more optimized
version, coming from three changes:
1. NoFetchingInput -> OldNoFetchingInput -> Input.
2. Simple Kryo config vs complex Kryo config done by Flink KryoSerializer
3. Input -> byte[] instead of Input -> ByteArrayInputStream -> byte[]
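To make the size of each step concrete, here is a small sketch that recomputes the step-by-step ratios from the scores in the table. The class and method names are made up for illustration; only the numbers come from the benchmark run.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KryoSpeedups {
    // Scores in ops/ms, copied from the results table; insertion order is slowest to fastest.
    static final Map<String, Double> SCORES = new LinkedHashMap<>();
    static {
        SCORES.put("Baseline", 534.628);
        SCORES.put("VersionB", 542.362);
        SCORES.put("VersionC", 537.827);
        SCORES.put("VersionD", 816.206);
        SCORES.put("VersionE", 1255.128);
        SCORES.put("VersionF", 2251.305);
        SCORES.put("VersionG", 4069.846);
    }

    /** Throughput of {@code to} divided by throughput of {@code from}. */
    static double ratio(String from, String to) {
        return SCORES.get(to) / SCORES.get(from);
    }

    public static void main(String[] args) {
        // Print the ratio between each benchmark and the one before it.
        String previous = null;
        for (String name : SCORES.keySet()) {
            if (previous != null) {
                System.out.printf("%s -> %s: %.2fx%n", previous, name, ratio(previous, name));
            }
            previous = name;
        }
        System.out.printf("Baseline -> VersionG: %.2fx%n", ratio("Baseline", "VersionG"));
    }
}
```

The steps C -> D, D -> E, E -> F and F -> G come out at roughly 1.5x, 1.5x, 1.8x and 1.8x. The VersionG score has an error of about 20% (± 820.285), so the last ratio and the overall figure are the least certain.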
* It looks like the OldNoFetchingInput -> NoFetchingInput changes made during
the Kryo upgrade may have caused the performance drop. However, it's not as
simple as rolling back those changes, because the old NoFetchingInput was
causing errors with Kryo 5.
* The only significant change to the NoFetchingInput class is in the require
method. The new require method is mostly a copy/paste from the Kryo 5 Input
class, with changes so that it will never read ahead more than required, which
is the point of the NoFetching variation.
* Kryo 5 Input runs faster than either the old or new version of
NoFetchingInput because it will cache or read ahead more than needed. The Flink
framework doesn't like that, hence the NoFetching variations.
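The read-ahead trade-off can be illustrated with plain java.io, without Kryo or Flink. This is only an analogy: BufferedInputStream stands in for a fetching Input, and an unbuffered read stands in for the NoFetching behavior. The assumption here is that read-ahead is a problem because something else keeps reading from the same underlying stream afterwards. All names in the sketch are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ReadAheadDemo {
    static final int SOURCE_SIZE = 16;
    static final int RECORD_SIZE = 4;

    /** Reads exactly one RECORD_SIZE-byte record through the given stream. */
    static void readRecord(InputStream in) {
        try {
            new DataInputStream(in).readFully(new byte[RECORD_SIZE]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bytes left in the shared source after a buffering reader took one record. */
    static int remainingAfterBufferedRead() {
        ByteArrayInputStream source = new ByteArrayInputStream(new byte[SOURCE_SIZE]);
        // The buffer fills itself from the source, pulling more than the 4 bytes requested.
        readRecord(new BufferedInputStream(source));
        return source.available();
    }

    /** Bytes left in the shared source after an exact, unbuffered read of one record. */
    static int remainingAfterExactRead() {
        ByteArrayInputStream source = new ByteArrayInputStream(new byte[SOURCE_SIZE]);
        readRecord(source);
        return source.available();
    }

    public static void main(String[] args) {
        System.out.println("buffered: " + remainingAfterBufferedRead()); // prints "buffered: 0"
        System.out.println("exact:    " + remainingAfterExactRead());    // prints "exact:    12"
    }
}
```

The buffered reader is faster per call on real streams, but it drains the source, so the next reader of that source would find nothing left. The exact reader leaves the remaining 12 bytes in place.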
One option to consider for performance would be to add a TypeSerializer
deserialize option to deserialize an object straight from a byte[].
The other changes can make this benchmark much faster, but can't easily be
dropped in without bigger architectural changes.
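As a rough sketch of what a deserialize-from-byte[] option could look like: the interface below is hypothetical and deliberately does not use Flink's TypeSerializer or Kryo types. It pairs a stream-style method with an extra byte[] method whose default falls back to the stream path, so an implementation can override it to read straight from the array. All names (ByteArrayAwareDeserializer, Point, PointDeserializer) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

final class ByteArrayDeserializeDemo {
    public static void main(String[] args) throws IOException {
        byte[] bytes = ByteBuffer.allocate(8).putInt(3).putInt(4).array();
        PointDeserializer deserializer = new PointDeserializer();
        Point viaStream = deserializer.deserialize(
                new DataInputStream(new ByteArrayInputStream(bytes)));
        Point viaArray = deserializer.deserialize(bytes, 0, bytes.length);
        System.out.println(viaStream.x + "," + viaStream.y);
        System.out.println(viaArray.x + "," + viaArray.y);
    }
}

/** Hypothetical interface; NOT Flink's TypeSerializer. */
interface ByteArrayAwareDeserializer<T> {
    /** Stream-style entry point. */
    T deserialize(DataInput source) throws IOException;

    /** Extra entry point; by default it just wraps the array in a stream. */
    default T deserialize(byte[] bytes, int offset, int length) throws IOException {
        return deserialize(
                new DataInputStream(new ByteArrayInputStream(bytes, offset, length)));
    }
}

/** Toy value type with two int fields. */
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class PointDeserializer implements ByteArrayAwareDeserializer<Point> {
    @Override
    public Point deserialize(DataInput source) throws IOException {
        return new Point(source.readInt(), source.readInt());
    }

    /** Direct path: no stream wrappers between the caller and the byte[]. */
    @Override
    public Point deserialize(byte[] bytes, int offset, int length) {
        // ByteBuffer is big-endian by default, matching DataInput.readInt().
        ByteBuffer buffer = ByteBuffer.wrap(bytes, offset, length);
        return new Point(buffer.getInt(), buffer.getInt());
    }
}
```

Because the byte[] method has a default implementation, existing implementations would keep working unchanged and only hot paths would need the override. Whether that shape fits Flink's serializer stack is an open question that this sketch does not answer.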
> Kryo related perf regression since March 5th
> --------------------------------------------
>
> Key: FLINK-37435
> URL: https://issues.apache.org/jira/browse/FLINK-37435
> Project: Flink
> Issue Type: Bug
> Components: API / Type Serialization System, Benchmarks
> Affects Versions: 2.0.0
> Reporter: Zakelly Lan
> Priority: Major
> Attachments: image-2025-03-07-12-29-54-443.png,
> profile-results-after.zip, profile-results-before.zip
>
>
> Seems an obvious regression across all Java versions.
> http://flink-speed.xyz/timeline/?exe=6%2C12%2C13&base=&ben=readKryo&env=3&revs=200&equid=off&quarts=on&extr=on
> http://flink-speed.xyz/timeline/?exe=6%2C12%2C13&base=&ben=serializerKryo&env=3&revs=200&equid=off&quarts=on&extr=on