schlosna commented on PR #24473:
URL: https://github.com/apache/flink/pull/24473#issuecomment-1992844641
Thanks for taking a look at this PR.
> 1. What's your setup for the path ?
We write checkpoints to a variety of file systems depending on the
infrastructure: cloud blob storage (e.g. S3 or S3-like), a local Linux/POSIX
filesystem when running on bare metal, or a persistent volume claim in
Kubernetes.
> 2. Could you also share the JFR after your optimization ?
I do not have a JFR recording from a modified Flink build that I can
share, but I created a simple [JMH Benchmark to compare the old vs. new
implementations](https://github.com/apache/flink/files/14580149/NormalizeBenchmark.java.txt)
that shows a ~5x allocation reduction, as well as a ~4x speedup on Intel and a
~3.5x speedup on Apple M1 Pro.
```
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
VM version: JDK 17.0.10, OpenJDK 64-Bit Server VM, 17.0.10+8-LTS
Benchmark                                           Mode  Cnt     Score    Error  Units
NormalizeBenchmark.newNormalize                     avgt    5   269.649 ± 23.957  ns/op
NormalizeBenchmark.newNormalize:gc.alloc.rate.norm  avgt    5   316.800 ±  0.001   B/op
NormalizeBenchmark.oldNormalize                     avgt    5  1119.999 ± 57.073  ns/op
NormalizeBenchmark.oldNormalize:gc.alloc.rate.norm  avgt    5  1603.200 ±  0.001   B/op
```
```
2021 Apple MacBookPro M1 Pro
VM version: JDK 17.0.10, OpenJDK 64-Bit Server VM, 17.0.10+7-LTS
Benchmark                                           Mode  Cnt     Score    Error  Units
NormalizeBenchmark.newNormalize                     avgt    5   167.362 ±  1.396  ns/op
NormalizeBenchmark.newNormalize:gc.alloc.rate.norm  avgt    5   316.800 ±  0.001   B/op
NormalizeBenchmark.oldNormalize                     avgt    5   598.058 ±  9.701  ns/op
NormalizeBenchmark.oldNormalize:gc.alloc.rate.norm  avgt    5  1579.200 ±  0.001   B/op
```
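For readers who do not want to open the attachment, here is a minimal sketch of why a `String.replaceAll` call on a hot path allocates so much, and two ways to avoid it. This is not the code from this PR or from the linked benchmark: the class and method names are hypothetical, and it assumes for illustration that the only normalization needed is collapsing repeated `/` characters.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch; names are illustrative and not taken from the Flink code base. */
final class NormalizeSketch {

    // Compiled once and reused across calls.
    private static final Pattern MULTI_SLASH = Pattern.compile("/+");

    /** Baseline: String.replaceAll compiles a new Pattern on every invocation. */
    static String collapseWithReplaceAll(String path) {
        return path.replaceAll("/+", "/");
    }

    /** Same result, but the Pattern is compiled only once. */
    static String collapseWithCachedPattern(String path) {
        return MULTI_SLASH.matcher(path).replaceAll("/");
    }

    /** No regex at all: one pass, and no allocation when nothing needs collapsing. */
    static String collapseManually(String path) {
        if (path.indexOf("//") < 0) {
            return path; // fast path: already normalized, return the same instance
        }
        StringBuilder sb = new StringBuilder(path.length());
        char prev = 0;
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '/' && prev == '/') {
                continue; // skip the repeated slash
            }
            sb.append(c);
            prev = c;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "s3://bucket//checkpoints///chk-1";
        System.out.println(collapseWithReplaceAll(input));
        System.out.println(collapseWithCachedPattern(input));
        System.out.println(collapseManually(input));
    }
}
```

All three methods return the same string for the same input; they differ only in how much work and allocation each call incurs.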
Textual details from JFR for a test Flink pipeline where ~1% of all
allocations were due to `java.util.regex.Pattern` from
`org.apache.flink.core.fs.Path.normalizePath(String):243` via
`org.apache.flink.core.fs.Path.initialize(String, String, String)` &
`org.apache.flink.core.fs.Path.<init>(String)` constructor:
```
Class    Alloc Total    Total Allocation (%)
-----    -----------    --------------------
int[]    2.468 GiB      4.43237405980632 %

Stack Trace                                                         Count    Percentage
-----------                                                         -----    ----------
java.util.regex.Pattern.compile()                                   18       21.2 %
java.util.regex.Pattern.<init>(String, int)                         18       21.2 %
java.util.regex.Pattern.compile(String)                             17       20 %
java.lang.String.replaceAll(String, String)                         17       20 %
org.apache.flink.core.fs.Path.normalizePath(String)                 10       11.8 %
org.apache.flink.core.fs.Path.initialize(String, String, String)    10       11.8 %
org.apache.flink.core.fs.Path.<init>(String)                        5        5.88 %
org.apache.flink.core.fs.Path.<init>(Path, String)                  5        5.88 %
org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckp