Roman Grebennikov created FLINK-14346:
-----------------------------------------

             Summary: Performance issue with StringSerializer
                 Key: FLINK-14346
                 URL: https://issues.apache.org/jira/browse/FLINK-14346
             Project: Flink
          Issue Type: Improvement
          Components: API / Type Serialization System, Benchmarks
    Affects Versions: 1.9.0
         Environment: Tested on Flink 1.9.0, adoptopenjdk 8u222.
            Reporter: Roman Grebennikov


While doing a performance profiling for our Flink state-heavy streaming job, we 
found that quite  a significant amount of CPU time is spent inside 
StringSerializer writing data to the underlying byte buffer. The hottest part 
of the code is the StringValue.writeString function. And replacing the default 
StringSerializer with the custom one (to just play with a baseline), which is 
just calling DataOutput.writeUTF/readUTF surprisingly yielded to almost 2x 
speedup for string serialization.

As writeUTF and writeString have incompatible wire formats, replacing latter 
with former is not a good idea in general as it may break checkpoint/savepoint 
compatibility.

We also did an early performance analysis of the root cause of this performance 
issue, and the main reason of JDK's writeUTF being faster is that it's code is 
not writing directly to output stream byte-by-byte, but instead creating an 
underlying temporary byte buffer. This yields to a HotSpot almost perfectly 
unrolling the main loop, which results in much better data parallelism.

I've tried to port the ideas from the JVM's implementation of writeUTF back to 
StringValue.writeString, and my current result is nice, having quite 
significant speedup compared to the current implementation:

{{[info] Benchmark Mode Cnt Score Error Units}}
{{[info] StringSerializerBenchmark.measureJDK avgt 30 82.871 ± 1.293 ns/op}}
{{[info] StringSerializerBenchmark.measureNew avgt 30 94.004 ± 1.491 ns/op}}
{{[info] StringSerializerBenchmark.measureOld avgt 30 156.905 ± 3.596 ns/op}}

 

{{Where measureJDK is the JDK's writeUTF asa baseline, measureOld is the 
current upstream implementation in Flink, and the measureNew is the improved 
one. }}

 

{{The code for the benchmark (and the improved version of the serializer) is 
here: [https://github.com/shuttie/flink-string-serializer]}}

 

{{Next steps:}}
 # {{More benchmarks for non-ascii strings.}}
 # {{Benchmarks for long strings.}}
 # {{Benchmarks for deserialization.}}
 # {{Tests for old-new wire format compatibility.}}
 # {{PR to the Flink codebase.}}

{{Is there an interest for this kind of performance improvement?}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to