shuttie commented on issue #10358: [FLINK-14346] [serialization] faster implementation of StringValue writeString and readString
URL: https://github.com/apache/flink/pull/10358#issuecomment-561599467

@StephanEwen yes, most of the difference comes from the many single-byte read operations. The CPU cannot parallelize them, because there is a data dependency between characters. Buffering the whole string first improves parallelism, so the CPU can process multiple characters at once.

I considered breaking the serialization format for strings (and even ran an experiment with that approach), but it has a ton of side effects for end users (such as the keys in RocksDB that you mentioned), and the only positive result was a slight improvement for long non-ASCII strings compared to the implementation in this PR. I guess it's not worth it :)

I've also added a test that validates the binary compatibility of this change.
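
To make the single-byte-read point concrete, here is a minimal sketch in Java. It is not the code from this PR. It assumes a simplified variable-length encoding (7 payload bits per byte, high bit set when another byte follows), and the class and method names (`VarLenStringSketch`, `encode`, `decodeByteAtATime`, `readFromBuffer`) are hypothetical. One reader pulls every byte through a separate `DataInputStream` call, the other decodes from a byte array that is already in memory. How the bytes get into that array is left out of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of Flink.
final class VarLenStringSketch {

    // Encodes the char count, then each char, as 7-bit groups (assumed format).
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(s.length(), out);
        for (int i = 0; i < s.length(); i++) {
            writeVarInt(s.charAt(i), out);
        }
        return out.toByteArray();
    }

    private static void writeVarInt(int v, ByteArrayOutputStream out) {
        while (v >= 0x80) {
            out.write((v & 0x7F) | 0x80); // high bit: another byte follows
            v >>>= 7;
        }
        out.write(v);
    }

    // Baseline: every byte goes through its own stream read call.
    static String decodeByteAtATime(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int len = readVarInt(in);
            char[] chars = new char[len];
            for (int i = 0; i < len; i++) {
                chars[i] = (char) readVarInt(in);
            }
            return new String(chars);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static int readVarInt(DataInputStream in) throws IOException {
        int b = in.readUnsignedByte();
        int v = b & 0x7F;
        for (int shift = 7; b >= 0x80; shift += 7) {
            b = in.readUnsignedByte();
            v |= (b & 0x7F) << shift;
        }
        return v;
    }

    // Buffered variant: the whole string is in memory, decoding is plain array indexing.
    static String readFromBuffer(byte[] buf) {
        int pos = 0;
        int b = buf[pos++] & 0xFF;
        int len = b & 0x7F;
        for (int shift = 7; b >= 0x80; shift += 7) {
            b = buf[pos++] & 0xFF;
            len |= (b & 0x7F) << shift;
        }
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            b = buf[pos++] & 0xFF;
            int c = b & 0x7F;
            for (int shift = 7; b >= 0x80; shift += 7) {
                b = buf[pos++] & 0xFF;
                c |= (b & 0x7F) << shift;
            }
            chars[i] = (char) c;
        }
        return new String(chars);
    }
}
```

Both readers consume exactly the same bytes, so the wire format is unchanged; only the access pattern differs. A round-trip check of this kind (encode once, decode with both readers, compare) is one way a binary compatibility test could be structured.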