shuttie commented on issue #10358: [FLINK-14346] [serialization] faster implementation of StringValue writeString and readString
URL: https://github.com/apache/flink/pull/10358#issuecomment-561599467
 
 
   @StephanEwen yes, most of the difference comes from the many single-byte
read operations, and the CPU cannot parallelize them because there is a data
dependency between consecutive characters. By buffering the whole string, we
give the CPU more room for parallelism, so it can process multiple characters
at once.
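A minimal sketch of the difference between the two read styles follows. It is not the code from the PR. It assumes the existing layout of length + 1 as a 7-bit variable-length integer (0 meaning null), followed by each char in 7-bit groups, low bits first, with the high bit marking continuation. The class and method names (`BufferedDecodeSketch`, `readFromBuffer`, `readPerByteFrom`) are hypothetical. The per-byte version calls `DataInput.readUnsignedByte()` for every byte. The buffered version decodes from a `byte[]` that already holds the record. How the real change obtains those bytes from Flink's input view is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class BufferedDecodeSketch {
    static final int HIGH_BIT = 0x80;

    // Byte-at-a-time: every encoded byte is a separate readUnsignedByte() call.
    static int readVarIntPerByte(DataInput in) throws IOException {
        int value = 0;
        int shift = 0;
        int b;
        while ((b = in.readUnsignedByte()) >= HIGH_BIT) {
            value |= (b & 0x7F) << shift; // low 7-bit group first
            shift += 7;
        }
        return value | (b << shift);
    }

    static String readPerByte(DataInput in) throws IOException {
        int len = readVarIntPerByte(in);
        if (len == 0) {
            return null; // a stored length of 0 marks a null string
        }
        char[] chars = new char[len - 1]; // stored length is offset by one
        for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) readVarIntPerByte(in);
        }
        return new String(chars);
    }

    // Buffered: the record is already in a byte[], so decoding is a tight
    // loop over a local array with no per-byte call through DataInput.
    static String readFromBuffer(byte[] buf, int offset) {
        int pos = offset;
        int len = 0;
        int shift = 0;
        int b;
        while ((b = buf[pos++] & 0xFF) >= HIGH_BIT) {
            len |= (b & 0x7F) << shift;
            shift += 7;
        }
        len |= b << shift;
        if (len == 0) {
            return null;
        }
        char[] chars = new char[len - 1];
        for (int i = 0; i < chars.length; i++) {
            int c = 0;
            shift = 0;
            while ((b = buf[pos++] & 0xFF) >= HIGH_BIT) {
                c |= (b & 0x7F) << shift;
                shift += 7;
            }
            chars[i] = (char) (c | (b << shift));
        }
        return new String(chars);
    }

    // Convenience wrapper so callers do not have to handle IOException.
    static String readPerByteFrom(byte[] buf) {
        try {
            return readPerByte(new DataInputStream(new ByteArrayInputStream(buf)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "h\u00e9llo": length byte 6 (5 + 1); '\u00e9' (0xE9) needs two bytes: 0xE9, 0x01
        byte[] encoded = {6, 104, (byte) 0xE9, 1, 108, 108, 111};
        System.out.println(readFromBuffer(encoded, 0).equals(readPerByteFrom(encoded))); // true
    }
}
```

Both readers still walk the chars one after another, because the start of char i+1 is only known once char i has been decoded. What the buffered form removes is the per-byte stream call. That leaves the JIT and the CPU a simple array loop to work with.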
   
   I've considered breaking the serialization format for strings (and even ran
an experiment with this approach), but it has a ton of side effects for end
users (like the RocksDB keys you mentioned), and the only positive result was
a slight improvement for long non-ASCII strings compared to the implementation
in this PR. I guess it's not worth it :)
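For context on why non-ASCII text is where a format change could help, the next sketch counts the bytes the current layout spends. It assumes that layout is a 7-bit variable-length length prefix (length + 1), followed by each char in 7-bit groups. That is a reading of the existing format. The alternative format from the experiment is not described in this thread, so nothing here models it. `StringValueSizeSketch` and `encodedSize` are hypothetical names.

```java
public class StringValueSizeSketch {
    // Number of 7-bit groups needed to store v (assumed varint layout).
    static int varIntSize(int v) {
        int size = 1;
        while (v >= 0x80) {
            v >>>= 7;
            size++;
        }
        return size;
    }

    // Length prefix (offset by one) plus one varint per char.
    static int encodedSize(CharSequence s) {
        int total = varIntSize(s.length() + 1);
        for (int i = 0; i < s.length(); i++) {
            total += varIntSize(s.charAt(i));
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(encodedSize("hello"));                               // 6: 1 + 5 x 1
        System.out.println(encodedSize("\u043f\u0440\u0438\u0432\u0435\u0442")); // 13: Cyrillic, 1 + 6 x 2
        System.out.println(encodedSize("\u4e2d\u6587"));                         // 7: CJK, 1 + 2 x 3
    }
}
```

Under this layout, ASCII costs one byte per char. Chars from U+0080 to U+3FFF cost two bytes, and chars from U+4000 upward cost three. Long non-ASCII strings therefore carry the most bytes and the most decode steps per char.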
   
   I've also added a test for validating binary compatibility of this change.
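The actual test from the PR is not shown in this thread. One way such a check can work is sketched below: encode sample strings with a byte-at-a-time reference writer and with a buffered writer, then require identical bytes. The reference writer follows the assumed 7-bit varint layout (length + 1, then each char in 7-bit groups). All names (`StringFormatCompatSketch`, `writePerByte`, `writeBuffered`, `sameBytes`) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class StringFormatCompatSketch {
    static final int HIGH_BIT = 0x80;

    // Reference encoder: one DataOutput.write(int) call per encoded byte.
    static void writePerByte(CharSequence cs, DataOutput out) throws IOException {
        if (cs == null) { out.write(0); return; } // 0 marks a null string
        int len = cs.length() + 1;                // length is stored offset by one
        while (len >= HIGH_BIT) { out.write(len | HIGH_BIT); len >>>= 7; }
        out.write(len);
        for (int i = 0; i < cs.length(); i++) {
            int c = cs.charAt(i);
            while (c >= HIGH_BIT) { out.write(c | HIGH_BIT); c >>>= 7; }
            out.write(c);
        }
    }

    // Buffered encoder: the same bytes, built in a local array and written once.
    static void writeBuffered(CharSequence cs, DataOutput out) throws IOException {
        if (cs == null) { out.write(0); return; }
        // Worst case: 5 bytes for the length, 3 per char (overflow for huge strings ignored).
        byte[] buf = new byte[5 + 3 * cs.length()];
        int pos = 0;
        int len = cs.length() + 1;
        while (len >= HIGH_BIT) { buf[pos++] = (byte) (len | HIGH_BIT); len >>>= 7; }
        buf[pos++] = (byte) len;
        for (int i = 0; i < cs.length(); i++) {
            int c = cs.charAt(i);
            while (c >= HIGH_BIT) { buf[pos++] = (byte) (c | HIGH_BIT); c >>>= 7; }
            buf[pos++] = (byte) c;
        }
        out.write(buf, 0, pos);
    }

    static byte[] encode(CharSequence cs, boolean buffered) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            if (buffered) {
                writeBuffered(cs, out);
            } else {
                writePerByte(cs, out);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static boolean sameBytes(CharSequence cs) {
        return Arrays.equals(encode(cs, false), encode(cs, true));
    }

    public static void main(String[] args) {
        char[] longAscii = new char[200]; // 201 needs a two-byte length prefix
        Arrays.fill(longAscii, 'x');
        String[] samples = {null, "", "hello", "h\u00e9llo", "\u4e2d\u6587", new String(longAscii)};
        for (String s : samples) {
            if (!sameBytes(s)) {
                throw new AssertionError("encodings differ for: " + s);
            }
        }
        System.out.println("all samples encode identically");
    }
}
```

The samples are chosen to cover the edges of the format: null, the empty string, one-, two- and three-byte chars, and a length that spills into a second prefix byte.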
