divijvaidya commented on PR #13312:
URL: https://github.com/apache/kafka/pull/13312#issuecomment-1448299260

   > It is interesting that unrolling the loop while keeping the same 
underlying logic yields this increase in throughput for writing varints
   
   Yes, indeed. In non-hotspot code paths this would be over-optimization, and we may just want to leave the unrolling to the compiler. But in hot code paths such as the one changed in this PR, manual unrolling can lead to faster execution in cases where the compiler is not smart enough to do it on its own.
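
   To make the idea concrete, here is a minimal sketch (not the code from this PR; the method names and the exact branch structure are my own) contrasting a loop-based unsigned varint writer with a manually unrolled one that has one branch per possible encoded length:

   ```java
   import java.nio.ByteBuffer;
   import java.util.Arrays;

   public class VarintSketch {

       // Loop-based writer: one iteration per output byte; the JIT may or may not unroll this.
       static void writeUnsignedVarintLoop(int value, ByteBuffer buf) {
           while ((value & 0xFFFFFF80) != 0) {
               buf.put((byte) ((value & 0x7F) | 0x80)); // low 7 bits plus continuation bit
               value >>>= 7;
           }
           buf.put((byte) value); // final byte, continuation bit clear
       }

       // Manually unrolled writer: one branch per encoded length (1 to 5 bytes), so the
       // common short encodings take a short, predictable path with no loop overhead.
       static void writeUnsignedVarintUnrolled(int value, ByteBuffer buf) {
           if ((value & (~0 << 7)) == 0) {
               buf.put((byte) value);
           } else if ((value & (~0 << 14)) == 0) {
               buf.put((byte) (value | 0x80));
               buf.put((byte) (value >>> 7));
           } else if ((value & (~0 << 21)) == 0) {
               buf.put((byte) (value | 0x80));
               buf.put((byte) ((value >>> 7) | 0x80));
               buf.put((byte) (value >>> 14));
           } else if ((value & (~0 << 28)) == 0) {
               buf.put((byte) (value | 0x80));
               buf.put((byte) ((value >>> 7) | 0x80));
               buf.put((byte) ((value >>> 14) | 0x80));
               buf.put((byte) (value >>> 21));
           } else {
               buf.put((byte) (value | 0x80));
               buf.put((byte) ((value >>> 7) | 0x80));
               buf.put((byte) ((value >>> 14) | 0x80));
               buf.put((byte) ((value >>> 21) | 0x80));
               buf.put((byte) (value >>> 28));
           }
       }

       public static void main(String[] args) {
           ByteBuffer a = ByteBuffer.allocate(5);
           ByteBuffer b = ByteBuffer.allocate(5);
           writeUnsignedVarintLoop(300, a);
           writeUnsignedVarintUnrolled(300, b);
           // Both encode 300 as the two bytes 0xAC 0x02
           System.out.println(Arrays.equals(a.array(), b.array()));
       }
   }
   ```

   The unrolled version replaces a data-dependent loop with a short chain of predictable branches, which is the kind of transformation the JIT does not always apply on its own at every call site.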
   
   As an example of a prior change of a similar nature, see: 
https://github.com/apache/kafka/pull/11721 (that PR claims to improve CPU usage by 6% 
in a real workload).
   
   > What CPU architectures have this been tested on?
   
   The processor architecture for the above numbers is `Intel(R) Xeon(R) 
Platinum 8175M CPU @ 2.50GHz`. I have also added numbers for ARM now.
   
   > Do we have any data on how these gains translate in latency reduction with 
real requests from canonical workloads?
   
   I haven't run that benchmark yet. I believe the JMH benchmark gains are 
sufficient justification to merge this change on their own. That said, I do plan 
to benchmark latency reduction later, once my other performance-related PRs 
(there are some compression ones open) have also been merged.
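
   For reference, the style of JMH microbenchmark that produces this kind of throughput number looks roughly like the sketch below (this is not the benchmark from the Kafka jmh-benchmarks module; the class and method names are hypothetical, and it reuses the `VarintSketch` writers from the earlier sketch):

   ```java
   import java.nio.ByteBuffer;
   import java.util.Random;
   import java.util.concurrent.TimeUnit;

   import org.openjdk.jmh.annotations.*;

   @State(Scope.Thread)
   @BenchmarkMode(Mode.Throughput)
   @OutputTimeUnit(TimeUnit.MICROSECONDS)
   @Fork(1)
   @Warmup(iterations = 3)
   @Measurement(iterations = 5)
   public class VarintWriteBenchmark {

       private int[] values;
       private ByteBuffer buffer;

       @Setup(Level.Trial)
       public void setup() {
           // Fixed seed so both benchmark methods encode the same set of values
           Random random = new Random(42);
           values = new int[1024];
           for (int i = 0; i < values.length; i++)
               values[i] = random.nextInt(Integer.MAX_VALUE);
           buffer = ByteBuffer.allocate(values.length * 5); // worst case: 5 bytes per varint
       }

       @Benchmark
       public ByteBuffer loopWriter() {
           buffer.clear();
           for (int v : values)
               VarintSketch.writeUnsignedVarintLoop(v, buffer);
           return buffer; // return the buffer so the JIT cannot eliminate the work
       }

       @Benchmark
       public ByteBuffer unrolledWriter() {
           buffer.clear();
           for (int v : values)
               VarintSketch.writeUnsignedVarintUnrolled(v, buffer);
           return buffer;
       }
   }
   ```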

