viktorsomogyi commented on pull request #9519:
URL: https://github.com/apache/kafka/pull/9519#issuecomment-725539926


   @ijuma I don't think I've looked at it, but I will try it out soon; its 
speed seems promising. I haven't found a 128-bit Java implementation, but the 
64-bit one in lz4 can be tried out as well. I need to see whether the collision 
rate changes significantly if we reduce the hash size to 64 bits. Theoretically, 
for 100 million distinct keys the probability of a collision is 1.469367×10^-23 
with 128 bits, while for 64 bits it's 2.710138×10^-4, which is significantly 
larger but might still be acceptable for the log cleaning use case.
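
   For reference, the figures above can be reproduced with the usual birthday 
approximation, p ≈ 1 - exp(-n(n-1) / (2·2^b)), for n keys and a b-bit hash. 
This is only a sketch, assuming the hash output is uniformly distributed; the 
class and method names are made up for illustration and are not part of the PR.

```java
public final class CollisionProbability {

    /**
     * Birthday approximation of the chance that at least two of n distinct
     * keys share the same hash, assuming a uniform hash with `bits` output bits:
     * p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)).
     */
    static double collisionProbability(long n, int bits) {
        // Number of key pairs, computed in double to avoid long overflow.
        double pairs = 0.5 * (double) n * (double) (n - 1);
        // Size of the hash space; 2^128 is exactly representable as a double.
        double space = Math.pow(2.0, bits);
        // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
        return -Math.expm1(-pairs / space);
    }

    public static void main(String[] args) {
        long keys = 100_000_000L; // 100 million distinct keys, as in the comment
        for (int bits : new int[] {64, 128}) {
            System.out.printf("%d bits: %.6e%n", bits, collisionProbability(keys, bits));
        }
    }
}
```

   Math.expm1 is used because 1 - Math.exp(-x) would round to 0 for the 
128-bit case.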
   
   @junrao I can compile statistics for this, but in my tests the collision 
rate was on the same level: sometimes slightly better, sometimes slightly 
worse. The attached image shows what I collected manually for the largest test 
cases, but I'll put together something more elaborate if you think it's needed :)
   
![image](https://user-images.githubusercontent.com/1820518/98840553-3aa37180-2447-11eb-87e4-198856261d23.png)

