viktorsomogyi edited a comment on pull request #9519:
URL: https://github.com/apache/kafka/pull/9519#issuecomment-725539926


   @ijuma I don't think I've looked at it yet, but I will try it out soon; its
speed seems promising. I haven't found a 128-bit Java implementation, but I can
also try out the 64-bit variant in lz4. I need to check whether the collision
rate changes significantly if we reduce the hash size to 64 bits. By the
birthday bound, for 100 million distinct keys the probability of at least one
collision is 1.469367×10^-23 with a 128-bit hash, while with 64 bits it is
2.710138×10^-4. That is significantly larger, but it might still be good enough
for the log cleaning use case.
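
   As a sanity check, the standard birthday approximation
p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n keys and a b-bit hash reproduces the
figures above to within rounding. A minimal sketch (the function name is only
illustrative):

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound probability that at least two of n distinct keys share a b-bit hash.

    Computes 1 - exp(-n(n-1) / 2^(bits+1)) via expm1 so that tiny results
    (around 1e-23 for 128 bits) are not rounded down to zero.
    """
    exponent = n * (n - 1) / 2.0 ** (bits + 1)
    return -math.expm1(-exponent)


if __name__ == "__main__":
    n = 100_000_000  # 100 million distinct keys
    for bits in (64, 128):
        print(f"{bits}-bit: {collision_probability(n, bits):.6e}")
```

Running it prints roughly 2.71e-04 for 64 bits and 1.47e-23 for 128 bits.
`math.expm1` is used instead of `1 - math.exp(...)` because the latter evaluates
to exactly 0.0 in double precision for the 128-bit case.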
   
   @junrao I can compile statistics on this, but in my tests the collision rate
was comparable: sometimes slightly better, sometimes slightly worse. The
attached image shows what I collected manually for the largest test cases, but
I'll do something more thorough if you think it's needed :) Based on this,
though, I think the answer is yes: the uniqueness of the generated hashes is on
the same level.
   
![image](https://user-images.githubusercontent.com/1820518/98840553-3aa37180-2447-11eb-87e4-198856261d23.png)
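
   A comparison like this can also be approximated empirically by truncating
each digest to a small width, so that collisions actually occur at test-sized
key counts, and counting them per algorithm. This is a rough sketch under
stated assumptions: the synthetic `key-{i}` keys, the 32-bit truncation, and
the `hashlib` algorithms (MD5, SHA-256, BLAKE2b) are illustrative choices, not
the setup behind the attached numbers; the hash actually under evaluation would
be substituted in.

```python
import hashlib


def truncated_hash(algorithm: str, key: bytes, nbytes: int) -> bytes:
    # Hash the key with the named hashlib algorithm and keep only the first nbytes.
    return hashlib.new(algorithm, key).digest()[:nbytes]


def count_collisions(algorithm: str, keys, nbytes: int) -> int:
    """Count keys whose truncated hash was already produced by an earlier key."""
    seen = set()
    collisions = 0
    for key in keys:
        h = truncated_hash(algorithm, key, nbytes)
        if h in seen:
            collisions += 1
        else:
            seen.add(h)
    return collisions


if __name__ == "__main__":
    n = 200_000
    keys = [f"key-{i}".encode() for i in range(n)]
    # Birthday estimate of colliding pairs for 32-bit (4-byte) hashes.
    expected = n * (n - 1) / 2 / 2 ** 32
    print(f"expected ~{expected:.2f}")
    for algorithm in ("md5", "sha256", "blake2b"):
        print(algorithm, count_collisions(algorithm, keys, nbytes=4))
```

With 200,000 keys and 32-bit truncation the birthday estimate is about 4.7
colliding pairs, so each algorithm's count should be of a similar magnitude; a
hash with noticeably worse uniqueness would stand out.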


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

