[ https://issues.apache.org/jira/browse/CASSANDRA-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207505#comment-16207505 ]
Sam Tunnicliffe commented on CASSANDRA-13291:
---------------------------------------------

{{RandomPartitioner::hashToBigInteger}} is double hashing its input (I think this is what Jason was referring to in his previous comment), and so its output doesn't match the previous implementation.

When using RP, getting a token for a key is probably the hottest path for hashing. The current code uses a {{ThreadLocal<MessageDigest>}} which it resets after use, presumably to mitigate that. Under the covers {{Hasher.md5().hashBytes()}} clones a prototype {{MessageDigest}}, so this is going to result in a lot more instance creation. (See {{AbstractStreamingHashFunction::hashBytes -> MessageDigestHashFunction::newHasher}}). I'm not sure of the original motivations for the threadlocal, or whether those are still justified, but it seems like we should investigate outside of microbenchmarks before committing this.

> Replace usages of MessageDigest with Guava's Hasher
> ---------------------------------------------------
>
>                 Key: CASSANDRA-13291
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>         Attachments: CASSANDRA-13291-trunk.diff
>
>
> During my profiling of C* I frequently see lots of aggregate time across
> threads being spent inside the MD5 MessageDigest implementation. Given that
> there are tons of modern alternative hashing functions better than MD5
> available -- both in terms of providing better collision resistance and
> actual computational speed -- I wanted to switch out our usage of MD5 for
> alternatives (like adler128 or murmur3_128) and test for performance
> improvements.
> Unfortunately, I found given the fact we use MessageDigest everywhere --
> switching out the hashing function to something like adler128 or murmur3_128
> (for example) -- which don't ship with the JDK -- wasn't straight forward.
> The goal of this ticket is to propose switching out usages of MessageDigest
> directly in favor of Hasher from Guava. This means going forward we can
> change a single line of code to switch the hashing algorithm being used
> (assuming there is an implementation in Guava).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
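
The two concerns raised in the comment above (per-call instance creation and double hashing) can be illustrated with a minimal JDK-only sketch. It is not Cassandra or Guava code: the class name {{DigestReuseSketch}} and all of its methods are hypothetical, the clone-per-call method only approximates the behaviour the comment attributes to Guava, and reading the digest as a non-negative {{BigInteger}} is an assumption made for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration only; names here are not taken from Cassandra or Guava.
final class DigestReuseSketch {

    // Strategy A: one MessageDigest per thread, reused across calls.
    private static final ThreadLocal<MessageDigest> LOCAL_MD5 =
            ThreadLocal.withInitial(DigestReuseSketch::newMd5);

    // Strategy B: a prototype that is never updated and is cloned for every call.
    private static final MessageDigest PROTOTYPE = newMd5();

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Reuses the calling thread's digest; no new digest instance per call.
    static byte[] md5ThreadLocal(byte[] input) {
        MessageDigest md = LOCAL_MD5.get();
        md.reset();              // defensive: clear any partial state left by a failed caller
        return md.digest(input); // digest() also resets the instance when it completes
    }

    // Allocates a fresh digest instance on every call by cloning the prototype.
    static byte[] md5Cloned(byte[] input) {
        try {
            MessageDigest md = (MessageDigest) PROTOTYPE.clone();
            return md.digest(input);
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException("MD5 digest is not cloneable", e);
        }
    }

    // Assumed token-like conversion: hash once, read as a non-negative integer.
    static BigInteger hashOnce(byte[] key) {
        return new BigInteger(md5ThreadLocal(key)).abs();
    }

    // The double-hashing mistake: the digest of the digest gives a different value.
    static BigInteger hashTwice(byte[] key) {
        return new BigInteger(md5ThreadLocal(md5ThreadLocal(key))).abs();
    }

    static String toHex(byte[] digest) {
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) {
        byte[] key = "example-key".getBytes(StandardCharsets.UTF_8);
        System.out.println("thread-local: " + toHex(md5ThreadLocal(key)));
        System.out.println("cloned:       " + toHex(md5Cloned(key)));
        System.out.println("same digest:  "
                + Arrays.equals(md5ThreadLocal(key), md5Cloned(key)));
        System.out.println("hashed once == hashed twice: "
                + hashOnce(key).equals(hashTwice(key)));
    }
}
```

Both strategies produce the same digest; they differ only in how many {{MessageDigest}} instances are created on a hot path. Whether that difference matters in practice is the open question the comment suggests investigating outside of microbenchmarks.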