[ 
https://issues.apache.org/jira/browse/CASSANDRA-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207505#comment-16207505
 ] 

Sam Tunnicliffe commented on CASSANDRA-13291:
---------------------------------------------

{{RandomPartitioner::hashToBigInteger}} is double hashing its input (I think 
this is what Jason was referring to in his previous comment), and so its output 
doesn't match the previous implementation.
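
To illustrate why double hashing changes the result, here is a minimal JDK-only sketch. It is not the code from the patch: the class and method names are hypothetical, and taking the absolute value of the digest as a {{BigInteger}} is an assumption about how the key-to-token conversion behaves.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; names are illustrative, not taken from the patch.
final class DoubleHashSketch {
    // Plain MD5 of the input using the JDK's MessageDigest.
    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hash the key once, then interpret the digest as a non-negative integer
    // (assumed to mirror the original key-to-token conversion).
    static BigInteger hashOnce(byte[] key) {
        return new BigInteger(md5(key)).abs();
    }

    // Hash the already-hashed bytes a second time: the kind of mistake
    // described above. The result is a different number for the same key.
    static BigInteger hashTwice(byte[] key) {
        return new BigInteger(md5(md5(key))).abs();
    }

    public static void main(String[] args) {
        byte[] key = "some-partition-key".getBytes(StandardCharsets.UTF_8);
        System.out.println("single: " + hashOnce(key));
        System.out.println("double: " + hashTwice(key));
        System.out.println("equal?  " + hashOnce(key).equals(hashTwice(key)));
    }
}
```

Any token computed this way would disagree with tokens already assigned to data on disk, which is why the output has to match the previous implementation exactly.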

When using RP, getting a token for a key is probably the hottest path for 
hashing. The current code uses a {{ThreadLocal<MessageDigest>}}, which it resets 
after use, presumably to avoid allocating a new digest on every call. Under the 
covers, {{Hashing.md5().hashBytes()}} clones a prototype {{MessageDigest}}, so 
this is going to result in a lot more instance creation (see 
{{AbstractStreamingHashFunction::hashBytes -> 
MessageDigestHashFunction::newHasher}}).
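
The difference between the two allocation patterns can be sketched with the JDK alone. This is only an approximation of what is described above: the class and method names are hypothetical, and the clone-based variant imitates the clone-a-prototype behaviour rather than calling Guava itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch; not the Cassandra or Guava implementation.
final class DigestReuseSketch {
    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Pattern 1: one MessageDigest per thread, reused for every call.
    private static final ThreadLocal<MessageDigest> LOCAL_MD5 =
            ThreadLocal.withInitial(DigestReuseSketch::newMd5);

    static byte[] hashWithThreadLocal(byte[] input) {
        MessageDigest md = LOCAL_MD5.get();
        md.reset();              // discard any state left by an earlier partial update
        return md.digest(input); // digest() also leaves the instance reset
    }

    // Pattern 2: a shared prototype that is never updated, cloned per call.
    private static final MessageDigest PROTOTYPE = newMd5();

    static byte[] hashWithClone(byte[] input) {
        try {
            // A new MessageDigest instance is allocated on every call.
            MessageDigest md = (MessageDigest) PROTOTYPE.clone();
            return md.digest(input);
        } catch (CloneNotSupportedException e) {
            // Fall back to a fresh instance if the provider cannot clone.
            return newMd5().digest(input);
        }
    }

    public static void main(String[] args) {
        byte[] key = "some-partition-key".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(hashWithThreadLocal(key), hashWithClone(key)));
    }
}
```

Both variants return the same bytes; they differ only in how many {{MessageDigest}} instances get created on the hot path, which is the cost that needs measuring under a realistic workload.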

I'm not sure of the original motivations for the threadlocal, or whether those 
are still justified, but it seems like we should investigate outside of 
microbenchmarks before committing this.


> Replace usages of MessageDigest with Guava's Hasher
> ---------------------------------------------------
>
>                 Key: CASSANDRA-13291
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>         Attachments: CASSANDRA-13291-trunk.diff
>
>
> During my profiling of C* I frequently see lots of aggregate time across 
> threads being spent inside the MD5 MessageDigest implementation. Given that 
> there are tons of modern alternative hashing functions better than MD5 
> available -- both in terms of providing better collision resistance and 
> actual computational speed -- I wanted to switch out our usage of MD5 for 
> alternatives (like adler128 or murmur3_128) and test for performance 
> improvements.
> Unfortunately, because we use MessageDigest directly everywhere, switching 
> the hashing function to something like adler128 or murmur3_128 (neither of 
> which ships with the JDK) wasn't straightforward.
> The goal of this ticket is to propose replacing direct usages of 
> MessageDigest with Hasher from Guava. Going forward, this means we can 
> change a single line of code to switch the hashing algorithm being used 
> (assuming there is an implementation in Guava).


