[ 
https://issues.apache.org/jira/browse/CASSANDRA-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16210932#comment-16210932
 ] 

Sam Tunnicliffe commented on CASSANDRA-13291:
---------------------------------------------

The comment in {{o.a.c.utils.MD5Digest}} says that the 
{{ThreadLocal<MessageDigest>}} only really exists for the benefit of 
{{GuidGenerator}} & {{RandomPartitioner}}, but we also use it for computing 
prepared statement ids in {{QueryProcessor}}.

Only 2 functions in {{GuidGenerator}} are actually used, both by RP, so these 
can be moved there and GG removed entirely.

This also highlights some inconsistency between the calculation of ids for 
statements and resultset metadata. Prior to this patch, both calculations used 
the TL on {{FBUtilities}}. Now the former uses the relocated TL in 
{{MD5Digest}}, via it's {{compute}} method, whereas the latter has been 
switched to use a single-use {{Hasher}} instance. Is it a goal to be able to 
switch out the hashing algorithm for these 2 cases too? At least for 
consistency's sake, it seems like either both or neither should be using 
{{HashingUtils}}, although this leads to my next point.

Can we clarify the scope of the planned future changes? The original 
description says that profiling shows a lot of time spent in {{MessageDigest}}, 
but it's not clear whether that's simply in replicas generating digests for 
{{ReadCommand}} or in all usages (i.e. including RP and the statement/metadata 
ids too). If the goal for this ticket is to enable us to switch all possible 
MD5 uses (i.e. everything except RP), then it also needs to deal with the use 
of {{MD5Digest}} for metadata/statement ids otherwise there's going to be 
further refactoring required when we come to make the switch. 

On the other hand, if we only care about digest calculation for 
{{ReadCommands}} that's fine, but I think in that case we need to:
* be clear about the scope and the fact that we're not eliminating MD5 
MessageDigest as it will still be used on the majority of requests, at least on 
the coordinator 
* ensure that everywhere an instance of {{MD5Digest}} is passed, it doesn't 
originate from {{HashingUtils.CURRENT_HASH_FUNCTION}}.


> Replace usages of MessageDigest with Guava's Hasher
> ---------------------------------------------------
>
>                 Key: CASSANDRA-13291
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13291
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>         Attachments: CASSANDRA-13291-trunk.diff
>
>
> During my profiling of C* I frequently see lots of aggregate time across 
> threads being spent inside the MD5 MessageDigest implementation. Given that 
> there are tons of modern alternative hashing functions better than MD5 
> available -- both in terms of providing better collision resistance and 
> actual computational speed -- I wanted to switch out our usage of MD5 for 
> alternatives (like adler128 or murmur3_128) and test for performance 
> improvements.
> Unfortunately, I found given the fact we use MessageDigest everywhere --  
> switching out the hashing function to something like adler128 or murmur3_128 
> (for example) -- which don't ship with the JDK --  wasn't straight forward.
> The goal of this ticket is to propose switching out usages of MessageDigest 
> directly in favor of Hasher from Guava. This means going forward we can 
> change a single line of code to switch the hashing algorithm being used 
> (assuming there is an implementation in Guava).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to