crypto:md5 vs erlang:md5
------------------------

                 Key: COUCHDB-757
                 URL: https://issues.apache.org/jira/browse/COUCHDB-757
             Project: CouchDB
          Issue Type: Improvement
         Environment: GNU/Linux
            Reporter: Filipe Manana
         Attachments: crypto_md5.patch

Just noticed that crypto:md5 is several times faster than erlang:md5 (roughly 
3.5x to 6x in the single-run timings below) when hashing just 4 KB or 8 KB of 
data. We use md5 hashing when writing and reading documents and attachments 
through couch_file and couch_stream.

Eshell V5.8  (abort with ^G)
1> crypto:start().
ok
2> Bin1 = crypto:rand_bytes(4 * 1024).
<<92,239,233,29,1,237,96,193,188,97,4,72,51,90,96,91,187,
  112,112,198,7,173,105,99,205,65,105,94,144,...>>
3>        
3> {T1, _} = timer:tc(erlang, md5, [Bin1]).
{211,
 <<20,235,111,74,212,254,194,144,49,70,205,105,124,106,
   131,230>>}
4> 
4> {T2, _} = timer:tc(crypto, md5, [Bin1]).
{60,
 <<20,235,111,74,212,254,194,144,49,70,205,105,124,106,
   131,230>>}
5> 
5> Bin2 = crypto:rand_bytes(8 * 1024).     
<<246,66,158,227,62,127,62,239,202,232,133,244,191,9,136,
  6,164,179,109,166,253,41,144,185,177,39,177,88,142,...>>
6> 
6> {T3, _} = timer:tc(erlang, md5, [Bin2]).
{446,
 <<7,55,252,42,249,30,58,22,245,12,111,82,131,58,199,51>>}
7> 
7> {T4, _} = timer:tc(crypto, md5, [Bin2]).
{77,
 <<7,55,252,42,249,30,58,22,245,12,111,82,131,58,199,51>>}
8> 
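
A single timer:tc call per case is noisy, so the comparison could be repeated
and averaged. The following is only a minimal sketch of that idea: the module
name md5_bench and its functions are hypothetical and are not part of CouchDB
or of the attached patch. The hash function is passed in as a fun so that the
module itself does not depend on crypto.

```erlang
-module(md5_bench).
-export([avg_micros/3, repeat/3]).

%% Average time in microseconds of one Fun(Bin) call, measured over N calls.
%% timer:tc/3 is used because it exists in both old and new OTP releases.
avg_micros(Fun, Bin, N) when is_function(Fun, 1), is_binary(Bin),
                             is_integer(N), N > 0 ->
    {TotalMicros, ok} = timer:tc(?MODULE, repeat, [Fun, Bin, N]),
    TotalMicros / N.

%% Call Fun(Bin) N times and discard the digests.
%% Exported only so that timer:tc/3 can invoke it.
repeat(_Fun, _Bin, 0) ->
    ok;
repeat(Fun, Bin, N) when N > 0 ->
    _ = Fun(Bin),
    repeat(Fun, Bin, N - 1).
```

With Bin1 from the session above, this could be called as
md5_bench:avg_micros(fun erlang:md5/1, Bin1, 10000) and
md5_bench:avg_micros(fun crypto:md5/1, Bin1, 10000). On newer OTP releases
where crypto:md5/1 is no longer available, fun(B) -> crypto:hash(md5, B) end
should be a suitable substitute.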


I know there is a ticket that aims to make it possible to remove the 
dependency on the crypto module, but for environments where that dependency 
is not a problem, using crypto:md5 would be a plus.
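
One way to keep crypto optional would be to pick the md5 implementation at
runtime. This is a sketch under assumptions, not what the attached patch
does: the module md5_select and the function md5_fun/0 are hypothetical, and
it assumes the crypto application has been started wherever the OTP release
requires that before crypto:md5/1 can be called.

```erlang
-module(md5_select).
-export([md5_fun/0]).

%% Return crypto:md5/1 when the crypto module can be loaded and exports it,
%% otherwise fall back to the built-in erlang:md5/1.
%% Intended to be called once (e.g. at startup) and the result reused,
%% so the availability check is not paid on every hash.
md5_fun() ->
    case code:ensure_loaded(crypto) of
        {module, crypto} ->
            case erlang:function_exported(crypto, md5, 1) of
                true  -> fun crypto:md5/1;
                false -> fun erlang:md5/1
            end;
        {error, _Reason} ->
            fun erlang:md5/1
    end.
```

A caller would do Md5 = md5_select:md5_fun() once and then Md5(Data) for each
chunk. Both implementations return the same 16-byte digest, as the shell
session above shows, so the choice only affects speed.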

I ran a test that wrote 400 attachments of about 60 KB each and saw an average 
response time of 0.16 s with crypto:md5 versus 0.18 s with erlang:md5.

