Just a note from my own experience way back when: I tried hashing using various algos and measuring bucket use as the main comparison criteria. I found that the crypto hashes left a fair number of unused buckets. Of course, CRCs were far worse. What gave the most normal distribution was to simply take the bytes 4 at a time as 32-bit integers and simply add them.
Not to say that this is really the holy grail, just pointing out that compute speed shouldn't be the only criteria you use for comparison. On Wed, 2017-01-11 at 15:43 +0100, Richard Levitte wrote: > A note: I have absolutely nothing against the addition of SIPhash in > our collection of hash algos. My scepticism was only in regards to > using it as a string hasher for our hash tables indexes. > > Cheers, > Richard > > In message <20170111.153458.1623912899597806811.levi...@openssl.org> on Wed, > 11 Jan 2017 15:34:58 +0100 (CET), Richard Levitte <levi...@openssl.org> said: > > levitte> In message > <1e19cdfea8224717b3eee11e2d8ac...@usma1ex-dag1mb1.msg.corp.akamai.com> on > Wed, 11 Jan 2017 03:13:39 +0000, "Salz, Rich" <rs...@akamai.com> said: > levitte> > levitte> rsalz> The needs for OpenSSL's LHASH are exactly what SipHash was > designed for: fast on short strings. > levitte> rsalz> OpenSSL's hash currently *does not* call MD5 or SHA1; the MD5 > code is commented out. > levitte> rsalz> Yes, performance tests would greatly inform the decision. > levitte> > levitte> Done, using the reference siphash implementation. > levitte> > levitte> https://github.com/levitte/openssl/tree/test-string-hashes > levitte> > levitte> A run on my laptop gave these results: > levitte> > levitte> : ; ./util/shlib_wrap.sh apps/openssl speed siphash lhash > levitte> Doing lhash for 3s on 16 size blocks: 27635188 lhash's in 3.00s > levitte> Doing lhash for 3s on 64 size blocks: 6934726 lhash's in 3.00s > levitte> Doing lhash for 3s on 256 size blocks: 1698489 lhash's in 3.00s > levitte> Doing lhash for 3s on 1024 size blocks: 431185 lhash's in 3.00s > levitte> Doing lhash for 3s on 8192 size blocks: 53868 lhash's in 3.00s > levitte> Doing lhash for 3s on 16384 size blocks: 27041 lhash's in 3.00s > levitte> Doing siphash for 3s on 16 size blocks: 22488748 siphash's in > 3.00s > levitte> Doing siphash for 3s on 64 size blocks: 10485674 siphash's in > 3.00s > levitte> Doing siphash for 3s on 256 size blocks: 3320898 siphash's in > 3.00s > levitte> Doing siphash for 3s on 1024 size blocks: 894647 siphash's in > 3.00s > levitte> Doing siphash for 3s on 8192 size blocks: 114170 siphash's in > 3.00s > levitte> Doing siphash for 3s on 16384 size blocks: 57151 siphash's in > 3.00s > levitte> OpenSSL 1.1.1-dev xx XXX xxxx > levitte> built on: reproducible build, date unspecified > levitte> options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) > blowfish(ptr) > levitte> compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_THREADS > -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 > -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM > -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM > -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM > -DOPENSSLDIR="\"/usr/local/ssl\"" > -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -DDEBUG_UNUSED -Wswitch > -DPEDANTIC -pedantic -Wno-long-long -Wall -Wsign-compare -Wmissing-prototypes > -Wshadow -Wformat -Wtype-limits -Werror -Wa,--noexecstack > levitte> The 'numbers' are in 1000s of bytes per second processed. > levitte> type 16 bytes 64 bytes 256 bytes 1024 bytes > 8192 bytes 16384 bytes > levitte> lhash 147387.67k 147940.82k 144937.73k > 147177.81k 147095.55k 147679.91k > levitte> siphash 119939.99k 223694.38k 283383.30k > 305372.84k 311760.21k 312120.66k > levitte> > levitte> So it seems that for short strings, OPENSSL_LH_strhash (*) wins some, > levitte> while siphash wins big for larger strings. > levitte> > levitte> I have no idea how they compare with regard to distribution (which, > levitte> considering I ask for the same size output from both, should be the > levitte> main factor that affects the sensitivity to hash flooding)... > levitte> > levitte> Our use of OPENSSL_LH_strhash() is with configuration sections and > levitte> names, ASN.1 object names and the function names in the openssl app. > levitte> All our other uses of lhash use their own hashing functions, and are > levitte> usually very short still (definitely less than 16 bytes). > levitte> > levitte> My conclusion is that performance-wise, siphash doesn't give us any > levitte> advantage over OpenSSL_LH_strhash for our uses. > levitte> > levitte> Cheers, > levitte> Richard > levitte> > levitte> (*) Strictly speaking, it's a modified version that takes a length > and > levitte> tolerates all 8-bit bytes, including 0x00. > levitte> > levitte> -- > levitte> Richard Levitte levi...@openssl.org > levitte> OpenSSL Project http://www.openssl.org/~levitte/ > levitte> -- > levitte> openssl-dev mailing list > levitte> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev > levitte> -- openssl-dev mailing list To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev