On 7.10.2013 14:50, k...@rice.edu wrote: > On Mon, Oct 07, 2013 at 12:41:58AM +0200, Tomas Vondra wrote: >>> 2. Consider using a simpler/faster hash function, like FNV[1] or Jenkins[2]. >>> For fun, try not hashing those ints at all and see how that performs >>> (that, >>> I think, is what you get from HashSet<int> in Java/C#). >> >> I've used crc32 mostly because it's easily available in the code (i.e. >> I'm lazy), but I've done some quick research / primitive benchmarking >> too. For example hashing 2e9 integers takes this much time: >> >> FNV-1 = 11.9 >> FNV-1a = 11.9 >> jenkins = 38.8 >> crc32 = 32.0 >> >> So it's not really "slow" and the uniformity seems to be rather good. >> >> I'll try FNV in the future, however I don't think that's the main issue >> right now. >> > Hi Tomas, > > If you are going to use a function that is not currently in the code, > please consider xxhash: > > http://code.google.com/p/xxhash/ > > Here are some benchmarks for some of the faster hash functions: > > Name Speed Q.Score Author > xxHash 5.4 GB/s 10 > MumurHash 3a 2.7 GB/s 10 Austin Appleby > SpookyHash 2.0 GB/s 10 Bob Jenkins > SBox 1.4 GB/s 9 Bret Mulvey > Lookup3 1.2 GB/s 9 Bob Jenkins > CityHash64 1.05 GB/s 10 Pike & Alakuijala > FNV 0.55 GB/s 5 Fowler, Noll, Vo > CRC32 0.43 GB/s 9 > MD5-32 0.33 GB/s 10 Ronald L. Rivest > SHA1-32 0.28 GB/s 10
Hi, thanks for the link. I repeated the simple benchmark (code is here: http://pastebin.com/e9BS03MA) with these results: FNV duration = 9.821 FNVa duration = 9.753 jenkins duration = 37.658 crc32 duration = 7.127 xxhash duration = 68.814 The only difference is that this time I added -O3 (which I forgot last time). That's probably the reason why CRC32 even faster than FNV. Anyway, xxhash seems to be much slower than the other functions, at least in this particular case. I don't think the poor results necessarily contradict the table of results you've posted, because that's on "a large block of random data" while I'm hashing very short amounts of data (4B / 8B). So my guess is xxhash init takes much more time than the other algorithms. Chances are it'd be faster on large amounts of data, but that's not really useful. Maybe I'll revisit it in the future if I'll decide to add support for varlena types. Until then I'll stick with crc32 which is fast and has much better Quality score than FNV. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers