On 7.10.2013 14:50, k...@rice.edu wrote:
> On Mon, Oct 07, 2013 at 12:41:58AM +0200, Tomas Vondra wrote:
>>> 2. Consider using a simpler/faster hash function, like FNV[1] or Jenkins[2].
>>>    For fun, try not hashing those ints at all and see how that performs 
>>> (that,
>>>    I think, is what you get from HashSet<int> in Java/C#).
>>
>> I've used crc32 mostly because it's easily available in the code (i.e.
>> I'm lazy), but I've done some quick research / primitive benchmarking
>> too. For example hashing 2e9 integers takes this much time:
>>
>> FNV-1   = 11.9
>> FNV-1a  = 11.9
>> jenkins = 38.8
>> crc32   = 32.0
>>
>> So it's not really "slow" and the uniformity seems to be rather good.
>>
>> I'll try FNV in the future, however I don't think that's the main issue
>> right now.
>>
> Hi Tomas,
> 
> If you are going to use a function that is not currently in the code,
> please consider xxhash:
> 
> http://code.google.com/p/xxhash/
> 
> Here are some benchmarks for some of the faster hash functions:
> 
> Name            Speed       Q.Score   Author
> xxHash          5.4 GB/s     10
> MumurHash 3a    2.7 GB/s     10       Austin Appleby
> SpookyHash      2.0 GB/s     10       Bob Jenkins
> SBox            1.4 GB/s      9       Bret Mulvey
> Lookup3         1.2 GB/s      9       Bob Jenkins
> CityHash64      1.05 GB/s    10       Pike & Alakuijala
> FNV             0.55 GB/s     5       Fowler, Noll, Vo
> CRC32           0.43 GB/s     9
> MD5-32          0.33 GB/s    10       Ronald L. Rivest
> SHA1-32         0.28 GB/s    10

Hi,

thanks for the link. I repeated the simple benchmark (code is here:
http://pastebin.com/e9BS03MA) with these results:

  FNV  duration    = 9.821
  FNVa duration    = 9.753
  jenkins duration = 37.658
  crc32 duration   = 7.127
  xxhash duration  = 68.814

The only difference is that this time I added -O3 (which I forgot last
time). That's probably the reason why CRC32 even faster than FNV.

Anyway, xxhash seems to be much slower than the other functions, at
least in this particular case.

I don't think the poor results necessarily contradict the table of
results you've posted, because that's on "a large block of random data"
while I'm hashing very short amounts of data (4B / 8B). So my guess is
xxhash init takes much more time than the other algorithms. Chances are
it'd be faster on large amounts of data, but that's not really useful.

Maybe I'll revisit it in the future if I'll decide to add support for
varlena types. Until then I'll stick with crc32 which is fast and has
much better Quality score than FNV.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to