On 06/23/2013 11:22 PM, Walter Bright wrote:
Profiling shows the calcHash function is a significant contributor to
compilation time (3.25% of total time). So making it faster is a win.
Even making dmd 1% faster would be a nice win - all those little drops
add up.


You'll find a benchmark at the end of the post.
http://forum.dlang.org/thread/op.v2yky9pdsqu...@dawg-freebsd.lan


Obviously the switch in a for loop is a branch prediction killer.
This should work much better.

for (size_t i = 0; i < len / 4; ++i)
{
    hash *= 37;
    hash += *(const uint32_t *)str;
    str += 4;
}
switch (len & 3)
{
case 0:
    break;
case 1:
    hash *= 37;
    hash += *(const uint8_t *)str;
    break;
case 2:
    hash *= 37;
    hash += *(const uint16_t *)str;
    break;
case 3:
    hash *= 37;
    hash += (*(const uint16_t *)str << 8) +
             ((const uint8_t *)str)[2];
    break;
default:
    assert(0);
}
return hash;

Finding a faster hash algorithm isn't simple because prime multiplication is so simple. At least we're not diving the hash result by 37 any longer. I think using the Hsieh hash is a good choice because it has quite an OK distribution so it leads to fewer collisions. It's also pretty fast for short strings and we already use it in druntime.

Reply via email to