On Monday, 24 June 2013 at 11:11:42 UTC, qznc wrote:
On Sunday, 23 June 2013 at 21:22:40 UTC, Walter Bright wrote:
https://github.com/D-Programming-Language/dmd/blob/master/src/root/stringtable.c#L21
Profiling shows the calcHash function is a significant
contributor to compilation time (3.25% of total time). So
making it faster is a win. Even making dmd 1% faster would be
a nice win - all those little drops add up.
There are many, many string hash functions findable through
google. Anyone want to spend the effort to make a faster one?
Remember, ya gotta prove it's faster!
A nice timing test would be the time expending compiling
Phobos. I would suggest that the 64 bit build of dmd be used
for timing tests.
Also, be careful, many of those hash functions on the
intarnets have a license that makes it unusable for dmd.
My first idea was to look at Python, since it requires a lot of
hash calculation dynamically and probably is one of the most
tuned implementations. Interesting how naive it is:
http://hg.python.org/cpython/file/7aab60b70f90/Objects/object.c#l759
Just a simple for loop over chars. No switch.
Wikipedia: "Python Software Foundation License (PSFL) is a
BSD-style permissive free software license"
Python is one of the slower interpreted languages. It would be
more interesting to look at luajit which actually does something
clever.
If the string is at least 4 chars long it only hashes the first 4
bytes, the last 4, the 4 starting at floor(len/2)-2 and the 4
starting at floor(len/4)-1. Any of these may overlap of course
but that isn't a problem.
The code is MIT licensed and can be found here:
http://repo.or.cz/w/luajit-2.0.git/blob/053041a9f47e3d341f98682ea1e4907a578e4920:/src/lj_str.c#l104
As others have mentioned it will be hard to improve the speed
enough to get dmd 1% faster - but if anything can it's dirty
tricks.