I have always been curious about profiling hashmaps with real
data, so I did some more of it. Here are the results:
My implementation. Power of two size (25% free):
building hashmap: 8 seconds 28880605 ticks
looking entries up: 0 seconds 31685 ticks
My implementation. Prime number size (25% free):
building hashmap: 8 seconds 26802252 ticks
looking entries up: 0 seconds 27719 ticks
My implementation. Prime number size (10% free):
building hashmap: 8 seconds 29323952 ticks
looking entries up: 0 seconds 32768 ticks
My implementation. Prime number size (20% free):
26877474 entries
building hashmap: 8 seconds 28895211 ticks
sum 74128
looking entries up: 0 seconds 33222 ticks
My implementation. Prime number size (50% free):
building hashmap: 8 seconds 27738475 ticks
looking entries up: 0 seconds 25696 ticks
OrderedAA (implementation of Daniel Kozak):
building array: 13 seconds 45663219 ticks
lookup: 0 seconds 236323 ticks
Builtin AA:
building array: 14 seconds 47175035 ticks
lookup: 0 seconds 33692 ticks
You can see that both my implementation and Daniel Kozak's outperform
the builtin AA when filling the AA. Both my version and Daniel Kozak's
preallocate the internal array, although Daniel Kozak's version does one
additional allocation per entry to build the doubly linked lists.
Even without preallocation, my implementation still outperforms the builtin AA.
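To make the preallocation point concrete, here is a minimal open-addressing sketch (illustrative only, not the implementation benchmarked above; `OpenMap`, its 25%-free growth rule, and the `rehashes` counter are all my own invention for the demo). Sizing the slot array to the final entry count up front means no grow-and-rehash passes happen while filling:

```python
# Minimal open-addressing hashmap sketch (NOT the code from this thread).
# Preallocating to the final size skips every intermediate rehash.

class OpenMap:
    def __init__(self, capacity=8):
        self.slots = [None] * capacity   # preallocated slot array
        self.used = 0
        self.rehashes = 0                # count grow events for the demo

    def _find(self, key):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)  # linear probing
        return i

    def put(self, key, value):
        if (self.used + 1) * 4 > len(self.slots) * 3:  # keep ~25% free
            self._grow()
        i = self._find(key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)

    def _grow(self):
        self.rehashes += 1
        old = self.slots
        self.slots = [None] * (len(old) * 2)
        self.used = 0
        for entry in old:                # every grow re-inserts all entries
            if entry is not None:
                self.slots[self._find(entry[0])] = entry
                self.used += 1

    def get(self, key):
        entry = self.slots[self._find(key)]
        return entry[1] if entry is not None else None

n = 10_000
grown = OpenMap()                  # starts small, rehashes repeatedly
pre = OpenMap(capacity=n * 2)      # preallocated, never rehashes
for m in (grown, pre):
    for k in range(n):
        m.put(k, k)
print(grown.rehashes, pre.rehashes)  # grown rehashes several times, pre zero
```

Each rehash touches every stored entry again, which is exactly the work that preallocation avoids during filling.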
When looking up data, my implementation outperforms both other
implementations, though it is only slightly faster than the
builtin one. Daniel Kozak's version is 8 times slower during lookup
than mine. I think this is mostly due to cache
misses caused by the linked-list storage of the entries.
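The structural difference behind that guess can be sketched as follows (a hypothetical chained map of my own, not Daniel Kozak's actual code): with chaining, every entry is a separately allocated node, so a lookup chases pointers into scattered memory, whereas open addressing probes adjacent slots of a single contiguous array:

```python
# Sketch of chained storage (NOT Daniel Kozak's implementation).
# Each entry is its own heap object; walking a bucket's chain is a
# pointer chase, which is what tends to cause the cache misses.

class Node:
    __slots__ = ("key", "value", "next")
    def __init__(self, key, value, next):
        self.key, self.value, self.next = key, value, next

class ChainedMap:
    def __init__(self, capacity):
        self.buckets = [None] * capacity

    def put(self, key, value):
        i = hash(key) % len(self.buckets)
        # one additional allocation per entry, as noted above
        self.buckets[i] = Node(key, value, self.buckets[i])

    def get(self, key):
        node = self.buckets[hash(key) % len(self.buckets)]
        while node is not None:          # pointer chase through the chain
            if node.key == key:
                return node.value
            node = node.next
        return None
```

The cache effect itself only shows up in a compiled language with real memory layout; the sketch just shows the access pattern that differs.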
It is also interesting to note that using prime number sizes for the
internal array of the AA slightly improved lookup performance in my
implementation. A certain portion of the entries in the internal array
is always kept free: reducing this amount leads to worse lookup
performance, while increasing it gives better performance. You can gain
a few percent of speed if you are willing to waste 50% of the memory,
but my measurements show that 25% free entries is the sweet spot.
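The free-slot effect is easy to reproduce in a toy simulation (my own illustration, unrelated to the benchmark numbers above): with linear probing, the fuller the table, the longer the average probe sequence grows, so keeping slots free buys shorter lookups at the cost of memory.

```python
# Simulate average extra probes per insertion with linear probing at
# different fill levels. Numbers are illustrative only.
import random

def avg_probes(load, capacity=100_003, seed=1):  # prime-sized table
    rng = random.Random(seed)
    slots = [False] * capacity
    n = int(load * capacity)
    total = 0
    for _ in range(n):
        i = rng.randrange(capacity)   # stand-in for a good hash
        while slots[i]:               # linear probing past occupied slots
            i = (i + 1) % capacity
            total += 1
        slots[i] = True
    return total / n

for free in (50, 25, 10):
    load = 1 - free / 100
    print(f"{free}% free: {avg_probes(load):.2f} extra probes per insert")
```

The probe count climbs steeply as the free fraction shrinks, which matches the pattern in the measurements: 10% free is measurably worse, while going from 25% to 50% free buys only a little.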