In playing around trying to understand and make a better AA, I have updated again the dsource/projects/aa/trunk/druntime/aaA.d, this time to be a single linked list version.
In the template versions of various implementations, and in Java HashMap, I noticed that its nice to be able to specify the initial capacity, and a load ratio (nodes to table length), and see how performance varies. It would be nice to have an _aaInit function, where these things are specified. And corresponding .init property. And a .capacity (read and write) property. And a .loadRatio (double) property. How about a .statistics property? valuetype[keytype] aa. aa.init(capacity, loadratio). ...fill it up... uint[] stats = aa.statistics; The implementation template classes now have these things. Capacity and load ratio are pretty standard for AA. They could be added to the current AA as well, which I might get around to. The _aaInit function should push the TypeInfo_AssociativeArray, once and for all, and not have different TypeInfos and valuesizes, re-pushed to different functions, var-arged before keys and values, as if they might change at any time. _aaRehash may get a different TypeInfo when called directly than when called from inside _aaGet, which I am sure will be corrected soon. There may be a vague general principle here, of having classes that can be auto-initialized if they support certain properties. The AA is really a class type hiding in a reference, and it would be nice to support that directly. Since the implementation is a reference type, the compiler can check for nullness, and call an init thunk, before calling standard AA functions and properties that require non-null reference. Its either initialized or it isn't. The sizeof AA is 4, on my machine, not 8, as is supposed to be the common value in the D web site documentation on AA. Its the size of a pointer, so the site would be more accurate to just say its an opaque reference pointer sized type. Again the aaA.d has been given a runtime TypeInfo initialisation tweak so that it can shovel integer value keys straight into the hash value of the nodes, and avoid TypeInfo compare and getHash. It needs a bit more testing. Its easier to work and experiment with than the more complicated tree version. The current AA has nice balanced trees , but they do not make much of a performance difference at lower load ratios ( < 4 ), so that extra pointer seems to be a waste. The current AA will rehash before the load level becomes too sluggish. I have uploaded some better refined tests to compare the various implementations at dsource/aa. The linear congruential sequence generator variant is an example of "open addressing" hash table, and its performance is limited by the drawbacks of this model. The other implementations do better, waste less memory, and can handle table load ratios greater than 1. Performance varies very much with the sort of keys used, much more so than the implementation variants, and table loads. Wrapping a string in class and caching the hash value gets much better performance than raw string keys. The string hash distribution was a bit better with a hash function using 31 as the character multiplier rather than 11. -- May be I will get a job using Java soon..