@cblake - ok perfect. also naming things search-aware is important so I concur.

The python's OrderedDict: I didn't know that, it's a good thing nim is not 
alone then. And in fact the presence of this is enough to not go and rename it. 
(less disturbance)

In C++ the set/map is usually implemented with a red black tree. It fits a 
category which the language lawyers call a "node base container". Sorts are 
log(N) and happens at each operation that must trigger a rebalancing of the 
tree. (insert/remove)

The undered_set/unordered_map stuff are hash tables, implemented with an 
index(hashes)-array of pointers to linked lists. This allows for stable memory 
positioning of the values during the needs for rehashings. (decoupling index 
from values) Which allows operations to be "non iterator invalidating".

The hashes are not bound to the table's size, which means that they pass 
through a modulo operator before being used as indices. The statistics say that 
modulo is unfair in most cases, but the most fair for prime numbers. Which 
means that the "buckets" (hash table reservation size) are maintained and 
resized by jumps to the next reasonable prime number.

A problem of this scheme is that memory gets fragmented horribly, requiring 
allocators for serious work. I have written an open address hash map (here 
[https://sourceforge.net/projects/cgenericopenaddresshashmap/](https://sourceforge.net/projects/cgenericopenaddresshashmap/))
 that allows to keep everything in a flat buffer. This has always better 
performance from small types.

The nim implementation appears to require powers of two for the table size, 
which means that the modulo must be done by discarding the most significant 
bits of the hashes, which loses entropy compared to the prime numbers solution.

Reply via email to