On Tue, 29 Mar 2011 22:20:05 -0400, Nick Sabalausky <a@a.a> wrote:

"spir" <denis.s...@gmail.com> wrote in message
news:mailman.2909.1301443345.4748.digitalmars-d-le...@puremagic.com...
On 03/30/2011 01:24 AM, Nick Sabalausky wrote:
My understanding of hash tables is that they allocate a fixed-size array and
map keys to indices within the range 0..predefined_length_of_the_AA.

So I've been wondering, how many elements do D's built-in AAs have? And
what's the content of each one, just a single pointer?

Each element is a data structure, usually called a bucket (typically a linked
list), storing the (key:value) pairs whose key, once hashed and modulo-ed,
maps to that index. That's why the famous O(1) lookup time for hash tables is
rather theoretical: the constant includes the average time to hash a key,
which varies widely, plus the average time to linearly traverse the bucket.
The latter depends on the table's load factor (= number of elements / number
of buckets) and on how evenly the keys are distributed across the buckets.

The key (!) points are finding a good hash function that spreads the keys
evenly (which depends on the actual key data, not only on its type...), though
better hash functions quickly cost much more time (the ones used in practice
are rather simple & fast); and finding an optimal load factor and growth
scheme. In practice, all of this tends to make hash tables an implementation
nightmare (for me). I'd love to find practical alternatives, but efficient
binary trees are also complex and depend even more on the kind of keys, I
guess.
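
To make the chaining idea concrete, here is a minimal sketch of a lookup with
separate chaining. It is not how druntime actually implements it; the names
(Node, find) and the layout are made up for illustration:

struct Node(K, V)
{
    K key;
    V value;
    Node* next;   // next entry in this bucket's collision chain
}

// Look up a key: hash it, reduce the hash to a bucket index, walk the chain.
V* find(K, V)(Node!(K, V)*[] buckets, K key)
{
    auto idx = typeid(K).getHash(&key) % buckets.length;
    for (auto n = buckets[idx]; n !is null; n = n.next)
    {
        if (n.key == key)
            return &n.value;   // hit: return a pointer to the stored value
    }
    return null;               // miss
}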


Right, I know that, but that's not what I was asking. Take this hypothetical
implementation:

struct Bucket(TKey, TVal)
{
    ...
}

enum hashTableSize = ...;

struct Hash(TKey, TVal)
{
    Bucket!(TKey, TVal)[hashTableSize] data;

    TVal get(TKey key) { ... }
    void set(TKey key, TVal value) { ... }
}

I assume that D's AAs are something at least vaguely like that. My questions
are:

1. What does D use for "hashTableSize"? Or does "hashTableSize" vary? If it varies, what's a typical rough ballpark size? (And just out of curiosity, if
it varies, is it decided at compile-time, or does it change even at
runtime?)

It varies. The hash table size is not constant; the load factor is. The load factor is the number of elements in the hash divided by the number of buckets. You never want to fill up all the slots, because the fuller the table gets, the more chances there are for collisions. Essentially, the tricky part about hashing is what to do about collisions (two elements are different, but land in the same bucket).
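
A toy illustration of both ideas (the numbers are invented, not what druntime
actually produces for any key):

unittest
{
    // With 16 buckets, keys whose hashes are 7 and 23 both land in
    // bucket 7 (23 % 16 == 7) -- a collision.
    size_t nBuckets = 16;
    size_t h1 = 7, h2 = 23;
    assert(h1 % nBuckets == h2 % nBuckets);

    // Load factor = elements / buckets: 12 elements in 16 buckets is 0.75.
    assert(cast(double) 12 / nBuckets == 0.75);
}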

So what happens is, when the load factor exceeds a predefined constant (e.g. in dcollections the maximum load factor defaults to .75), the table "rehashes": it grows the array (usually by a constant factor, so the number of rehashes over the table's lifetime stays logarithmic) and re-inserts all its elements.
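
A minimal sketch of that grow-and-reinsert step, reusing the chained-node
layout from the earlier sketch; the 0.75 threshold and the growth factor of 2
are just example values, not what druntime uses:

void grow(K, V)(ref Node!(K, V)*[] buckets, size_t elements)
{
    enum maxLoad = 0.75;                   // example threshold
    if (cast(double) elements / buckets.length <= maxLoad)
        return;                            // still under the load-factor limit

    typeof(buckets) bigger;
    bigger.length = buckets.length * 2;    // example growth factor of 2

    foreach (head; buckets)                // walk every existing chain
    {
        for (auto n = head; n !is null; )
        {
            auto next = n.next;
            auto idx = typeid(K).getHash(&n.key) % bigger.length;
            n.next = bigger[idx];          // re-link the node into its new bucket
            bigger[idx] = n;
            n = next;
        }
    }
    buckets = bigger;
}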

I believe there is a minimum array size, and things are increased from there. I think you can also do a manual "rehash", which should adjust the size of the array to match a certain load factor (below the maximum).
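
For the built-in AAs, that manual step is the .rehash property; a small usage
example:

void main()
{
    int[string] lengths;
    foreach (word; ["hash", "table", "bucket"])
        lengths[word] = cast(int) word.length;
    lengths.rehash;   // reorganize the table in place for more efficient lookups
}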

In some implementations, hashes will even shrink when the load factor goes below a minimum (dcollections does not do this to avoid invalidating ranges). There are a million different ways to implement the basic hash. The most complex part though, is usually the collision handling. In my algo book, there are at least 3 ways to handle collisions, and I think there are countless more. If you look up hashing on wikipedia, you'll get a much better explanation.
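
One of those other collision strategies, for contrast with chaining, is open
addressing with linear probing: a colliding element simply takes the next free
slot. A rough sketch, illustrative only and not how druntime or dcollections
handle collisions:

struct Slot(K, V)
{
    bool used;    // false while the slot is still empty
    K key;
    V value;
}

// Insert with linear probing: on a collision, step forward to the next slot.
// Assumes the table never fills completely, otherwise the loop never ends.
void insert(K, V)(Slot!(K, V)[] table, K key, V value)
{
    auto idx = typeid(K).getHash(&key) % table.length;
    while (table[idx].used && table[idx].key != key)
        idx = (idx + 1) % table.length;
    table[idx] = Slot!(K, V)(true, key, value);
}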


2. What is "sizeof(Bucket(TKey, TVal))"? And I mean the shallow size, not the
deep size. Is it dependent on TKey or TVal? Or is it simply a pointer to the
start of a linked list (and therefore "sizeof(size_t)")?

Here is the AA implementation:

https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d

From that page, you can see that AA is your bucket (note this is runtime stuff, so there are no templates), and BB is your Hash struct. It looks like BB has an array of AA pointers.
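
Paraphrasing the shape of it (field names approximate and from memory; the
real aaA.d carries extra bookkeeping), it is roughly along these lines:

struct aaA                 // one node in a bucket's chain -- the "Bucket" element
{
    aaA* next;             // next node that hashed to the same index
    hash_t hash;           // cached hash of the key
    // the key and value bytes follow the node in memory; since this is
    // runtime (non-templated) code, their sizes come from TypeInfo
}

struct BB                  // the table itself -- the "Hash" struct
{
    aaA*[] b;              // array of bucket heads; its length is "hashTableSize"
    size_t nodes;          // number of stored elements, used for the load factor
}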

-Steve
