> I was trying to see how some have implemented a hashtable.  I took a gather 
> at dictobject.h/.c.  It seems that underneath it all it's a linked list and 
> that is used in order to store the actual information (I'm looking at 
> PyDictEntry.)
>
> Am I correct in my assumption or is there more to this?  I'm still looking 
> into how new entries are handled.

The Python dictionary implementation has been pretty well optimized
over the years, so it might not be the most easy-to-read code. You
might actually try and latch onto a copy of dictobject.[ch] from a
very old version of Python (1.5-ish). That will have much less in the
way of speed tricks obfuscating the code.

Very generally (I'm writing with a lot of water under the bridge since
I last thought about this), a dictionary contains an array whose
length is typically a power of two (2**n). When considering a key for
insertion or lookup, a hash value is computed, and the last n bits of
the resulting value are used as an index into that array. Each element
of the array consists of a linked list of all the key/value pairs
whose keys hash to that value. Once you've identified an element in
the hash array, the linked list is traversed looking for the key.
There are three operations: get, set, delete. Each operation has one
of two actions to perform depending whether the key is found or not:

* get - if found, return the corresponding value, if not, raise KeyError
* set - if found, replace the old value with the new, if not, add a
new key/value pair to the list
* del if found, delete the key/value pair, if not, raise KeyError

The challenge is to come up with a reasonable size hash array and a
suitable hash function which balances the desire to not chew up all of
memory with the desire to have very short lists. In Python's case, I
believe that dictionaries with strings as keys (and the hash function
used for strings) have been optimized for how Python's runtime works.

Skip
-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to