Re: How does a dictionary work exactly?

2015-07-17 Thread Ned Batchelder
On Thursday, July 16, 2015 at 2:59:02 PM UTC-4, Skip Montanaro wrote:
 On Thu, Jul 16, 2015 at 1:36 PM, yoursurrogate...@gmail.com
 yoursurrogate...@gmail.com wrote:
  If I understand correctly, lookup would not be a constant, yes?
 
 On the contrary, that's what you desire, nearly constant time
 execution. To the greatest extent possible, you want the linked lists
 to be of length zero or one. Part of the magic is in figuring out good
 places to expand the size of the hash array. You don't want it to grow
 too big, but you still want most linked lists to be very short. The
 resize operation isn't done too often because it itself is expensive.
 I believe Python dicts start out with an overly large initial hash
 array (again, dredging up old memories of threads on python-dev) as an
 optimization to avoid lots of early resize operations.
 
 Skip

Maybe people are reading a different implementation than I am.  Python's
dict object doesn't use linked lists to deal with hash collisions, it probes
other slots instead.

Brandon Rhodes did a great talk about how dicts work:
http://pyvideo.org/video/276/the-mighty-dictionary-55

BTW: The Python 3 implementation is more complicated than in Python 2, I
think to deal with sharing keys among dictionaries that have the same set
of keys.

--Ned.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How does a dictionary work exactly?

2015-07-17 Thread Skip Montanaro
On Fri, Jul 17, 2015 at 9:32 AM, Ned Batchelder n...@nedbatchelder.com wrote:
 Maybe people are reading a different implementation than I am.  Python's
 dict object doesn't use linked lists to deal with hash collisions, it probes
 other slots instead.

No, I was working a) from memory, and b) not looking at the
implementation, which I last did a long, long time ago...

Skip
-- 
https://mail.python.org/mailman/listinfo/python-list


How does a dictionary work exactly?

2015-07-16 Thread yoursurrogate...@gmail.com
Hello,

I was trying to see how some have implemented a hashtable.  I took a gather at 
dictobject.h/.c.  It seems that underneath it all it's a linked list and that 
is used in order to store the actual information (I'm looking at PyDictEntry.)

Am I correct in my assumption or is there more to this?  I'm still looking into 
how new entries are handled.  

Thanks


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How does a dictionary work exactly?

2015-07-16 Thread Skip Montanaro
 I was trying to see how some have implemented a hashtable.  I took a gather 
 at dictobject.h/.c.  It seems that underneath it all it's a linked list and 
 that is used in order to store the actual information (I'm looking at 
 PyDictEntry.)

 Am I correct in my assumption or is there more to this?  I'm still looking 
 into how new entries are handled.

The Python dictionary implementation has been pretty well optimized
over the years, so it might not be the most easy-to-read code. You
might actually try and latch onto a copy of dictobject.[ch] from a
very old version of Python (1.5-ish). That will have much less in the
way of speed tricks obfuscating the code.

Very generally (I'm writing with a lot of water under the bridge since
I last thought about this), a dictionary contains an array whose
length is typically a power of two (2**n). When considering a key for
insertion or lookup, a hash value is computed, and the last n bits of
the resulting value are used as an index into that array. Each element
of the array consists of a linked list of all the key/value pairs
whose keys hash to that value. Once you've identified an element in
the hash array, the linked list is traversed looking for the key.
There are three operations: get, set, delete. Each operation has one
of two actions to perform depending whether the key is found or not:

* get - if found, return the corresponding value, if not, raise KeyError
* set - if found, replace the old value with the new, if not, add a
new key/value pair to the list
* del if found, delete the key/value pair, if not, raise KeyError

The challenge is to come up with a reasonable size hash array and a
suitable hash function which balances the desire to not chew up all of
memory with the desire to have very short lists. In Python's case, I
believe that dictionaries with strings as keys (and the hash function
used for strings) have been optimized for how Python's runtime works.

Skip
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How does a dictionary work exactly?

2015-07-16 Thread Skip Montanaro
On Thu, Jul 16, 2015 at 1:36 PM, yoursurrogate...@gmail.com
yoursurrogate...@gmail.com wrote:
 If I understand correctly, lookup would not be a constant, yes?

On the contrary, that's what you desire, nearly constant time
execution. To the greatest extent possible, you want the linked lists
to be of length zero or one. Part of the magic is in figuring out good
places to expand the size of the hash array. You don't want it to grow
too big, but you still want most linked lists to be very short. The
resize operation isn't done too often because it itself is expensive.
I believe Python dicts start out with an overly large initial hash
array (again, dredging up old memories of threads on python-dev) as an
optimization to avoid lots of early resize operations.

Skip
-- 
https://mail.python.org/mailman/listinfo/python-list