On 12/12/2012 01:15 AM, Nick Coghlan wrote:
On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <di...@microsoft.com
<mailto:di...@microsoft.com>> wrote:
OTOH changing certain dictionaries in IronPython (such as keyword
args) to be
ordered would certainly be possible. Personally I just wouldn't
want to see it
be the default as that seems like unnecessary overhead when the
specialized
class exists.
Which reminds me, I was going to note that one of the main gains with
ordered keyword arguments, is their use in the construction of
string-keyed objects where you want to be able to control the order of
iteration (e.g. for serialisation or display purposes). Currently you
have to go the path of something like namedtuple where you define the
order of iteration in one operation, and set the values in another.
So here's a brand new argument against ordered dicts: The existence of
perfect hashing schemes. They fundamentally conflict with ordered dicts.
I played with using them for vtable dispatches in Cython this summer,
and they can perform really, really well for branch-predicted lookups in
hot loops, because you always/nearly always eliminate linear probing and
so there's no branch misses or extra comparisons. (The overhead of a
perfect hash table lookup over a traditional vtable lookup was only a
couple of cycles in my highly artificial fully branch-predicted
micro-benchmark.)
There's some overhead in setup; IIRC, ~20 microseconds for 64 elements,
2 GHz CPU, though that was a first prototype implementation and both
algorithmic improvements and tuning should be possible.
So it's not useful for everything, but perhaps for things like module
dictionaries and classes an "optionally perfect" dict can make sense.
Note: I'm NOT suggesting the use of perfect hashing, just making sure
it's existence is mentioned and that one is aware that if always-ordered
dicts become the language standard it precludes this option far off in
the future.
(Something like a sort() method could still work and make the dict
"unperfect"; one could also have a pack() method that made the dict
perfect again.).
That concludes the on-topic parts of my post.
--
Dag Sverre Seljebotn
APPENDIX
Going off-topic for those who are interested, here's the longwinded and
ugly details. My code [1] is based on the paper [2] (psuedo-code in
Appendix A), but I adapted it a bit to be useful for tens/hundreds of
elements rather than billions.
The ingredients:
1) You need the hash to be 32 bits (or 64) of good entropy (md5 or
murmurhash or similar). (Yes, that's a tall order for CPython, I'm just
describing the scheme.) (If the hash collides on all bits you *will*
collide, so some fallback is still necesarry, just unlikely.)
2) To lookup, the idea is (psuedo-code!)
typedef struct {
int m_f m_g, r, k;
int16_t d[k]; /* "small" int, like current proposal */
} table_header_t;
And then one computes index of an element with hash "h" using the function
((h >> tab->r) & tab->m_f) ^ tab->d[h & tab->m_g]
rather than the usual "h % n". While more arithmetic, arithmetic is
cheap and branch misses are not.
3) To set up/repack a table one needs to find the parameters. The
general idea is:
a) Partition the hashes into k bins by using "h & m_g". There will be
collisions, but the number of bins with many collisions will be very
small; most bins will have 2 or 1 or 0 elements.
b) Starting with the largest bin, distribute the elements according to
the hash function. If a bin collides with the existing contents, try
another value for d[binindex] until it doesn't.
The r parameter let's you try again 32 (or 64) times to find a solution.
In my testcases there was ~0.1% chance of not finding a solution (that
is, exhausting possible choices of r) with 64-bit hashes with 4 or 8
elements and no empty table elements. For any other number of elements,
or with some empty elements, the chance of failure was much lower.)
[1] It's not exactly a great demo, but it contains the algorithm. If
there's much interest I should clean it up and make a proper benchmark
demo out of it:
https://github.com/dagss/pyextensibletype/blob/perfecthash/include/perfecthash.h
[2] Pagh (1999)
http://www.brics.dk/RS/99/13/BRICS-RS-99-13.ps.gz
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com