Antoine Pitrou <pit...@free.fr> added the comment:

> For example, on the x64 machine the following dict() mapping
> 10,000,000 very short unicode keys (~7 chars) to integers eats 149
> bytes per entry.
This is counting the keys too. Under 3.x:

>>> d = {}
>>> for i in range(0, 10000000): d[str(i)] = i
...
>>> sys.getsizeof(d)
402653464
>>> sys.getsizeof(d) / len(d)
40.2653464

So, the dict itself uses ~40 bytes/entry. Since this is a 64-bit Python
build, each entry uses three words of 8 bytes each, that is 24 bytes per
entry (one word for the key, one word for the associated value, one word
for the cached hash value). So, you see the ratio of allocated entries in
the hash table over used entries is only a bit above 2, which is
reasonable.

Do note that unicode objects themselves are not that compact:

>>> sys.getsizeof("1000000")
72

If you have many of them, you might use bytestrings instead:

>>> sys.getsizeof(b"1000000")
40

I've modified your benchmark to run under 3.x and will post it in a later
message (I don't know whether bio.trie exists for 3.x, though).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9520>
_______________________________________
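For reference, a small standalone script along these lines reproduces the measurement. It uses a smaller key count (1M instead of 10M) so it runs quickly; the per-entry arithmetic is the same. Note the exact figures depend on the CPython version and build: the compact-dict layout introduced in 3.6 is denser than the three-words-per-slot table described above, so recent interpreters will report fewer bytes per entry.

>>> import sys
>>>
>>> # dict overhead per entry: getsizeof(d) counts the hash table itself,
>>> # not the key and value objects it references.
>>> n = 1_000_000
>>> d = {str(i): i for i in range(n)}
>>> per_entry = sys.getsizeof(d) / len(d)
>>> per_entry > 0
True
>>> # str keys carry more per-object overhead than bytes keys of the
>>> # same content, which is why switching to bytestrings saves memory.
>>> sys.getsizeof("1000000") > sys.getsizeof(b"1000000")
True
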