Antoine Pitrou <[email protected]> added the comment:
So, here is the modified benchmark. It first creates a cache of the random
wordlist, because that list takes quite a while to compute (for N up to 10000000). The
cache is reused in subsequent runs. This takes some memory, though (it probably won't
run if you have less than 2 GB of RAM). The cache itself is quite small on disk
(around 80 MB).
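For reference, here is a minimal sketch of that caching step, not the attached
dcbench-py3k.py itself; the file name and word-length parameters are illustrative
assumptions:

    import os
    import pickle
    import random
    import string

    CACHE_FILE = "wordlist-cache.pickle"   # illustrative name, not the real one
    MAX_WORDS = 10000000                   # the upper limit used in the runs below

    def load_wordlist():
        # Reuse the on-disk cache when it exists; otherwise generate it once.
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE, "rb") as f:
                return pickle.load(f)
        words = ["".join(random.choice(string.ascii_lowercase)
                         for _ in range(random.randint(3, 10)))
                 for _ in range(MAX_WORDS)]
        with open(CACHE_FILE, "wb") as f:
            pickle.dump(words, f, pickle.HIGHEST_PROTOCOL)
        return words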
I'm outputting the total dict size and the space occupied per individual key,
and I've also timed lookups separately. Here are results on my machine:
    1000 words (     961 keys), 3269137 words/s, 16980987 lookups/s, 51 bytes/key (0.0MB)
   10000 words (    9042 keys), 2697648 words/s, 13680052 lookups/s, 87 bytes/key (0.8MB)
  100000 words (   83168 keys), 2462269 words/s,  6956074 lookups/s, 37 bytes/key (3.0MB)
  500000 words (  389442 keys), 1802431 words/s,  4733774 lookups/s, 64 bytes/key (24.0MB)
 1000000 words (  755372 keys), 1702130 words/s,  4485229 lookups/s, 66 bytes/key (48.0MB)
 2000000 words ( 1463359 keys), 1616658 words/s,  4251021 lookups/s, 68 bytes/key (96.0MB)
 5000000 words ( 3501140 keys), 1608889 words/s,  3909212 lookups/s, 57 bytes/key (192.0MB)
10000000 words ( 6764089 keys), 1531136 words/s,  3600395 lookups/s, 59 bytes/key (384.0MB)
As you can see, the O(1) behaviour is roughly maintained up to the 10000000-word
limit (given the way the data is computed, I'm afraid I can't go further than
that). Of course, very small dicts are faster because everything fits in the CPU
caches.
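To make the reported numbers concrete, here is a minimal sketch of the kind of
measurement loop described above (again, not the attached script; the helper name
is illustrative, and bytes/key here only counts the dict structure itself via
sys.getsizeof, not the keys' own storage):

    import sys
    import time

    def bench(words):
        t0 = time.time()
        d = dict.fromkeys(words)        # insertion; duplicate words collapse into one key
        t1 = time.time()
        for w in words:                 # separate pass timing lookups only
            d[w]
        t2 = time.time()
        nwords, nkeys = len(words), len(d)
        print("%8d words (%8d keys), %d words/s, %d lookups/s, %d bytes/key"
              % (nwords, nkeys,
                 nwords / (t1 - t0), nwords / (t2 - t1),
                 sys.getsizeof(d) // nkeys))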
----------
Added file: http://bugs.python.org/file18520/dcbench-py3k.py
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue9520>
_______________________________________