On 6/10/15 11:48 PM, Dolph Mathews wrote:
tl;dr *.iteritems() is faster and more memory efficient than .items()
in python2*
Using xrange() in python2 instead of range() because it's more memory
efficient and consistent between python 2 and 3...
# xrange() + .items()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(xrange(1000000))).items():\ pass
20 loops, best of 3: 729 msec per loop
peak memory usage: 203 megabytes
# xrange() + .iteritems()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(xrange(1000000))).iteritems():\ pass
20 loops, best of 3: 644 msec per loop
peak memory usage: 176 megabytes
# python 3
python3 -m timeit -n 20 for\ i\ in\
dict(enumerate(range(1000000))).items():\ pass
20 loops, best of 3: 826 msec per loop
peak memory usage: 198 megabytes
Is it just me, or are these differences pretty negligible considering
this is the "1 million item dictionary", which in itself is a unicorn in
openstack code or really most code anywhere?
As was stated before, if we have million-item dictionaries floating
around, that code has problems. I already have to wait full seconds
for responses to come back when I play around with Neutron + Horizon in
a devstack VM, and that's with no data at all. The extra 100ms for a
hypothetical million-item structure would arrive long after the whole
app has fallen over from having just ten thousand of anything, much
less a million.
My only concern with items() is that it is semantically different in
Py2k / Py3k. Code that would otherwise have a "dictionary changed size"
issue under iteritems() / py3k items() would succeed under py2k
items(). If such a coding mistake is not covered by tests (as this is
a data-dependent error condition), it would manifest as a sudden error
condition on Py3k only.
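To make that difference concrete, here is a minimal sketch (an editor's
illustration, not part of the original benchmarks) of code that succeeds
under py2k items() but raises under iteritems() / py3k items():

    d = {'a': 1, 'b': 2}

    # py2k: d.items() returns a new list, so mutating d inside the loop is safe
    for k, v in d.items():
        d[k + '_copy'] = v

    # py2k d.iteritems() (and py3k d.items()) iterate lazily over d itself,
    # so the same mutation raises:
    # RuntimeError: dictionary changed size during iteration
    for k, v in d.iteritems():
        d[k + '_copy2'] = v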
And if you really want to see the results with range() in python2...
# range() + .items()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(range(1000000))).items():\ pass
20 loops, best of 3: 851 msec per loop
peak memory usage: 254 megabytes
# range() + .iteritems()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(range(1000000))).iteritems():\ pass
20 loops, best of 3: 919 msec per loop
peak memory usage: 184 megabytes
To benchmark memory consumption, I used the following on bare metal:
$ valgrind --tool=massif --pages-as-heap=yes \
    --massif-out-file=massif.out $COMMAND_FROM_ABOVE
$ cat massif.out | grep mem_heap_B | sort -u
$ python2 --version
Python 2.7.9
$ python3 --version
Python 3.4.3
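For a rough cross-check without valgrind, peak RSS can also be read from
inside the process. A small sketch of mine (not part of the methodology
above; on Linux ru_maxrss is reported in kilobytes):

    import resource

    d = dict(enumerate(xrange(1000000)))
    for i in d.iteritems():
        pass

    # peak resident set size of this process so far (kilobytes on Linux)
    print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)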
On Wed, Jun 10, 2015 at 8:36 PM, gordon chung <g...@live.ca> wrote:
> Date: Wed, 10 Jun 2015 21:33:44 +1200
> From: robe...@robertcollins.net
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [all][python3] use of six.iteritems()
>
> On 10 June 2015 at 17:22, gordon chung <g...@live.ca> wrote:
> > maybe the suggestion should be "don't blindly apply six.iteritems
> > or items" rather than don't apply iteritems at all. admittedly,
> > it's a massive eyesore, but it's a very real use case that some
> > projects deal with large data results, and to enforce the latter
> > policy can have negative effects[1]. one "million item dictionary"
> > might be negligible, but in a multi-user, multi-* environment that
> > can have a significant impact on the amount of memory required to
> > store everything.
>
> > [1] disclaimer: i have no real world results but i assume memory
> > management was the reason for the switch in logic from py2 to py3
>
> I wouldn't make that assumption.
>
> And no, memory isn't an issue. If you have a million item dict,
> ignoring the internal overheads, the dict needs 1 million object
> pointers. The size of a list with those pointers in it is 1M *
> (pointer size in bytes), e.g. 4MB or 8MB. Nothing to worry about
> given the footprint of such a program :)
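(A quick sanity check of that arithmetic, added by the editor rather
than taken from the thread: sys.getsizeof counts only the list's pointer
array, so a million-slot list comes out around 8 MB on 64-bit CPython.)

    import sys

    keys = list(dict(enumerate(range(1000000))))  # a list holding the dict's 1M keys
    print(sys.getsizeof(keys))  # ~8,000,000 bytes: 1M pointers * 8 bytes, plus list overhead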
iiuc, items() (in py2) builds a full list of the dictionary's items in
memory to be processed. this is useful for cases such as concurrency
where you want to ensure consistency, but doing a quick test i noticed
a massive spike in memory usage between items() and iteritems().
'for i in dict(enumerate(range(1000000))).items(): pass' consumes
significantly more memory than 'for i in
dict(enumerate(range(1000000))).iteritems(): pass'. on my system,
memory consumption roughly doubled when using items() vs iteritems(),
and the cpu util was significantly higher as well...
let me know if there's anything that stands out as inaccurate.
unless there's something wrong with my ignorant testing above, i
think it's something projects should consider when mass applying
any iteritems/items patch.
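for reference, here's a minimal sketch (mine, not benchmarked above) of
the two patterns being compared; six.iteritems(d) calls d.iteritems()
on py2 and iterates over d.items() on py3:

    import six

    d = dict(enumerate(range(1000000)))

    # lazy iteration on both py2 and py3: no intermediate list of items
    for key, value in six.iteritems(d):
        pass

    # plain .items(): builds a full list on py2, returns a view on py3
    for key, value in d.items():
        pass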
cheers,
gord
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev