On 6/10/15 11:48 PM, Dolph Mathews wrote:
tl;dr *.iteritems() is faster and more memory efficient than .items()
in python2*
Using xrange() in python2 instead of range() because it's more memory
efficient and consistent between python 2 and 3...
# xrange() + .items()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(xrange(1000000))).items():\ pass
20 loops, best of 3: 729 msec per loop
peak memory usage: 203 megabytes
# xrange() + .iteritems()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(xrange(1000000))).iteritems():\ pass
20 loops, best of 3: 644 msec per loop
peak memory usage: 176 megabytes
# python 3
python3 -m timeit -n 20 for\ i\ in\
dict(enumerate(range(1000000))).items():\ pass
20 loops, best of 3: 826 msec per loop
peak memory usage: 198 megabytes
Is it just me, or are these differences pretty negligible considering
this is the "1 million item dictionary", which in itself is a unicorn in
openstack code or really most code anywhere?
As was stated before, if we have million-item dictionaries floating
around, that code has problems. I already have to wait full seconds
for responses to come back when I play around with Neutron + Horizon in
a devstack VM, and that's with no data at all. The extra 100ms for a
hypothetical million-item structure would arrive long after the whole
app has fallen over from having just ten thousand of anything, much
less a million.
My only concern with items() is that it is semantically different in
Py2k / Py3k. Code that would otherwise have a "dictionary changed size"
issue under iteritems() / py3k items() would succeed under py2k
items(). If such a coding mistake is not covered by tests (as this is
a data-dependent error condition), it would manifest as a sudden error
condition on Py3k only.
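To make that difference concrete, here is a minimal sketch (an editor's
illustration, not part of the original benchmarks) of code that succeeds
under py2k items() but raises under iteritems() / py3k items():

    d = {'a': 1, 'b': 2}

    # py2k: d.items() returns a new list, so mutating d inside the loop is safe
    for k, v in d.items():
        d[k + '_copy'] = v

    # py2k d.iteritems() (and py3k d.items()) iterate lazily over d itself,
    # so the same mutation raises:
    # RuntimeError: dictionary changed size during iteration
    for k, v in d.iteritems():
        d[k + '_copy2'] = v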
And if you really want to see the results with range() in python2...
# range() + .items()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(range(1000000))).items():\ pass
20 loops, best of 3: 851 msec per loop
peak memory usage: 254 megabytes
# range() + .iteritems()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(range(1000000))).iteritems():\ pass
20 loops, best of 3: 919 msec per loop
peak memory usage: 184 megabytes
To benchmark memory consumption, I used the following on bare metal:
$ valgrind --tool=massif --pages-as-heap=yes \
    --massif-out-file=massif.out $COMMAND_FROM_ABOVE
$ cat massif.out | grep mem_heap_B | sort -u
$ python2 --version
Python 2.7.9
$ python3 --version
Python 3.4.3
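For a rough cross-check without valgrind, peak RSS can also be read from
inside the process. A small sketch of mine (not part of the methodology
above; on Linux ru_maxrss is reported in kilobytes):

    import resource

    d = dict(enumerate(xrange(1000000)))
    for i in d.iteritems():
        pass

    # peak resident set size of this process so far (kilobytes on Linux)
    print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)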
On Wed, Jun 10, 2015 at 8:36 PM, gordon chung <g...@live.ca> wrote:
> Date: Wed, 10 Jun 2015 21:33:44 +1200
> From: robe...@robertcollins.net
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [all][python3] use of six.iteritems()
>
> On 10 June 2015 at 17:22, gordon chung <g...@live.ca> wrote:
> > maybe the suggestion should be "don't blindly apply six.iteritems
> > or items" rather than don't apply iteritems at all. admittedly,
> > it's a massive eyesore, but it's a very real use case that some
> > projects deal with large data results, and to enforce the latter
> > policy can have negative effects[1]. one "million item dictionary"
> > might be negligible, but in a multi-user, multi-* environment that
> > can have a significant impact on the amount of memory required to
> > store everything.
>
> > [1] disclaimer: i have no real world results but i assume memory
> > management was the reason for the switch in logic from py2 to py3
>
> I wouldn't make that assumption.
>
> And no, memory isn't an issue. If you have a million item dict,
> ignoring the internal overheads, the dict needs 1 million object
> pointers. The size of a list with those pointers in it is 1M *
> (pointer size in bytes), e.g. 4MB or 8MB. Nothing to worry about
> given the footprint of such a program :)
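(A quick sanity check of that arithmetic, added by the editor rather
than taken from the thread: sys.getsizeof counts only the list's pointer
array, so a million-slot list comes out around 8 MB on 64-bit CPython.)

    import sys

    keys = list(dict(enumerate(range(1000000))))  # a list holding the dict's 1M keys
    print(sys.getsizeof(keys))  # ~8,000,000 bytes: 1M pointers * 8 bytes, plus list overhead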
iiuc, items() (in py2) builds a full list of the dictionary's items in
memory to be processed. this is useful for cases such as concurrency
where you want to ensure consistency, but doing a quick test i noticed
a massive spike in memory usage between items() and iteritems().
'for i in dict(enumerate(range(1000000))).items(): pass' consumes
significantly more memory than 'for i in
dict(enumerate(range(1000000))).iteritems(): pass'. on my system,
memory consumption roughly doubled when using items() vs iteritems(),
and the cpu util was significantly higher as well...
let me know if there's anything that stands out as inaccurate.
unless there's something wrong with my ignorant testing above, i
think it's something projects should consider when mass applying
any iteritems/items patch.
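for reference, here's a minimal sketch (mine, not benchmarked above) of
the two patterns being compared; six.iteritems(d) calls d.iteritems()
on py2 and iterates over d.items() on py3:

    import six

    d = dict(enumerate(range(1000000)))

    # lazy iteration on both py2 and py3: no intermediate list of items
    for key, value in six.iteritems(d):
        pass

    # plain .items(): builds a full list on py2, returns a view on py3
    for key, value in d.items():
        pass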
cheers,
gord
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev