tl;dr: PEP 393 reduces the memory used for strings in a very small
Django app from 7.4 MB to 4.4 MB, with all other objects taking about
1.9 MB.

On 26.08.2011 16:55, Guido van Rossum wrote:
> It would be nice if someone wrote a test to roughly verify these
> numbers, e.g. by allocating lots of strings of a certain size and
> measuring the process size before and after (being careful to adjust
> for the list or other data structure required to keep those objects
> alive).

I have now written a Django application to measure the effect of PEP
393, using the debug mode (to find all strings) and sys.getsizeof():

https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py
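
The linked view isn't reproduced here, but the idea is roughly the
following sketch (a stock-build approximation: walk the GC-tracked
containers, collect the strings they reference, and tally them with
sys.getsizeof()):

    import gc
    import sys

    def snapshot():
        # Count each reachable str once (keyed by id) and lump the
        # sizes of everything else together; this only approximates
        # the debug-mode walk that the linked view performs.
        seen = {}
        other_bytes = 0
        for container in gc.get_objects():
            other_bytes += sys.getsizeof(container)
            for ref in gc.get_referents(container):
                if type(ref) is str:
                    seen[id(ref)] = ref
        strings = len(seen)
        chars = sum(len(s) for s in seen.values())
        str_bytes = sum(sys.getsizeof(s) for s in seen.values())
        return strings, chars, str_bytes, other_bytes

    print("Strings: %d\nChars: %d\nBytes: %d\nOther objects: %d"
          % snapshot())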

The results for 3.3 and pep-393 are attached.

The Django app is small in every respect: trivial ORM, very few
objects (just for the sake of exercising the ORM at all),
no templating, short strings. The memory snapshot is taken in
the middle of a request.

The tests were run on a 64-bit Linux system with 32-bit Py_UNICODE.

The tally of strings by length confirms that both tests have indeed
comparable sets of objects (not surprising since it is identical Django
source code and the identical application). Most strings in this
benchmark are shorter than 16 characters, and a few have several
thousand characters. The tally of byte lengths shows that it's the
really long memory blocks that are gone with the PEP.
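
For reference, the "Up to N" buckets in the attached tallies follow a
simple doubling scheme (starting bounds of 4 for lengths and 40 for
sizes, read off the output); a sketch of such a tally:

    import sys
    from collections import Counter

    def tally(strings):
        # Bucket each string by the smallest doubling bound covering
        # its length (4, 8, 16, ...) and its sys.getsizeof() value
        # (40, 80, 160, ...), matching the "Up to N" rows below.
        by_length, by_size = Counter(), Counter()
        for s in strings:
            bound = 4
            while len(s) > bound:
                bound *= 2
            by_length[bound] += 1
            bound = 40
            while sys.getsizeof(s) > bound:
                bound *= 2
            by_size[bound] += 1
        return by_length, by_size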

Digging into the internal representation, it's possible to estimate
the "unaccounted" bytes. For PEP 393:

   bytes - 80*strings - (chars+strings) = 190053

This is the total of the wchar_t and UTF-8 representations for objects
that have them, plus any two-byte and four-byte strings that the
above formula accounts for incorrectly. Unfortunately, for the
"default" branch,

   bytes - 56*strings - 4*(chars+strings) = 0

as unicode's __sizeof__ doesn't account for the (separate) PyBytes
object that may cache the default-encoded form of the string. So in
practice, the 3.3 number should be somewhat larger.
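
Plugging the attached numbers into both formulas, as a quick sanity
check:

   PEP 393:  4417522 - 80*36091 - (1304098 + 36091)   = 190053
   default:  7379484 - 56*36075 - 4*(1303746 + 36075) = 0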

In both cases, the numbers don't account for internal fragmentation;
this could be corrected by rounding up each string size to the next
multiple of 8 (given that it's all allocated through the object
allocator).
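
A sketch of that correction, assuming the object allocator's 8-byte
granularity, would be to sum the following instead of the raw
sys.getsizeof() values:

    import sys

    def rounded_size(obj):
        # Round the reported size up to the next multiple of 8, the
        # granularity in which the object allocator hands out blocks.
        return (sys.getsizeof(obj) + 7) & ~7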

It should be possible to squeeze a little bit out of the 190 kB
by finding objects for which the wchar_t or UTF-8 representations
are created unnecessarily.

Regards,
Martin
3.3.0a0 (default:45b63a8a76c9, Aug 29 2011, 21:45:49) 
[GCC 4.6.1 20110526 (prerelease)]
Strings: 36075
Chars: 1303746
Bytes: 7379484
Other objects: 1906432

By Length (length: numstrings)
Up to 4: 5710
Up to 8: 8997
Up to 16: 11657
Up to 32: 4267
Up to 64: 2319
Up to 128: 1373
Up to 256: 828
Up to 512: 558
Up to 1024: 233
Up to 2048: 104
Up to 4096: 23
Up to 8192: 5
Up to 16384: 0
Up to 32768: 1

By Size (size: numstrings)
Up to 40: 0
Up to 80: 7913
Up to 160: 21796
Up to 320: 3317
Up to 640: 1452
Up to 1280: 847
Up to 2560: 482
Up to 5120: 183
Up to 10240: 65
Up to 20480: 18
Up to 40960: 1
Up to 81920: 1
3.3.0a0 (pep-393:6ffa3b569228, Aug 29 2011, 22:00:31) 
[GCC 4.6.1 20110526 (prerelease)]
Strings: 36091
Chars: 1304098
Bytes: 4417522
Other objects: 1866616

By Length (length: numstrings)
Up to 4: 5728
Up to 8: 8997
Up to 16: 11658
Up to 32: 4239
Up to 64: 2335
Up to 128: 1382
Up to 256: 828
Up to 512: 558
Up to 1024: 233
Up to 2048: 104
Up to 4096: 23
Up to 8192: 5
Up to 16384: 0
Up to 32768: 1

By Size (size: numstrings)
Up to 40: 0
Up to 80: 0
Up to 160: 33247
Up to 320: 1500
Up to 640: 1007
Up to 1280: 226
Up to 2560: 86
Up to 5120: 21
Up to 10240: 3
Up to 20480: 1