tl;dr: PEP-393 reduces the memory usage for strings of a very small Django app from 7.4MB to 4.4MB, all other objects taking about 1.9MB.
On 26.08.2011 16:55, Guido van Rossum wrote:
> It would be nice if someone wrote a test to roughly verify these
> numbers, e.g. by allocating lots of strings of a certain size and
> measuring the process size before and after (being careful to adjust
> for the list or other data structure required to keep those objects
> alive).

I have now written a Django application to measure the effect of PEP 393, using the debug mode (to find all strings), and sys.getsizeof:

https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py

The results for 3.3 and pep-393 are attached.

The Django app is small in every respect: trivial ORM, very few objects (just for the sake of exercising the ORM at all), no templating, short strings. The memory snapshot is taken in the middle of a request.

The tests were run on a 64-bit Linux system with 32-bit Py_UNICODE.

The tally of strings by length confirms that both tests have indeed comparable sets of objects (not surprising, since it is identical Django source code and the identical application). Most strings in this benchmark are shorter than 16 characters, and a few have several thousand characters.

The tally of byte lengths shows that it's the really long memory blocks that are gone with the PEP.

Digging into the internal representation, it's possible to estimate "unaccounted" bytes. For PEP 393:

    bytes - 80*strings - (chars+strings) = 190053

This is the total of the wchar_t and UTF-8 representations for objects that have them, plus any 2-byte and 4-byte strings that the formula above accounts for incorrectly.

Unfortunately, for "default"

    bytes - 56*strings - 4*(chars+strings) = 0

as unicode__sizeof__ doesn't account for the (separate) PyBytes object that may carry the default encoding. So in practice, the 3.3 number should be somewhat larger.
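The two formulas above can be re-checked against the totals in the attached reports. The following is only a sketch of that arithmetic; the function name `unaccounted` and the dictionary layout are made up for illustration, and reading the `+strings` term as one terminator per string is an assumption.

```python
# Totals copied from the two attached reports.
default = {"strings": 36075, "chars": 1303746, "bytes": 7379484}
pep393 = {"strings": 36091, "chars": 1304098, "bytes": 4417522}


def unaccounted(stats, header, char_width):
    """Bytes left over after subtracting a fixed per-string header and
    char_width bytes per character.

    The extra "+ strings" term adds one character slot per string
    (assumed here to be the terminator).
    """
    payload = char_width * (stats["chars"] + stats["strings"])
    return stats["bytes"] - header * stats["strings"] - payload


# PEP 393 branch: 80-byte header, 1 byte per character.
print(unaccounted(pep393, header=80, char_width=1))   # 190053
# Default branch: 56-byte header, 4 bytes per character.
print(unaccounted(default, header=56, char_width=4))  # 0
```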
In both cases, the app didn't account for internal fragmentation; this would be possible by rounding up each string size to the next multiple of 8 (given that it's all allocated through the object allocator).

It should be possible to squeeze a little bit out of the 190kB by finding objects for which the wchar_t or UTF-8 representations are created unnecessarily.

Regards,
Martin
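As an illustration of the rounding mentioned above, here is a minimal sketch that tallies a collection of strings with sys.getsizeof and also reports the total after rounding each size up to a multiple of 8. It is not the code of the linked Django view; `round_up`, `tally` and the sample list are hypothetical names chosen for this example.

```python
import sys


def round_up(size, align=8):
    """Round size up to the next multiple of align (unchanged if already aligned)."""
    return (size + align - 1) // align * align


def tally(strings):
    """Return (count, chars, raw_bytes, rounded_bytes) for an iterable of str."""
    count = chars = raw = rounded = 0
    for s in strings:
        size = sys.getsizeof(s)    # size as reported by the object itself
        count += 1
        chars += len(s)
        raw += size
        rounded += round_up(size)  # approximates allocator granularity
    return count, chars, raw, rounded


# Hypothetical sample data; a real run would pass every live string.
sample = ["id", "username", "django.db.models", "x" * 5000]
print(tally(sample))
```

The difference between the last two numbers gives a rough idea of how much the plain getsizeof total underestimates the memory actually taken from the allocator.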
3.3.0a0 (default:45b63a8a76c9, Aug 29 2011, 21:45:49)
[GCC 4.6.1 20110526 (prerelease)]
Strings: 36075
Chars: 1303746
Bytes: 7379484
Other objects: 1906432

By Length (length: numstrings)
Up to 4: 5710
Up to 8: 8997
Up to 16: 11657
Up to 32: 4267
Up to 64: 2319
Up to 128: 1373
Up to 256: 828
Up to 512: 558
Up to 1024: 233
Up to 2048: 104
Up to 4096: 23
Up to 8192: 5
Up to 16384: 0
Up to 32768: 1

By Size (size: numstrings)
Up to 40: 0
Up to 80: 7913
Up to 160: 21796
Up to 320: 3317
Up to 640: 1452
Up to 1280: 847
Up to 2560: 482
Up to 5120: 183
Up to 10240: 65
Up to 20480: 18
Up to 40960: 1
Up to 81920: 1

3.3.0a0 (pep-393:6ffa3b569228, Aug 29 2011, 22:00:31)
[GCC 4.6.1 20110526 (prerelease)]
Strings: 36091
Chars: 1304098
Bytes: 4417522
Other objects: 1866616

By Length (length: numstrings)
Up to 4: 5728
Up to 8: 8997
Up to 16: 11658
Up to 32: 4239
Up to 64: 2335
Up to 128: 1382
Up to 256: 828
Up to 512: 558
Up to 1024: 233
Up to 2048: 104
Up to 4096: 23
Up to 8192: 5
Up to 16384: 0
Up to 32768: 1

By Size (size: numstrings)
Up to 40: 0
Up to 80: 0
Up to 160: 33247
Up to 320: 1500
Up to 640: 1007
Up to 1280: 226
Up to 2560: 86
Up to 5120: 21
Up to 10240: 3
Up to 20480: 1