On 2014-02-10 06:07, wxjmfa...@gmail.com wrote:
> Python does not save memory at all. A str (unicode string)
> uses less memory only - and only - because and when one uses
> explicitly characters which are consuming less memory.
>
> Not only the memory gain is zero, Python falls back to the
> worse case.
>
> >>> sys.getsizeof('a' * 1000000)
> 1000025
> >>> sys.getsizeof('a' * 1000000 + 'œ')
> 2000040
> >>> sys.getsizeof('a' * 1000000 + 'œ' + '\U00010000')
> 4000048

If Python used UTF-32 for EVERYTHING, then all three of those cases
would be 4000048, so your own numbers disprove your claim that
"Python does not save memory at all".

> The opposite of what the utf8/utf16 do!
>
> >>> sys.getsizeof(('a' * 1000000 + 'œ' + '\U00010000').encode('utf-8'))
> 1000023
> >>> sys.getsizeof(('a' * 1000000 + 'œ' + '\U00010000').encode('utf-16'))
> 2000025

However, as pointed out repeatedly, indexing into a string stored in
a fixed-width encoding is O(1), while indexing into a variable-width
encoding (e.g. UTF-8/UTF-16) is O(N). The FSR (Flexible String
Representation) gives the benefits of O(1) indexing while saving
space when a string doesn't need the full 32-bit width.

-tkc
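
For anyone who wants to check both points, here is a minimal sketch.
It assumes CPython 3.3 or later (where the FSR applies);
sys.getsizeof is implementation-specific and other interpreters may
behave differently. The helper names bytes_per_char, _seq_len and
utf8_index are made up for illustration and are not part of any
library. The first helper measures how many bytes each character
costs once a string contains a given "widest" character; the second
shows why picking the i-th code point out of UTF-8 data means
scanning from the start.

```python
import sys


def bytes_per_char(widest: str, n: int = 100_000) -> int:
    """Per-character storage cost of a mostly-ASCII string containing `widest`.

    Two strings that differ only in length are measured, so the fixed
    object header cancels out of the difference. CPython 3.3+ assumed.
    """
    short = 'a' * n + widest
    longer = 'a' * (2 * n) + widest
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // n


def _seq_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def utf8_index(data: bytes, i: int) -> str:
    """Return the i-th code point of UTF-8 `data` (i >= 0).

    Every earlier sequence has to be stepped over, so this is O(N),
    unlike str indexing, which is a direct offset computation.
    """
    pos = 0
    for _ in range(i):
        pos += _seq_len(data[pos])
    return data[pos:pos + _seq_len(data[pos])].decode('utf-8')
```

On CPython 3.3+, bytes_per_char('a'), bytes_per_char('\u0153') and
bytes_per_char('\U00010000') return 1, 2 and 4 respectively: the
width of the whole string is set by its widest character, and only
the last case costs as much as UTF-32 would. utf8_index(s.encode('utf-8'), i)
returns the same character as s[i], but only after walking over the
i sequences that come before it.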