On Thu, 06 Feb 2014 05:51:54 -0800, wxjmfauth wrote:

> Sorry, I'm only pointing you may lose memory when working with short
> strings as it was explained. I really, very really, do not see what is
> absurd or obsure in:
>
> >>> sys.getsizeof('abc' + 'EURO')
> 46
> >>> sys.getsizeof(('abc' + 'EURO').encode('utf-32'))
> 37

Why do you care about NINE bytes? The least amount of memory in any PC
that I know about is 500000000 bytes, more than fifty million times more.
And you are whinging about wasting nine bytes?

If you care about that lousy nine bytes, Python is not the language for
you. Go and program in C, where you can spend ten or twenty times longer
programming, but save nine bytes in every string.

Nobody cares about your memory "benchmark" except you. Python is not
designed to save memory, Python is designed to use as much memory as
needed to give the programmer an easier job. In C, I can store a single
integer in a single byte. In Python, horror upon horrors, it takes 14
bytes!!!

py> sys.getsizeof(1)
14

We consider it A GOOD THING that Python spends memory for programmer
convenience and safety. Python looks for memory optimizations when it can
save large amounts of memory, not utterly trivial amounts. So in a Python
wide build, a ten-thousand character string requires a little bit more
than 40KB. In Python 3.3, that can be reduced to only 10KB for a purely
Latin-1 string, or 20KB for a string without any astral characters.
That's the sort of memory saving that is worthwhile, reducing memory
usage by up to 75%.

Could Python save memory by using UTF-8? Yes. But it would cost
complexity and time: strings would be even slower than they are now. That
is not a trade-off that the core developers have chosen to make, and I
agree with them.

-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list
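
P.S. The 10KB / 20KB / 40KB figures above can be checked with a few lines. This is only a rough sketch, assuming CPython 3.3 or later, where the width of a string's storage depends on its widest character. The helper name per_char_bytes is made up for illustration, and the fixed per-object overhead (which varies between versions and platforms) is deliberately subtracted out.

```python
import sys

def per_char_bytes(ch, n=10000):
    """Approximate bytes of storage per character for n copies of ch.

    Subtracting the size of the one-character string cancels the fixed
    object overhead, which differs between versions and platforms.
    (Hypothetical helper, not part of any library.)
    """
    return (sys.getsizeof(ch * n) - sys.getsizeof(ch)) / (n - 1)

print(per_char_bytes('a'))           # ASCII / Latin-1 range
print(per_char_bytes('\u20ac'))      # euro sign: in the BMP, outside Latin-1
print(per_char_bytes('\U0001F600'))  # astral character, outside the BMP
```

On a CPython build with the flexible string representation the three results should come out close to 1, 2 and 4 bytes per character, which is where the 10KB, 20KB and 40KB totals for a ten-thousand character string come from.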