On Thu, 20 Dec 2012 11:40:21 -0800, wxjmfauth wrote:

> I do not care about this optimization. I'm not an ascii user. As a
> non ascii user, this optimization is just irrelevant.

WRONG. Every Python user is an ASCII user. Every Python program has
hundreds or thousands of ASCII strings.

# === example ===
import random

There's already one ASCII string in your code: the module name "random"
is ASCII.

Let's look inside that module:

py> dir(random)
['BPF', 'LOG4', 'NV_MAGICCONST', 'RECIP_BPF', 'Random', 'SG_MAGICCONST',
'SystemRandom', 'TWOPI', '_BuiltinMethodType', '_MethodType',
'_Sequence', '_Set', '__all__', '__builtins__', '__cached__', '__doc__',
'__file__', '__initializing__', '__loader__', '__name__', '__package__',
'_acos', '_ceil', '_cos', '_e', '_exp', '_inst', '_log', '_pi',
'_random', '_sha512', '_sin', '_sqrt', '_test', '_test_generator',
'_urandom', '_warn', 'betavariate', 'choice', 'expovariate',
'gammavariate', 'gauss', 'getrandbits', 'getstate', 'lognormvariate',
'normalvariate', 'paretovariate', 'randint', 'random', 'randrange',
'sample', 'seed', 'setstate', 'shuffle', 'triangular', 'uniform',
'vonmisesvariate', 'weibullvariate']

That's another 58 ASCII strings. Let's pick one of those:

py> dir(random.Random)
['VERSION', '__class__', '__delattr__', '__dict__', '__dir__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getstate__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
'__module__', '__ne__', '__new__', '__qualname__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__', '__setstate__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'_randbelow', 'betavariate', 'choice', 'expovariate', 'gammavariate',
'gauss', 'getrandbits', 'getstate', 'lognormvariate', 'normalvariate',
'paretovariate', 'randint', 'random', 'randrange', 'sample', 'seed',
'setstate', 'shuffle', 'triangular', 'uniform', 'vonmisesvariate',
'weibullvariate']

That's another 51 ASCII strings.
Let's pick one of them:

py> dir(random.Random.shuffle)
['__annotations__', '__call__', '__class__', '__closure__', '__code__',
'__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__',
'__eq__', '__format__', '__ge__', '__get__', '__getattribute__',
'__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__',
'__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__',
'__qualname__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__']

And another 34 ASCII strings. So to get access to just *one* method of
*one* class of *one* module, we have already seen up to 144 ASCII
strings. (Some of them will be duplicated.)

Even if every one of *your* classes, methods, functions, modules and
variables uses a non-ASCII name, you will still use ASCII strings for
built-in functions and standard library modules.

> What should a Python user think, if he sees his strings are comsuming
> more memory just because he uses non ascii characters

WRONG! His strings are consuming just as much memory as they need to.
You cannot fit ten thousand different characters into a single byte. A
single byte can represent only 2**8 = 256 characters. Two bytes can
represent only 2**16 = 65536 characters at most. Four bytes can
represent the entire range of every character ever represented in human
history, and more, but it is terribly wasteful: most strings do not use
a billion different characters, and so a four-byte character encoding
uses up to four times as much memory as necessary.

You are imagining that non-ASCII users are being discriminated against,
with their strings being unfairly bloated. But that is not the case.
Their strings would be equally large in a Python wide build, give or
take whatever string object overhead changes from version to version.
If you are not comparing a wide build of Python to Python 3.3, then
your comparison is faulty.
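
To see the per-character cost for yourself, here is a minimal sketch,
assuming CPython 3.3 or later. The helper name bytes_per_char and the
four sample characters are my own choices for illustration; only
sys.getsizeof comes from the standard library. Measuring the difference
between two string lengths cancels out the fixed object overhead, which
varies between versions.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character for strings made only of `ch`.

    Hypothetical helper: the difference between the sizes of a 2n-long
    and an n-long string cancels the fixed per-object overhead.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n

# Sample characters chosen for illustration, one per width class.
samples = [
    ("ASCII", "a"),                     # U+0061
    ("Latin-1", "\xe9"),                # U+00E9
    ("BMP", "\u20ac"),                  # U+20AC EURO SIGN
    ("supplementary", "\U0001F600"),    # U+1F600, outside the BMP
]

for label, ch in samples:
    print(label, "len =", len(ch), "bytes/char =", bytes_per_char(ch))
```

On CPython 3.3 and later this should report 1, 1, 2 and 4 bytes per
character respectively, with len() equal to 1 in every case. A 3.2 wide
build should report 4 bytes for all of them.
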
You are comparing "buggy Unicode, cannot handle the supplementary
planes" with "fixed Unicode, can handle the supplementary planes".
Python 3.2 narrow builds save memory by introducing bugs into Unicode
strings. Python 3.3 fixes those bugs and still saves memory.

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list