On 28/07/2013 20:23, [email protected] wrote:
[snip]
> Compare these (a BDFL example, where I'm using a non-ascii char)
>
> Py 3.2 (narrow build)
Why are you using a narrow build of Python 3.2? It doesn't treat all codepoints equally (those outside the BMP can't be stored in one code unit) and, therefore, it isn't "Unicode compliant"!
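For reference, a minimal sketch of that difference, using the astral
(non-BMP) code point U+10010 from the quoted session; what it prints
depends on the build you run it on:

    s = '\U00010010'
    # Narrow build (e.g. 3.2 compiled with 16-bit code units): the char
    # is stored as a UTF-16 surrogate pair, so len() reports two code
    # units and slicing can split the character in half.
    # Python 3.3+ (PEP 393): it is stored as one code point.
    print(len(s))                 # 2 on a narrow build, 1 on 3.3+
    print(s == '\ud800\udc10')    # True on a narrow build, False on 3.3+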
timeit.timeit("a = 'hundred'; 'x' in a")0.09897159682121348timeit.timeit("a = 'hundre€'; 'x' in a")0.09079501961732461sys.getsizeof('d')32sys.getsizeof('€')32sys.getsizeof('dd')34sys.getsizeof('d€')34 Py3.3timeit.timeit("a = 'hundred'; 'x' in a")0.12183182740848858timeit.timeit("a = 'hundre€'; 'x' in a")0.2365732969632326sys.getsizeof('d')26sys.getsizeof('€')40sys.getsizeof('dd')27sys.getsizeof('d€')42 Tell me which one seems to be more "unicode compliant"? The goal of Unicode is to handle every char "equaly". Now, the problem: memory. Do not forget that à la "FSR" mechanism for a non-ascii user is *irrelevant*. As soon as one uses one single non-ascii, your ascii feature is lost. (That why we have all these dedicated coding schemes, utfs included).sys.getsizeof('abc' * 1000 + 'z')3026sys.getsizeof('abc' * 1000 + '\U00010010')12044 A bit secret. The larger a repertoire of characters is, the more bits you needs. Secret #2. You can not escape from this. jmf
--
http://mail.python.org/mailman/listinfo/python-list
