On Sunday, 19 August 2012 19:48:06 UTC+2, Paul Rubin wrote:
> But they are not ascii pages, they are (as stated) MOSTLY ascii.
> E.g. the characters are 99% ascii but 1% non-ascii, so 393 chooses
> a much more memory-expensive encoding than UTF-8.
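A rough sketch of the quoted claim, assuming CPython 3.3 or later, where PEP 393 stores every character of a string at the width of its widest character. The sample string (99 ASCII characters plus one euro sign) is made up for illustration, and the `utf-16-le` encoding is used here only as a stand-in for a 2-bytes-per-character payload:

```python
import sys

# Hypothetical sample: 99 ASCII characters plus one euro sign (U+20AC).
text = "a" * 99 + "\u20ac"

# UTF-8: 1 byte per ASCII character, 3 bytes for the euro sign.
utf8_bytes = len(text.encode("utf-8"))

# Stand-in for the PEP 393 payload of this string: U+20AC does not fit
# in one byte, so every character is stored on 2 bytes.
two_byte_payload = len(text.encode("utf-16-le"))

print("UTF-8 payload:          ", utf8_bytes)        # 102
print("2-bytes-per-char payload:", two_byte_payload)  # 200

# Implementation-specific: on CPython this also counts the object header.
print("sys.getsizeof(text):    ", sys.getsizeof(text))
```

The comparison only covers the character payload. UTF-8 is smaller for this string, but it gives up constant-time indexing by code point, which is the trade-off under discussion.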
Imagine a US banking application, everything in ASCII, except ... the
€ currency symbol, code point 0x20ac.

Well, it seems some software producers know what they are doing.

>>> '€'.encode('cp1252')
b'\x80'
>>> '€'.encode('mac-roman')
b'\xdb'
>>> '€'.encode('iso-8859-1')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 0: ordinal not in range(256)

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list