On Sunday, 19 August 2012 19:48:06 UTC+2, Paul Rubin wrote:
> But they are not ASCII pages, they are (as stated) MOSTLY ASCII.
> E.g. the characters are 99% ASCII but 1% non-ASCII, so PEP 393 chooses
> a much more memory-expensive encoding than UTF-8.

Imagine a US banking application, everything in ASCII,
except ... the € currency symbol, code point U+20AC.

Well, it seems some software producers know what they
are doing.

>>> '€'.encode('cp1252')
b'\x80'
>>> '€'.encode('mac-roman')
b'\xdb'
>>> '€'.encode('iso-8859-1')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 0: ordinal not in range(256)
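
To put rough numbers on the "mostly ASCII" case, here is a minimal sketch that
counts encoded bytes for a made-up sample: 999 ASCII characters plus one euro
sign. The sample text and the helper name encoded_sizes are illustrative
assumptions, not something from the thread. The sys.getsizeof figures depend on
the interpreter and version, so they are only printed, not relied upon.

```python
import sys

# Hypothetical sample: 999 ASCII characters plus one euro sign (U+20AC).
text = "a" * 999 + "\u20ac"


def encoded_sizes(s, codecs=("cp1252", "mac-roman", "utf-8", "utf-16-le")):
    """Return {codec name: number of bytes} for the encoded string."""
    return {name: len(s.encode(name)) for name in codecs}


sizes = encoded_sizes(text)
for name, size in sizes.items():
    print(f"{name:10} {size} bytes")

# In-memory size of the str objects; implementation- and version-specific.
print("ascii only:", sys.getsizeof("a" * 1000))
print("with euro: ", sys.getsizeof(text))
```

The 8-bit code pages store the euro sign in one byte, UTF-8 needs three bytes
for it, and a fixed two-byte encoding doubles the size of the whole string.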

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list
