Antoine Pitrou writes:
 > On Thursday 25 August 2011 at 02:15 +0900, Stephen J. Turnbull wrote:
 > > Antoine Pitrou writes:
 > >  > On Thu, 25 Aug 2011 01:34:17 +0900
 > >  > "Stephen J. Turnbull" <step...@xemacs.org> wrote:
 > >  > > 
 > >  > > Martin has long claimed that the fact that I/O is done in terms of
 > >  > > UTF-16 means that the internal representation is UTF-16
 > >  > 
 > >  > Which I/O?
 > > 
 > > Eg, display of characters in the interpreter.
 > 
 > I don't know why you say it's "done in terms of UTF-16", then. Unicode
 > strings are simply encoded to whatever character set is detected as the
 > terminal's character set.

But it's not "simple" at the level we're talking about!

Specifically, *in-memory* surrogate pairs are properly respected when
doing the encoding: each pair is encoded as a single character, so
such I/O is not UCS-2 or "raw code units".  This treatment differs
from sizing and indexing of unicode objects, where surrogates are not
treated differently from other code points.
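
A rough sketch of the distinction, for what it's worth.  It does not
touch the interpreter's real internals: it only models a narrow
build's storage as an explicit list of 16-bit code units, and the
helper names (utf16_code_units, encode_pair_aware,
encode_unit_by_unit) are made up for illustration.  It assumes a
current Python 3, where str is no longer stored this way.

```python
import struct

def utf16_code_units(text):
    # Model of narrow-build storage (an assumption for illustration):
    # the string as a list of 16-bit code units.
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def encode_pair_aware(units):
    # Encoding that respects surrogates: a high/low pair is combined
    # into one code point before being written out as UTF-8.
    raw = struct.pack("<%dH" % len(units), *units)
    return raw.decode("utf-16-le").encode("utf-8")

def encode_unit_by_unit(units):
    # What "raw code units" output would look like: every unit,
    # surrogates included, is encoded on its own.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)

units = utf16_code_units("a\U00010000")

# Sizing and indexing see code units, so the pair counts as two items.
print(len(units))                    # 3
print(hex(units[1]), hex(units[2]))  # 0xd800 0xdc00

# Encoding sees one character: a single 4-byte UTF-8 sequence.
print(encode_pair_aware(units))      # b'a\xf0\x90\x80\x80'

# Unit-by-unit output would instead emit two 3-byte sequences.
print(encode_unit_by_unit(units))    # b'a\xed\xa0\x80\xed\xb0\x80'
```

The first two results correspond to the sizing and indexing behaviour,
the third to what the I/O path does, and the last to what output would
look like if it really were UCS-2.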


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev