Antoine Pitrou writes:
 > On Thursday, 25 August 2011 at 02:15 +0900, Stephen J. Turnbull wrote:
 > > Antoine Pitrou writes:
 > >  > On Thu, 25 Aug 2011 01:34:17 +0900
 > >  > "Stephen J. Turnbull" <step...@xemacs.org> wrote:
 > >  > >
 > >  > > Martin has long claimed that the fact that I/O is done in terms of
 > >  > > UTF-16 means that the internal representation is UTF-16
 > >  >
 > >  > Which I/O?
 > >
 > > Eg, display of characters in the interpreter.
 >
 > I don't know why you say it's "done in terms of UTF-16", then. Unicode
 > strings are simply encoded to whatever character set is detected as the
 > terminal's character set.

But it's not "simple" at the level we're talking about!

Specifically, *in-memory* surrogates are properly respected when doing
the encoding (a surrogate pair is encoded as the single character it
represents), and therefore such I/O is not UCS-2 or "raw code units".
This treatment is different from sizing and indexing of unicodes, where
surrogates are not treated differently from other code points: each
half of a pair counts as its own element.
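
To make the distinction concrete, here is a rough sketch of the two
treatments. It is not the interpreter's actual code path. It runs on a
current Python 3, where strings are not stored as UTF-16, so the narrow
build's in-memory form is only simulated as a list of 16-bit code units.
The helper names (utf16_code_units, encode_respecting_surrogates,
encode_raw_code_units) are made up for illustration.

```python
import struct


def utf16_code_units(text):
    """Simulate narrow-build storage: return the 16-bit code units of text."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def encode_respecting_surrogates(units):
    """Combine surrogate pairs into code points, then encode to UTF-8."""
    raw = struct.pack("<%dH" % len(units), *units)
    return raw.decode("utf-16-le").encode("utf-8")


def encode_raw_code_units(units):
    """Encode every code unit on its own, as a pure UCS-2 view would.

    The "surrogatepass" error handler lets lone surrogates through, so a
    pair comes out as two 3-byte sequences instead of one 4-byte sequence.
    """
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


ch = "\U00010400"  # a character outside the BMP
units = utf16_code_units(ch)

# Sizing and indexing see two elements, one per surrogate.
print(len(units), [hex(u) for u in units])

# Encoding that respects surrogates yields one 4-byte UTF-8 sequence.
print(encode_respecting_surrogates(units))

# Encoding raw code units yields 6 bytes, which is not valid UTF-8.
print(encode_raw_code_units(units))
```

The first result is what len() and indexing report on a narrow build; the
second is what is meant above by surrogates being "properly respected"
when the string is encoded for the terminal.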