On 06/17/2013 10:42 PM, Steven D'Aprano wrote:
On Mon, 17 Jun 2013 21:06:57 -0400, Dave Angel wrote:

On 06/17/2013 08:41 PM, Steven D'Aprano wrote:

     <SNIP>

In Python 3.2 and older, the data will be either UTF-4 or UTF-8,
selected when the Python compiler itself is compiled.

I think that was a typo.  Do you perhaps mean UCS-2 or UCS-4?

Yes, that would be better.

UCS-2 is identical to UTF-16, except it doesn't support non-BMP
characters and therefore doesn't have surrogate pairs.

UCS-4 is functionally equivalent to UTF-16,

Perhaps you mean UTF-32?

 as far as I can tell. (I'm
not really sure what the difference is.)


Now you've got me curious by bringing up surrogate pairs. Do you know whether a narrow build (say 3.2) really works as UTF-16, so that when you encode a surrogate pair (4 bytes) to UTF-8, it encodes a single Unicode character into a single UTF-8 sequence (probably 4 bytes long)?
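
To make the question concrete, here is a minimal sketch of the arithmetic involved. It assumes Python 3.3 or later, where the narrow/wide distinction no longer exists, so it only illustrates how one non-BMP character maps to a UTF-16 surrogate pair and to a single 4-byte UTF-8 sequence; it does not by itself show what a 3.2 narrow build does. The sample character and the helper name combine_surrogates are chosen purely for illustration.

```python
# One character outside the Basic Multilingual Plane (code point U+1F600).
ch = "\U0001F600"

# Encode it both ways. "utf-16-be" is used so that no BOM is prepended.
utf16 = ch.encode("utf-16-be")
utf8 = ch.encode("utf-8")

# Split the UTF-16 bytes into 16-bit code units: a high and a low surrogate.
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]


def combine_surrogates(high, low):
    """Recombine a UTF-16 surrogate pair into the code point it represents."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print([hex(u) for u in units])              # the two surrogate code units
print(len(utf16), len(utf8))                # byte lengths of each encoding
print(hex(combine_surrogates(*units)))      # back to the original code point
```

Both encodings take 4 bytes for this character, but UTF-16 spends them on two code units while UTF-8 uses one multi-byte sequence for the single code point.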



--
DaveA
--
http://mail.python.org/mailman/listinfo/python-list
