Re: Cult-like behaviour [was Re: Kindness]

Rhodri James Mon, 16 Jul 2018 13:06:02 -0700

On 16/07/18 20:40, Marko Rauhamaa wrote:

Terry Reedy <tjre...@udel.edu>:

On 7/15/2018 5:28 PM, Marko Rauhamaa wrote:

if your new system used Python3's UTF-32 strings as a foundation,

Since 3.3, Python's strings are not (always) UTF-32 strings.

You are right. Python's strings are a superset of UTF-32. More
accurately, Python's strings are UTF-32 plus surrogate characters.
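
A minimal sketch of the "plus surrogate characters" point, assuming Python 3.4 or later (where the utf-32 codecs reject surrogates by default). The variable names are illustrative only.

```python
# A lone surrogate is a legal element of a Python str...
lone = "\ud800"
print(len(lone))

# ...but it is not valid UTF-32, so the strict encoder refuses it.
try:
    lone.encode("utf-32-le")
    strictly_encodable = True
except UnicodeEncodeError:
    strictly_encodable = False
print(strictly_encodable)

# The "surrogatepass" error handler lets the code point through unchanged.
raw = lone.encode("utf-32-le", "surrogatepass")
print(raw.decode("utf-32-le", "surrogatepass") == lone)
```

So a str can hold values that a conforming UTF-32 stream cannot, which is the sense in which it is a superset.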

Nor are they always UCS-2 (or partly UTF-16) strings. Nor are they
always Latin-1 or ASCII strings. Python's Flexible String
Representation uses the narrowest possible internal code for any
particular string. This is all transparent to the user except for
memory size.
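
A rough illustration of the memory-size effect, assuming CPython 3.3 or later. The exact byte counts vary by version and platform, so only the ordering is meaningful; the sample names are made up for this sketch.

```python
import sys

# Four strings of equal length whose widest code point differs.
samples = {
    "ascii": "a" * 100,            # fits in 1 byte per code point
    "latin-1": "\xe9" * 100,       # still 1 byte per code point
    "bmp": "\u20ac" * 100,         # needs 2 bytes per code point
    "astral": "\U0001F600" * 100,  # needs 4 bytes per code point
}

sizes = {name: sys.getsizeof(s) for name, s in samples.items()}
for name, s in samples.items():
    # len() is 100 every time; only the reported object size changes.
    print(name, len(s), sizes[name])
```

Indexing, slicing and len() behave identically for all four strings, which is what "transparent to the user" means here.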

How CPython chooses to represent its strings internally is not what I'm
talking about.

UTF-32, after all, is a variable-width encoding.

Nope. It is a fixed-width (32 bits, 4 bytes) encoding.

Perhaps you should ask more questions before pontificating.

You mean each code point is one code point wide. But that's rather an
irrelevant thing to state. The main point is that UTF-32 (aka Unicode)
uses one or more code points to represent what people would consider an
individual character.


UTF-32 != Unicode, but that's a separate esoteric argument.

The problem everyone is having with you, Marko, is that you are using
the terminology incorrectly. When you say that more than one codepoint
can be used to represent what people would consider an individual
character, you are correct (and would be more correct if you called
"what people would consider an individual character" a "glyph"). When
you call UTF-32 a variable-width encoding, you are incorrect.
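
To make the distinction concrete, here is a small sketch (the sample strings are chosen purely for illustration). It compares bytes per code point in UTF-8 and UTF-32, then shows one visible character spelled with one or two code points.

```python
import unicodedata

sample = "a\u00e9\u20ac\U0001F600"  # a, é, €, and an emoji outside the BMP

# Bytes used per code point in each encoding ("-le" emits no BOM).
utf8_widths = [len(ch.encode("utf-8")) for ch in sample]
utf32_widths = [len(ch.encode("utf-32-le")) for ch in sample]
print(utf8_widths)   # [1, 2, 3, 4]  variable width
print(utf32_widths)  # [4, 4, 4, 4]  fixed width

composed = "\u00e9"      # é as one precomposed code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
print(len(composed), len(decomposed))  # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The width of the encoding is a property of how each code point is serialised; how many code points make up one displayed character is a separate question.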

You are of course welcome to use whatever terminology you personally
like, like Humpty Dumpty. However when you point to a duck and say
"That's a gnu," people are likely to stop taking you seriously.


--
Rhodri James *-* Kynesim Ltd
--
https://mail.python.org/mailman/listinfo/python-list
