Hello Paul, others,

On 2025-10-15 06:32, Paul Hoffman wrote:
On Oct 14, 2025, at 13:12, John R Levine <[email protected]> wrote:

"A color display should be able to differentiate 🔴 (U+1F534), 🟢 (U+1F7E2), and 🔵 
(U+1F535)."

"A color display should be able to differentiate 🔴 (LARGE RED CIRCLE), 🟢 (LARGE 
GREEN CIRCLE), and 🔵 (LARGE BLUE CIRCLE)."

There is a strong guarantee that the codepoint will not change. There is a 
weaker guarantee that the character names will [not] change.

This seems pretty strong:

"seems" is the operative word there. External forces on the Unicode Consortium 
to change names will be stronger than those to change (or remove) a codepoint.

Sorry, wrong. There are examples of code points being changed: from version 1.1 to version 2.0, a few thousand Hangul Syllables got new code points (see e.g. https://www.unicode.org/reports/tr44/#Character_Age).

There are no examples of character names being changed (except in draft stage). Names are taken as written in stone, to the extent that mistakes are not corrected directly, but only by adding aliases, as Carsten has pointed out. The reason for this is historical. It goes back to way before Unicode.

Names were the mechanism used to identify characters across different encodings (e.g. different national variants of ISO-646 or different parts of ISO-8859). Changing a name would have meant changing every encoding standard that included that character, which was clearly seen as a very bad idea.

On the other hand, changing a code point for a character was localized to a single encoding. It was whoever was responsible for that encoding who was to decide whether it was worth changing a code point. It was still a very bad idea, but it happened.

This led to the general principle "character names NEVER change", which continues up to now and into the future. There have been external forces that tried to pressure the Unicode Consortium to change names, but the Unicode Consortium has always been stronger.
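
As a small illustration of "mistakes are fixed by adding aliases, not by renaming", here is a minimal sketch using Python's standard unicodedata module. The choice of tool and of U+01A2 as the example character is mine, not something from this thread, and the results depend on the Unicode version bundled with the Python build (alias lookup needs Python 3.3 or later).

```python
import unicodedata

# U+01A2: its published name is generally regarded as a mistake, but the
# name was kept and a formal correction alias was added instead.
ch = "\u01A2"

# The name as originally published, still returned unchanged.
original_name = unicodedata.name(ch)

# Both the original name and the correction alias resolve to the same character.
via_name = unicodedata.lookup(original_name)
via_alias = unicodedata.lookup("LATIN CAPITAL LETTER GHA")

print(original_name, via_name == ch, via_alias == ch)
```

The old name keeps working for anything that already refers to it, and the alias gives a corrected spelling without breaking that.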


Further, the use of codepoints or names is to help the reader find the points 
in various lists and web sites. Codepoints are more likely to get them to the 
right place.

Not really. There are quite a few web sites that e.g. have a page per character, and a search by name brings these up easily. For both code points and names, there are some exceptions, namely the cases where a name or code point coincides with something more generic. But in these cases, adding 'Unicode' to the search should help.

And then there's of course the issue that often (not always), the name will already contribute enough information to actually make a lookup unnecessary, whereas the code point wouldn't. The example at the start of this mail shows this very well. And these are exactly the cases where common sense says to use names.
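
For readers who do want to go from one form to the other, a minimal sketch with Python's standard unicodedata module follows. The helper name describe is hypothetical, and whether a given character is known depends on the Unicode version shipped with the Python build (U+1F7E2 needs Unicode 12.0 or later).

```python
import unicodedata

def describe(cp: int) -> str:
    """Return 'U+XXXX NAME' for a code point (hypothetical helper)."""
    # The second argument is a fallback for characters this build does not know.
    name = unicodedata.name(chr(cp), "<unknown in this Unicode version>")
    return f"U+{cp:04X} {name}"

# Code point to name, for the three circles from the example at the start.
for cp in (0x1F534, 0x1F7E2, 0x1F535):
    print(describe(cp))

# Name to code point.
blue = unicodedata.lookup("LARGE BLUE CIRCLE")
print(f"U+{ord(blue):04X}")
```

The name alone already tells the reader what is meant; the code point only does so after a lookup like this one.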

Regards,    Martin.

--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]
