Hello Paul, others,
On 2025-10-15 06:32, Paul Hoffman wrote:
On Oct 14, 2025, at 13:12, John R Levine <[email protected]> wrote:
"A color display should be able to differentiate 🔴 (U+1F534), 🟢 (U+1F7E2), and 🔵
(U+1F535)."
"A color display should be able to differentiate 🔴 (LARGE RED CIRCLE), 🟢 (LARGE
GREEN CIRCLE), and 🔵 (LARGE BLUE CIRCLE)."
There is a strong guarantee that the codepoint will not change. There is a
weaker guarantee that the character names will [not] change.
This seems pretty strong:
"seems" is the operative word there. External forces on the Unicode Consortium
to change names will be stronger than those to change (or remove) a codepoint.
Sorry, wrong. There are examples of code points being changed, for
example from version 1.1 to version 2.0 for a few thousand Hangul
Syllables (see e.g. https://www.unicode.org/reports/tr44/#Character_Age).
There are no examples of character names being changed (except in draft
stage). The reason that names are treated as written in stone (to the
extent that mistakes are not corrected directly, but only by adding
aliases, as Carsten has pointed out) is historical. It goes back to well
before Unicode.
Names were the mechanism used to identify characters across different
encodings (e.g. different national variants of ISO-646 or different
parts of ISO-8859). Changing a name would have meant changing any number
of encoding standards that included that character, which was clearly
seen as a very bad idea.
On the other hand, changing a code point for a character was localized
to a single encoding. It was whoever was responsible for that encoding
who was to decide whether it was worth changing a code point. It was
still a very bad idea, but it happened.
This led to the general principle "character names NEVER change", which
continues up to now and into the future. There have been external forces
that tried to pressure the Unicode Consortium to change names, but the
Unicode Consortium has always been stronger.
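To illustrate the alias mechanism mentioned above, here is a minimal
sketch using Python's standard unicodedata module. The choice of U+FE18
as the example is mine: its formal name contains a well-known misspelling
("BRAKCET"), which was left in place and corrected only through an alias.
The sketch assumes a Python whose bundled Unicode data includes name
aliases (3.3 or later).

```python
import unicodedata

ch = "\uFE18"

# The formal name keeps its original misspelling.
formal_name = unicodedata.name(ch)

# The corrected spelling exists only as an alias.
corrected_alias = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"

# Both the misspelled formal name and the corrected alias resolve to
# the same code point.
by_formal_name = unicodedata.lookup(formal_name)
by_alias = unicodedata.lookup(corrected_alias)

print(formal_name)
print(by_formal_name == by_alias == ch)
```

Note that unicodedata.name() returns only the formal name, never the
alias, which matches the point that the name itself is not touched.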
Further, the use of codepoints or names is to help the reader find the points
in various lists and web sites. Codepoints are more likely to get them to the
right place.
Not really. There are quite a few web sites that e.g. have a page per
character, and a search by name brings these up easily. For both code
points and names, there are some exceptions: the cases where a name or
code point coincides with something more generic. But in these cases,
adding 'Unicode' to the search should help.
And then there's of course the issue that often (not always), the name
will already contribute enough information to make a lookup unnecessary,
whereas the code point wouldn't. The example at the start of this mail
shows this very well. And these are exactly the cases where we should
expect common sense to favor names.
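For readers who want to go from one form to the other without a web
search, a minimal sketch with Python's standard unicodedata module
follows. The helper name describe() is hypothetical, and the sketch
assumes Python 3.8 or later, since U+1F7E2 was only added in Unicode 12.0.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The three characters from the example at the start of this mail.
circles = ["\U0001F534", "\U0001F7E2", "\U0001F535"]

for ch in circles:
    print(describe(ch))

# Going the other way: from a name back to the character.
red = unicodedata.lookup("LARGE RED CIRCLE")
print(f"U+{ord(red):04X}")
```

The printed names say what the characters are; the hex values alone do
not.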
Regards, Martin.
--
rswg mailing list -- [email protected]