I think that we're more or less in broad agreement, but I wanted to comment on this:

On Sun, Oct 27, 2019 at 09:41:00PM -0700, Andrew Barnert wrote:

> Yes, that’s the whole point of the message you were responding to:
> extended grapheme clusters are the Unicode approximation of
> characters; code units are not.

I don't think that's quite correct. See:

http://www.unicode.org/glossary/#abstract_character
http://www.unicode.org/glossary/#character
http://www.unicode.org/glossary/#extended_grapheme_cluster
http://www.unicode.org/glossary/#code_point

From the glossary definition of code point:

    "A value, or position, for a character, in any coded character set."

In other words, the code point is a numeric code such as U+0041 that
represents a character such as "A". (Except when it is a numeric code
that represents a non-character.)

And from definitions D60 and D61 here:

http://www.unicode.org/versions/Unicode12.1.0/ch03.pdf

    "Grapheme clusters and extended grapheme clusters may not have any
    particular linguistic significance"

    "The grapheme cluster represents a horizontally segmentable unit of
    text, consisting of some grapheme base (which may consist of a
    Korean SYLLABLE) together with any number of nonspacing marks
    applied to it." [Emphasis added.]

    "A grapheme cluster is similar, but not identical to a combining
    character sequence."

So it is much more complicated than just "code point != character,
extended grapheme cluster = character". Lots of code points are
characters; lots of graphemes aren't characters but syllables or some
other linguistic entity, or no linguistic entity at all; and lots of
things that are characters aren't graphemes, such as combining
character sequences.

And none of this mentions what to do with variation selectors, flags
etc. The whole thing is very complicated and I don't pretend to
understand all the details. (Until now, I thought that combining
character sequences were grapheme clusters. Apparently they aren't.)

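To make the mismatch concrete, here is a rough sketch using only the
standard library's unicodedata module. The helper name describe() and
the sample strings are my own choices for illustration, and the code
only inspects code points; as far as I know the stdlib has no grapheme
cluster segmentation, so none is attempted here.

```python
import unicodedata


def describe(text):
    """Return (code point, name, combining class) for each code point in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in text
    ]


# One code point that is also one character.
print(describe("A"))

# "e" followed by COMBINING ACUTE ACCENT: two code points that a reader
# would normally perceive as a single character.
e_acute = "e\u0301"
print(len(e_acute), describe(e_acute))

# NFC normalization composes the pair into the single code point U+00E9.
print(len(unicodedata.normalize("NFC", e_acute)))

# A precomposed Hangul syllable is one code point, but NFD decomposes it
# into three jamo code points.
han = "\uD55C"
print(len(han), len(unicodedata.normalize("NFD", han)))

# A flag is written as two regional indicator code points.
flag = "\U0001F1E9\U0001F1EA"
print(len(flag), describe(flag))
```

None of the len() results tells you how many characters a human would
count, which is really the point: the answer depends on which Unicode
notion of "character" you pick.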
--
Steven