Sure, that is a fair point. But in this case Unicode defines both codepoint and (extended) grapheme cluster, so I felt it might be worth including a quick note about which one is being reversed (though to be fair, nearly every language picks codepoint except maybe Swift, IIUC).
In either case it's not something I feel very strongly about.

-David

On 2021/05/17 15:20:57, Antoine Pitrou <anto...@python.org> wrote:
>
> On 17/05/2021 at 17:17, David Li wrote:
> > A little clarification on my point: it's not that a single codepoint
> > gets encoded with more than four bytes, it's that a grapheme
> > cluster/human-delimited 'character' might be multiple codepoints, so
> > reversing the individual codepoints may produce an unexpected
> > result. For instance a flag emoji is actually two codepoints (two
> > special 'letter' codepoints that represent the country code), so
> > reversing a US flag naively will give you an odd '[SU]' instead.
>
> This sounds like saying that reversing a valid French word does not
> produce a valid French word (well, in most cases). The kernel
> documentation can't contain an entire tutorial about Unicode characters
> and what to expect from them, IMHO.
>
> Regards
>
> Antoine.
>
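
For anyone who wants to see the flag example from the quoted message concretely, here is a minimal Python sketch. It is only an illustration and not the kernel under discussion. The helper names (`is_regional_indicator`, `reverse_keeping_flag_pairs`) are hypothetical, and the pairing logic handles only regional-indicator pairs; it is not a full extended grapheme cluster segmentation.

```python
import unicodedata

# US flag: REGIONAL INDICATOR SYMBOL LETTER U followed by LETTER S.
us_flag = "\U0001F1FA\U0001F1F8"

# Python slices operate on codepoints, so this swaps the two letters
# and yields the sequence for "SU" rather than "US".
codepoint_reversed = us_flag[::-1]


def is_regional_indicator(ch):
    """True if ch is one of the 26 regional indicator symbols."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def reverse_keeping_flag_pairs(s):
    """Reverse s, keeping adjacent regional-indicator pairs intact.

    Hypothetical helper: a simplification that only knows about flags,
    not combining marks, ZWJ sequences, or other grapheme clusters.
    """
    clusters = []
    i = 0
    while i < len(s):
        if (
            is_regional_indicator(s[i])
            and i + 1 < len(s)
            and is_regional_indicator(s[i + 1])
        ):
            clusters.append(s[i:i + 2])  # keep the flag as one unit
            i += 2
        else:
            clusters.append(s[i])
            i += 1
    return "".join(reversed(clusters))


if __name__ == "__main__":
    print(len(us_flag))  # number of codepoints, not visible characters
    print([unicodedata.name(ch) for ch in us_flag])
    print([unicodedata.name(ch) for ch in codepoint_reversed])
    print(reverse_keeping_flag_pairs("ab" + us_flag) == us_flag + "ba")
```

The codepoint-level reversal is what a plain slice gives you, which is the behavior a short documentation note could name; the second function only shows what a cluster-aware reversal would have to do differently for this one case.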