Robert Vanden Eynde writes:

 > Not for control characters.

There's a standard convention for "naming" control characters (U+0000, U+0001, etc.), which the Unicode Standard recommends (in slightly generalized form) as "code point labels" for characters that otherwise don't have names. MRAB has suggested this in the past. Personally I would generalize Steven d'Aprano's function a bit, and provide a "CONTROL-" prefix for these instead of "U+".

I don't see why even the C0 ASCII control function aliases should be particularly privileged, especially since the main alias is the spelled-out name, not the more commonly used 2- or 3-character abbreviation (will people associate "alarm" with "BEL"? I don't). Many are just meaningless (the 4 "device control" codes). And some are actively misleading: U+0018 (^X) "cancel" and U+001A (^Z) "substitute" are generally interpreted as "exit" (an interactive program) and "end of file" (on Windows), or as "cut" and "revert" in CUA UI. I for one would find it more useful if they aliased to "ctrl-c-prefix" and "zap-up-to-char".[1] And nobody's ever heard of the C1 ISO 6429 control characters (not to mention that three of them are literally figments of somebody's imagination, and were never standardized).

So I think using NameAliases.txt for this purpose is silly. If we're going to provide aliases based on the traditional control functions, I would use only the NameAliases.txt aliases for the following: NUL, BEL, BS, HT, LF, VT, FF, CR, ESC, SP, DEL, NEL, NBSP, and SHY. (NEL is included because the Unicode Standard recommends that it be treated as a newline function.) For the rest, I would use CONTROL-<code>, which is more likely to make sense in most contexts.[2]

 > About the Han case, they all have a
 > unicodedata.name don't they ? (Sorry if I
 > misread your message)

Yes, they have names, constructed algorithmically from the code point: "CJK UNIFIED IDEOGRAPH-4E00". I know what that one is (the character that denotes the number 1).
But that's the only one that I know offhand. I think Han (which are named daily, surely millions, if not billions, of times) should be treated as well as controls (which even programmers rarely bother to name, especially those that don't have standard escape sequences). That's why I strongly advocate that there be provision for extension, and that the databases at least be provided by a module that can be updated far more frequently than the stdlib is.

Footnotes:
[1] Those are the commands they are bound to in Emacs.

[2] There are a few others that I personally would find useful and unambiguous because they're used in multilingual ISO 2022 encodings, but that's rather far into the weeds. They're rarely seen in practice; most of the time 7-bit codes with escape sequences are used, or 8-bit codes without control sequences.
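
To make the naming scheme discussed in this message a bit more concrete, here is a minimal sketch in Python. It is only an illustration under several assumptions. The function name describe(), the KEPT_ALIASES table, and the extra_names parameter are all hypothetical and are not part of unicodedata or of any proposal. The table is written out by hand rather than parsed from NameAliases.txt, and it assumes the abbreviation itself is what gets displayed. The exact label format ("CONTROL-" followed by at least four hex digits) is likewise a guess at what the prefix idea might look like.

```python
import unicodedata

# The short list of traditional abbreviations named in the message.
# Hand-written here (assumption); a real version might read NameAliases.txt.
KEPT_ALIASES = {
    0x00: "NUL", 0x07: "BEL", 0x08: "BS", 0x09: "HT", 0x0A: "LF",
    0x0B: "VT", 0x0C: "FF", 0x0D: "CR", 0x1B: "ESC", 0x20: "SP",
    0x7F: "DEL", 0x85: "NEL", 0xA0: "NBSP", 0xAD: "SHY",
}


def describe(char, extra_names=None):
    """Return a display name for one character (hypothetical helper).

    extra_names is an optional {code point: name} mapping that is
    consulted first. It stands in for an externally maintained database,
    for example of Han names, that could be updated independently.
    """
    cp = ord(char)
    if extra_names and cp in extra_names:
        return extra_names[cp]
    if cp in KEPT_ALIASES:
        return KEPT_ALIASES[cp]
    if unicodedata.category(char) == "Cc":
        # All remaining C0/C1 controls get a label built from the code point.
        return f"CONTROL-{cp:04X}"
    # Everything else: the standard name, or a U+ label if there is none.
    return unicodedata.name(char, f"U+{cp:04X}")


print(describe("\x07"))                     # BEL
print(describe("\x18"))                     # CONTROL-0018
print(describe("\u4e00"))                   # CJK UNIFIED IDEOGRAPH-4E00
print(describe("\u4e00", {0x4E00: "one"}))  # one
```

The extra_names argument is the crude stand-in for the "provision for extension": a separately distributed module could supply that mapping without waiting for a stdlib release.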