https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936
--- Comment #25 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
Older versions of C++ - up to C++20 - would reject such characters (not allowed in identifiers based on the list of allowed characters in that standard version) even when not converted to a token, because (a) those older versions had (as-if) conversion of extended characters to UCNs in translation phase 1, and (b) UCNs not permitted in identifiers still matched the syntax for identifier preprocessing tokens ("Otherwise, the next preprocessing token is the longest sequence of characters that matches the syntax of a preprocessing token, even if that would cause further lexical analysis to fail") and so violated a semantic rule on which UCNs are allowed in identifiers.

C++23 instead converts UCNs to extended characters in phase 3, rather than doing the reverse conversion, and has (as of N4944, at least) in [lex.pptoken]: "... single non-whitespace characters that do not lexically match the other preprocessing token categories ... If any character not in the basic character set matches the last category, the program is ill-formed." That's part of the description of preprocessing tokens, before they are converted to tokens. I think it has the same effect of disallowing the use of such a character (outside contexts such as string literals), even if a different diagnostic might be better.