https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936
--- Comment #25 from joseph at codesourcery dot com <joseph at codesourcery dot
com> ---
Older versions of C++, up to C++20, would reject such characters (which
were not in the list of characters allowed in identifiers in those
standard versions) even when they were never converted to a token,
because (a) those older versions had (as-if) conversion of extended
characters to UCNs in translation phase 1, and (b) a UCN not permitted
in identifiers still matched the syntax for identifier preprocessing
tokens ("Otherwise, the next preprocessing token is the longest sequence
of characters that matches the syntax of a preprocessing token, even if
that would cause further lexical analysis to fail") and then violated
the semantic rule on which UCNs are allowed in identifiers.
C++23 instead converts UCNs to extended characters in phase 3, rather
than doing the reverse conversion, and has (as of N4944, at least) in
[lex.pptoken]: "... single non-whitespace characters that do not
lexically match the other preprocessing token categories ... If any
character not in the basic character set matches the last category, the
program is ill-formed.". That's part of the description of preprocessing
tokens, before they get converted to tokens. I think it has the same
effect of disallowing the use of such a character (outside contexts such
as string literals), even if a different diagnostic might be better.