https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #25 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
Older versions of C++ - up to C++20 - would reject such characters (which 
were not in the list of characters those standard versions allowed in 
identifiers) even when not converted to a token, because (a) those 
older versions had (as-if) conversion of extended characters to UCNs in 
translation phase 1, and (b) UCNs not permitted in identifiers still 
matched the syntax for identifier preprocessing tokens ("Otherwise, the 
next preprocessing token is the longest sequence of characters that 
matches the syntax of a preprocessing token, even if that would cause 
further lexical analysis to fail") and then violated the semantic rule on 
which UCNs are allowed in identifiers.

C++23 instead converts UCNs to extended characters in phase 3, rather than 
doing the reverse conversion, and (as of N4944, at least) [lex.pptoken] 
says: "... single non-whitespace characters that do not lexically match 
the other preprocessing token categories ... If any character not in the 
basic character set matches the last category, the program is 
ill-formed."  That is part of the description of preprocessing tokens, 
before they are converted to tokens.  I think it has the same effect of 
disallowing the use of such a character (outside contexts such as string 
literals) - even if a different diagnostic might be better.
