On Thu, Sep 01, 2022 at 03:00:28PM -0400, Jason Merrill wrote:
> > Apparently clang uses -Wunicode option to cover these, but unfortunately
> > they don't bother to document it (nor almost any other warning option),
> > so it is unclear what else exactly it covers.  Plus a question is how
> > we should document that option for GCC...
>
> We might as well use the same flag name, and document it to mean what it
> currently means for GCC.

Ok, will work on that tomorrow.

> > @@ -1489,8 +1507,16 @@ _cpp_valid_ucn (cpp_reader *pfile, const
> >            if (str < limit && *str == '}')
> >              {
> > -              if (name == str && identifier_pos)
> > +              if (identifier_pos && (name == str || !strict))
> >                  {
> > +                  if (name == str)
> > +                    cpp_warning (pfile, CPP_W_NONE,
> > +                                 "empty named universal character escape "
> > +                                 "sequence; treating it as separate tokens");
> > +                  else
> > +                    cpp_warning (pfile, CPP_W_NONE,
> > +                                 "incomplete named universal character escape "
> > +                                 "sequence; treating it as separate tokens");
>
> It looks like this is handling \N{abc}, for which "incomplete" seems like
> the wrong description; it's complete, just wrong, and the diagnostic doesn't
> help correct it.

The point is to make it more consistent with the \N{X.1} handling.
The grammar is clear that only upper case letters + digits + space + hyphen
can appear in between \N{ and }.  So, both of those cases IMHO should be
handled the same.

The !strict case is when there is at least one lower case letter or
underscore but no characters other than letters + digits + space + hyphen
+ underscore; we then find the terminating } and, inside of string/character
literals, want to do the UAX44LM2 algorithm suggestions.
But for X.1 in literals we don't even look for }, we just emit the
            cpp_error (pfile, CPP_DL_ERROR,
                       "'\\N{' not terminated with '}' after %.*s",
                       (int) (str - base), base);
diagnostic, which prints "after X".

For the identifier_pos case, both the !strict and *str != '}' cases are
treated as separate tokens for the same reason: not because the name is
not valid, but because it contains invalid characters.
So perhaps for the identifier_pos !strict and *str != '}' cases we could
emit a warning with the same wording as above (but so that for !strict we
stop on the first lowercase or _ char, i.e. just break instead of setting
strict = false if identifier_pos).

Or we could emit such a warning and a note that would clarify that only
upper case letters, digits, space or hyphen are allowed there?

	Jakub
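
For illustration only, the character classes discussed above could be
sketched as a small standalone C function.  This is not the libcpp code:
classify_ucn_name and enum ucn_name_kind are hypothetical names, and the
sketch assumes plain ASCII input pointing just past the "\N{".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the text following "\N{".
   Names and structure are assumptions made for this sketch.  */
enum ucn_name_kind
{
  UCN_NAME_EMPTY,        /* '}' immediately follows "\N{".  */
  UCN_NAME_STRICT,       /* Only A-Z, 0-9, space and hyphen before '}'.  */
  UCN_NAME_LOOSE,        /* Also a-z or '_' seen: the "!strict" case.  */
  UCN_NAME_UNTERMINATED  /* Other character, or no '}' before LIMIT.  */
};

/* Scan [STR, LIMIT) and store in *LEN the number of characters
   accepted before the scan stopped.  */
static enum ucn_name_kind
classify_ucn_name (const char *str, const char *limit, size_t *len)
{
  const char *name = str;
  bool strict = true;

  for (; str < limit; str++)
    {
      char c = *str;
      if ((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
          || c == ' ' || c == '-')
        continue;
      if ((c >= 'a' && c <= 'z') || c == '_')
        {
          /* Outside the grammar, but still a candidate for loose
             matching suggestions inside literals.  */
          strict = false;
          continue;
        }
      break;
    }

  *len = (size_t) (str - name);
  if (str == limit || *str != '}')
    return UCN_NAME_UNTERMINATED;
  if (*len == 0)
    return UCN_NAME_EMPTY;
  return strict ? UCN_NAME_STRICT : UCN_NAME_LOOSE;
}
```

With this sketch, "abc}" is classified as UCN_NAME_LOOSE, while "X.1}"
stops at the '.' with *len == 1, which corresponds to the "after X" in
the existing error message.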