On Thu, Sep 01, 2022 at 03:00:28PM -0400, Jason Merrill wrote:
> > Apparently clang uses -Wunicode option to cover these, but unfortunately
> > they don't bother to document it (nor almost any other warning option),
> > so it is unclear what else exactly it covers.  Plus a question is how
> > we should document that option for GCC...
> 
> We might as well use the same flag name, and document it to mean what it
> currently means for GCC.

Ok, will work on that tomorrow.

> > @@ -1489,8 +1507,16 @@ _cpp_valid_ucn (cpp_reader *pfile, const
> >       if (str < limit && *str == '}')
> >         {
> > -         if (name == str && identifier_pos)
> > +         if (identifier_pos && (name == str || !strict))
> >             {
> > +             if (name == str)
> > +               cpp_warning (pfile, CPP_W_NONE,
> > +                            "empty named universal character escape "
> > +                            "sequence; treating it as separate tokens");
> > +             else
> > +               cpp_warning (pfile, CPP_W_NONE,
> > +                            "incomplete named universal character escape "
> > +                            "sequence; treating it as separate tokens");
> 
> It looks like this is handling \N{abc}, for which "incomplete" seems like
> the wrong description; it's complete, just wrong, and the diagnostic doesn't
> help correct it.

The point is to make it more consistent with the \N{X.1} handling.
The grammar is clear that only upper case letters, digits, space and hyphen
can appear between \N{ and }.  So IMHO both of those cases should be
handled the same.  The !strict case is when there is at least one lower case
letter or underscore, but no characters other than letters, digits, space,
hyphen and underscore; we then find the terminating } and, inside
string/character literals, want to offer suggestions based on the UAX44-LM2
algorithm.
But for X.1 in literals we don't even look for }; we just emit the
              cpp_error (pfile, CPP_DL_ERROR,
                         "'\\N{' not terminated with '}' after %.*s",
                         (int) (str - base), base);
diagnostic, which quotes the text only up to the X.
For the identifier_pos case, the !strict and *str != '}' cases are treated
as separate tokens for the same reason: not because the name isn't a valid
character name, but because it contains characters that aren't allowed there.
So perhaps for the identifier_pos !strict and *str != '}' cases
we could emit a warning with the same wording as above (and, if
identifier_pos, stop on the first lowercase or _ char by just breaking out
of the loop instead of setting strict = false).
Or we could emit such a warning and a note that would clarify that only
upper case letters, digits, space or hyphen are allowed there?
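
To make the three cases above concrete, here is a rough standalone sketch
of the classification.  It is not the libcpp code: the enum, the function
name classify_ucn_name and the idea of passing the whole candidate text at
once are made up for illustration (the real scanner walks the buffer and
stops at the first character it doesn't accept).

```c
#include <stddef.h>

/* Hypothetical classification of the text between "\N{" and "}".
   None of these names exist in libcpp.  */
enum ucn_name_kind
{
  UCN_NAME_EMPTY,   /* Nothing between the braces, as in \N{}.  */
  UCN_NAME_STRICT,  /* Only A-Z, 0-9, space and hyphen.  */
  UCN_NAME_LOOSE,   /* Also a-z or '_': the !strict case, e.g. \N{abc}.  */
  UCN_NAME_INVALID  /* Any other character, e.g. the '.' in X.1.  */
};

/* Classify the LEN characters starting at NAME.  */
static enum ucn_name_kind
classify_ucn_name (const char *name, size_t len)
{
  enum ucn_name_kind kind = UCN_NAME_STRICT;

  if (len == 0)
    return UCN_NAME_EMPTY;

  for (size_t i = 0; i < len; i++)
    {
      char c = name[i];

      /* Characters the grammar allows.  */
      if ((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
	  || c == ' ' || c == '-')
	continue;

      /* Characters only tolerated for loose matching.  */
      if ((c >= 'a' && c <= 'z') || c == '_')
	kind = UCN_NAME_LOOSE;
      else
	return UCN_NAME_INVALID;
    }

  return kind;
}
```

With this, "abc" classifies as UCN_NAME_LOOSE and "X.1" as
UCN_NAME_INVALID; in identifier position both would end up treated as
separate tokens, which is why the same warning wording seems reasonable
for both.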

        Jakub
