cor3ntin added a subscriber: aaron.ballman. cor3ntin added inline comments.
================ Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:130-131 def ext_delimited_escape_sequence : Extension< - "delimited escape sequences are a Clang extension">, + "%select{delimited|named}0 escape sequences are a Clang extension">, InGroup<DiagGroup<"delimited-escape-sequence-extension">>; ---------------- tahonermann wrote: > I don't see much value in combining these diagnostics since these are > distinct features. The `ext_delimited_escape_sequence` name seems odd for > named escape sequences too (even if both features use `{` and `}` as > delimiters). I'm used to @aaron.ballman encouraging me to keep the number of diagnostics under control, but I'm fine keeping them separate ================ Comment at: clang/lib/Lex/Lexer.cpp:3255-3260 + if (!isAlphanumeric(C) && C != '_' && C != '-' && C != ' ') + break; + + if ((C < 'A' || C > 'Z') && !llvm::isDigit(C) && C != ' ' && C != '-') { + Invalid = true; + } ---------------- tahonermann wrote: > It isn't clear to me why there are two separate `if` statements here. I would > expect the following to suffice. If I'm misunderstanding something, then > comments might be helpful. > if (!isUppercase(C) && !isDigit(C) && C != '-' && C != ' ') { > Invalid = true; > break; > } > > I'm not sure why there is a test for '_' since that character is not present > in the grammar for `n-char` in P2071. > > Is it intentional that there is no `break` statement in the second `if` > statement? I improved that, what it does should be more clear now. More importantly, I added a diagnostic note when we detect a loose match. We allow `_` because Unicode does. We first perform a strict match - which fails as no Unicode name contains an underscore, we emit a diagnostic, and then we try a loose matching which does allow `_`. This enable us to produces better diagnostics ``` <source>:2:18: error: 'GREEK_SMALL_LETTER-OMICRON' is not a valid Unicode character name const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break space}"; ^ <source>:2:20: note: characters names in Unicode escape sequences are sensitive to case and whitespaces const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break space}"; ^~~~~~~~~~~~~~~~~~~~~~~~~~ GREEK SMALL LETTER OMICRON <source>:2:54: error: 'zero width no break space' is not a valid Unicode character name const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break space}"; ^~~~~~~~~~~~~~~~~~~~~~~~~ <source>:2:54: note: characters names in Unicode escape sequences are sensitive to case and whitespaces const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break space}"; ^~~~~~~~~~~~~~~~~~~~~~~~~ ZERO WIDTH NO-BREAK SPACE``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D123064/new/ https://reviews.llvm.org/D123064 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits