[PATCH] D158372: [Clang] Treat invalid UDL as two tokens

Richard Smith - zygoloid via Phabricator via cfe-commits Mon, 21 Aug 2023 12:06:45 -0700

rsmith added inline comments.


================
Comment at: clang/docs/ReleaseNotes.rst:97
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- Implemented `CWG1473 <https://wg21.link/CWG1473>`_ which allows spaces after 
``operator""``.
-  Clang used to err on the lack of space when the literal suffix identifier 
was invalid in
-  all the language modes, which contradicted the deprecation of the 
whitespaces.
-  Also turn on ``-Wdeprecated-literal-operator`` by default in all the 
language modes.
+- Implemented `CWG1473 <https://wg21.link/CWG1473>`_ allowing lack of space 
after ``operator""``.
+  Clang used to err on the lack of space when the literal suffix identifier 
was invalid,
----------------
I don't think this is accurate. Clang supported CWG1473 before these changes, 
as far as I can see: all valid code under CWG1473 was accepted, and invalid 
code was diagnosed (by default). Rather, what has changed is the behavior for 
invalid code: instead of treating an invalid `""blah` as two tokens always, in 
order to accept as much old code as possible, we now treat it as two tokens 
only when `blah` is defined as a macro name.

This is still a breaking change in some cases, for users of 
`-Wno-deprecated-literal-operator`, eg:

```
#define FOO(xyz) ""xyz
```

...  now will be lexed as a single invalid token rather than two tokens.

I'm not sure what the motivation for making changes here was, and D153156's 
description doesn't really help me understand it. Is the goal to improve the 
diagnostic quality for these kinds of errors on invalid code? Is there some 
example for which Clang's behavior with regard to CWG1473 was non-conforming? 
Something else?


================
Comment at: clang/lib/Lex/Lexer.cpp:2020
+  bool IsLiteralOperator =
+      StringRef(BufferPtr, 2).equals("\"\"") && BufferPtr + 2 == TokStart;
+  if (unsigned TokLen = CurPtr - TokStart;
----------------
Reverse the order of these checks to do the cheaper check first and to avoid 
the possibility of reading off the end of the input.


================
Comment at: clang/lib/Lex/Lexer.cpp:2027-2029
+    // However, don't diagnose operator""E(...) even if E is a macro as it
+    // results in confusing error messages. Hence, ""E would not be treated as
+    // string concat; instead it's a single PP token (as it should be).
----------------
That's still a breaking change compared to what we designed 
`-Wno-reserved-literal-operator` to do. How often does it happen in practice 
that someone has both an ill-formed literal operator *and* a macro defined to 
the same name as the suffix?


================
Comment at: clang/lib/Lex/Lexer.cpp:2035
+    IdentifierInfo *II = PP->LookUpIdentifierInfo(Result);
+    if (II->hasMacroDefinition()) {
+      Diag(TokStart, LangOpts.MSVCCompat
----------------
This doesn't check whether the identifier is currently defined as a macro, in 
the presence of modules. It also won't be correct if the lexer has got 
substantially ahead of the preprocessor, and the `#define` has been lexed but 
not yet preprocessed.

Overall, it's not really possible to tell from here whether an identifier is 
defined as a macro. To do this properly, you should instead produce a single 
token here and teach the preprocessor to consider splitting it into two tokens 
if the suffix is a reserved ud-suffix naming a defined macro. In principle you 
could also check from the preprocessor whether the previous produced token was 
`operator` and use that as part of the decision...


================
Comment at: clang/test/CXX/drs/dr14xx.cpp:487
 
 namespace dr1473 { // dr1473: 18
                    // NB: sup 1762, test reused there
----------------
I don't think this is correct. As far as I can tell, Clang has correctly 
implemented CWG1473 since version 3.2.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158372/new/

https://reviews.llvm.org/D158372

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D158372: [Clang] Treat invalid UDL as two tokens

Reply via email to