[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)

Jakub Jelínek via cfe-commits Wed, 19 Jun 2024 08:58:57 -0700

================
@@ -0,0 +1,98 @@
+// RUN: %clang_cc1 %s -fsyntax-only --embed-dir=%S/Inputs -verify=expected,cxx -Wno-c23-extensions
+// RUN: %clang_cc1 -x c -std=c23 %s -fsyntax-only --embed-dir=%S/Inputs -verify=expected,c
+#embed <media/empty>
+;
+
+void f (unsigned char x) { (void)x;}
+void g () {}
+void h (unsigned char x, int y) {(void)x; (void)y;}
+int i () {
+       return
+#embed <single_byte.txt>
+               ;
+}
+
+_Static_assert(
+#embed <single_byte.txt> suffix(,)
+""
+);
+_Static_assert(
+#embed <single_byte.txt>
+, ""
+);
+_Static_assert(sizeof(
+#embed <single_byte.txt>
+) ==
+sizeof(unsigned char)
+, ""
+);
+_Static_assert(sizeof
+#embed <single_byte.txt>
+, ""
+);
+_Static_assert(sizeof(
+#embed <jk.txt> // expected-warning {{left operand of comma operator has no effect}}
+) ==
+sizeof(unsigned char)
+, ""
+);
+
+#ifdef __cplusplus
+template <int First, int Second>
+void j() {
+       static_assert(First == 'j', "");
+       static_assert(Second == 'k', "");
+}
+#endif
+
+void do_stuff() {
+       f(
+#embed <single_byte.txt>
+       );
+       g(
+#embed <media/empty>
+       );
+       h(
+#embed <jk.txt>
+       );
+       int r = i();
+       (void)r;
+#ifdef __cplusplus
+       j<
+#embed <jk.txt>
+       >(
+#embed <media/empty>
+       );
+#endif
+}
+
+// Ensure that we don't accidentally allow you to initialize an unsigned char *
+// from embedded data; the data is modeled as a string literal internally, but
+// is not actually a string literal.
+const unsigned char *ptr =
+#embed <jk.txt> // expected-warning {{left operand of comma operator has no effect}}
+; // c-error@-2 {{incompatible integer to pointer conversion initializing 'const unsigned char *' with an expression of type 'unsigned char'}} \
+     cxx-error@-2 {{cannot initialize a variable of type 'const unsigned char *' with an rvalue of type 'unsigned char'}}
+
+// However, there are some cases where this is fine and should work.
+const unsigned char *null_ptr_1 =
+#embed <media/empty> if_empty(0)
+;
+
+const unsigned char *null_ptr_2 =
+#embed <null_byte.bin>
+;
+
+const unsigned char *null_ptr_3 = {
+#embed <null_byte.bin>
+};
+
+#define FILE_NAME <null_byte.bin>
+#define LIMIT 1
+#define OFFSET 0
+#define EMPTY_SUFFIX suffix()
+
+constexpr unsigned char ch =
+#embed FILE_NAME limit(LIMIT) clang::offset(OFFSET) EMPTY_SUFFIX
+;
+static_assert(ch == 0);
----------------
jakubjelinek wrote:


Unless you want to special-case it in far too many places, I think it would be
much easier to optimize just the inner part of the integer sequence, i.e.
everything except the first and last sequence elements (perhaps with an
exception when the last prefix token or the first suffix token is a comma).
Because arbitrary tokens can appear before and after the #embed, it can be
```c
const unsigned char a[] = {
-400 + 4 * 
#embed __FILE__
- 27 };
```
(or the same with tokens coming from prefix/suffix), and at least the current
patchset mishandles many such cases.  For the inner part of the sequence you
know there is a comma before it and a comma after it, which simplifies a lot
of things.
The above is handled correctly by GCC and by clang -save-temps, but not by
clang without -save-temps.
There are many other cases like that, e.g. a designated initializer such as
[26] = before the sequence.
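
To illustrate why only the first and last elements need care, here is a minimal sketch that does not use #embed at all. `EMBED_STANDIN` is a hypothetical macro standing in for the comma-separated integers that #embed of a three-byte file would produce; the byte values, array names and `check_embed_standin` are made up for the example. It shows the surrounding tokens binding to the outer elements only, and a designator applying to the first element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the token sequence an #embed of a three-byte
   file would expand to. A real compiler produces this from the file. */
#define EMBED_STANDIN 101, 20, 30

/* Tokens before and after the sequence combine with its first and last
   elements only: { -400 + 4 * 101, 20, 30 - 27 } is { 4, 20, 3 }. */
const unsigned char edge_case[] = {
-400 + 4 *
EMBED_STANDIN
- 27 };

/* A designator before the sequence applies to the first element; the
   remaining elements follow positionally at indices 3 and 4. */
const unsigned char designated[] = { [2] = EMBED_STANDIN };

/* Checks the values a correct expansion must yield. */
void check_embed_standin(void) {
    assert(sizeof edge_case == 3);
    assert(edge_case[0] == 4);   /* first element absorbed "-400 + 4 *" */
    assert(edge_case[1] == 20);  /* inner element is untouched */
    assert(edge_case[2] == 3);   /* last element absorbed "- 27" */

    assert(sizeof designated == 5);
    assert(designated[0] == 0 && designated[1] == 0);
    assert(designated[2] == 101);
    assert(designated[3] == 20);
    assert(designated[4] == 30);
}
```

Only the middle element is guaranteed to have a comma on both sides, which is the property that would make a bulk representation of the inner part safe under this reading of the comment.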

https://github.com/llvm/llvm-project/pull/95802