[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
AaronBallman wrote: Thank you both for collaborating to get that solved! https://github.com/llvm/llvm-project/pull/95802 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: According to the bots that worked!
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: Thank you!
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
tbaederr wrote: I just pushed https://github.com/llvm/llvm-project/commit/99f5fcb0d1e04125daa404ff14c9cd14b7a2c40b - I don't have time to run _all_ the tests though, so this is a bit of a long shot. If that doesn't fix it, then disabling them for now sounds fine to me.
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: I wonder if it would be OK to disable the interpreter tests for now?
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: I can't, because I don't have a big-endian machine to verify with. We can try to push it speculatively if it doesn't break existing tests.
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
tbaederr wrote: Here's a quick patch with the cast inserted:
```diff
diff --git a/clang/lib/AST/Interp/ByteCodeExprGen.cpp b/clang/lib/AST/Interp/ByteCodeExprGen.cpp
index 731153a6ead9..e7fa1a62c277 100644
--- a/clang/lib/AST/Interp/ByteCodeExprGen.cpp
+++ b/clang/lib/AST/Interp/ByteCodeExprGen.cpp
@@ -1346,13 +1346,30 @@ bool ByteCodeExprGen::visitInitList(ArrayRef Inits,
 }
 }

-auto Eval = [&](Expr *Init, unsigned ElemIndex) {
- return visitArrayElemInit(ElemIndex, Init);
-};
-
+E->dump();
 unsigned ElementIndex = 0;
 for (const Expr *Init : Inits) {
- if (auto *EmbedS = dyn_cast(Init->IgnoreParenImpCasts())) {
+ if (const auto *EmbedS = dyn_cast(Init->IgnoreParenImpCasts())) {
+QualType TargetType = Init->getType();
+PrimType TargetT = classifyPrim(Init->getType());
+TargetType->dump();
+
+
+auto Eval = [&](const Expr *Init, unsigned ElemIndex) {
+ PrimType InitT = classifyPrim(Init->getType());
+ if (!this->visit(Init))
+return false;
+ if (InitT != TargetT) {
+if (!this->emitCast(InitT, TargetT, E))
+ return false;
+ }
+return this->emitInitElem(TargetT, ElemIndex, Init);
+};
+
+
+
+
+
 if (!EmbedS->doForEachDataElement(Eval, ElementIndex))
 return false;
 } else {
```
Can you check if that fixes the problem?
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: I'm trying to insert a cast using emitCast:
```
--- a/clang/lib/AST/Interp/ByteCodeExprGen.cpp
+++ b/clang/lib/AST/Interp/ByteCodeExprGen.cpp
@@ -1347,6 +1347,13 @@ bool ByteCodeExprGen::visitInitList(ArrayRef Inits,
 }

 auto Eval = [&](Expr *Init, unsigned ElemIndex) {
+ auto ArrayTy = Ctx.getASTContext().getAsConstantArrayType(T);
+ std::optional FromT = classify(Init->getType());
+ std::optional ToT = classify(ArrayTy->getElementType());
+ if (FromT != ToT) {
+if (!this->emitCast(*FromT, *ToT, Init))
+ return false;
+ }
 return visitArrayElemInit(ElemIndex, Init);
 };
```
But it fails with an assertion:
```
source/llvm-project/clang/lib/AST/Interp/InterpStack.h:46: T clang::interp::InterpStack::pop() [with T = clang::interp::Integral<8, false>]: Assertion `ItemTypes.back() == toPrimType()' failed.
```
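One possible reading of that assertion, offered as an inference rather than something the thread confirms: in the attempt above the cast is emitted before `visitArrayElemInit` has pushed the initializer's value, so at run time the cast would pop whatever was on the stack beforehand. The toy model below is a sketch of that ordering problem only. It is not clang's `InterpStack`; `PrimKind`, `TypedStack`, `cast`, `castThenVisit`, and `visitThenCast` are all hypothetical names, and the assumption that an array pointer sits on the stack is labeled in the comments.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical primitive kinds, loosely mirroring the idea of a typed stack.
enum class PrimKind { Ptr, Uint8, Sint32 };

struct Slot {
  PrimKind Kind;
  std::int64_t Value;
};

// Toy typed stack: pop() reports a mismatch instead of asserting.
class TypedStack {
  std::vector<Slot> Items;

public:
  void push(PrimKind K, std::int64_t V) { Items.push_back({K, V}); }

  bool pop(PrimKind Expected, std::int64_t &Out) {
    if (Items.empty() || Items.back().Kind != Expected)
      return false; // the real interpreter would hit its assertion here
    Out = Items.back().Value;
    Items.pop_back();
    return true;
  }
};

// A cast pops a value of kind From and pushes it back with kind To.
bool cast(TypedStack &S, PrimKind From, PrimKind To) {
  std::int64_t V = 0;
  if (!S.pop(From, V))
    return false;
  S.push(To, V);
  return true;
}

// Ordering of the failing attempt: cast first, push the element afterwards.
bool castThenVisit() {
  TypedStack S;
  S.push(PrimKind::Ptr, 0); // assumption: the array pointer is on the stack
  if (!cast(S, PrimKind::Uint8, PrimKind::Sint32))
    return false; // top of stack is a Ptr, not a Uint8
  S.push(PrimKind::Uint8, 'j');
  return true;
}

// Ordering of the later quick patch: push the element, then cast it.
bool visitThenCast() {
  TypedStack S;
  S.push(PrimKind::Ptr, 0); // same assumption as above
  S.push(PrimKind::Uint8, 'j');
  return cast(S, PrimKind::Uint8, PrimKind::Sint32);
}
```

In this model `castThenVisit()` fails and `visitThenCast()` succeeds, which matches the visit, cast, init-element order used in the quick patch posted in reply.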
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: Yes, all bots are big endian. Reproducer is
```
clang -cc1 %s -fsyntax-only -verify -fexperimental-new-constant-interpreter
constexpr int value(int a, int b) {
  return a + b;
}
constexpr int init_list_expr() {
  int vals[] = {
#embed "jk.txt"
  };
  return value(vals[0], vals[1]);
}
constexpr int ExpectedValue = 'j' + 'k';
static_assert(init_list_expr() == ExpectedValue);
```
The contents of "jk.txt" are simply "jk".
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
tbaederr wrote: Do you have a smaller reproducer? Are all the failing build bots big endian?
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: @tbaederr, I noticed that all buildbot failures relate to the run with the new constant interpreter. Could you check whether I did something wrong? For example, embed by default yields values of type `unsigned char`. However, when expanding in [ByteCodeExprGen.cpp](https://github.com/llvm/llvm-project/pull/95802/files#diff-ccb62cc00bada13b706286e9cdc64881c86d81da71f2939bcbb40ee692299d67), I did not insert any casts. I think this could affect the result, since I had to insert casts in other places, and from the log (as @AaronBallman noticed):
```
Line 18: in call to 'value(1778384896, 1795162112)'
```
The first value is 0x6A000000 in hex and the second is 0x6B000000. The ASCII value for j is 0x6A and k is 0x6B. So the byte values are actually correct.
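The numbers in that log line can be checked by hand. The sketch below assumes a 4-byte `int` and assumes that the one-byte value ended up in the first (most significant, on a big-endian host) byte of a zero-filled slot; that placement is inferred from the logged values, not stated in the thread. `asBigEndianInt` is a hypothetical helper that computes the big-endian reading portably, so the check behaves the same on any host.

```cpp
#include <cstdint>

// Hypothetical helper: interpret four bytes as a big-endian 32-bit integer,
// independent of the byte order of the machine compiling this.
constexpr std::uint32_t asBigEndianInt(std::uint8_t b0, std::uint8_t b1,
                                       std::uint8_t b2, std::uint8_t b3) {
  return (std::uint32_t{b0} << 24) | (std::uint32_t{b1} << 16) |
         (std::uint32_t{b2} << 8) | std::uint32_t{b3};
}

// 'j' (0x6A) and 'k' (0x6B) written into the first byte of a zeroed slot.
constexpr std::uint32_t FirstArg = asBigEndianInt('j', 0, 0, 0);
constexpr std::uint32_t SecondArg = asBigEndianInt('k', 0, 0, 0);

static_assert(FirstArg == 0x6A000000u, "0x6A in the most significant byte");
static_assert(FirstArg == 1778384896u, "first value from the log");
static_assert(SecondArg == 1795162112u, "second value from the log");

// What the reproducer expects instead: the plain character values.
static_assert('j' + 'k' == 213, "0x6A + 0x6B");
```

Under those assumptions the logged arguments are exactly `'j'` and `'k'` shifted into the top byte, which is consistent with a one-byte value being read back through a wider type without a cast.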
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: There is a buildbot failure; I'm looking into it: https://lab.llvm.org/buildbot/#/builders/176/builds/226
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
https://github.com/Fznamznon closed https://github.com/llvm/llvm-project/pull/95802
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
https://github.com/AaronBallman approved this pull request. Leak fix LGTM, I think it's ready to re-land and try again.
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
@@ -0,0 +1,98 @@
+// RUN: %clang_cc1 %s -fsyntax-only --embed-dir=%S/Inputs -verify=expected,cxx -Wno-c23-extensions
+// RUN: %clang_cc1 -x c -std=c23 %s -fsyntax-only --embed-dir=%S/Inputs -verify=expected,c
+#embed
+;
+
+void f (unsigned char x) { (void)x;}
+void g () {}
+void h (unsigned char x, int y) {(void)x; (void)y;}
+int i () {
+ return
+#embed
+ ;
+}
+
+_Static_assert(
+#embed suffix(,)
+""
+);
+_Static_assert(
+#embed
+, ""
+);
+_Static_assert(sizeof(
+#embed
+) ==
+sizeof(unsigned char)
+, ""
+);
+_Static_assert(sizeof
+#embed
+, ""
+);
+_Static_assert(sizeof(
+#embed // expected-warning {{left operand of comma operator has no effect}}
+) ==
+sizeof(unsigned char)
+, ""
+);
+
+#ifdef __cplusplus
+template
+void j() {
+ static_assert(First == 'j', "");
+ static_assert(Second == 'k', "");
+}
+#endif
+
+void do_stuff() {
+ f(
+#embed
+ );
+ g(
+#embed
+ );
+ h(
+#embed
+ );
+ int r = i();
+ (void)r;
+#ifdef __cplusplus
+ j<
+#embed
+ >(
+#embed
+ );
+#endif
+}
+
+// Ensure that we don't accidentally allow you to initialize an unsigned char *
+// from embedded data; the data is modeled as a string literal internally, but
+// is not actually a string literal.
+const unsigned char *ptr =
+#embed // expected-warning {{left operand of comma operator has no effect}}
+; // c-error@-2 {{incompatible integer to pointer conversion initializing 'const unsigned char *' with an expression of type 'unsigned char'}} \
+ cxx-error@-2 {{cannot initialize a variable of type 'const unsigned char *' with an rvalue of type 'unsigned char'}}
+
+// However, there are some cases where this is fine and should work.
+const unsigned char *null_ptr_1 =
+#embed if_empty(0)
+;
+
+const unsigned char *null_ptr_2 =
+#embed
+;
+
+const unsigned char *null_ptr_3 = {
+#embed
+};
+
+#define FILE_NAME
+#define LIMIT 1
+#define OFFSET 0
+#define EMPTY_SUFFIX suffix()
+
+constexpr unsigned char ch =
+#embed FILE_NAME limit(LIMIT) clang::offset(OFFSET) EMPTY_SUFFIX
+;
+static_assert(ch == 0);

AaronBallman wrote:
> I have a prototype of injecting tokens that helps. It also removes all the
> "whack a mole" around template arguments. The only downside is that it now yields
> int instead of unsigned char, but I guess that is fine?

Nice! Yes, it's fine to yield an `int`; that's how the feature is defined to behave in C and we need the semantics to be the same in C and C++.

> Should I push it to this PR, or does it make sense to land this first and make a
> separate PR? NOTE: I'm on vacation next week, so I will not be available.

IMO, it would be easier for reviewers to land the current changes and then push fixes and improvements separately. This patch is already really hard to review due to size. What do folks think about landing the changes as-is today/tomorrow and then doing follow-up work once @Fznamznon is back from vacation?
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
cor3ntin wrote: I think we should land this PR and iterate from there.
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: I have a prototype of injecting tokens that helps. It also removes all the "whack a mole" around template arguments. The only downside is that it now yields int instead of unsigned char, but I guess that is fine? Should I push it to this PR, or does it make sense to land this first and make a separate PR? NOTE: I'm on vacation next week, so I will not be available.
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
jakubjelinek wrote: Unless you want to special case it in way too many spots, I'd think it would be far easier to optimize just the inner part of the integer sequence, i.e. everything except the first and last sequence element (maybe with the exception when the last prefix token is `,` or the first suffix token is `,`). Because one can use arbitrary tokens before and after the #embed, it can be
```c
const unsigned char a[] = {
  -400 + 4 *
#embed __FILE__
  - 27
};
```
(or with tokens from prefix/suffix) and at least the current patchset mishandles many of such cases. For the inner part of the sequence you know there is `,` before it and `,` after it, which simplifies a lot of things. The above is handled correctly by GCC and by clang -save-temps, but not by clang without -save-temps. And there are tons of other cases like that, e.g. even a designated initializer `[26] =` before the sequence, etc.
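To make the point about the first and last elements concrete, here is a small sketch with the directive replaced by the comma-separated integers it would denote. It assumes a hypothetical three-byte file containing "jkl" (values 106, 107, 108) in place of `__FILE__`, so the numbers are illustrative only.

```cpp
// Hand-expanded stand-in for
//   const unsigned char a[] = { -400 + 4 *
//   #embed "jkl.txt"      // hypothetical file containing "jkl"
//   - 27 };
// The surrounding tokens bind to the first and last integers only.
constexpr unsigned char a[] = { -400 + 4 *
  106, 107, 108
  - 27 };

static_assert(sizeof(a) == 3, "three elements, one per byte of the file");
static_assert(a[0] == 24, "-400 + 4 * 106");
static_assert(a[1] == 107, "inner element is untouched");
static_assert(a[2] == 81, "108 - 27");
```

Only the middle element is guaranteed to stand alone between two commas, which is why treating just the inner part of the sequence specially avoids having to understand whatever tokens precede or follow the directive.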
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
cor3ntin wrote:
> This is one more case, that would be solved by injecting tokens back into the
> stream. Right now it is quite complex to understand that embed met by
> `ParseCastExpression` should be expanded in a special way.

+1. I can work on that when I come back from St. Louis if you don't get to it first.
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: This is one more case that would be solved by injecting tokens back into the stream. Right now it is quite complex to understand that an embed encountered by `ParseCastExpression` should be expanded in a special way.
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
AaronBallman wrote: Because embed produces tokens from the preprocessor, the first example should be the same as:
```
void f(float x, char y, char z);
void g() {
  f((float)
    1, 2, 3
  );
}
```
which should be accepted (the cast applies to the first argument). You can run into the same with something like:
```
void f(float x, char y, char z);
void g() {
  f(
#embed "three_character_file" prefix((float))
  );
}
```
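A runnable variant of the first snippet, as a sketch: it assumes the three-character file holds the bytes 1, 2, 3, and it changes `f` to return its arguments in a hypothetical `Args` struct (the original `f` returns `void`) so the effect of the cast can be observed.

```cpp
// Hypothetical record of what f received; not part of the original example.
struct Args {
  float x;
  char y;
  char z;
};

inline Args f(float x, char y, char z) { return Args{x, y, z}; }

// Hand-expanded form of
//   f(
//   #embed "three_character_file" prefix((float))
//   );
// for an assumed file containing the bytes 1, 2, 3. The cast from the prefix
// applies to the first integer only; the commas still separate arguments.
inline Args g() {
  return f((float)
    1, 2, 3
  );
}
```

Calling `g()` yields `x == 1.0f`, `y == 2`, and `z == 3`: three separate arguments, not one comma expression under the cast.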
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: Ok, removed null byte file.
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
@@ -2422,6 +2422,10 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
 Bldr.addNodes(Dst);
 break;
 }
+
+case Stmt::EmbedExprClass:
+ llvm_unreachable("Support for EmbedExpr is not implemented.");

Fznamznon wrote: Used `report_fatal_error` instead.
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: Added cases.
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
@@ -441,6 +441,7 @@ tok::PPKeywordKind IdentifierInfo::getPPKeywordID() const {
 CASE( 4, 'e', 's', else);
 CASE( 4, 'l', 'n', line);
 CASE( 4, 's', 'c', sccs);
+ CASE(5, 'e', 'b', embed);

Fznamznon wrote: Ok, done.
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
@@ -0,0 +1,98 @@
+// RUN: %clang_cc1 %s -fsyntax-only --embed-dir=%S/Inputs -verify=expected,cxx -Wno-c23-extensions
+// RUN: %clang_cc1 -x c -std=c23 %s -fsyntax-only --embed-dir=%S/Inputs -verify=expected,c
+#embed
+;
+
+void f (unsigned char x) { (void)x;}
+void g () {}
+void h (unsigned char x, int y) {(void)x; (void)y;}
+int i () {
+  return
+#embed
+  ;
+}
+
+_Static_assert(
+#embed suffix(,)
+""
+);
+_Static_assert(
+#embed
+, ""
+);
+_Static_assert(sizeof(
+#embed
+) ==
+sizeof(unsigned char)
+, ""
+);
+_Static_assert(sizeof
+#embed
+, ""
+);
+_Static_assert(sizeof(
+#embed // expected-warning {{left operand of comma operator has no effect}}
+) ==
+sizeof(unsigned char)
+, ""
+);
+
+#ifdef __cplusplus
+template
+void j() {
+  static_assert(First == 'j', "");
+  static_assert(Second == 'k', "");
+}
+#endif
+
+void do_stuff() {
+  f(
+#embed
+  );
+  g(
+#embed
+  );
+  h(
+#embed
+  );
+  int r = i();
+  (void)r;
+#ifdef __cplusplus
+  j<
+#embed
+  >(
+#embed
+  );
+#endif
+}
+
+// Ensure that we don't accidentally allow you to initialize an unsigned char *
+// from embedded data; the data is modeled as a string literal internally, but
+// is not actually a string literal.
+const unsigned char *ptr =
+#embed // expected-warning {{left operand of comma operator has no effect}}
+; // c-error@-2 {{incompatible integer to pointer conversion initializing 'const unsigned char *' with an expression of type 'unsigned char'}} \
+  cxx-error@-2 {{cannot initialize a variable of type 'const unsigned char *' with an rvalue of type 'unsigned char'}}
+
+// However, there are some cases where this is fine and should work.
+const unsigned char *null_ptr_1 =
+#embed if_empty(0)
+;
+
+const unsigned char *null_ptr_2 =
+#embed
+;
+
+const unsigned char *null_ptr_3 = {
+#embed
+};
+
+#define FILE_NAME
+#define LIMIT 1
+#define OFFSET 0
+#define EMPTY_SUFFIX suffix()
+
+constexpr unsigned char ch =
+#embed FILE_NAME limit(LIMIT) clang::offset(OFFSET) EMPTY_SUFFIX
+;
+static_assert(ch == 0);

Fznamznon wrote: The first case is interesting: the cast to float makes the #embed be treated as a comma expression, so there are actually not enough arguments. I think this is the correct behavior, though. Otherwise it is not quite clear to me which data element the cast should apply to.

https://github.com/llvm/llvm-project/pull/95802
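To make the argument-list versus comma-expression distinction in the comment above concrete, here is a minimal sketch. It does not use `#embed` at all; the macro `EMBED_JK` is a hypothetical hand-expanded stand-in for a two-byte resource containing `j` and `k`, and the helper functions are made up for illustration. It only shows how plain C++ treats the same token sequence in different contexts; per the comment above, Clang treats the directive after a cast as a single comma expression, which this textual stand-in does not reproduce.

```cpp
// Hypothetical stand-in for the expansion of a two-byte embedded resource.
// A real #embed is not used, so this compiles without #embed support.
#define EMBED_JK 'j', 'k'

// Two-parameter helper: returns its first argument.
constexpr unsigned char first_arg(unsigned char x, unsigned char) { return x; }

// Helper used to observe which element a leading cast binds to.
constexpr float first_as_float(float x, unsigned char) { return x; }

// In a function argument list, each comma separates arguments,
// so the call receives 'j' and 'k' as two arguments.
constexpr unsigned char as_args = first_arg(EMBED_JK);

// Inside parentheses the same tokens form one comma expression whose value
// is the last operand. Compilers typically warn that the left operand of the
// comma operator has no effect, matching the expected-warning in the test.
constexpr unsigned char as_comma = (EMBED_JK);

// With purely textual expansion, a leading cast binds only to the first
// element: this is first_as_float((float)'j', 'k').
constexpr float cast_first = first_as_float((float)EMBED_JK);
```

With this stand-in, `as_args` holds `'j'`, `as_comma` holds `'k'`, and `cast_first` holds the value of `'j'` converted to `float`.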
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
@@ -441,6 +441,7 @@ tok::PPKeywordKind IdentifierInfo::getPPKeywordID() const {
   CASE( 4, 'e', 's', else);
   CASE( 4, 'l', 'n', line);
   CASE( 4, 's', 'c', sccs);
+  CASE(5, 'e', 'b', embed);

NagyDonat wrote:

```suggestion
CASE( 5, 'e', 'b', embed);
```

Just drive-by bikeshedding -- consider aligning this like the lines around it.

https://github.com/llvm/llvm-project/pull/95802
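For readers unfamiliar with the `CASE(length, first, third, name)` pattern in the hunk above, the following is a simplified sketch of the idea: dispatch on the identifier's length plus its first and third characters, then confirm with a full comparison. Everything here is an assumption made for illustration: the `PPKeyword` enum, the `key` packing, the five-argument `CASE` macro, and `classify` are hypothetical and are not Clang's actual definitions or hash.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for tok::PPKeywordKind.
enum class PPKeyword { NotKeyword, Else, Line, Sccs, Embed };

// Pack (length, first char, third char) into one integer key.
// This packing is an assumption for the sketch, not Clang's real hash.
constexpr unsigned key(unsigned len, char first, char third) {
  return (len << 16) | (static_cast<unsigned>(static_cast<unsigned char>(first)) << 8) |
         static_cast<unsigned>(static_cast<unsigned char>(third));
}

// One switch case per keyword: the key selects a candidate cheaply, and
// memcmp confirms that the whole spelling matches.
#define CASE(LEN, FIRST, THIRD, TEXT, KIND) \
  case key(LEN, FIRST, THIRD): \
    return std::memcmp(name, TEXT, LEN) == 0 ? PPKeyword::KIND : PPKeyword::NotKeyword

PPKeyword classify(const char *name) {
  const std::size_t len = std::strlen(name);
  if (len < 3)
    return PPKeyword::NotKeyword; // the sketch needs a third character
  switch (key(static_cast<unsigned>(len), name[0], name[2])) {
    CASE(4, 'e', 's', "else", Else);
    CASE(4, 'l', 'n', "line", Line);
    CASE(4, 's', 'c', "sccs", Sccs);
    CASE(5, 'e', 'b', "embed", Embed);
  default:
    return PPKeyword::NotKeyword;
  }
}

#undef CASE
```

An identifier such as `embex` shares the length, first, and third character of `embed`, so it reaches the same case label and is rejected only by the full comparison.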
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
efriedma-quic wrote: Please don't commit binary files if it isn't absolutely necessary. You can generate whatever files you need in a RUN line.

https://github.com/llvm/llvm-project/pull/95802
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
@@ -0,0 +1,98 @@
+// RUN: %clang_cc1 %s -fsyntax-only --embed-dir=%S/Inputs -verify=expected,cxx -Wno-c23-extensions
+// RUN: %clang_cc1 -x c -std=c23 %s -fsyntax-only --embed-dir=%S/Inputs -verify=expected,c
+#embed
+;
+
+void f (unsigned char x) { (void)x;}
+void g () {}
+void h (unsigned char x, int y) {(void)x; (void)y;}
+int i () {
+  return
+#embed
+  ;
+}
+
+_Static_assert(
+#embed suffix(,)
+""
+);
+_Static_assert(
+#embed
+, ""
+);
+_Static_assert(sizeof(
+#embed
+) ==
+sizeof(unsigned char)
+, ""
+);
+_Static_assert(sizeof
+#embed
+, ""
+);
+_Static_assert(sizeof(
+#embed // expected-warning {{left operand of comma operator has no effect}}
+) ==
+sizeof(unsigned char)
+, ""
+);
+
+#ifdef __cplusplus
+template
+void j() {
+  static_assert(First == 'j', "");
+  static_assert(Second == 'k', "");
+}
+#endif
+
+void do_stuff() {
+  f(
+#embed
+  );
+  g(
+#embed
+  );
+  h(
+#embed
+  );
+  int r = i();
+  (void)r;
+#ifdef __cplusplus
+  j<
+#embed
+  >(
+#embed
+  );
+#endif
+}
+
+// Ensure that we don't accidentally allow you to initialize an unsigned char *
+// from embedded data; the data is modeled as a string literal internally, but
+// is not actually a string literal.
+const unsigned char *ptr =
+#embed // expected-warning {{left operand of comma operator has no effect}}
+; // c-error@-2 {{incompatible integer to pointer conversion initializing 'const unsigned char *' with an expression of type 'unsigned char'}} \
+  cxx-error@-2 {{cannot initialize a variable of type 'const unsigned char *' with an rvalue of type 'unsigned char'}}
+
+// However, there are some cases where this is fine and should work.
+const unsigned char *null_ptr_1 =
+#embed if_empty(0)
+;
+
+const unsigned char *null_ptr_2 =
+#embed
+;
+
+const unsigned char *null_ptr_3 = {
+#embed
+};
+
+#define FILE_NAME
+#define LIMIT 1
+#define OFFSET 0
+#define EMPTY_SUFFIX suffix()
+
+constexpr unsigned char ch =
+#embed FILE_NAME limit(LIMIT) clang::offset(OFFSET) EMPTY_SUFFIX
+;
+static_assert(ch == 0);

efriedma-quic wrote: More weird cases to consider:

```
void f(float x, char y, char z);

void g() {
  f((float)
#embed "three_character_file"
  );
}
```

```
struct S {
  S(char x);
  ~S();
};

void f() {
  S s[] = {
#embed "file"
  };
}
```

https://github.com/llvm/llvm-project/pull/95802
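The second case above is unusual because each embedded byte has to initialize an object with a non-trivial constructor and destructor, so the data cannot simply be treated as a block of characters. A minimal sketch of what that means, again using a hand-written list in place of `#embed "file"`: the three-byte contents, the `live_objects` counter, and `build_and_count` are assumptions made for illustration only.

```cpp
// Counts S objects that are currently alive (illustrative only).
int live_objects = 0;

struct S {
  char value;
  S(char x) : value(x) { ++live_objects; }           // converting constructor
  S(const S &other) : value(other.value) { ++live_objects; }
  ~S() { --live_objects; }
};

// Builds an array from a hand-expanded stand-in for a three-byte file and
// returns how many S objects were alive while the array existed.
int build_and_count() {
  S s[] = {'a', 'b', 'c'}; // stands in for: S s[] = { #embed "file" };
  static_assert(sizeof(s) / sizeof(s[0]) == 3, "one S per element");
  return live_objects;
}
```

Each element runs the converting constructor once, and every element is destroyed when the array goes out of scope, so the counter returns to zero after the call.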
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
@@ -2422,6 +2422,10 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
       Bldr.addNodes(Dst);
       break;
     }
+
+case Stmt::EmbedExprClass:
+  llvm_unreachable("Support for EmbedExpr is not implemented.");

efriedma-quic wrote: Please don't use llvm_unreachable for things which are actually reachable. At the very least, use report_fatal_error. Prefer a real diagnostic when possible.

https://github.com/llvm/llvm-project/pull/95802
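The concern above is that `llvm_unreachable` documents a path that can never execute, and in optimized builds it may be lowered to an optimizer hint, so reaching it on valid input is undefined behavior rather than a clean failure. A minimal sketch of the "real diagnostic" alternative, deliberately written without LLVM headers: `StmtKind` and `visit` are hypothetical names, and returning a message string merely stands in for emitting a diagnostic.

```cpp
#include <string>

// Hypothetical, reduced set of statement kinds.
enum class StmtKind { IntegerLiteral, EmbedExpr };

// Returns an empty string on success, or a diagnostic message when the
// engine meets a node kind it does not handle yet.
std::string visit(StmtKind kind) {
  switch (kind) {
  case StmtKind::IntegerLiteral:
    return "";
  case StmtKind::EmbedExpr:
    // Valid input can reach this case, so report it to the caller
    // instead of marking the path as unreachable.
    return "support for EmbedExpr is not implemented";
  }
  // Defensive fallback for values outside the enumerators.
  return "unknown statement kind";
}
```

The caller can then decide whether to surface the message, skip the node, or stop the analysis, which is not possible once a path has been declared unreachable.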
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
https://github.com/cor3ntin approved this pull request. The commit to fix the leak LGTM

https://github.com/llvm/llvm-project/pull/95802
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
Fznamznon wrote: This fixes https://github.com/llvm/llvm-project/pull/68620#issuecomment-2163448739 . https://github.com/llvm/llvm-project/pull/68620#issuecomment-2163603239 was also reported, but I'm not able to access proper logs. The link points to sanitizer buildbots, so I suppose it might be the same memory leak. With this patch and a sanitizer build, only `Clang-Unit :: Lex/./LexTests/67/120` fails, and it also fails on main.

https://github.com/llvm/llvm-project/pull/95802
[clang] [clang-tools-extra] Reland [clang][Sema, Lex, Parse] Preprocessor embed in C and C++ (PR #95802)
llvmbot wrote:

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Mariya Podchishchaeva (Fznamznon)

Changes

This commit implements the entirety of the now-accepted [N3017 - Preprocessor Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and its sister C++ paper [p1967](https://wg21.link/p1967). It implements everything in the specification, and includes an implementation that drastically improves the time it takes to embed data in specific scenarios (the initialization of character type arrays). The mechanisms used to do this are applied under the "as-if" rule. In general, when the system cannot detect that it is initializing an array object in a variable declaration, it will either generate an EmbedExpr AST node, which will be expanded by AST consumers (CodeGen or constant expression evaluators), or expand the embed directive as a comma expression.

This reverts commit https://github.com/llvm/llvm-project/commit/682d461d5a231cee54d65910e6341769419a67d7.

Co-authored-by: The Phantom Derpstorm phdofthehouse@gmail.com
Co-authored-by: Aaron Ballman aaron@aaronballman.com
Co-authored-by: cor3ntin corentinjabot@gmail.com
Co-authored-by: H. Vetinari h.vetinari@gmx.com

---

Patch is 184.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95802.diff

96 Files Affected:

- (modified) clang-tools-extra/test/pp-trace/pp-trace-macro.cpp (+9)
- (modified) clang/docs/LanguageExtensions.rst (+24)
- (modified) clang/include/clang/AST/Expr.h (+158)
- (modified) clang/include/clang/AST/RecursiveASTVisitor.h (+5)
- (modified) clang/include/clang/AST/TextNodeDumper.h (+1)
- (modified) clang/include/clang/Basic/DiagnosticCommonKinds.td (+3)
- (modified) clang/include/clang/Basic/DiagnosticLexKinds.td (+12)
- (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (-2)
- (modified) clang/include/clang/Basic/FileManager.h (+7-4)
- (modified) clang/include/clang/Basic/StmtNodes.td (+1)
- (modified) clang/include/clang/Basic/TokenKinds.def (+6)
- (modified) clang/include/clang/Driver/Options.td (+6)
- (modified) clang/include/clang/Frontend/PreprocessorOutputOptions.h (+3)
- (modified) clang/include/clang/Lex/PPCallbacks.h (+54)
- (added) clang/include/clang/Lex/PPDirectiveParameter.h (+33)
- (added) clang/include/clang/Lex/PPEmbedParameters.h (+94)
- (modified) clang/include/clang/Lex/Preprocessor.h (+67-2)
- (modified) clang/include/clang/Lex/PreprocessorOptions.h (+3)
- (modified) clang/include/clang/Parse/Parser.h (+3)
- (modified) clang/include/clang/Sema/Sema.h (+4)
- (modified) clang/include/clang/Serialization/ASTBitCodes.h (+3)
- (modified) clang/lib/AST/Expr.cpp (+12)
- (modified) clang/lib/AST/ExprClassification.cpp (+5)
- (modified) clang/lib/AST/ExprConstant.cpp (+58-5)
- (modified) clang/lib/AST/Interp/ByteCodeExprGen.cpp (+18-3)
- (modified) clang/lib/AST/Interp/ByteCodeExprGen.h (+1)
- (modified) clang/lib/AST/ItaniumMangle.cpp (+1)
- (modified) clang/lib/AST/StmtPrinter.cpp (+4)
- (modified) clang/lib/AST/StmtProfile.cpp (+2)
- (modified) clang/lib/AST/TextNodeDumper.cpp (+5)
- (modified) clang/lib/Basic/FileManager.cpp (+6-1)
- (modified) clang/lib/Basic/IdentifierTable.cpp (+3-2)
- (modified) clang/lib/CodeGen/CGExprAgg.cpp (+32-8)
- (modified) clang/lib/CodeGen/CGExprConstant.cpp (+93-25)
- (modified) clang/lib/CodeGen/CGExprScalar.cpp (+7)
- (modified) clang/lib/Driver/ToolChains/Clang.cpp (+5-1)
- (modified) clang/lib/Frontend/CompilerInvocation.cpp (+8)
- (modified) clang/lib/Frontend/DependencyFile.cpp (+25)
- (modified) clang/lib/Frontend/DependencyGraph.cpp (+23-1)
- (modified) clang/lib/Frontend/InitPreprocessor.cpp (+8)
- (modified) clang/lib/Frontend/PrintPreprocessedOutput.cpp (+115-7)
- (modified) clang/lib/Lex/PPDirectives.cpp (+474-2)
- (modified) clang/lib/Lex/PPExpressions.cpp (+36-13)
- (modified) clang/lib/Lex/PPMacroExpansion.cpp (+111)
- (modified) clang/lib/Lex/TokenConcatenation.cpp (+4-1)
- (modified) clang/lib/Parse/ParseExpr.cpp (+36-1)
- (modified) clang/lib/Parse/ParseInit.cpp (+30)
- (modified) clang/lib/Parse/ParseTemplate.cpp (+29-12)
- (modified) clang/lib/Sema/SemaExceptionSpec.cpp (+1)
- (modified) clang/lib/Sema/SemaExpr.cpp (+12-3)
- (modified) clang/lib/Sema/SemaInit.cpp (+100-13)
- (modified) clang/lib/Sema/TreeTransform.h (+5)
- (modified) clang/lib/Serialization/ASTReaderStmt.cpp (+14)
- (modified) clang/lib/Serialization/ASTWriterStmt.cpp (+10)
- (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+4)
- (added) clang/test/C/C2x/Inputs/bits.bin (+1)
- (added) clang/test/C/C2x/Inputs/boop.h (+1)
- (added) clang/test/C/C2x/Inputs/i.dat (+1)
- (added) clang/test/C/C2x/Inputs/jump.wav (+1)
- (added) clang/test/C/C2x/Inputs/s.dat (+1)
- (added) clang/test/C/C2x/n3017.c (+216)
- (added) clang/test/Preprocessor/Inputs/jk.txt (+1)
- (added) clang/test/Preprocessor/Inputs/media/art.txt (+9)
- (added) clang/test/Preprocessor/Inputs/media/empty