[Bug c/66425] (void) cast doesn't suppress __attribute__((warn_unused_result))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66425 --- Comment #71 from Petr Skocik --- An Ignore macro that works everywhere a (void) cast syntactically works (i.e., even on void types, for whatever reason) is easy:

#define IGN$(Val) (__extension__({ \
    __auto_type IGN$ = _Generic((typeof(Val)*)0, \
        void*: ((void)(Val),0), default: Val); \
    (void)IGN$; }))

__attribute((warn_unused_result)) int getInt(void);
void getVoid(void);
void ign_test(void){
    getInt();        //warning
    getVoid();       //no warning
    (void)getInt();  //traditionally with a warning
    (void)getVoid(); //no warning
    IGN$(getInt());  //no warning
    IGN$(getVoid()); //no warning
}

https://godbolt.org/z/4qa8TcWMM

(This can easily be done without __auto_type (use typeof instead) or without (__extension__({})) (use do ... while(0)) too.)

I would strongly prefer that the current semantics of warn_unused_result not be broken by a late "correction". The time for a discussion of the semantics of warn_unused_result in combination with a void cast is long gone. It has long been established that simple (void) casts do NOT silence warn_unused_result. Let's not break code that expects these semantics. A conditional compiler flag enabling void casts to silence WUR might be in order, however, considering that clang disregards the established semantics and a void cast does silence WUR on clang (https://godbolt.org/z/4qa8TcWMM).
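A minimal sketch of the statement-expression-free variant hinted at above (my reading of "use do ... while(0)"; the macro name and temporary are made up, and this is untested):

#define IGN_PORTABLE(Val) do { \
    /* pick int for void expressions, typeof(Val) otherwise; typeof and \
       the unselected _Generic branches evaluate nothing */ \
    __typeof__(_Generic((__typeof__(Val)*)0, void*: 0, default: (Val))) ign_tmp = \
        _Generic((__typeof__(Val)*)0, void*: ((void)(Val),0), default: (Val)); \
    (void)ign_tmp; \
} while(0)

Consuming the result via the initialization is what silences warn_unused_result; the final (void) cast then keeps -Wunused-variable quiet.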
[Bug middle-end/93487] Missed tail-call optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93487 --- Comment #5 from Petr Skocik --- Another case of a missed tailcall which might warrant a separate mention:

struct big{ long _[10]; };
void takePtr(void *);
void takeBigAndPassItsAddress(struct big X){ takePtr(&X); }

This should ideally compile to just `lea 8(%rsp), %rdi; jmp takePtr;`. The compiler might be tempted to treat the taking of the address of a local as a reason not to tail call (clang misses this optimization too, probably for this reason), but tailcalling here is fine, because this particular local isn't allocated by the function itself but rather by its caller as part of making the call. Icc does do this optimization: https://godbolt.org/z/a6coTzPjz
[Bug c/90181] Feature request: provide a way to explicitly select specific named registers in constraints
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90181 Petr Skocik changed:

           What    |Removed        |Added
----------------------------------------------------------
                 CC|               |pskocik at gmail dot com

--- Comment #16 from Petr Skocik --- The current way of loading stuff into regs that don't have a specific constraint for them also breaks on gcc (but not on clang) if the variable is marked const. https://godbolt.org/z/1PvYsrqG9
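A minimal sketch of the register-variable idiom in question (a hypothetical syscall-style wrapper, reconstructed from context; per the comment, it's the const-qualified register variable that breaks on gcc but not on clang):

long sys_wrapper(long nr, long a1, long a2, long a3, long a4){
    /* x86-64 has no constraint letter for r10, so a register variable is
       the usual way to pin a value to it; adding `const` reportedly breaks gcc */
    register const long r10 __asm__("r10") = a4;
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3), "r"(r10)
                      : "rcx", "r11", "memory");
    return ret;
}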
[Bug middle-end/112844] Branches under -Os (unlike -O{1, 2, 3}) do not respect __builtin_expect hints
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112844 --- Comment #2 from Petr Skocik --- (In reply to Jakub Jelinek from comment #1)
> With -Os you ask the code to be small. So, while internally the hint is
> still present in edge probabilities, -Os is considered more important and
> certain code changes based on the probabilities aren't done if they are
> known or expected to result in larger code.

Thanks. I very much like the codegen I get with gcc -Os, often better than what I get with clang. But the sometimes counterintuitive branch layout at -Os is annoying to me, especially considering I've measured it a couple of times as being the source of a slowdown. Sure, you can save a (more often than not 2-byte) jump by conditionally jumping over an unlikely branch instead of conditionally jumping to an unlikely branch placed after the ret and having it jump back into the function body (the latter is what all the other compilers do at -Os). But I'd rather have the code spend the extra two bytes and have my happy paths be fall-through, as they should be.
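To illustrate the two layouts being contrasted (a hand-written sketch, not actual compiler output; labels and condition direction are illustrative):

# for `if (__builtin_expect(cond, 0)) f(); more();` on x86-64:
#
# layout gcc -Os picks (saves the 2-byte `jmp .Lback`, but the likely
# path takes a jump every time):
#         jcc   .Lskip        # taken in the likely case
#         call  f             # unlikely branch laid out inline
# .Lskip:
#         ...more()...
#
# layout the hint asks for (happy path falls through):
#         jcc   .Lunlikely    # taken only in the unlikely case
# .Lback:
#         ...more()...
#         ret
# .Lunlikely:
#         call  f
#         jmp   .Lback        # the extra 2 bytes -Os refuses to spend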
[Bug target/114097] Missed register optimization in _Noreturn functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097 --- Comment #4 from Petr Skocik --- Excellent! Thank you very much. I didn't realize the functionality was already there; it just didn't work without an explicit __attribute((noreturn)). Now I can get rid of my most complex assembly function, which I stupidly (back then I thought cleverly) wrote. :)
[Bug rtl-optimization/10837] noreturn attribute causes no sibling calling optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837 Petr Skocik changed:

           What    |Removed        |Added
----------------------------------------------------------
                 CC|               |pskocik at gmail dot com

--- Comment #19 from Petr Skocik --- (In reply to Xi Ruoyao from comment #16)
> In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> only executed one time so optimizing against a cold path does not help much.
> I don't think it's a good idea to encourage people to construct some fancy
> code by a recursive _Noreturn function (why not just use a loop?!) And if
> you must write such fancy code anyway IMO musttail attribute (PR83324) will
> be a better solution.

There's also longjmp, which may not be all that cold and may be executed multiple times. And while, yeah, nobody will notice a single call-vs-jmp time save against a process spawn/exit, for a longjmp wrapper it'll make it a few % faster (as would utilizing _Noreturn attributes for better register allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097, which would also save a bit of codesize too). Tail calls can also save a bit of codesize if the target is near.
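A hypothetical longjmp wrapper of the sort meant above (names made up); since it is _Noreturn, the longjmp call could be compiled as a plain jmp:

#include <setjmp.h>

extern jmp_buf error_env;               /* hypothetical global recovery context */

_Noreturn void raise_error(int code){
    /* ideally a tail jump to longjmp rather than call + unreachable */
    longjmp(error_env, code ? code : 1);
}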
[Bug c/114097] New: Missed register optimization in _Noreturn functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097

            Bug ID: 114097
           Summary: Missed register optimization in _Noreturn functions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pskocik at gmail dot com
  Target Milestone: ---

Consider a never-returning function such as this:

#include <stdio.h>
#include <setjmp.h>
//_Noreturn
void noret(unsigned A, unsigned B, unsigned C, unsigned D, unsigned E, jmp_buf Jb){
    for(;A--;) puts("A");
    for(;B--;) puts("B");
    for(;C--;) puts("C");
    for(;D--;) puts("D");
    for(;E--;) puts("E");
    longjmp(Jb,1);
}

https://godbolt.org/z/35YjrhjYq

In its prologue, gcc saves the arguments in call-preserved registers to preserve them around the puts calls, and it does so the usual way: by (1) pushing the old values of the call-preserved registers to the stack and (2) actually moving the arguments into the call-preserved registers.

pushq %r15
movq  %r9, %r15
pushq %r14
movl  %edi, %r14d
pushq %r13
movl  %esi, %r13d
pushq %r12
movl  %edx, %r12d
pushq %rbp
movl  %ecx, %ebp
pushq %rbx
movl  %r8d, %ebx
pushq %rax
//...

Since this function demonstrably never returns, step (1) can be entirely elided, as the old values of the call-preserved registers won't ever need to be restored (desirably, gcc does not generate the would-be-dead restoration code):

movq  %r9, %r15
movl  %edi, %r14d
movl  %esi, %r13d
movl  %edx, %r12d
movl  %ecx, %ebp
movl  %r8d, %ebx
pushq %rax
//...

(Also desirable would be the unrealized tail-call optimization of the longjmp call in this case: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837)
[Bug c/114011] New: Feature request: __goto__
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114011

            Bug ID: 114011
           Summary: Feature request: __goto__
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pskocik at gmail dot com
  Target Milestone: ---

Gcc has __volatile__. I can only assume the rationale for it is so that inline asm macros can use __asm __volatile__ and not have to worry about user redefines of the volatile keyword (which, while not quite approved by the standard, is sometimes practically useful). While the __asm syntax also allows the goto keyword, there's currently no __goto__ counterpart to __volatile__ that could similarly protect against goto redefines. Adding it is trivial and consistent with the already existing volatile/__volatile__ pair. Would you consider it?

(Why am I redefining goto? I'm basically doing it within the confines of a macro framework to force a static context check on gotos, to prevent gotos out of scopes where doing so would be an error. Something like:

enum { DISALLOW_GOTO_HERE = 0 }; //normally, goto is allowed
#define goto while(_Generic((int(*)[!DISALLOW_GOTO_HERE])0, int(*)[1]:1)) goto //statically checked goto

int main(void){
    goto next; next:; //OK, not disallowed in this context
#if 0 //would fail to compile
    enum {DISALLOW_GOTO_HERE=1}; //disallowed in this context
    goto next2; next2:;
#endif
}

While this redefine does not syntactically disturb C, it does disturb `__asm goto()`, of which I, unfortunately, have one very frequently used instance, and since there's no way to suppress the expansion of an object-like macro, I'd like to be able to change it to `__asm __goto__` and have it peacefully coexist with the goto redefine.)
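A sketch of the clash described above (hypothetical function; GNU C): with `goto` redefined as an object-like macro, the `goto` token inside an asm-goto statement gets expanded too, breaking it, which is what an `__asm __goto__` spelling would sidestep.

static int probe(void){
    /* fine normally, but no longer parses once `goto` is #defined as above */
    __asm goto("jmp %l0" : : : : out);
    return 0;
out:
    return 1;
}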
[Bug c/112844] New: Branches under -Os (unlike -O{1,2,3}) do not respect __builtin_expect hints
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112844

            Bug ID: 112844
           Summary: Branches under -Os (unlike -O{1,2,3}) do not respect
                    __builtin_expect hints
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pskocik at gmail dot com
  Target Milestone: ---

A simple example that demonstrates this:

int test(void);
void yes(void);
void expect_yes(void){ if (__builtin_expect(test(),1)) yes(); else {} }
void expect_no(void){ if (__builtin_expect(test(),0)) yes(); else {} }

For an optimized x86-64 output, one should expect:
- a fall-through to a yes() tailcall for the expect_yes() case, preceded by a conditional jump to code doing a plain return
- a fall-through to a plain return for the expect_no() case, preceded by a conditional jump to a yes() tailcall (or even more preferably: a conditional tailcall to yes(), with the needed stack adjustment done once before the test instead of being duplicated in each branch after the test)

Indeed, that's how gcc lays it out at -O{1,2,3} (https://godbolt.org/z/rG3P3d6f7), as does clang at -O{1,2,3,s} (https://godbolt.org/z/EcKbrn1b7) and icc at -O{1,2,3,s} (https://godbolt.org/z/Err73eGsb). But gcc at -Os seems to have a very strong preference for falling through to call yes(), even in

void expect_no(void){ if (__builtin_expect(test(),0)) yes(); else {} }

and even in

void expect_no2(void){ if (__builtin_expect(!test(),1)){} else yes(); }

essentially completely disregarding any user attempt at controlling the branch layout of the output.
[Bug ipa/106116] Missed optimization: in no_reorder-attributed functions, tail calls to the subsequent function could just be function-to-function fallthrough
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106116 --- Comment #4 from Petr Skocik --- It would be interesting to do this at the assembler level, effectively turning what's equivalent to `jmp 1f; 1:` into nothing. This would also be in line with the GNU assembler's apparent philosophy that jmp is a high-level variable-length instruction (either jmp or jmpq, whichever is possible first => this could become: nothing, jmp, or jmpq). I have a bunch of multi-parameter functions with supporting functions structured as follows:

void func_A(int A){ func_AB(A,DEFAULT_B); }
void func_AB(int A, int B){ func_ABC(A,B,DEFAULT_C); }
void func_ABC(int A, int B, int C){ func_ABCD(A,B,C,DEFAULT_D); }
void func_ABCD(int A, int B, int C, int D){
    //...
}

which could size-wise benefit from eliding the jumps, turning them into fallthroughs this way, but yeah, probably not worth the effort (unless somebody knows how to easily hack gas to do it).
[Bug middle-end/109766] New: Passing doubles through the stack generates a stack adjustment per such argument at -Os/-Oz.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109766

            Bug ID: 109766
           Summary: Passing doubles through the stack generates a stack
                    adjustment per such argument at -Os/-Oz.
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pskocik at gmail dot com
  Target Milestone: ---

/* Passing doubles through the stack generates a stack adjustment per
   such argument at -Os/-Oz. These stack adjustments are only coalesced
   at -O1/-O2/-O3, leaving -Os/-Oz with larger code. */
#define $expr(...) (__extension__({__VA_ARGS__;}))
#define $regF0 $expr(register double x __asm("xmm0"); x)
#define $regF1 $expr(register double x __asm("xmm1"); x)
#define $regF2 $expr(register double x __asm("xmm2"); x)
#define $regF3 $expr(register double x __asm("xmm3"); x)
#define $regF4 $expr(register double x __asm("xmm4"); x)
#define $regF5 $expr(register double x __asm("xmm5"); x)
#define $regF6 $expr(register double x __asm("xmm6"); x)
#define $regF7 $expr(register double x __asm("xmm7"); x)
void func(char const*Fmt, ...);
void callfunc(char const*Fmt,
              double D0, double D1, double D2, double D3,
              double D4, double D5, double D6, double D7){
    func(Fmt,$regF0,$regF1,$regF2,$regF3,$regF4,$regF5,$regF6,$regF7,
         D0,D1,D2,D3,D4,D5,D6,D7);
    /*
    //gcc @ -Os/-Oz :
       0: 50                      push   %rax
       1: b0 08                   mov    $0x8,%al
       3: 48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
       8: 66 0f d6 3c 24          movq   %xmm7,(%rsp)
       d: 48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
      12: 66 0f d6 34 24          movq   %xmm6,(%rsp)
      17: 48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
      1c: 66 0f d6 2c 24          movq   %xmm5,(%rsp)
      21: 48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
      26: 66 0f d6 24 24          movq   %xmm4,(%rsp)
      2b: 48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
      30: 66 0f d6 1c 24          movq   %xmm3,(%rsp)
      35: 48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
      3a: 66 0f d6 14 24          movq   %xmm2,(%rsp)
      3f: 48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
      44: 66 0f d6 0c 24          movq   %xmm1,(%rsp)
      49: 48 8d 64 24 f8          lea    -0x8(%rsp),%rsp
      4e: 66 0f d6 04 24          movq   %xmm0,(%rsp)
      53: e8 00 00 00 00          callq  58
                54: R_X86_64_PLT32 func-0x4
      58: 48 83 c4 48             add    $0x48,%rsp
      5c: c3                      retq
    $sz(callfunc)=93

    //clang @ -Os/-Oz :
       0: 48 83 ec 48             sub    $0x48,%rsp
       4: f2 0f 11 7c 24 38       movsd  %xmm7,0x38(%rsp)
       a: f2 0f 11 74 24 30       movsd  %xmm6,0x30(%rsp)
      10: f2 0f 11 6c 24 28       movsd  %xmm5,0x28(%rsp)
      16: f2 0f 11 64 24 20       movsd  %xmm4,0x20(%rsp)
      1c: f2 0f 11 5c 24 18       movsd  %xmm3,0x18(%rsp)
      22: f2 0f 11 54 24 10       movsd  %xmm2,0x10(%rsp)
      28: f2 0f 11 4c 24 08       movsd  %xmm1,0x8(%rsp)
      2e: f2 0f 11 04 24          movsd  %xmm0,(%rsp)
      33: b0 08                   mov    $0x8,%al
      35: e8 00 00 00 00          callq  3a
                36: R_X86_64_PLT32 func-0x4
      3a: 48 83 c4 48             add    $0x48,%rsp
      3e: c3                      retq
    $sz(callfunc)=63

    //gcc @ -O1 :
       0: 48 83 ec 48             sub    $0x48,%rsp
       4: f2 0f 11 7c 24 38       movsd  %xmm7,0x38(%rsp)
       a: f2 0f 11 74 24 30       movsd  %xmm6,0x30(%rsp)
      10: f2 0f 11 6c 24 28       movsd  %xmm5,0x28(%rsp)
      16: f2 0f 11 64 24 20       movsd  %xmm4,0x20(%rsp)
      1c: f2 0f 11 5c 24 18       movsd  %xmm3,0x18(%rsp)
      22: f2 0f 11 54 24 10       movsd  %xmm2,0x10(%rsp)
      28: f2 0f 11 4c 24 08       movsd  %xmm1,0x8(%rsp)
      2e: f2 0f 11 04 24          movsd  %xmm0,(%rsp)
      33: b8 08 00 00 00          mov    $0x8,%eax
      38: e8 00 00 00 00          callq  3d
                39: R_X86_64_PLT32 func-0x4
      3d: 48 83 c4 48             add    $0x48,%rsp
      41: c3                      retq
    $sz(callfunc)=66
    */
}

https://godbolt.org/z/d8T3hxqWK
[Bug preprocessor/109704] New: #pragma {push,pop}_macro broken for identifiers that contain dollar signs at nonfirst positions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109704

            Bug ID: 109704
           Summary: #pragma {push,pop}_macro broken for identifiers that
                    contain dollar signs at nonfirst positions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: preprocessor
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pskocik at gmail dot com
  Target Milestone: ---

The following dollar-sign-less example compiles fine, as expected:

#define MACRO 1
_Static_assert(MACRO,"");
#pragma push_macro("MACRO")
#undef MACRO
#define MACRO 0
_Static_assert(!MACRO,"");
#pragma pop_macro("MACRO")
_Static_assert(MACRO,""); //OK

Substituting $MACRO for MACRO still works, but with MACRO$ or M$CRO the final assertions fail: https://godbolt.org/z/n1EoGao74
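Spelled out for one of the failing identifiers (a reconstruction of what the report describes, assuming -fdollars-in-identifiers, which GNU C enables by default):

#define MACRO$ 1
_Static_assert(MACRO$,"");
#pragma push_macro("MACRO$")
#undef MACRO$
#define MACRO$ 0
_Static_assert(!MACRO$,"");
#pragma pop_macro("MACRO$")
_Static_assert(MACRO$,""); //should pass; per the report it fails: MACRO$ is not restored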
[Bug tree-optimization/93265] memcmp comparisons of structs wrapping a primitive type not as compact/efficient as direct comparisons of the underlying primitive type under -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93265 --- Comment #3 from Petr Skocik --- Here's another example (which may summarize it more nicely):

struct a{ char _[4]; };
#include <string.h>
int cmp(struct a A, struct a B){ return !!memcmp(&A,&B,4); }

Expected x86-64 codegen (✓ for gcc -O2/-O3 and for clang -Os/-O2/-O3):

        xor     eax, eax
        cmp     edi, esi
        setne   al
        ret

gcc -Os codegen:

        subq    $24, %rsp
        movl    $4, %edx
        movl    %edi, 12(%rsp)
        leaq    12(%rsp), %rdi
        movl    %esi, 8(%rsp)
        leaq    8(%rsp), %rsi
        call    memcmp
        testl   %eax, %eax
        setne   %al
        addq    $24, %rsp
        movzbl  %al, %eax
        ret

https://godbolt.org/z/G5eE5GYv4
[Bug c/94379] Feature request: like clang, support __attribute((__warn_unused_result__)) on structs, unions, and enums
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94379 --- Comment #2 from Petr Skocik --- Excellent! For optional super extra coolness, this might work with statement expressions too (clang doesn't do this), so that statement-expression-based macros could also be marked warn_unused_result through it:

typedef struct __attribute((__warn_unused_result__)) { int x; } wur_retval_t;

wur_retval_t foo(void){ int x=41; return (wur_retval_t){x+1}; }
#define foo_macro() ({ int x=41; (wur_retval_t){x+1}; })

void use(void){
    foo();       //warn unused result ✓
    foo_macro(); //perhaps should "warn unused result" too?
}
[Bug c/109567] New: Useless stack adjustment by 16 around calls with odd stack-argument counts on SysV x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109567

            Bug ID: 109567
           Summary: Useless stack adjustment by 16 around calls with odd
                    stack-argument counts on SysV x86_64
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pskocik at gmail dot com
  Target Milestone: ---

For function calls with odd stack-argument counts, gcc generates a useless `sub $16, %rsp` at the beginning of the calling function. Example (https://godbolt.org/z/Y4ErE8ee9):

#include <stdio.h>
int callprintf_0stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0),0; }
int callprintf_1stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1),0; }     //useless sub $0x10,%rsp
int callprintf_2stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2),0; }
int callprintf_3stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3),0; } //useless sub $0x10,%rsp
int callprintf_4stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3,4),0; }
int callprintf_5stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3,4,5),0; } //useless sub $0x10,%rsp
int callprintf_6stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3,4,5,6),0; }
int callprintf_7stk(char const *Fmt){ return printf(Fmt,0,0,0,0,0, 1,2,3,4,5,6,7),0; } //useless sub $0x10,%rsp
[Bug middle-end/108799] Improper deprecation diagnostic for rsp clobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108799 Petr Skocik changed:

           What    |Removed        |Added
----------------------------------------------------------
                 CC|               |pskocik at gmail dot com

--- Comment #3 from Petr Skocik --- Very good question. The deprecation of SP clobbers could use some explanation if there are indeed good reasons for it. IMO, if listing the SP as a clobber both (1) forces a frame pointer, with frame-pointer-relative addressing of spills (and the frame pointer isn't clobbered too), and (2) avoids the use of the red zone (and it absolutely should continue to do both of these things, in my opinion), then gcc shouldn't need to care about red-zone clobbers (as in the `pushf;pop` example), or even about a wide class of stack-pointer changes (assembly-made stack allocations and frees), just as long as no spills made by the compiler are clobbered (or opened to being clobbered from signal handlers) by such head-of-the-stack manipulation. Even with assembly-less standard C that uses VLAs or allocas, gcc cannot count on being in control of the stack pointer anyway, so why be so fussy about it when something as expert-oriented as inline assembly tries to manipulate it?
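For context, a sketch of the kind of code in question (my reconstruction of the `pushf;pop` example; the function name is made up): pushf momentarily stores below %rsp, so the asm lists an SP clobber to keep compiler spills out of the red zone, and it is that clobber which now draws the deprecation warning.

static unsigned long read_flags(void){
    unsigned long flags;
    /* pushf writes just below %rsp; the "rsp" clobber is meant to make
       gcc avoid the red zone around this asm */
    __asm__ volatile ("pushf\n\tpop %0" : "=r"(flags) : : "rsp");
    return flags;
}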
[Bug c/108194] GCC won't treat two compatible function types as compatible if any of them (or both of them) is declared _Noreturn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108194 --- Comment #6 from Petr Skocik --- (In reply to Petr Skocik from comment #5)
> (In reply to Andrew Pinski from comment #4)
> > Invalid as mentioned in r13-3135-gfa258f6894801a .
> I believe it's still a bug for pre-c2x __typeof.
> While it is GCC's prerogative to include _Noreturn/__attribute((noreturn))
> into the type for its own __typeof (which, BTW, I think is better design
> than the standardized semantics), I think two otherwise compatible function
> types should still remain compatible if they both either have or don't have
> _Noreturn/__attribute((noreturn)).

OK, the bug was MINE after all. For bug report archeologists: I was doing what was meant to be a full (qualifier-including) type comparison wrong. While something like _Generic((__typeof(type0)*)0, __typeof(type1)*:1, default:0) suffices to get around _Generic dropping qualifiers (const/volatile/_Atomic) in its controlling expression, for function pointer types at a single layer of pointer indirection the _Noreturn attribute will still get dropped in the controlling expression of _Generic (I guess that makes sense, because noreturn-ness is much more closely related to the function itself than a pointer type is to its target type), and another layer of pointer indirection is required, as in `_Generic((__typeof(type0)**)0, __typeof(type1)**:1, default:0)`. Thank you all very much, especially jos...@codesourcery.com, who pointed me (pun intended) to the right solution over email. :)
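A minimal sketch of the working comparison described above (hypothetical macro name, untested; the second assertion assumes gcc's pre-C2x __typeof behavior of keeping noreturn-ness in the type): the extra pointer layer keeps _Noreturn from being dropped in the controlling expression.

_Noreturn void nr_a(void);
_Noreturn void nr_b(void);
void plain(void);

/* full type comparison, including __typeof-visible noreturn-ness */
#define SAME_TYPE(a, b) _Generic((__typeof(a)**)0, __typeof(b)**: 1, default: 0)

_Static_assert(SAME_TYPE(nr_a, nr_b), "both _Noreturn: compatible");
_Static_assert(!SAME_TYPE(nr_a, plain), "noreturn-ness differs under gcc's __typeof");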
[Bug c/108194] GCC won't treat two compatible function types as compatible if any of them (or both of them) is declared _Noreturn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108194 Petr Skocik changed:

           What    |Removed        |Added
----------------------------------------------------------
         Resolution|INVALID        |FIXED

--- Comment #5 from Petr Skocik --- (In reply to Andrew Pinski from comment #4)
> Invalid as mentioned in r13-3135-gfa258f6894801a .

I believe it's still a bug for pre-c2x __typeof. While it is GCC's prerogative to include _Noreturn/__attribute((noreturn)) into the type for its own __typeof (which, BTW, I think is better design than the standardized semantics), I think two otherwise compatible function types should still remain compatible if they both either have or don't have _Noreturn/__attribute((noreturn)). But treating `_Noreturn void NR_FN_A(void);` as INcompatible with `_Noreturn void NR_FN_B(void);` is just wonky, IMO.
[Bug c/108194] New: GCC won't treat two compatible function types as compatible if any of them (or both of them) is declared _Noreturn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108194

            Bug ID: 108194
           Summary: GCC won't treat two compatible function types as
                    compatible if any of them (or both of them) is declared
                    _Noreturn
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pskocik at gmail dot com
  Target Milestone: ---

(The same happens with __attribute((noreturn)).) Example (https://godbolt.org/z/ePGd95sWz):

void FN_A(void);
void FN_B(void);
_Noreturn void NR_FN_A(void);
_Noreturn void NR_FN_B(void);

_Static_assert(_Generic((__typeof(*(FN_A))*){0},    __typeof(*(FN_B))*: 1), "");    //OK ✓
_Static_assert(_Generic((__typeof(*(NR_FN_A))*){0}, __typeof(*(NR_FN_B))*: 1), ""); //ERROR ✗
_Static_assert(_Generic((__typeof(*(FN_A))*){0},    __typeof(*(NR_FN_B))*: 1), ""); //ERROR ✗

As you can see from the Compiler Explorer link, clang accepts all three, which is as it should be per the standard, where _Noreturn is a function specifier (https://port70.net/~nsz/c/c11/n1570.html#6.7.4), which means it shouldn't even go into the type. (Personally, I don't even mind it going into the type, just as long as two otherwise identical _Noreturn function declarations are deemed to have the same type.)

Regards, Petr Skocik
[Bug c/107831] Missed optimization: -fstack-clash-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831 --- Comment #9 from Petr Skocik --- Regarding the size of alloca/VLA-generated code under -fstack-clash-protection: I've played with this a little bit and, while I love the feature, the code-size increases seem quite significant, and unnecessarily so. Take a simple

void ALLOCA_C(size_t Sz){ char buf[Sz]; asm volatile ("" : : "r"(&buf[0])); }

gcc -fno-stack-clash-protection: 17 bytes
gcc -fstack-clash-protection: 72 bytes

clang manages with less of an increase:

clang -fno-stack-clash-protection: 26 bytes
clang -fstack-clash-protection: 45 bytes

Still, this could be as low as 11 bytes for the -fstack-clash-protection version (less than for the unprotected one!), all by using a simple call to an assembly function, whose code can be no-clobber without much extra effort. Linked in Compiler Explorer is a crack at the idea, along with benchmarks: https://godbolt.org/z/f8rhG1ozs

The performance impact of the call seems negligible (practically less than 1 ns, though in the above quick-and-dirty benchmark it fluctuates a tiny bit, sometimes even giving the non-inline version an edge). I originally suggested popping the return address off the stack and repushing it before returning. I ended up just repushing -- the old return address becomes part of the alloca allocation. The concern that this could mess up the return stack buffer of the CPU seems valid, but all the benchmarks indicate it doesn't -- not even when the ret address is popped -- just as long as the return target address is the same. (When it isn't, the performance penalty is rather significant: I measured a 19x slowdown in that case, for comparison (it's also in the linked benchmarks).)

The (x86-64) assembly function:

#define STR(...) STR__(__VA_ARGS__) //{{{
#define STR__(...) #__VA_ARGS__ //}}}
asm(STR(
.global safeAllocaAsm;
safeAllocaAsm: //no clobber, though does expect a 16-byte-aligned stack at entry as usual
    push %r10;
    cmp $16, %rdi;
    ja .LsafeAllocaAsm__test32;
    push 8(%rsp);
    ret;
.LsafeAllocaAsm__test32:
    push %r10;
    push %rdi;
    mov %rsp, %r10;
    sub $17, %rdi;
    and $-16, %rdi; //(-32+15)&(-16) //subtract the 32 and 16-align, rounding up
    jnz .LsafeAllocaAsm__probes;
.LsafeAllocaAsm__ret:
    lea (3*8)(%r10,%rdi,1), %rdi;
    push (%rdi);
    mov -8(%rdi), %r10;
    mov -16(%rdi), %rdi;
    ret;
.LsafeAllocaAsm__probes:
    sub %rdi, %r10; //r10 is the desired rsp
.LsafeAllocaAsm__probedPastDesiredSpEh:
    cmp %rsp, %r10;
    jge .LsafeAllocaAsm__pastDesiredSp;
    orl $0x0,(%rsp);
    sub $0x1000,%rsp;
    jmp .LsafeAllocaAsm__probedPastDesiredSpEh;
.LsafeAllocaAsm__pastDesiredSp:
    mov %r10, %rsp; //set the desired sp
    jmp .LsafeAllocaAsm__ret;
.size safeAllocaAsm, .-safeAllocaAsm;
));

Cheers, Petr Skocik
[Bug c/107831] Missed optimization: -fstack-clash-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831 --- Comment #7 from Petr Skocik --- (In reply to Jakub Jelinek from comment #4)
> Say for
> void bar (char *);
> void
> foo (int x, int y)
> {
>   __attribute__((assume (x < 64)));
>   for (int i = 0; i < y; ++i)
>     bar (__builtin_alloca (x));
> }
> all the alloca calls are known to be small, yet they can quickly cross pages.
> Similarly:
> void
> baz (int x)
> {
>   if (x >= 512) __builtin_unreachable ();
>   char a[x];
>   bar (a);
>   char b[x];
>   bar (b);
>   char c[x];
>   bar (c);
>   char d[x];
>   bar (d);
>   char e[x];
>   bar (e);
>   char f[x];
>   bar (f);
>   char g[x];
>   bar (g);
>   char h[x];
>   bar (h);
>   char i[x];
>   bar (i);
>   char j[x];
>   bar (j);
> }
> All the VLAs here are small, yet together they can cross a page.
> So, we'd need to punt for dynamic allocations in loops and for others
> estimate the maximum size of all the allocations together
> (+ __builtin_alloca overhead + normal frame size).

I think this shouldn't need probes either (unless you tried to coalesce the allocations) on architectures where making a function call touches the stack. Also, allocas of less than or equal to half a page, intertwined with writes anywhere into the allocated blocks, should always be safe (but I guess I'll just turn stack-clash-protection off in the one file where I'm making such clearly safe dynamic stack allocations).
[Bug c/107831] Missed optimization: -fstack-clash-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831 --- Comment #6 from Petr Skocik --- (In reply to Jakub Jelinek from comment #2)
> (In reply to Petr Skocik from comment #1)
> > Sidenote regarding the stack-allocating code for cases when the size is
> > not known to be less than the page size: the code generated for those
> > cases is quite large. It could be replaced (at least under -Os) with a
> > call to a special assembly function that'd pop the return address
> > (assuming the target machine pushes return addresses to the stack),
> > adjust the stack in a piecemeal fashion so as to not skip guard pages,
> > then repush the return address and return to the caller with the stack
> > expanded.
> You certainly don't want to kill the return stack the CPU has, even if it
> results in a few saved bytes for -Os.

That's a very interesting point, because I have written x86_64 assembly "functions" that did pop the return address, push something to the stack, and then repush the return address and return. In a loop, this doesn't seem to perform badly compared to inline code, so I figure it shouldn't be messing with the return stack buffer. After all, even though the return happens through a different place in the call stack, it's still returning to the original caller. The one time I absolutely must have accidentally messed with the return stack buffer was when I wrote a context-switching routine and originally tried to "ret" to the new context. That turned out to be very measurably many times slower than `pop %rcx; jmp *%rcx;` (also measured in a loop), which is why I think popping a return address, allocating on the stack, and then repushing and returning is not really a performance killer (on my Intel CPU, anyway). If it were messing with the return stack buffer, I think I would be getting slowdowns similar to what I got with the context-switching code trying to `ret`.
[Bug c/107831] Missed optimization: -fstack-clash-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831 --- Comment #1 from Petr Skocik --- Sidenote regarding the stack-allocating code for cases when the size is not known to be less than the page size: the code generated for those cases is quite large. It could be replaced (at least under -Os) with a call to a special assembly function that'd pop the return address (assuming the target machine pushes return addresses to the stack), adjust the stack in a piecemeal fashion so as to not skip guard pages, then repush the return address and return to the caller with the stack expanded.
[Bug c/107831] New: Missed optimization: -fstack-clash-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

            Bug ID: 107831
           Summary: Missed optimization: -fstack-clash-protection causes
                    unnecessary code generation for dynamic stack
                    allocations that are clearly less than a page
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pskocik at gmail dot com
  Target Milestone: ---

I'm talking about allocations such as

char buf[ (uint8_t)size ];

The resulting code for this should ideally be the same with or without -fstack-clash-protection, as this can clearly never skip a whole page. But gcc generates a big loop trying to touch every page-sized subpart of the allocation. https://godbolt.org/z/G8EbzbshK
[Bug c/106116] New: Missed optimization: in no_reorder-attributed functions, tail calls to the subsequent function could just be function-to-function fallthrough
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106116

            Bug ID: 106116
           Summary: Missed optimization: in no_reorder-attributed
                    functions, tail calls to the subsequent function could
                    just be function-to-function fallthrough
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pskocik at gmail dot com
  Target Milestone: ---

Example:

__attribute((noinline,no_reorder))
int fnWithExplicitArg(int ExplicitArg);

__attribute((noinline,no_reorder))
int fnWithDefaultArg(void){ return fnWithExplicitArg(42); }

int fnWithExplicitArg(int ExplicitArg){ int useArg(int); return 12+useArg(ExplicitArg); }

Generated fnWithDefaultArg:

fnWithDefaultArg:
        mov     edi, 42
        jmp     fnWithExplicitArg
fnWithExplicitArg:
        //...

Desired fnWithDefaultArg:

fnWithDefaultArg:
        mov     edi, 42
        //fallthru
fnWithExplicitArg:
        //...

https://gcc.godbolt.org/z/Ph3onxoh9
[Bug target/85927] ud2 instruction generated starting with gcc 8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85927 Petr Skocik changed:

           What    |Removed        |Added
----------------------------------------------------------
                 CC|               |pskocik at gmail dot com

--- Comment #5 from Petr Skocik --- I think it'd be more welcome if gcc just put nothing there, like clang does.
[Bug c/102096] New: Gcc unnecessarily initializes indeterminate variables passed across function boundaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102096

            Bug ID: 102096
           Summary: Gcc unnecessarily initializes indeterminate variables
                    passed across function boundaries
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pskocik at gmail dot com
  Target Milestone: ---

Compared to clang, where

long ret_unspec(void){ auto long rv; return rv; }

void take6(long,long,long,long,long,long);
void call_take6(void) { //6 unnecessary XORs on GCC
    auto long id0; //indeterminate
    auto long id1; //indeterminate
    auto long id2; //indeterminate
    auto long id3; //indeterminate
    auto long id4; //indeterminate
    auto long id5; //indeterminate
    take6(id0,id1,id2,id3,id4,id5);
}

yields (x86_64):

ret_unspec:                             # @ret_unspec
        retq
call_take6:                             # @call_take6
        jmp     take6

(1+5 bytes), GCC compiles the above to

ret_unspec:
        xorl    %eax, %eax
        ret
call_take6:
        xorl    %r9d, %r9d
        xorl    %r8d, %r8d
        xorl    %ecx, %ecx
        xorl    %edx, %edx
        xorl    %esi, %esi
        xorl    %edi, %edi
        jmp     take6

(3+19 bytes), unnecessarily 0-initializing the indeterminate return value/arguments.

Casting the called function's type can often be hackishly used to get the same assembly (see the sketch below), but doing so is technically UB and not as generic as supporting the passing of unspecified arguments/return values, which can be used to omit argument-register initializations not just for arguments at the end of an argument pack but also in the middle.

TL;DR: Allowing indeterminate variables to be passed/returned without generating initializing code for them would be nice. Clang already does it.
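A sketch of the cast hack mentioned above (hypothetical caller; technically UB because the function is called through an incompatible type):

void take6(long,long,long,long,long,long);

void call_take6_hack(void){
    /* calling take6 through a 0-argument prototype makes the compiler
       leave all six argument registers uninitialized -- UB per C11
       6.5.2.2p9, but it yields the short clang-like code in practice */
    ((void(*)(void))take6)();
}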
[Bug c/98418] Valid integer constant expressions based on expressions that trigger -Wshift-overflow are treated as non-constants
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98418 pskocik at gmail dot com changed:

           What    |Removed        |Added
----------------------------------------------------------
             Status|UNCONFIRMED    |RESOLVED
         Resolution|---            |INVALID

--- Comment #3 from pskocik at gmail dot com --- You're right. The bug was in my code.

struct foo { unsigned bit: (0xll<<40)!=0; };

is indeed UB due to http://port70.net/~nsz/c/c11/n1570.html#6.5.7p4, but

struct foo { unsigned bit: (0xull<<40)!=0; };

isn't, and GCC accepts it without complaint. Apologies for the false alarm.
[Bug c/98418] New: Valid integer constant expressions based on expressions that trigger -Wshift-overflow are treated as non-constants
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98418

            Bug ID: 98418
           Summary: Valid integer constant expressions based on expressions
                    that trigger -Wshift-overflow are treated as
                    non-constants
           Product: gcc
           Version: 6.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pskocik at gmail dot com
  Target Milestone: ---

This causes things like

struct foo { unsigned bit: (0xll<<40)!=0; };

to elicit a -pedantic warning about the bitfield width not being a proper integer constant expression, even though it is. In other contexts, a complete compilation error might ensue:

extern int bar[ (0xll<<40)!=0 ]; //seen as an invalid VLA

https://gcc.godbolt.org/z/7zfz96

Neither clang nor gcc <= 5 appears to have this bug. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93241 seems related.