[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

user202729 changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |user202729 at protonmail dot com

--- Comment #17 from user202729 ---
Created attachment 58280
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58280&action=edit
Possible patch to address the issue in case the intervening function is pure instead of `operator new`.

I wrote a patch (attached) that allows the optimization to be performed if the intervening function is pure instead of `operator new`.

With this patch, each of the functions in the following code will use only one memory store instead of two.

```
#include
#include

struct MyClass {
    std::array arr;
};

// Prevent optimization
void sink(void *m) {
    asm volatile("" : : "g"(m) : "memory");
}

__attribute__((pure)) int f();

int g1(MyClass a) {
    MyClass b;
    MyClass c = a;
    int result=f();
    b = c;
    sink();
    return result;
}

int g2(MyClass a) {
    MyClass b;
    MyClass c = a;
    int result=f();
    b = c;
    sink();
    return result;
}

int g3(MyClass&& a) {
    MyClass b;
    MyClass c = a;
    int result=f();
    b = c;
    sink();
    return result;
}
```

It would be helpful if someone can review the patch.
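For readers unfamiliar with the attribute: a function declared `__attribute__((pure))` (a GCC/Clang extension) promises to have no observable effect other than its return value, although it may read global memory. That is why a load can in principle be moved across a call to it. The following is only a minimal sketch of that situation; the names `Blob`, `table`, `sum_table` and `copy_around_pure_call` are hypothetical and are not taken from the attached patch.

```cpp
#include <cstddef>

// Hypothetical 48-byte payload standing in for MyClass.
struct Blob {
    unsigned long long words[6];
};

// Hypothetical global data that the pure function only reads.
int table[4] = {1, 2, 3, 4};

// 'pure' (GCC/Clang extension): no side effects besides the return value.
// Reading global memory is allowed, writing it is not.
__attribute__((pure)) int sum_table() {
    int total = 0;
    for (std::size_t i = 0; i < 4; ++i)
        total += table[i];
    return total;
}

// Same shape as g1..g3 above: copy, call, copy.  Because sum_table() cannot
// write to 'src', copying src -> dst directly would give the same result.
int copy_around_pure_call(const Blob &src, Blob &dst) {
    Blob tmp = src;            // first copy
    int result = sum_table();  // intervening pure call
    dst = tmp;                 // second copy
    return result;
}
```

Whether a given GCC version actually merges the two copies here is not something this sketch demonstrates; it only shows the pattern the patch is aimed at.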
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #16 from rguenther at suse dot de ---
On Tue, 6 Jun 2023, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035
>
> --- Comment #15 from Alexander Monakov ---
> malloc and friends modify 'errno' on failure, so in they would have to be
> special-cased for alias analysis.

That's already handled, but it's conditional on -fmath-errno (there's a PR about that).
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #15 from Alexander Monakov ---
malloc and friends modify 'errno' on failure, so they would have to be special-cased for alias analysis.
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aagarwa at gcc dot gnu.org,
                   |                            |amonakov at gcc dot gnu.org

--- Comment #14 from Richard Biener ---
(In reply to Pontakorn Prasertsuk from comment #12)
> I notice that GCC also does not optimize this case:
> https://godbolt.org/z/7oGqjqqz4

Yes.  To quote:

#include
#include
#include
#include

struct MyClass {
  std::array arr;
};

MyClass globalA;

// Prevent optimization
void sink(MyClass *m) { std::cout << m->arr[0] << std::endl; }

void __attribute__((noinline)) gg(MyClass ) {
  MyClass c = a;
  MyClass *b = (MyClass *)malloc(sizeof(MyClass));
  *b = c;
  sink(b);
}

and we do RTL expansion from

  [local count: 1073741824]:
  vect_c_arr__M_elems_0_6.31_25 = MEM [(long unsigned int *)a_2(D)];
  vect_c_arr__M_elems_0_6.32_27 = MEM [(long unsigned int *)a_2(D) + 16B];
  vect_c_arr__M_elems_0_6.33_29 = MEM [(long unsigned int *)a_2(D) + 32B];
  b_4 = malloc (48);
  MEM [(long unsigned int *)b_4] = vect_c_arr__M_elems_0_6.31_25;
  MEM [(long unsigned int *)b_4 + 16B] = vect_c_arr__M_elems_0_6.32_27;
  MEM [(long unsigned int *)b_4 + 32B] = vect_c_arr__M_elems_0_6.33_29;
  sink (b_4); [tail call]

Note that the temporary was elided, but we specifically prevent TER (some magic scheduling of stmts in a basic block) from crossing function calls, and there's no optimization phase that would try to optimize register pressure over calls. In this case we want to sink the loads across the call; in other cases we want to avoid doing so. In the end this would be a job for a late-running pass that factors in things like register pressure and the set of call-clobbered registers.

I'll note that -fschedule-insns doesn't seem to have any effect here, but I also remember that scheduling around calls was recently fiddled with, specifically in r13-5154-g733a1b777f16cd, which restricts motion even with -fsched-pressure (not sure how that honors call-clobbered regs). In the above case the GPR for a_2(D) would be needed after the call (but there are GPRs that are not call clobbered), while the three data vectors in xmm would no longer be live across the call (and all vector registers are call clobbered on x86).

Of course I'm not sure at all whether RTL scheduling can disambiguate against a 'malloc' call.
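To make "sink the loads across the call" concrete, here is a hand-written sketch of the two orderings at the source level. The names `Payload`, `copy_load_first` and `copy_load_after` are hypothetical. The second function is what one would write by hand when it is known that the allocation cannot modify `a`; whether the compiler is allowed or able to do the same is exactly what this PR discusses.

```cpp
#include <cstdlib>

// Hypothetical 48-byte payload, matching the malloc (48) in the dump above.
struct Payload {
    unsigned long long elems[6];
};

// Order as written in the testcase: 'a' is read before malloc, so the copied
// values have to stay live (in registers or spilled) across the call.
Payload *copy_load_first(const Payload &a) {
    Payload c = a;
    Payload *b = static_cast<Payload *>(std::malloc(sizeof(Payload)));
    if (b)
        *b = c;
    return b;
}

// Loads sunk below the call by hand: 'a' is read only after malloc returns,
// so nothing except the address of 'a' needs to survive the call.
Payload *copy_load_after(const Payload &a) {
    Payload *b = static_cast<Payload *>(std::malloc(sizeof(Payload)));
    if (b)
        *b = a;
    return b;
}
```

Both functions return the same contents as long as the call does not write to `a`; the caller releases the result with `std::free`.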
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #13 from rguenther at suse dot de ---
On Tue, 6 Jun 2023, ptk.prasertsuk at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035
>
> --- Comment #11 from Pontakorn Prasertsuk ---
> (In reply to rguent...@suse.de from comment #10)
> > On Mon, 5 Jun 2023, ptk.prasertsuk at gmail dot com wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035
> > >
> > > --- Comment #9 from Pontakorn Prasertsuk ---
> > > (In reply to Richard Biener from comment #8)
> > > > (In reply to Pontakorn Prasertsuk from comment #7)
> > > > > For the LLVM IR code of the snippet I provided, Clang's alias
> > > > > analysis can prove that `new` call has no side effect to other
> > > > > memory location. This is indicated by `noalias` keyword at the
> > > > > return value of the `new` call (_Znwm).
> > > > >
> > > > > According to Clang's Language Reference:
> > > > > "On function return values, the noalias attribute indicates that
> > > > > the function acts like a system memory allocation function,
> > > > > returning a pointer to allocated storage disjoint from the
> > > > > storage for any other object accessible to the caller."
> > > > >
> > > > > Is this possible for GCC alias analysis pass?
> > > >
> > > > > MyClass c = a;
> > > > > MyClass *b = new MyClass;
> > > > > *b = c;
> > > >
> > > > the point is that 'new' can alter the value of 'a', GCC already
> > > > knows that 'b' is distinct from c and a but that's not the relevant
> > > > thing. It looks like LLVM creates wrong-code here.
> > >
> > > In what case can 'new' alter 'a'? I thought memory allocation
> > > functions such as 'malloc, 'calloc' and 'new' cannot alias other
> > > memory locations than its return value.
> >
> > 'new' can be overridden by the user, you can declare your own
> > implementation that does fancy stuff behind the scenes, including
> > in the above case altering 'a'. Welcome to C++ ...
>
> I assume you are referring to this case: https://godbolt.org/z/z4Y7YdxWE
>
> Clang indeed assumes that 'new' is non-alias and this feature can be
> turned off by using -fno-assume-sane-operator-new
>
> However, can we safely assume that 'malloc' and 'calloc' are non-alias
> as well?

Well, we do. For the C++ new case we did and it did break real world programs.
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #12 from Pontakorn Prasertsuk ---
I notice that GCC also does not optimize this case:
https://godbolt.org/z/7oGqjqqz4
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #11 from Pontakorn Prasertsuk ---
(In reply to rguent...@suse.de from comment #10)
> On Mon, 5 Jun 2023, ptk.prasertsuk at gmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035
> >
> > --- Comment #9 from Pontakorn Prasertsuk ---
> > (In reply to Richard Biener from comment #8)
> > > (In reply to Pontakorn Prasertsuk from comment #7)
> > > > For the LLVM IR code of the snippet I provided, Clang's alias
> > > > analysis can prove that `new` call has no side effect to other
> > > > memory location. This is indicated by `noalias` keyword at the
> > > > return value of the `new` call (_Znwm).
> > > >
> > > > According to Clang's Language Reference:
> > > > "On function return values, the noalias attribute indicates that
> > > > the function acts like a system memory allocation function,
> > > > returning a pointer to allocated storage disjoint from the storage
> > > > for any other object accessible to the caller."
> > > >
> > > > Is this possible for GCC alias analysis pass?
> > >
> > > > MyClass c = a;
> > > > MyClass *b = new MyClass;
> > > > *b = c;
> > >
> > > the point is that 'new' can alter the value of 'a', GCC already knows
> > > that 'b' is distinct from c and a but that's not the relevant thing.
> > > It looks like LLVM creates wrong-code here.
> >
> > In what case can 'new' alter 'a'? I thought memory allocation functions
> > such as 'malloc, 'calloc' and 'new' cannot alias other memory locations
> > than its return value.
>
> 'new' can be overridden by the user, you can declare your own
> implementation that does fancy stuff behind the scenes, including
> in the above case altering 'a'. Welcome to C++ ...

I assume you are referring to this case: https://godbolt.org/z/z4Y7YdxWE

Clang indeed assumes that 'new' is non-alias and this feature can be turned off by using -fno-assume-sane-operator-new

However, can we safely assume that 'malloc' and 'calloc' are non-alias as well?
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #10 from rguenther at suse dot de ---
On Mon, 5 Jun 2023, ptk.prasertsuk at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035
>
> --- Comment #9 from Pontakorn Prasertsuk ---
> (In reply to Richard Biener from comment #8)
> > (In reply to Pontakorn Prasertsuk from comment #7)
> > > For the LLVM IR code of the snippet I provided, Clang's alias
> > > analysis can prove that `new` call has no side effect to other memory
> > > location. This is indicated by `noalias` keyword at the return value
> > > of the `new` call (_Znwm).
> > >
> > > According to Clang's Language Reference:
> > > "On function return values, the noalias attribute indicates that the
> > > function acts like a system memory allocation function, returning a
> > > pointer to allocated storage disjoint from the storage for any other
> > > object accessible to the caller."
> > >
> > > Is this possible for GCC alias analysis pass?
> >
> > > MyClass c = a;
> > > MyClass *b = new MyClass;
> > > *b = c;
> >
> > the point is that 'new' can alter the value of 'a', GCC already knows
> > that 'b' is distinct from c and a but that's not the relevant thing.
> > It looks like LLVM creates wrong-code here.
>
> In what case can 'new' alter 'a'? I thought memory allocation functions
> such as 'malloc, 'calloc' and 'new' cannot alias other memory locations
> than its return value.

'new' can be overridden by the user, you can declare your own implementation that does fancy stuff behind the scenes, including in the above case altering 'a'. Welcome to C++ ...
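As an illustration of the point about user-provided allocation functions, the sketch below replaces the global `operator new` with one that writes to a global object before returning memory. It is not the code behind the godbolt links in this thread; the object `globalA`, the value 42 and the function `gg` are hypothetical choices made for this example only.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

struct MyClass {
    unsigned long long arr[6];
};

// Hypothetical global object that callers may pass to gg() as 'a'.
MyClass globalA = {};

// User-provided replacement of the global allocation function.  Besides
// allocating, it has a side effect on globalA that the caller can observe.
void *operator new(std::size_t size) {
    globalA.arr[0] = 42;
    if (void *p = std::malloc(size ? size : 1))
        return p;
    throw std::bad_alloc();
}

// Matching replacements so that memory obtained above is released with free.
void operator delete(void *p) noexcept { std::free(p); }
void operator delete(void *p, std::size_t) noexcept { std::free(p); }

// Same shape as the reduced testcase: if 'a' refers to globalA, the value
// read before the allocation differs from the value read after it.
MyClass *gg(MyClass &a) {
    MyClass c = a;             // read of 'a' before the call
    MyClass *b = new MyClass;  // may modify 'a' through the replacement
    *b = c;
    return b;
}
```

With such a replacement in the program, reading `a` before or after the allocation is not equivalent, which is the situation described above for GCC; Clang's behaviour under its default assumption is discussed in the following comments.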
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #9 from Pontakorn Prasertsuk ---
(In reply to Richard Biener from comment #8)
> (In reply to Pontakorn Prasertsuk from comment #7)
> > For the LLVM IR code of the snippet I provided, Clang's alias analysis
> > can prove that `new` call has no side effect to other memory location.
> > This is indicated by `noalias` keyword at the return value of the `new`
> > call (_Znwm).
> >
> > According to Clang's Language Reference:
> > "On function return values, the noalias attribute indicates that the
> > function acts like a system memory allocation function, returning a
> > pointer to allocated storage disjoint from the storage for any other
> > object accessible to the caller."
> >
> > Is this possible for GCC alias analysis pass?
>
> > MyClass c = a;
> > MyClass *b = new MyClass;
> > *b = c;
>
> the point is that 'new' can alter the value of 'a', GCC already knows that
> 'b' is distinct from c and a but that's not the relevant thing. It looks
> like LLVM creates wrong-code here.

In what case can 'new' alter 'a'? I thought memory allocation functions such as 'malloc', 'calloc' and 'new' cannot alias memory locations other than their return value.
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #8 from Richard Biener ---
(In reply to Pontakorn Prasertsuk from comment #7)
> For the LLVM IR code of the snippet I provided, Clang's alias analysis can
> prove that `new` call has no side effect to other memory location. This is
> indicated by `noalias` keyword at the return value of the `new` call (_Znwm).
>
> According to Clang's Language Reference:
> "On function return values, the noalias attribute indicates that the
> function acts like a system memory allocation function, returning a pointer
> to allocated storage disjoint from the storage for any other object
> accessible to the caller."
>
> Is this possible for GCC alias analysis pass?

> MyClass c = a;
> MyClass *b = new MyClass;
> *b = c;

the point is that 'new' can alter the value of 'a', GCC already knows that 'b' is distinct from c and a but that's not the relevant thing. It looks like LLVM creates wrong-code here.
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #7 from Pontakorn Prasertsuk ---
For the LLVM IR code of the snippet I provided, Clang's alias analysis can prove that `new` call has no side effect to other memory location. This is indicated by `noalias` keyword at the return value of the `new` call (_Znwm).

According to Clang's Language Reference:
"On function return values, the noalias attribute indicates that the function acts like a system memory allocation function, returning a pointer to allocated storage disjoint from the storage for any other object accessible to the caller."

Is this possible for GCC alias analysis pass?
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #6 from rguenther at suse dot de ---
On Tue, 30 May 2023, pinskia at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035
>
> Andrew Pinski changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>            Keywords|                            |missed-optimization
>      Ever confirmed|0                           |1
>            Severity|normal                      |enhancement
>    Last reconfirmed|                            |2023-05-30
>              Status|UNCONFIRMED                 |NEW
>
> --- Comment #2 from Andrew Pinski ---
> More obvious Reduced testcase:
> ```
> struct MyClass
> {
>     unsigned long long arr[128];
> };
>
> [[gnu::noipa]]
> void sink(void *m){}
> void gg(MyClass )
> {
>     MyClass c = a;
>     MyClass *b = new MyClass;
>     *b = c;
>     sink(b);
> }
> ```
>
> There might be a dup of this issue too.

But we cannot move the load of 'a' across the call to operator new since that can possibly clobber 'a' (you can overwrite 'new' with something having observable side-effects)
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #5 from Pontakorn Prasertsuk ---
(In reply to Andrew Pinski from comment #3)
> We don't even optimize:
> ```
> struct MyClass
> {
>     unsigned long long arr[128];
> };
>
> [[gnu::noipa]]
> void sink(void *m);
> void gg(MyClass , MyClass *b)
> {
>     MyClass c = a;
>     *b = c;
>     sink(b);
> }
> ```
>
> As I mentioned there are dups of the above testcase.

Would you mind pointing me to the original issue?
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #4 from Pontakorn Prasertsuk ---
(In reply to Richard Biener from comment #1)
> Ick - convoluted C++.  We end up with
>
> void ff (struct MyClass & obj)
> {
>   vector(2) long unsigned int vect_SR.16;
>   vector(2) long unsigned int vect_SR.15;
>   vector(2) long unsigned int vect_SR.14;
>   void * _6;
>
>   [local count: 1073741824]:
>   vect_SR.14_5 = MEM [(struct MyClass &)obj_2(D)];
>   vect_SR.15_28 = MEM [(struct MyClass &)obj_2(D) + 16];
>   vect_SR.16_30 = MEM [(struct MyClass &)obj_2(D) + 32];
>   _6 = operator new (48);
>   MEM [(struct MyClass2 *)_6] = vect_SR.14_5;
>   MEM [(struct MyClass2 *)_6 + 16B] = vect_SR.15_28;
>   MEM [(struct MyClass2 *)_6 + 32B] = vect_SR.16_30;
>   HandleMyClass2 (_6); [tail call]
>
> and the issue is that 'operator new (48)' can alter what 'obj' points to,
> so we cannot move the loads across the call and we get spilling.
>
> There is no inter-procedural analysis in GCC that would tell us that
> 'obj_2(D)' (the MyClass & obj argument of ff) does not point to an
> object that did not escape.  In fact 'ff' has global visibility
> and it might have other callers.
>
> If you add -fwhole-program then you get the function inlined to main and
>
> main:
> .LFB652:
>         .cfi_startproc
>         subq    $8, %rsp
>         .cfi_def_cfa_offset 16
>         movl    $48, %edi
>         call    _Znwm
>         movq    $0, (%rax)
>         movq    %rax, %rdi
>         movq    $0, 8(%rax)
>         movq    $0, 16(%rax)
>         movq    $0, 24(%rax)
>         movq    $0, 32(%rax)
>         movq    $0, 40(%rax)
>         call    _Z14HandleMyClass2Pv
>         xorl    %eax, %eax
>         addq    $8, %rsp
>         .cfi_def_cfa_offset 8
>         ret
>
> (not using vectors because 'main' is considered cold).  Do you cite an
> inline copy of ff() for clang?

Hi Richard,

The clang snippet I provided is not inlined into 'main' function.
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #3 from Andrew Pinski ---
We don't even optimize:
```
struct MyClass
{
    unsigned long long arr[128];
};

[[gnu::noipa]]
void sink(void *m);
void gg(MyClass , MyClass *b)
{
    MyClass c = a;
    *b = c;
    sink(b);
}
```

As I mentioned there are dups of the above testcase.
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
   Last reconfirmed|                            |2023-05-30
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Andrew Pinski ---
In the case of x86_64, it is just moving the loads across the operator new, I think:

  vect_SR.14_5 = MEM [(struct MyClass &)obj_2(D)];
  vect_SR.15_28 = MEM [(struct MyClass &)obj_2(D) + 16];
  vect_SR.16_30 = MEM [(struct MyClass &)obj_2(D) + 32];
  _6 = operator new (48);
  MEM [(struct MyClass2 *)_6] = vect_SR.14_5;
  MEM [(struct MyClass2 *)_6 + 16B] = vect_SR.15_28;
  MEM [(struct MyClass2 *)_6 + 32B] = vect_SR.16_30;
  HandleMyClass2 (_6); [tail call]

On other targets it is moving across the operator new too:

  D.14580.__obj = *obj_2(D);
  _6 = operator new (48);
  MEM[(struct MyClass2 *)_6].f = D.14580;

More obvious reduced testcase:
```
struct MyClass
{
    unsigned long long arr[128];
};

[[gnu::noipa]]
void sink(void *m){}
void gg(MyClass )
{
    MyClass c = a;
    MyClass *b = new MyClass;
    *b = c;
    sink(b);
}
```

There might be a dup of this issue too.
[Bug tree-optimization/110035] Missed optimization for dependent assignment statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035

--- Comment #1 from Richard Biener ---
Ick - convoluted C++.  We end up with

void ff (struct MyClass & obj)
{
  vector(2) long unsigned int vect_SR.16;
  vector(2) long unsigned int vect_SR.15;
  vector(2) long unsigned int vect_SR.14;
  void * _6;

  [local count: 1073741824]:
  vect_SR.14_5 = MEM [(struct MyClass &)obj_2(D)];
  vect_SR.15_28 = MEM [(struct MyClass &)obj_2(D) + 16];
  vect_SR.16_30 = MEM [(struct MyClass &)obj_2(D) + 32];
  _6 = operator new (48);
  MEM [(struct MyClass2 *)_6] = vect_SR.14_5;
  MEM [(struct MyClass2 *)_6 + 16B] = vect_SR.15_28;
  MEM [(struct MyClass2 *)_6 + 32B] = vect_SR.16_30;
  HandleMyClass2 (_6); [tail call]

and the issue is that 'operator new (48)' can alter what 'obj' points to, so we cannot move the loads across the call and we get spilling.

There is no inter-procedural analysis in GCC that would tell us that 'obj_2(D)' (the MyClass & obj argument of ff) points to an object that did not escape.  In fact 'ff' has global visibility and it might have other callers.

If you add -fwhole-program then you get the function inlined to main and

main:
.LFB652:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $48, %edi
        call    _Znwm
        movq    $0, (%rax)
        movq    %rax, %rdi
        movq    $0, 8(%rax)
        movq    $0, 16(%rax)
        movq    $0, 24(%rax)
        movq    $0, 32(%rax)
        movq    $0, 40(%rax)
        call    _Z14HandleMyClass2Pv
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret

(not using vectors because 'main' is considered cold).  Do you cite an inline copy of ff() for clang?