Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference
On Mon, Jan 22, 2024 at 11:10 PM Richard Biener wrote:
> On Mon, 22 Jan 2024, Jeff Law wrote:
> > No issues across the cross compilers with those two patches.
>
> Thanks, pushed.  I'm probably going to revert when bigger issues
> appear (and hopefully we'd get some test coverage then).
>
> Richard.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113562

-- 
H.J.
Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference
On Tue, Jan 23, 2024 at 6:15 AM H.J. Lu wrote:
> On Mon, Jan 22, 2024 at 11:10 PM Richard Biener wrote:
> > Thanks, pushed.  I'm probably going to revert when bigger issues
> > appear (and hopefully we'd get some test coverage then).
>
> The test failed with -m32:
>
> FAIL: gcc.dg/torture/pr113255.c -O1 (test for excess errors)
> Excess errors:
> cc1: error: '-mstringop-strategy=rep_8byte' not supported for 32-bit code

I am checking in this:

diff --git a/gcc/testsuite/gcc.dg/torture/pr113255.c b/gcc/testsuite/gcc.dg/torture/pr113255.c
index 2f009524c6b..78af6a5a563 100644
--- a/gcc/testsuite/gcc.dg/torture/pr113255.c
+++ b/gcc/testsuite/gcc.dg/torture/pr113255.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-additional-options "-mtune=k8 -mstringop-strategy=rep_8byte" { target { x86_64-*-* i?86-*-* } } } */
+/* { dg-additional-options "-mtune=k8 -mstringop-strategy=rep_8byte" { target { { i?86-*-* x86_64-*-* } && { ! ia32 } } } } */
 
 struct S { unsigned a[10]; unsigned y; unsigned b[6]; } g[2];
 
-- 
H.J.
Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference
On Mon, Jan 22, 2024 at 11:10 PM Richard Biener wrote:
> On Mon, 22 Jan 2024, Jeff Law wrote:
> > No issues across the cross compilers with those two patches.
>
> Thanks, pushed.  I'm probably going to revert when bigger issues
> appear (and hopefully we'd get some test coverage then).
>
> Richard.

The test failed with -m32:

FAIL: gcc.dg/torture/pr113255.c -O1 (test for excess errors)
Excess errors:
cc1: error: '-mstringop-strategy=rep_8byte' not supported for 32-bit code

-- 
H.J.
Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference
On Mon, 22 Jan 2024, Jeff Law wrote:
> On 1/15/24 06:34, Richard Biener wrote:
> > Bootstrapped and tested on x86_64-unknown-linux-gnu (together
> > with 2/2).  Jeff, can you maybe throw this on your tester?
> > Jakub, you did the PR64025 fix which was for a similar issue.
>
> No issues across the cross compilers with those two patches.

Thanks, pushed.  I'm probably going to revert when bigger issues
appear (and hopefully we'd get some test coverage then).

Richard.
Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference
On 1/15/24 06:34, Richard Biener wrote:
> Bootstrapped and tested on x86_64-unknown-linux-gnu (together
> with 2/2).  Jeff, can you maybe throw this on your tester?
> Jakub, you did the PR64025 fix which was for a similar issue.

No issues across the cross compilers with those two patches.

Jeff
Re: [PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference
On 1/15/24 06:34, Richard Biener wrote:
> Bootstrapped and tested on x86_64-unknown-linux-gnu (together
> with 2/2).  Jeff, can you maybe throw this on your tester?
> Jakub, you did the PR64025 fix which was for a similar issue.
>
> OK for trunk?

Pending testing, yes.  Though I'd be a bit surprised if anything pops on
this.  I just doubt we've got much coverage in this space.

I'll pass along the cross results as soon as they're done.

jeff
[PATCH 1/2] rtl-optimization/113255 - base_alias_check vs. pointer difference
When the x86 backend generates code for cpymem with the rep_8byte
strategy for the 8 byte aligned main rep movq it needs to compute
an adjusted pointer to the source after doing a prologue aligning
the destination.  It computes that via

  src_ptr + (dest_ptr - orig_dest_ptr)

which is perfectly fine.  On RTL this is then

    8: r134:DI=const(`g'+0x44)
    9: {r133:DI=frame:DI-0x4c;clobber flags:CC;}
      REG_UNUSED flags:CC
   56: r129:DI=const(`g'+0x4c)
   57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_EQUAL const(`g'+0x4c)&0xfff8
   58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;}
      REG_DEAD r134:DI
      REG_UNUSED flags:CC
      REG_EQUAL const(`g'+0x44)-r129:DI
   59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;}
      REG_DEAD r133:DI
      REG_UNUSED flags:CC

but as written find_base_term happily picks the first candidate
it finds for the MINUS, which means it picks const(`g') rather
than the correct frame:DI.  This way find_base_term (but also
the unfixed find_base_value used by init_alias_analysis to
initialize REG_BASE_VALUE) performs pointer analysis that isn't
sound.  The following restricts the handling of multi-operand
operations to the case where we know only one can be a pointer.

This for example causes gcc.dg/tree-ssa/pr94969.c to miss some
RTL PRE (I've opened PR113395 for this).  A more drastic patch,
removing base_alias_check, results in only gcc.dg/guality/pr41447-1.c
regressing (so testsuite coverage is bad).  I've looked at
gcc.dg/tree-ssa tests and mostly scheduling changes are present;
the cc1plus .text size is only 230 bytes worse.  With this
less drastic patch below most scheduling changes are gone.

x86_64 might not be the very best target to test for impact, but
test coverage on other targets is unlikely to be very much better.

Bootstrapped and tested on x86_64-unknown-linux-gnu (together
with 2/2).  Jeff, can you maybe throw this on your tester?
Jakub, you did the PR64025 fix which was for a similar issue.

OK for trunk?

Thanks,
Richard.
	PR rtl-optimization/113255
	* alias.cc (find_base_term): Remove PLUS/MINUS handling
	when both operands are not CONST_INT_P.

	* gcc.dg/torture/pr113255.c: New testcase.
---
 gcc/alias.cc                            | 28 +
 gcc/testsuite/gcc.dg/torture/pr113255.c | 27
 2 files changed, 32 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113255.c

diff --git a/gcc/alias.cc b/gcc/alias.cc
index 99008b0390d..bdc119822b4 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -2077,31 +2077,13 @@ find_base_term (rtx x, vec= 0)
+{
+ r++;
+ e[1].y++;
+}
+ g[1] = e[1];
+ return r;
+}
+
+int
+main ()
+{
+ test (1);
+ if (g[1].y != 1)
+__builtin_abort ();
+ return 0;
+}
-- 
2.35.3