[PATCH] libstdc++: Work around modules ICE in [PR105297]
This makes the initializer for __table in __from_chars_alnum_to_val dependent in an artificial way, which works around the modules testsuite ICE reported in PR105297 by preventing the initializer from getting evaluated at parse time. Compared to the alternative workaround of using a non-local class type for __table, this workaround has the advantage of slightly speeding up compilation of the header, since now the table will not get built (via constexpr evaluation) until one of the integer std::from_chars overloads is actually instantiated. Tested on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/105297 PR c++/105322 libstdc++-v3/ChangeLog: * include/std/charconv (__from_chars_alnum_to_val): Make initializer for __table dependent in an artificial way. --- libstdc++-v3/include/std/charconv | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv index f1ace406017..561234cb2fc 100644 --- a/libstdc++-v3/include/std/charconv +++ b/libstdc++-v3/include/std/charconv @@ -445,7 +445,9 @@ namespace __detail return __c - '0'; else { - static constexpr auto __table = __from_chars_alnum_to_val_table(); + // This initializer is deliberately made dependent in order to work + // around modules bug PR105322. + static constexpr auto __table = (_DecOnly, __from_chars_alnum_to_val_table()); return __table.__data[__c]; } } -- 2.36.0.rc2.10.g1ac7422e39
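For readers who want to see the shape of the workaround outside the header, here is a minimal sketch. The names alnum_table, make_alnum_table and alnum_to_val are hypothetical stand-ins for the libstdc++ internals, the table contents assume an ASCII-compatible character set, and the void cast on the left operand is an addition to keep the sketch free of unused-value warnings. It only illustrates the comma-expression pattern from the patch; it does not reproduce the ICE.

```cpp
#include <cstddef>

// Hypothetical stand-in for the type returned by
// __from_chars_alnum_to_val_table(); 127 marks "not an alphanumeric digit".
struct alnum_table { unsigned char data[256]; };

// Builds the lookup table by constexpr evaluation.
constexpr alnum_table make_alnum_table()
{
  alnum_table t{};
  for (std::size_t i = 0; i < 256; ++i)
    t.data[i] = 127;
  for (int i = 0; i < 10; ++i)
    t.data['0' + i] = static_cast<unsigned char>(i);
  for (int i = 0; i < 26; ++i)
    {
      t.data['a' + i] = static_cast<unsigned char>(10 + i);
      t.data['A' + i] = static_cast<unsigned char>(10 + i);
    }
  return t;
}

template<bool DecOnly = false>
unsigned char alnum_to_val(unsigned char c)
{
  if constexpr (DecOnly)
    return static_cast<unsigned char>(c - '0');
  else
    {
      // The left operand of the comma mentions the template parameter, which
      // is the artificial dependence the patch introduces; the value of the
      // whole expression is still just the table.
      static constexpr auto table
        = (static_cast<void>(DecOnly), make_alnum_table());
      return table.data[c];
    }
}
```

Calling alnum_to_val('f') instantiates the template and yields 15; alnum_to_val<true>('9') takes the decimal-only branch and never touches the table.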
[PATCH] c++: Add srodata to the allowed sections
This fires errors like FAIL: g++.dg/opt/const7.C -std=c++14 scan-assembler-symbol-section symbol b_var (found _ZL5b_var) has section ^\\.(const|rodata)|\\[RO\\] (found .srodata) on RISC-V, where RO data can end up in the srodata section. gcc/testsuite/ChangeLog: * g++.dg/opt/const7.C: Allow symbols in .srodata --- I didn't actually re-run the test suite, as I was poking around with something else. This one seems pretty trivial, though. Happy to do so before committing, but figured I'd send it out anyway in case anyone else is triaging our bugs. --- gcc/testsuite/g++.dg/opt/const7.C | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/g++.dg/opt/const7.C b/gcc/testsuite/g++.dg/opt/const7.C index 5bcf94897a8..8bbd9db973f 100644 --- a/gcc/testsuite/g++.dg/opt/const7.C +++ b/gcc/testsuite/g++.dg/opt/const7.C @@ -4,4 +4,4 @@ struct B { B()=default; }; static const B b_var; // { dg-bogus "" } -// { dg-final { scan-assembler-symbol-section {b_var} {^\.(const|rodata)|\[RO\]} } } +// { dg-final { scan-assembler-symbol-section {b_var} {^\.(const|rodata|srodata)|\[RO\]} } } -- 2.34.1
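As a quick sanity check of the two patterns, here is a small sketch that uses std::regex (ECMAScript syntax) as an approximation of the matching done by the testsuite directive; the real directive is implemented in Tcl, so this is an assumption about equivalent behaviour, and section_matches is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Section patterns before and after the patch, copied from the diff.
const char* const old_pattern = R"re(^\.(const|rodata)|\[RO\])re";
const char* const new_pattern = R"re(^\.(const|rodata|srodata)|\[RO\])re";

// Hypothetical helper: true if the pattern matches somewhere in the
// section name (the first alternative is anchored at the start).
bool section_matches(const std::string& section, const char* pattern)
{
  return std::regex_search(section, std::regex(pattern));
}
```

With these definitions, ".srodata" is rejected by old_pattern and accepted by new_pattern, while ".rodata" and ".const" are accepted by both.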
Re: [PATCH] Asan changes for RISC-V.
Hi Joshua: > Does Asan work for RISC-V currently? It seems that '-fsanitize=address' is > still unsupported for RISC-V. If I add '--enable-libsanitizer' in Makefile.in > to reconfigure, there are compiling errors. Is it because # libsanitizer not supported rv32, but it will break the rv64 multi-lib build, so we disable that temporally until rv32 supported# in Makefile.in? IIUC, you mean the Makefile in riscv-gnu-toolchain rather than upstream GCC, right? I guess we could add a configure option that enables it and checks that multi-lib is not in use, or maybe you could fix this in GCC's configure script so that the multi-lib build is skipped for rv32? On Wed, Apr 20, 2022 at 2:13 PM joshua via Gcc-patches wrote: > > Does Asan work for RISC-V currently? It seems that '-fsanitize=address' is > still unsupported for RISC-V. If I add '--enable-libsanitizer' in Makefile.in > to reconfigure, there are compiling errors. > Is it because # libsanitizer not supported rv32, but it will break the rv64 > multi-lib build, so we disable that temporally until rv32 supported# in > Makefile.in? > > > -- > From: Jim Wilson > Sent: Thursday, October 29, 2020 07:59 > To: gcc-patches > Cc: cooper.joshua ; Jim Wilson > > Subject: [PATCH] Asan changes for RISC-V. > > We have only riscv64 asan support, there is no riscv32 support as yet. So I > need to be able to conditionally enable asan support for the riscv target. I > implemented this by returning zero from the asan_shadow_offset function. This > requires a change to toplev.c and docs in target.def. > > The asan support works on a 5.5 kernel, but does not work on a 4.15 kernel. > The problem is that the asan high memory region is a small wedge below > 0x40. The new kernel puts shared libraries at 0x3f and going > down which works. But the old kernel puts shared libraries at 0x20 > and going up which does not work, as it isn't in any recognized memory > region.
This might be fixable with more asan work, but we don't really need > support for old kernel versions. > > The asan port is curious in that it uses 1<<29 for the shadow offset, but all > other 64-bit targets use a number larger than 1<<32. But what we have is > working OK for now. > > I did a make check RUNTESTFLAGS="asan.exp" on Fedora rawhide image running on > qemu and the results look reasonable. > > === gcc Summary === > > # of expected passes 1905 > # of unexpected failures 11 > # of unsupported tests 224 > > === g++ Summary === > > # of expected passes 2002 > # of unexpected failures 6 > # of unresolved testcases 1 > # of unsupported tests 175 > > OK? > > Jim > > 2020-10-28 Jim Wilson > > gcc/ > * config/riscv/riscv.c (riscv_asan_shadow_offset): New. > (TARGET_ASAN_SHADOW_OFFSET): New. > * doc/tm.texi: Regenerated. > * target.def (asan_shadow_offset); Mention that it can return zero. > * toplev.c (process_options): Check for and handle zero return from > targetm.asan_shadow_offset call. > > Co-Authored-By: cooper.joshua > --- > gcc/config/riscv/riscv.c | 16 > gcc/doc/tm.texi | 3 ++- > gcc/target.def | 3 ++- > gcc/toplev.c | 3 ++- > 4 files changed, 22 insertions(+), 3 deletions(-) > > diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c > index 989a9f15250..6909e200de1 100644 > --- a/gcc/config/riscv/riscv.c > +++ b/gcc/config/riscv/riscv.c > @@ -5299,6 +5299,19 @@ riscv_gpr_save_operation_p (rtx op) >return true; > } > > +/* Implement TARGET_ASAN_SHADOW_OFFSET. */ > + > +static unsigned HOST_WIDE_INT > +riscv_asan_shadow_offset (void) > +{ > + /* We only have libsanitizer support for RV64 at present. > + > + This number must match kRiscv*_ShadowOffset* in the file > + libsanitizer/asan/asan_mapping.h which is currently 1<<29 for rv64, > + even though 1<<36 makes more sense. */ > + return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0; > +} > + > /* Initialize the GCC target structure. 
*/ > #undef TARGET_ASM_ALIGNED_HI_OP > #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" > @@ -5482,6 +5495,9 @@ riscv_gpr_save_operation_p (rtx op) > #undef TARGET_NEW_ADDRESS_PROFITABLE_P > #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p > > +#undef TARGET_ASAN_SHADOW_OFFSET > +#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset > + > struct gcc_target targetm = TARGET_INITIALIZER; > > #include "gt-riscv.h" > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > index 24c37f655c8..39c596b647a 100644 > --- a/gcc/doc/tm.texi > +++ b/gcc/doc/tm.texi > @@ -12078,7 +12078,8 @@ is zero, which disables this optimization. > @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_ASAN_SHADOW_OFFSET > (void) > Return the offset bitwise ored into shifted address to get corresponding > Address Sanitizer shadow memory address. NULL if Address Sanitizer is not > -supported by the target. > +supported by the target. May
Re: Reply: [PATCH] Asan changes for RISC-V.
Arm 32, x86 (32) and MIPS have support for Asan [1], so we can refer to how they implement that, but I guess the problem is that we need someone to do it. [1] https://github.com/llvm/llvm-project/blob/main/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake#L28 On Thu, Apr 21, 2022 at 7:54 AM Palmer Dabbelt wrote: > > On Tue, 19 Apr 2022 23:13:15 PDT (-0700), gcc-patches@gcc.gnu.org wrote: > > Does Asan work for RISC-V currently? It seems that '-fsanitize=address' is > > still unsupported for RISC-V. If I add '--enable-libsanitizer' in > > Makefile.in to reconfigure, there are compiling errors. > > Is it because # libsanitizer not supported rv32, but it will break the rv64 > > multi-lib build, so we disable that temporally until rv32 supported# in > > Makefile.in? > > Not quite sure what's going on here, I keep getting copies of this > message that look empty in gmail. > > I was under the impression that asan worked on rv64, but remember there > being some worrisome constants floating around (as Jim alludes to in the > forwarded patch). As far as I can tell there's no libsanitizer support > for rv32 (upstream is at LLVM), probably because we didn't have a stable > uABI back then. It's not super hard to do a libsanitizer port, but I > don't see any other 32-bit targets with asan so either I'm missing > something or it's tricky (and we don't have much free VA space, so not > sure if it'd even run anything useful). > > > -- > > From: Jim Wilson > > Sent: Thursday, October 29, 2020 07:59 > > To: gcc-patches > > Cc: cooper.joshua ; Jim Wilson > > > > Subject: [PATCH] Asan changes for RISC-V. > > > > We have only riscv64 asan support, there is no riscv32 support as yet. So I > > need to be able to conditionally enable asan support for the riscv target. > > I > > implemented this by returning zero from the asan_shadow_offset function. > > This > > requires a change to toplev.c and docs in target.def. > > > > The asan support works on a 5.5 kernel, but does not work on a 4.15 kernel.
> > The problem is that the asan high memory region is a small wedge below > > 0x40. The new kernel puts shared libraries at 0x3f and > > going > > down which works. But the old kernel puts shared libraries at 0x20 > > and going up which does not work, as it isn't in any recognized memory > > region. This might be fixable with more asan work, but we don't really need > > support for old kernel versions. > > > > The asan port is curious in that it uses 1<<29 for the shadow offset, but > > all > > other 64-bit targets use a number larger than 1<<32. But what we have is > > working OK for now. > > > > I did a make check RUNTESTFLAGS="asan.exp" on Fedora rawhide image running > > on > > qemu and the results look reasonable. > > > > === gcc Summary === > > > > # of expected passes 1905 > > # of unexpected failures 11 > > # of unsupported tests 224 > > > > === g++ Summary === > > > > # of expected passes 2002 > > # of unexpected failures 6 > > # of unresolved testcases 1 > > # of unsupported tests 175 > > > > OK? > > > > Jim > > > > 2020-10-28 Jim Wilson > > > > gcc/ > > * config/riscv/riscv.c (riscv_asan_shadow_offset): New. > > (TARGET_ASAN_SHADOW_OFFSET): New. > > * doc/tm.texi: Regenerated. > > * target.def (asan_shadow_offset); Mention that it can return zero. > > * toplev.c (process_options): Check for and handle zero return from > > targetm.asan_shadow_offset call. > > > > Co-Authored-By: cooper.joshua > > --- > > gcc/config/riscv/riscv.c | 16 > > gcc/doc/tm.texi | 3 ++- > > gcc/target.def | 3 ++- > > gcc/toplev.c | 3 ++- > > 4 files changed, 22 insertions(+), 3 deletions(-) > > > > diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c > > index 989a9f15250..6909e200de1 100644 > > --- a/gcc/config/riscv/riscv.c > > +++ b/gcc/config/riscv/riscv.c > > @@ -5299,6 +5299,19 @@ riscv_gpr_save_operation_p (rtx op) > >return true; > > } > > > > +/* Implement TARGET_ASAN_SHADOW_OFFSET. 
*/ > > + > > +static unsigned HOST_WIDE_INT > > +riscv_asan_shadow_offset (void) > > +{ > > + /* We only have libsanitizer support for RV64 at present. > > + > > + This number must match kRiscv*_ShadowOffset* in the file > > + libsanitizer/asan/asan_mapping.h which is currently 1<<29 for rv64, > > + even though 1<<36 makes more sense. */ > > + return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0; > > +} > > + > > /* Initialize the GCC target structure. */ > > #undef TARGET_ASM_ALIGNED_HI_OP > > #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" > > @@ -5482,6 +5495,9 @@ riscv_gpr_save_operation_p (rtx op) > > #undef TARGET_NEW_ADDRESS_PROFITABLE_P > > #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p > > > > +#undef TARGET_ASAN_SHADOW_OFFSET > > +#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset > > + > > struct gcc_target targetm =
Re: [PATCH v4] libgo: Don't use pt_regs member in mcontext_t
On Thu, Apr 14, 2022 at 3:15 PM Ian Lance Taylor wrote: > > Thanks! I tested a version of that code with glibc, and it works > there too, so I've committed this patch after testing on > powerpc-linux-gnu and x86_64-linux-gnu. Please let me know about any > problems. Well, that patch broke PPC 32-bit, as reported in PR 105315, so I've committed this one. Tested on powerpc-linux-gnu, powerpc64-linux-gnu, powerpc64le-linux-gnu, all with glibc. I hope that it doesn't break musl again. Ian 8e14028002a661be19619ee8df081b713a8ec4a5 diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE index 63238715bd0..ef20a0aafd6 100644 --- a/gcc/go/gofrontend/MERGE +++ b/gcc/go/gofrontend/MERGE @@ -1,4 +1,4 @@ -99ca6be406a5781be078ff23f45a72b4c84b16e3 +70ca85f08edf63f46c87d540fa99c45e2903edc2 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c index 2caddd068d6..528d9b6d9fe 100644 --- a/libgo/runtime/go-signal.c +++ b/libgo/runtime/go-signal.c @@ -233,7 +233,11 @@ getSiginfo(siginfo_t *info, void *context __attribute__((unused))) #elif defined(__PPC64__) && defined(__linux__) ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gp_regs[32]; #elif defined(__PPC__) && defined(__linux__) +# if defined(__GLIBC__) + ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.uc_regs->gregs[32]; +# else ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32]; +# endif #elif defined(__PPC__) && defined(_AIX) ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.jmp_context.iar; #elif defined(__aarch64__) && defined(__linux__) @@ -344,12 +348,13 @@ dumpregs(siginfo_t *info __attribute__((unused)), void *context __attribute__((u runtime_printf("sp %X\n", m->sc_regs[30]); runtime_printf("pc %X\n", m->sc_pc); } -#elif defined(__PPC__) && defined(__LITTLE_ENDIAN__) && defined(__linux__) +#elif defined(__PPC__) && defined(__linux__) { - mcontext_t *m = 
&((ucontext_t*)(context))->uc_mcontext; int i; -#if defined(__PPC64__) +# if defined(__PPC64__) + mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext; + for (i = 0; i < 32; i++) runtime_printf("r%d %X\n", i, m->gp_regs[i]); runtime_printf("pc %X\n", m->gp_regs[32]); @@ -358,16 +363,22 @@ dumpregs(siginfo_t *info __attribute__((unused)), void *context __attribute__((u runtime_printf("lr %X\n", m->gp_regs[36]); runtime_printf("ctr %X\n", m->gp_regs[35]); runtime_printf("xer %X\n", m->gp_regs[37]); -#else +# else +# if defined(__GLIBC__) + mcontext_t *m = ((ucontext_t*)(context))->uc_mcontext.uc_regs; +# else + mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext; +# endif + for (i = 0; i < 32; i++) - runtime_printf("r%d %X\n", i, m->gregs[i]); - runtime_printf("pc %X\n", m->gregs[32]); - runtime_printf("msr %X\n", m->gregs[33]); - runtime_printf("cr %X\n", m->gregs[38]); - runtime_printf("lr %X\n", m->gregs[36]); - runtime_printf("ctr %X\n", m->gregs[35]); - runtime_printf("xer %X\n", m->gregs[37]); -#endif + runtime_printf("r%d %x\n", i, m->gregs[i]); + runtime_printf("pc %x\n", m->gregs[32]); + runtime_printf("msr %x\n", m->gregs[33]); + runtime_printf("cr %x\n", m->gregs[38]); + runtime_printf("lr %x\n", m->gregs[36]); + runtime_printf("ctr %x\n", m->gregs[35]); + runtime_printf("xer %x\n", m->gregs[37]); +# endif } #elif defined(__PPC__) && defined(_AIX) {
Re: Reply: [PATCH] Asan changes for RISC-V.
On Tue, 19 Apr 2022 23:13:15 PDT (-0700), gcc-patches@gcc.gnu.org wrote: Does Asan work for RISC-V currently? It seems that '-fsanitize=address' is still unsupported for RISC-V. If I add '--enable-libsanitizer' in Makefile.in to reconfigure, there are compiling errors. Is it because # libsanitizer not supported rv32, but it will break the rv64 multi-lib build, so we disable that temporally until rv32 supported# in Makefile.in? Not quite sure what's going on here, I keep getting copies of this message that look empty in gmail. I was under the impression that asan worked on rv64, but remember there being some worrisome constants floating around (as Jim alludes to in the forwarded patch). As far as I can tell there's no libsanitizer support for rv32 (upstream is at LLVM), probably because we didn't have a stable uABI back then. It's not super hard to do a libsanitizer port, but I don't see any other 32-bit targets with asan so either I'm missing something or it's tricky (and we don't have much free VA space, so not sure if it'd even run anything useful). -- From: Jim Wilson Sent: Thursday, October 29, 2020 07:59 To: gcc-patches Cc: cooper.joshua ; Jim Wilson Subject: [PATCH] Asan changes for RISC-V. We have only riscv64 asan support, there is no riscv32 support as yet. So I need to be able to conditionally enable asan support for the riscv target. I implemented this by returning zero from the asan_shadow_offset function. This requires a change to toplev.c and docs in target.def. The asan support works on a 5.5 kernel, but does not work on a 4.15 kernel. The problem is that the asan high memory region is a small wedge below 0x40. The new kernel puts shared libraries at 0x3f and going down which works. But the old kernel puts shared libraries at 0x20 and going up which does not work, as it isn't in any recognized memory region. This might be fixable with more asan work, but we don't really need support for old kernel versions.
The asan port is curious in that it uses 1<<29 for the shadow offset, but all other 64-bit targets use a number larger than 1<<32. But what we have is working OK for now. I did a make check RUNTESTFLAGS="asan.exp" on Fedora rawhide image running on qemu and the results look reasonable. === gcc Summary === # of expected passes 1905 # of unexpected failures 11 # of unsupported tests 224 === g++ Summary === # of expected passes 2002 # of unexpected failures 6 # of unresolved testcases 1 # of unsupported tests 175 OK? Jim 2020-10-28 Jim Wilson gcc/ * config/riscv/riscv.c (riscv_asan_shadow_offset): New. (TARGET_ASAN_SHADOW_OFFSET): New. * doc/tm.texi: Regenerated. * target.def (asan_shadow_offset); Mention that it can return zero. * toplev.c (process_options): Check for and handle zero return from targetm.asan_shadow_offset call. Co-Authored-By: cooper.joshua --- gcc/config/riscv/riscv.c | 16 gcc/doc/tm.texi | 3 ++- gcc/target.def | 3 ++- gcc/toplev.c | 3 ++- 4 files changed, 22 insertions(+), 3 deletions(-) diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index 989a9f15250..6909e200de1 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -5299,6 +5299,19 @@ riscv_gpr_save_operation_p (rtx op) return true; } +/* Implement TARGET_ASAN_SHADOW_OFFSET. */ + +static unsigned HOST_WIDE_INT +riscv_asan_shadow_offset (void) +{ + /* We only have libsanitizer support for RV64 at present. + + This number must match kRiscv*_ShadowOffset* in the file + libsanitizer/asan/asan_mapping.h which is currently 1<<29 for rv64, + even though 1<<36 makes more sense. */ + return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0; +} + /* Initialize the GCC target structure. 
*/ #undef TARGET_ASM_ALIGNED_HI_OP #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" @@ -5482,6 +5495,9 @@ riscv_gpr_save_operation_p (rtx op) #undef TARGET_NEW_ADDRESS_PROFITABLE_P #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p +#undef TARGET_ASAN_SHADOW_OFFSET +#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-riscv.h" diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 24c37f655c8..39c596b647a 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -12078,7 +12078,8 @@ is zero, which disables this optimization. @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_ASAN_SHADOW_OFFSET (void) Return the offset bitwise ored into shifted address to get corresponding Address Sanitizer shadow memory address. NULL if Address Sanitizer is not -supported by the target. +supported by the target. May return 0 if Address Sanitizer is not supported +by a subtarget. @end deftypefn @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_MEMMODEL_CHECK (unsigned HOST_WIDE_INT @var{val}) diff --git
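To make the role of the shadow offset in the quoted patch more concrete, here is a rough sketch of the address-to-shadow mapping. The shift of 3 (one shadow byte per 8 application bytes) and the use of addition are assumptions based on AddressSanitizer's usual mapping (the tm.texi text above says "ored"); only the 1<<29 value and the "zero means unsupported" convention come from the patch itself. All function names are hypothetical.

```cpp
#include <cstdint>

// Assumption: ASan's usual 8-to-1 mapping, i.e. a shift of 3.
constexpr std::uint64_t kShadowScale = 3;

// Mirrors riscv_asan_shadow_offset in the patch: 1<<29 on RV64, and zero on
// RV32 to signal that there is no libsanitizer support.
constexpr std::uint64_t riscv_shadow_offset(bool is_64bit)
{
  return is_64bit ? (std::uint64_t{1} << 29) : 0;
}

// Mirrors the check added to toplev.c: a zero offset means "not supported".
constexpr bool asan_supported(std::uint64_t offset)
{
  return offset != 0;
}

// Shadow address for an application address under the assumed mapping.
constexpr std::uint64_t shadow_address(std::uint64_t addr, std::uint64_t offset)
{
  return (addr >> kShadowScale) + offset;
}

static_assert(shadow_address(0x1000, riscv_shadow_offset(true)) == 0x20000200,
              "0x1000 >> 3 is 0x200, plus the 1<<29 offset");
static_assert(!asan_supported(riscv_shadow_offset(false)),
              "rv32 reports a zero offset");
```

Under these assumptions the shadow of address 0 sits at 0x20000000, which is why the offset has to agree between the compiler and the runtime's asan_mapping.h.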
[PATCH] c++: wrong error with constexpr COMPOUND_EXPR [PR105321]
Here we issue a bogus error for the first assert in the test. Therein we have = (void) (VIEW_CONVERT_EXPR(yes) || handle_error ());, VIEW_CONVERT_EXPR(value); which has a COMPOUND_EXPR, so we get to cxx_eval_constant_expression . The problem here is that we call 7044 /* Check that the LHS is constant and then discard it. */ 7045 cxx_eval_constant_expression (ctx, op0, 7046 true, non_constant_p, overflow_p, 7047 jump_target); where lval is always true, so the PARM_DECL 'yes' is not evaluated into its value. r218832 changed the argument for 'lval' from false to true: (cxx_eval_constant_expression) [COMPOUND_EXPR]: Pass true for lval. but I think we want to pass 'lval' instead. Jakub tells me that's what we do for "(void) expr" as well. [expr.comma] says that the left expression is a discarded-value expression, but [expr.context] doesn't suggest that we should always be passing false for lval as pre-r218832. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11.3? PR c++/105321 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_constant_expression) : Pass lval to cxx_eval_constant_expression. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/constexpr-105321.C: New test. --- gcc/cp/constexpr.cc | 2 +- gcc/testsuite/g++.dg/cpp0x/constexpr-105321.C | 18 ++ 2 files changed, 19 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-105321.C diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index e89440e770f..28271d4405d 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -7043,7 +7043,7 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, { /* Check that the LHS is constant and then discard it. 
*/ cxx_eval_constant_expression (ctx, op0, - true, non_constant_p, overflow_p, + lval, non_constant_p, overflow_p, jump_target); if (*non_constant_p) return t; diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-105321.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-105321.C new file mode 100644 index 000..adb6830ff22 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-105321.C @@ -0,0 +1,18 @@ +// PR c++/105321 +// { dg-do compile { target c++11 } } + +bool handle_error(); + +constexpr int echo(int value, bool yes = true) noexcept +{ +return (yes || handle_error()), value; +} + +static_assert(echo(10) == 10, ""); + +constexpr int echo2(int value, bool no = false) noexcept +{ +return (!no || handle_error()), value; +} + +static_assert(echo2(10) == 10, ""); base-commit: 5bde80f48bcc594658c788895ad1fd86d0916fc2 -- 2.35.1
Re: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059
On Tue, 2022-04-12 at 21:14 -0400, Michael Meissner wrote: > Eliminate power8 fusion options, use power8 tuning, PR target/102059 > > This is V4 of the patch. Compared to V3 of the patch, GCC will just > ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign. > Hi, No comments on code, a few comments about the comments below. > The splitting of signed halfword and word loads into unsigned load and > sign extension is now suppressed with -Os, but it is done normally if we > are not optimizing for space. I see references to TARGET_P8_FUSION_SIGN in the patch below, and some removal of old code. I assume this describes the implementation that remains. > > The power8 fusion support used to be set automatically when -mcpu=power8 or > -mtune=power8 was used, and it was cleared for other cpu's. However, if you > used the target attribute or target #pragma to change the default cpu type or > tuning, you would get an error that a target specifiction option mismatch > occurred. specification. :-) > > This occurred because the rs6000_can_inline_p function just compares the ISA > bits between the called inline function and the caller. If the ISA flags of > the called function is not a subset of the ISA flags of the caller, we won't > do > the inlinging. When a power9 or power10 function inlines a function that is > explicitly compiled for power8, the power8 function has the power8 fusion bits > set and the power9 or power10 functions do not have the fusion bits set. inlining. > > This code makes the -mpower8-fusion option a nop. It is accepted without > warning, but it does nothing. Power8 fusion is only enabled if we are tuning > for a power8. > > The undocumented -mpower8-fusion-sign option is also made into a nop. > > I left in the pragma target and attribute target support for power8-fusion, > but > using it doesn't do anything now. 
This is because I told the customer who > encountered this problem that one solution was to add an explicit > no-power8-fusion option in their target pragma or attribute to work around the > problem. > > I have tested this patch on a little endian power10 system. I have tested > previous versions on little endian power9 and big endian power8 systems. > Can I apply this patch to the master branch? > > If it is accepted, I will produce a similar patch for back porting to GCC 11 > and GCC 10. > > 2022-04-12 Michael Meissner > > gcc/ > PR target/102059 > * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete. > (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks. > (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION. > * config/rs6000/rs6000.cc (rs6000_option_override_internal): > Delete code that set the power8 fusion options automatically. > (rs6000_opt_masks): Allow #pragma target and attribute target > power8-fusion option for backwards compatibility. > (rs6000_print_options_internal): Skip printing backward > compatibility options that are just ignored. > * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro. > (TARGET_P8_FUSION_SIGN): Likewise. > (MASK_P8_FUSION): Delete. > * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but > ignore it completely. > (-mpower8-fusion-sign): Likewise. > * doc/invoke.texi (RS/6000 and PowerPC Options): Delete > -mpower8-fusion. > > gcc/testsuite/ > PR target/102059 > * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion. > * gcc.dg/lto/pr102059-2_0.c: Likewise. > * gcc.target/powerpc/pr102059-3.c: Likewise. > * gcc.target/powerpc/pr102059-4.c: New test. 
> --- > gcc/config/rs6000/rs6000-cpus.def | 18 +++ > gcc/config/rs6000/rs6000.cc | 49 +-- > gcc/config/rs6000/rs6000.h| 13 - > gcc/config/rs6000/rs6000.opt | 8 +-- > gcc/doc/invoke.texi | 13 + > gcc/testsuite/gcc.dg/lto/pr102059-1_0.c | 2 +- > gcc/testsuite/gcc.dg/lto/pr102059-2_0.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr102059-3.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 + > 9 files changed, 62 insertions(+), 68 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c > > diff --git a/gcc/config/rs6000/rs6000-cpus.def > b/gcc/config/rs6000/rs6000-cpus.def > index 963947f6939..d913a3d6b73 100644 > --- a/gcc/config/rs6000/rs6000-cpus.def > +++ b/gcc/config/rs6000/rs6000-cpus.def > @@ -54,19 +54,14 @@ >| OPTION_MASK_QUAD_MEMORY \ >| OPTION_MASK_QUAD_MEMORY_ATOMIC) > > -/* ISA masks setting fusion options. */ > -#define OTHER_FUSION_MASKS (OPTION_MASK_P8_FUSION \ > - | OPTION_MASK_P8_FUSION_SIGN) > - > /* Add ISEL back into ISA 3.0, since it is supposed to be a win. Do not
Re: [PATCH] PR fortran/105310 - ICE when UNION is after the 8th field in a DEC STRUCTURE with -finit-derived -finit-local-zero
Hi Fritz,

On 20.04.22 at 20:03, Fritz Reese via Fortran wrote:
> See the bug report at gcc dot gnu dot org/bugzilla/show_bug.cgi?id=105310 .
>
> This code was originally authored by me and the fix is trivial, so I intend to commit the attached patch in the next few days if there is no dissent.

OK if you add a/the testcase.

> The bug is caused by gfc_conv_union_initializer in gcc/fortran/trans-expr.cc, which accepts a pointer to a vector of constructor trees (vec*) as an argument, then appends one or two field constructors to the vector. The problem is the use of CONSTRUCTOR_APPEND_ELT(v, ...) within gfc_conv_union_initializer, which modifies the vector pointer v when a reallocation of the vector occurs, but the pointer is passed by value. Therefore, when a vector reallocation occurs, the caller's (gfc_conv_structure) vector pointer is not updated and subsequently points to freed memory. Chaos ensues.
>
> The bug only occurs when gfc_conv_union_initializer itself triggers the reallocation, which is whenever the vector is "full" (v->m_vecpfx.m_alloc == v->m_vecpfx.m_num). Since the vector defaults to allocating 8 elements and doubles in size for every reallocation, the bug only occurs when there are 8, 16, 32, etc... fields with initializers prior to the union, causing the vector of constructors to be resized when entering gfc_conv_union_initializer. The -finit-derived and -finit-local-zero options together ensure each field has an initializer, triggering the bug.
>
> The patch fixes the bug by passing the vector pointer to gfc_conv_union_initializer by reference, matching the signature of vec_safe_push from within the CONSTRUCTOR_APPEND_ELT macro.
>
> -- Fritz Reese

As this affects all branches, you may backport the patch as far as you feel reasonable.

(No, I do not use DEC extensions personally.)

Thanks for the patch!

Harald
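The failure mode described in the quoted analysis can be sketched with a toy model. Everything below is hypothetical: Vec, make_vec, safe_push and the two append functions merely imitate a GCC-style vec whose storage is embedded in the object (so growing it means replacing the object), and superseded objects are kept alive with a "live" flag so the sketch can inspect them safely, whereas the real bug reads freed memory.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy vec: growing it replaces the whole object, so the pointer changes.
struct Vec {
  std::size_t capacity = 0;
  std::vector<int> elems;
  bool live = true;   // cleared once this object has been superseded
};

// Owns every Vec ever created so stale ones remain safe to look at.
std::vector<std::unique_ptr<Vec>> g_pool;

Vec* make_vec(std::size_t capacity)
{
  g_pool.push_back(std::make_unique<Vec>());
  g_pool.back()->capacity = capacity;
  return g_pool.back().get();
}

// Analogue of vec_safe_push: takes the pointer by reference because a
// reallocation replaces it.
void safe_push(Vec*& v, int x)
{
  if (v->elems.size() == v->capacity)
    {
      Vec* bigger = make_vec(v->capacity * 2);
      bigger->elems = v->elems;
      v->live = false;
      v = bigger;
    }
  v->elems.push_back(x);
}

// Shape of the bug: the pointer is a by-value copy, so the caller never
// learns about a reallocation.
void append_by_value(Vec* v, int x) { safe_push(v, x); }

// Shape of the fix: the caller's pointer is updated.
void append_by_reference(Vec*& v, int x) { safe_push(v, x); }

// Prefill a vec of capacity 8, append once more through one of the two
// helpers, and report whether the caller's pointer is still current.
bool caller_pointer_still_live(bool by_reference, int prefilled)
{
  Vec* v = make_vec(8);
  for (int i = 0; i < prefilled; ++i)
    safe_push(v, i);
  if (by_reference)
    append_by_reference(v, 99);
  else
    append_by_value(v, 99);
  return v->live;
}
```

With 8 prefilled elements the by-value variant leaves the caller holding a stale object, while with 7 it happens to work, which mirrors the observation that the bug needs exactly 8, 16, 32, ... initialized fields before the union.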
Re: Ping: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059
On 4/20/22 11:01 AM, Michael Meissner wrote:
> Ping patch.
>
> | Date: Tue, 12 Apr 2022 21:14:55 -0400
> | From: Michael Meissner
> | Subject: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059
> | Message-ID:
>
> I feel this is an important patch. Please look at it and approve the patch or give me feedback on how to change it. Note, I will be in today (April 20th) and tomorrow (April 21st), but I will be away from a computer on April 22-25 (Friday through Monday).

I agree this is important and we want this in so we can get it backported. I'm being pinged about this by a customer who is using GCC 10 and this issue is holding them back, so the quicker we get this in, the better.

Peter
Re: [PATCH] emit-rtl: Fix -fcompare-debug bug with label references in debug insns [PR105203]
> On 20.04.2022 at 18:52, Jakub Jelinek via Gcc-patches wrote: > > Hi! > > When we compute LABEL_NUSES from scratch, mark_all_labels doesn't call > mark_jump_label on DEBUG_INSNs: > if (NONDEBUG_INSN_P (insn)) >mark_jump_label (PATTERN (insn), insn, 0); > and so doesn't increment LABEL_NUSES from references in DEBUG_INSNs. > But, when we call emit_copy_of_insn_after e.g. when duplicating some > DEBUG_INSNs, we call it even on those, which then results in LABEL_NUSES > differences and -fcompare-debug failures. > > The following patch makes sure we don't call it on DEBUG_INSNs. > > Bootstrapped/regtested on powerpc64le-linux, ok for trunk? Ok Richard > > 2022-04-20 Jakub Jelinek > >PR debug/105203 >* emit-rtl.cc (emit_copy_of_insn_after): Don't call mark_jump_label >on DEBUG_INSNs. > >* gfortran.dg/g77/pr105203.f: New test. > > --- gcc/emit-rtl.cc.jj2022-02-23 09:17:04.805125253 +0100 > +++ gcc/emit-rtl.cc2022-04-20 10:26:44.972198107 +0200 > @@ -6440,7 +6440,8 @@ emit_copy_of_insn_after (rtx_insn *insn, > } > > /* Update LABEL_NUSES. */ > - mark_jump_label (PATTERN (new_rtx), new_rtx, 0); > + if (NONDEBUG_INSN_P (insn)) > +mark_jump_label (PATTERN (new_rtx), new_rtx, 0); > > INSN_LOCATION (new_rtx) = INSN_LOCATION (insn); > > --- gcc/testsuite/gfortran.dg/g77/pr105203.f.jj2022-04-20 > 10:29:44.830696254 +0200 > +++ gcc/testsuite/gfortran.dg/g77/pr105203.f2022-04-20 10:31:13.532463772 > +0200 > @@ -0,0 +1,20 @@ > +C Test case for PR debug/105203 > +C Origin: kmcca...@princeton.edu > +C > +C { dg-do compile } > +C { dg-options "-O2 -fcompare-debug -ftracer -w" } > +C { dg-additional-options "-fPIC" { target fpic } } > + SUBROUTINE FOO (B) > + > + 10 CALL BAR (A) > + ASSIGN 20 TO M > + IF (100.LT.A) GOTO 10 > + GOTO 40 > +C > + 20 IF (B.LT.ABS(A)) GOTO 10 > + ASSIGN 30 TO M > + GOTO 40 > +C > + 30 ASSIGN 10 TO M > + 40 GOTO M,(10,20,30) > + END > >Jakub >
Re: [PATCH] opts: Disable -gstatement-frontiers by default [PR103788]
> On 20.04.2022 at 19:15, Jakub Jelinek via Gcc-patches > wrote: > > Hi! > > As mentioned in those PRs and I think in others too, there are some long > time unresolved -fcompare-debug issues with DEBUG_BEGIN_STMTs in the FEs and > during gimplification, especially with statement expressions, where we end > up with different code generation depending on whether there are > DEBUG_BEGIN_STMTs (which force STATEMENT_LISTs) or not (in that case > we often have just the single expression from the list). > I've tried to fix that several times, but nothing worked. > Furthermore, Alex mentioned in bugzilla that there are no consumers of the > statement frontiers right now. > > This patch turns -gstatement-frontiers off by default because of those > 2 reasons, consumers for those can still be added (one can test with > explicit -gstatement-frontiers) and if/once that happens, perhaps somebody > will have some great idea how to resolve those -fcompare-debug issues. > > Until then, can we go with this? > > Bootstrapped/regtested on powerpc64le-linux, ok for trunk if it also passes > bootstrap/regtest on x86_64-linux/i686-linux? OK. Richard. > 2022-04-20 Jakub Jelinek > >PR debug/103788 >PR middle-end/100733 >PR debug/104180 >* opts.cc (finish_options): Disable -gstatement-frontiers by default. > >* gcc.dg/pr103788.c: New test. >* c-c++-common/ubsan/pr100733.c: New test. >* g++.dg/debug/pr104180.C: New test. > > --- gcc/opts.cc.jj2022-04-06 17:42:03.084190238 +0200 > +++ gcc/opts.cc2022-04-20 13:12:22.282322920 +0200 > @@ -1317,12 +1317,16 @@ finish_options (struct gcc_options *opts >debug_info_level = DINFO_LEVEL_NONE; > } > > + /* Don't enable -gstatement-frontiers by default until some consumers > + actually consume it and until the issues with DEBUG_BEGIN_STMTs > + affecting code generation e.g. for statement expressions are resolved. > + See PR103788, PR104180, PR100733. 
> if (!OPTION_SET_P (debug_nonbind_markers_p)) > debug_nonbind_markers_p > = (optimize > && debug_info_level >= DINFO_LEVEL_NORMAL > && dwarf_debuginfo_p () > - && !(flag_selective_scheduling || flag_selective_scheduling2)); > + && !(flag_selective_scheduling || flag_selective_scheduling2)); */ > > /* Note -fvar-tracking is enabled automatically with OPT_LEVELS_1_PLUS and > so we need to drop it if we are called from optimize attribute. */ > --- gcc/testsuite/gcc.dg/pr103788.c.jj2022-04-20 13:13:47.253141338 +0200 > +++ gcc/testsuite/gcc.dg/pr103788.c2022-04-20 13:13:29.301390970 +0200 > @@ -0,0 +1,28 @@ > +/* PR debug/103788 */ > +/* { dg-do compile } */ > +/* { dg-options "-O1 -fcompare-debug" } */ > + > +int > +bar (void); > + > +int > +foo (int x) > +{ > + int i; > + > + for (i = 0; i <= __INT_MAX__; ++i) > +x += bar () < (x ? 2 : 1); > + > + return x; > +} > + > +int > +baz (int x) > +{ > + int i; > + > + for (i = 0; i <= __INT_MAX__; ++i) > +x += bar () < ( > +x ? 2 : 1 ); > + return x; > +} > --- gcc/testsuite/c-c++-common/ubsan/pr100733.c.jj2022-04-20 > 13:18:09.135499667 +0200 > +++ gcc/testsuite/c-c++-common/ubsan/pr100733.c2022-04-20 > 13:18:43.031028328 +0200 > @@ -0,0 +1,9 @@ > +/* PR middle-end/100733 */ > +/* { dg-do compile } */ > +/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug > -fdisable-tree-phiopt2" } */ > + > +int > +foo (int x) > +{ > + return (__builtin_expect (({ x != 0; }) ? 0 : 1, 3) == 0) * -1 << 0; > +} > --- gcc/testsuite/g++.dg/debug/pr104180.C.jj2022-04-20 13:14:51.468248383 > +0200 > +++ gcc/testsuite/g++.dg/debug/pr104180.C2022-04-20 13:15:17.856881425 > +0200 > @@ -0,0 +1,14 @@ > +/* PR debug/104180 */ > +/* { dg-do compile } */ > +/* { dg-options "-O1 -fcompare-debug" } */ > + > +int a[5]; > + > +void > +foo (void) > +{ > + unsigned int b; > + > + for (b = 3; ; b--) > +a[b] = ({ a[b + 1]; }); > +} > >Jakub >
[PATCH] PR middle-end/98865: Optimize (a>>63)*b as -(a>>63)&b in match.pd.
This patch implements the constant folding optimization(s) described in PR middle-end/98865, which should help address the serious performance regression of Botan AES-128/XTS mentioned in PR tree-optimization/98856. This combines aspects of both Jakub Jelinek's patch in comment #2 and Andrew Pinski's patch in comment #4, so both are listed as co-authors. Alas truth_valued_p is not quite what we want (and tweaking its definition has undesirable side-effects), so instead this patch introduces a new zero_one_valued_p predicate based on tree_nonzero_bits that extends truth_valued_p (which is for Boolean types with single bit precision). This is then used to simplify X*Y into X&Y when both X and Y are zero_one_valued_p, and simplify X*Y into (-X)&Y when X is zero_one_valued_p, in both cases replacing an integer multiplication with a cheaper bit-wise AND. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures, except for a tweak required to tree-ssa/vrp116.c. The recently proposed cmove patch ensures the i386 backend continues to generate identical code for vrp116.c as before. Ok, either for mainline or when stage 1 reopens? 2022-04-20 Roger Sayle Andrew Pinski Jakub Jelinek gcc/ChangeLog PR middle-end/98865 * match.pd (match zero_one_valued_p): New predicate. (mult @0 @1): Use zero_one_valued_p for transforming into (and @0 @1). (mult zero_one_valued_p@0 @1): Convert integer multiplication into a negation and a bit-wise AND, if it can't be cheaply implemented by a single left shift. gcc/testsuite/ChangeLog PR middle-end/98865 * gcc.dg/pr98865.c: New test case. * gcc.dg/tree-ssa/vrp116.c: Tweak test to confirm the integer multiplication has been eliminated, not for the actual replacement implementation. 
Thanks, Roger -- diff --git a/gcc/match.pd b/gcc/match.pd index 6d691d3..16a1203 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -285,14 +285,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) || !COMPLEX_FLOAT_TYPE_P (type))) (negate @0))) -/* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 } */ -(simplify - (mult SSA_NAME@1 SSA_NAME@2) - (if (INTEGRAL_TYPE_P (type) - && get_nonzero_bits (@1) == 1 - && get_nonzero_bits (@2) == 1) - (bit_and @1 @2))) - /* Transform x * { 0 or 1, 0 or 1, ... } into x & { 0 or -1, 0 or -1, ...}, unless the target has native support for the former but not the latter. */ (simplify @@ -1789,6 +1781,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (bit_not (bit_not @0)) @0) +(match zero_one_valued_p + @0 + (if (INTEGRAL_TYPE_P (type) && tree_nonzero_bits (@0) == 1))) +(match zero_one_valued_p + truth_valued_p@0) + +/* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 } */ +(simplify + (mult zero_one_valued_p@0 zero_one_valued_p@1) + (if (INTEGRAL_TYPE_P (type)) + (bit_and @0 @1))) + +/* Transform x * { 0 or 1 } into x & { 0 or -1 }, i.e. an integer + multiplication into negate/bitwise and. Don't do this if the + multiplication is cheap, may be implemented by a single shift. */ +(simplify + (mult:c zero_one_valued_p@0 @1) + (if (INTEGRAL_TYPE_P (type) + && (TREE_CODE (@1) != INTEGER_CST + || wi::popcount (wi::to_wide (@1)) > 1)) + (bit_and (negate @0) @1))) + /* Convert ~ (-A) to A - 1. */ (simplify (bit_not (convert? 
(negate @0))) diff --git a/gcc/testsuite/gcc.dg/pr98865.c b/gcc/testsuite/gcc.dg/pr98865.c new file mode 100644 index 000..e7599d3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr98865.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#if __SIZEOF_INT__ == 4 +unsigned int foo(unsigned int a, unsigned int b) +{ + return (a >> 31) * b; +} + +int bar(int a, int b) +{ + return -(a >> 31) * b; +} + +int baz(int a, int b) +{ + int c = a >> 31; + int d = -c; + return d * b; +} +#endif + +#if __SIZEOF_LONG_LONG__ == 8 +unsigned long long fool (unsigned long long a, unsigned long long b) +{ + return (a >> 63) * b; +} + +long long barl (long long a, long long b) +{ + return -(a >> 63) * b; +} + +long long bazl (long long a, long long b) +{ + long long c = a >> 63; + long long d = -c; + return d * b; +} +#endif + +unsigned int pin (int a, unsigned int b) +{ + unsigned int t = a & 1; + return t * b; +} + +unsigned long pinl (long a, unsigned long b) +{ + unsigned long t = a & 1; + return t * b; +} + +unsigned long long pinll (long long a, unsigned long long b) +{ + unsigned long long t = a & 1; + return t * b; +} + +/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c index 9e68a77..469b232 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c @@ -9,4 +9,5 @@ f (int m1, int m2, int c)
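For readers following the PR 98865 thread, the two identities the patch relies on can be checked outside the compiler. The following is only a sketch in plain C++, not GCC code: the function names are made up for illustration, and the "0 or 1" property that match.pd derives from tree_nonzero_bits is simply assumed (and asserted) here.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helper, not a GCC function. Both operands are assumed to be
// 0 or 1, so the product equals the bit-wise AND.
inline std::uint32_t mul_both_zero_one(std::uint32_t x, std::uint32_t y) {
  assert(x <= 1u && y <= 1u);
  return x & y;
}

// Illustrative helper, not a GCC function. x is assumed to be 0 or 1 and b is
// arbitrary: negating x yields an all-zeros or all-ones mask, so x * b
// equals (-x) & b.
inline std::uint32_t mul_zero_one_by(std::uint32_t x, std::uint32_t b) {
  assert(x <= 1u);
  return (0u - x) & b;
}

// The unsigned 32-bit shape from the new test case: (a >> 31) * b.
// The shift leaves only the top bit, so the result of the shift is 0 or 1.
inline std::uint32_t sign_bit_times(std::uint32_t a, std::uint32_t b) {
  return mul_zero_one_by(a >> 31, b);
}

// Compare both rewrites against a real multiplication on a few sample values.
inline bool identities_hold() {
  const std::uint32_t samples[] = {0u, 1u, 2u, 7u, 0x80000000u, 0xdeadbeefu,
                                   0xffffffffu};
  for (std::uint32_t x = 0; x <= 1u; ++x) {
    for (std::uint32_t y = 0; y <= 1u; ++y)
      if (x * y != mul_both_zero_one(x, y)) return false;
    for (std::uint32_t b : samples)
      if (x * b != mul_zero_one_by(x, b)) return false;
  }
  return true;
}
```

Calling identities_hold() from a small driver should return true. The patch additionally skips the second rewrite when the other operand is a constant with a single bit set, since that multiplication is already a single shift; the sketch does not model that check.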
RE: [x86 PATCH] Improved V1TI (and V2DI) mode equality/inequality.
Doh! ENOPATCH. > -Original Message- > From: Roger Sayle > Sent: 20 April 2022 18:50 > To: 'gcc-patches@gcc.gnu.org' > Subject: [x86 PATCH] Improved V1TI (and V2DI) mode equality/inequality. > > > This patch (for when the compiler returns to stage 1) improves support for > vector equality and inequality of V1TImode vectors, and V2DImode vectors with > sse2 but not sse4. Consider the three functions below: > > typedef unsigned int uv4si __attribute__ ((__vector_size__ (16))); typedef > unsigned long long uv2di __attribute__ ((__vector_size__ (16))); typedef > unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16))); > > uv4si eq_v4si(uv4si x, uv4si y) { return x == y; } uv2di eq_v2di(uv2di x, uv2di y) { > return x == y; } uv1ti eq_v1ti(uv1ti x, uv1ti y) { return x == y; } > > These all perform vector comparisons of 128bit SSE2 registers, generating the > result as a vector, where ~0 (all 1 bits) represents true and a zero represents > false. eq_v4si is trivially implemented by x86_64's pcmpeqd instruction. This > patch improves the other two cases: > > For v2di, gcc -O2 currently generates: > > movq%xmm0, %rdx > movq%xmm1, %rax > movdqa %xmm0, %xmm2 > cmpq%rax, %rdx > movhlps %xmm2, %xmm3 > movhlps %xmm1, %xmm4 > sete%al > movq%xmm3, %rdx > movzbl %al, %eax > negq%rax > movq%rax, %xmm0 > movq%xmm4, %rax > cmpq%rax, %rdx > sete%al > movzbl %al, %eax > negq%rax > movq%rax, %xmm5 > punpcklqdq %xmm5, %xmm0 > ret > > but with this patch we now generate: > > pcmpeqd %xmm0, %xmm1 > pshufd $177, %xmm1, %xmm0 > pand%xmm1, %xmm0 > ret > > where the results of a V4SI comparison are shuffled and bit-wise ANDed to > produce the desired result. There's no change in the code generated for "-O2 - > msse4" where the compiler generates a single "pcmpeqq" insn. 
> > For V1TI mode, the results are equally dramatic, where the current -O2 output > looks like: > > movaps %xmm0, -40(%rsp) > movq-40(%rsp), %rax > movq-32(%rsp), %rdx > movaps %xmm1, -24(%rsp) > movq-24(%rsp), %rcx > movq-16(%rsp), %rsi > xorq%rcx, %rax > xorq%rsi, %rdx > orq %rdx, %rax > sete%al > xorl%edx, %edx > movzbl %al, %eax > negq%rax > adcq$0, %rdx > movq%rax, %xmm2 > negq%rdx > movq%rdx, -40(%rsp) > movhps -40(%rsp), %xmm2 > movdqa %xmm2, %xmm0 > ret > > with this patch we now generate: > > pcmpeqd %xmm0, %xmm1 > pshufd $177, %xmm1, %xmm0 > pand%xmm1, %xmm0 > pshufd $78, %xmm0, %xmm1 > pand%xmm1, %xmm0 > ret > > performing a V2DI comparison, followed by a shuffle and pand, and with > -O2 -msse4 take advantages of SSE4.1's pcmpeqq: > > pcmpeqq %xmm0, %xmm1 > pshufd $78, %xmm1, %xmm0 > pand%xmm1, %xmm0 > ret > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and > make -k check, both with and without --target_board=unix{-m32}, with no new > failures. Is this OK for when we return to stage 1? > > > 2022-04-20 Roger Sayle > > gcc/ChangeLog > * config/i386/sse.md (vec_cmpeqv2div2di): Enable for TARGET_SSE2. > For !TARGET_SSE4_1, expand as a V4SI vector comparison, followed > by a pshufd and pand. > (vec_cmpeqv1tiv1ti): New define_expand implementing V1TImode > vector equality as a V2DImode vector comparison (see above), > followed by a pshufd and pand. > > gcc/testsuite/ChangeLog > * gcc.target/i386/sse2-v1ti-veq.c: New test case. > * gcc.target/i386/sse2-v1ti-vne.c: New test case. 
> > > Roger > -- diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a852c16..9bc8fb0 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -4379,13 +4379,57 @@ (match_operator:V2DI 1 "" [(match_operand:V2DI 2 "register_operand") (match_operand:V2DI 3 "vector_operand")]))] - "TARGET_SSE4_1" + "TARGET_SSE2" { - bool ok = ix86_expand_int_vec_cmp (operands); + bool ok; + if (!TARGET_SSE4_1) +{ + rtx ops[4]; + ops[0] = gen_reg_rtx (V4SImode); + ops[2] = force_reg (V4SImode, gen_lowpart (V4SImode, operands[2])); + ops[3] = force_reg (V4SImode, gen_lowpart (V4SImode, operands[3])); + ops[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]), V4SImode, + ops[2], ops[3]); + ok = ix86_expand_int_vec_cmp (ops); + + rtx tmp1 = gen_reg_rtx (V4SImode); + emit_insn (gen_sse2_pshufd (tmp1, ops[0], GEN_INT (0xb1))); + + rtx tmp2 = gen_reg_rtx (V4SImode); + emit_insn (gen_andv4si3 (tmp2, tmp1,
[PATCH] PR fortran/105310 - ICE when UNION is after the 8th field in a DEC STRUCTURE with -finit-derived -finit-local-zero
See the bug report at gcc dot gnu dot org/bugzilla/show_bug.cgi?id=105310 . This code was originally authored by me and the fix is trivial, so I intend to commit the attached patch in the next few days if there is no dissent. The bug is caused by gfc_conv_union_initializer in gcc/fortran/trans-expr.cc, which accepts a pointer to a vector of constructor trees (vec*) as an argument, then appends one or two field constructors to the vector. The problem is the use of CONSTRUCTOR_APPEND_ELT(v, ...) within gfc_conv_union_initializer, which modifies the vector pointer v when a reallocation of the vector occurs, but the pointer is passed by value. Therefore, when a vector reallocation occurs, the caller's (gfc_conv_structure) vector pointer is not updated and subsequently points to freed memory. Chaos ensues. The bug only occurs when gfc_conv_union_initializer itself triggers the reallocation, which is whenever the vector is "full" (v->m_vecpfx.m_alloc == v->m_vecpfx.m_num). Since the vector defaults to allocating 8 elements and doubles in size for every reallocation, the bug only occurs when there are 8, 16, 32, etc... fields with initializers prior to the union, causing the vector of constructors to be resized when entering gfc_conv_union_initializer. The -finit-derived and -finit-local-zero options together ensure each field has an initializer, triggering the bug. The patch fixes the bug by passing the vector pointer to gfc_conv_union_initializer by reference, matching the signature of vec_safe_push from within the CONSTRUCTOR_APPEND_ELT macro. -- Fritz Reese diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc index 06713f24f95..8677a3b0b20 100644 --- a/gcc/fortran/trans-expr.cc +++ b/gcc/fortran/trans-expr.cc @@ -9195,7 +9195,7 @@ gfc_trans_structure_assign (tree dest, gfc_expr * expr, bool init, bool coarray) } void -gfc_conv_union_initializer (vec *v, +gfc_conv_union_initializer (vec *, gfc_component *un, gfc_expr *init) { gfc_constructor *ctor;
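The failure mode described in the PR 105310 message can be modelled without any GCC internals. What follows is a rough sketch only: ToyVec, toy_push and relocation_points are hypothetical names, and the 8-element initial capacity with doubling is taken from the description above rather than from vec.h. toy_push takes the pointer by reference, which is the shape the fix gives gfc_conv_union_initializer; a caller holding only a by-value copy of the pointer would be left with a stale pointer at exactly the sizes this code reports.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap-allocated vector whose whole object is
// replaced when it grows, so the pointer to it can change on a push.
struct ToyVec {
  std::size_t capacity;
  std::vector<int> elts;
};

// Push one element. The pointer is taken by reference so that the caller
// sees the new object after growth. Returns true when an existing vector
// had to be relocated.
inline bool toy_push(ToyVec *&v, int value) {
  bool relocated = false;
  if (v == nullptr) {
    v = new ToyVec{8, {}};  // assumed initial capacity of 8 elements
  } else if (v->elts.size() == v->capacity) {
    ToyVec *bigger = new ToyVec{v->capacity * 2, v->elts};
    delete v;    // any by-value copy of the old pointer now dangles
    v = bigger;  // only a by-reference parameter lets the caller see this
    relocated = true;
  }
  assert(v != nullptr);
  v->elts.push_back(value);
  return relocated;
}

// Report how many elements were already present each time a push relocated.
inline std::vector<std::size_t> relocation_points(std::size_t n) {
  std::vector<std::size_t> points;
  ToyVec *v = nullptr;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t before = v ? v->elts.size() : 0;
    if (toy_push(v, static_cast<int>(i))) points.push_back(before);
  }
  delete v;
  return points;
}
```

In this model, relocation_points(40) yields 8, 16 and 32, matching the field counts at which the message says the bug appears.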
[x86 PATCH] Improved V1TI (and V2DI) mode equality/inequality.
This patch (for when the compiler returns to stage 1) improves support for vector equality and inequality of V1TImode vectors, and V2DImode vectors with SSE2 but not SSE4. Consider the three functions below: typedef unsigned int uv4si __attribute__ ((__vector_size__ (16))); typedef unsigned long long uv2di __attribute__ ((__vector_size__ (16))); typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16))); uv4si eq_v4si(uv4si x, uv4si y) { return x == y; } uv2di eq_v2di(uv2di x, uv2di y) { return x == y; } uv1ti eq_v1ti(uv1ti x, uv1ti y) { return x == y; } These all perform vector comparisons of 128-bit SSE2 registers, generating the result as a vector, where ~0 (all 1 bits) represents true and a zero represents false. eq_v4si is trivially implemented by x86_64's pcmpeqd instruction. This patch improves the other two cases: For v2di, gcc -O2 currently generates: movq %xmm0, %rdx movq %xmm1, %rax movdqa %xmm0, %xmm2 cmpq %rax, %rdx movhlps %xmm2, %xmm3 movhlps %xmm1, %xmm4 sete %al movq %xmm3, %rdx movzbl %al, %eax negq %rax movq %rax, %xmm0 movq %xmm4, %rax cmpq %rax, %rdx sete %al movzbl %al, %eax negq %rax movq %rax, %xmm5 punpcklqdq %xmm5, %xmm0 ret but with this patch we now generate: pcmpeqd %xmm0, %xmm1 pshufd $177, %xmm1, %xmm0 pand %xmm1, %xmm0 ret where the results of a V4SI comparison are shuffled and bit-wise ANDed to produce the desired result. There's no change in the code generated for "-O2 -msse4" where the compiler generates a single "pcmpeqq" insn. 
For V1TI mode, the results are equally dramatic, where the current -O2 output looks like: movaps %xmm0, -40(%rsp) movq -40(%rsp), %rax movq -32(%rsp), %rdx movaps %xmm1, -24(%rsp) movq -24(%rsp), %rcx movq -16(%rsp), %rsi xorq %rcx, %rax xorq %rsi, %rdx orq %rdx, %rax sete %al xorl %edx, %edx movzbl %al, %eax negq %rax adcq $0, %rdx movq %rax, %xmm2 negq %rdx movq %rdx, -40(%rsp) movhps -40(%rsp), %xmm2 movdqa %xmm2, %xmm0 ret with this patch we now generate: pcmpeqd %xmm0, %xmm1 pshufd $177, %xmm1, %xmm0 pand %xmm1, %xmm0 pshufd $78, %xmm0, %xmm1 pand %xmm1, %xmm0 ret performing a V2DI comparison, followed by a shuffle and pand, and with -O2 -msse4 this takes advantage of SSE4.1's pcmpeqq: pcmpeqq %xmm0, %xmm1 pshufd $78, %xmm1, %xmm0 pand %xmm1, %xmm0 ret This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures. Is this OK for when we return to stage 1? 2022-04-20 Roger Sayle gcc/ChangeLog * config/i386/sse.md (vec_cmpeqv2div2di): Enable for TARGET_SSE2. For !TARGET_SSE4_1, expand as a V4SI vector comparison, followed by a pshufd and pand. (vec_cmpeqv1tiv1ti): New define_expand implementing V1TImode vector equality as a V2DImode vector comparison (see above), followed by a pshufd and pand. gcc/testsuite/ChangeLog * gcc.target/i386/sse2-v1ti-veq.c: New test case. * gcc.target/i386/sse2-v1ti-vne.c: New test case. Roger --
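The compare/shuffle/AND sequences quoted in the V1TI/V2DI message can be followed lane by lane with a portable emulation. This is a sketch under stated assumptions, not the sse.md expander: the three helpers are scalar stand-ins named after the instructions they imitate, lanes are numbered from the least significant 32 bits, and the immediates 0xb1 (decimal 177) and 0x4e (decimal 78) are the ones shown in the assembly above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Four 32-bit lanes standing in for one 128-bit SSE register.
using V4SI = std::array<std::uint32_t, 4>;

// Scalar imitation of pcmpeqd: each lane becomes all ones if equal, else 0.
inline V4SI pcmpeqd(const V4SI &a, const V4SI &b) {
  V4SI r{};
  for (int i = 0; i < 4; ++i) r[i] = (a[i] == b[i]) ? 0xffffffffu : 0u;
  return r;
}

// Scalar imitation of pshufd: result lane i is source lane imm[2i+1:2i].
inline V4SI pshufd(const V4SI &a, unsigned imm) {
  assert(imm <= 0xffu);
  V4SI r{};
  for (int i = 0; i < 4; ++i) r[i] = a[(imm >> (2 * i)) & 3u];
  return r;
}

// Scalar imitation of pand: lane-wise bit-wise AND.
inline V4SI pand(const V4SI &a, const V4SI &b) {
  V4SI r{};
  for (int i = 0; i < 4; ++i) r[i] = a[i] & b[i];
  return r;
}

// 64-bit lane equality from 32-bit compares: 0xb1 swaps the two 32-bit
// halves of each 64-bit lane, so the AND is all ones only if both match.
inline V4SI eq_v2di(const V4SI &a, const V4SI &b) {
  const V4SI t = pcmpeqd(a, b);
  return pand(t, pshufd(t, 0xb1));
}

// 128-bit equality: 0x4e swaps the two 64-bit halves, so the AND is all
// ones only if both 64-bit lanes compared equal.
inline V4SI eq_v1ti(const V4SI &a, const V4SI &b) {
  const V4SI t = eq_v2di(a, b);
  return pand(t, pshufd(t, 0x4e));
}
```

For example, comparing {1,2,3,4} with {1,2,3,5} gives all ones in the low 64-bit lane and zero in the high lane from eq_v2di, and all zeros from eq_v1ti, because one 32-bit mismatch is enough to clear the whole 128-bit result.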
[PATCH] opts: Disable -gstatement-frontiers by default [PR103788]
Hi! As mentioned in those PRs and I think in others too, there are some long time unresolved -fcompare-debug issues with DEBUG_BEGIN_STMTs in the FEs and during gimplification, especially with statement expressions, where we end up with different code generation depending on whether there are DEBUG_BEGIN_STMTs (which force STATEMENT_LISTs) or not (in that case we often have just the single expression from the list). I've tried to fix that several times, but nothing worked. Furthermore, Alex mentioned in bugzilla that there are no consumers of the statement frontiers right now. This patch turns -gstatement-frontiers off by default because of those 2 reasons, consumers for those can still be added (one can test with explicit -gstatement-frontiers) and if/once that happens, perhaps somebody will have some great idea how to resolve those -fcompare-debug issues. Until then, can we go with this? Bootstrapped/regtested on powerpc64le-linux, ok for trunk if it also passes bootstrap/regtest on x86_64-linux/i686-linux? 2022-04-20 Jakub Jelinek PR debug/103788 PR middle-end/100733 PR debug/104180 * opts.cc (finish_options): Disable -gstatement-frontiers by default. * gcc.dg/pr103788.c: New test. * c-c++-common/ubsan/pr100733.c: New test. * g++.dg/debug/pr104180.C: New test. --- gcc/opts.cc.jj 2022-04-06 17:42:03.084190238 +0200 +++ gcc/opts.cc 2022-04-20 13:12:22.282322920 +0200 @@ -1317,12 +1317,16 @@ finish_options (struct gcc_options *opts debug_info_level = DINFO_LEVEL_NONE; } + /* Don't enable -gstatement-frontiers by default until some consumers + actually consume it and until the issues with DEBUG_BEGIN_STMTs + affecting code generation e.g. for statement expressions are resolved. + See PR103788, PR104180, PR100733. 
if (!OPTION_SET_P (debug_nonbind_markers_p)) debug_nonbind_markers_p = (optimize && debug_info_level >= DINFO_LEVEL_NORMAL && dwarf_debuginfo_p () -&& !(flag_selective_scheduling || flag_selective_scheduling2)); +&& !(flag_selective_scheduling || flag_selective_scheduling2)); */ /* Note -fvar-tracking is enabled automatically with OPT_LEVELS_1_PLUS and so we need to drop it if we are called from optimize attribute. */ --- gcc/testsuite/gcc.dg/pr103788.c.jj 2022-04-20 13:13:47.253141338 +0200 +++ gcc/testsuite/gcc.dg/pr103788.c 2022-04-20 13:13:29.301390970 +0200 @@ -0,0 +1,28 @@ +/* PR debug/103788 */ +/* { dg-do compile } */ +/* { dg-options "-O1 -fcompare-debug" } */ + +int +bar (void); + +int +foo (int x) +{ + int i; + + for (i = 0; i <= __INT_MAX__; ++i) +x += bar () < (x ? 2 : 1); + + return x; +} + +int +baz (int x) +{ + int i; + + for (i = 0; i <= __INT_MAX__; ++i) +x += bar () < ( +x ? 2 : 1 ); + return x; +} --- gcc/testsuite/c-c++-common/ubsan/pr100733.c.jj 2022-04-20 13:18:09.135499667 +0200 +++ gcc/testsuite/c-c++-common/ubsan/pr100733.c 2022-04-20 13:18:43.031028328 +0200 @@ -0,0 +1,9 @@ +/* PR middle-end/100733 */ +/* { dg-do compile } */ +/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug -fdisable-tree-phiopt2" } */ + +int +foo (int x) +{ + return (__builtin_expect (({ x != 0; }) ? 0 : 1, 3) == 0) * -1 << 0; +} --- gcc/testsuite/g++.dg/debug/pr104180.C.jj2022-04-20 13:14:51.468248383 +0200 +++ gcc/testsuite/g++.dg/debug/pr104180.C 2022-04-20 13:15:17.856881425 +0200 @@ -0,0 +1,14 @@ +/* PR debug/104180 */ +/* { dg-do compile } */ +/* { dg-options "-O1 -fcompare-debug" } */ + +int a[5]; + +void +foo (void) +{ + unsigned int b; + + for (b = 3; ; b--) +a[b] = ({ a[b + 1]; }); +} Jakub
[PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717]
Hi! So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs for which that function is invoked need an extra artificial BIND_EXPR around their body so that we move all variables of the bodies. The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK or OMP_TARGET and for OpenACC constructs that behave similarly to OMP_TARGET, but the Fortran FE only does that for OpenMP constructs. The following patch does that for OpenACC constructs too. This fixes the ICE on the attached testcase. Unfortunately, it also regresses FAIL: gfortran.dg/goacc/privatization-1-compute-loop.f90 -O (test for excess errors) FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O0 (test for excess errors) FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O1 (test for excess errors) FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -g (test for excess errors) FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os (test for excess errors) Those tests emit tons of various messages and now there are some extra ones; the previous as well as the new ones are mostly on artificial variables created by the compiler, so I wonder if we should emit those at all. Anyway, here is the patch; apart from those regressions, it passed bootstrap/regtest on powerpc64le-linux. 
2022-04-20 Jakub Jelinek PR fortran/104717 * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body in an extra BIND_EXPR. * gfortran.dg/goacc/pr104717.f90: New test. --- gcc/fortran/trans-openmp.cc.jj 2022-04-06 09:59:32.729654664 +0200 +++ gcc/fortran/trans-openmp.cc 2022-04-20 12:48:19.773402677 +0200 @@ -,7 +,9 @@ gfc_trans_oacc_construct (gfc_code *code gfc_start_block (); oacc_clauses = gfc_trans_omp_clauses (, code->ext.omp_clauses, code->loc, false, true); + pushlevel (); stmt = gfc_trans_omp_code (code->block->next, true); + stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0)); stmt = build2_loc (gfc_get_location (>loc), construct_code, void_type_node, stmt, oacc_clauses); gfc_add_expr_to_block (, stmt); --- gcc/testsuite/gfortran.dg/goacc/pr104717.f90.jj 2022-04-20 12:53:54.002748265 +0200 +++ gcc/testsuite/gfortran.dg/goacc/pr104717.f902022-04-20 12:53:12.811321862 +0200 @@ -0,0 +1,22 @@ +! PR fortran/104717 +! { dg-do compile } +! { dg-options "-O1 -fopenacc -fstack-arrays" } + +program main + implicit none (type, external) + integer :: j + integer, allocatable :: A(:) + + A = [(3*j, j=1, 10)] + call foo (A, size(A)) + deallocate (A) +contains + subroutine foo (array, nn) +integer :: i, nn +integer :: array(nn) + +!$acc parallel copyout(array) +array = [(-i, i = 1, nn)] +!$acc end parallel + end subroutine foo +end Jakub
[PATCH] emit-rtl: Fix -fcompare-debug bug with label references in debug insns [PR105203]
Hi! When we compute LABEL_NUSES from scratch, mark_all_labels doesn't call mark_jump_label on DEBUG_INSNs: if (NONDEBUG_INSN_P (insn)) mark_jump_label (PATTERN (insn), insn, 0); and so doesn't increment LABEL_NUSES from references in DEBUG_INSNs. But, when we call emit_copy_of_insn_after e.g. when duplicating some DEBUG_INSNs, we call it even on those, which then results in LABEL_NUSES differences and -fcompare-debug failures. The following patch makes sure we don't call it on DEBUG_INSNs. Bootstrapped/regtested on powerpc64le-linux, ok for trunk? 2022-04-20 Jakub Jelinek PR debug/105203 * emit-rtl.cc (emit_copy_of_insn_after): Don't call mark_jump_label on DEBUG_INSNs. * gfortran.dg/g77/pr105203.f: New test. --- gcc/emit-rtl.cc.jj 2022-02-23 09:17:04.805125253 +0100 +++ gcc/emit-rtl.cc 2022-04-20 10:26:44.972198107 +0200 @@ -6440,7 +6440,8 @@ emit_copy_of_insn_after (rtx_insn *insn, } /* Update LABEL_NUSES. */ - mark_jump_label (PATTERN (new_rtx), new_rtx, 0); + if (NONDEBUG_INSN_P (insn)) +mark_jump_label (PATTERN (new_rtx), new_rtx, 0); INSN_LOCATION (new_rtx) = INSN_LOCATION (insn); --- gcc/testsuite/gfortran.dg/g77/pr105203.f.jj 2022-04-20 10:29:44.830696254 +0200 +++ gcc/testsuite/gfortran.dg/g77/pr105203.f2022-04-20 10:31:13.532463772 +0200 @@ -0,0 +1,20 @@ +C Test case for PR debug/105203 +C Origin: kmcca...@princeton.edu +C +C { dg-do compile } +C { dg-options "-O2 -fcompare-debug -ftracer -w" } +C { dg-additional-options "-fPIC" { target fpic } } + SUBROUTINE FOO (B) + + 10 CALL BAR (A) + ASSIGN 20 TO M + IF (100.LT.A) GOTO 10 + GOTO 40 +C + 20 IF (B.LT.ABS(A)) GOTO 10 + ASSIGN 30 TO M + GOTO 40 +C + 30 ASSIGN 10 TO M + 40 GOTO M,(10,20,30) + END Jakub
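The inconsistency described in the PR 105203 message, where counts kept up to date incrementally disagree with counts recomputed from scratch, can be reproduced in a toy model. The sketch below is not emit-rtl.cc: ToyInsn, count_from_scratch, copy_insn and counts_consistent are invented names, and an instruction is reduced to a debug flag plus at most one label reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: a debug/non-debug flag and one label index, or -1 if the
// instruction references no label.
struct ToyInsn {
  bool is_debug;
  int label;
};

// Stand-in for the from-scratch recount: debug instructions are skipped.
inline std::vector<int> count_from_scratch(const std::vector<ToyInsn> &insns,
                                           std::size_t n_labels) {
  std::vector<int> nuses(n_labels, 0);
  for (const ToyInsn &insn : insns)
    if (!insn.is_debug && insn.label >= 0) ++nuses[insn.label];
  return nuses;
}

// Stand-in for duplicating an instruction while updating the counts
// incrementally. skip_debug = false models the old behaviour, skip_debug =
// true models the behaviour after the patch.
inline void copy_insn(std::vector<ToyInsn> &insns, std::vector<int> &nuses,
                      std::size_t idx, bool skip_debug) {
  assert(idx < insns.size());
  const ToyInsn copy = insns[idx];  // copy first; push_back may reallocate
  insns.push_back(copy);
  if (copy.label >= 0 && !(skip_debug && copy.is_debug)) ++nuses[copy.label];
}

// Duplicate a debug instruction that references label 0, then check whether
// the incremental counts still agree with a full recount.
inline bool counts_consistent(bool skip_debug) {
  std::vector<ToyInsn> insns = {{false, 0}, {true, 0}};
  std::vector<int> nuses = count_from_scratch(insns, 1);
  copy_insn(insns, nuses, 1, skip_debug);
  return nuses == count_from_scratch(insns, 1);
}
```

In this model counts_consistent(false) is false, because the duplicated debug instruction bumps the use count that a recount would ignore, while counts_consistent(true) is true.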
Re: [PATCH] Add HAVE_DEBUGINFOD_SUPPORT to built-in features.
On Wed, Apr 20, 2022 at 01:30:09PM +0200, Martin Liška wrote: > The change adds debuginfod to ./perf -vv: > > ... > debuginfod: [ OFF ] # HAVE_DEBUGINFOD_SUPPORT > ... Thanks, applied. - Arnaldo > Signed-off-by: Martin Liska > --- > tools/perf/builtin-version.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c > index 9cd074a3d825..a71f491224da 100644 > --- a/tools/perf/builtin-version.c > +++ b/tools/perf/builtin-version.c > @@ -65,6 +65,7 @@ static void library_status(void) > #endif > STATUS(HAVE_SYSCALL_TABLE_SUPPORT, syscall_table); > STATUS(HAVE_LIBBFD_SUPPORT, libbfd); > + STATUS(HAVE_DEBUGINFOD_SUPPORT, debuginfod); > STATUS(HAVE_LIBELF_SUPPORT, libelf); > STATUS(HAVE_LIBNUMA_SUPPORT, libnuma); > STATUS(HAVE_LIBNUMA_SUPPORT, numa_num_possible_cpus); > -- > 2.35.3 -- - Arnaldo
Ping: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation.
Ping patch. | Date: Wed, 6 Apr 2022 14:21:26 -0400 | From: Michael Meissner | Subject: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation. -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
Ping: [PATCH] Replace UNSPEC with RTL code for extendditi2.
Ping patch. While this could be held for GCC 13, it would be nice to know whether to keep this patch (which was asked for in one of the previous patches) or discard it. | Date: Fri, 1 Apr 2022 12:59:28 -0400 | From: Michael Meissner | Subject: [PATCH] Replace UNSPEC with RTL code for extendditi2. | Message-ID: -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
Ping #2: [PATCH, V2] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
Ping #2 on this patch. | Date: Tue, 29 Mar 2022 23:25:31 -0400 | From: Michael Meissner | Subject: [PATCH, V2] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293. | Message-ID: -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
Re: [PATCH] fold, simplify-rtx: Punt on non-representable floating point constants [PR104522]
> On Apr 20, 2022, at 5:38 AM, Richard Biener > wrote: > > On Tue, Apr 19, 2022 at 11:36 PM Qing Zhao wrote: >> >> >> >>> On Apr 14, 2022, at 1:53 AM, Richard Biener >>> wrote: >>> >>> On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao wrote: Hi, Richard, Thanks a lot for taking a look at this issue (and Sorry that I haven’t fixed this one yet, I was distracted by other tasks then just forgot this one….) > On Apr 13, 2022, at 3:41 AM, Richard Biener > wrote: > > On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches > wrote: >> >> >> >>> On Feb 15, 2022, at 3:58 AM, Jakub Jelinek wrote: >>> >>> Hi! >>> >>> For IBM double double I've added in PR95450 and PR99648 verification >>> that >>> when we at the tree/GIMPLE or RTL level interpret target bytes as a >>> REAL_CST >>> or CONST_DOUBLE constant, we try to encode it back to target bytes and >>> verify it is the same. >>> This is because our real.c support isn't able to represent all valid >>> values >>> of IBM double double which has variable precision. >>> In PR104522, it has been noted that we have similar problem with the >>> Intel/Motorola extended XFmode formats, our internal representation >>> isn't >>> able to record pseudo denormals, pseudo infinities, pseudo NaNs and >>> unnormal >>> values. >>> So, the following patch is an attempt to extend that verification to all >>> floats. >>> Unfortunately, it wasn't that straightforward, because the >>> __builtin_clear_padding code exactly for the XFmode long doubles needs >>> to >>> discover what bits are padding and does that by interpreting memory of >>> all 1s. That is actually a valid supported value, a qNaN with negative >>> sign with all mantissa bits set, but the verification includes also the >>> padding bits (exactly what __builtin_clear_padding wants to figure out) >>> and so fails the comparison check and so we ICE. 
>>> The patch fixes that case by moving that verification from >>> native_interpret_real to its caller, so that clear_padding_type can >>> call native_interpret_real and avoid that extra check. >>> >>> With this, the only thing that regresses in the testsuite is >>> +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times >>> long\\t-16843010 5 >>> because it decides to use a pattern that has non-zero bits in the >>> padding >>> bits of the long double, so the simplify-rtx.cc change prevents folding >>> a SUBREG into a constant. We emit (the testcase is -O0 but we emit >>> worse >>> code at all opt levels) something like: >>> movabsq $-72340172838076674, %rax >>> movabsq $-72340172838076674, %rdx >>> movq%rax, -48(%rbp) >>> movq%rdx, -40(%rbp) >>> fldt-48(%rbp) >>> fstpt -32(%rbp) >>> instead of >>> fldt.LC2(%rip) >>> fstpt -32(%rbp) >>> ... >>> .LC2: >>> .long -16843010 >>> .long -16843010 >>> .long 65278 >>> .long 0 >>> Note, neither of those sequences actually stores the padding bits, fstpt >>> simply doesn't touch them. >>> For vars with clear_padding_real_needs_padding_p types that are >>> allocated >>> to memory at expansion time, I'd say much better would be to do the >>> stores >>> using integral modes rather than XFmode, so do that: >>> movabsq $-72340172838076674, %rax >>>movq%rax, -32(%rbp) >>>movq%rax, -24(%rbp) >>> directly. That is the only way to ensure the padding bits are >>> initialized >>> (or expand __builtin_clear_padding, but then you initialize separately >>> the >>> value bits and padding bits). >>> >>> Bootstrapped/regtested on x86_64-linux and i686-linux, though as >>> mentioned >>> above, the gcc.target/i386/auto-init-4.c case is unresolved. >> >> Thanks, I will try to fix this testing case in a later patch. > > I've looked at this FAIL now and really wonder whether "pattern init" as > implemented makes any sense for non-integral types. 
> We end up with > initializing a register (SSA name) with > > VIEW_CONVERT_EXPR(0xfefefefefefefefefefefefefefefefe) > > as we go building a TImode constant (we verified we have a TImode SET!) > but then > >/* Pun the LHS to make sure its type has constant size > unless it is an SSA name where that's already known. */ >if (TREE_CODE (lhs) != SSA_NAME) > lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs); >else > init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init); > ... >expand_assignment (lhs, init, false); > > and generally registers do not have any padding. This
Ping: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059
Ping patch. | Date: Tue, 12 Apr 2022 21:14:55 -0400 | From: Michael Meissner | Subject: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059 | Message-ID: I feel this is an important patch. Please look at it and approve the patch or give me feedback on how to change it. Note, I will be in today (April 20th) and tomorrow (April 21st), but I will be away from a computer on April 22-25 (Friday through Monday). -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
Re: [PATCH][v3] rtl-optimization/105231 - distribute_notes and REG_EH_REGION
Hi! This looks great :-) On Wed, Apr 20, 2022 at 03:52:33PM +0200, Richard Biener wrote: > The following mitigates a problem in combine distribute_notes which > places an original REG_EH_REGION based on only may_trap_p which is > good to test whether a non-call insn can possibly throw but not if > actually it does or we care. That's something we decided at RTL > expansion time where we possibly still know the insn evaluates > to a constant. > > In fact, the REG_EH_REGION note with lp > 0 can only come from the > original i3 and an assert is added to that effect. That means we only > need to retain the note on i3 or, if that cannot trap, drop it but we > should never move it to i2. > > For REG_EH_REGION corresponding to must-not-throw regions or > nothrow marking try_combine gets new code ensuring we can merge > and distribute notes which means placing must-not-throw notes > on all result insns, and dropping nothrow notes or preserve > them on i3 for calls. > * combine.cc (distribute_notes): Assert that a REG_EH_REGION > with landing pad > 0 is from i3 and only keep it there or drop > it if the insn can not trap. Throw away REG_EH_REGION with > landing pad = 0 or INT_MIN if it does not originate from a > call i3. Distribute must-not-throw REG_EH_REGION to all > resulting instructions. > (try_combine): Ensure that we can merge REG_EH_REGION notes. > --- a/gcc/combine.cc > +++ b/gcc/combine.cc > @@ -2951,6 +2951,45 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, > rtx_insn *i0, >return 0; > } > > + /* When i3 transfers to an EH handler we cannot combine if any of the > + sources are within a must-not-throw region. Else we can throw away > + any nothrow, pick a random must-not-throw region or preserve the EH > + transfer on i3. Since we want to preserve nothrow notes on calls > + we have to avoid combining from must-not-throw stmts there as well. > + This has to be kept in sync with distribute_note. 
*/ > + if (rtx i3_eh = find_reg_note (i3, REG_EH_REGION, NULL_RTX)) > +{ > + int i3_lp_nr = INTVAL (XEXP (i3_eh, 0)); > + if (i3_lp_nr > 0 > + || ((i3_lp_nr == 0 || i3_lp_nr == INT_MIN) && CALL_P (i3))) > + { > + rtx eh; > + int eh_lp; > + if (((eh = find_reg_note (i2, REG_EH_REGION, NULL_RTX)) > +&& (eh_lp = INTVAL (XEXP (eh, 0))) < 0 > +&& eh_lp != INT_MIN) > + || (i2 > + && (eh = find_reg_note (i2, REG_EH_REGION, NULL_RTX)) > + && (eh_lp = INTVAL (XEXP (eh, 0))) < 0 > + && eh_lp != INT_MIN) > + || (i1 > + && (eh = find_reg_note (i1, REG_EH_REGION, NULL_RTX)) > + && (eh_lp = INTVAL (XEXP (eh, 0))) < 0 > + && eh_lp != INT_MIN) > + || (i0 > + && (eh = find_reg_note (i0, REG_EH_REGION, NULL_RTX)) > + && (eh_lp = INTVAL (XEXP (eh, 0))) < 0 > + && eh_lp != INT_MIN)) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "Can't combine insn in must-not-throw " > + "EH region into i3 which can throws\n"); > + undo_all (); > + return 0; > + } > + } > +} The assignments in the conditionals make this hard to read, and harder to change, btw. A utility function wouldn't hurt? The problem of course would be thinking of a good name for it :-) > case REG_EH_REGION: > - /* These notes must remain with the call or trapping instruction. */ > - if (CALL_P (i3)) > - place = i3; > - else if (i2 && CALL_P (i2)) > - place = i2; > - else > - { > - gcc_assert (cfun->can_throw_non_call_exceptions); > - if (may_trap_p (i3)) > - place = i3; > - else if (i2 && may_trap_p (i2)) > - place = i2; > - /* ??? Otherwise assume we've combined things such that we > - can now prove that the instructions can't trap. Drop the > - note in this case. */ > - } > - break; > + { > + /* This handling needs to be kept in sync with the > +prerequesite checking in try_combine. */ (prerequisite) > + int lp_nr = INTVAL (XEXP (note, 0)); > + /* A REG_EH_REGION note transfering control can only ever come > +from i3 and it has to stay there. 
*/ > + if (lp_nr > 0) > + { > + gcc_assert (from_insn == i3); > + if (CALL_P (i3)) > + place = i3; > + else > + { > + gcc_assert (cfun->can_throw_non_call_exceptions); > + /* If i3 can still trap preserve the note, otherwise we've > +combined things such that we can now prove that the > +instructions can't trap. Drop the note in
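The utility function suggested in the review above could look roughly like the following sketch. It is self-contained and deliberately does not use GCC's own types: MockInsn, EhKind, classify_eh and any_in_must_not_throw are hypothetical names made up for illustration, and the mapping of note values to kinds is only what the quoted hunk and ChangeLog imply (> 0 is a landing pad, 0 or INT_MIN is a nothrow marking, any other negative value is a must-not-throw region).

```cpp
#include <cassert>
#include <climits>
#include <initializer_list>

// Hypothetical stand-in for an insn. has_note says whether a REG_EH_REGION
// note is attached; lp_nr is the landing pad number stored in that note.
struct MockInsn {
  bool has_note;
  int lp_nr;
};

enum class EhKind { None, Nothrow, MustNotThrow, LandingPad };

// Classify a note value the way the quoted patch reads it (assumption:
// this mirrors the conditions in the hunk, not an existing GCC helper).
inline EhKind classify_eh(const MockInsn *insn) {
  if (!insn || !insn->has_note)
    return EhKind::None;
  if (insn->lp_nr > 0)
    return EhKind::LandingPad;
  if (insn->lp_nr == 0 || insn->lp_nr == INT_MIN)
    return EhKind::Nothrow;
  return EhKind::MustNotThrow;  // negative, but not INT_MIN
}

// True if any of the given (possibly null) insns sits in a must-not-throw
// region. This replaces the four near-identical sub-conditions, which
// assign inside the conditional, with one loop.
inline bool any_in_must_not_throw(std::initializer_list<const MockInsn *> insns) {
  for (const MockInsn *insn : insns)
    if (classify_eh(insn) == EhKind::MustNotThrow)
      return true;
  return false;
}
```

With a helper of this shape the check in try_combine would reduce to a single call over the source insns, which keeps the null tests and the INT_MIN special case in one place.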
[PATCH][v3] rtl-optimization/105231 - distribute_notes and REG_EH_REGION
The following mitigates a problem in combine distribute_notes which places an original REG_EH_REGION based on only may_trap_p which is good to test whether a non-call insn can possibly throw but not if actually it does or we care. That's something we decided at RTL expansion time where we possibly still know the insn evaluates to a constant. In fact, the REG_EH_REGION note with lp > 0 can only come from the original i3 and an assert is added to that effect. That means we only need to retain the note on i3 or, if that cannot trap, drop it but we should never move it to i2. For REG_EH_REGION corresponding to must-not-throw regions or nothrow marking try_combine gets new code ensuring we can merge and distribute notes which means placing must-not-throw notes on all result insns, and dropping nothrow notes or preserve them on i3 for calls. Bootstrapped and tested on x86_64-unknown-linux-gnu. WDYT? Thanks, Richard. 2022-04-19 Richard Biener PR rtl-optimization/105231 * combine.cc (distribute_notes): Assert that a REG_EH_REGION with landing pad > 0 is from i3 and only keep it there or drop it if the insn can not trap. Throw away REG_EH_REGION with landing pad = 0 or INT_MIN if it does not originate from a call i3. Distribute must-not-throw REG_EH_REGION to all resulting instructions. (try_combine): Ensure that we can merge REG_EH_REGION notes. * gcc.dg/torture/pr105231.c: New testcase. --- gcc/combine.cc | 106 gcc/testsuite/gcc.dg/torture/pr105231.c | 15 2 files changed, 104 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr105231.c diff --git a/gcc/combine.cc b/gcc/combine.cc index 53dcac92abc..ba234e3af5f 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -2951,6 +2951,45 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, return 0; } + /* When i3 transfers to an EH handler we cannot combine if any of the + sources are within a must-not-throw region. 
Else we can throw away + any nothrow, pick a random must-not-throw region or preserve the EH + transfer on i3. Since we want to preserve nothrow notes on calls + we have to avoid combining from must-not-throw stmts there as well. + This has to be kept in sync with distribute_note. */ + if (rtx i3_eh = find_reg_note (i3, REG_EH_REGION, NULL_RTX)) +{ + int i3_lp_nr = INTVAL (XEXP (i3_eh, 0)); + if (i3_lp_nr > 0 + || ((i3_lp_nr == 0 || i3_lp_nr == INT_MIN) && CALL_P (i3))) + { + rtx eh; + int eh_lp; + if (((eh = find_reg_note (i2, REG_EH_REGION, NULL_RTX)) + && (eh_lp = INTVAL (XEXP (eh, 0))) < 0 + && eh_lp != INT_MIN) + || (i2 + && (eh = find_reg_note (i2, REG_EH_REGION, NULL_RTX)) + && (eh_lp = INTVAL (XEXP (eh, 0))) < 0 + && eh_lp != INT_MIN) + || (i1 + && (eh = find_reg_note (i1, REG_EH_REGION, NULL_RTX)) + && (eh_lp = INTVAL (XEXP (eh, 0))) < 0 + && eh_lp != INT_MIN) + || (i0 + && (eh = find_reg_note (i0, REG_EH_REGION, NULL_RTX)) + && (eh_lp = INTVAL (XEXP (eh, 0))) < 0 + && eh_lp != INT_MIN)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Can't combine insn in must-not-throw " +"EH region into i3 which can throws\n"); + undo_all (); + return 0; + } + } +} + /* Record whether i2 and i3 are trivial moves. */ i2_was_move = is_just_move (i2); i3_was_move = is_just_move (i3); @@ -14175,23 +14214,56 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2, break; case REG_EH_REGION: - /* These notes must remain with the call or trapping instruction. */ - if (CALL_P (i3)) - place = i3; - else if (i2 && CALL_P (i2)) - place = i2; - else - { - gcc_assert (cfun->can_throw_non_call_exceptions); - if (may_trap_p (i3)) - place = i3; - else if (i2 && may_trap_p (i2)) - place = i2; - /* ??? Otherwise assume we've combined things such that we -can now prove that the instructions can't trap. Drop the -note in this case. 
*/ - } - break; + { + /* This handling needs to be kept in sync with the + prerequesite checking in try_combine. */ + int lp_nr = INTVAL (XEXP (note, 0)); + /* A REG_EH_REGION note transfering control can only ever come + from i3 and it has to stay there. */ + if (lp_nr > 0) + { + gcc_assert (from_insn
[PATCH] openmp: Handle unified address memory.
This patch adds enough support for "requires unified_address" to make the sollve_vv testcases pass. It implements unified_address as a synonym of unified_shared_memory, which is both valid and the only way I know of to unify addresses with CUDA (I could be wrong). This patch should be applied on top of the previous patch set for USM. OK for stage 1? I'll apply it to OG11 shortly. Andrew openmp: unified_address support This makes "requires unified_address" work by making it equivalent to "requires unified_shared_memory". This is more than is strictly necessary, but should be standard-compliant. gcc/c/ChangeLog: * c-parser.c (c_parser_omp_requires): Check requires unified_address for conflict with -foffload-memory=shared. gcc/cp/ChangeLog: * parser.c (cp_parser_omp_requires): Check requires unified_address for conflict with -foffload-memory=shared. gcc/fortran/ChangeLog: * openmp.c (gfc_match_omp_requires): Check requires unified_address for conflict with -foffload-memory=shared. gcc/ChangeLog: * omp-low.c: Do USM transformations for "unified_address". gcc/testsuite/ChangeLog: * c-c++-common/gomp/usm-4.c: New test. * gfortran.dg/gomp/usm-4.f90: New test. 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 12408770193..9a3d0cb8cea 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -22531,18 +22531,27 @@ c_parser_omp_requires (c_parser *parser) enum omp_requires this_req = (enum omp_requires) 0; if (!strcmp (p, "unified_address")) - this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + { + this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "unified_address is incompatible with the " + "selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "unified_shared_memory")) - { - this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; - - if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED - && flag_offload_memory != OFFLOAD_MEMORY_NONE) - error_at (cloc, - "unified_shared_memory is incompatible with the " - "selected -foffload-memory option"); - flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; - } + { + this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "unified_shared_memory is incompatible with the " + "selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators")) this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS; else if (!strcmp (p, "reverse_offload")) diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index fd9f62f4543..3a9ea272f10 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -46406,18 +46406,27 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok) enum omp_requires this_req = (enum omp_requires) 0; if (!strcmp (p, "unified_address")) - this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + { + this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "unified_address is incompatible 
with the " + "selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "unified_shared_memory")) - { - this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; - - if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED - && flag_offload_memory != OFFLOAD_MEMORY_NONE) - error_at (cloc, - "unified_shared_memory is incompatible with the " - "selected -foffload-memory option"); - flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; - } + { + this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "unified_shared_memory is incompatible with the " + "selected -foffload-memory option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators"))
[Patch] OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg
For omp parallel shared(array_desc_var), the shared variable is passed to the generated function as an argument and replaced by a DECL_VALUE_EXPR inside the parallel region. If an omp target data has_device_addr(array_descr_var) is used inside the parallel region, the latter generates omp_arr->array_descr_var = _descr_var.data; ... tmp_desc = array_descr_var tmp_desc.data = omp_o->array_descr_var that is: 'tmp_desc' gets assigned the original descriptor and only the data component is updated. However, inside the parallel region it is not 'array_descr_var' that has to be used but its value expression ('omp_i->array_descr_var'). Fixed by searching for the variable used in use_device_{addr,ptr} in the outer OpenMP context and then checking for a DECL_VALUE_EXPR. OK? Tobias OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg For array-descriptor vars, the descriptor is assigned to a temporary. However, this failed when the clause's argument was in turn in a data-sharing clause as the outer context's VALUE_EXPR wasn't used. gcc/ChangeLog: * omp-low.cc (lower_omp_target): Fix use_device_{addr,ptr} with list item that is in an outer data-sharing clause. libgomp/ChangeLog: * testsuite/libgomp.fortran/use_device_addr-5.f90: New test. 
gcc/omp-low.cc | 22 ++-- .../libgomp.fortran/use_device_addr-5.f90 | 143 + 2 files changed, 156 insertions(+), 9 deletions(-) diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index bf5779b6543..6e387fd9a61 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -13656,26 +13656,30 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) new_var = lookup_decl (var, ctx); new_var = DECL_VALUE_EXPR (new_var); tree v = new_var; + tree v2 = var; + if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_PTR + || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_ADDR) + { + v2 = maybe_lookup_decl_in_outer_ctx (var, ctx); + if (DECL_HAS_VALUE_EXPR_P (v2)) + v2 = DECL_VALUE_EXPR (v2); + } if (is_ref) { - var = build_fold_indirect_ref (var); - gimplify_expr (, _body, NULL, is_gimple_val, - fb_rvalue); - v = create_tmp_var_raw (TREE_TYPE (var), get_name (var)); + v2 = build_fold_indirect_ref (v2); + v = create_tmp_var_raw (TREE_TYPE (v2), get_name (var)); gimple_add_tmp_var (v); TREE_ADDRESSABLE (v) = 1; - gimple_seq_add_stmt (_body, - gimple_build_assign (v, var)); + gimplify_assign (v, v2, _body); tree rhs = build_fold_addr_expr (v); gimple_seq_add_stmt (_body, gimple_build_assign (new_var, rhs)); } else - gimple_seq_add_stmt (_body, - gimple_build_assign (new_var, var)); + gimplify_assign (new_var, v2, _body); - tree v2 = lang_hooks.decls.omp_array_data (unshare_expr (v), false); + v2 = lang_hooks.decls.omp_array_data (unshare_expr (v), false); gcc_assert (v2); gimplify_expr (, _body, NULL, is_gimple_val, fb_rvalue); gimple_seq_add_stmt (_body, diff --git a/libgomp/testsuite/libgomp.fortran/use_device_addr-5.f90 b/libgomp/testsuite/libgomp.fortran/use_device_addr-5.f90 new file mode 100644 index 000..1def70a1bc0 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/use_device_addr-5.f90 @@ -0,0 +1,143 @@ +program main + use omp_lib + implicit none + integer, allocatable :: aaa(:,:,:) + integer :: i + + allocate (aaa(-4:10,-3:8,2)) + aaa(:,:,:) = reshape ([(i, i = 1, size(aaa))], 
shape(aaa)) + + do i = 0, omp_get_num_devices() +!$omp target data map(to: aaa) + call test_addr (aaa, i) + call test_ptr (aaa, i) +!$omp end target data + end do + deallocate (aaa) + +contains + + subroutine test_addr (, dev) +use iso_c_binding +integer, target, allocatable :: (:,:,:), (:,:,:) +integer, value :: dev +integer :: i +type(c_ptr) :: ptr +logical :: is_shared + +is_shared = .false. +!$omp target device(dev) map(to: is_shared) + is_shared = .true. +!$omp end target + +allocate ((-4:10,-3:8,2)) +(:,:,:) = reshape ([(-i, i = 1, size())], shape()) +!$omp target enter data map(to: ) device(dev) +if (any (lbound () /= [-4, -3, 1])) error stop 1 +if (any (shape () /= [15, 12, 2])) error stop 2 +if (any (lbound () /= [-4, -3, 1])) error stop 3 +if (any (shape () /= [15, 12, 2])) error stop 4 +if (any ( /= -)) error stop 5 +if (any ( /= reshape ([(i, i = 1, size())], shape( & + error stop 6 + +!$omp parallel do shared(, ) +do i = 1,1 + if (any (lbound () /= [-4, -3, 1])) error stop 5 +
Re: [PATCH] libstdc++: Use LTLIBICONV when linking libstdc++.so [PR93602]
Pushed to trunk now. On Wed, 13 Apr 2022 at 15:24, Jonathan Wakely via Libstdc++ wrote: > > Tested x86_64-linux, without libiconv installed, with libiconv installed, > with libiconv installed but using an in-tree libiconv, with libiconv.a > installed and using --with-libiconv-type=static, and with libiconv.so > installed and using --without-libiconv-prefix (which still fails). > > I'm not entirely happy about the fact that libtool's LTLIBICONV adds an > rpath to libstdc++.so, but that can be avoided (as documented by this > patch) and I don't really see a better solution. Another option would be > to use -l:libiconv.a if configure defines LTLIBICONV to non-empty and > the linker supports it, which would *force* the use of a static lib. But > that seems unnecessarily hostile; not all users will dislike the rpath > solution. The proposed patch makes it Just Work™ for users who (for > whatever reason) have installed libiconv, while also allowing them to do > something more sensible if they care enough to do so. > > Thoughts? > > -- >8 -- > > This fixes missing libiconv symbols when libstdc++ is built on a system > that has libiconv installed. If the libiconv headers are found then > libstdc++ depends on libiconv_open etc instead of libc's iconv_open. But > without this fix libstdc++ is not linked to the libiconv library that > provides the definitions of those symbols. > > As discussed in PR 93602 this changed means that libstdc++.so.6 might > have an rpath pointing to the location of the libiconv.so library. If > that is not desired, then GCC must be configured to link to a static > libiconv.a instead, using either --with-libiconv-type=static or an > in-tree build of libiconv. > > libstdc++-v3/ChangeLog: > > PR libstdc++/93602 > * doc/xml/manual/prerequisites.xml: Document libiconv > workarounds. > * doc/html/manual/setup.html: Regenerate. > * src/Makefile.am (CXXLINK): Add $(LTLIBICONV). > * src/Makefile.in: Regenerate. 
> --- > diff --git a/libstdc++-v3/doc/xml/manual/prerequisites.xml > b/libstdc++-v3/doc/xml/manual/prerequisites.xml > index 22e90a7e79d..8799487c821 100644 > --- a/libstdc++-v3/doc/xml/manual/prerequisites.xml > +++ b/libstdc++-v3/doc/xml/manual/prerequisites.xml > @@ -48,6 +48,56 @@ > > linux > > + > + > + The 'gnu' locale model makes use of iconv > + for character set conversions. The relevant functions are provided > + by Glibc and so are always available, however they can also be > + provided by the separate GNU libiconv library. If GNU libiconv is > + found when GCC is built (e.g., because its headers are installed > + in /usr/local/include) > + then the libstdc++.so.6 library will have a > + run-time dependency on libiconv.so.2. > + If you do not want that run-time dependency then you should do > + one of the following: > + > + > + > + > + Uninstall the libiconv headers before building GCC. > + Glibc already provides iconv so you should > + not need libiconv anyway. > + > + > + > + > +linkend="https://www.gnu.org/software/libiconv/#downloading;> > + Download the libiconv sources and extract them into the > + top level of the GCC source tree, e.g., > + > + > +wget https://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.16.tar.gz > +tar xf libiconv-1.16.tar.gz > +ln -s libiconv-1.16 libiconv > + > + > + This will build libiconv as part of building GCC and link to > + it statically, so there is no libiconv.so.2 > + dependency. > + > + > + > + > + Configure GCC with --with-libiconv-type=static. > + This requires the static libiconv.a > library, > + which is not installed by default. You might need to reinstall > + libiconv using the --enable-static configure > + option to get the static library. 
> + > + > + > + > + > > > If GCC 3.1.0 or later on is being used on GNU/Linux, an attempt > diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am > index 18f57632c3d..9c3f4aca655 100644 > --- a/libstdc++-v3/src/Makefile.am > +++ b/libstdc++-v3/src/Makefile.am > @@ -278,7 +278,9 @@ CXXLINK = \ > $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \ > --mode=link $(CXX) \ > $(VTV_CXXLINKFLAGS) \ > - $(OPT_LDFLAGS) $(SECTION_LDFLAGS) $(AM_CXXFLAGS) $(LTLDFLAGS) -o $@ > + $(OPT_LDFLAGS) $(SECTION_LDFLAGS) $(AM_CXXFLAGS) \ > + $(LTLDFLAGS) $(LTLIBICONV) \ > + -o $@ > > # Symbol versioning for shared libraries. > if ENABLE_SYMVERS >
[committed] libstdc++: Fix macro checked by test
Tested x86_64-linux, pushed to trunk. -- >8 -- The macro being tested here is wrong, but just happens to have the same value as the one that is supposed to be tested. libstdc++-v3/ChangeLog: * testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc: Check correct feature test macro. --- .../basic_string_view/operations/copy/char/constexpr.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc b/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc index 28f8ae845c2..2705098fb76 100644 --- a/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc +++ b/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc @@ -22,7 +22,7 @@ #ifndef __cpp_lib_constexpr_string_view # error "Feature test macro for constexpr copy is missing in " -#elif __cpp_lib_constexpr_iterator < 201811L +#elif __cpp_lib_constexpr_string_view < 201811L # error "Feature test macro for constexpr copy has wrong value in " #endif -- 2.34.1
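
For readers who want to see the corrected check outside the testsuite, here is a minimal sketch of gating code on the macro that actually describes the feature in use. SKETCH_CONSTEXPR and copy_prefix are hypothetical names invented for this example; only __cpp_lib_constexpr_string_view and the 201811L value come from the patch. When the macro is absent (for example before C++20) the sketch falls back to a plain inline function instead of failing.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Test the macro for the feature being used (constexpr string_view
// operations), not an unrelated macro that happens to share its value.
#if defined(__cpp_lib_constexpr_string_view) \
    && __cpp_lib_constexpr_string_view >= 201811L
# define SKETCH_CONSTEXPR constexpr
# define SKETCH_HAS_CONSTEXPR_COPY 1
#else
# define SKETCH_CONSTEXPR inline
# define SKETCH_HAS_CONSTEXPR_COPY 0
#endif

// Copy at most three characters of sv into out, NUL-terminate the result,
// and return the number of characters copied.
SKETCH_CONSTEXPR std::size_t copy_prefix(std::string_view sv, char (&out)[4]) {
  std::size_t n = sv.copy(out, 3);
  out[n] = '\0';
  return n;
}

#if SKETCH_HAS_CONSTEXPR_COPY
// Only attempted when the library advertises constexpr copy.
constexpr bool copy_works_at_compile_time() {
  char buf[4] = {};
  return copy_prefix("hello", buf) == 3 && buf[0] == 'h' && buf[2] == 'l';
}
static_assert(copy_works_at_compile_time(), "constexpr copy should work");
#endif
```

The runtime path is the same function either way, so callers do not need their own conditional code.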
Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]
> On Wed, Apr 20, 2022 at 01:47:43PM +0200, Martin Jambor wrote: > > Hi, > > > > On Wed, Apr 20 2022, Jan Hubicka via Gcc-patches wrote: > > >> On Wed, 20 Apr 2022, Jakub Jelinek wrote: > > > > [...] > > > > >> > > > >> >if ((flag_openacc || flag_openmp) > > >> >&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES > > >> > (decl))) > > >> > --- gcc/cgraphclones.cc.jj 2022-01-18 11:58:58.948991114 +0100 > > >> > +++ gcc/cgraphclones.cc 2022-04-19 13:38:43.594262397 +0200 > > >> > @@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl > > >> >new_node->versionable = versionable; > > >> >new_node->can_change_signature = can_change_signature; > > >> >new_node->redefined_extern_inline = redefined_extern_inline; > > >> > + new_node->semantic_interposition = semantic_interposition; > > > > > > This indeed makes sense to me. > > > > but that means that create_clone (and therefore also > > create_virtual_clone) now creates nodes which are both local and > > potentially interposable... is that what we want? (Does the local flag > > make the interposition flag meaningless in that case?) > > Usually set_new_clone_decl_and_node_flags is called afterwards and that > makes both the decl local and clears node->semantic_interposition. > The above is just for the case when that isn't done. We also simply ignore the semantic_interposition flag on everything local. But indeed perhaps for consistency purposes we should force it to false whenever externally_visible is false. But more sanity checkers only in stage1 :) Honza > > Jakub >
Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]
On Wed, Apr 20, 2022 at 01:47:43PM +0200, Martin Jambor wrote: > Hi, > > On Wed, Apr 20 2022, Jan Hubicka via Gcc-patches wrote: > >> On Wed, 20 Apr 2022, Jakub Jelinek wrote: > > [...] > > >> > > >> >if ((flag_openacc || flag_openmp) > >> >&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES > >> > (decl))) > >> > --- gcc/cgraphclones.cc.jj 2022-01-18 11:58:58.948991114 +0100 > >> > +++ gcc/cgraphclones.cc 2022-04-19 13:38:43.594262397 +0200 > >> > @@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl > >> >new_node->versionable = versionable; > >> >new_node->can_change_signature = can_change_signature; > >> >new_node->redefined_extern_inline = redefined_extern_inline; > >> > + new_node->semantic_interposition = semantic_interposition; > > > > This indeed makes sense to me. > > but that means that create_clone (and therefore also > create_virtual_clone) now creates nodes which are both local and > potentially interposable... is that what we want? (Does the local flag > make the interposition flag meaningless in that case?) Usually set_new_clone_decl_and_node_flags is called afterwards and that makes both the decl local and clears node->semantic_interposition. The above is just for the case when that isn't done. Jakub
Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]
Hi, On Wed, Apr 20 2022, Jan Hubicka via Gcc-patches wrote: >> On Wed, 20 Apr 2022, Jakub Jelinek wrote: [...] >> > >> >if ((flag_openacc || flag_openmp) >> >&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl))) >> > --- gcc/cgraphclones.cc.jj 2022-01-18 11:58:58.948991114 +0100 >> > +++ gcc/cgraphclones.cc 2022-04-19 13:38:43.594262397 +0200 >> > @@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl >> >new_node->versionable = versionable; >> >new_node->can_change_signature = can_change_signature; >> >new_node->redefined_extern_inline = redefined_extern_inline; >> > + new_node->semantic_interposition = semantic_interposition; > > This indeed makes sense to me. but that means that create_clone (and therefore also create_virtual_clone) now creates nodes which are both local and potentially interposable... is that what we want? (Does the local flag make the interposition flag meaningless in that case?) Martin
Re: [PATCH] Add HAVE_DEBUGINFOD_SUPPORT to built-in features.
On 4/20/22 13:30, Martin Liška wrote: The change adds debuginfod to ./perf -vv: ... debuginfod: [ OFF ] # HAVE_DEBUGINFOD_SUPPORT ... Signed-off-by: Martin Liska --- tools/perf/builtin-version.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c index 9cd074a3d825..a71f491224da 100644 --- a/tools/perf/builtin-version.c +++ b/tools/perf/builtin-version.c @@ -65,6 +65,7 @@ static void library_status(void) #endif STATUS(HAVE_SYSCALL_TABLE_SUPPORT, syscall_table); STATUS(HAVE_LIBBFD_SUPPORT, libbfd); + STATUS(HAVE_DEBUGINFOD_SUPPORT, debuginfod); STATUS(HAVE_LIBELF_SUPPORT, libelf); STATUS(HAVE_LIBNUMA_SUPPORT, libnuma); STATUS(HAVE_LIBNUMA_SUPPORT, numa_num_possible_cpus); Please ignore the thread, it belongs to perf ML ;) Martin
[PATCH] Add HAVE_DEBUGINFOD_SUPPORT to built-in features.
The change adds debuginfod to ./perf -vv: ... debuginfod: [ OFF ] # HAVE_DEBUGINFOD_SUPPORT ... Signed-off-by: Martin Liska --- tools/perf/builtin-version.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c index 9cd074a3d825..a71f491224da 100644 --- a/tools/perf/builtin-version.c +++ b/tools/perf/builtin-version.c @@ -65,6 +65,7 @@ static void library_status(void) #endif STATUS(HAVE_SYSCALL_TABLE_SUPPORT, syscall_table); STATUS(HAVE_LIBBFD_SUPPORT, libbfd); + STATUS(HAVE_DEBUGINFOD_SUPPORT, debuginfod); STATUS(HAVE_LIBELF_SUPPORT, libelf); STATUS(HAVE_LIBNUMA_SUPPORT, libnuma); STATUS(HAVE_LIBNUMA_SUPPORT, numa_num_possible_cpus); -- 2.35.3
Re: [PATCH] fold, simplify-rtx: Punt on non-representable floating point constants [PR104522]
On Tue, Apr 19, 2022 at 11:36 PM Qing Zhao wrote: > > > > > On Apr 14, 2022, at 1:53 AM, Richard Biener > > wrote: > > > > On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao wrote: > >> > >> Hi, Richard, > >> > >> Thanks a lot for taking a look at this issue (and Sorry that I haven’t > >> fixed this one yet, I was distracted by other tasks then just forgot this > >> one….) > >> > >>> On Apr 13, 2022, at 3:41 AM, Richard Biener > >>> wrote: > >>> > >>> On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches > >>> wrote: > > > > > On Feb 15, 2022, at 3:58 AM, Jakub Jelinek wrote: > > > > Hi! > > > > For IBM double double I've added in PR95450 and PR99648 verification > > that > > when we at the tree/GIMPLE or RTL level interpret target bytes as a > > REAL_CST > > or CONST_DOUBLE constant, we try to encode it back to target bytes and > > verify it is the same. > > This is because our real.c support isn't able to represent all valid > > values > > of IBM double double which has variable precision. > > In PR104522, it has been noted that we have similar problem with the > > Intel/Motorola extended XFmode formats, our internal representation > > isn't > > able to record pseudo denormals, pseudo infinities, pseudo NaNs and > > unnormal > > values. > > So, the following patch is an attempt to extend that verification to all > > floats. > > Unfortunately, it wasn't that straightforward, because the > > __builtin_clear_padding code exactly for the XFmode long doubles needs > > to > > discover what bits are padding and does that by interpreting memory of > > all 1s. That is actually a valid supported value, a qNaN with negative > > sign with all mantissa bits set, but the verification includes also the > > padding bits (exactly what __builtin_clear_padding wants to figure out) > > and so fails the comparison check and so we ICE. 
> > The patch fixes that case by moving that verification from > > native_interpret_real to its caller, so that clear_padding_type can > > call native_interpret_real and avoid that extra check. > > > > With this, the only thing that regresses in the testsuite is > > +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times > > long\\t-16843010 5 > > because it decides to use a pattern that has non-zero bits in the > > padding > > bits of the long double, so the simplify-rtx.cc change prevents folding > > a SUBREG into a constant. We emit (the testcase is -O0 but we emit > > worse > > code at all opt levels) something like: > > movabsq $-72340172838076674, %rax > > movabsq $-72340172838076674, %rdx > > movq%rax, -48(%rbp) > > movq%rdx, -40(%rbp) > > fldt-48(%rbp) > > fstpt -32(%rbp) > > instead of > > fldt.LC2(%rip) > > fstpt -32(%rbp) > > ... > > .LC2: > > .long -16843010 > > .long -16843010 > > .long 65278 > > .long 0 > > Note, neither of those sequences actually stores the padding bits, fstpt > > simply doesn't touch them. > > For vars with clear_padding_real_needs_padding_p types that are > > allocated > > to memory at expansion time, I'd say much better would be to do the > > stores > > using integral modes rather than XFmode, so do that: > > movabsq $-72340172838076674, %rax > > movq%rax, -32(%rbp) > > movq%rax, -24(%rbp) > > directly. That is the only way to ensure the padding bits are > > initialized > > (or expand __builtin_clear_padding, but then you initialize separately > > the > > value bits and padding bits). > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, though as > > mentioned > > above, the gcc.target/i386/auto-init-4.c case is unresolved. > > Thanks, I will try to fix this testing case in a later patch. > >>> > >>> I've looked at this FAIL now and really wonder whether "pattern init" as > >>> implemented makes any sense for non-integral types. 
> >>> We end up with > >>> initializing a register (SSA name) with > >>> > >>> VIEW_CONVERT_EXPR(0xfefefefefefefefefefefefefefefefe) > >>> > >>> as we go building a TImode constant (we verified we have a TImode SET!) > >>> but then > >>> > >>> /* Pun the LHS to make sure its type has constant size > >>>unless it is an SSA name where that's already known. */ > >>> if (TREE_CODE (lhs) != SSA_NAME) > >>> lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs); > >>> else > >>> init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init); > >>> ... > >>> expand_assignment (lhs, init, false); > >>> > >>> and generally registers do not have any padding. This weird expansion > >>> then causes us to spill the TImode constant
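For readers checking the decimal constants quoted above: they are all the repeated byte 0xFE seen through different integer widths. The following is only a minimal sketch to confirm that arithmetic; `fill_with_fe` and `check_pattern_constants` are hypothetical helper names, not GCC functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of GCC): build an integer of type T whose
// bytes are all 0xFE, the byte value the quoted assembly constants suggest.
template <typename T>
T fill_with_fe ()
{
  T value;
  std::memset (&value, 0xFE, sizeof value);
  return value;
}

void check_pattern_constants ()
{
  // ".long -16843010" is 0xFEFEFEFE read as a signed 32-bit integer.
  assert (fill_with_fe<std::int32_t> () == -16843010);
  // "movabsq $-72340172838076674" is 0xFEFEFEFEFEFEFEFE as signed 64-bit.
  assert (fill_with_fe<std::int64_t> () == -72340172838076674LL);
  // ".long 65278" is 0xFEFE: only two bytes of that word carry the pattern.
  assert (fill_with_fe<std::uint16_t> () == 65278);
}
```

The 16-bit case matches the `.LC2` listing, where the third word holds 0xFEFE and the fourth word is zero, while the integer-mode stores write 0xFE into every byte.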
Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]
> The cgraph.cc change was what I actually needed for the fix, the
> cgraphclones.cc was only because I've noticed that it constructs a new
> node (so is initialized to whatever random flag_semantic_interposition is
> right now) and initializing it to what it is cloned from made more sense.

OK, thanks. It is only needed for nodes with the definition flag and public linkage, so there should be no need to copy it in cgraph clones, and there are other places that create new nodes (late functions etc.). I will move the logic to the visibility pass and to add_new_function and also kill the constructor. I originally intended to set it at construction time but forgot to think of the frontends changing opt_for_fn later from the optimization attribute.

This also makes me wonder if the C++ FE updates the implicit aliases once they have been created...

Honza
>
> 	Jakub
Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]
On Wed, Apr 20, 2022 at 11:06:12AM +0200, Jan Hubicka wrote:
> > On Wed, Apr 20, 2022 at 10:45:53AM +0200, Jan Hubicka wrote:
> > > So this change should be unnecessary unless there are nodes that are
> > > missing finalization stage. It also is not good enough since frontends
> > > may change opt_for_fn between node creation and finalization of
> > > compilation unit (so even after cgraph_finalize unfortunately, we had
> > > another bug about that).
> > >
> > > The PR was about implicit C++ alias. So the problem is that aliases
> > > bypass finalization because they are produced by
> > > cgraph_node::create_alias that sets definition flag to true.
> >
> > Note, I've already committed the patch as Richi acked it.
> > So, can we move that
> >   node->semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);
> > from cgraph_node::create to cgraph_node::create_alias?
>
> I think it would be easiest to move it to the visibility pass
> (after all it is about visibilities and all earlier uses of the flag
> are wrong since frontend is changing it at any time until unit is fully
> built). I will prepare patch tonight or tomorrow.
>
> Also thinking about the copying in cgraph_clone, it would make sense
> only if we produce clones with public linkage. Do we ever do that?

The cgraph.cc change was what I actually needed for the fix; the cgraphclones.cc change was only because I noticed that it constructs a new node (so it is initialized to whatever random value flag_semantic_interposition has right now), and initializing it to what it is cloned from made more sense.

	Jakub
Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]
> On Wed, Apr 20, 2022 at 10:45:53AM +0200, Jan Hubicka wrote:
> > So this change should be unnecessary unless there are nodes that are
> > missing finalization stage. It also is not good enough since frontends
> > may change opt_for_fn between node creation and finalization of
> > compilation unit (so even after cgraph_finalize unfortunately, we had
> > another bug about that).
> >
> > The PR was about implicit C++ alias. So the problem is that aliases
> > bypass finalization because they are produced by
> > cgraph_node::create_alias that sets definition flag to true.
>
> Note, I've already committed the patch as Richi acked it.
> So, can we move that
>   node->semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);
> from cgraph_node::create to cgraph_node::create_alias?

I think it would be easiest to move it to the visibility pass (after all, it is about visibilities, and all earlier uses of the flag are wrong since the frontend may change it at any time until the unit is fully built). I will prepare a patch tonight or tomorrow.

Also, thinking about the copying in cgraph_clone, it would make sense only if we produce clones with public linkage. Do we ever do that?

Honza

> > I guess it would be most consistent to give up on having the flag up to
> > date during cgraph construction (i.e. from finalization time down) and
> > compute it during the cgraph_finalize_compilation_unit. I will look into
> > that.
>
> 	Jakub
Re: [PATCH] gcov-profile: Allow negavive counts of indirect calls [PR105282]
On 4/20/22 10:55, Jan Hubicka via Gcc-patches wrote:
> I think we can just drop the sanity check completely. In general the
> profile data may be corrupted and each use of it should be guarded to
> not explode in such a situation.

Makes sense to me. I'm going to do it once stage1 opens.

Cheers,
Martin
Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]
On Wed, Apr 20, 2022 at 10:45:53AM +0200, Jan Hubicka wrote:
> So this change should be unnecessary unless there are nodes that are
> missing finalization stage. It also is not good enough since frontends
> may change opt_for_fn between node creation and finalization of
> compilation unit (so even after cgraph_finalize unfortunately, we had
> another bug about that).
>
> The PR was about implicit C++ alias. So the problem is that aliases
> bypass finalization because they are produced by
> cgraph_node::create_alias that sets definition flag to true.

Note, I've already committed the patch as Richi acked it.
So, can we move that
  node->semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);
from cgraph_node::create to cgraph_node::create_alias?

> I guess it would be most consistent to give up on having the flag up to
> date during cgraph construction (i.e. from finalization time down) and
> compute it during the cgraph_finalize_compilation_unit. I will look into
> that.

	Jakub
Re: [PATCH] gcov-profile: Allow negavive counts of indirect calls [PR105282]
> From: Sergei Trofimovich
>
> TOPN metrics are histograms that contain overall count and per-bucket
> count. Overall count can be negative when two profiles merge and some
> of per-bucket metrics are dropped.
>
> Noticed as an ICE on python PGO build where gcc crashes as:
>
>     during IPA pass: modref
>     a.c:36:1: ICE: in stream_out_histogram_value, at value-prof.cc:340
>        36 | }
>           | ^
>     stream_out_histogram_value(output_block*, histogram_value_t*)
>         gcc/value-prof.cc:340
>
> gcc/ChangeLog:
>
> 	PR gcov-profile/105282
> 	* value-prof.cc (stream_out_histogram_value): Allow negative counts
> 	on HIST_TYPE_INDIR_CALL.
> ---
>  gcc/value-prof.cc | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/value-prof.cc b/gcc/value-prof.cc
> index 9785c7a03ea..4927d119aa0 100644
> --- a/gcc/value-prof.cc
> +++ b/gcc/value-prof.cc
> @@ -319,40 +319,44 @@ stream_out_histogram_value (struct output_block *ob, histogram_value hist)
>    streamer_write_bitpack (&bp);
>    switch (hist->type)
>      {
>      case HIST_TYPE_INTERVAL:
>        streamer_write_hwi (ob, hist->hdata.intvl.int_start);
>        streamer_write_uhwi (ob, hist->hdata.intvl.steps);
>        break;
>      default:
>        break;
>      }
>    for (i = 0; i < hist->n_counters; i++)
>      {
>        /* When user uses an unsigned type with a big value, constant converted
>           to gcov_type (a signed type) can be negative. */
>        gcov_type value = hist->hvalue.counters[i];
>        if (hist->type == HIST_TYPE_TOPN_VALUES
>            || hist->type == HIST_TYPE_IOR)
>          /* Note that the IOR counter tracks pointer values and these can have
>             sign bit set. */
>          ;
> +      else if (hist->type == HIST_TYPE_INDIR_CALL && i == 0)
> +        /* 'all' counter overflow is stored as a negative value. Individual
> +           counters and values are expected to be non-negative. */
> +        ;

I think we can just drop the sanity check completely. In general the profile data may be corrupted, and each use of it should be guarded so that it does not explode in such a situation.

I added the check here a long time ago while implementing the early version of the profile streaming patch. At that time some bugs were causing counts to be negative due to weird overflows in the logic normalizing profiles from different object files to the same number of executions.

Honza

>        else
>          gcc_assert (value >= 0);
>
>        streamer_write_gcov_count (ob, value);
>      }
>    if (hist->hvalue.next)
>      stream_out_histogram_value (ob, hist->hvalue.next);
>  }
>
>  /* Dump information about HIST to DUMP_FILE. */
>
>  void
>  stream_in_histogram_value (class lto_input_block *ib, gimple *stmt)
>  {
>    enum hist_type type;
>    unsigned int ncounters = 0;
>    struct bitpack_d bp;
>    unsigned int i;
>    histogram_value new_val;
>    bool next;
>
> --
> 2.35.1
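As a side note on the comment in the quoted hunk (an unsigned value converted to the signed gcov_type can be negative) and on the suggestion to guard each use instead of asserting, here is a minimal sketch. It assumes a 64-bit signed counter; `counter_t`, `sanitized_count` and `check_counter_conversion` are hypothetical names, and clamping to zero is just one possible guard, not what GCC does.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for gcov_type, which the quoted comment describes as signed.
// Assumption: a 64-bit signed integer; the real typedef is not shown here.
using counter_t = std::int64_t;

// Hypothetical consumer-side guard: tolerate a corrupted (negative) counter
// by clamping it instead of asserting while streaming.
counter_t sanitized_count (counter_t raw)
{
  return raw < 0 ? 0 : raw;
}

void check_counter_conversion ()
{
  // A large unsigned value with the top bit set...
  const std::uint64_t big = 0x8000000000000001ULL;
  // ...wraps to a negative number when viewed as the signed counter type
  // (guaranteed since C++20, and what mainstream compilers did before).
  const counter_t as_counter = static_cast<counter_t> (big);
  assert (as_counter < 0);
  // The guard keeps later arithmetic away from the negative value.
  assert (sanitized_count (as_counter) == 0);
  assert (sanitized_count (42) == 42);
}
```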
Re: [PATCH][v2] tree-optimization/104912 - ensure cost model is checked first
> The following makes sure that when we build the versioning condition > for vectorization including the cost model check, we check for the > cost model and branch over other versioning checks. That is what > the cost modeling assumes, since the cost model check is the only > one accounted for in the scalar outside cost. Currently we emit > all checks as straight-line code combined with bitwise ops which > can result in surprising ordering of checks in the final assembly. > > Since loop_version accepts only a single versioning condition > the splitting is done after the fact. > > The result is a 1.5% speedup of 416.gamess on x86_64 when compiling > with -Ofast and tuning for generic or skylake. That's not enough > to recover from the slowdown when vectorizing but it now cuts off > the expensive alias versioning test. > > This is an update to the previously posted patch splitting the > probability between the two branches as outlined in > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592597.html > > I've re-bootstrapped and tested this on x86_64-unknown-linux-gnu. > > Honza - is the approach to splitting the probabilities sensible? > This fixes a piece of a P1 regression. > > Thanks, > Richard. > > 2022-03-21 Richard Biener > > PR tree-optimization/104912 > * tree-vect-loop-manip.cc (vect_loop_versioning): Split > the cost model check to a separate BB to make sure it is > checked first and not combined with other version checks. 
> --- > gcc/tree-vect-loop-manip.cc | 60 +++-- > 1 file changed, 57 insertions(+), 3 deletions(-) > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > index 63fb6f669a0..e4381eb7079 100644 > --- a/gcc/tree-vect-loop-manip.cc > +++ b/gcc/tree-vect-loop-manip.cc > @@ -3445,13 +3445,34 @@ vect_loop_versioning (loop_vec_info loop_vinfo, > cond_expr = expr; > } > > + tree cost_name = NULL_TREE; > + profile_probability prob2 = profile_probability::uninitialized (); > + if (cond_expr > + && !integer_truep (cond_expr) > + && (version_niter > + || version_align > + || version_alias > + || version_simd_if_cond)) I assume that this condition... > > + /* Split the cost model check off to a separate BB. Costing assumes > + this is the only thing we perform when we enter the scalar loop > + from a failed cost decision. */ > + if (cost_name && TREE_CODE (cost_name) == SSA_NAME) is if and only if this condition (otherwise prob2 would get uninitialized or lost) > +{ > + gimple *def = SSA_NAME_DEF_STMT (cost_name); > + /* All uses of the cost check are 'true' after the check we > + are going to insert. */ > + replace_uses_by (cost_name, boolean_true_node); > + /* And we're going to build the new single use of it. */ > + gcond *cond = gimple_build_cond (NE_EXPR, cost_name, > boolean_false_node, > +NULL_TREE, NULL_TREE); > + edge e = split_block (gimple_bb (def), def); > + gimple_stmt_iterator gsi = gsi_for_stmt (def); > + gsi_insert_after (, cond, GSI_NEW_STMT); > + edge true_e, false_e; > + extract_true_false_edges_from_block (e->dest, _e, _e); > + e->flags &= ~EDGE_FALLTHRU; > + e->flags |= EDGE_TRUE_VALUE; > + edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE); > + e->probability = prob2; > + e2->probability = prob2.invert (); So this looks fine to me. 
Honza > + set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src); > + auto_vec adj; > + for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest); > +son; > +son = next_dom_son (CDI_DOMINATORS, son)) > + if (EDGE_COUNT (son->preds) > 1) > + adj.safe_push (son); > + for (auto son : adj) > + set_immediate_dominator (CDI_DOMINATORS, son, e->src); > +} > + >if (version_niter) > { >/* The versioned loop could be infinite, we need to clear existing > -- > 2.34.1
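To illustrate the point of the patch description (checks combined with bitwise ops all get evaluated, while a separate cost-model branch skips the expensive ones), here is a small standalone sketch. It is not vectorizer code; the function names, the threshold of 16 and the integer "addresses" are all made up for illustration.

```cpp
#include <cassert>

// Counts how often the (hypothetical) expensive alias check is executed.
static int alias_checks_run = 0;

// Illustrative cost-model check; the threshold is arbitrary.
bool cost_model_ok (unsigned long niters)
{
  return niters >= 16;
}

// Illustrative runtime alias check on two ranges of length n.
bool ranges_disjoint (unsigned long a, unsigned long b, unsigned long n)
{
  ++alias_checks_run;
  return a + n <= b || b + n <= a;
}

// Straight-line form: both checks are computed, then combined with '&'.
bool use_vector_loop_combined (unsigned long niters,
                               unsigned long a, unsigned long b)
{
  const bool cost = cost_model_ok (niters);
  const bool alias = ranges_disjoint (a, b, niters);
  return cost & alias;
}

// Split form: branch on the cost model first, so a failed cost decision
// reaches the scalar loop without paying for the alias check.
bool use_vector_loop_split (unsigned long niters,
                            unsigned long a, unsigned long b)
{
  if (!cost_model_ok (niters))
    return false;
  return ranges_disjoint (a, b, niters);
}

void check_cost_model_first ()
{
  alias_checks_run = 0;
  assert (!use_vector_loop_combined (4, 0, 100));
  assert (alias_checks_run == 1);   // paid for the alias check anyway

  alias_checks_run = 0;
  assert (!use_vector_loop_split (4, 0, 100));
  assert (alias_checks_run == 0);   // cost check cut it off

  assert (use_vector_loop_split (32, 0, 100));
  assert (alias_checks_run == 1);
}
```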
[PATCH] arm: Restrict support of vectors of boolean immediates (PR target/104662)
This simple patch avoids the ICE described in the PR:

    internal compiler error: in simd_valid_immediate, at config/arm/arm.cc:12866

with an early exit from simd_valid_immediate if we are trying to handle a vector of booleans and MVE is not enabled.

We still get an ICE when compiling the existing gcc.dg/rtl/arm/mve-vxbi.c without -march=armv8.1-m.main+mve:

    error: unrecognizable insn:
    (insn 7 5 8 2 (set (reg:V4BI 114)
            (const_vector:V4BI [
                    (const_int 1 [0x1])
                    (const_int 0 [0]) repeated x2
                    (const_int 1 [0x1])
                ])) -1
         (nil))
    during RTL pass: ira

but there's little we can do since the testcase explicitly creates vectors of booleans which do need MVE.

That is the reason why I do not add a testcase.

2022-04-19 Christophe Lyon

	PR target/104662
	* config/arm/arm.cc (simd_valid_immediate): Exit when input is a
	vector of booleans and MVE is not enabled.
---
 gcc/config/arm/arm.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 14e2fdfeafa..69a18c2f157 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -12849,6 +12849,9 @@ simd_valid_immediate (rtx op, machine_mode mode, int inverse,
 	  || n_elts * innersize != 16))
     return -1;
 
+  if (!TARGET_HAVE_MVE && GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+    return -1;
+
   /* Vectors of float constants. */
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
     {
-- 
2.25.1
Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]
> On Wed, 20 Apr 2022, Jakub Jelinek wrote: > > > Hi! > > > > cgraph_node has a semantic_interposition flag which should mirror > > opt_for_fn (decl, flag_semantic_interposition). But it actually is > > initialized not from that, but from flag_semantic_interposition in the > > explicit symtab_node (symtab_type t) > > : type (t), resolution (LDPR_UNKNOWN), definition (false), alias > > (false), > > ... > > semantic_interposition (flag_semantic_interposition), > > ... > > x_comdat_group (NULL_TREE), x_section (NULL) > > {} > > ctor. I think that might be fine for varpool nodes, but since > > flag_semantic_interposition is now implied from -Ofast it isn't correct > > for cgraph nodes, unless we guarantee that cgraph node for a particular > > function decl is always created while that function is > > current_function_decl. That is often the case, but not always as the > > following function shows. Normally cgraph_nodes with function bodies are first created, later finalized and then analyzed. We copy over the semantic_interposition flag from opt_for_fn to cgraph_node in finalize_function. The ctor there is indeed only for varpool nodes since these do not have their opt_for_var. > > --- gcc/cgraph.cc.jj2022-02-04 14:36:54.069618372 +0100 > > +++ gcc/cgraph.cc 2022-04-19 13:38:06.223782974 +0200 > > @@ -507,6 +507,7 @@ cgraph_node::create (tree decl) > >gcc_assert (TREE_CODE (decl) == FUNCTION_DECL); > > > >node->decl = decl; > > + node->semantic_interposition = opt_for_fn (decl, > > flag_semantic_interposition); So this change should be unnecessary unless there are nodes that are missing finalization stage. It also is not good enough since frontends may change opt_for_fn between node creation and finalization of compilation unit (so even after cgraph_finalize unforutnately, we had another bug about that). The PR was about implicit C++ alias. 
So the problem is that aliases bypass finalization becuase they are produced by cgraph_node::create_alias that sets definition flag to true. I guess it would be most consistent to give up on having the flag up to date during cgraph construction (i.e. from finalization time down) and compute it during the cgraph_finalize_complation_unit. I will look into that. > > > >if ((flag_openacc || flag_openmp) > >&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl))) > > --- gcc/cgraphclones.cc.jj 2022-01-18 11:58:58.948991114 +0100 > > +++ gcc/cgraphclones.cc 2022-04-19 13:38:43.594262397 +0200 > > @@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl > >new_node->versionable = versionable; > >new_node->can_change_signature = can_change_signature; > >new_node->redefined_extern_inline = redefined_extern_inline; > > + new_node->semantic_interposition = semantic_interposition; This indeed makes sense to me. Honza > >new_node->tm_may_enter_irr = tm_may_enter_irr; > >new_node->externally_visible = false; > >new_node->no_reorder = no_reorder; > > --- gcc/testsuite/g++.dg/opt/pr105306.C.jj 2022-04-19 13:42:33.908054114 > > +0200 > > +++ gcc/testsuite/g++.dg/opt/pr105306.C 2022-04-19 13:42:08.859403045 > > +0200 > > @@ -0,0 +1,13 @@ > > +// PR ipa/105306 > > +// { dg-do compile } > > +// { dg-options "-Ofast" } > > + > > +#pragma GCC optimize 0 > > +template void foo (T); > > +struct B { ~B () {} }; > > +struct C { B f; }; > > +template struct E { > > + void bar () { foo (g); } > > + C g; > > +}; > > +template class E; > > > > Jakub > > > > > > -- > Richard Biener > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
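The staleness problem discussed in this thread can be pictured with a toy model. None of the types below are GCC's; `toy_decl`, `toy_node` and the helper functions are hypothetical stand-ins for a decl carrying per-function options, a cgraph node caching a flag, and the three strategies mentioned (snapshot the global in the ctor, copy from the decl at creation, recompute late).

```cpp
#include <cassert>

// Toy model only. Plays the role of the global flag_semantic_interposition.
static bool global_semantic_interposition = true;

// Plays the role of a function decl with its own per-function options,
// i.e. what opt_for_fn (decl, flag_semantic_interposition) would return.
struct toy_decl
{
  bool semantic_interposition;
};

// Plays the role of a cgraph node that caches the flag.
struct toy_node
{
  const toy_decl *decl = nullptr;
  // Snapshot of the global at construction time, like the quoted ctor.
  bool semantic_interposition = global_semantic_interposition;
};

// Strategy 1: rely on the constructor snapshot only.
toy_node create_node_snapshot (const toy_decl &d)
{
  toy_node n;
  n.decl = &d;
  return n;
}

// Strategy 2: copy the per-decl value at creation time.
toy_node create_node_from_decl (const toy_decl &d)
{
  toy_node n;
  n.decl = &d;
  n.semantic_interposition = d.semantic_interposition;
  return n;
}

// Strategy 3: recompute once the unit is complete.
void recompute_late (toy_node &n)
{
  n.semantic_interposition = n.decl->semantic_interposition;
}

void check_flag_staleness ()
{
  // Global and per-function settings disagree.
  global_semantic_interposition = false;
  toy_decl d{true};

  toy_node a = create_node_snapshot (d);
  assert (a.semantic_interposition != d.semantic_interposition);  // stale

  toy_node b = create_node_from_decl (d);
  assert (b.semantic_interposition == d.semantic_interposition);

  // The front end changes the per-function options after node creation.
  d.semantic_interposition = false;
  assert (b.semantic_interposition != d.semantic_interposition);  // stale again

  recompute_late (b);
  assert (b.semantic_interposition == d.semantic_interposition);
}
```

In this model only the late recomputation survives option changes made after node creation, which is the concern raised above about frontends changing opt_for_fn.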
[PATCH] tree-optimization/105312 - fix ISEL VCOND expansion
The following aligns ISEL VEC_COND_EXPR expansion using VCOND with the optab query done by vector lowering. Instead of only allowing the signed optab to provide EQ/NE compares we allow both here though since there seems to be no documented canonicalization. Bootstrap and regtest running on x86_64-unknown-linux-gnu, I've cut neon boilerplate for the testcase but cannot test it (a cc1 cross makes it UNSUPPORTED), if I don't hear otherwise I'm going to push as-is after testing completed. Thanks, Richard. 2022-04-20 Richard Biener PR tree-optimization/105312 * gimple-isel.cc (gimple_expand_vec_cond_expr): Query both VCOND and VCONDU for EQ and NE. * gcc.target/arm/pr105312.c: New testcase. --- gcc/gimple-isel.cc | 8 gcc/testsuite/gcc.target/arm/pr105312.c | 23 +++ 2 files changed, 31 insertions(+) create mode 100644 gcc/testsuite/gcc.target/arm/pr105312.c diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc index 3635585bf45..a8f7a0d25d0 100644 --- a/gcc/gimple-isel.cc +++ b/gcc/gimple-isel.cc @@ -245,6 +245,14 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi, GET_MODE_NUNITS (cmp_op_mode))); icode = get_vcond_icode (mode, cmp_op_mode, unsignedp); + /* Some targets do not have vcondeq and only vcond with NE/EQ + but not vcondu, so make sure to also try vcond here as + vcond_icode_p would canonicalize the optab query to. 
*/ + if (icode == CODE_FOR_nothing + && (tcode == NE_EXPR || tcode == EQ_EXPR) + && ((icode = get_vcond_icode (mode, cmp_op_mode, !unsignedp)) + != CODE_FOR_nothing)) +unsignedp = !unsignedp; if (icode == CODE_FOR_nothing) { if (tcode == LT_EXPR diff --git a/gcc/testsuite/gcc.target/arm/pr105312.c b/gcc/testsuite/gcc.target/arm/pr105312.c new file mode 100644 index 000..a02831bcbcf --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr105312.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options "-mcpu=cortex-a15" } */ +/* { dg-add-options arm_neon } */ + +typedef float stress_matrix_type_t; +typedef unsigned int size_t; +static void __attribute__((optimize("-O3"))) stress_matrix_xy_identity( + const size_t n, + stress_matrix_type_t a[restrict n][n], + stress_matrix_type_t b[restrict n][n], + stress_matrix_type_t r[restrict n][n]) +{ + register size_t i; + (void)a; + (void)b; + for (i = 0; i < n; i++) { + register size_t j; + for (j = 0; j < n; j++) + r[i][j] = (i == j) ? 1.0 : 0.0; + return; + } +} -- 2.34.1
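A short sketch of why retrying with the other signedness is safe for EQ/NE only, plus the shape of the fallback. This is not the ISEL code: `target_has_vcond` models a hypothetical target that only provides the signed pattern, and `find_vcond` merely mirrors the control flow of the hunk above.

```cpp
#include <cassert>
#include <cstdint>

enum class cmp { eq, ne, lt };

// Equality looks only at the bit pattern, so signed and unsigned agree.
bool eq_signed (std::int32_t a, std::int32_t b) { return a == b; }
bool eq_unsigned (std::uint32_t a, std::uint32_t b) { return a == b; }
// Ordering depends on signedness.
bool lt_signed (std::int32_t a, std::int32_t b) { return a < b; }
bool lt_unsigned (std::uint32_t a, std::uint32_t b) { return a < b; }

// Hypothetical target: has the signed vcond pattern but no vcondu.
bool target_has_vcond (bool unsignedp)
{
  return !unsignedp;
}

// Mirrors the added fallback: if the first query fails and the comparison
// is EQ or NE, retry with the opposite signedness and remember the flip.
bool find_vcond (cmp code, bool &unsignedp)
{
  if (target_has_vcond (unsignedp))
    return true;
  if ((code == cmp::eq || code == cmp::ne) && target_has_vcond (!unsignedp))
    {
      unsignedp = !unsignedp;
      return true;
    }
  return false;
}

void check_vcond_fallback ()
{
  // Same bit patterns, viewed as signed (-1, 1) and unsigned (0xFFFFFFFF, 1).
  assert (eq_signed (-1, 1) == eq_unsigned (0xFFFFFFFFu, 1u));
  assert (eq_signed (-1, -1) == eq_unsigned (0xFFFFFFFFu, 0xFFFFFFFFu));
  assert (lt_signed (-1, 1) != lt_unsigned (0xFFFFFFFFu, 1u));

  bool unsignedp = true;
  assert (find_vcond (cmp::eq, unsignedp));
  assert (!unsignedp);              // flipped to the signed pattern

  unsignedp = true;
  assert (!find_vcond (cmp::lt, unsignedp));
  assert (unsignedp);               // no flip for ordered compares
}
```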
Re: [x86_64 PATCH] PR middle-end/105135: Catch more cmov idioms in combine.
On Tue, Apr 19, 2022 at 1:58 PM Roger Sayle wrote: > > > This patch addresses PR middle-end/105135, a missed-optimization regression > affecting mainline. I agree with Jakub's comment that the middle-end > optimizations are sound, reducing basic blocks and conditional expressions > at the tree-level, but requiring backend's to recognize conditional move > instructions/idioms if/when beneficial. This patch introduces two new > define_insn_and_split in i386.md to recognize two additional cmove idioms. > > The first recognizes (PR105135's): > > int foo(int x, int y, int z) > { > return ((x < y) << 5) + z; > } > > and transforms (the 6 insns, 13 bytes): > > xorl%eax, %eax ;; 2 bytes > cmpl%esi, %edi ;; 2 bytes > setl%al ;; 3 bytes > sall$5, %eax;; 3 bytes > addl%edx, %eax ;; 2 bytes > ret ;; 1 byte > > into (the 4 insns, 9 bytes): > > cmpl%esi, %edi ;; 2 bytes > leal32(%rdx), %eax ;; 3 bytes > cmovge %edx, %eax ;; 3 bytes > ret ;; 1 byte > > > The second catches the very closely related (from PR 98865): > > int bar(int x, int y, int z) > { > return -(x < y) & z; > } > > and transforms the (6 insns, 12 bytes): > xorl%eax, %eax ;; 2 bytes > cmpl%esi, %edi ;; 2 bytes > setl%al ;; 3 bytes > negl%eax;; 2 bytes > andl%edx, %eax ;; 2 bytes > ret ;; 1 byte > > into (4 insns, 8 bytes): > xorl%eax, %eax ;; 2 bytes > cmpl%esi, %edi ;; 2 bytes > cmovl %edx, %eax ;; 3 bytes > ret ;; 1 byte > > They both have in common that they recognize a setcc followed by two > instructions, and replace them with one instruction and a cmov, which > is typically a performance win, but always a size win. Fine tuning > these decisions based on microarchitecture is much easier in the > backend, than the middle-end. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, > with no new failures. Ok for mainline? 
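Both idioms above are conditional selects in disguise, which is what makes a cmov applicable. A quick way to convince oneself at the source level (the `_select` variants and the checking function are illustrative names, not part of the patch or its tests):

```cpp
#include <cassert>

// The two functions from the patch description.
int foo (int x, int y, int z)
{
  return ((x < y) << 5) + z;
}

int bar (int x, int y, int z)
{
  return -(x < y) & z;
}

// Equivalent select forms: (x < y) is 0 or 1, so the shift yields 0 or 32,
// and the negation yields an all-zeros or all-ones mask.
int foo_select (int x, int y, int z)
{
  return x < y ? z + 32 : z;
}

int bar_select (int x, int y, int z)
{
  return x < y ? z : 0;
}

void check_cmov_idioms ()
{
  // Small exhaustive range; values are kept tiny so nothing overflows.
  for (int x = -3; x <= 3; ++x)
    for (int y = -3; y <= 3; ++y)
      for (int z = -100; z <= 100; z += 7)
        {
          assert (foo (x, y, z) == foo_select (x, y, z));
          assert (bar (x, y, z) == bar_select (x, y, z));
        }
}
```

The select form of foo corresponds to the lea (z + 32) followed by a cmov picking z when the comparison fails, and the select form of bar to the zeroed register overwritten by a cmov when it succeeds.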
> > > 2022-04-19 Roger Sayle > > gcc/ChangeLog > PR target/105135 > * config/i386/i386.md (*xor_cmov): Transform setcc, negate > then and into mov $0, followed by a cmov. > (*lea_cmov): Transform setcc, ashift const then plus into > lea followed by cmov. > > gcc/testsuite/ChangeLog > PR target/105135 > * gcc.target/i386/cmov10.c: New test case. > * gcc.target/i386/cmov11.c: New test case. > * gcc.target/i386/pr105135.c: New test case. > > > Thanks in advance, > Roger +;; Transform setcc;negate;and into mov_zero;cmov +(define_insn_and_split "*xor_cmov" + [(set (match_operand:SWI248 0 "register_operand") +(and:SWI248 + (neg:SWI248 (match_operator:SWI248 1 "ix86_comparison_operator" +[(match_operand 2 "flags_reg_operand") + (const_int 0)])) + (match_operand:SWI248 3 "register_operand"))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_CMOVE && can_create_pseudo_p ()" Please use ix86_pre_reload_split instead of can_create_pseudo_p () here. + "#" + "&& 1" + [(set (match_dup 4) (const_int 0)) + (set (match_dup 0) +(if_then_else:SWI248 (match_op_dup 1 [(match_dup 2) (const_int 0)]) + (match_dup 3) (match_dup 4)))] +{ + operands[4] = gen_reg_rtx (mode); +}) Single line preparation statements should use double quotes instead of curly braces. See many examples in i386 .md files. +;; Transform setcc;ashift_const;plus into lea_const;cmov +(define_insn_and_split "*lea_cmov" + [(set (match_operand:SWI 0 "register_operand") +(plus:SWI (ashift:SWI (match_operator:SWI 1 "ix86_comparison_operator" +[(match_operand 2 "flags_reg_operand") + (const_int 0)]) + (match_operand:SWI 3 "const_int_operand")) + (match_operand:SWI 4 "register_operand"))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_CMOVE && can_create_pseudo_p ()" Same here, ix86_pre_reload_split should be used for define_insn_and_split (FYI, can_create_pseudo_p is still good for define_split where no instruction is defined). 
+ "#" + "&& 1" + [(set (match_dup 5) (plus: (match_dup 4) (match_dup 6))) + (set (match_dup 0) +(if_then_else: (match_op_dup 1 [(match_dup 2) (const_int 0)]) +(match_dup 5) (match_dup 4)))] +{ + operands[5] = gen_reg_rtx (mode); + operands[6] = GEN_INT (1 << INTVAL (operands[3])); + if (mode != mode) +{ + operands[0] = gen_lowpart (mode, operands[0]); + operands[4] = gen_lowpart (mode, operands[4]); gen_lowpart is dangerous to use before reload. It can choke when integer mode SUBREG of e.g. FP mode register is passed here. So you have to
Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]
On Wed, 20 Apr 2022, Jakub Jelinek wrote: > Hi! > > cgraph_node has a semantic_interposition flag which should mirror > opt_for_fn (decl, flag_semantic_interposition). But it actually is > initialized not from that, but from flag_semantic_interposition in the > explicit symtab_node (symtab_type t) > : type (t), resolution (LDPR_UNKNOWN), definition (false), alias (false), > ... > semantic_interposition (flag_semantic_interposition), > ... > x_comdat_group (NULL_TREE), x_section (NULL) > {} > ctor. I think that might be fine for varpool nodes, but since > flag_semantic_interposition is now implied from -Ofast it isn't correct > for cgraph nodes, unless we guarantee that cgraph node for a particular > function decl is always created while that function is > current_function_decl. That is often the case, but not always as the > following function shows. > Because symtab_node's ctor doesn't know for which decl the cgraph node > is being created, the following patch keeps that as is, but updates it from > opt_for_fn (decl, flag_semantic_interposition) when we know that, or for > clones copies that flag (often it is then overridden in > set_new_clone_decl_and_node_flags, but not always). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. > 2022-04-20 Jakub Jelinek > > PR ipa/105306 > * cgraph.cc (cgraph_node::create): Set node->semantic_interposition > to opt_for_fn (decl, flag_semantic_interposition). > * cgraphclones.cc (cgraph_node::create_clone): Copy over > semantic_interposition flag. > > * g++.dg/opt/pr105306.C: New test. 
> > --- gcc/cgraph.cc.jj 2022-02-04 14:36:54.069618372 +0100 > +++ gcc/cgraph.cc 2022-04-19 13:38:06.223782974 +0200 > @@ -507,6 +507,7 @@ cgraph_node::create (tree decl) >gcc_assert (TREE_CODE (decl) == FUNCTION_DECL); > >node->decl = decl; > + node->semantic_interposition = opt_for_fn (decl, > flag_semantic_interposition); > >if ((flag_openacc || flag_openmp) >&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl))) > --- gcc/cgraphclones.cc.jj2022-01-18 11:58:58.948991114 +0100 > +++ gcc/cgraphclones.cc 2022-04-19 13:38:43.594262397 +0200 > @@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl >new_node->versionable = versionable; >new_node->can_change_signature = can_change_signature; >new_node->redefined_extern_inline = redefined_extern_inline; > + new_node->semantic_interposition = semantic_interposition; >new_node->tm_may_enter_irr = tm_may_enter_irr; >new_node->externally_visible = false; >new_node->no_reorder = no_reorder; > --- gcc/testsuite/g++.dg/opt/pr105306.C.jj2022-04-19 13:42:33.908054114 > +0200 > +++ gcc/testsuite/g++.dg/opt/pr105306.C 2022-04-19 13:42:08.859403045 > +0200 > @@ -0,0 +1,13 @@ > +// PR ipa/105306 > +// { dg-do compile } > +// { dg-options "-Ofast" } > + > +#pragma GCC optimize 0 > +template void foo (T); > +struct B { ~B () {} }; > +struct C { B f; }; > +template struct E { > + void bar () { foo (g); } > + C g; > +}; > +template class E; > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
Re: [PATCH] c++, coroutines: Account for overloaded promise return_value() [PR105301].
On Wed, Apr 20, 2022 at 4:19 AM Jason Merrill via Gcc-patches wrote: > > On 4/18/22 10:03, Iain Sandoe wrote: > > Whether it was intended or not, it is possible to define a coroutine promise > > with multiple return_value() methods [which need not even have the same > > type]. > > > > We were not accounting for this possibility in the check to see whether both > > return_value and return_void are specifier (which is prohibited by the > > standard). Fixed thus and provided an adjusted diagnostic for the case that > > multiple return_value() methods are present. > > > > tested on x86_64-darwin, OK for mainline? / Backports? (when?) > > thanks, > > Iain > > > > Signed-off-by: Iain Sandoe > > > > PR c++/105301 > > > > gcc/cp/ChangeLog: > > > > * coroutines.cc (coro_promise_type_found_p): Account for possible > > mutliple overloads of the promise return_value() method. > > > > gcc/testsuite/ChangeLog: > > > > * g++.dg/coroutines/pr105301.C: New test. > > --- > > gcc/cp/coroutines.cc | 10 - > > gcc/testsuite/g++.dg/coroutines/pr105301.C | 49 ++ > > 2 files changed, 57 insertions(+), 2 deletions(-) > > create mode 100644 gcc/testsuite/g++.dg/coroutines/pr105301.C > > > > diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc > > index dcc2284171b..d2a765cac11 100644 > > --- a/gcc/cp/coroutines.cc > > +++ b/gcc/cp/coroutines.cc > > @@ -513,8 +513,14 @@ coro_promise_type_found_p (tree fndecl, location_t loc) > > coro_info->promise_type); > > inform (DECL_SOURCE_LOCATION (BASELINK_FUNCTIONS (has_ret_void)), > > "% declared here"); > > - inform (DECL_SOURCE_LOCATION (BASELINK_FUNCTIONS (has_ret_val)), > > - "% declared here"); > > + has_ret_val = BASELINK_FUNCTIONS (has_ret_val); > > + const char *message = "% declared here"; > > + if (TREE_CODE (has_ret_val) == OVERLOAD) > > + { > > + has_ret_val = OVL_FIRST (has_ret_val); > > + message = "% first declared here"; > > + } > > You could also use get_first_fn, but the patch is OK as is. 
I'm > inclined to leave backports in coroutines.cc to your discretion, you > probably have a better idea of how important they are. Likewise. Please wait until after the 11.3 release. Richard. > > + inform (DECL_SOURCE_LOCATION (has_ret_val), message); > > coro_info->coro_co_return_error_emitted = true; > > return false; > > } > > diff --git a/gcc/testsuite/g++.dg/coroutines/pr105301.C > > b/gcc/testsuite/g++.dg/coroutines/pr105301.C > > new file mode 100644 > > index 000..33a0b03cf5d > > --- /dev/null > > +++ b/gcc/testsuite/g++.dg/coroutines/pr105301.C > > @@ -0,0 +1,49 @@ > > +// { dg-additional-options "-fsyntax-only" } > > +namespace std { > > +template > > +struct traits_sfinae_base {}; > > + > > +template > > +struct coroutine_traits : public traits_sfinae_base {}; > > +} > > + > > +template struct coro {}; > > +template > > +struct std::coroutine_traits, Ps...> { > > + using promise_type = Promise; > > +}; > > + > > +struct awaitable { > > + bool await_ready() noexcept; > > + template > > + void await_suspend(F) noexcept; > > + void await_resume() noexcept; > > +} a; > > + > > +struct suspend_always { > > + bool await_ready() noexcept { return false; } > > + template > > + void await_suspend(F) noexcept; > > + void await_resume() noexcept {} > > +}; > > + > > +namespace std { > > +template > > +struct coroutine_handle {}; > > +} > > + > > +struct bad_promise_6 { > > + coro get_return_object(); > > + suspend_always initial_suspend(); > > + suspend_always final_suspend() noexcept; > > + void unhandled_exception(); > > + void return_void(); > > + void return_value(int) const; > > + void return_value(int); > > +}; > > + > > +coro > > +bad_implicit_return() // { dg-error {.aka 'bad_promise_6'. declares both > > 'return_value' and 'return_void'} } > > +{ > > + co_await a; > > +} >
[PATCH] loongarch: ignore zero-size fields in calling convention
Currently, the LoongArch ELF psABI is not clear on the handling of zero-sized fields in aggregate arguments or return values [1]. The behavior of GCC trunk is puzzling considering the following cases:

struct test1
{
  double a[0];
  float x;
};

struct test2
{
  float a[0];
  float x;
};

GCC trunk passes test1::x via GPR, but test2::x via FPR. I believe no rational Homo Sapiens can understand (or even expect) this.

And, to make things even worse, test1 behaves differently in C and C++. GCC trunk passes test1::x via GPR, but G++ trunk passes test1::x via FPR.

I've written a paragraph about the current GCC behavior for the psABI [2], but I think it's cleaner to just ignore all zero-sized fields in the ABI. This will require only a two-line change in GCC (this patch), and a one-line change in the ABI doc.

If there is not any better idea I'd like to see this reviewed and applied ASAP. If we finally have to apply this patch after the GCC 12 release, we'll need to add a lot more boring code to emit a -Wpsabi inform [3]. That will be an unnecessary burden for both us and the users of the compiler (as the compiler will spend CPU time only for checking if a warning should be emitted).

[1]: https://github.com/loongson/LoongArch-Documentation/issues/48
[2]: https://github.com/loongson/LoongArch-Documentation/pull/49
[3]: https://gcc.gnu.org/PR102024

gcc/

	* config/loongarch/loongarch.cc
	(loongarch_flatten_aggregate_field): Ignore empty fields for
	RECORD_TYPE.

gcc/testsuite/

	* gcc.target/loongarch/zero-size-field-pass.c: New test.
	* gcc.target/loongarch/zero-size-field-ret.c: New test.
---
 gcc/config/loongarch/loongarch.cc             |  3 ++
 .../loongarch/zero-size-field-pass.c          | 30 +++
 .../loongarch/zero-size-field-ret.c           | 28 +
 3 files changed, 61 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/zero-size-field-pass.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/zero-size-field-ret.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index f22150a60cc..57e4d9f82ce 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -326,6 +326,9 @@ loongarch_flatten_aggregate_field (const_tree type,
       for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
         if (TREE_CODE (f) == FIELD_DECL)
           {
+            if (DECL_SIZE (f) && integer_zerop (DECL_SIZE (f)))
+              continue;
+
             if (!TYPE_P (TREE_TYPE (f)))
               return -1;
 
diff --git a/gcc/testsuite/gcc.target/loongarch/zero-size-field-pass.c b/gcc/testsuite/gcc.target/loongarch/zero-size-field-pass.c
new file mode 100644
index 000..999dc913a71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/zero-size-field-pass.c
@@ -0,0 +1,30 @@
+/* Test that LoongArch backend ignores zero-sized fields of aggregates in
+   argument passing.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdouble-float -mabi=lp64d" } */
+/* { dg-final { scan-assembler "\\\$f1" } } */
+
+struct test
+{
+  int empty1[0];
+  double empty2[0];
+  int : 0;
+  float x;
+  long empty3[0];
+  long : 0;
+  float y;
+  unsigned : 0;
+  char empty4[0];
+};
+
+extern void callee (struct test);
+
+void
+caller (void)
+{
+  struct test test;
+  test.x = 114;
+  test.y = 514;
+  callee (test);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/zero-size-field-ret.c b/gcc/testsuite/gcc.target/loongarch/zero-size-field-ret.c
new file mode 100644
index 000..40137d97555
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/zero-size-field-ret.c
@@ -0,0 +1,28 @@
+/* Test that LoongArch backend ignores zero-sized fields of aggregates in
+   returning.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdouble-float -mabi=lp64d" } */
+/* { dg-final { scan-assembler-not "\\\$r4" } } */
+
+struct test
+{
+  int empty1[0];
+  double empty2[0];
+  int : 0;
+  float x;
+  long empty3[0];
+  long : 0;
+  float y;
+  unsigned : 0;
+  char empty4[0];
+};
+
+extern struct test callee (void);
+
+float
+caller (void)
+{
+  struct test test = callee ();
+  return test.x + test.y;
+}
-- 
2.36.0
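
The effect of skipping zero-sized fields can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `Kind`, `Field`, and `passed_in_fprs` are hypothetical names, not GCC's tree API, and the classification rule is deliberately simplified (it treats "one or two floating-point fields and nothing else" as the FPR case and ignores the mixed integer/float case that the real ABI also handles).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified description of a struct field.
// This is NOT GCC's tree representation.
enum class Kind { Integer, Float };

struct Field {
  Kind kind;
  std::size_t size_in_bits;  // 0 models `T a[0]` and `int : 0`
};

// Toy classification: after skipping zero-sized fields, an aggregate made
// of one or two floating-point fields is treated as passed in FPRs.
// Assumption: the real rules are richer; only the skip step mirrors the
// patch.
bool passed_in_fprs(const std::vector<Field>& fields) {
  std::size_t float_count = 0;
  for (const Field& f : fields) {
    if (f.size_in_bits == 0)
      continue;  // analogous to the integer_zerop (DECL_SIZE (f)) check
    if (f.kind != Kind::Float)
      return false;
    ++float_count;
  }
  return float_count >= 1 && float_count <= 2;
}
```

In this model, test1 (`double a[0]; float x;`) and test2 (`float a[0]; float x;`) are classified identically, because the zero-sized member no longer takes part in the decision regardless of its element type.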
[PATCH] cgraph: Fix up semantic_interposition handling [PR105306]
Hi!

cgraph_node has a semantic_interposition flag which should mirror
opt_for_fn (decl, flag_semantic_interposition).
But it actually is initialized not from that, but from
flag_semantic_interposition in the
  explicit symtab_node (symtab_type t)
    : type (t), resolution (LDPR_UNKNOWN), definition (false), alias (false),
      ...
      semantic_interposition (flag_semantic_interposition),
      ...
      x_comdat_group (NULL_TREE), x_section (NULL)
  {}
ctor.  I think that might be fine for varpool nodes, but since
flag_semantic_interposition is now implied from -Ofast it isn't correct
for cgraph nodes, unless we guarantee that cgraph node for a particular
function decl is always created while that function is
current_function_decl.  That is often the case, but not always as the
following function shows.

Because symtab_node's ctor doesn't know for which decl the cgraph node
is being created, the following patch keeps that as is, but updates it from
opt_for_fn (decl, flag_semantic_interposition) when we know that, or for
clones copies that flag (often it is then overridden in
set_new_clone_decl_and_node_flags, but not always).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-04-20  Jakub Jelinek

	PR ipa/105306
	* cgraph.cc (cgraph_node::create): Set node->semantic_interposition
	to opt_for_fn (decl, flag_semantic_interposition).
	* cgraphclones.cc (cgraph_node::create_clone): Copy over
	semantic_interposition flag.

	* g++.dg/opt/pr105306.C: New test.

--- gcc/cgraph.cc.jj	2022-02-04 14:36:54.069618372 +0100
+++ gcc/cgraph.cc	2022-04-19 13:38:06.223782974 +0200
@@ -507,6 +507,7 @@ cgraph_node::create (tree decl)
   gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
 
   node->decl = decl;
+  node->semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);
 
   if ((flag_openacc || flag_openmp)
       && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
--- gcc/cgraphclones.cc.jj	2022-01-18 11:58:58.948991114 +0100
+++ gcc/cgraphclones.cc	2022-04-19 13:38:43.594262397 +0200
@@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl
   new_node->versionable = versionable;
   new_node->can_change_signature = can_change_signature;
   new_node->redefined_extern_inline = redefined_extern_inline;
+  new_node->semantic_interposition = semantic_interposition;
   new_node->tm_may_enter_irr = tm_may_enter_irr;
   new_node->externally_visible = false;
   new_node->no_reorder = no_reorder;
--- gcc/testsuite/g++.dg/opt/pr105306.C.jj	2022-04-19 13:42:33.908054114 +0200
+++ gcc/testsuite/g++.dg/opt/pr105306.C	2022-04-19 13:42:08.859403045 +0200
@@ -0,0 +1,13 @@
+// PR ipa/105306
+// { dg-do compile }
+// { dg-options "-Ofast" }
+
+#pragma GCC optimize 0
+template void foo (T);
+struct B { ~B () {} };
+struct C { B f; };
+template struct E {
+  void bar () { foo (g); }
+  C g;
+};
+template class E;

	Jakub
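
The shape of the fix can be sketched outside of GCC. The model below is an illustration only: `FunctionDecl`, `Node`, `create_node`, `create_clone`, and `g_flag_semantic_interposition` are hypothetical stand-ins, not the real cgraph API. It shows a constructor that can only read a global flag, a creation function that then overrides the member from the per-function setting, and a clone function that copies the member from the original node.

```cpp
// Hypothetical stand-in for a function decl carrying its own option value,
// playing the role of opt_for_fn (decl, flag_semantic_interposition).
struct FunctionDecl {
  bool semantic_interposition_opt;
};

// Hypothetical stand-in for the global flag_semantic_interposition, which
// reflects whatever options happen to be current when a node is created.
bool g_flag_semantic_interposition = true;

struct Node {
  const FunctionDecl* decl = nullptr;
  bool semantic_interposition;

  // Like the symtab_node ctor: no decl is known yet, so only the global
  // flag is available here.
  Node() : semantic_interposition(g_flag_semantic_interposition) {}
};

// Analogue of cgraph_node::create after the patch: once the decl is known,
// take the value from the decl instead of the global.
Node create_node(const FunctionDecl& decl) {
  Node node;
  node.decl = &decl;
  node.semantic_interposition = decl.semantic_interposition_opt;
  return node;
}

// Analogue of cgraph_node::create_clone after the patch: copy the flag
// from the node being cloned.
Node create_clone(const Node& original, const FunctionDecl& new_decl) {
  Node clone;
  clone.decl = &new_decl;
  clone.semantic_interposition = original.semantic_interposition;
  return clone;
}
```

Without the two assignments, both nodes would silently keep whatever the global flag was at construction time, which is the mismatch described in the PR.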
Re: [PATCH] Asan changes for RISC-V.
Does Asan work for RISC-V currently? It seems that '-fsanitize=address' is
still unsupported for RISC-V. If I add '--enable-libsanitizer' in Makefile.in
to reconfigure, there are compile errors. Is it because of this comment in
Makefile.in?

# libsanitizer not supported rv32, but it will break the rv64 multi-lib build, so we disable that temporally until rv32 supported#

--
From: Jim Wilson
Sent: Thursday, October 29, 2020 07:59
To: gcc-patches
Cc: cooper.joshua ; Jim Wilson
Subject: [PATCH] Asan changes for RISC-V.

We have only riscv64 asan support, there is no riscv32 support as yet.  So I
need to be able to conditionally enable asan support for the riscv target.  I
implemented this by returning zero from the asan_shadow_offset function.  This
requires a change to toplev.c and docs in target.def.

The asan support works on a 5.5 kernel, but does not work on a 4.15 kernel.
The problem is that the asan high memory region is a small wedge below
0x40.  The new kernel puts shared libraries at 0x3f and going
down which works.  But the old kernel puts shared libraries at 0x20
and going up which does not work, as it isn't in any recognized memory
region.  This might be fixable with more asan work, but we don't really need
support for old kernel versions.

The asan port is curious in that it uses 1<<29 for the shadow offset, but all
other 64-bit targets use a number larger than 1<<32.  But what we have is
working OK for now.

I did a make check RUNTESTFLAGS="asan.exp" on Fedora rawhide image running on
qemu and the results look reasonable.

		=== gcc Summary ===

# of expected passes		1905
# of unexpected failures	11
# of unsupported tests		224

		=== g++ Summary ===

# of expected passes		2002
# of unexpected failures	6
# of unresolved testcases	1
# of unsupported tests		175

OK?

Jim

2020-10-28  Jim Wilson

gcc/
	* config/riscv/riscv.c (riscv_asan_shadow_offset): New.
	(TARGET_ASAN_SHADOW_OFFSET): New.
	* doc/tm.texi: Regenerated.
	* target.def (asan_shadow_offset): Mention that it can return zero.
	* toplev.c (process_options): Check for and handle zero return from
	targetm.asan_shadow_offset call.

Co-Authored-By: cooper.joshua
---
 gcc/config/riscv/riscv.c | 16
 gcc/doc/tm.texi          |  3 ++-
 gcc/target.def           |  3 ++-
 gcc/toplev.c             |  3 ++-
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 989a9f15250..6909e200de1 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -5299,6 +5299,19 @@ riscv_gpr_save_operation_p (rtx op)
   return true;
 }
 
+/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
+
+static unsigned HOST_WIDE_INT
+riscv_asan_shadow_offset (void)
+{
+  /* We only have libsanitizer support for RV64 at present.
+
+     This number must match kRiscv*_ShadowOffset* in the file
+     libsanitizer/asan/asan_mapping.h which is currently 1<<29 for rv64,
+     even though 1<<36 makes more sense.  */
+  return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -5482,6 +5495,9 @@ riscv_gpr_save_operation_p (rtx op)
 #undef TARGET_NEW_ADDRESS_PROFITABLE_P
 #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
 
+#undef TARGET_ASAN_SHADOW_OFFSET
+#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-riscv.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 24c37f655c8..39c596b647a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12078,7 +12078,8 @@ is zero, which disables this optimization.
 @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_ASAN_SHADOW_OFFSET (void)
 Return the offset bitwise ored into shifted address to get corresponding
 Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not
-supported by the target.
+supported by the target.  May return 0 if Address Sanitizer is not supported
+by a subtarget.
 @end deftypefn
 
 @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_MEMMODEL_CHECK (unsigned HOST_WIDE_INT @var{val})
diff --git a/gcc/target.def b/gcc/target.def
index ed2da154e30..268b56b6ebd 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4452,7 +4452,8 @@ DEFHOOK
 (asan_shadow_offset,
  "Return the offset bitwise ored into shifted address to get corresponding\n\
 Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not\n\
-supported by the target.",
+supported by the target.  May return 0 if Address Sanitizer is not supported\n\
+by a subtarget.",
  unsigned HOST_WIDE_INT, (void),
  NULL)
 
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 20e231f4d2a..cf89598252c 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1834,7 +1834,8 @@ process_options (void)
     }
 
   if ((flag_sanitize & SANITIZE_USER_ADDRESS)
-      &&