[PATCH] libstdc++: Work around modules ICE in [PR105297]

2022-04-20 Thread Patrick Palka via Gcc-patches
This makes the initializer for __table in __from_chars_alnum_to_val
dependent in an artificial way, which works around the modules testsuite
ICE reported in PR105297 by preventing the initializer from getting
evaluated at parse time.

Compared to the alternative workaround of using a non-local class type
for __table, this workaround has the advantage of slightly speeding up
compilation of the <charconv> header, since now the table will not get
built (via constexpr evaluation) until one of the integer std::from_chars
overloads is actually instantiated.
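
A minimal, self-contained sketch of the same trick. The names DigitTable, make_digit_table and digit_value are hypothetical stand-ins for __from_chars_alnum_to_val_table and __from_chars_alnum_to_val, and the table contents (digit value, or 127 for non-digits, assuming ASCII) are only illustrative. The comma expression mentions the template parameter DecOnly, so the initializer is value-dependent. When a particular compiler actually evaluates it is an implementation detail that this code does not check. Requires C++17 for `if constexpr`.

```cpp
#include <cassert>

// Hypothetical table type: maps each byte to its digit value (0-35),
// or 127 if it is not a digit in any base up to 36.  Assumes ASCII.
struct DigitTable {
  unsigned char data[256];
};

// Builds the table through constexpr evaluation (needs C++14 or later).
constexpr DigitTable make_digit_table() {
  DigitTable t{};
  for (int c = 0; c < 256; ++c)
    t.data[c] = 127;
  for (int c = '0'; c <= '9'; ++c)
    t.data[c] = static_cast<unsigned char>(c - '0');
  for (int c = 'a'; c <= 'z'; ++c)
    t.data[c] = static_cast<unsigned char>(c - 'a' + 10);
  for (int c = 'A'; c <= 'Z'; ++c)
    t.data[c] = static_cast<unsigned char>(c - 'A' + 10);
  return t;
}

// DecOnly plays the role of _DecOnly.  In the alphanumeric branch it is
// mentioned in the initializer only to make that initializer
// value-dependent; the (void) cast keeps -Wunused-value quiet.
template <bool DecOnly>
unsigned char digit_value(unsigned char c) {
  if constexpr (DecOnly) {
    // Non-digits wrap to large values; callers compare against the base.
    return static_cast<unsigned char>(c - '0');
  } else {
    static constexpr DigitTable table = ((void)DecOnly, make_digit_table());
    return table.data[c];
  }
}
```

With `if constexpr`, the DecOnly = true instantiation never touches the table at all, in line with the compile-time saving described in the patch message.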

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

PR c++/105297
PR c++/105322

libstdc++-v3/ChangeLog:

* include/std/charconv (__from_chars_alnum_to_val): Make
initializer for __table dependent in an artificial way.
---
 libstdc++-v3/include/std/charconv | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index f1ace406017..561234cb2fc 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -445,7 +445,9 @@ namespace __detail
return __c - '0';
   else
{
- static constexpr auto __table = __from_chars_alnum_to_val_table();
+ // This initializer is deliberately made dependent in order to work
+ // around modules bug PR105322.
+ static constexpr auto __table = (_DecOnly, __from_chars_alnum_to_val_table());
  return __table.__data[__c];
}
 }
-- 
2.36.0.rc2.10.g1ac7422e39



[PATCH] c++: Add srodata to the allowed sections

2022-04-20 Thread Palmer Dabbelt
This fires errors like

FAIL: g++.dg/opt/const7.C  -std=c++14  scan-assembler-symbol-section symbol b_var (found _ZL5b_var) has section ^\\.(const|rodata)|\\[RO\\] (found .srodata)

on RISC-V, where RO data can end up in the srodata section.
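
A small sketch for checking the old and new section patterns by hand. It assumes that std::regex's default ECMAScript grammar treats these simple patterns the same way as the Tcl regexps DejaGnu actually evaluates; section_matches is a hypothetical helper, not part of the testsuite.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Section-name patterns from const7.C before and after the patch.
const std::regex kOldPattern(R"(^\.(const|rodata)|\[RO\])");
const std::regex kNewPattern(R"(^\.(const|rodata|srodata)|\[RO\])");

// Unanchored search: only the first alternative carries a ^ anchor.
bool section_matches(const std::regex& pattern, const std::string& section) {
  return std::regex_search(section, pattern);
}
```

The old pattern rejects ".srodata" because the first alternative requires the name to start with ".const" or ".rodata".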

gcc/testsuite/ChangeLog:

* g++.dg/opt/const7.C: Allow symbols in .srodata.

---

I didn't actually re-run the test suite, as I was poking around with
something else.  This one seems pretty trivial, though.  Happy to do so
before committing, but figured I'd send it out anyway in case anyone
else is triaging our bugs.
---
 gcc/testsuite/g++.dg/opt/const7.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/opt/const7.C b/gcc/testsuite/g++.dg/opt/const7.C
index 5bcf94897a8..8bbd9db973f 100644
--- a/gcc/testsuite/g++.dg/opt/const7.C
+++ b/gcc/testsuite/g++.dg/opt/const7.C
@@ -4,4 +4,4 @@
 
 struct B { B()=default; };
 static const B b_var;  //  { dg-bogus "" }
-// { dg-final { scan-assembler-symbol-section {b_var} {^\.(const|rodata)|\[RO\]} } }
+// { dg-final { scan-assembler-symbol-section {b_var} {^\.(const|rodata|srodata)|\[RO\]} } }
-- 
2.34.1



Re: [PATCH] Asan changes for RISC-V.

2022-04-20 Thread Kito Cheng via Gcc-patches
Hi Joshua:

> Does Asan work for RISC-V currently? It seems that '-fsanitize=address' is 
> still unsupported for RISC-V. If I add '--enable-libsanitizer' in Makefile.in 
> to reconfigure, there are compiling errors.
Is it because # libsanitizer not supported rv32, but it will break the
rv64 multi-lib build, so we disable that temporally until rv32
supported# in Makefile.in?

IIUC, you mean the Makefile in riscv-gnu-toolchain rather than upstream
GCC, right? I guess we could add a configure option to enable it and
check that it is not combined with multilib, or maybe you could fix
GCC's configure script so that the multilib build skips libsanitizer
for rv32?


On Wed, Apr 20, 2022 at 2:13 PM joshua via Gcc-patches wrote:
>
> Does Asan work for RISC-V currently? It seems that '-fsanitize=address' is 
> still unsupported for RISC-V. If I add '--enable-libsanitizer' in Makefile.in 
> to reconfigure, there are compiling errors.
> Is it because # libsanitizer not supported rv32, but it will break the rv64 
> multi-lib build, so we disable that temporally until rv32 supported# in 
> Makefile.in?
>
>
> --
> From: Jim Wilson
> Sent: Thursday, October 29, 2020 07:59
> To: gcc-patches
> Cc: cooper.joshua ; Jim Wilson
> Subject: [PATCH] Asan changes for RISC-V.
>
> We have only riscv64 asan support, there is no riscv32 support as yet.  So I
> need to be able to conditionally enable asan support for the riscv target.  I
> implemented this by returning zero from the asan_shadow_offset function.  This
> requires a change to toplev.c and docs in target.def.
>
> The asan support works on a 5.5 kernel, but does not work on a 4.15 kernel.
> The problem is that the asan high memory region is a small wedge below
> 0x40.  The new kernel puts shared libraries at 0x3f and going
> down which works.  But the old kernel puts shared libraries at 0x20
> and going up which does not work, as it isn't in any recognized memory
> region.  This might be fixable with more asan work, but we don't really need
> support for old kernel versions.
>
> The asan port is curious in that it uses 1<<29 for the shadow offset, but all
> other 64-bit targets use a number larger than 1<<32.  But what we have is
> working OK for now.
>
> I did a make check RUNTESTFLAGS="asan.exp" on Fedora rawhide image running on
> qemu and the results look reasonable.
>
>   === gcc Summary ===
>
> # of expected passes  1905
> # of unexpected failures 11
> # of unsupported tests  224
>
>   === g++ Summary ===
>
> # of expected passes  2002
> # of unexpected failures 6
> # of unresolved testcases 1
> # of unsupported tests  175
>
> OK?
>
> Jim
>
> 2020-10-28  Jim Wilson  
>
>  gcc/
>  * config/riscv/riscv.c (riscv_asan_shadow_offset): New.
>  (TARGET_ASAN_SHADOW_OFFSET): New.
>  * doc/tm.texi: Regenerated.
>  * target.def (asan_shadow_offset): Mention that it can return zero.
>  * toplev.c (process_options): Check for and handle zero return from
>  targetm.asan_shadow_offset call.
>
> Co-Authored-By: cooper.joshua 
> ---
>  gcc/config/riscv/riscv.c | 16 
>  gcc/doc/tm.texi  |  3 ++-
>  gcc/target.def   |  3 ++-
>  gcc/toplev.c |  3 ++-
>  4 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 989a9f15250..6909e200de1 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -5299,6 +5299,19 @@ riscv_gpr_save_operation_p (rtx op)
>return true;
>  }
>
> +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
> +
> +static unsigned HOST_WIDE_INT
> +riscv_asan_shadow_offset (void)
> +{
> +  /* We only have libsanitizer support for RV64 at present.
> +
> + This number must match kRiscv*_ShadowOffset* in the file
> + libsanitizer/asan/asan_mapping.h which is currently 1<<29 for rv64,
> + even though 1<<36 makes more sense.  */
> +  return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0;
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> @@ -5482,6 +5495,9 @@ riscv_gpr_save_operation_p (rtx op)
>  #undef TARGET_NEW_ADDRESS_PROFITABLE_P
>  #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
>
> +#undef TARGET_ASAN_SHADOW_OFFSET
> +#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
>  #include "gt-riscv.h"
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 24c37f655c8..39c596b647a 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12078,7 +12078,8 @@ is zero, which disables this optimization.
>  @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_ASAN_SHADOW_OFFSET (void)
>  Return the offset bitwise ored into shifted address to get corresponding
>  Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not
> -supported by the target.
> +supported by the target.  May 

Re: Re: [PATCH] Asan changes for RISC-V.

2022-04-20 Thread Kito Cheng via Gcc-patches
Arm 32, x86 (32) and MIPS have Asan support [1], so we could use
their implementations as a reference, but I guess the problem is that
we need someone to do the work.

[1] https://github.com/llvm/llvm-project/blob/main/compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake#L28

On Thu, Apr 21, 2022 at 7:54 AM Palmer Dabbelt  wrote:
>
> On Tue, 19 Apr 2022 23:13:15 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> > Does Asan work for RISC-V currently? It seems that '-fsanitize=address' is 
> > still unsupported for RISC-V. If I add '--enable-libsanitizer' in 
> > Makefile.in to reconfigure, there are compiling errors.
> > Is it because # libsanitizer not supported rv32, but it will break the rv64 
> > multi-lib build, so we disable that temporally until rv32 supported# in 
> > Makefile.in?
>
> Not quite sure what's going on here, I keep getting copies of this
> message that look empty in gmail.
>
> I was under the impression that asan worked on rv64, but remember there
> being some worrisome constants floating around (as Jim alludes to in the
> forwarded patch).  As far as I can tell there's no libsanitizer support
> for rv32 (upstream is at LLVM), probably because we didn't have a stable
> uABI back then.  It's not super hard to do a libsanitizer port, but I
> don't see any other 32-bit targets with asan so either I'm missing
> something or it's tricky (and we don't have much free VA space, so not
> sure if it'd even run anything useful).
>

Re: [PATCH v4] libgo: Don't use pt_regs member in mcontext_t

2022-04-20 Thread Ian Lance Taylor via Gcc-patches
On Thu, Apr 14, 2022 at 3:15 PM Ian Lance Taylor  wrote:
>
> Thanks!  I tested a version of that code with glibc, and it works
> there too, so I've committed this patch after testing on
> powerpc-linux-gnu and x86_64-linux-gnu.  Please let me know about any
> problems.

Well, that patch broke PPC 32-bit, as reported in PR 105315, so I've
committed this one.  Tested on powerpc-linux-gnu, powerpc64-linux-gnu,
powerpc64le-linux-gnu, all with glibc.  I hope that it doesn't break
musl again.
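
To make the two #if branches in getSiginfo easier to follow, here is a sketch using hypothetical, heavily simplified stand-ins for the 32-bit PowerPC ucontext layouts that the patch distinguishes: with glibc the registers are reached through the uc_mcontext.uc_regs pointer, while the other (musl) branch reads uc_mcontext.gregs directly. The struct names and the register-array size are invented for illustration and are not the real libc definitions.

```cpp
#include <cassert>
#include <cstdint>

// Simplified register block; gregs[32] holds the interrupted PC, as in
// the patch.  The array size is illustrative only.
struct MContext {
  std::uint32_t gregs[48];
};

// glibc-style: uc_mcontext is a union holding a pointer to the registers.
struct GlibcStyleUContext {
  union {
    void* regs;           // stands in for struct pt_regs *
    MContext* uc_regs;
  } uc_mcontext;
};

// musl-style: uc_mcontext is the register block itself.
struct MuslStyleUContext {
  MContext uc_mcontext;
};

// Mirrors the __GLIBC__ branch.
std::uint32_t sigpc(const GlibcStyleUContext* uc) {
  return uc->uc_mcontext.uc_regs->gregs[32];
}

// Mirrors the #else branch.
std::uint32_t sigpc(const MuslStyleUContext* uc) {
  return uc->uc_mcontext.gregs[32];
}
```

The extra pointer hop in the glibc case is why the same expression cannot serve both C libraries on 32-bit PowerPC.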

Ian
8e14028002a661be19619ee8df081b713a8ec4a5
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 63238715bd0..ef20a0aafd6 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-99ca6be406a5781be078ff23f45a72b4c84b16e3
+70ca85f08edf63f46c87d540fa99c45e2903edc2
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c
index 2caddd068d6..528d9b6d9fe 100644
--- a/libgo/runtime/go-signal.c
+++ b/libgo/runtime/go-signal.c
@@ -233,7 +233,11 @@ getSiginfo(siginfo_t *info, void *context __attribute__((unused)))
 #elif defined(__PPC64__) && defined(__linux__)
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gp_regs[32];
 #elif defined(__PPC__) && defined(__linux__)
+# if defined(__GLIBC__)
+   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.uc_regs->gregs[32];
+# else
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32];
+# endif
 #elif defined(__PPC__) && defined(_AIX)
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.jmp_context.iar;
 #elif defined(__aarch64__) && defined(__linux__)
@@ -344,12 +348,13 @@ dumpregs(siginfo_t *info __attribute__((unused)), void *context __attribute__((u
runtime_printf("sp  %X\n", m->sc_regs[30]);
runtime_printf("pc  %X\n", m->sc_pc);
  }
-#elif defined(__PPC__) && defined(__LITTLE_ENDIAN__) && defined(__linux__)
+#elif defined(__PPC__) && defined(__linux__)
  {
-   mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
int i;
 
-#if defined(__PPC64__)
+# if defined(__PPC64__)
+   mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
+
for (i = 0; i < 32; i++)
runtime_printf("r%d %X\n", i, m->gp_regs[i]);
runtime_printf("pc  %X\n", m->gp_regs[32]);
@@ -358,16 +363,22 @@ dumpregs(siginfo_t *info __attribute__((unused)), void *context __attribute__((u
runtime_printf("lr  %X\n", m->gp_regs[36]);
runtime_printf("ctr %X\n", m->gp_regs[35]);
runtime_printf("xer %X\n", m->gp_regs[37]);
-#else
+# else
+#  if defined(__GLIBC__)
+   mcontext_t *m = ((ucontext_t*)(context))->uc_mcontext.uc_regs;
+#  else
+   mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
+#  endif
+
for (i = 0; i < 32; i++)
-   runtime_printf("r%d %X\n", i, m->gregs[i]);
-   runtime_printf("pc  %X\n", m->gregs[32]);
-   runtime_printf("msr %X\n", m->gregs[33]);
-   runtime_printf("cr  %X\n", m->gregs[38]);
-   runtime_printf("lr  %X\n", m->gregs[36]);
-   runtime_printf("ctr %X\n", m->gregs[35]);
-   runtime_printf("xer %X\n", m->gregs[37]);
-#endif
+   runtime_printf("r%d %x\n", i, m->gregs[i]);
+   runtime_printf("pc  %x\n", m->gregs[32]);
+   runtime_printf("msr %x\n", m->gregs[33]);
+   runtime_printf("cr  %x\n", m->gregs[38]);
+   runtime_printf("lr  %x\n", m->gregs[36]);
+   runtime_printf("ctr %x\n", m->gregs[35]);
+   runtime_printf("xer %x\n", m->gregs[37]);
+# endif
  }
 #elif defined(__PPC__) && defined(_AIX)
  {


Re: Re: [PATCH] Asan changes for RISC-V.

2022-04-20 Thread Palmer Dabbelt

On Tue, 19 Apr 2022 23:13:15 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

Does Asan work for RISC-V currently? It seems that '-fsanitize=address' is 
still unsupported for RISC-V. If I add '--enable-libsanitizer' in Makefile.in 
to reconfigure, there are compiling errors.
Is it because # libsanitizer not supported rv32, but it will break the rv64 
multi-lib build, so we disable that temporally until rv32 supported# in 
Makefile.in?


Not quite sure what's going on here, I keep getting copies of this 
message that look empty in gmail.


I was under the impression that asan worked on rv64, but remember there 
being some worrisome constants floating around (as Jim alludes to in the 
forwarded patch).  As far as I can tell there's no libsanitizer support 
for rv32 (upstream is at LLVM), probably because we didn't have a stable 
uABI back then.  It's not super hard to do a libsanitizer port, but I 
don't see any other 32-bit targets with asan so either I'm missing 
something or it's tricky (and we don't have much free VA space, so not 
sure if it'd even run anything useful).



--
From: Jim Wilson
Sent: Thursday, October 29, 2020 07:59
To: gcc-patches
Cc: cooper.joshua ; Jim Wilson
Subject: [PATCH] Asan changes for RISC-V.

We have only riscv64 asan support, there is no riscv32 support as yet.  So I
need to be able to conditionally enable asan support for the riscv target.  I
implemented this by returning zero from the asan_shadow_offset function.  This
requires a change to toplev.c and docs in target.def.
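
A hypothetical model of this mechanism: a hook that returns 0 means "no ASan for this subtarget", and option processing then turns -fsanitize=address off. TargetHooks, Options, process_asan_option and the printed note are invented for illustration; they are not the actual toplev.c code or its diagnostics.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical hook table; a null hook also means "unsupported".
struct TargetHooks {
  std::uint64_t (*asan_shadow_offset)(bool is_64bit);
};

// Shaped like the RISC-V hook in this patch: RV64 only.
std::uint64_t riscv_like_shadow_offset(bool is_64bit) {
  return is_64bit ? (std::uint64_t{1} << 29) : 0;
}

struct Options {
  bool sanitize_address;
  bool is_64bit;
};

// A missing hook and a zero return are both treated as "unsupported".
bool asan_supported(const TargetHooks& hooks, bool is_64bit) {
  return hooks.asan_shadow_offset != nullptr &&
         hooks.asan_shadow_offset(is_64bit) != 0;
}

// Illustrative stand-in for the check added to process_options.
void process_asan_option(Options& opts, const TargetHooks& hooks) {
  if (opts.sanitize_address && !asan_supported(hooks, opts.is_64bit)) {
    std::fprintf(stderr, "note: address sanitizer disabled for this target\n");
    opts.sanitize_address = false;
  }
}
```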

The asan support works on a 5.5 kernel, but does not work on a 4.15 kernel.
The problem is that the asan high memory region is a small wedge below
0x40.  The new kernel puts shared libraries at 0x3f and going
down which works.  But the old kernel puts shared libraries at 0x20
and going up which does not work, as it isn't in any recognized memory
region.  This might be fixable with more asan work, but we don't really need
support for old kernel versions.

The asan port is curious in that it uses 1<<29 for the shadow offset, but all
other 64-bit targets use a number larger than 1<<32.  But what we have is
working OK for now.
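
For reference, the usual ASan mapping computes a shadow address as (Addr >> 3) + Offset, so one shadow byte describes 8 bytes of application memory. The sketch below plugs in the 1<<29 RV64 offset from this patch; the helper names are hypothetical. It also makes the 32-bit address-space concern concrete: covering a full 4 GiB space needs 512 MiB of shadow.

```cpp
#include <cassert>
#include <cstdint>

// One shadow byte describes 2^3 = 8 bytes of application memory.
constexpr unsigned kShadowScale = 3;
// The RV64 value returned by riscv_asan_shadow_offset in this patch.
constexpr std::uint64_t kRv64ShadowOffset = std::uint64_t{1} << 29;

// Hypothetical helper: shadow address for an application address.
constexpr std::uint64_t mem_to_shadow(std::uint64_t addr, std::uint64_t offset) {
  return (addr >> kShadowScale) + offset;
}

// Hypothetical helper: shadow bytes needed to cover 2^va_bits bytes.
constexpr std::uint64_t shadow_bytes_for(unsigned va_bits) {
  assert(va_bits < 64);  // keep the shift well defined
  return (std::uint64_t{1} << va_bits) >> kShadowScale;
}
```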

I did a make check RUNTESTFLAGS="asan.exp" on Fedora rawhide image running on
qemu and the results look reasonable.

  === gcc Summary ===

# of expected passes  1905
# of unexpected failures 11
# of unsupported tests  224

  === g++ Summary ===

# of expected passes  2002
# of unexpected failures 6
# of unresolved testcases 1
# of unsupported tests  175

OK?

Jim

2020-10-28  Jim Wilson  

 gcc/
 * config/riscv/riscv.c (riscv_asan_shadow_offset): New.
 (TARGET_ASAN_SHADOW_OFFSET): New.
 * doc/tm.texi: Regenerated.
 * target.def (asan_shadow_offset): Mention that it can return zero.
 * toplev.c (process_options): Check for and handle zero return from
 targetm.asan_shadow_offset call.

Co-Authored-By: cooper.joshua 
---
 gcc/config/riscv/riscv.c | 16 
 gcc/doc/tm.texi  |  3 ++-
 gcc/target.def   |  3 ++-
 gcc/toplev.c |  3 ++-
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 989a9f15250..6909e200de1 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -5299,6 +5299,19 @@ riscv_gpr_save_operation_p (rtx op)
   return true;
 }

+/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
+
+static unsigned HOST_WIDE_INT
+riscv_asan_shadow_offset (void)
+{
+  /* We only have libsanitizer support for RV64 at present.
+
+ This number must match kRiscv*_ShadowOffset* in the file
+ libsanitizer/asan/asan_mapping.h which is currently 1<<29 for rv64,
+ even though 1<<36 makes more sense.  */
+  return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -5482,6 +5495,9 @@ riscv_gpr_save_operation_p (rtx op)
 #undef TARGET_NEW_ADDRESS_PROFITABLE_P
 #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p

+#undef TARGET_ASAN_SHADOW_OFFSET
+#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
+
 struct gcc_target targetm = TARGET_INITIALIZER;

 #include "gt-riscv.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 24c37f655c8..39c596b647a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12078,7 +12078,8 @@ is zero, which disables this optimization.
 @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_ASAN_SHADOW_OFFSET (void)
 Return the offset bitwise ored into shifted address to get corresponding
 Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not
-supported by the target.
+supported by the target.  May return 0 if Address Sanitizer is not supported
+by a subtarget.
 @end deftypefn

 @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_MEMMODEL_CHECK (unsigned HOST_WIDE_INT @var{val})
diff --git 

[PATCH] c++: wrong error with constexpr COMPOUND_EXPR [PR105321]

2022-04-20 Thread Marek Polacek via Gcc-patches
Here we issue a bogus error for the first assert in the test.  Therein
we have

 = (void) (VIEW_CONVERT_EXPR(yes) || handle_error ());, VIEW_CONVERT_EXPR(value);

which has a COMPOUND_EXPR, so we get to cxx_eval_constant_expression.
The problem here is that we call

  /* Check that the LHS is constant and then discard it.  */
  cxx_eval_constant_expression (ctx, op0,
                                true, non_constant_p, overflow_p,
                                jump_target);

where the lval argument is always true, so the PARM_DECL 'yes' is not
evaluated to its value.  r218832 changed the argument for 'lval' from false to true:

(cxx_eval_constant_expression) [COMPOUND_EXPR]: Pass true for lval.

but I think we want to pass 'lval' instead.  Jakub tells me that's what
we do for "(void) expr" as well.  [expr.comma] says that the left expression
is a discarded-value expression, but [expr.context] doesn't suggest that
we should always pass false for lval, as we did before r218832.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11.3?

PR c++/105321

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression) : Pass
lval to cxx_eval_constant_expression.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-105321.C: New test.
---
 gcc/cp/constexpr.cc   |  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-105321.C | 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-105321.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index e89440e770f..28271d4405d 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -7043,7 +7043,7 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
  {
/* Check that the LHS is constant and then discard it.  */
cxx_eval_constant_expression (ctx, op0,
- true, non_constant_p, overflow_p,
+ lval, non_constant_p, overflow_p,
  jump_target);
if (*non_constant_p)
  return t;
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-105321.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-105321.C
new file mode 100644
index 000..adb6830ff22
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-105321.C
@@ -0,0 +1,18 @@
+// PR c++/105321
+// { dg-do compile { target c++11 } }
+
+bool handle_error();
+
+constexpr int echo(int value, bool yes = true) noexcept
+{
+return (yes || handle_error()), value;
+}
+
+static_assert(echo(10) == 10, "");
+
+constexpr int echo2(int value, bool no = false) noexcept
+{
+return (!no || handle_error()), value;
+}
+
+static_assert(echo2(10) == 10, "");

base-commit: 5bde80f48bcc594658c788895ad1fd86d0916fc2
-- 
2.35.1



Re: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-04-20 Thread will schmidt via Gcc-patches
On Tue, 2022-04-12 at 21:14 -0400, Michael Meissner wrote:
> Eliminate power8 fusion options, use power8 tuning, PR target/102059
> 
> This is V4 of the patch.  Compared to V3 of the patch, GCC will just
> ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign.
> 


Hi, 
No comments on code, a few comments about the comments below.



> The splitting of signed halfword and word loads into unsigned load and
> sign extension is now suppressed with -Os, but it is done normally if we
> are not optimizing for space.

I see references to TARGET_P8_FUSION_SIGN in the patch below, and some
removal of old code.  I assume this describes the implementation that
remains.  

> 
> The power8 fusion support used to be set automatically when -mcpu=power8 or
> -mtune=power8 was used, and it was cleared for other cpu's.  However, if you
> used the target attribute or target #pragma to change the default cpu type or
> tuning, you would get an error that a target specifiction option mismatch
> occurred.

specification.  :-)

> 
> This occurred because the rs6000_can_inline_p function just compares the ISA
> bits between the called inline function and the caller.  If the ISA flags of
> the called function is not a subset of the ISA flags of the caller, we won't do
> the inlinging.  When a power9 or power10 function inlines a function that is
> explicitly compiled for power8, the power8 function has the power8 fusion bits
> set and the power9 or power10 functions do not have the fusion bits set.

inlining. 
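
The subset rule quoted above can be sketched as a bitmask test. The flag names and bit values here are hypothetical, and the real rs6000_can_inline_p does more than this single check; the sketch only shows why a fusion bit carried with the ISA flags of a power8 callee, but absent from a power9 caller, blocks inlining.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits; the real OPTION_MASK_* values are different.
constexpr std::uint64_t kPower8Isa = std::uint64_t{1} << 0;
constexpr std::uint64_t kPower9Isa = std::uint64_t{1} << 1;
constexpr std::uint64_t kP8FusionFlag = std::uint64_t{1} << 2;

// The callee may be inlined only if its flags are a subset of the
// caller's: no bit may be set in the callee that is clear in the caller.
constexpr bool flags_allow_inline(std::uint64_t caller, std::uint64_t callee) {
  return (callee & ~caller) == 0;
}

// Old behavior: the power8 callee also carried the fusion flag.
static_assert(!flags_allow_inline(kPower8Isa | kPower9Isa,
                                  kPower8Isa | kP8FusionFlag),
              "fusion flag blocks inlining");
// With fusion no longer carried as an ISA flag, the same pair passes.
static_assert(flags_allow_inline(kPower8Isa | kPower9Isa, kPower8Isa),
              "power8 callee inlines into power9 caller");
```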


> 
> This code makes the -mpower8-fusion option a nop.  It is accepted without
> warning, but it does nothing.  Power8 fusion is only enabled if we are tuning
> for a power8.
> 
> The undocumented -mpower8-fusion-sign option is also made into a nop.
> 
> I left in the pragma target and attribute target support for power8-fusion, but
> using it doesn't do anything now.  This is because I told the customer who
> encountered this problem that one solution was to add an explicit
> no-power8-fusion option in their target pragma or attribute to work around the
> problem.
> 
> I have tested this patch on a little endian power10 system.  I have tested
> previous versions on little endian power9 and big endian power8 systems.
> Can I apply this patch to the master branch?
> 
> If it is accepted, I will produce a similar patch for back porting to GCC 11
> and GCC 10.
> 
> 2022-04-12   Michael Meissner  
> 
> gcc/
>   PR target/102059
>   * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete.
>   (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks.
>   (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION.
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Delete code that set the power8 fusion options automatically.
>   (rs6000_opt_masks): Allow #pragma target and attribute target
>   power8-fusion option for backwards compatibility.
>   (rs6000_print_options_internal): Skip printing backward
>   compatibility options that are just ignored.
>   * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro.
>   (TARGET_P8_FUSION_SIGN): Likewise.
>   (MASK_P8_FUSION): Delete.
>   * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but
>   ignore it completely.
>   (-mpower8-fusion-sign): Likewise.
>   * doc/invoke.texi (RS/6000 and PowerPC Options): Delete
>   -mpower8-fusion.
> 
> gcc/testsuite/
>   PR target/102059
>   * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion.
>   * gcc.dg/lto/pr102059-2_0.c: Likewise.
>   * gcc.target/powerpc/pr102059-3.c: Likewise.
>   * gcc.target/powerpc/pr102059-4.c: New test.
> ---
>  gcc/config/rs6000/rs6000-cpus.def | 18 +++
>  gcc/config/rs6000/rs6000.cc   | 49 +--
>  gcc/config/rs6000/rs6000.h| 13 -
>  gcc/config/rs6000/rs6000.opt  |  8 +--
>  gcc/doc/invoke.texi   | 13 +
>  gcc/testsuite/gcc.dg/lto/pr102059-1_0.c   |  2 +-
>  gcc/testsuite/gcc.dg/lto/pr102059-2_0.c   |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-3.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 +
>  9 files changed, 62 insertions(+), 68 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
> index 963947f6939..d913a3d6b73 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -54,19 +54,14 @@
>| OPTION_MASK_QUAD_MEMORY  \
>| OPTION_MASK_QUAD_MEMORY_ATOMIC)
> 
> -/* ISA masks setting fusion options.  */
> -#define OTHER_FUSION_MASKS   (OPTION_MASK_P8_FUSION  \
> -  | OPTION_MASK_P8_FUSION_SIGN)
> -
>  /* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not 

Re: [PATCH] PR fortran/105310 - ICE when UNION is after the 8th field in a DEC STRUCTURE with -finit-derived -finit-local-zero

2022-04-20 Thread Harald Anlauf via Gcc-patches

Hi Fritz,

Am 20.04.22 um 20:03 schrieb Fritz Reese via Fortran:

See the bug report at gcc dot gnu dot org/bugzilla/show_bug.cgi?id=105310 .

This code was originally authored by me and the fix is trivial, so I
intend to commit the attached patch in the next few days if there is
no dissent.


OK if you add a/the testcase.



The bug is caused by gfc_conv_union_initializer in
gcc/fortran/trans-expr.cc, which accepts a pointer to a vector of
constructor trees (vec*) as an argument, then
appends one or two field constructors to the vector. The problem is
the use of CONSTRUCTOR_APPEND_ELT(v, ...) within
gfc_conv_union_initializer, which modifies the vector pointer v when a
reallocation of the vector occurs, but the pointer is passed by value.
Therefore, when a vector reallocation occurs, the caller's
(gfc_conv_structure) vector pointer is not updated and subsequently
points to freed memory. Chaos ensues.

The bug only occurs when gfc_conv_union_initializer itself triggers
the reallocation, which is whenever the vector is "full"
(v->m_vecpfx.m_alloc == v->m_vecpfx.m_num). Since the vector defaults
to allocating 8 elements and doubles in size for every reallocation,
the bug only occurs when there are 8, 16, 32, etc... fields with
initializers prior to the union, causing the vector of constructors to
be resized when entering gfc_conv_union_initializer. The
-finit-derived and -finit-local-zero options together ensure each
field has an initializer, triggering the bug.

The patch fixes the bug by passing the vector pointer to
gfc_conv_union_initializer by reference, matching the signature of
vec_safe_push from within the CONSTRUCTOR_APPEND_ELT macro.
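
A minimal C++ sketch of the failure mode described above, assuming a hypothetical IntVec that is replaced as a whole object when it grows (8 slots, then doubling, as in the description). safe_push and append_union_fields are illustrative analogues of vec_safe_push and gfc_conv_union_initializer, not GCC code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical growable vector.  Growing it allocates a new IntVec and
// deletes the old one, so any pointer to the old object becomes stale.
struct IntVec {
  std::size_t alloc = 0;  // capacity
  std::size_t num = 0;    // elements in use
  std::unique_ptr<int[]> elems;
};

// Analogue of vec_safe_push: takes the pointer by reference so that a
// reallocation is visible to the caller.  Starts at 8 slots and doubles.
void safe_push(IntVec*& v, int x) {
  if (v == nullptr || v->num == v->alloc) {
    IntVec* grown = new IntVec;
    grown->alloc = v ? v->alloc * 2 : 8;
    grown->elems.reset(new int[grown->alloc]);
    if (v != nullptr) {
      for (std::size_t i = 0; i < v->num; ++i)
        grown->elems[i] = v->elems[i];
      grown->num = v->num;
      delete v;
    }
    v = grown;
  }
  assert(v->num < v->alloc);
  v->elems[v->num++] = x;
}

// Analogue of the fixed gfc_conv_union_initializer.  With the old shape,
// `void append_union_fields(IntVec* v)`, a reallocation inside would update
// only the local copy, leaving the caller with a pointer to deleted memory.
void append_union_fields(IntVec*& v) {
  safe_push(v, 100);
  safe_push(v, 200);
}
```

Filling exactly 8 slots first and then calling append_union_fields reproduces the "full vector on entry" condition that triggers the reallocation.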

--
Fritz Reese


As this affects all branches, you may backport the patch as far as
you feel reasonable.  (No, I do not use DEC extensions personally.)

Thanks for the patch!

Harald


Re: Ping: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-04-20 Thread Peter Bergner via Gcc-patches
On 4/20/22 11:01 AM, Michael Meissner wrote:
> Ping patch.
> 
> | Date: Tue, 12 Apr 2022 21:14:55 -0400
> | From: Michael Meissner 
> | Subject: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059
> | Message-ID: 
> 
> I feel this is an important patch.  Please look at it and approve the patch or
> give me feedback on how to change it.  Note, I will be in today (April 20th)
> and tomorrow (April 21st), but I will be away from a computer on April 22-25
> (Friday through Monday).

I agree this is important and we want this in so we can get it backported.
I'm being pinged about this by a customer who is using GCC 10 and this issue
is holding them back, so the quicker we get this in, the better.

Peter




Re: [PATCH] emit-rtl: Fix -fcompare-debug bug with label references in debug insns [PR105203]

2022-04-20 Thread Richard Biener via Gcc-patches



> On 20.04.2022 at 18:52, Jakub Jelinek via Gcc-patches wrote:
> 
> Hi!
> 
> When we compute LABEL_NUSES from scratch, mark_all_labels doesn't call
> mark_jump_label on DEBUG_INSNs:
>  if (NONDEBUG_INSN_P (insn))
>mark_jump_label (PATTERN (insn), insn, 0);
> and so doesn't increment LABEL_NUSES from references in DEBUG_INSNs.
> But, when we call emit_copy_of_insn_after e.g. when duplicating some
> DEBUG_INSNs, we call it even on those, which then results in LABEL_NUSES
> differences and -fcompare-debug failures.
> 
> The following patch makes sure we don't call it on DEBUG_INSNs.
> 
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

Ok

Richard 
> 
> 2022-04-20  Jakub Jelinek  
> 
>PR debug/105203
>* emit-rtl.cc (emit_copy_of_insn_after): Don't call mark_jump_label
>on DEBUG_INSNs.
> 
>* gfortran.dg/g77/pr105203.f: New test.
> 
> --- gcc/emit-rtl.cc.jj  2022-02-23 09:17:04.805125253 +0100
> +++ gcc/emit-rtl.cc  2022-04-20 10:26:44.972198107 +0200
> @@ -6440,7 +6440,8 @@ emit_copy_of_insn_after (rtx_insn *insn,
> }
> 
>   /* Update LABEL_NUSES.  */
> -  mark_jump_label (PATTERN (new_rtx), new_rtx, 0);
> +  if (NONDEBUG_INSN_P (insn))
> +mark_jump_label (PATTERN (new_rtx), new_rtx, 0);
> 
>   INSN_LOCATION (new_rtx) = INSN_LOCATION (insn);
> 
> --- gcc/testsuite/gfortran.dg/g77/pr105203.f.jj  2022-04-20 10:29:44.830696254 +0200
> +++ gcc/testsuite/gfortran.dg/g77/pr105203.f  2022-04-20 10:31:13.532463772 +0200
> @@ -0,0 +1,20 @@
> +C Test case for PR debug/105203
> +C Origin: kmcca...@princeton.edu
> +C
> +C { dg-do compile }
> +C { dg-options "-O2 -fcompare-debug -ftracer -w" }
> +C { dg-additional-options "-fPIC" { target fpic } }
> +  SUBROUTINE FOO (B)
> +
> +  10  CALL BAR (A)
> +  ASSIGN 20 TO M
> +  IF (100.LT.A) GOTO 10
> +  GOTO 40
> +C
> +  20  IF (B.LT.ABS(A)) GOTO 10
> +  ASSIGN 30 TO M
> +  GOTO 40
> +C
> +  30  ASSIGN 10 TO M
> +  40  GOTO M,(10,20,30)
> +  END
> 
>Jakub
> 


Re: [PATCH] opts: Disable -gstatement-frontiers by default [PR103788]

2022-04-20 Thread Richard Biener via Gcc-patches



> On 20.04.2022 at 19:15, Jakub Jelinek via Gcc-patches wrote:
> 
> Hi!
> 
> As mentioned in those PRs and I think in others too, there are some long-standing
> unresolved -fcompare-debug issues with DEBUG_BEGIN_STMTs in the FEs and
> during gimplification, especially with statement expressions, where we end
> up with different code generation depending on whether there are
> DEBUG_BEGIN_STMTs (which force STATEMENT_LISTs) or not (in that case
> we often have just the single expression from the list).
> I've tried to fix that several times, but nothing worked.
> Furthermore, Alex mentioned in bugzilla that there are no consumers of the
> statement frontiers right now.
> 
> This patch turns -gstatement-frontiers off by default for those two
> reasons.  Consumers can still be added (one can test with explicit
> -gstatement-frontiers), and if/once that happens, perhaps somebody
> will have some great idea how to resolve those -fcompare-debug issues.
> 
> Until then, can we go with this?
> 
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk if it also passes
> bootstrap/regtest on x86_64-linux/i686-linux?

OK.

Richard.

> 2022-04-20  Jakub Jelinek  
> 
>PR debug/103788
>PR middle-end/100733
>PR debug/104180
>* opts.cc (finish_options): Disable -gstatement-frontiers by default.
> 
>* gcc.dg/pr103788.c: New test.
>* c-c++-common/ubsan/pr100733.c: New test.
>* g++.dg/debug/pr104180.C: New test.
> 
> --- gcc/opts.cc.jj  2022-04-06 17:42:03.084190238 +0200
> +++ gcc/opts.cc  2022-04-20 13:12:22.282322920 +0200
> @@ -1317,12 +1317,16 @@ finish_options (struct gcc_options *opts
>debug_info_level = DINFO_LEVEL_NONE;
> }
> 
> +  /* Don't enable -gstatement-frontiers by default until some consumers
> + actually consume it and until the issues with DEBUG_BEGIN_STMTs
> + affecting code generation e.g. for statement expressions are resolved.
> + See PR103788, PR104180, PR100733.
>   if (!OPTION_SET_P (debug_nonbind_markers_p))
> debug_nonbind_markers_p
>   = (optimize
> && debug_info_level >= DINFO_LEVEL_NORMAL
> && dwarf_debuginfo_p ()
> - && !(flag_selective_scheduling || flag_selective_scheduling2));
> + && !(flag_selective_scheduling || flag_selective_scheduling2));  */
> 
>   /* Note -fvar-tracking is enabled automatically with OPT_LEVELS_1_PLUS and
>  so we need to drop it if we are called from optimize attribute.  */
> --- gcc/testsuite/gcc.dg/pr103788.c.jj  2022-04-20 13:13:47.253141338 +0200
> +++ gcc/testsuite/gcc.dg/pr103788.c  2022-04-20 13:13:29.301390970 +0200
> @@ -0,0 +1,28 @@
> +/* PR debug/103788 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fcompare-debug" } */
> +
> +int
> +bar (void);
> +
> +int
> +foo (int x)
> +{
> +  int i;
> +
> +  for (i = 0; i <= __INT_MAX__; ++i)
> +x += bar () < (x ? 2 : 1);
> +
> +  return x;
> +}
> +
> +int
> +baz (int x)
> +{
> +  int i;
> +
> +  for (i = 0; i <= __INT_MAX__; ++i)
> +x += bar () < (
> +x ? 2 : 1 );
> +  return x;
> +}
> --- gcc/testsuite/c-c++-common/ubsan/pr100733.c.jj  2022-04-20 13:18:09.135499667 +0200
> +++ gcc/testsuite/c-c++-common/ubsan/pr100733.c  2022-04-20 13:18:43.031028328 +0200
> @@ -0,0 +1,9 @@
> +/* PR middle-end/100733 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug -fdisable-tree-phiopt2" } */
> +
> +int
> +foo (int x)
> +{
> +  return (__builtin_expect (({ x != 0; }) ? 0 : 1, 3) == 0) * -1 << 0;
> +}
> --- gcc/testsuite/g++.dg/debug/pr104180.C.jj  2022-04-20 13:14:51.468248383 +0200
> +++ gcc/testsuite/g++.dg/debug/pr104180.C  2022-04-20 13:15:17.856881425 +0200
> @@ -0,0 +1,14 @@
> +/* PR debug/104180 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fcompare-debug" } */
> +
> +int a[5];
> +
> +void
> +foo (void)
> +{
> +  unsigned int b;
> +
> +  for (b = 3; ; b--)
> +a[b] = ({ a[b + 1]; });
> +}
> 
>Jakub
> 


[PATCH] PR middle-end/98865: Optimize (a>>63)*b as -(a>>63)&b in match.pd.

2022-04-20 Thread Roger Sayle

This patch implements the constant folding optimization(s) described in
PR middle-end/98865, which should help address the serious performance
regression of Botan AES-128/XTS mentioned in PR tree-optimization/98856.
This combines aspects of both Jakub Jelinek's patch in comment #2 and
Andrew Pinski's patch in comment #4, so both are listed as co-authors.

Alas, truth_valued_p is not quite what we want (and tweaking its
definition has undesirable side-effects), so instead this patch
introduces a new zero_one_valued_p predicate based on tree_nonzero_bits
that extends truth_valued_p (which is for Boolean types with single
bit precision).  This is then used to simplify X*Y into X&Y when
both X and Y are zero_one_valued_p, and to simplify X*Y into (-X)&Y when
X is zero_one_valued_p, in both cases replacing an integer multiplication
with a cheaper bit-wise AND.
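As a minimal sketch of why both rewrites are valid, the helpers below apply them to plain unsigned 64-bit integers. The names mul_zero_one, mul_by_zero_one and worth_rewriting are hypothetical and this is not match.pd or GCC code. When x is known to be 0 or 1, -x is either 0 or all ones, so (-x)&y equals x*y; the last helper mirrors the patch's guard, which skips constants with at most one bit set because those multiplications are already a single shift.

```cpp
#include <bitset>
#include <cstdint>

// Rewrite 1 (assumes x and y are each 0 or 1): x * y == x & y.
std::uint64_t mul_zero_one(std::uint64_t x, std::uint64_t y) {
  return x & y;
}

// Rewrite 2 (assumes x is 0 or 1): -x is 0 or all ones, so
// x * y == (-x) & y.  Unsigned negation is well defined (wraps).
std::uint64_t mul_by_zero_one(std::uint64_t x, std::uint64_t y) {
  return (0 - x) & y;
}

// Guard modelled on the patch: only rewrite if y is not a constant,
// or is a constant with more than one bit set (x * 2^k is just a shift).
bool worth_rewriting(bool y_is_constant, std::uint64_t y) {
  return !y_is_constant || std::bitset<64>(y).count() > 1;
}
```

For example, for a signed-to-unsigned a with its top bit set, (a >> 63) is 1, and mul_by_zero_one(1, b) returns b, exactly as (a >> 63) * b would.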

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}, with
no new failures, except for a tweak required to tree-ssa/vrp116.c.
The recently proposed cmove patch ensures the i386 backend continues
to generate identical code for vrp116.c as before.

Ok, either for mainline or when stage 1 reopens?


2022-04-20  Roger Sayle  
Andrew Pinski  
Jakub Jelinek  

gcc/ChangeLog
PR middle-end/98865
* match.pd (match zero_one_valued_p): New predicate.
(mult @0 @1): Use zero_one_valued_p for transforming into (and @0
@1).
(mult zero_one_valued_p@0 @1): Convert integer multiplication into
a negation and a bit-wise AND, if it can't be cheaply implemented by
a single left shift.

gcc/testsuite/ChangeLog
PR middle-end/98865
* gcc.dg/pr98865.c: New test case.
* gcc.dg/tree-ssa/vrp116.c: Tweak test to confirm the integer multiplication
has been eliminated, not to check the actual replacement implementation.

Thanks,
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index 6d691d3..16a1203 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -285,14 +285,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
|| !COMPLEX_FLOAT_TYPE_P (type)))
(negate @0)))
 
-/* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 } */
-(simplify
- (mult SSA_NAME@1 SSA_NAME@2)
-  (if (INTEGRAL_TYPE_P (type)
-   && get_nonzero_bits (@1) == 1
-   && get_nonzero_bits (@2) == 1)
-   (bit_and @1 @2)))
-
 /* Transform x * { 0 or 1, 0 or 1, ... } into x & { 0 or -1, 0 or -1, ...},
unless the target has native support for the former but not the latter.  */
 (simplify
@@ -1789,6 +1781,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bit_not (bit_not @0))
   @0)
 
+(match zero_one_valued_p
+ @0
+ (if (INTEGRAL_TYPE_P (type) && tree_nonzero_bits (@0) == 1)))
+(match zero_one_valued_p
+ truth_valued_p@0)
+
+/* Transform { 0 or 1 } * { 0 or 1 } into { 0 or 1 } & { 0 or 1 } */
+(simplify
+ (mult zero_one_valued_p@0 zero_one_valued_p@1)
+ (if (INTEGRAL_TYPE_P (type))
+  (bit_and @0 @1)))
+
+/* Transform x * { 0 or 1 } into x & { 0 or -1 }, i.e. an integer
+   multiplication into negate/bitwise and.  Don't do this if the
+   multiplication is cheap, may be implemented by a single shift.  */
+(simplify
+ (mult:c zero_one_valued_p@0 @1)
+ (if (INTEGRAL_TYPE_P (type)
+  && (TREE_CODE (@1) != INTEGER_CST
+  || wi::popcount (wi::to_wide (@1)) > 1))
+  (bit_and (negate @0) @1)))
+
 /* Convert ~ (-A) to A - 1.  */
 (simplify
  (bit_not (convert? (negate @0)))
diff --git a/gcc/testsuite/gcc.dg/pr98865.c b/gcc/testsuite/gcc.dg/pr98865.c
new file mode 100644
index 000..e7599d3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98865.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#if __SIZEOF_INT__ == 4
+unsigned int foo(unsigned int a, unsigned int b)
+{
+  return (a >> 31) * b;
+}
+
+int bar(int a, int b)
+{
+  return -(a >> 31) * b;
+}
+
+int baz(int a, int b)
+{
+  int c = a >> 31;
+  int d = -c;
+  return d * b;
+}
+#endif
+
+#if __SIZEOF_LONG_LONG__ == 8
+unsigned long long fool (unsigned long long a, unsigned long long b)
+{
+  return (a >> 63) * b;
+}
+
+long long barl (long long a, long long b)
+{
+  return -(a >> 63) * b;
+}
+
+long long bazl (long long a, long long b)
+{
+  long long c = a >> 63;
+  long long d = -c;
+  return d * b;
+}
+#endif
+
+unsigned int pin (int a, unsigned int b)
+{
+  unsigned int t =  a & 1;
+  return t * b;
+}
+
+unsigned long pinl (long a, unsigned long b)
+{
+  unsigned long t =  a & 1;
+  return t * b;
+}
+
+unsigned long long pinll (long long a, unsigned long long b)
+{
+  unsigned long long t =  a & 1;
+  return t * b;
+}
+
+/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
index 9e68a77..469b232 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
@@ -9,4 +9,5 @@ f (int m1, int m2, int c)
   

RE: [x86 PATCH] Improved V1TI (and V2DI) mode equality/inequality.

2022-04-20 Thread Roger Sayle

Doh! ENOPATCH.

> -Original Message-
> From: Roger Sayle 
> Sent: 20 April 2022 18:50
> To: 'gcc-patches@gcc.gnu.org' 
> Subject: [x86 PATCH] Improved V1TI (and V2DI) mode equality/inequality.
> 
> 
> This patch (for when the compiler returns to stage 1) improves support for
> vector equality and inequality of V1TImode vectors, and V2DImode vectors with
> sse2 but not sse4.  Consider the three functions below:
> 
> typedef unsigned int uv4si __attribute__ ((__vector_size__ (16)));
> typedef unsigned long long uv2di __attribute__ ((__vector_size__ (16)));
> typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
> 
> uv4si eq_v4si(uv4si x, uv4si y) { return x == y; }
> uv2di eq_v2di(uv2di x, uv2di y) { return x == y; }
> uv1ti eq_v1ti(uv1ti x, uv1ti y) { return x == y; }
> 
> These all perform vector comparisons of 128bit SSE2 registers, generating the
> result as a vector, where ~0 (all 1 bits) represents true and a zero represents
> false.  eq_v4si is trivially implemented by x86_64's pcmpeqd instruction.  This
> patch improves the other two cases:
> 
> For v2di, gcc -O2 currently generates:
> 
> movq    %xmm0, %rdx
> movq    %xmm1, %rax
> movdqa  %xmm0, %xmm2
> cmpq    %rax, %rdx
> movhlps %xmm2, %xmm3
> movhlps %xmm1, %xmm4
> sete    %al
> movq    %xmm3, %rdx
> movzbl  %al, %eax
> negq    %rax
> movq    %rax, %xmm0
> movq    %xmm4, %rax
> cmpq    %rax, %rdx
> sete    %al
> movzbl  %al, %eax
> negq    %rax
> movq    %rax, %xmm5
> punpcklqdq  %xmm5, %xmm0
> ret
> 
> but with this patch we now generate:
> 
> pcmpeqd %xmm0, %xmm1
> pshufd  $177, %xmm1, %xmm0
> pand    %xmm1, %xmm0
> ret
> 
> where the results of a V4SI comparison are shuffled and bit-wise ANDed to
> produce the desired result.  There's no change in the code generated for
> "-O2 -msse4" where the compiler generates a single "pcmpeqq" insn.
> 
> For V1TI mode, the results are equally dramatic, where the current -O2 output
> looks like:
> 
> movaps  %xmm0, -40(%rsp)
> movq    -40(%rsp), %rax
> movq    -32(%rsp), %rdx
> movaps  %xmm1, -24(%rsp)
> movq    -24(%rsp), %rcx
> movq    -16(%rsp), %rsi
> xorq    %rcx, %rax
> xorq    %rsi, %rdx
> orq     %rdx, %rax
> sete    %al
> xorl    %edx, %edx
> movzbl  %al, %eax
> negq    %rax
> adcq    $0, %rdx
> movq    %rax, %xmm2
> negq    %rdx
> movq    %rdx, -40(%rsp)
> movhps  -40(%rsp), %xmm2
> movdqa  %xmm2, %xmm0
> ret
> 
> with this patch we now generate:
> 
> pcmpeqd %xmm0, %xmm1
> pshufd  $177, %xmm1, %xmm0
> pand    %xmm1, %xmm0
> pshufd  $78, %xmm0, %xmm1
> pand    %xmm1, %xmm0
> ret
> 
> performing a V2DI comparison, followed by a shuffle and pand, and, with
> -O2 -msse4, takes advantage of SSE4.1's pcmpeqq:
> 
> pcmpeqq %xmm0, %xmm1
> pshufd  $78, %xmm1, %xmm0
> pand    %xmm1, %xmm0
> ret
> 
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32}, with no new
> failures.  Is this OK for when we return to stage 1?
> 
> 
> 2022-04-20  Roger Sayle  
> 
> gcc/ChangeLog
>   * config/i386/sse.md (vec_cmpeqv2div2di): Enable for TARGET_SSE2.
>   For !TARGET_SSE4_1, expand as a V4SI vector comparison, followed
>   by a pshufd and pand.
>   (vec_cmpeqv1tiv1ti): New define_expand implementing V1TImode
>   vector equality as a V2DImode vector comparison (see above),
>   followed by a pshufd and pand.
> 
> gcc/testsuite/ChangeLog
>   * gcc.target/i386/sse2-v1ti-veq.c: New test case.
>   * gcc.target/i386/sse2-v1ti-vne.c: New test case.
> 
> 
> Roger
> --

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a852c16..9bc8fb0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4379,13 +4379,57 @@
(match_operator:V2DI 1 ""
  [(match_operand:V2DI 2 "register_operand")
   (match_operand:V2DI 3 "vector_operand")]))]
-  "TARGET_SSE4_1"
+  "TARGET_SSE2"
 {
-  bool ok = ix86_expand_int_vec_cmp (operands);
+  bool ok;
+  if (!TARGET_SSE4_1)
+{
+  rtx ops[4];
+  ops[0] = gen_reg_rtx (V4SImode);
+  ops[2] = force_reg (V4SImode, gen_lowpart (V4SImode, operands[2]));
+  ops[3] = force_reg (V4SImode, gen_lowpart (V4SImode, operands[3]));
+  ops[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]), V4SImode,
+  ops[2], ops[3]);
+  ok = ix86_expand_int_vec_cmp (ops);
+
+  rtx tmp1 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_sse2_pshufd (tmp1, ops[0], GEN_INT (0xb1)));
+
+  rtx tmp2 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_andv4si3 (tmp2, tmp1, 

[PATCH] PR fortran/105310 - ICE when UNION is after the 8th field in a DEC STRUCTURE with -finit-derived -finit-local-zero

2022-04-20 Thread Fritz Reese via Gcc-patches
See the bug report at gcc dot gnu dot org/bugzilla/show_bug.cgi?id=105310 .

This code was originally authored by me and the fix is trivial, so I
intend to commit the attached patch in the next few days if there is
no dissent.


The bug is caused by gfc_conv_union_initializer in
gcc/fortran/trans-expr.cc, which accepts a pointer to a vector of
constructor trees (vec*) as an argument, then
appends one or two field constructors to the vector. The problem is
the use of CONSTRUCTOR_APPEND_ELT(v, ...) within
gfc_conv_union_initializer, which modifies the vector pointer v when a
reallocation of the vector occurs, but the pointer is passed by value.
Therefore, when a vector reallocation occurs, the caller's
(gfc_conv_structure) vector pointer is not updated and subsequently
points to freed memory. Chaos ensues.

The bug only occurs when gfc_conv_union_initializer itself triggers
the reallocation, which is whenever the vector is "full"
(v->m_vecpfx.m_alloc == v->m_vecpfx.m_num). Since the vector defaults
to allocating 8 elements and doubles in size for every reallocation,
the bug only occurs when there are 8, 16, 32, etc... fields with
initializers prior to the union, causing the vector of constructors to
be resized when entering gfc_conv_union_initializer. The
-finit-derived and -finit-local-zero options together ensure each
field has an initializer, triggering the bug.

The patch fixes the bug by passing the vector pointer to
gfc_conv_union_initializer by reference, matching the signature of
vec_safe_push from within the CONSTRUCTOR_APPEND_ELT macro.
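A minimal sketch of the failure mode and the fix, using a hypothetical MiniVec type in place of GCC's GC-allocated vec (MiniVec, minivec_push and append_union_fields are made-up names for illustration). As the description above says of the real vector, it starts with 8 slots and doubles when full, and growing it replaces the vector object, so any helper that appends must receive the pointer by reference, as vec_safe_push does.

```cpp
#include <cstddef>

// Hypothetical stand-in for a GC-allocated vector: it models only the
// property that matters here, namely that growing it moves the object.
struct MiniVec {
  std::size_t alloc;
  std::size_t num;
  int *elts;
};

MiniVec *minivec_new() {
  // 8 initial slots, matching the default described in the report.
  return new MiniVec{8, 0, new int[8]};
}

void minivec_free(MiniVec *v) {
  delete[] v->elts;
  delete v;
}

// Takes the pointer by reference, like vec_safe_push: when the vector is
// full it is replaced by a copy twice as large and the old one is freed,
// so the caller's pointer must be updated.
void minivec_push(MiniVec *&v, int x) {
  if (v->num == v->alloc) {
    MiniVec *bigger = new MiniVec{v->alloc * 2, v->num, new int[v->alloc * 2]};
    for (std::size_t i = 0; i < v->num; ++i)
      bigger->elts[i] = v->elts[i];
    minivec_free(v);
    v = bigger;
  }
  v->elts[v->num++] = x;
}

// Analogue of the fixed gfc_conv_union_initializer: with a by-value
// 'MiniVec *v' parameter, a reallocation here would leave the caller
// holding a pointer to freed memory.
void append_union_fields(MiniVec *&v) {
  minivec_push(v, 100);
  minivec_push(v, 101);
}
```

With exactly 8 elements already present, the first push inside append_union_fields triggers the reallocation, which is the 8/16/32 pattern from the bug report.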

--
Fritz Reese
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 06713f24f95..8677a3b0b20 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -9195,7 +9195,7 @@ gfc_trans_structure_assign (tree dest, gfc_expr * expr, bool init, bool coarray)
 }
 
 void
-gfc_conv_union_initializer (vec *v,
+gfc_conv_union_initializer (vec *,
 gfc_component *un, gfc_expr *init)
 {
   gfc_constructor *ctor;


[x86 PATCH] Improved V1TI (and V2DI) mode equality/inequality.

2022-04-20 Thread Roger Sayle


This patch (for when the compiler returns to stage 1) improves support
for vector equality and inequality of V1TImode vectors, and V2DImode
vectors with sse2 but not sse4.  Consider the three functions below:

typedef unsigned int uv4si __attribute__ ((__vector_size__ (16)));
typedef unsigned long long uv2di __attribute__ ((__vector_size__ (16)));
typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));

uv4si eq_v4si(uv4si x, uv4si y) { return x == y; }
uv2di eq_v2di(uv2di x, uv2di y) { return x == y; }
uv1ti eq_v1ti(uv1ti x, uv1ti y) { return x == y; }

These all perform vector comparisons of 128bit SSE2 registers, generating
the result as a vector, where ~0 (all 1 bits) represents true and a zero
represents false.  eq_v4si is trivially implemented by x86_64's pcmpeqd
instruction. This patch improves the other two cases:

For v2di, gcc -O2 currently generates:

movq    %xmm0, %rdx
movq    %xmm1, %rax
movdqa  %xmm0, %xmm2
cmpq    %rax, %rdx
movhlps %xmm2, %xmm3
movhlps %xmm1, %xmm4
sete    %al
movq    %xmm3, %rdx
movzbl  %al, %eax
negq    %rax
movq    %rax, %xmm0
movq    %xmm4, %rax
cmpq    %rax, %rdx
sete    %al
movzbl  %al, %eax
negq    %rax
movq    %rax, %xmm5
punpcklqdq  %xmm5, %xmm0
ret

but with this patch we now generate:

pcmpeqd %xmm0, %xmm1
pshufd  $177, %xmm1, %xmm0
pand    %xmm1, %xmm0
ret

where the results of a V4SI comparison are shuffled and bit-wise ANDed
to produce the desired result.  There's no change in the code generated
for "-O2 -msse4" where the compiler generates a single "pcmpeqq" insn.

For V1TI mode, the results are equally dramatic, where the current -O2
output looks like:

movaps  %xmm0, -40(%rsp)
movq    -40(%rsp), %rax
movq    -32(%rsp), %rdx
movaps  %xmm1, -24(%rsp)
movq    -24(%rsp), %rcx
movq    -16(%rsp), %rsi
xorq    %rcx, %rax
xorq    %rsi, %rdx
orq     %rdx, %rax
sete    %al
xorl    %edx, %edx
movzbl  %al, %eax
negq    %rax
adcq    $0, %rdx
movq    %rax, %xmm2
negq    %rdx
movq    %rdx, -40(%rsp)
movhps  -40(%rsp), %xmm2
movdqa  %xmm2, %xmm0
ret

with this patch we now generate:

pcmpeqd %xmm0, %xmm1
pshufd  $177, %xmm1, %xmm0
pand    %xmm1, %xmm0
pshufd  $78, %xmm0, %xmm1
pand    %xmm1, %xmm0
ret

performing a V2DI comparison, followed by a shuffle and pand, and, with
-O2 -msse4, takes advantage of SSE4.1's pcmpeqq:

pcmpeqq %xmm0, %xmm1
pshufd  $78, %xmm1, %xmm0
pand    %xmm1, %xmm0
ret
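To make the two new sequences easier to follow, here is a scalar C++ model of them. It is a sketch, not the sse.md expander: each 128-bit register is treated as four 32-bit lanes (lane 0 least significant), and pcmpeqd, pshufd and pand are emulated with loops. The names V4SI, eq_v2di_sse2 and eq_v1ti_sse2 are made up for illustration.

```cpp
#include <array>
#include <cstdint>

// Scalar model of a 128-bit SSE register as four 32-bit lanes.
using V4SI = std::array<std::uint32_t, 4>;

// pcmpeqd: per-lane equality, all ones for true and zero for false.
V4SI pcmpeqd(const V4SI &a, const V4SI &b) {
  V4SI r{};
  for (int i = 0; i < 4; ++i)
    r[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
  return r;
}

// pshufd: result lane i is source lane ((imm >> (2 * i)) & 3).
V4SI pshufd(const V4SI &a, unsigned imm) {
  V4SI r{};
  for (int i = 0; i < 4; ++i)
    r[i] = a[(imm >> (2 * i)) & 3];
  return r;
}

// pand: bit-wise AND of the whole register.
V4SI pand(const V4SI &a, const V4SI &b) {
  V4SI r{};
  for (int i = 0; i < 4; ++i)
    r[i] = a[i] & b[i];
  return r;
}

// V2DI equality with plain SSE2: pshufd $177 (0xb1, lanes 1,0,3,2) swaps
// the two 32-bit halves of each 64-bit lane, so after the AND a 64-bit
// lane is all ones only if both of its halves compared equal.
V4SI eq_v2di_sse2(const V4SI &x, const V4SI &y) {
  V4SI c = pcmpeqd(x, y);
  return pand(pshufd(c, 0xb1), c);
}

// V1TI equality: starting from the V2DI result, pshufd $78 (0x4e,
// lanes 2,3,0,1) swaps the two 64-bit halves and a second AND combines
// them.  With SSE4.1 the V2DI step would be a single pcmpeqq instead.
V4SI eq_v1ti_sse2(const V4SI &x, const V4SI &y) {
  V4SI d = eq_v2di_sse2(x, y);
  return pand(pshufd(d, 0x4e), d);
}
```

Any differing 32-bit lane clears its whole 64-bit lane in eq_v2di_sse2 and the whole register in eq_v1ti_sse2, matching the all-ones/zero convention described above.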


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}, with
no new failures.  Is this OK for when we return to stage 1?


2022-04-20  Roger Sayle  

gcc/ChangeLog
* config/i386/sse.md (vec_cmpeqv2div2di): Enable for TARGET_SSE2.
For !TARGET_SSE4_1, expand as a V4SI vector comparison, followed
by a pshufd and pand.
(vec_cmpeqv1tiv1ti): New define_expand implementing V1TImode
vector equality as a V2DImode vector comparison (see above),
followed by a pshufd and pand.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-veq.c: New test case.
* gcc.target/i386/sse2-v1ti-vne.c: New test case.


Roger
--




[PATCH] opts: Disable -gstatement-frontiers by default [PR103788]

2022-04-20 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in those PRs and I think in others too, there are some long-standing
unresolved -fcompare-debug issues with DEBUG_BEGIN_STMTs in the FEs and
during gimplification, especially with statement expressions, where we end
up with different code generation depending on whether there are
DEBUG_BEGIN_STMTs (which force STATEMENT_LISTs) or not (in that case
we often have just the single expression from the list).
I've tried to fix that several times, but nothing worked.
Furthermore, Alex mentioned in bugzilla that there are no consumers of the
statement frontiers right now.

This patch turns -gstatement-frontiers off by default for those two
reasons.  Consumers can still be added (one can test with explicit
-gstatement-frontiers), and if/once that happens, perhaps somebody
will have some great idea how to resolve those -fcompare-debug issues.

Until then, can we go with this?

Bootstrapped/regtested on powerpc64le-linux, ok for trunk if it also passes
bootstrap/regtest on x86_64-linux/i686-linux?

2022-04-20  Jakub Jelinek  

PR debug/103788
PR middle-end/100733
PR debug/104180
* opts.cc (finish_options): Disable -gstatement-frontiers by default.

* gcc.dg/pr103788.c: New test.
* c-c++-common/ubsan/pr100733.c: New test.
* g++.dg/debug/pr104180.C: New test.

--- gcc/opts.cc.jj  2022-04-06 17:42:03.084190238 +0200
+++ gcc/opts.cc 2022-04-20 13:12:22.282322920 +0200
@@ -1317,12 +1317,16 @@ finish_options (struct gcc_options *opts
debug_info_level = DINFO_LEVEL_NONE;
 }
 
+  /* Don't enable -gstatement-frontiers by default until some consumers
+ actually consume it and until the issues with DEBUG_BEGIN_STMTs
+ affecting code generation e.g. for statement expressions are resolved.
+ See PR103788, PR104180, PR100733.
   if (!OPTION_SET_P (debug_nonbind_markers_p))
 debug_nonbind_markers_p
   = (optimize
 && debug_info_level >= DINFO_LEVEL_NORMAL
 && dwarf_debuginfo_p ()
-&& !(flag_selective_scheduling || flag_selective_scheduling2));
+&& !(flag_selective_scheduling || flag_selective_scheduling2));  */
 
   /* Note -fvar-tracking is enabled automatically with OPT_LEVELS_1_PLUS and
  so we need to drop it if we are called from optimize attribute.  */
--- gcc/testsuite/gcc.dg/pr103788.c.jj  2022-04-20 13:13:47.253141338 +0200
+++ gcc/testsuite/gcc.dg/pr103788.c 2022-04-20 13:13:29.301390970 +0200
@@ -0,0 +1,28 @@
+/* PR debug/103788 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fcompare-debug" } */
+
+int
+bar (void);
+
+int
+foo (int x)
+{
+  int i;
+
+  for (i = 0; i <= __INT_MAX__; ++i)
+x += bar () < (x ? 2 : 1);
+
+  return x;
+}
+
+int
+baz (int x)
+{
+  int i;
+
+  for (i = 0; i <= __INT_MAX__; ++i)
+x += bar () < (
+x ? 2 : 1 );
+  return x;
+}
--- gcc/testsuite/c-c++-common/ubsan/pr100733.c.jj  2022-04-20 13:18:09.135499667 +0200
+++ gcc/testsuite/c-c++-common/ubsan/pr100733.c  2022-04-20 13:18:43.031028328 +0200
@@ -0,0 +1,9 @@
+/* PR middle-end/100733 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fsanitize=undefined -fcompare-debug -fdisable-tree-phiopt2" } */
+
+int
+foo (int x)
+{
+  return (__builtin_expect (({ x != 0; }) ? 0 : 1, 3) == 0) * -1 << 0;
+}
--- gcc/testsuite/g++.dg/debug/pr104180.C.jj  2022-04-20 13:14:51.468248383 +0200
+++ gcc/testsuite/g++.dg/debug/pr104180.C  2022-04-20 13:15:17.856881425 +0200
@@ -0,0 +1,14 @@
+/* PR debug/104180 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fcompare-debug" } */
+
+int a[5];
+
+void
+foo (void)
+{
+  unsigned int b;
+
+  for (b = 3; ; b--)
+a[b] = ({ a[b + 1]; });
+}

Jakub



[PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717]

2022-04-20 Thread Jakub Jelinek via Gcc-patches
Hi!

So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs
for which that function is invoked need an extra artificial BIND_EXPR
around their body so that we move all variables of the bodies.

The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK
or OMP_TARGET and for OpenACC constructs that behave similarly to
OMP_TARGET, but the Fortran FE only does that for OpenMP constructs.

The following patch does that for OpenACC constructs too.
This fixes the ICE on the attached testcase.
Unfortunately, it also regresses
FAIL: gfortran.dg/goacc/privatization-1-compute-loop.f90   -O  (test for excess errors)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)
Those tests emit tons of various messages, and now there are some extra ones.
The previous ones as well as the new ones are mostly about artificial variables
created by the compiler, so I wonder if we should emit those at all.

Anyway, here is the patch; apart from those regressions, it passed
bootstrap/regtest on powerpc64le-linux.

2022-04-20  Jakub Jelinek  

PR fortran/104717
* trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body
in an extra BIND_EXPR.

* gfortran.dg/goacc/pr104717.f90: New test.

--- gcc/fortran/trans-openmp.cc.jj  2022-04-06 09:59:32.729654664 +0200
+++ gcc/fortran/trans-openmp.cc 2022-04-20 12:48:19.773402677 +0200
@@ -,7 +,9 @@ gfc_trans_oacc_construct (gfc_code *code
   gfc_start_block ();
   oacc_clauses = gfc_trans_omp_clauses (, code->ext.omp_clauses,
code->loc, false, true);
+  pushlevel ();
   stmt = gfc_trans_omp_code (code->block->next, true);
+  stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
   stmt = build2_loc (gfc_get_location (>loc), construct_code,
 void_type_node, stmt, oacc_clauses);
   gfc_add_expr_to_block (, stmt);
--- gcc/testsuite/gfortran.dg/goacc/pr104717.f90.jj  2022-04-20 12:53:54.002748265 +0200
+++ gcc/testsuite/gfortran.dg/goacc/pr104717.f90  2022-04-20 12:53:12.811321862 +0200
@@ -0,0 +1,22 @@
+! PR fortran/104717
+! { dg-do compile }
+! { dg-options "-O1 -fopenacc -fstack-arrays" }
+
+program main
+  implicit none (type, external)
+  integer :: j
+  integer, allocatable :: A(:)
+
+  A = [(3*j, j=1, 10)]
+  call foo (A, size(A))
+  deallocate (A)
+contains
+  subroutine foo (array, nn)
+integer :: i, nn
+integer :: array(nn)
+
+!$acc parallel copyout(array)
+array = [(-i, i = 1, nn)]
+!$acc end parallel
+  end subroutine foo
+end

Jakub



[PATCH] emit-rtl: Fix -fcompare-debug bug with label references in debug insns [PR105203]

2022-04-20 Thread Jakub Jelinek via Gcc-patches
Hi!

When we compute LABEL_NUSES from scratch, mark_all_labels doesn't call
mark_jump_label on DEBUG_INSNs:
  if (NONDEBUG_INSN_P (insn))
    mark_jump_label (PATTERN (insn), insn, 0);
and so doesn't increment LABEL_NUSES from references in DEBUG_INSNs.
But, when we call emit_copy_of_insn_after e.g. when duplicating some
DEBUG_INSNs, we call it even on those, which then results in LABEL_NUSES
differences and -fcompare-debug failures.

The following patch makes sure we don't call it on DEBUG_INSNs.
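A heavily simplified model of the inconsistency, as a sketch only: Insn, recount_label_uses and copy_insn are hypothetical names, not GCC's RTL API. A from-scratch recount ignores debug insns, so an incremental update that does count a copied debug insn makes the label use count depend on whether debug insns exist at all (that is, on -g), which is the kind of difference -fcompare-debug detects.

```cpp
#include <vector>

// Hypothetical insn: only whether it is a debug insn and which label it
// references (-1 for none).
struct Insn {
  bool is_debug;
  int label;
};

// Like mark_all_labels: a from-scratch recount that skips debug insns.
std::vector<int> recount_label_uses(const std::vector<Insn> &insns, int nlabels) {
  std::vector<int> uses(nlabels, 0);
  for (const Insn &insn : insns)
    if (!insn.is_debug && insn.label >= 0)
      ++uses[insn.label];
  return uses;
}

// Like emit_copy_of_insn_after: copying an insn updates the use count
// incrementally.  skip_debug == false models the old behaviour, where a
// copied debug insn also bumps the count; true models the fix.
void copy_insn(std::vector<Insn> &insns, std::vector<int> &uses,
               const Insn &insn, bool skip_debug) {
  insns.push_back(insn);
  if (insn.label >= 0 && (!skip_debug || !insn.is_debug))
    ++uses[insn.label];
}
```

With the old behaviour, copying a debug insn that references a label leaves the incremental count one higher than a fresh recount; with the fix the two agree.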

Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

2022-04-20  Jakub Jelinek  

PR debug/105203
* emit-rtl.cc (emit_copy_of_insn_after): Don't call mark_jump_label
on DEBUG_INSNs.

* gfortran.dg/g77/pr105203.f: New test.

--- gcc/emit-rtl.cc.jj  2022-02-23 09:17:04.805125253 +0100
+++ gcc/emit-rtl.cc 2022-04-20 10:26:44.972198107 +0200
@@ -6440,7 +6440,8 @@ emit_copy_of_insn_after (rtx_insn *insn,
 }
 
   /* Update LABEL_NUSES.  */
-  mark_jump_label (PATTERN (new_rtx), new_rtx, 0);
+  if (NONDEBUG_INSN_P (insn))
+mark_jump_label (PATTERN (new_rtx), new_rtx, 0);
 
   INSN_LOCATION (new_rtx) = INSN_LOCATION (insn);
 
--- gcc/testsuite/gfortran.dg/g77/pr105203.f.jj  2022-04-20 10:29:44.830696254 +0200
+++ gcc/testsuite/gfortran.dg/g77/pr105203.f  2022-04-20 10:31:13.532463772 +0200
@@ -0,0 +1,20 @@
+C Test case for PR debug/105203
+C Origin: kmcca...@princeton.edu
+C
+C { dg-do compile }
+C { dg-options "-O2 -fcompare-debug -ftracer -w" }
+C { dg-additional-options "-fPIC" { target fpic } }
+  SUBROUTINE FOO (B)
+
+  10  CALL BAR (A)
+  ASSIGN 20 TO M
+  IF (100.LT.A) GOTO 10
+  GOTO 40
+C
+  20  IF (B.LT.ABS(A)) GOTO 10
+  ASSIGN 30 TO M
+  GOTO 40
+C
+  30  ASSIGN 10 TO M
+  40  GOTO M,(10,20,30)
+  END

Jakub



Re: [PATCH] Add HAVE_DEBUGINFOD_SUPPORT to built-in features.

2022-04-20 Thread Arnaldo Carvalho de Melo via Gcc-patches
On Wed, Apr 20, 2022 at 01:30:09PM +0200, Martin Liška wrote:
> The change adds debuginfod to ./perf -vv:
> 
> ...
> debuginfod: [ OFF ]  # HAVE_DEBUGINFOD_SUPPORT
> ...

Thanks, applied.

- Arnaldo

 
> Signed-off-by: Martin Liska 
> ---
>  tools/perf/builtin-version.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
> index 9cd074a3d825..a71f491224da 100644
> --- a/tools/perf/builtin-version.c
> +++ b/tools/perf/builtin-version.c
> @@ -65,6 +65,7 @@ static void library_status(void)
>  #endif
>   STATUS(HAVE_SYSCALL_TABLE_SUPPORT, syscall_table);
>   STATUS(HAVE_LIBBFD_SUPPORT, libbfd);
> + STATUS(HAVE_DEBUGINFOD_SUPPORT, debuginfod);
>   STATUS(HAVE_LIBELF_SUPPORT, libelf);
>   STATUS(HAVE_LIBNUMA_SUPPORT, libnuma);
>   STATUS(HAVE_LIBNUMA_SUPPORT, numa_num_possible_cpus);
> -- 
> 2.35.3

-- 

- Arnaldo


Ping: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation.

2022-04-20 Thread Michael Meissner via Gcc-patches
Ping patch.

| Date: Wed, 6 Apr 2022 14:21:26 -0400
| From: Michael Meissner 
| Subject: [PATCH] Add zero_extendditi2.  Improve lxvr*x code generation.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH] Replace UNSPEC with RTL code for extendditi2.

2022-04-20 Thread Michael Meissner via Gcc-patches
Ping patch.  While this could be held for GCC 13, it would be nice to know
whether to keep this patch (which was asked for in one of the previous patches)
or discard it.

| Date: Fri, 1 Apr 2022 12:59:28 -0400
| From: Michael Meissner 
| Subject: [PATCH] Replace UNSPEC with RTL code for extendditi2.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping #2: [PATCH, V2] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.

2022-04-20 Thread Michael Meissner via Gcc-patches
Ping #2 on this patch.

| Date: Tue, 29 Mar 2022 23:25:31 -0400
| From: Michael Meissner 
| Subject: [PATCH, V2] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] fold, simplify-rtx: Punt on non-representable floating point constants [PR104522]

2022-04-20 Thread Qing Zhao via Gcc-patches


> On Apr 20, 2022, at 5:38 AM, Richard Biener wrote:
> 
> On Tue, Apr 19, 2022 at 11:36 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On Apr 14, 2022, at 1:53 AM, Richard Biener wrote:
>>> 
>>> On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao  wrote:
 
 Hi, Richard,
 
 Thanks a lot for taking a look at this issue (and sorry that I haven’t
 fixed this one yet; I was distracted by other tasks and then just forgot
 this one…)
 
> On Apr 13, 2022, at 3:41 AM, Richard Biener wrote:
> 
> On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches wrote:
>> 
>> 
>> 
>>> On Feb 15, 2022, at 3:58 AM, Jakub Jelinek  wrote:
>>> 
>>> Hi!
>>> 
>>> For IBM double double I've added in PR95450 and PR99648 verification that
>>> when we at the tree/GIMPLE or RTL level interpret target bytes as a REAL_CST
>>> or CONST_DOUBLE constant, we try to encode it back to target bytes and
>>> verify it is the same.
>>> This is because our real.c support isn't able to represent all valid values
>>> of IBM double double which has variable precision.
>>> In PR104522, it has been noted that we have similar problem with the
>>> Intel/Motorola extended XFmode formats, our internal representation isn't
>>> able to record pseudo denormals, pseudo infinities, pseudo NaNs and unnormal
>>> values.
>>> So, the following patch is an attempt to extend that verification to all
>>> floats.
>>> Unfortunately, it wasn't that straightforward, because the
>>> __builtin_clear_padding code exactly for the XFmode long doubles needs to
>>> discover what bits are padding and does that by interpreting memory of
>>> all 1s.  That is actually a valid supported value, a qNaN with negative
>>> sign with all mantissa bits set, but the verification includes also the
>>> padding bits (exactly what __builtin_clear_padding wants to figure out)
>>> and so fails the comparison check and so we ICE.
>>> The patch fixes that case by moving that verification from
>>> native_interpret_real to its caller, so that clear_padding_type can
>>> call native_interpret_real and avoid that extra check.
>>> 
>>> With this, the only thing that regresses in the testsuite is
>>> +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times long\\t-16843010 5
>>> because it decides to use a pattern that has non-zero bits in the padding
>>> bits of the long double, so the simplify-rtx.cc change prevents folding
>>> a SUBREG into a constant.  We emit (the testcase is -O0 but we emit worse
>>> code at all opt levels) something like:
>>> code at all opt levels) something like:
>>> movabsq $-72340172838076674, %rax
>>> movabsq $-72340172838076674, %rdx
>>> movq    %rax, -48(%rbp)
>>> movq    %rdx, -40(%rbp)
>>> fldt    -48(%rbp)
>>> fstpt   -32(%rbp)
>>> instead of
>>> fldt    .LC2(%rip)
>>> fstpt   -32(%rbp)
>>> ...
>>> .LC2:
>>> .long   -16843010
>>> .long   -16843010
>>> .long   65278
>>> .long   0
>>> Note, neither of those sequences actually stores the padding bits, fstpt
>>> simply doesn't touch them.
>>> For vars with clear_padding_real_needs_padding_p types that are allocated
>>> to memory at expansion time, I'd say much better would be to do the stores
>>> using integral modes rather than XFmode, so do that:
>>> movabsq $-72340172838076674, %rax
>>> movq    %rax, -32(%rbp)
>>> movq    %rax, -24(%rbp)
>>> directly.  That is the only way to ensure the padding bits are initialized
>>> (or expand __builtin_clear_padding, but then you initialize separately the
>>> value bits and padding bits).
>>> 
>>> Bootstrapped/regtested on x86_64-linux and i686-linux, though as mentioned
>>> above, the gcc.target/i386/auto-init-4.c case is unresolved.
>> 
>> Thanks, I will try to fix this testing case in a later patch.
> 
> I've looked at this FAIL now and really wonder whether "pattern init" as
> implemented makes any sense for non-integral types.
> We end up with
> initializing a register (SSA name) with
> 
> VIEW_CONVERT_EXPR(0xfefefefefefefefefefefefefefefefe)
> 
> as we go building a TImode constant (we verified we have a TImode SET!)
> but then
> 
>/* Pun the LHS to make sure its type has constant size
>   unless it is an SSA name where that's already known.  */
>if (TREE_CODE (lhs) != SSA_NAME)
>  lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs);
>else
>  init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
> ...
>expand_assignment (lhs, init, false);
> 
> and generally registers do not have any padding.  This 

Ping: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-04-20 Thread Michael Meissner via Gcc-patches
Ping patch.

| Date: Tue, 12 Apr 2022 21:14:55 -0400
| From: Michael Meissner 
| Subject: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059
| Message-ID: 

I feel this is an important patch.  Please look at it and approve the patch or
give me feedback on how to change it.  Note, I will be in today (April 20th)
and tomorrow (April 21st), but I will be away from a computer on April 22-25
(Friday through Monday).

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH][v3] rtl-optimization/105231 - distribute_notes and REG_EH_REGION

2022-04-20 Thread Segher Boessenkool
Hi!

This looks great :-)

On Wed, Apr 20, 2022 at 03:52:33PM +0200, Richard Biener wrote:
> The following mitigates a problem in combine distribute_notes which
> places an original REG_EH_REGION based on only may_trap_p which is
> good to test whether a non-call insn can possibly throw but not if
> actually it does or we care.  That's something we decided at RTL
> expansion time where we possibly still know the insn evaluates
> to a constant.
> 
> In fact, the REG_EH_REGION note with lp > 0 can only come from the
> original i3 and an assert is added to that effect.  That means we only
> need to retain the note on i3 or, if that cannot trap, drop it but we
> should never move it to i2.
> 
> For REG_EH_REGION corresponding to must-not-throw regions or
> nothrow marking try_combine gets new code ensuring we can merge
> and distribute notes which means placing must-not-throw notes
> on all result insns, and dropping nothrow notes or preserve
> them on i3 for calls.

>   * combine.cc (distribute_notes): Assert that a REG_EH_REGION
>   with landing pad > 0 is from i3 and only keep it there or drop
>   it if the insn can not trap.  Throw away REG_EH_REGION with
>   landing pad = 0 or INT_MIN if it does not originate from a
>   call i3.  Distribute must-not-throw REG_EH_REGION to all
>   resulting instructions.
>   (try_combine): Ensure that we can merge REG_EH_REGION notes.

> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -2951,6 +2951,45 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>return 0;
>  }
>  
> +  /* When i3 transfers to an EH handler we cannot combine if any of the
> + sources are within a must-not-throw region.  Else we can throw away
> + any nothrow, pick a random must-not-throw region or preserve the EH
> + transfer on i3.  Since we want to preserve nothrow notes on calls
> + we have to avoid combining from must-not-throw stmts there as well.
> + This has to be kept in sync with distribute_note.  */
> +  if (rtx i3_eh = find_reg_note (i3, REG_EH_REGION, NULL_RTX))
> +{
> +  int i3_lp_nr = INTVAL (XEXP (i3_eh, 0));
> +  if (i3_lp_nr > 0
> +   || ((i3_lp_nr == 0 || i3_lp_nr == INT_MIN) && CALL_P (i3)))
> + {
> +   rtx eh;
> +   int eh_lp;
> +   if (((eh = find_reg_note (i2, REG_EH_REGION, NULL_RTX))
> +&& (eh_lp = INTVAL (XEXP (eh, 0))) < 0
> +&& eh_lp != INT_MIN)
> +   || (i2
> +   && (eh = find_reg_note (i2, REG_EH_REGION, NULL_RTX))
> +   && (eh_lp = INTVAL (XEXP (eh, 0))) < 0
> +   && eh_lp != INT_MIN)
> +   || (i1
> +   && (eh = find_reg_note (i1, REG_EH_REGION, NULL_RTX))
> +   && (eh_lp = INTVAL (XEXP (eh, 0))) < 0
> +   && eh_lp != INT_MIN)
> +   || (i0
> +   && (eh = find_reg_note (i0, REG_EH_REGION, NULL_RTX))
> +   && (eh_lp = INTVAL (XEXP (eh, 0))) < 0
> +   && eh_lp != INT_MIN))
> + {
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "Can't combine insn in must-not-throw "
> +  "EH region into i3 which can throws\n");
> +   undo_all ();
> +   return 0;
> + }
> + }
> +}

The assignments in the conditionals make this hard to read, and harder
to change, btw.  A utility function wouldn't hurt?  The problem of
course would be thinking of a good name for it :-)
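One possible shape for such a utility function, sketched as a self-contained model rather than GCC code: `Insn`, `eh_lp_nr`, `insn_must_not_throw_p`, `i3_needs_eh_check_p` and `can_merge_eh_notes_p` are hypothetical names standing in for the `find_reg_note`/`XEXP`/`INTVAL` calls. It assumes the landing-pad convention used in the patch: a positive number transfers to a handler, 0 or INT_MIN is a nothrow marking, and any other negative number is a must-not-throw region.

```cpp
#include <cassert>
#include <climits>
#include <optional>

// Hypothetical stand-in for an insn: only the landing-pad number of its
// REG_EH_REGION note matters here (std::nullopt means "no note").
struct Insn {
  std::optional<int> eh_lp_nr;
};

// Landing-pad number of the note, if any.  A null pointer models an
// absent i1 or i0.
static std::optional<int> eh_lp_nr(const Insn *insn) {
  if (!insn)
    return std::nullopt;
  return insn->eh_lp_nr;
}

// Negative landing pads other than INT_MIN denote must-not-throw regions.
static bool insn_must_not_throw_p(const Insn *insn) {
  std::optional<int> lp = eh_lp_nr(insn);
  return lp && *lp < 0 && *lp != INT_MIN;
}

// True when i3 transfers to a handler, or is a call with a nothrow note.
static bool i3_needs_eh_check_p(const Insn *i3, bool i3_is_call) {
  std::optional<int> lp = eh_lp_nr(i3);
  if (!lp)
    return false;
  return *lp > 0 || ((*lp == 0 || *lp == INT_MIN) && i3_is_call);
}

// The try_combine condition without assignments inside the conditionals:
// refuse to combine when any source insn is in a must-not-throw region.
static bool can_merge_eh_notes_p(const Insn *i3, bool i3_is_call,
                                 const Insn *i2, const Insn *i1,
                                 const Insn *i0) {
  if (!i3_needs_eh_check_p(i3, i3_is_call))
    return true;
  return !(insn_must_not_throw_p(i2) || insn_must_not_throw_p(i1)
           || insn_must_not_throw_p(i0));
}
```

With a helper like this, duplication is easier to spot: in the posted hunk the first disjunct and the `i2 &&` disjunct both test i2.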

>   case REG_EH_REGION:
> -   /* These notes must remain with the call or trapping instruction.  */
> -   if (CALL_P (i3))
> - place = i3;
> -   else if (i2 && CALL_P (i2))
> - place = i2;
> -   else
> - {
> -   gcc_assert (cfun->can_throw_non_call_exceptions);
> -   if (may_trap_p (i3))
> - place = i3;
> -   else if (i2 && may_trap_p (i2))
> - place = i2;
> -   /* ??? Otherwise assume we've combined things such that we
> -  can now prove that the instructions can't trap.  Drop the
> -  note in this case.  */
> - }
> -   break;
> +   {
> + /* This handling needs to be kept in sync with the
> +prerequesite checking in try_combine.  */

(prerequisite)

> + int lp_nr = INTVAL (XEXP (note, 0));
> + /* A REG_EH_REGION note transfering control can only ever come
> +from i3 and it has to stay there.  */
> + if (lp_nr > 0)
> +   {
> + gcc_assert (from_insn == i3);
> + if (CALL_P (i3))
> +   place = i3;
> + else
> +   {
> + gcc_assert (cfun->can_throw_non_call_exceptions);
> + /* If i3 can still trap preserve the note, otherwise we've
> +combined things such that we can now prove that the
> +instructions can't trap.  Drop the note in 

[PATCH][v3] rtl-optimization/105231 - distribute_notes and REG_EH_REGION

2022-04-20 Thread Richard Biener via Gcc-patches
The following mitigates a problem in combine distribute_notes, which
places an original REG_EH_REGION note based only on may_trap_p.  That is
good for testing whether a non-call insn can possibly throw, but not
whether it actually does or whether we care.  That is something we
decided at RTL expansion time, where we possibly still knew that the insn
evaluates to a constant.

In fact, the REG_EH_REGION note with lp > 0 can only come from the
original i3 and an assert is added to that effect.  That means we only
need to retain the note on i3 or, if that cannot trap, drop it but we
should never move it to i2.

For REG_EH_REGION notes corresponding to must-not-throw regions or
nothrow markings, try_combine gets new code ensuring that we can merge
and distribute the notes, which means placing must-not-throw notes
on all resulting insns and either dropping nothrow notes or preserving
them on i3 for calls.
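The placement rules described above (and in the ChangeLog entry for distribute_notes) can be summarized as a pure decision function. This is a hypothetical model, not the patch's code: `EhNotePlace` and `place_eh_note` are illustrative names, and the boolean inputs stand in for `from_insn == i3`, `CALL_P (i3)` and `may_trap_p (i3)`.

```cpp
#include <cassert>
#include <climits>

// Where a REG_EH_REGION note ends up after combining (model only).
enum class EhNotePlace { I3, Drop, AllResults };

// Hypothetical model of the rules:
//  - lp_nr > 0 (transfers to a handler): must come from i3; stays on i3
//    if i3 is a call or can still trap, otherwise it is dropped.
//  - lp_nr == 0 or INT_MIN (nothrow marking): kept only when it came
//    from i3 and i3 is a call, otherwise dropped.
//  - any other negative lp_nr (must-not-throw): goes on every result insn.
static EhNotePlace place_eh_note(int lp_nr, bool from_i3, bool i3_is_call,
                                 bool i3_may_trap) {
  if (lp_nr > 0) {
    assert(from_i3 && "landing-pad notes only originate from i3");
    return (i3_is_call || i3_may_trap) ? EhNotePlace::I3 : EhNotePlace::Drop;
  }
  if (lp_nr == 0 || lp_nr == INT_MIN)
    return (from_i3 && i3_is_call) ? EhNotePlace::I3 : EhNotePlace::Drop;
  return EhNotePlace::AllResults;
}
```

Keeping the rules in one place like this makes it easier to check them against the prerequisites tested in try_combine.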

Bootstrapped and tested on x86_64-unknown-linux-gnu.

WDYT?

Thanks,
Richard.

2022-04-19  Richard Biener  

PR rtl-optimization/105231
* combine.cc (distribute_notes): Assert that a REG_EH_REGION
with landing pad > 0 is from i3 and only keep it there or drop
it if the insn can not trap.  Throw away REG_EH_REGION with
landing pad = 0 or INT_MIN if it does not originate from a
call i3.  Distribute must-not-throw REG_EH_REGION to all
resulting instructions.
(try_combine): Ensure that we can merge REG_EH_REGION notes.

* gcc.dg/torture/pr105231.c: New testcase.
---
 gcc/combine.cc  | 106 
 gcc/testsuite/gcc.dg/torture/pr105231.c |  15 
 2 files changed, 104 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr105231.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 53dcac92abc..ba234e3af5f 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -2951,6 +2951,45 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
   return 0;
 }
 
+  /* When i3 transfers to an EH handler we cannot combine if any of the
+ sources are within a must-not-throw region.  Else we can throw away
+ any nothrow, pick a random must-not-throw region or preserve the EH
+ transfer on i3.  Since we want to preserve nothrow notes on calls
+ we have to avoid combining from must-not-throw stmts there as well.
+ This has to be kept in sync with distribute_note.  */
+  if (rtx i3_eh = find_reg_note (i3, REG_EH_REGION, NULL_RTX))
+{
+  int i3_lp_nr = INTVAL (XEXP (i3_eh, 0));
+  if (i3_lp_nr > 0
+ || ((i3_lp_nr == 0 || i3_lp_nr == INT_MIN) && CALL_P (i3)))
+   {
+ rtx eh;
+ int eh_lp;
+ if (((eh = find_reg_note (i2, REG_EH_REGION, NULL_RTX))
+  && (eh_lp = INTVAL (XEXP (eh, 0))) < 0
+  && eh_lp != INT_MIN)
+ || (i2
+ && (eh = find_reg_note (i2, REG_EH_REGION, NULL_RTX))
+ && (eh_lp = INTVAL (XEXP (eh, 0))) < 0
+ && eh_lp != INT_MIN)
+ || (i1
+ && (eh = find_reg_note (i1, REG_EH_REGION, NULL_RTX))
+ && (eh_lp = INTVAL (XEXP (eh, 0))) < 0
+ && eh_lp != INT_MIN)
+ || (i0
+ && (eh = find_reg_note (i0, REG_EH_REGION, NULL_RTX))
+ && (eh_lp = INTVAL (XEXP (eh, 0))) < 0
+ && eh_lp != INT_MIN))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Can't combine insn in must-not-throw "
+"EH region into i3 which can throws\n");
+ undo_all ();
+ return 0;
+   }
+   }
+}
+
   /* Record whether i2 and i3 are trivial moves.  */
   i2_was_move = is_just_move (i2);
   i3_was_move = is_just_move (i3);
@@ -14175,23 +14214,56 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
  break;
 
case REG_EH_REGION:
- /* These notes must remain with the call or trapping instruction.  */
- if (CALL_P (i3))
-   place = i3;
- else if (i2 && CALL_P (i2))
-   place = i2;
- else
-   {
- gcc_assert (cfun->can_throw_non_call_exceptions);
- if (may_trap_p (i3))
-   place = i3;
- else if (i2 && may_trap_p (i2))
-   place = i2;
- /* ??? Otherwise assume we've combined things such that we
-can now prove that the instructions can't trap.  Drop the
-note in this case.  */
-   }
- break;
+ {
+   /* This handling needs to be kept in sync with the
+  prerequesite checking in try_combine.  */
+   int lp_nr = INTVAL (XEXP (note, 0));
+   /* A REG_EH_REGION note transfering control can only ever come
+  from i3 and it has to stay there.  */
+   if (lp_nr > 0)
+ {
+   gcc_assert (from_insn 

[PATCH] openmp: Handle unified address memory.

2022-04-20 Thread Andrew Stubbs
This patch adds enough support for "requires unified_address" to make 
the sollve_vv testcases pass. It implements unified_address as a synonym 
of unified_shared_memory, which is both valid and the only way I know of 
to unify addresses with Cuda (could be wrong).


This patch should be applied on top of the previous patch set for USM.

OK for stage 1?

I'll apply it to OG11 shortly.

Andrew

openmp: unified_address support

This makes "requires unified_address" work by making it equivalent to
"requires unified_shared_memory".  This is more than is strictly necessary,
but should be standards-compliant.
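With this change the unified_address and unified_shared_memory branches in the parsers differ only in the clause name used in the diagnostic. A minimal sketch of how that shared logic could be factored out, under simplifying assumptions: `OffloadMemory` is a hypothetical model of `flag_offload_memory` in which `Other` stands for any -foffload-memory setting besides none/unified, and `require_unified_memory` returns the diagnostic text instead of calling `error_at`.

```cpp
#include <cassert>
#include <string>

// Simplified model of flag_offload_memory (hypothetical).
enum class OffloadMemory { None, Unified, Other };

// Shared handling for "requires unified_address" and
// "requires unified_shared_memory": report a conflicting -foffload-memory
// setting, then force unified memory, as both branches in the patch do.
// Returns the diagnostic text, or an empty string if there is no conflict.
static std::string require_unified_memory(const std::string &clause,
                                          OffloadMemory &flag) {
  std::string diag;
  if (flag != OffloadMemory::Unified && flag != OffloadMemory::None)
    diag = clause
           + " is incompatible with the selected -foffload-memory option";
  flag = OffloadMemory::Unified;
  return diag;
}
```

Like the patch, the sketch still switches to unified memory after reporting a conflict.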

gcc/c/ChangeLog:

* c-parser.c (c_parser_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.

gcc/cp/ChangeLog:

* parser.c (cp_parser_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.

gcc/fortran/ChangeLog:

* openmp.c (gfc_match_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.

gcc/ChangeLog:

* omp-low.c: Do USM transformations for "unified_address".

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/usm-4.c: New test.
* gfortran.dg/gomp/usm-4.f90: New test.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 12408770193..9a3d0cb8cea 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -22531,18 +22531,27 @@ c_parser_omp_requires (c_parser *parser)
  enum omp_requires this_req = (enum omp_requires) 0;
 
  if (!strcmp (p, "unified_address"))
-   this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+   {
+ this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+
+ if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+ && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+   error_at (cloc,
+ "unified_address is incompatible with the "
+ "selected -foffload-memory option");
+ flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+   }
  else if (!strcmp (p, "unified_shared_memory"))
- {
-   this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
-
-   if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
-   && flag_offload_memory != OFFLOAD_MEMORY_NONE)
- error_at (cloc,
-   "unified_shared_memory is incompatible with the "
-   "selected -foffload-memory option");
-   flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
- }
+   {
+ this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+ if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+ && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+   error_at (cloc,
+ "unified_shared_memory is incompatible with the "
+ "selected -foffload-memory option");
+ flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+   }
  else if (!strcmp (p, "dynamic_allocators"))
this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
  else if (!strcmp (p, "reverse_offload"))
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index fd9f62f4543..3a9ea272f10 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -46406,18 +46406,27 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok)
  enum omp_requires this_req = (enum omp_requires) 0;
 
  if (!strcmp (p, "unified_address"))
-   this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+   {
+ this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
+
+ if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+ && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+   error_at (cloc,
+ "unified_address is incompatible with the "
+ "selected -foffload-memory option");
+ flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+   }
  else if (!strcmp (p, "unified_shared_memory"))
- {
-   this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
-
-   if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
-   && flag_offload_memory != OFFLOAD_MEMORY_NONE)
- error_at (cloc,
-   "unified_shared_memory is incompatible with the "
-   "selected -foffload-memory option");
-   flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
- }
+   {
+ this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+ if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+ && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+   error_at (cloc,
+ "unified_shared_memory is incompatible with the "
+ "selected -foffload-memory option");
+ flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+   }
  else if (!strcmp (p, "dynamic_allocators"))

[Patch] OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg

2022-04-20 Thread Tobias Burnus

For
  omp parallel shared(array_descr_var)
the shared variable is passed to the generated function as an
argument and replaced by a DECL_VALUE_EXPR inside the parallel region.

If, inside the parallel region,

  omp target data use_device_addr(array_descr_var)

is used, the latter generates
  omp_arr->array_descr_var = _descr_var.data;
...
  tmp_desc = array_descr_var
  tmp_desc.data = omp_o->array_descr_var

that is: 'tmp_desc' gets assigned the original descriptor
and only the data component is updated.


However, inside the parallel region, the value expression
('omp_i->array_descr_var') has to be used instead of 'array_descr_var'.

Fixed by looking up the variable used in use_device_{addr,ptr} in the
outer OpenMP context and then checking for a DECL_VALUE_EXPR.
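The lookup order of the fix can be modeled in isolation: search the enclosing contexts for the list item, and if the declaration found there has a value expression, read the descriptor through it. This is a hypothetical sketch, not omp-low.cc code: `Decl`, `Context`, `lookup_in_outer_ctx` and `descriptor_source` are illustrative names, and `lookup_in_outer_ctx` is only a rough analogue of `maybe_lookup_decl_in_outer_ctx` (walk outward, fall back to the variable itself).

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// A declaration that may carry a value expression (the role played by
// DECL_VALUE_EXPR), e.g. "omp_i->array_descr_var".
struct Decl {
  std::string name;
  std::optional<std::string> value_expr;
};

// A lowering context: original variable name -> remapped declaration,
// plus a link to the enclosing context.
struct Context {
  std::map<std::string, Decl> remapped;
  const Context *outer = nullptr;
};

// Search the enclosing contexts; fall back to the variable itself.
static Decl lookup_in_outer_ctx(const Decl &var, const Context &ctx) {
  for (const Context *c = ctx.outer; c; c = c->outer) {
    auto it = c->remapped.find(var.name);
    if (it != c->remapped.end())
      return it->second;
  }
  return var;
}

// Expression the temporary descriptor is copied from: the outer value
// expression if there is one, otherwise the variable's own name.
static std::string descriptor_source(const Decl &var, const Context &ctx) {
  Decl d = lookup_in_outer_ctx(var, ctx);
  return d.value_expr ? *d.value_expr : d.name;
}
```

Before the fix, the equivalent of `var.name` was used unconditionally, which is wrong inside the parallel region.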

OK?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg

For array-descriptor vars, the descriptor is assigned to a temporary. However,
this failed when the clause's argument was in turn in a data-sharing clause
as the outer context's VALUE_EXPR wasn't used.

gcc/ChangeLog:

	* omp-low.cc (lower_omp_target): Fix use_device_{addr,ptr} with list
	item that is in an outer data-sharing clause.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/use_device_addr-5.f90: New test.

 gcc/omp-low.cc |  22 ++--
 .../libgomp.fortran/use_device_addr-5.f90  | 143 +
 2 files changed, 156 insertions(+), 9 deletions(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index bf5779b6543..6e387fd9a61 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -13656,26 +13656,30 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		new_var = lookup_decl (var, ctx);
 		new_var = DECL_VALUE_EXPR (new_var);
 		tree v = new_var;
+		tree v2 = var;
+		if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_PTR
+		|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_USE_DEVICE_ADDR)
+		  {
+		v2 = maybe_lookup_decl_in_outer_ctx (var, ctx);
+		if (DECL_HAS_VALUE_EXPR_P (v2))
+		  v2 = DECL_VALUE_EXPR (v2);
+		  }
 
 		if (is_ref)
 		  {
-		var = build_fold_indirect_ref (var);
-		gimplify_expr (, _body, NULL, is_gimple_val,
-   fb_rvalue);
-		v = create_tmp_var_raw (TREE_TYPE (var), get_name (var));
+		v2 = build_fold_indirect_ref (v2);
+		v = create_tmp_var_raw (TREE_TYPE (v2), get_name (var));
 		gimple_add_tmp_var (v);
 		TREE_ADDRESSABLE (v) = 1;
-		gimple_seq_add_stmt (_body,
-	 gimple_build_assign (v, var));
+		gimplify_assign (v, v2, _body);
 		tree rhs = build_fold_addr_expr (v);
 		gimple_seq_add_stmt (_body,
 	 gimple_build_assign (new_var, rhs));
 		  }
 		else
-		  gimple_seq_add_stmt (_body,
-   gimple_build_assign (new_var, var));
+		  gimplify_assign (new_var, v2, _body);
 
-		tree v2 = lang_hooks.decls.omp_array_data (unshare_expr (v), false);
+		v2 = lang_hooks.decls.omp_array_data (unshare_expr (v), false);
 		gcc_assert (v2);
 		gimplify_expr (, _body, NULL, is_gimple_val, fb_rvalue);
 		gimple_seq_add_stmt (_body,
diff --git a/libgomp/testsuite/libgomp.fortran/use_device_addr-5.f90 b/libgomp/testsuite/libgomp.fortran/use_device_addr-5.f90
new file mode 100644
index 000..1def70a1bc0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/use_device_addr-5.f90
@@ -0,0 +1,143 @@
+program main
+  use omp_lib
+  implicit none
+  integer, allocatable :: aaa(:,:,:)
+  integer :: i
+
+  allocate (aaa(-4:10,-3:8,2))
+  aaa(:,:,:) = reshape ([(i, i = 1, size(aaa))], shape(aaa))
+
+  do i = 0, omp_get_num_devices()
+!$omp target data map(to: aaa)
+  call test_addr (aaa, i)
+  call test_ptr (aaa, i)
+!$omp end target data
+  end do
+  deallocate (aaa)
+
+contains
+
+  subroutine test_addr (, dev)
+use iso_c_binding
+integer, target, allocatable :: (:,:,:), (:,:,:)
+integer, value :: dev
+integer :: i
+type(c_ptr) :: ptr
+logical :: is_shared
+
+is_shared = .false.
+!$omp target device(dev) map(to: is_shared)
+  is_shared = .true.
+!$omp end target
+
+allocate ((-4:10,-3:8,2))
+(:,:,:) = reshape ([(-i, i = 1, size())], shape())
+!$omp target enter data map(to: ) device(dev)
+if (any (lbound () /= [-4, -3, 1])) error stop 1
+if (any (shape () /= [15, 12, 2])) error stop 2
+if (any (lbound () /= [-4, -3, 1])) error stop 3
+if (any (shape () /= [15, 12, 2])) error stop 4
+if (any ( /= -)) error stop 5
+if (any ( /= reshape ([(i, i = 1, size())], shape( &
+  error stop 6
+
+!$omp parallel do shared(, )
+do i = 1,1
+  if (any (lbound () /= [-4, -3, 1])) error stop 5
+ 

Re: [PATCH] libstdc++: Use LTLIBICONV when linking libstdc++.so [PR93602]

2022-04-20 Thread Jonathan Wakely via Gcc-patches
Pushed to trunk now.

On Wed, 13 Apr 2022 at 15:24, Jonathan Wakely via Libstdc++
 wrote:
>
> Tested x86_64-linux, without libiconv installed, with libiconv installed,
> with libiconv installed but using an in-tree libiconv, with libiconv.a
> installed and using --with-libiconv-type=static, and with libiconv.so
> installed and using --without-libiconv-prefix (which still fails).
>
> I'm not entirely happy about the fact that libtool's LTLIBICONV adds an
> rpath to libstdc++.so, but that can be avoided (as documented by this
> patch) and I don't really see a better solution. Another option would be
> to use -l:libiconv.a if configure defines LTLIBICONV to non-empty and
> the linker supports it, which would *force* the use of a static lib. But
> that seems unnecessarily hostile; not all users will dislike the rpath
> solution. The proposed patch makes it Just Work™ for users who (for
> whatever reason) have installed libiconv, while also allowing them to do
> something more sensible if they care enough to do so.
>
> Thoughts?
>
> -- >8 --
>
> This fixes missing libiconv symbols when libstdc++ is built on a system
> that has libiconv installed. If the libiconv headers are found then
> libstdc++ depends on libiconv_open etc instead of libc's iconv_open. But
> without this fix libstdc++ is not linked to the libiconv library that
> provides the definitions of those symbols.
>
> As discussed in PR 93602, this change means that libstdc++.so.6 might
> have an rpath pointing to the location of the libiconv.so library. If
> that is not desired, then GCC must be configured to link to a static
> libiconv.a instead, using either --with-libiconv-type=static or an
> in-tree build of libiconv.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/93602
> * doc/xml/manual/prerequisites.xml: Document libiconv
> workarounds.
> * doc/html/manual/setup.html: Regenerate.
> * src/Makefile.am (CXXLINK): Add $(LTLIBICONV).
> * src/Makefile.in: Regenerate.
> ---
> diff --git a/libstdc++-v3/doc/xml/manual/prerequisites.xml b/libstdc++-v3/doc/xml/manual/prerequisites.xml
> index 22e90a7e79d..8799487c821 100644
> --- a/libstdc++-v3/doc/xml/manual/prerequisites.xml
> +++ b/libstdc++-v3/doc/xml/manual/prerequisites.xml
> @@ -48,6 +48,56 @@
>
> linux
>
> +   
> +   
> + The 'gnu' locale model makes use of iconv
> + for character set conversions. The relevant functions are provided
> + by Glibc and so are always available, however they can also be
> + provided by the separate GNU libiconv library. If GNU libiconv is
> + found when GCC is built (e.g., because its headers are installed
> + in /usr/local/include)
> + then the libstdc++.so.6 library will have a
> + run-time dependency on libiconv.so.2.
> + If you do not want that run-time dependency then you should do
> + one of the following:
> +   
> +   
> + 
> +   
> + Uninstall the libiconv headers before building GCC.
> + Glibc already provides iconv so you should
> + not need libiconv anyway.
> +   
> + 
> + 
> +   
> +linkend="https://www.gnu.org/software/libiconv/#downloading;>
> +   Download the libiconv sources and extract them into the
> +   top level of the GCC source tree, e.g.,
> +   
> +
> +wget https://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.16.tar.gz
> +tar xf libiconv-1.16.tar.gz
> +ln -s libiconv-1.16 libiconv
> +
> +   
> + This will build libiconv as part of building GCC and link to
> + it statically, so there is no libiconv.so.2
> + dependency.
> +   
> + 
> + 
> +   
> + Configure GCC with --with-libiconv-type=static.
> + This requires the static libiconv.a library,
> + which is not installed by default. You might need to reinstall
> + libiconv using the --enable-static configure
> + option to get the static library.
> +   
> + 
> +   
> +   
> +
> 
> 
>   If GCC 3.1.0 or later on is being used on GNU/Linux, an attempt
> diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am
> index 18f57632c3d..9c3f4aca655 100644
> --- a/libstdc++-v3/src/Makefile.am
> +++ b/libstdc++-v3/src/Makefile.am
> @@ -278,7 +278,9 @@ CXXLINK = \
> $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
> --mode=link $(CXX) \
> $(VTV_CXXLINKFLAGS) \
> -   $(OPT_LDFLAGS) $(SECTION_LDFLAGS) $(AM_CXXFLAGS) $(LTLDFLAGS) -o $@
> +   $(OPT_LDFLAGS) $(SECTION_LDFLAGS) $(AM_CXXFLAGS) \
> +   $(LTLDFLAGS) $(LTLIBICONV) \
> +   -o $@
>
>  # Symbol versioning for shared libraries.
>  if ENABLE_SYMVERS
>
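To check which implementation a translation unit actually picks up, a small probe can be compiled against the same headers. It assumes GNU libiconv's `<iconv.h>` defines `_LIBICONV_VERSION` and redirects `iconv_open` to `libiconv_open` (the reason -liconv is then needed at link time), while Glibc's header does not. The ISO-8859-1 conversion is only a smoke test, and `iconv_provider`/`utf8_to_latin1` are illustrative names.

```cpp
#include <cassert>
#include <iconv.h>
#include <string>

// Which iconv implementation did <iconv.h> select?
static const char *iconv_provider() {
#ifdef _LIBICONV_VERSION
  return "GNU libiconv";  // iconv_open is really libiconv_open here
#else
  return "C library";     // e.g. Glibc's own iconv
#endif
}

// Convert UTF-8 text to ISO-8859-1.  Returns false on any failure.
static bool utf8_to_latin1(const std::string &in, std::string &out) {
  iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
  if (cd == (iconv_t)-1)
    return false;
  std::string buf(in.size(), '\0');  // Latin-1 never needs more bytes
  char *inp = const_cast<char *>(in.data());
  size_t inleft = in.size();
  char *outp = &buf[0];
  size_t outleft = buf.size();
  size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
  iconv_close(cd);
  if (rc == (size_t)-1)
    return false;
  buf.resize(buf.size() - outleft);
  out = buf;
  return true;
}
```

If the probe reports GNU libiconv but is linked without the library, the link fails with undefined `libiconv_open` and related symbols, which is the situation the patch addresses for libstdc++.so.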



[committed] libstdc++: Fix macro checked by test

2022-04-20 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.

-- >8 --

The macro being tested here is wrong, but it just happens to have the same
value as the one that is supposed to be tested.
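The general rule is to check the feature-test macro that belongs to the feature being exercised, because unrelated macros can share a value (both of the macros involved here are 201811L). A minimal sketch that guards a constexpr use of `std::string_view::copy` on `__cpp_lib_constexpr_string_view`; `SV_COPY_CONSTEXPR` and `first_copied_char` are illustrative names only, and C++17 or later is assumed.

```cpp
#include <cassert>
#include <string_view>

// Use the macro that belongs to the feature, not a look-alike.
#if defined(__cpp_lib_constexpr_string_view) \
    && __cpp_lib_constexpr_string_view >= 201811L
#define SV_COPY_CONSTEXPR constexpr
#else
#define SV_COPY_CONSTEXPR
#endif

// Copies the first character of sv into a local buffer and returns it
// ('\0' for an empty view).  Usable in constant expressions only when the
// library advertises constexpr std::string_view::copy.
SV_COPY_CONSTEXPR char first_copied_char(std::string_view sv) {
  char buf[1] = {'\0'};
  sv.copy(buf, 1);
  return buf[0];
}

#if defined(__cpp_lib_constexpr_string_view) \
    && __cpp_lib_constexpr_string_view >= 201811L
static_assert(first_copied_char("abc") == 'a',
              "constexpr string_view::copy should work here");
#endif
```

In C++17 mode the function is an ordinary runtime function. In C++20 mode it also becomes usable at compile time.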

libstdc++-v3/ChangeLog:

* testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc:
Check correct feature test macro.
---
 .../basic_string_view/operations/copy/char/constexpr.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc b/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc
index 28f8ae845c2..2705098fb76 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc
@@ -22,7 +22,7 @@
 
 #ifndef __cpp_lib_constexpr_string_view
 # error "Feature test macro for constexpr copy is missing in "
-#elif __cpp_lib_constexpr_iterator < 201811L
+#elif __cpp_lib_constexpr_string_view < 201811L
# error "Feature test macro for constexpr copy has wrong value in "
 #endif
 
-- 
2.34.1



Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]

2022-04-20 Thread Jan Hubicka via Gcc-patches
> On Wed, Apr 20, 2022 at 01:47:43PM +0200, Martin Jambor wrote:
> > Hi,
> > 
> > On Wed, Apr 20 2022, Jan Hubicka via Gcc-patches wrote:
> > >> On Wed, 20 Apr 2022, Jakub Jelinek wrote:
> > 
> > [...]
> > 
> > >> >  
> > >> >if ((flag_openacc || flag_openmp)
> > >> >&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> > >> > --- gcc/cgraphclones.cc.jj 2022-01-18 11:58:58.948991114 +0100
> > >> > +++ gcc/cgraphclones.cc2022-04-19 13:38:43.594262397 +0200
> > >> > @@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl
> > >> >new_node->versionable = versionable;
> > >> >new_node->can_change_signature = can_change_signature;
> > >> >new_node->redefined_extern_inline = redefined_extern_inline;
> > >> > +  new_node->semantic_interposition = semantic_interposition;
> > >
> > > This indeed makes sense to me. 
> > 
> > but that means that create_clone (and therefore also
> > create_virtual_clone) now creates nodes which are both local and
> > potentially interposable... is that what we want?  (Does the local flag
> > make the interposition flag meaningless in that case?)
> 
> Usually set_new_clone_decl_and_node_flags is called afterwards and that
> makes both the decl local and clears node->semantic_interposition.
> The above is just for the case when that isn't done.

We also simply ignore semantic_interposition flag on everything local.
But indeed perhaps for consistency purposes we should force it to false
whenever externally_visible is false.  But more sanity checkers only in
stage1 :)

Honza
> 
>   Jakub
> 
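The stricter invariant floated above (never leave semantic_interposition set on a node that is not externally visible) could be checked along these lines. This is a hypothetical model with illustrative names (`NodeFlags`, `effective_semantic_interposition`, `normalize_interposition`, `interposition_consistent_p`), not GCC's `cgraph_node`.

```cpp
#include <cassert>

// Hypothetical subset of a call-graph node's flags.
struct NodeFlags {
  bool externally_visible;
  bool semantic_interposition;
};

// What the optimizers effectively use today: the flag is ignored on
// nodes that are not externally visible.
static bool effective_semantic_interposition(const NodeFlags &n) {
  return n.externally_visible && n.semantic_interposition;
}

// Normalization: clear the stored flag whenever the node is local.
static void normalize_interposition(NodeFlags &n) {
  if (!n.externally_visible)
    n.semantic_interposition = false;
}

// A sanity check of the kind that could be added in stage1.
static bool interposition_consistent_p(const NodeFlags &n) {
  return n.externally_visible || !n.semantic_interposition;
}
```

Normalizing does not change the effective value. It only makes the stored flags agree with it.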


Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]

2022-04-20 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 20, 2022 at 01:47:43PM +0200, Martin Jambor wrote:
> Hi,
> 
> On Wed, Apr 20 2022, Jan Hubicka via Gcc-patches wrote:
> >> On Wed, 20 Apr 2022, Jakub Jelinek wrote:
> 
> [...]
> 
> >> >  
> >> >if ((flag_openacc || flag_openmp)
> >> >&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> >> > --- gcc/cgraphclones.cc.jj   2022-01-18 11:58:58.948991114 +0100
> >> > +++ gcc/cgraphclones.cc  2022-04-19 13:38:43.594262397 +0200
> >> > @@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl
> >> >new_node->versionable = versionable;
> >> >new_node->can_change_signature = can_change_signature;
> >> >new_node->redefined_extern_inline = redefined_extern_inline;
> >> > +  new_node->semantic_interposition = semantic_interposition;
> >
> > This indeed makes sense to me. 
> 
> but that means that create_clone (and therefore also
> create_virtual_clone) now creates nodes which are both local and
> potentially interposable... is that what we want?  (Does the local flag
> make the interposition flag meaningless in that case?)

Usually set_new_clone_decl_and_node_flags is called afterwards and that
makes both the decl local and clears node->semantic_interposition.
The above is just for the case when that isn't done.

Jakub



Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]

2022-04-20 Thread Martin Jambor
Hi,

On Wed, Apr 20 2022, Jan Hubicka via Gcc-patches wrote:
>> On Wed, 20 Apr 2022, Jakub Jelinek wrote:

[...]

>> >  
>> >if ((flag_openacc || flag_openmp)
>> >&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
>> > --- gcc/cgraphclones.cc.jj 2022-01-18 11:58:58.948991114 +0100
>> > +++ gcc/cgraphclones.cc2022-04-19 13:38:43.594262397 +0200
>> > @@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl
>> >new_node->versionable = versionable;
>> >new_node->can_change_signature = can_change_signature;
>> >new_node->redefined_extern_inline = redefined_extern_inline;
>> > +  new_node->semantic_interposition = semantic_interposition;
>
> This indeed makes sense to me. 

but that means that create_clone (and therefore also
create_virtual_clone) now creates nodes which are both local and
potentially interposable... is that what we want?  (Does the local flag
make the interposition flag meaningless in that case?)

Martin


Re: [PATCH] Add HAVE_DEBUGINFOD_SUPPORT to built-in features.

2022-04-20 Thread Martin Liška

On 4/20/22 13:30, Martin Liška wrote:

The change adds debuginfod to ./perf -vv:

...
debuginfod: [ OFF ]  # HAVE_DEBUGINFOD_SUPPORT
...

Signed-off-by: Martin Liska 
---
  tools/perf/builtin-version.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 9cd074a3d825..a71f491224da 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -65,6 +65,7 @@ static void library_status(void)
  #endif
  STATUS(HAVE_SYSCALL_TABLE_SUPPORT, syscall_table);
  STATUS(HAVE_LIBBFD_SUPPORT, libbfd);
+    STATUS(HAVE_DEBUGINFOD_SUPPORT, debuginfod);
  STATUS(HAVE_LIBELF_SUPPORT, libelf);
  STATUS(HAVE_LIBNUMA_SUPPORT, libnuma);
  STATUS(HAVE_LIBNUMA_SUPPORT, numa_num_possible_cpus);


Please ignore the thread, it belongs to perf ML ;)

Martin


[PATCH] Add HAVE_DEBUGINFOD_SUPPORT to built-in features.

2022-04-20 Thread Martin Liška

The change adds debuginfod to ./perf -vv:

...
debuginfod: [ OFF ]  # HAVE_DEBUGINFOD_SUPPORT
...

Signed-off-by: Martin Liska 
---
 tools/perf/builtin-version.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 9cd074a3d825..a71f491224da 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -65,6 +65,7 @@ static void library_status(void)
 #endif
STATUS(HAVE_SYSCALL_TABLE_SUPPORT, syscall_table);
STATUS(HAVE_LIBBFD_SUPPORT, libbfd);
+   STATUS(HAVE_DEBUGINFOD_SUPPORT, debuginfod);
STATUS(HAVE_LIBELF_SUPPORT, libelf);
STATUS(HAVE_LIBNUMA_SUPPORT, libnuma);
STATUS(HAVE_LIBNUMA_SUPPORT, numa_num_possible_cpus);
--
2.35.3
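The sketch below shows how a STATUS-style line can report whether a -D feature macro was set. It uses the kernel-style "is this macro defined as 1" preprocessor trick. This is not perf's actual builtin-version.c code. The macro and function names are made up. It assumes the feature macro is either undefined or defined to 1, which is what -DHAVE_DEBUGINFOD_SUPPORT does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Evaluates to 1 if X is a macro defined as 1, else 0 (hypothetical names,
   same technique as the kernel's __is_defined()).  */
#define SKETCH_PLACEHOLDER_1 0,
#define SKETCH_TAKE_SECOND(ignored, val, ...) val
#define SKETCH_IS_DEFINED(x) SKETCH_IS_DEFINED_(x)
#define SKETCH_IS_DEFINED_(val) SKETCH_IS_DEFINED__(SKETCH_PLACEHOLDER_##val)
#define SKETCH_IS_DEFINED__(arg) SKETCH_TAKE_SECOND(arg 1, 0, 0)

/* Format one "name: [ on/OFF ]  # MACRO" line into BUF.  */
int status_line(char *buf, size_t n, const char *name, int on,
                const char *macro)
{
    assert(n > 0);
    return snprintf(buf, n, "%s: [ %3s ]  # %s", name, on ? "on" : "OFF",
                    macro);
}

/* STATUS-like wrapper: #d and #m stringify the names as written, while the
   unstringified D is expanded to test whether it is defined.  */
#define STATUS_LINE(buf, n, d, m) \
    status_line((buf), (n), #m, SKETCH_IS_DEFINED(d), #d)
```

Built without -DHAVE_DEBUGINFOD_SUPPORT, `STATUS_LINE(buf, sizeof buf, HAVE_DEBUGINFOD_SUPPORT, debuginfod)` produces `debuginfod: [ OFF ]  # HAVE_DEBUGINFOD_SUPPORT`, matching the line in the commit message.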



Re: [PATCH] fold, simplify-rtx: Punt on non-representable floating point constants [PR104522]

2022-04-20 Thread Richard Biener via Gcc-patches
On Tue, Apr 19, 2022 at 11:36 PM Qing Zhao  wrote:
>
>
>
> > On Apr 14, 2022, at 1:53 AM, Richard Biener  wrote:
> >
> > On Wed, Apr 13, 2022 at 5:22 PM Qing Zhao  wrote:
> >>
> >> Hi, Richard,
> >>
> >> Thanks a lot for taking a look at this issue (and sorry that I haven’t
> >> fixed this one yet; I was distracted by other tasks and then just forgot
> >> this one…)
> >>
> >>> On Apr 13, 2022, at 3:41 AM, Richard Biener  wrote:
> >>>
> >>> On Tue, Feb 15, 2022 at 5:31 PM Qing Zhao via Gcc-patches  wrote:
> 
> 
> 
> > On Feb 15, 2022, at 3:58 AM, Jakub Jelinek  wrote:
> >
> > Hi!
> >
> > For IBM double double I've added in PR95450 and PR99648 verification that
> > when we at the tree/GIMPLE or RTL level interpret target bytes as a REAL_CST
> > or CONST_DOUBLE constant, we try to encode it back to target bytes and
> > verify it is the same.
> > This is because our real.c support isn't able to represent all valid values
> > of IBM double double which has variable precision.
> > In PR104522, it has been noted that we have similar problem with the
> > Intel/Motorola extended XFmode formats, our internal representation isn't
> > able to record pseudo denormals, pseudo infinities, pseudo NaNs and unnormal
> > values.
> > So, the following patch is an attempt to extend that verification to all
> > floats.
> > Unfortunately, it wasn't that straightforward, because the
> > __builtin_clear_padding code exactly for the XFmode long doubles needs to
> > discover what bits are padding and does that by interpreting memory of
> > all 1s.  That is actually a valid supported value, a qNaN with negative
> > sign with all mantissa bits set, but the verification includes also the
> > padding bits (exactly what __builtin_clear_padding wants to figure out)
> > and so fails the comparison check and so we ICE.
> > The patch fixes that case by moving that verification from
> > native_interpret_real to its caller, so that clear_padding_type can
> > call native_interpret_real and avoid that extra check.
> >
> > With this, the only thing that regresses in the testsuite is
> > +FAIL: gcc.target/i386/auto-init-4.c scan-assembler-times long\\t-16843010 5
> > because it decides to use a pattern that has non-zero bits in the padding
> > bits of the long double, so the simplify-rtx.cc change prevents folding
> > a SUBREG into a constant.  We emit (the testcase is -O0 but we emit worse
> > code at all opt levels) something like:
> >  movabsq $-72340172838076674, %rax
> >  movabsq $-72340172838076674, %rdx
> >  movq    %rax, -48(%rbp)
> >  movq    %rdx, -40(%rbp)
> >  fldt    -48(%rbp)
> >  fstpt   -32(%rbp)
> > instead of
> >  fldt    .LC2(%rip)
> >  fstpt   -32(%rbp)
> > ...
> > .LC2:
> >  .long   -16843010
> >  .long   -16843010
> >  .long   65278
> >  .long   0
> > Note, neither of those sequences actually stores the padding bits, fstpt
> > simply doesn't touch them.
> > For vars with clear_padding_real_needs_padding_p types that are allocated
> > to memory at expansion time, I'd say much better would be to do the stores
> > using integral modes rather than XFmode, so do that:
> >  movabsq $-72340172838076674, %rax
> >  movq    %rax, -32(%rbp)
> >  movq    %rax, -24(%rbp)
> > directly.  That is the only way to ensure the padding bits are initialized
> > (or expand __builtin_clear_padding, but then you initialize separately the
> > value bits and padding bits).
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, though as mentioned
> > above, the gcc.target/i386/auto-init-4.c case is unresolved.
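The constants in these listings are the 0xFE pattern byte read back at different widths. The sketch below reproduces that arithmetic. The helper names are invented for illustration, and it assumes little-endian x86_64, where the 80-bit x87 value fills the first 10 bytes of a 16-byte long double slot. Eight 0xFE bytes give the movabsq immediate -72340172838076674. The four .LC2 words are 0xFEFEFEFE (-16843010), 0xFEFEFEFE, 0x0000FEFE (65278) and 0. The last six bytes of .LC2 are the padding that fldt/fstpt never store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read LEN (1..8) bytes at P as a little-endian two's-complement signed
   integer, the way x86 interprets an immediate or a .long/.quad word.  */
int64_t read_le_signed(const unsigned char *p, size_t len)
{
    assert(len >= 1 && len <= 8);
    uint64_t u = 0;
    for (size_t i = 0; i < len; i++)
        u |= (uint64_t)p[i] << (8 * i);
    if (len < 8 && ((u >> (8 * len - 1)) & 1))  /* sign-extend short words */
        u |= ~UINT64_C(0) << (8 * len);
    if (u >> 63)                      /* negative: avoid an impl-defined cast */
        return -(int64_t)(~u) - 1;
    return (int64_t)u;
}

/* Build the 16-byte image of .LC2: 10 value bytes of the x87 long double
   set to 0xFE, followed by 6 zero padding bytes.  */
void lc2_image(unsigned char out[16])
{
    memset(out, 0x00, 16);
    memset(out, 0xFE, 10);
}
```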
> 
>  Thanks, I will try to fix this testing case in a later patch.
> >>>
> >>> I've looked at this FAIL now and really wonder whether "pattern init" as
> >>> implemented makes any sense for non-integral types.
> >>> We end up with
> >>> initializing a register (SSA name) with
> >>>
> >>> VIEW_CONVERT_EXPR(0xfefefefefefefefefefefefefefefefe)
> >>>
> >>> as we go building a TImode constant (we verified we have a TImode SET!)
> >>> but then
> >>>
> >>> /* Pun the LHS to make sure its type has constant size
> >>>unless it is an SSA name where that's already known.  */
> >>> if (TREE_CODE (lhs) != SSA_NAME)
> >>>   lhs = build1 (VIEW_CONVERT_EXPR, itype, lhs);
> >>> else
> >>>   init = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), init);
> >>> ...
> >>> expand_assignment (lhs, init, false);
> >>>
> >>> and generally registers do not have any padding.  This weird expansion
> >>> then causes us to spill the TImode constant 

Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]

2022-04-20 Thread Jan Hubicka via Gcc-patches
> 
> The cgraph.cc change was what I actually needed for the fix, the
> cgraphclones.cc was only because I've noticed that it constructs a new
> node (so is initialized to whatever random flag_semantic_interposition is
> right now) and initializing it to what it is cloned from made more sense.

OK, thanks.
It is only needed for nodes with the definition flag and public linkage, so
it should not need to be copied in cgraph clones, and there are other places
that create new nodes (late function etc.).  I will move the logic to the
visibility pass and to add_new_function, and also kill the constructor.

I originally intended to set it at construction time but forgot to
think of the frontends changing opt_for_fn later from the optimization
attribute.  This also makes me wonder whether the C++ FE updates the implicit
aliases once they have been created...

Honza
> 
>   Jakub
> 


Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]

2022-04-20 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 20, 2022 at 11:06:12AM +0200, Jan Hubicka wrote:
> > On Wed, Apr 20, 2022 at 10:45:53AM +0200, Jan Hubicka wrote:
> > > So this change should be unnecessary unless there are nodes that are
> > > missing finalization stage.  It also is not good enough since frontends
> > > may change opt_for_fn between node creation and finalization of
> > > compilation unit (so even after cgraph_finalize unfortunately, we had
> > > another bug about that).
> > > 
> > > The PR was about implicit C++ alias.  So the problem is that aliases
> > > bypass finalization because they are produced by
> > > cgraph_node::create_alias that sets definition flag to true.
> > 
> > Note, I've already committed the patch as Richi acked it.
> > So, can we move that
> >   node->semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);
> > from cgraph_node::create to cgraph_node::create_alias?
> 
> I think it would be easiest to move it to the visibility pass
> (after all it is about visibilities and all earlier uses of the flag
> are wrong since frontend is changing it at any time until unit is fully
> built).  I will prepare patch tonight or tomorrow.
> 
> Also thinking about the copying in cgraph_clone, it would make sense
> only if we produce clones with public linkage.  Do we ever do that?

The cgraph.cc change was what I actually needed for the fix, the
cgraphclones.cc was only because I've noticed that it constructs a new
node (so is initialized to whatever random flag_semantic_interposition is
right now) and initializing it to what it is cloned from made more sense.

Jakub



Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]

2022-04-20 Thread Jan Hubicka via Gcc-patches
> On Wed, Apr 20, 2022 at 10:45:53AM +0200, Jan Hubicka wrote:
> > So this change should be unnecessary unless there are nodes that are
> > missing finalization stage.  It also is not good enough since frontends
> > may change opt_for_fn between node creation and finalization of
> > compilation unit (so even after cgraph_finalize unfortunately, we had
> > another bug about that).
> > 
> > The PR was about implicit C++ alias.  So the problem is that aliases
> > bypass finalization because they are produced by
> > cgraph_node::create_alias that sets definition flag to true.
> 
> Note, I've already committed the patch as Richi acked it.
> So, can we move that
>   node->semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);
> from cgraph_node::create to cgraph_node::create_alias?

I think it would be easiest to move it to the visibility pass
(after all it is about visibilities and all earlier uses of the flag
are wrong since frontend is changing it at any time until unit is fully
built).  I will prepare patch tonight or tomorrow.

Also thinking about the copying in cgraph_clone, it would make sense
only if we produce clones with public linkage.  Do we ever do that?

Honza
> 
> > I guess it would be most consistent to give up on having the flag up to
> > date during cgraph construction (i.e. from finalization time down) and
> > compute it during the cgraph_finalize_compilation_unit.  I will look into
> > that.
> 
>   Jakub
> 


Re: [PATCH] gcov-profile: Allow negative counts of indirect calls [PR105282]

2022-04-20 Thread Martin Liška

On 4/20/22 10:55, Jan Hubicka via Gcc-patches wrote:

I think we can just drop the sanity check completely.  In general the
profile data may be corrupted and each use of it should be guarded to
not explode on such situation.


Makes sense to me. I'm going to do it once stage1 opens.

Cheers,
Martin


Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]

2022-04-20 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 20, 2022 at 10:45:53AM +0200, Jan Hubicka wrote:
> So this change should be unnecessary unless there are nodes that are
> missing finalization stage.  It also is not good enough since frontends
> may change opt_for_fn between node creation and finalization of
> compilation unit (so even after cgraph_finalize unfortunately, we had
> another bug about that).
> 
> The PR was about implicit C++ alias.  So the problem is that aliases
> bypass finalization because they are produced by
> cgraph_node::create_alias that sets definition flag to true.

Note, I've already committed the patch as Richi acked it.
So, can we move that
  node->semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);
from cgraph_node::create to cgraph_node::create_alias?

> I guess it would be most consistent to give up on having the flag up to
> date during cgraph construction (i.e. from finalization time down) and
> compute it during the cgraph_finalize_compilation_unit.  I will look into
> that.

Jakub



Re: [PATCH] gcov-profile: Allow negative counts of indirect calls [PR105282]

2022-04-20 Thread Jan Hubicka via Gcc-patches
> From: Sergei Trofimovich 
> 
> TOPN metrics are histograms that contain overall count and per-bucket
> count. Overall count can be negative when two profiles merge and some
> of per-bucket metrics are dropped.
> 
> Noticed as an ICE on python PGO build where gcc crashes as:
> 
> during IPA pass: modref
> a.c:36:1: ICE: in stream_out_histogram_value, at value-prof.cc:340
>36 | }
>   | ^
> stream_out_histogram_value(output_block*, histogram_value_t*)
> gcc/value-prof.cc:340
> 
> gcc/ChangeLog:
> 
>   PR gcov-profile/105282
>   * value-prof.cc (stream_out_histogram_value): Allow negative counts
>   on HIST_TYPE_INDIR_CALL.
> ---
>  gcc/value-prof.cc | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/gcc/value-prof.cc b/gcc/value-prof.cc
> index 9785c7a03ea..4927d119aa0 100644
> --- a/gcc/value-prof.cc
> +++ b/gcc/value-prof.cc
> @@ -319,40 +319,44 @@ stream_out_histogram_value (struct output_block *ob, histogram_value hist)
>streamer_write_bitpack ();
>switch (hist->type)
>  {
>  case HIST_TYPE_INTERVAL:
>streamer_write_hwi (ob, hist->hdata.intvl.int_start);
>streamer_write_uhwi (ob, hist->hdata.intvl.steps);
>break;
>  default:
>break;
>  }
>for (i = 0; i < hist->n_counters; i++)
>  {
>/* When user uses an unsigned type with a big value, constant converted
>to gcov_type (a signed type) can be negative.  */
>gcov_type value = hist->hvalue.counters[i];
>if (hist->type == HIST_TYPE_TOPN_VALUES
> || hist->type == HIST_TYPE_IOR)
>   /* Note that the IOR counter tracks pointer values and these can have
>  sign bit set.  */
>   ;
> +  else if (hist->type == HIST_TYPE_INDIR_CALL && i == 0)
> + /* 'all' counter overflow is stored as a negative value. Individual
> +counters and values are expected to be non-negative.  */
> + ;

I think we can just drop the sanity check completely.  In general the
profile data may be corrupted and each use of it should be guarded to
not explode on such situation.
I added the check here a long time ago while implementing the early
version of the profile streaming patch. At that time some bugs were causing
counts to be negative due to weird overflows in the logic normalizing
profiles from different object files to the same number of executions.

Honza
>else
>   gcc_assert (value >= 0);
>  
>streamer_write_gcov_count (ob, value);
>  }
>if (hist->hvalue.next)
>  stream_out_histogram_value (ob, hist->hvalue.next);
>  }
>  
>  /* Dump information about HIST to DUMP_FILE.  */
>  
>  void
>  stream_in_histogram_value (class lto_input_block *ib, gimple *stmt)
>  {
>enum hist_type type;
>unsigned int ncounters = 0;
>struct bitpack_d bp;
>unsigned int i;
>histogram_value new_val;
>bool next;
> -- 
> 2.35.1
> 
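Honza suggests that consumers of profile data should tolerate corrupted or overflowed counters rather than assert on them. Below is a minimal sketch of that style. It uses a hypothetical helper, not part of GCC's API, that turns an overall count and a per-target count from a TOPN-style histogram into a fraction only when the two numbers are consistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical consumer: compute COUNT/ALL for an indirect-call target.
   A negative 'all' counter (e.g. after profile merging dropped buckets),
   a negative per-target count, or COUNT > ALL is treated as "no usable
   data" instead of triggering an assertion.  */
bool indir_call_fraction(int64_t all, int64_t count, double *fraction)
{
    assert(fraction);
    if (all <= 0 || count < 0 || count > all)
        return false;
    *fraction = (double)count / (double)all;
    return true;
}
```

Callers then fall back to "no profile" for that call site, which is the same outcome as when no histogram was recorded at all.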


Re: [PATCH][v2] tree-optimization/104912 - ensure cost model is checked first

2022-04-20 Thread Jan Hubicka via Gcc-patches
> The following makes sure that when we build the versioning condition
> for vectorization including the cost model check, we check for the
> cost model and branch over other versioning checks.  That is what
> the cost modeling assumes, since the cost model check is the only
> one accounted for in the scalar outside cost.  Currently we emit
> all checks as straight-line code combined with bitwise ops which
> can result in surprising ordering of checks in the final assembly.
> 
> Since loop_version accepts only a single versioning condition
> the splitting is done after the fact.
> 
> The result is a 1.5% speedup of 416.gamess on x86_64 when compiling
> with -Ofast and tuning for generic or skylake.  That's not enough
> to recover from the slowdown when vectorizing but it now cuts off
> the expensive alias versioning test.
> 
> This is an update to the previously posted patch splitting the
> probability between the two branches as outlined in
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592597.html
> 
> I've re-bootstrapped and tested this on x86_64-unknown-linux-gnu.
> 
> Honza - is the approach to splitting the probabilities sensible?
> This fixes a piece of a P1 regression.
> 
> Thanks,
> Richard.
> 
> 2022-03-21  Richard Biener  
> 
>   PR tree-optimization/104912
>   * tree-vect-loop-manip.cc (vect_loop_versioning): Split
>   the cost model check to a separate BB to make sure it is
>   checked first and not combined with other version checks.
> ---
>  gcc/tree-vect-loop-manip.cc | 60 +++--
>  1 file changed, 57 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 63fb6f669a0..e4381eb7079 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3445,13 +3445,34 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
>   cond_expr = expr;
>  }
>  
> +  tree cost_name = NULL_TREE;
> +  profile_probability prob2 = profile_probability::uninitialized ();
> +  if (cond_expr
> +  && !integer_truep (cond_expr)
> +  && (version_niter
> +   || version_align
> +   || version_alias
> +   || version_simd_if_cond))

I assume that this condition...
>  
> +  /* Split the cost model check off to a separate BB.  Costing assumes
> + this is the only thing we perform when we enter the scalar loop
> + from a failed cost decision.  */
> +  if (cost_name && TREE_CODE (cost_name) == SSA_NAME)
is if and only if this condition
(otherwise prob2 would get uninitialized or lost)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (cost_name);
> +  /* All uses of the cost check are 'true' after the check we
> +  are going to insert.  */
> +  replace_uses_by (cost_name, boolean_true_node);
> +  /* And we're going to build the new single use of it.  */
> +  gcond *cond = gimple_build_cond (NE_EXPR, cost_name, boolean_false_node,
> +NULL_TREE, NULL_TREE);
> +  edge e = split_block (gimple_bb (def), def);
> +  gimple_stmt_iterator gsi = gsi_for_stmt (def);
> +  gsi_insert_after (&gsi, cond, GSI_NEW_STMT);
> +  edge true_e, false_e;
> +  extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
> +  e->flags &= ~EDGE_FALLTHRU;
> +  e->flags |= EDGE_TRUE_VALUE;
> +  edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE);
> +  e->probability = prob2;
> +  e2->probability = prob2.invert ();

So this looks fine to me.
Honza
> +  set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src);
> +  auto_vec<basic_block> adj;
> +  for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest);
> +son;
> +son = next_dom_son (CDI_DOMINATORS, son))
> + if (EDGE_COUNT (son->preds) > 1)
> +   adj.safe_push (son);
> +  for (auto son : adj)
> + set_immediate_dominator (CDI_DOMINATORS, son, e->src);
> +}
> +
>if (version_niter)
>  {
>/* The versioned loop could be infinite, we need to clear existing
> -- 
> 2.34.1
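Below is a source-level analogy of why the check ordering matters. It is not the vectorizer's generated GIMPLE, and every name is hypothetical. When the checks are combined with bitwise `&` as straight-line code, the expensive alias check runs even when the cost model alone would reject the vector loop. Branching on the cost model first skips it. The counter records how often the alias check runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

int alias_checks_run = 0;   /* how often the "expensive" check executed */

/* Hypothetical cost-model check: vectorize only for enough iterations.  */
bool cost_model_ok(long niters, long threshold)
{
    return niters >= threshold;
}

/* Hypothetical runtime alias check: do [a, a+n) and [b, b+n) overlap?
   Addresses are compared as integers to avoid relational comparison of
   pointers into unrelated objects.  */
bool no_alias(const int *a, const int *b, long n)
{
    assert(n >= 0);
    alias_checks_run++;
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t len = (uintptr_t)n * sizeof(int);
    return pa + len <= pb || pb + len <= pa;
}

/* Straight-line form: '&' does not short-circuit, so both run.  */
bool use_vector_combined(long niters, long th, const int *a, const int *b)
{
    return cost_model_ok(niters, th) & no_alias(a, b, niters);
}

/* Split form: the cost model is checked first and branches over the rest.  */
bool use_vector_split(long niters, long th, const int *a, const int *b)
{
    if (!cost_model_ok(niters, th))
        return false;
    return no_alias(a, b, niters);
}
```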


[PATCH] arm: Restrict support of vectors of boolean immediates (PR target/104662)

2022-04-20 Thread Christophe Lyon via Gcc-patches
This simple patch avoids the ICE described in the PR:
internal compiler error: in simd_valid_immediate, at config/arm/arm.cc:12866

with an early exit from simd_valid_immediate if we are trying to
handle a vector of booleans and MVE is not enabled.

We still get an ICE when compiling the existing
gcc.dg/rtl/arm/mve-vxbi.c without -march=armv8.1-m.main+mve:

error: unrecognizable insn:
(insn 7 5 8 2 (set (reg:V4BI 114)
(const_vector:V4BI [
(const_int 1 [0x1])
(const_int 0 [0]) repeated x2
(const_int 1 [0x1])
])) -1
 (nil))
during RTL pass: ira

but there's little we can do since the testcase explicitly creates
vectors of booleans which do need MVE.

That is the reason why I do not add a testcase.

2022-04-19  Christophe Lyon  

PR target/104662
* config/arm/arm.cc (simd_valid_immediate): Exit when input is a
vector of booleans and MVE is not enabled.
---
 gcc/config/arm/arm.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 14e2fdfeafa..69a18c2f157 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -12849,6 +12849,9 @@ simd_valid_immediate (rtx op, machine_mode mode, int inverse,
  || n_elts * innersize != 16))
 return -1;
 
+  if (!TARGET_HAVE_MVE && GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+return -1;
+
   /* Vectors of float constants.  */
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
 {
-- 
2.25.1



Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]

2022-04-20 Thread Jan Hubicka via Gcc-patches
> On Wed, 20 Apr 2022, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > cgraph_node has a semantic_interposition flag which should mirror
> > opt_for_fn (decl, flag_semantic_interposition).  But it actually is
> > initialized not from that, but from flag_semantic_interposition in the
> >   explicit symtab_node (symtab_type t)
> > : type (t), resolution (LDPR_UNKNOWN), definition (false), alias 
> > (false),
> > ...
> >   semantic_interposition (flag_semantic_interposition),
> > ...
> >   x_comdat_group (NULL_TREE), x_section (NULL)
> >   {}
> > ctor.  I think that might be fine for varpool nodes, but since
> > flag_semantic_interposition is now implied from -Ofast it isn't correct
> > for cgraph nodes, unless we guarantee that cgraph node for a particular
> > function decl is always created while that function is
> > current_function_decl.  That is often the case, but not always as the
> > following function shows.

Normally cgraph_nodes with function bodies are first created, later
finalized and then analyzed.  We copy over the semantic_interposition
flag from opt_for_fn to cgraph_node in finalize_function.
The ctor there is indeed only for varpool nodes since these do not have
their opt_for_var.
> > --- gcc/cgraph.cc.jj2022-02-04 14:36:54.069618372 +0100
> > +++ gcc/cgraph.cc   2022-04-19 13:38:06.223782974 +0200
> > @@ -507,6 +507,7 @@ cgraph_node::create (tree decl)
> >gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
> >  
> >node->decl = decl;
> > +  node->semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);

So this change should be unnecessary unless there are nodes that are
missing finalization stage.  It also is not good enough since frontends
may change opt_for_fn between node creation and finalization of
compilation unit (so even after cgraph_finalize unfortunately, we had
another bug about that).

The PR was about implicit C++ alias.  So the problem is that aliases
bypass finalization because they are produced by
cgraph_node::create_alias that sets definition flag to true.

I guess it would be most consistent to give up on having the flag up to
date during cgraph construction (i.e. from finalization time down) and
compute it during the cgraph_finalize_compilation_unit.  I will look into
that.
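The following is a hypothetical model, not GCC code. An option record stands in for `opt_for_fn (decl, flag_semantic_interposition)`. The two functions contrast caching the value when the node is created with recomputing it once the unit is complete. The cached copy goes stale if the front end changes the per-function option in between, for example from an optimize attribute or pragma.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-function option record (stand-in for opt_for_fn).  */
struct fn_opts_sketch { bool semantic_interposition; };

struct fn_node_sketch {
    const struct fn_opts_sketch *opts;  /* options of the function decl */
    bool semantic_interposition;        /* cached copy of the flag */
};

/* Capture at creation: the cache can go stale if *OPTS changes later.  */
void node_create_sketch(struct fn_node_sketch *n,
                        const struct fn_opts_sketch *opts)
{
    assert(n && opts);
    n->opts = opts;
    n->semantic_interposition = opts->semantic_interposition;
}

/* Recompute once the unit is complete and options can no longer change.  */
void node_finalize_unit_sketch(struct fn_node_sketch *n)
{
    assert(n && n->opts);
    n->semantic_interposition = n->opts->semantic_interposition;
}
```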
> >  
> >if ((flag_openacc || flag_openmp)
> >&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> > --- gcc/cgraphclones.cc.jj  2022-01-18 11:58:58.948991114 +0100
> > +++ gcc/cgraphclones.cc 2022-04-19 13:38:43.594262397 +0200
> > @@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl
> >new_node->versionable = versionable;
> >new_node->can_change_signature = can_change_signature;
> >new_node->redefined_extern_inline = redefined_extern_inline;
> > +  new_node->semantic_interposition = semantic_interposition;

This indeed makes sense to me. 
Honza
> >new_node->tm_may_enter_irr = tm_may_enter_irr;
> >new_node->externally_visible = false;
> >new_node->no_reorder = no_reorder;
> > --- gcc/testsuite/g++.dg/opt/pr105306.C.jj  2022-04-19 13:42:33.908054114 +0200
> > +++ gcc/testsuite/g++.dg/opt/pr105306.C 2022-04-19 13:42:08.859403045 +0200
> > @@ -0,0 +1,13 @@
> > +// PR ipa/105306
> > +// { dg-do compile }
> > +// { dg-options "-Ofast" }
> > +
> > +#pragma GCC optimize 0
> > +template  void foo (T);
> > +struct B { ~B () {} };
> > +struct C { B f; };
> > +template  struct E {
> > +  void bar () { foo (g); }
> > +  C g;
> > +};
> > +template class E;
> > 
> > Jakub
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


[PATCH] tree-optimization/105312 - fix ISEL VCOND expansion

2022-04-20 Thread Richard Biener via Gcc-patches
The following aligns ISEL VEC_COND_EXPR expansion using VCOND
with the optab query done by vector lowering.  Instead of only
allowing the signed optab to provide EQ/NE compares we allow both
here though since there seems to be no documented canonicalization.
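EQ and NE give the same answer whether vector lanes are read as signed or unsigned. That is why falling back from vcondu to vcond, or the reverse, is safe for those two codes only. Below is a small C sketch with invented names; it is not GCC's optab API. It runs an exhaustive 8-bit check showing that EQ is sign-agnostic and LT is not. It also models the selection logic the patch adds, where the have_* flags stand for "a vcond/vcondu pattern exists for this mode pair".

```c
#include <assert.h>
#include <stdbool.h>

/* Signed value of an 8-bit pattern 0..255.  */
static int as_signed8(int bits) { return bits < 128 ? bits : bits - 256; }

bool eq_is_sign_agnostic_8bit(void)
{
    for (int i = 0; i < 256; i++)
        for (int j = 0; j < 256; j++)
            if ((as_signed8(i) == as_signed8(j)) != (i == j))
                return false;
    return true;
}

bool lt_is_sign_agnostic_8bit(void)
{
    for (int i = 0; i < 256; i++)
        for (int j = 0; j < 256; j++)
            if ((as_signed8(i) < as_signed8(j)) != (i < j))
                return false;
    return true;
}

enum cmp_code_sketch { CMP_EQ, CMP_NE, CMP_LT };

/* Hypothetical model of the fallback: try the requested signedness first,
   and for EQ/NE only, flip *UNSIGNEDP if the other pattern exists.  */
bool pick_vcond_sketch(bool have_signed, bool have_unsigned,
                       enum cmp_code_sketch code, bool *unsignedp)
{
    assert(unsignedp);
    if (*unsignedp ? have_unsigned : have_signed)
        return true;
    if ((code == CMP_EQ || code == CMP_NE)
        && (*unsignedp ? have_signed : have_unsigned)) {
        *unsignedp = !*unsignedp;
        return true;
    }
    return false;
}
```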

Bootstrap and regtest running on x86_64-unknown-linux-gnu.  I've cut
neon boilerplate for the testcase but cannot test it (a cc1 cross makes
it UNSUPPORTED).  If I don't hear otherwise, I'm going to push as-is
after testing completes.

Thanks,
Richard.

2022-04-20  Richard Biener  

PR tree-optimization/105312
* gimple-isel.cc (gimple_expand_vec_cond_expr): Query both
VCOND and VCONDU for EQ and NE.

* gcc.target/arm/pr105312.c: New testcase.
---
 gcc/gimple-isel.cc  |  8 
 gcc/testsuite/gcc.target/arm/pr105312.c | 23 +++
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr105312.c

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 3635585bf45..a8f7a0d25d0 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -245,6 +245,14 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
GET_MODE_NUNITS (cmp_op_mode)));
 
   icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
+  /* Some targets do not have vcondeq and only vcond with NE/EQ
+ but not vcondu, so make sure to also try vcond here as
+ vcond_icode_p would canonicalize the optab query to.  */
+  if (icode == CODE_FOR_nothing
+  && (tcode == NE_EXPR || tcode == EQ_EXPR)
+  && ((icode = get_vcond_icode (mode, cmp_op_mode, !unsignedp))
+ != CODE_FOR_nothing))
+unsignedp = !unsignedp;
   if (icode == CODE_FOR_nothing)
 {
   if (tcode == LT_EXPR
diff --git a/gcc/testsuite/gcc.target/arm/pr105312.c b/gcc/testsuite/gcc.target/arm/pr105312.c
new file mode 100644
index 000..a02831bcbcf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr105312.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-mcpu=cortex-a15" } */
+/* { dg-add-options arm_neon } */
+
+typedef float stress_matrix_type_t;
+typedef unsigned int size_t;
+static void __attribute__((optimize("-O3"))) stress_matrix_xy_identity(
+ const size_t n,
+ stress_matrix_type_t a[restrict n][n],
+ stress_matrix_type_t b[restrict n][n],
+ stress_matrix_type_t r[restrict n][n])
+{
+ register size_t i;
+ (void)a;
+ (void)b;
+ for (i = 0; i < n; i++) {
+  register size_t j;
+  for (j = 0; j < n; j++)
+   r[i][j] = (i == j) ? 1.0 : 0.0;
+   return;
+ }
+}
-- 
2.34.1


Re: [x86_64 PATCH] PR middle-end/105135: Catch more cmov idioms in combine.

2022-04-20 Thread Uros Bizjak via Gcc-patches
On Tue, Apr 19, 2022 at 1:58 PM Roger Sayle  wrote:
>
>
> This patch addresses PR middle-end/105135, a missed-optimization regression
> affecting mainline.  I agree with Jakub's comment that the middle-end
> optimizations are sound, reducing basic blocks and conditional expressions
> at the tree-level, but requiring backends to recognize conditional move
> instructions/idioms if/when beneficial.  This patch introduces two new
> define_insn_and_split in i386.md to recognize two additional cmove idioms.
>
> The first recognizes (PR105135's):
>
> int foo(int x, int y, int z)
> {
>   return ((x < y) << 5) + z;
> }
>
> and transforms (the 6 insns, 13 bytes):
>
>         xorl    %eax, %eax      ;; 2 bytes
>         cmpl    %esi, %edi      ;; 2 bytes
>         setl    %al             ;; 3 bytes
>         sall    $5, %eax        ;; 3 bytes
>         addl    %edx, %eax      ;; 2 bytes
>         ret                     ;; 1 byte
>
> into (the 4 insns, 9 bytes):
>
>         cmpl    %esi, %edi      ;; 2 bytes
>         leal    32(%rdx), %eax  ;; 3 bytes
>         cmovge  %edx, %eax      ;; 3 bytes
>         ret                     ;; 1 byte
>
>
> The second catches the very closely related (from PR 98865):
>
> int bar(int x, int y, int z)
> {
>   return -(x < y) & z;
> }
>
> and transforms the (6 insns, 12 bytes):
>         xorl    %eax, %eax      ;; 2 bytes
>         cmpl    %esi, %edi      ;; 2 bytes
>         setl    %al             ;; 3 bytes
>         negl    %eax            ;; 2 bytes
>         andl    %edx, %eax      ;; 2 bytes
>         ret                     ;; 1 byte
>
> into (4 insns, 8 bytes):
>         xorl    %eax, %eax      ;; 2 bytes
>         cmpl    %esi, %edi      ;; 2 bytes
>         cmovl   %edx, %eax      ;; 3 bytes
>         ret                     ;; 1 byte
>
> They both have in common that they recognize a setcc followed by two
> instructions, and replace them with one instruction and a cmov, which
> is typically a performance win, but always a size win.  Fine tuning
> these decisions based on microarchitecture is much easier in the
> backend, than the middle-end.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-04-19  Roger Sayle  
>
> gcc/ChangeLog
> PR target/105135
> * config/i386/i386.md (*xor_cmov): Transform setcc, negate
> then and into mov $0, followed by a cmov.
> (*lea_cmov): Transform setcc, ashift const then plus into
> lea followed by cmov.
>
> gcc/testsuite/ChangeLog
> PR target/105135
> * gcc.target/i386/cmov10.c: New test case.
> * gcc.target/i386/cmov11.c: New test case.
> * gcc.target/i386/pr105135.c: New test case.
>
>
> Thanks in advance,
> Roger
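The two idioms can be checked against their conditional-select equivalents at the source level. In the sketch below, `foo` and `bar` are the functions from the patch description. The `*_select` variants are illustrative source analogues of the proposed sequences, not what GCC emits. For `*lea_cmov` the added constant is `1 << INTVAL (operands[3])`, here `1 << 5 = 32`. The addition is only evaluated on the path where the original expression performs it, so no new signed overflow is introduced.

```c
#include <assert.h>
#include <stdbool.h>

/* The two source idioms from the patch description.  */
int foo(int x, int y, int z) { return ((x < y) << 5) + z; }
int bar(int x, int y, int z) { return -(x < y) & z; }

/* "lea 32(z); cmovge z" selects between z + (1 << 5) and z;
   "xor; cmovl z" selects between z and 0.  */
int foo_select(int x, int y, int z) { return x < y ? z + (1 << 5) : z; }
int bar_select(int x, int y, int z) { return x < y ? z : 0; }

/* Compare both forms over all x, y, z in [LO, HI].  */
bool cmov_forms_agree(int lo, int hi)
{
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int z = lo; z <= hi; z++)
                if (foo(x, y, z) != foo_select(x, y, z)
                    || bar(x, y, z) != bar_select(x, y, z))
                    return false;
    return true;
}
```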


+;; Transform setcc;negate;and into mov_zero;cmov
+(define_insn_and_split "*xor_cmov"
+  [(set (match_operand:SWI248 0 "register_operand")
+(and:SWI248
+  (neg:SWI248 (match_operator:SWI248 1 "ix86_comparison_operator"
+[(match_operand 2 "flags_reg_operand")
+ (const_int 0)]))
+  (match_operand:SWI248 3 "register_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_CMOVE && can_create_pseudo_p ()"

Please use ix86_pre_reload_split instead of can_create_pseudo_p () here.

+  "#"
+  "&& 1"
+  [(set (match_dup 4) (const_int 0))
+   (set (match_dup 0)
+(if_then_else:SWI248 (match_op_dup 1 [(match_dup 2) (const_int 0)])
+ (match_dup 3) (match_dup 4)))]
+{
+  operands[4] = gen_reg_rtx (mode);
+})

Single line preparation statements should use double quotes instead of
curly braces. See many examples in i386 .md files.

+;; Transform setcc;ashift_const;plus into lea_const;cmov
+(define_insn_and_split "*lea_cmov"
+  [(set (match_operand:SWI 0 "register_operand")
+(plus:SWI (ashift:SWI (match_operator:SWI 1 "ix86_comparison_operator"
+[(match_operand 2 "flags_reg_operand")
+ (const_int 0)])
+  (match_operand:SWI 3 "const_int_operand"))
+  (match_operand:SWI 4 "register_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_CMOVE && can_create_pseudo_p ()"

Same here, ix86_pre_reload_split should be used for
define_insn_and_split (FYI, can_create_pseudo_p is still good for
define_split where no instruction is defined).

+  "#"
+  "&& 1"
+  [(set (match_dup 5) (plus: (match_dup 4) (match_dup 6)))
+   (set (match_dup 0)
+(if_then_else: (match_op_dup 1 [(match_dup 2) (const_int 0)])
+(match_dup 5) (match_dup 4)))]
+{
+  operands[5] = gen_reg_rtx (mode);
+  operands[6] = GEN_INT (1 << INTVAL (operands[3]));
+  if (mode != mode)
+{
+  operands[0] = gen_lowpart (mode, operands[0]);
+  operands[4] = gen_lowpart (mode, operands[4]);

gen_lowpart is dangerous to use before reload. It can choke when
integer mode SUBREG of e.g. FP mode register is passed here. So you
have to 

Re: [PATCH] cgraph: Fix up semantic_interposition handling [PR105306]

2022-04-20 Thread Richard Biener via Gcc-patches
On Wed, 20 Apr 2022, Jakub Jelinek wrote:

> Hi!
> 
> cgraph_node has a semantic_interposition flag which should mirror
> opt_for_fn (decl, flag_semantic_interposition).  But it actually is
> initialized not from that, but from flag_semantic_interposition in the
>   explicit symtab_node (symtab_type t)
> : type (t), resolution (LDPR_UNKNOWN), definition (false), alias (false),
> ...
>   semantic_interposition (flag_semantic_interposition),
> ...
>   x_comdat_group (NULL_TREE), x_section (NULL)
>   {}
> ctor.  I think that might be fine for varpool nodes, but since
> flag_semantic_interposition is now implied from -Ofast it isn't correct
> for cgraph nodes, unless we guarantee that cgraph node for a particular
> function decl is always created while that function is
> current_function_decl.  That is often the case, but not always as the
> following function shows.
> Because symtab_node's ctor doesn't know for which decl the cgraph node
> is being created, the following patch keeps that as is, but updates it from
> opt_for_fn (decl, flag_semantic_interposition) when we know that, or for
> clones copies that flag (often it is then overridden in
> set_new_clone_decl_and_node_flags, but not always).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2022-04-20  Jakub Jelinek  
> 
>   PR ipa/105306
>   * cgraph.cc (cgraph_node::create): Set node->semantic_interposition
>   to opt_for_fn (decl, flag_semantic_interposition).
>   * cgraphclones.cc (cgraph_node::create_clone): Copy over
>   semantic_interposition flag.
> 
>   * g++.dg/opt/pr105306.C: New test.
> 
> --- gcc/cgraph.cc.jj  2022-02-04 14:36:54.069618372 +0100
> +++ gcc/cgraph.cc 2022-04-19 13:38:06.223782974 +0200
> @@ -507,6 +507,7 @@ cgraph_node::create (tree decl)
>gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
>  
>node->decl = decl;
> +  node->semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);
>  
>if ((flag_openacc || flag_openmp)
>&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> --- gcc/cgraphclones.cc.jj2022-01-18 11:58:58.948991114 +0100
> +++ gcc/cgraphclones.cc   2022-04-19 13:38:43.594262397 +0200
> @@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl
>new_node->versionable = versionable;
>new_node->can_change_signature = can_change_signature;
>new_node->redefined_extern_inline = redefined_extern_inline;
> +  new_node->semantic_interposition = semantic_interposition;
>new_node->tm_may_enter_irr = tm_may_enter_irr;
>new_node->externally_visible = false;
>new_node->no_reorder = no_reorder;
> --- gcc/testsuite/g++.dg/opt/pr105306.C.jj2022-04-19 13:42:33.908054114 +0200
> +++ gcc/testsuite/g++.dg/opt/pr105306.C   2022-04-19 13:42:08.859403045 +0200
> @@ -0,0 +1,13 @@
> +// PR ipa/105306
> +// { dg-do compile }
> +// { dg-options "-Ofast" }
> +
> +#pragma GCC optimize 0
> +template  void foo (T);
> +struct B { ~B () {} };
> +struct C { B f; };
> +template  struct E {
> +  void bar () { foo (g); }
> +  C g;
> +};
> +template class E;
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] c++, coroutines: Account for overloaded promise return_value() [PR105301].

2022-04-20 Thread Richard Biener via Gcc-patches
On Wed, Apr 20, 2022 at 4:19 AM Jason Merrill via Gcc-patches
 wrote:
>
> On 4/18/22 10:03, Iain Sandoe wrote:
> > Whether it was intended or not, it is possible to define a coroutine promise
> > with multiple return_value() methods [which need not even have the same 
> > type].
> >
> > We were not accounting for this possibility in the check to see whether both
> > return_value and return_void are specified (which is prohibited by the
> > standard).  Fixed thus and provided an adjusted diagnostic for the case that
> > multiple return_value() methods are present.
> >
> > tested on x86_64-darwin, OK for mainline? / Backports? (when?)
> > thanks,
> > Iain
> >
> > Signed-off-by: Iain Sandoe 
> >
> >   PR c++/105301
> >
> > gcc/cp/ChangeLog:
> >
> >   * coroutines.cc (coro_promise_type_found_p): Account for possible
> >   multiple overloads of the promise return_value() method.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/coroutines/pr105301.C: New test.
> > ---
> >   gcc/cp/coroutines.cc   | 10 -
> >   gcc/testsuite/g++.dg/coroutines/pr105301.C | 49 ++
> >   2 files changed, 57 insertions(+), 2 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/coroutines/pr105301.C
> >
> > diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
> > index dcc2284171b..d2a765cac11 100644
> > --- a/gcc/cp/coroutines.cc
> > +++ b/gcc/cp/coroutines.cc
> > @@ -513,8 +513,14 @@ coro_promise_type_found_p (tree fndecl, location_t loc)
> > coro_info->promise_type);
> > inform (DECL_SOURCE_LOCATION (BASELINK_FUNCTIONS (has_ret_void)),
> > "% declared here");
> > -   inform (DECL_SOURCE_LOCATION (BASELINK_FUNCTIONS (has_ret_val)),
> > -   "% declared here");
> > +   has_ret_val = BASELINK_FUNCTIONS (has_ret_val);
> > +   const char *message = "% declared here";
> > +   if (TREE_CODE (has_ret_val) == OVERLOAD)
> > + {
> > +   has_ret_val = OVL_FIRST (has_ret_val);
> > +   message = "% first declared here";
> > + }
>
> You could also use get_first_fn, but the patch is OK as is.  I'm
> inclined to leave backports in coroutines.cc to your discretion, you
> probably have a better idea of how important they are.

Likewise.  Please wait until after the 11.3 release.

Richard.

> > +   inform (DECL_SOURCE_LOCATION (has_ret_val), message);
> > coro_info->coro_co_return_error_emitted = true;
> > return false;
> >   }
> > diff --git a/gcc/testsuite/g++.dg/coroutines/pr105301.C b/gcc/testsuite/g++.dg/coroutines/pr105301.C
> > new file mode 100644
> > index 000..33a0b03cf5d
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/coroutines/pr105301.C
> > @@ -0,0 +1,49 @@
> > +// { dg-additional-options "-fsyntax-only" }
> > +namespace std {
> > +template 
> > +struct traits_sfinae_base {};
> > +
> > +template 
> > +struct coroutine_traits : public traits_sfinae_base {};
> > +}
> > +
> > +template struct coro {};
> > +template 
> > +struct std::coroutine_traits, Ps...> {
> > +  using promise_type = Promise;
> > +};
> > +
> > +struct awaitable {
> > +  bool await_ready() noexcept;
> > +  template 
> > +  void await_suspend(F) noexcept;
> > +  void await_resume() noexcept;
> > +} a;
> > +
> > +struct suspend_always {
> > +  bool await_ready() noexcept { return false; }
> > +  template 
> > +  void await_suspend(F) noexcept;
> > +  void await_resume() noexcept {}
> > +};
> > +
> > +namespace std {
> > +template 
> > +struct coroutine_handle {};
> > +}
> > +
> > +struct bad_promise_6 {
> > +  coro get_return_object();
> > +  suspend_always initial_suspend();
> > +  suspend_always final_suspend() noexcept;
> > +  void unhandled_exception();
> > +  void return_void();
> > +  void return_value(int) const;
> > +  void return_value(int);
> > +};
> > +
> > +coro
> > +bad_implicit_return() // { dg-error {.aka 'bad_promise_6'. declares both 'return_value' and 'return_void'} }
> > +{
> > +  co_await a;
> > +}
>


[PATCH] loongarch: ignore zero-size fields in calling convention

2022-04-20 Thread Xi Ruoyao via Gcc-patches
Currently, the LoongArch ELF psABI is not clear on how zero-sized fields
in aggregate arguments or return values are handled [1].  The behavior
of GCC trunk is puzzling in the following cases:

struct test1
{
  double a[0];
  float x;
};

struct test2
{
  float a[0];
  float x;
};

GCC trunk passes test1::x via GPR, but test2::x via FPR.  I believe no
rational Homo sapiens can understand (or even expect) this.

And, to make things even worse, test1 behaves differently in C and C++.
GCC trunk passes test1::x via GPR, but G++ trunk passes test1::x via
FPR.

I've written a paragraph about the current GCC behavior for the psABI [2],
but I think it's cleaner to just ignore all zero-sized fields in the ABI.
This will require only a two-line change in GCC (this patch) and a
one-line change in the ABI document.
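To illustrate the proposed rule, here is a heavily simplified, hypothetical C model of the classification step. None of these names come from loongarch.cc. Each field is reduced to its size and whether it is an integer or floating-point scalar. Nested records and arrays are assumed to be flattened already, and every floating-point field is assumed to fit in one FPR (lp64d). Only the "one or two floating-point fields go in FPRs" case is modelled. Everything else is left to the rest of the psABI rules and reported as "other". Once zero-sized fields are skipped, test1 and test2 above reduce to the same single `float x` and are classified the same way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum field_kind { FIELD_INT, FIELD_FLOAT };

struct field_model
{
  size_t size;            /* in bytes; 0 for e.g. "double a[0]" or "int : 0" */
  enum field_kind kind;
};

enum pass_kind
{
  PASS_FPR,    /* one or two floating-point fields: passed in FPR(s) */
  PASS_OTHER   /* anything else: not modelled by this sketch */
};

static enum pass_kind
classify_aggregate (const struct field_model *fields, size_t n)
{
  assert (fields != NULL || n == 0);
  size_t kept = 0, n_float = 0;

  for (size_t i = 0; i < n; i++)
    {
      /* The proposed rule: zero-sized fields take no part in the
         classification at all.  */
      if (fields[i].size == 0)
        continue;
      kept++;
      if (fields[i].kind == FIELD_FLOAT)
        n_float++;
    }

  if (kept >= 1 && kept <= 2 && n_float == kept)
    return PASS_FPR;
  return PASS_OTHER;
}
```

Under this model, the `struct test` used in the new testcases (x and y are floats, and every other member is zero-sized) classifies as two floating-point fields, which matches the testcase's expectation that `$f1` is used.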

If there is no better idea, I'd like to see this reviewed and applied
ASAP.  If we end up applying this patch after the GCC 12 release, we'll
need to add a lot more boring code to emit a -Wpsabi note via inform [3].
That would be an unnecessary burden both for us and for users of the
compiler (the compiler would spend CPU time just checking whether the
note should be emitted).

[1]:https://github.com/loongson/LoongArch-Documentation/issues/48
[2]:https://github.com/loongson/LoongArch-Documentation/pull/49
[3]:https://gcc.gnu.org/PR102024

gcc/

* config/loongarch/loongarch.cc
(loongarch_flatten_aggregate_field): Ignore empty fields for
RECORD_TYPE.

gcc/testsuite/

* gcc.target/loongarch/zero-size-field-pass.c: New test.
* gcc.target/loongarch/zero-size-field-ret.c: New test.
---
 gcc/config/loongarch/loongarch.cc |  3 ++
 .../loongarch/zero-size-field-pass.c  | 30 +++
 .../loongarch/zero-size-field-ret.c   | 28 +
 3 files changed, 61 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/zero-size-field-pass.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/zero-size-field-ret.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index f22150a60cc..57e4d9f82ce 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -326,6 +326,9 @@ loongarch_flatten_aggregate_field (const_tree type,
   for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
if (TREE_CODE (f) == FIELD_DECL)
  {
+   if (DECL_SIZE (f) && integer_zerop (DECL_SIZE (f)))
+ continue;
+
if (!TYPE_P (TREE_TYPE (f)))
  return -1;
 
diff --git a/gcc/testsuite/gcc.target/loongarch/zero-size-field-pass.c b/gcc/testsuite/gcc.target/loongarch/zero-size-field-pass.c
new file mode 100644
index 000..999dc913a71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/zero-size-field-pass.c
@@ -0,0 +1,30 @@
+/* Test that LoongArch backend ignores zero-sized fields of aggregates in
+   argument passing.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdouble-float -mabi=lp64d" } */
+/* { dg-final { scan-assembler "\\\$f1" } } */
+
+struct test
+{
+  int empty1[0];
+  double empty2[0];
+  int : 0;
+  float x;
+  long empty3[0];
+  long : 0;
+  float y;
+  unsigned : 0;
+  char empty4[0];
+};
+
+extern void callee (struct test);
+
+void
+caller (void)
+{
+  struct test test;
+  test.x = 114;
+  test.y = 514;
+  callee (test);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/zero-size-field-ret.c b/gcc/testsuite/gcc.target/loongarch/zero-size-field-ret.c
new file mode 100644
index 000..40137d97555
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/zero-size-field-ret.c
@@ -0,0 +1,28 @@
+/* Test that LoongArch backend ignores zero-sized fields of aggregates in
+   returning.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdouble-float -mabi=lp64d" } */
+/* { dg-final { scan-assembler-not "\\\$r4" } } */
+
+struct test
+{
+  int empty1[0];
+  double empty2[0];
+  int : 0;
+  float x;
+  long empty3[0];
+  long : 0;
+  float y;
+  unsigned : 0;
+  char empty4[0];
+};
+
+extern struct test callee (void);
+
+float
+caller (void)
+{
+  struct test test = callee ();
+  return test.x + test.y;
+}
-- 
2.36.0



[PATCH] cgraph: Fix up semantic_interposition handling [PR105306]

2022-04-20 Thread Jakub Jelinek via Gcc-patches
Hi!

cgraph_node has a semantic_interposition flag which should mirror
opt_for_fn (decl, flag_semantic_interposition).  But it actually is
initialized not from that, but from flag_semantic_interposition in the
  explicit symtab_node (symtab_type t)
: type (t), resolution (LDPR_UNKNOWN), definition (false), alias (false),
...
  semantic_interposition (flag_semantic_interposition),
...
  x_comdat_group (NULL_TREE), x_section (NULL)
  {}
ctor.  I think that might be fine for varpool nodes, but since
flag_semantic_interposition is now implied from -Ofast it isn't correct
for cgraph nodes, unless we guarantee that cgraph node for a particular
function decl is always created while that function is
current_function_decl.  That is often the case, but not always as the
following function shows.
Because symtab_node's ctor doesn't know for which decl the cgraph node
is being created, the following patch keeps that as is, but updates it from
opt_for_fn (decl, flag_semantic_interposition) when we know that, or for
clones copies that flag (often it is then overridden in
set_new_clone_decl_and_node_flags, but not always).
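Here is a hypothetical C model of the problem. None of these names exist in GCC. The per-function value lives with the decl, while the global flag follows whichever function is current, or the command line when no function is current. With -Ofast on the command line, which implies -fno-semantic-interposition, consider a function whose own options differ, presumably one under `#pragma GCC optimize 0` as in the testcase. If its node is created while no function is current, the node picks up the command-line value unless it is refreshed from the decl, as the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC internals, for illustration only.  */
struct model_decl
{
  const char *name;
  bool opt_semantic_interposition;  /* what opt_for_fn would return */
};

struct model_node
{
  const struct model_decl *decl;
  bool semantic_interposition;
};

/* Models the global flag_semantic_interposition: it reflects the options
   of the current function, or the command line when none is current.  */
static bool model_flag_semantic_interposition;

static void
model_set_current_function (const struct model_decl *fn, bool cmdline_value)
{
  model_flag_semantic_interposition
    = fn != NULL ? fn->opt_semantic_interposition : cmdline_value;
}

/* Old behaviour: the symtab_node ctor copies the global flag.  */
static struct model_node
model_node_ctor (const struct model_decl *decl)
{
  struct model_node n = { decl, model_flag_semantic_interposition };
  return n;
}

/* Patched cgraph_node::create: override with the decl's own setting.  */
static struct model_node
model_create (const struct model_decl *decl)
{
  assert (decl != NULL);
  struct model_node n = model_node_ctor (decl);
  n.semantic_interposition = decl->opt_semantic_interposition;
  return n;
}

/* Patched create_clone: copy the flag from the node being cloned.  */
static struct model_node
model_create_clone (const struct model_node *orig,
                    const struct model_decl *new_decl)
{
  struct model_node n = model_node_ctor (new_decl);
  n.semantic_interposition = orig->semantic_interposition;
  return n;
}
```

In this model, the bare ctor yields false for such a function (the -Ofast value), while `model_create` and `model_create_clone` yield the function's own true value.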

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-04-20  Jakub Jelinek  

PR ipa/105306
* cgraph.cc (cgraph_node::create): Set node->semantic_interposition
to opt_for_fn (decl, flag_semantic_interposition).
* cgraphclones.cc (cgraph_node::create_clone): Copy over
semantic_interposition flag.

* g++.dg/opt/pr105306.C: New test.

--- gcc/cgraph.cc.jj2022-02-04 14:36:54.069618372 +0100
+++ gcc/cgraph.cc   2022-04-19 13:38:06.223782974 +0200
@@ -507,6 +507,7 @@ cgraph_node::create (tree decl)
   gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
 
   node->decl = decl;
+  node->semantic_interposition = opt_for_fn (decl, flag_semantic_interposition);
 
   if ((flag_openacc || flag_openmp)
   && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
--- gcc/cgraphclones.cc.jj  2022-01-18 11:58:58.948991114 +0100
+++ gcc/cgraphclones.cc 2022-04-19 13:38:43.594262397 +0200
@@ -394,6 +394,7 @@ cgraph_node::create_clone (tree new_decl
   new_node->versionable = versionable;
   new_node->can_change_signature = can_change_signature;
   new_node->redefined_extern_inline = redefined_extern_inline;
+  new_node->semantic_interposition = semantic_interposition;
   new_node->tm_may_enter_irr = tm_may_enter_irr;
   new_node->externally_visible = false;
   new_node->no_reorder = no_reorder;
--- gcc/testsuite/g++.dg/opt/pr105306.C.jj  2022-04-19 13:42:33.908054114 +0200
+++ gcc/testsuite/g++.dg/opt/pr105306.C 2022-04-19 13:42:08.859403045 +0200
@@ -0,0 +1,13 @@
+// PR ipa/105306
+// { dg-do compile }
+// { dg-options "-Ofast" }
+
+#pragma GCC optimize 0
+template  void foo (T);
+struct B { ~B () {} };
+struct C { B f; };
+template  struct E {
+  void bar () { foo (g); }
+  C g;
+};
+template class E;

Jakub



Re: [PATCH] Asan changes for RISC-V.

2022-04-20 Thread joshua via Gcc-patches
Does Asan work for RISC-V currently? It seems that '-fsanitize=address' is
still unsupported for RISC-V. If I add '--enable-libsanitizer' in Makefile.in
and reconfigure, I get compilation errors.
Is that because of this comment in Makefile.in: "libsanitizer not supported
rv32, but it will break the rv64 multi-lib build, so we disable that
temporally until rv32 supported"?


--
From: Jim Wilson 
Sent: Thursday, October 29, 2020 07:59
To: gcc-patches 
Cc: cooper.joshua ; Jim Wilson 

Subject: [PATCH] Asan changes for RISC-V.

We have only riscv64 asan support, there is no riscv32 support as yet.  So I
need to be able to conditionally enable asan support for the riscv target.  I
implemented this by returning zero from the asan_shadow_offset function.  This
requires a change to toplev.c and docs in target.def.
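The convention described above can be modelled as follows. The hook may be absent entirely, meaning no ASan support for the target, or it may return 0, meaning no support for this subtarget. The caller must treat both as "unsupported". This is a hypothetical C sketch of the idea, not the truncated toplev.c hunk. `model_target_64bit` stands in for TARGET_64BIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the target hook: a function returning the shadow offset, or
   a NULL pointer when the target has no ASan support at all.  */
typedef uint64_t (*shadow_offset_hook) (void);

/* Hypothetical stand-in for TARGET_64BIT.  */
static bool model_target_64bit = true;

/* Same shape as riscv_asan_shadow_offset in the patch: 1 << 29 for RV64,
   0 ("not supported by this subtarget") for RV32.  */
static uint64_t
model_riscv_asan_shadow_offset (void)
{
  return model_target_64bit ? (UINT64_C (1) << 29) : 0;
}

/* Caller side: a missing hook and a zero result both mean that
   -fsanitize=address cannot be honoured.  */
static bool
model_asan_supported (shadow_offset_hook hook)
{
  return hook != NULL && hook () != 0;
}
```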

The asan support works on a 5.5 kernel, but does not work on a 4.15 kernel.
The problem is that the asan high memory region is a small wedge below
0x40.  The new kernel puts shared libraries at 0x3f and going
down which works.  But the old kernel puts shared libraries at 0x20
and going up which does not work, as it isn't in any recognized memory
region.  This might be fixable with more asan work, but we don't really need
support for old kernel versions.

The asan port is curious in that it uses 1<<29 for the shadow offset, but all
other 64-bit targets use a number larger than 1<<32.  But what we have is
working OK for now.
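The effect of the 1<<29 offset can be seen with AddressSanitizer's usual mapping, Shadow = (Addr >> 3) + Offset. Under this mapping, each shadow byte describes one 8-byte granule of application memory. The sketch below assumes that default scale of 3. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default ASan shadow scale: one shadow byte per 8 bytes.  */
#define MODEL_SHADOW_SCALE 3

/* The RV64 offset discussed above, 1 << 29 (hypothetical macro name).  */
#define MODEL_RV64_SHADOW_OFFSET (UINT64_C (1) << 29)

/* Map an application address to the address of its shadow byte.  */
static uint64_t
model_mem_to_shadow (uint64_t addr)
{
  return (addr >> MODEL_SHADOW_SCALE) + MODEL_RV64_SHADOW_OFFSET;
}
```

With this offset, the shadow of address 0 is at 0x20000000 (512 MiB), and addresses 0x1000 through 0x1007 all share the shadow byte at 0x20000200.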

I did a make check RUNTESTFLAGS="asan.exp" on Fedora rawhide image running on
qemu and the results look reasonable.

  === gcc Summary ===

# of expected passes  1905
# of unexpected failures 11
# of unsupported tests  224

  === g++ Summary ===

# of expected passes  2002
# of unexpected failures 6
# of unresolved testcases 1
# of unsupported tests  175

OK?

Jim

2020-10-28  Jim Wilson  

 gcc/
 * config/riscv/riscv.c (riscv_asan_shadow_offset): New.
 (TARGET_ASAN_SHADOW_OFFSET): New.
 * doc/tm.texi: Regenerated.
 * target.def (asan_shadow_offset): Mention that it can return zero.
 * toplev.c (process_options): Check for and handle zero return from
 targetm.asan_shadow_offset call.

Co-Authored-By: cooper.joshua 
---
 gcc/config/riscv/riscv.c | 16 
 gcc/doc/tm.texi  |  3 ++-
 gcc/target.def   |  3 ++-
 gcc/toplev.c |  3 ++-
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 989a9f15250..6909e200de1 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -5299,6 +5299,19 @@ riscv_gpr_save_operation_p (rtx op)
   return true;
 }

+/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
+
+static unsigned HOST_WIDE_INT
+riscv_asan_shadow_offset (void)
+{
+  /* We only have libsanitizer support for RV64 at present.
+
+ This number must match kRiscv*_ShadowOffset* in the file
+ libsanitizer/asan/asan_mapping.h which is currently 1<<29 for rv64,
+ even though 1<<36 makes more sense.  */
+  return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -5482,6 +5495,9 @@ riscv_gpr_save_operation_p (rtx op)
 #undef TARGET_NEW_ADDRESS_PROFITABLE_P
 #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p

+#undef TARGET_ASAN_SHADOW_OFFSET
+#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
+
 struct gcc_target targetm = TARGET_INITIALIZER;

 #include "gt-riscv.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 24c37f655c8..39c596b647a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12078,7 +12078,8 @@ is zero, which disables this optimization.
 @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_ASAN_SHADOW_OFFSET (void)
 Return the offset bitwise ored into shifted address to get corresponding
 Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not
-supported by the target.
+supported by the target.  May return 0 if Address Sanitizer is not supported
+by a subtarget.
 @end deftypefn

 @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_MEMMODEL_CHECK (unsigned HOST_WIDE_INT @var{val})
diff --git a/gcc/target.def b/gcc/target.def
index ed2da154e30..268b56b6ebd 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4452,7 +4452,8 @@ DEFHOOK
 (asan_shadow_offset,
  "Return the offset bitwise ored into shifted address to get corresponding\n\
 Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not\n\
-supported by the target.",
+supported by the target.  May return 0 if Address Sanitizer is not supported\n\
+by a subtarget.",
  unsigned HOST_WIDE_INT, (void),
  NULL)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index 20e231f4d2a..cf89598252c 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1834,7 +1834,8 @@ process_options (void)
 }

   if ((flag_sanitize & SANITIZE_USER_ADDRESS)
-  &&