[committed] RISC-V: Fix wrong predicator for zero_extendsidi2_internal pattern

2021-10-27 Thread Kito Cheng
We wrongly guarded the zero_extendsidi2_internal pattern with both ZBA and
ZBB, but only ZBA provides a zero_extendsidi2 instruction.

gcc/ChangeLog

* config/riscv/riscv.md (zero_extendsidi2_internal): Allow ZBB
to use this pattern.
---
 gcc/config/riscv/riscv.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index dd4c24292f2..225e5b259c1 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1311,7 +1311,7 @@ (define_insn_and_split "*zero_extendsidi2_internal"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI
(match_operand:SI 1 "nonimmediate_operand" " r,m")))]
-  "TARGET_64BIT && !(TARGET_ZBA || TARGET_ZBB)"
+  "TARGET_64BIT && !TARGET_ZBA"
   "@
#
lwu\t%0,%1"
-- 
2.33.0
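
The effect of the condition change can be sketched as a small truth table. This is only an illustration in Python, not GCC code: the two function names are made up here, and the booleans stand in for TARGET_64BIT, TARGET_ZBA and TARGET_ZBB.

```python
def pattern_enabled_old(target_64bit, zba, zbb):
    # Old condition: "TARGET_64BIT && !(TARGET_ZBA || TARGET_ZBB)"
    return target_64bit and not (zba or zbb)


def pattern_enabled_new(target_64bit, zba, zbb):
    # New condition: "TARGET_64BIT && !TARGET_ZBA"
    return target_64bit and not zba


# The only 64-bit configuration whose answer changes is ZBB without ZBA.
changed = [
    (zba, zbb)
    for zba in (False, True)
    for zbb in (False, True)
    if pattern_enabled_old(True, zba, zbb) != pattern_enabled_new(True, zba, zbb)
]
```

With ZBB alone the old condition switched the pattern off and the new one keeps it available; every other combination is unaffected.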



[committed] RISC-V: Handle zi* extension correctly for arch-canonicalize script

2021-10-27 Thread Kito Cheng
The canonical order for z-prefixed extensions relies on the canonical order
of the single-letter extensions; however, we did not put i into the list
before, so passing zicsr or zifencei raised an exception.

gcc/ChangeLog:

* config/riscv/arch-canonicalize (CANONICAL_ORDER): Add `i` to
CANONICAL_ORDER.
---
 gcc/config/riscv/arch-canonicalize | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/arch-canonicalize b/gcc/config/riscv/arch-canonicalize
index ea95a0693f3..c7df3c8a313 100755
--- a/gcc/config/riscv/arch-canonicalize
+++ b/gcc/config/riscv/arch-canonicalize
@@ -28,7 +28,7 @@ import itertools
 from functools import reduce
 
 
-CANONICAL_ORDER = "mafdgqlcbjtpvn"
+CANONICAL_ORDER = "imafdgqlcbjtpvn"
 LONG_EXT_PREFIXES = ['z', 's', 'h', 'x']
 
 #
-- 
2.33.0
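
A minimal sketch of why the missing letter matters, assuming the sort key for a z-prefixed extension is derived from the position of its second character in CANONICAL_ORDER. The helper names below are hypothetical and this is not the script's actual code; the script may report the problem differently, while here the failure surfaces as the ValueError that str.index raises.

```python
CANONICAL_ORDER_OLD = "mafdgqlcbjtpvn"
CANONICAL_ORDER_NEW = "imafdgqlcbjtpvn"


def z_ext_sort_key(ext, canonical_order):
    # Order by the single-letter extension named by the second character,
    # then alphabetically.  str.index raises ValueError for unknown letters.
    return (canonical_order.index(ext[1]), ext)


def sort_z_exts(exts, canonical_order):
    return sorted(exts, key=lambda e: z_ext_sort_key(e, canonical_order))


# With 'i' in the list, zi* extensions sort ahead of the others.
ordered = sort_z_exts(["zifencei", "zba", "zicsr"], CANONICAL_ORDER_NEW)

# Without 'i', looking up the key for zicsr fails.
try:
    sort_z_exts(["zicsr"], CANONICAL_ORDER_OLD)
    old_order_failed = False
except ValueError:
    old_order_failed = True
```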



[PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]

2021-10-27 Thread Xionghu Luo via Gcc-patches



On 2021/10/27 21:24, David Edelsohn wrote:
> On Sun, Oct 24, 2021 at 10:51 PM Xionghu Luo  wrote:
>>
>> If the second operand of __builtin_shuffle is const vector 0, and with
>> specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv.
>>
>> gcc/ChangeLog:
>>
>> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
>> patterns match and emit for VSX xxpermdi.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/powerpc/pr102868.c: New test.
>> ---
>>  gcc/config/rs6000/rs6000.c  | 47 --
>>  gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +
>>  2 files changed, 97 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index d0730253bcc..5d802c1fa96 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -23046,7 +23046,23 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
>> rtx op1,
>>  {OPTION_MASK_P8_VECTOR,
>>   BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
>>   : CODE_FOR_p8_vmrgew_v4sf_direct,
>> - {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
>> + {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
>> +{OPTION_MASK_VSX,
>> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +  : CODE_FOR_vsx_xxpermdi_v16qi),
>> + {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
>> +{OPTION_MASK_VSX,
>> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +  : CODE_FOR_vsx_xxpermdi_v16qi),
>> + {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
>> +{OPTION_MASK_VSX,
>> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +  : CODE_FOR_vsx_xxpermdi_v16qi),
>> + {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
>> +{OPTION_MASK_VSX,
>> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +  : CODE_FOR_vsx_xxpermdi_v16qi),
>> + {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};
> 
> If the insn_code is the same for big endian and little endian, why
> does the new code test BYTES_BIG_ENDIAN to set the same value
> (CODE_FOR_vsx_xxpermdi_v16qi)?
> 

Thanks for the catch, updated the patch as below:


[PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]

If the second operand of __builtin_shuffle is a constant zero vector and the
mask matches one of a few specific patterns, the shuffle can be optimized to
vspltisw+xxpermdi instead of lxv.

gcc/ChangeLog:

* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
pattern matching and emission for VSX xxpermdi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr102868.c: New test.
---
 gcc/config/rs6000/rs6000.c  | 39 +--
 gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +
 2 files changed, 89 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d0730253bcc..533560bb9ba 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -23046,7 +23046,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
 {OPTION_MASK_P8_VECTOR,
  BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
  : CODE_FOR_p8_vmrgew_v4sf_direct,
- {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
+ {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
+{OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+ {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
+{OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+ {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
+{OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+ {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
+{OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+ {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};
 
   unsigned int i, j, elt, which;
   unsigned char perm[16];
@@ -23169,6 +23177,27 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
  machine_mode omode = insn_data[icode].operand[0].mode;
  machine_mode imode = insn_data[icode].operand[1].mode;
 
+ rtx perm_idx = GEN_INT (0);
+ if (icode == CODE_FOR_vsx_xxpermdi_v16qi)
+   {
+ int perm_val = 0;
+ if (one_vec)
+   {
+ if (perm[0] == 8)
+   perm_val |= 2;
+ if (perm[8] == 8)
+   perm_val |= 1;
+   }
+ else
+   {
+ if (perm[0] != 0)
+   perm_val |= 2;
+ if (perm[8] != 16)
+   perm_val |= 1;
+   }
+ perm_idx = GEN
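
The selector computation in the hunk above (which is cut off in this archive) can be mirrored in a few lines of Python. This is only a sketch: the function name is made up, `perm` is the 16-entry byte permutation, and `one_vec` stands for the flag of the same name in the C code. What the patch does with the value afterwards is not visible here.

```python
def xxpermdi_selector(perm, one_vec):
    # Mirrors the perm_val logic from the hunk: bit 1 is derived from the
    # first doubleword of the permutation, bit 0 from the second.
    perm_val = 0
    if one_vec:
        if perm[0] == 8:
            perm_val |= 2
        if perm[8] == 8:
            perm_val |= 1
    else:
        if perm[0] != 0:
            perm_val |= 2
        if perm[8] != 16:
            perm_val |= 1
    return perm_val


# The four byte patterns added to the table by the patch.
patterns = [
    list(range(0, 8)) + list(range(16, 24)),
    list(range(8, 16)) + list(range(16, 24)),
    list(range(0, 8)) + list(range(24, 32)),
    list(range(8, 16)) + list(range(24, 32)),
]
selectors = [xxpermdi_selector(p, one_vec=False) for p in patterns]
```

Under these assumptions each of the four table entries maps to a distinct two-bit selector.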

Re: [PATCH] elf: Add __libc_get_static_tls_bounds [BZ #16291]

2021-10-27 Thread Fāng-ruì Sòng via Gcc-patches
On Tue, Oct 19, 2021 at 12:37 PM Fāng-ruì Sòng  wrote:
>
> On Thu, Oct 14, 2021 at 5:13 PM Fangrui Song  wrote:
> >
> > On 2021-10-06, Fangrui Song wrote:
> > >On 2021-09-27, Fangrui Song wrote:
> > >>On 2021-09-27, Florian Weimer wrote:
> > >>>* Fangrui Song:
> > >>>
> > Sanitizer runtimes need static TLS boundaries for a variety of use 
> > cases.
> > 
> > * asan/hwasan/msan/tsan need to unpoison static TLS blocks to prevent 
> > false
> >  positives due to reusing the TLS blocks with a previous thread.
> > * lsan needs TCB for pointers into pthread_setspecific regions.
> > 
> > See https://maskray.me/blog/2021-02-14-all-about-thread-local-storage
> > for details.
> > 
> > compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp GetTls has
> > to infer the static TLS bounds from TP, _dl_get_tls_static_info, and
> > hard-coded TCB sizes. Currently this is somewhat robust for
> > aarch64/powerpc64/x86-64 but is brittle for many other architectures.
> > 
> > This patch implements __libc_get_static_tls_bounds@@GLIBC_PRIVATE which
> > is available in Android bionic since API level 31. This API allows the
> > sanitizer code to be more robust. _dl_get_tls_static_info@@GLIBC_PRIVATE
> > can probably be removed when Clang/GCC sanitizers drop reliance on it.
> > I am unclear whether the version should be GLIBC_2.*.
> > >>>
> > >>>Does this really cover the right memory region?  I assume LSAN needs
> > >>>something that identifies pointers to malloc'ed memory that are stored
> > >>>in non-malloc'ed (mmap'ed) memory.  The static TLS region is certainly a
> > >>>place where such pointers can be stored.  But struct pthread also
> > >>>contains other such pointers: the DTV, the TPP data, and POSIX TLS
> > >>>(pthread_setspecific) data, and struct pthread is not obviously part of
> > >>>the static TLS region.
> > >>
> > >>I know the pthread_setspecific leak detection is brittle but it is
> > >>currently implemented this way ;-)
> > >>
> > >>https://maskray.me/blog/2021-02-14-all-about-thread-local-storage says
> > >>
> > >>"On glibc, GetTls returned range includes
> > >>pthread::{specific_1stblock,specific} for thread-specific data keys.
> > >>There is currently a hack to ignore allocations from ld.so allocated
> > >>dynamic TLS blocks. Note: if the pthread::{specific_1stblock,specific}
> > >>pointers are encrypted, lsan cannot track the allocation."
> > >>
> > >>If pthread::{specific_1stblock,specific} use an XOR technique (like
> > >>__cxa_atexit/setjmp) the pthread_setspecific leak detection will stop
> > >>working :(
> > >>
> > >>---
> > >>
> > >>In any case, the pthread_setspecific leak detection is a relatively
> > >>minor issue. The big issue is asan/msan/tsan false positives due to
> > >>reusing an (exited) thread stack or its TLS blocks.
> > >>
> > >>Around
> > >>https://code.woboq.org/llvm/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp.html#435
> > >>there is very long messy code hard coding the thread descriptor size in
> > >>glibc.
> > >>
> > >>Android `__libc_get_static_tls_bounds(&start_addr, &end_addr);` is the
> > >>most robust one.
> > >>
> > >>---
> > >>
> > >>I ported sanitizers to musl (https://reviews.llvm.org/D93848)
> > >>in LLVM 12.0.0 and fixed some TLS block detection aarch64/ppc64 issues
> > >>(https://reviews.llvm.org/D98926 and its follow-up, due to the
> > >>complexity I couldn't get it right in the first place), so I have some
> > >>understanding about sanitizers' TLS usage.
> > >
> > >Adhemerval showed me that the __libc_get_static_tls_bounds behavior is
> > >expected on aarch64 as well (
> > >__libc_get_static_tls_bounds should match sanitizer GetTls)
> > >
> > >From https://gist.github.com/MaskRay/e035b85dce008f0c6d4997b98354d355
> > >```
> > >$ ./testrun.sh ./test-tls-boundary
> > >+++GetTls: 0x7f9c5fd6c000 4416
> > >get_tls=0x7f9c600b4050
> > >_dl_get_tls_static_info: 4416 64
> > >get_static=0x7f9c600b4070
> > >__libc_get_static_tls_bounds: 0x7f9c5fd6c000 4416
> > >```
> > >
> > >
> > >
> > >Is there any concern adding the interface?
> >
> > Gentle ping...
>
>
> CC gcc-patches which ports compiler-rt and may be interested in more
> reliable sanitizers.

PING^3


Re: [PATCH] hardened conditionals

2021-10-27 Thread Alexandre Oliva via Gcc-patches
On Oct 26, 2021, Richard Biener  wrote:

> OK.

Thanks.  I've just fixed the ChangeLog entry and pushed it:

>> * common.opt (fharden-compares): New.
>> (fharden-conditional-branches): New.
>> * doc/invoke.texi: Document new options.
>> * gimple-harden-conditionals.cc: New.

 + * Makefile.in (OBJS): Build it.

>> * passes.def: Add new passes.
>> * tree-pass.h (make_pass_harden_compares): Declare.
>> (make_pass_harden_conditional_branches): Declare.

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


rs6000: Fix up flag_shrink_wrap handling in presence of -mrop-protect [PR101324]

2021-10-27 Thread Peter Bergner via Gcc-patches
Sorry for reposting, but I forgot to CC the gcc-patches mailing list. :-(


PR101324 shows a problem with disabling shrink-wrapping under -mrop-protect
when an optimize attribute or pragma is present.  Martin's patch below moves
the handling of flag_shrink_wrap so that it gets re-disabled whenever we change
or add options.
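
A toy model of the ordering problem, written in Python purely for illustration. Everything here is hypothetical: the dictionaries stand in for GCC's option state, `fixed` selects between the old and new placement of the check, and the sketch assumes both that the after-change hook also runs during initial option processing and that an optimize attribute rebuilds the flags from their defaults before only that hook is run.

```python
def default_flags(opt_level):
    # Hypothetical defaults: shrink-wrapping is on at -O1 and above.
    return {"shrink_wrap": opt_level >= 1}


def after_change_hook(flags, rop_protect, fixed):
    # Stand-in for rs6000_override_options_after_change.
    if fixed and rop_protect:
        flags["shrink_wrap"] = False


def global_override(flags, rop_protect, fixed):
    # Stand-in for rs6000_option_override_internal.
    if not fixed and rop_protect:
        flags["shrink_wrap"] = False
    # Assumption: the after-change hook also runs during initial setup.
    after_change_hook(flags, rop_protect, fixed)


def flags_for_function(rop_protect, fixed, has_optimize_attribute):
    flags = default_flags(1)
    global_override(flags, rop_protect, fixed)
    if has_optimize_attribute:
        # The attribute rebuilds the flags; only the hook can adjust them.
        flags = default_flags(1)
        after_change_hook(flags, rop_protect, fixed)
    return flags


before = flags_for_function(True, fixed=False, has_optimize_attribute=True)
after = flags_for_function(True, fixed=True, has_optimize_attribute=True)
```

In this model the old placement leaves shrink-wrapping enabled for a function carrying the attribute, while the new placement disables it again.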

This passed bootstrap and regtesting with no regressions.  Segher, you
approved Martin's patch in the bugzilla.  Is the test case ok too?

I'll note the test case uses the "new" rop_ok effective-target function which
I submitted as a separate patch.

Peter


2021-10-27  Martin Liska  

gcc/
PR target/101324
* config/rs6000/rs6000.c (rs6000_option_override_internal): Move the
disabling of shrink-wrapping when using -mrop-protect from here...
(rs6000_override_options_after_change): ...to here.

2021-10-27  Peter Bergner  

gcc/testsuite/
PR target/101324
* gcc.target/powerpc/pr101324.c: New test.


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index bac959f4ef4..95e0d2cffdd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3484,6 +3484,10 @@ rs6000_override_options_after_change (void)
 }
   else if (!OPTION_SET_P (flag_cunroll_grow_size))
 flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
+
+  /* If we are inserting ROP-protect instructions, disable shrink wrap.  */
+  if (rs6000_rop_protect)
+flag_shrink_wrap = 0;
 }
 
 #ifdef TARGET_USES_LINUX64_OPT
@@ -4048,10 +4052,6 @@ rs6000_option_override_internal (bool global_init_p)
   && ((rs6000_isa_flags_explicit & OPTION_MASK_QUAD_MEMORY_ATOMIC) == 0))
 rs6000_isa_flags |= OPTION_MASK_QUAD_MEMORY_ATOMIC;
 
-  /* If we are inserting ROP-protect instructions, disable shrink wrap.  */
-  if (rs6000_rop_protect)
-flag_shrink_wrap = 0;
-
   /* If we can shrink-wrap the TOC register save separately, then use
  -msave-toc-indirect unless explicitly disabled.  */
   if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
diff --git a/gcc/testsuite/gcc.target/powerpc/pr101324.c b/gcc/testsuite/gcc.target/powerpc/pr101324.c
new file mode 100644
index 000..d27cc2876f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr101324.c
@@ -0,0 +1,17 @@
+/* { dg-require-effective-target rop_ok } */
+/* { dg-options "-O1 -mrop-protect -mdejagnu-cpu=power10" } */
+
+extern void foo (void);
+
+long int
+__attribute__ ((__optimize__ ("no-inline")))
+func (long int cond)
+{
+  if (cond)
+foo ();
+  return cond;
+}
+
+/* Ensure hashst comes after mflr and hashchk comes after ld 0,16(1).  */
+/* { dg-final { scan-assembler "mflr 0.*hashst 0," } } */
+/* { dg-final { scan-assembler "ld 0,16\\\(1\\\).*hashchk 0," } } */


Re: [PATCH] Enable vectorization for _Float16 floor/ceil/trunc/nearbyint/rint operations.

2021-10-27 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 25, 2021 at 4:24 PM liuhongt  wrote:
>
>   Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>   Ok for trunk?
>
I'm going to check in this patch if there's no objection.
> gcc/ChangeLog:
>
> PR target/102464
> * config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF): New
> function type.
> (V16HF_FTYPE_V16HF): Ditto.
> (V32HF_FTYPE_V32HF): Ditto.
> (V8HF_FTYPE_V8HF_ROUND): Ditto.
> (V16HF_FTYPE_V16HF_ROUND): Ditto.
> (V32HF_FTYPE_V32HF_ROUND): Ditto.
> * config/i386/i386-builtin.def (IX86_BUILTIN_FLOORPH,
> IX86_BUILTIN_CEILPH, IX86_BUILTIN_TRUNCPH,
> IX86_BUILTIN_FLOORPH256, IX86_BUILTIN_CEILPH256,
> IX86_BUILTIN_TRUNCPH256, IX86_BUILTIN_FLOORPH512,
> IX86_BUILTIN_CEILPH512, IX86_BUILTIN_TRUNCPH512): New builtin.
> * config/i386/i386-builtins.c
> (ix86_builtin_vectorized_function): Enable vectorization for
> HFmode FLOOR/CEIL/TRUNC operation.
> * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle
> new builtins.
> * config/i386/sse.md (rint<mode>2, nearbyint<mode>2): Extend
> to vector HFmodes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr102464-vrndscaleph.c: New test.
> ---
>  gcc/config/i386/i386-builtin-types.def|   7 ++
>  gcc/config/i386/i386-builtin.def  |  11 ++
>  gcc/config/i386/i386-builtins.c   |  42 +++
>  gcc/config/i386/i386-expand.c |   3 +
>  gcc/config/i386/sse.md|  12 +-
>  .../gcc.target/i386/pr102464-vrndscaleph.c| 115 ++
>  6 files changed, 184 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102464-vrndscaleph.c
>
> diff --git a/gcc/config/i386/i386-builtin-types.def 
> b/gcc/config/i386/i386-builtin-types.def
> index 4c355c587b5..e33f06ab30b 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -1380,3 +1380,10 @@ DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, UHI, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, INT, V32HF, USI, INT)
> +
> +DEF_FUNCTION_TYPE (V8HF, V8HF)
> +DEF_FUNCTION_TYPE (V16HF, V16HF)
> +DEF_FUNCTION_TYPE (V32HF, V32HF)
> +DEF_FUNCTION_TYPE_ALIAS (V8HF_FTYPE_V8HF, ROUND)
> +DEF_FUNCTION_TYPE_ALIAS (V16HF_FTYPE_V16HF, ROUND)
> +DEF_FUNCTION_TYPE_ALIAS (V32HF_FTYPE_V32HF, ROUND)
> diff --git a/gcc/config/i386/i386-builtin.def 
> b/gcc/config/i386/i386-builtin.def
> index 99217d08d37..d9eee3f373c 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -958,6 +958,10 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, 
> CODE_FOR_sse4_1_roundpd_vec_pack_sfix, "__buil
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_roundv2df2, 
> "__builtin_ia32_roundpd_az", IX86_BUILTIN_ROUNDPD_AZ, UNKNOWN, (int) 
> V2DF_FTYPE_V2DF)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_roundv2df2_vec_pack_sfix, 
> "__builtin_ia32_roundpd_az_vec_pack_sfix", 
> IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX, UNKNOWN, (int) V4SI_FTYPE_V2DF_V2DF)
>
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
> CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_floorph", 
> IX86_BUILTIN_FLOORPH, (enum rtx_code) ROUND_FLOOR, (int) 
> V8HF_FTYPE_V8HF_ROUND)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
> CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_ceilph", 
> IX86_BUILTIN_CEILPH, (enum rtx_code) ROUND_CEIL, (int) V8HF_FTYPE_V8HF_ROUND)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
> CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_truncph", 
> IX86_BUILTIN_TRUNCPH, (enum rtx_code) ROUND_TRUNC, (int) 
> V8HF_FTYPE_V8HF_ROUND)
> +
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
> "__builtin_ia32_floorps", IX86_BUILTIN_FLOORPS, (enum rtx_code) ROUND_FLOOR, 
> (int) V4SF_FTYPE_V4SF_ROUND)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
> "__builtin_ia32_ceilps", IX86_BUILTIN_CEILPS, (enum rtx_code) ROUND_CEIL, 
> (int) V4SF_FTYPE_V4SF_ROUND)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
> "__builtin_ia32_truncps", IX86_BUILTIN_TRUNCPS, (enum rtx_code) ROUND_TRUNC, 
> (int) V4SF_FTYPE_V4SF_ROUND)
> @@ -1090,6 +1094,10 @@ BDESC (OPTION_MASK_ISA_AVX, 0, 
> CODE_FOR_roundv4df2_vec_pack_sfix, "__builtin_ia3
>  BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_roundpd_vec_pack_sfix256, 
> "__builtin_ia32_floorpd_vec_pack_sfix256", 
> IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX256, (enum rtx_code) ROUND_FLOOR, (int) 
> V8SI_FTYPE_V4DF_V4DF_ROUND)
>  BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_roundpd_vec_pack_sfix256, 
> "__builtin_ia32_ceilpd_vec_pack_sfix256", 
> IX86_BUILTIN_CEILPD_VEC_PACK_SFIX256, (enum rtx_code) ROUND_CEIL, (int) 
> V8SI_FTYPE_V4DF_V4DF_ROUND)
>
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
> C

Re: [RFC] Overflow check in simplifying exit cond comparing two IVs.

2021-10-27 Thread guojiufu via Gcc-patches



I just tested this on ppc64le; the patch passes bootstrap and regtest.
Is this patch OK for trunk?

Thanks for any comments.

BR,
Jiufu

On 2021-10-18 21:37, Jiufu Guo wrote:

With reference to the discussions in:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572006.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578672.html

Based on the patches in the above discussions, we may draft a patch to fix
the issue.

In this patch, to make sure it is OK to change '{b0,s0} op {b1,s1}' to
'{b0,s0-s1} op {b1,0}', we also compute a condition under which we can
assume that neither IV overflows/wraps: the niter of
'{b0,s0-s1} op {b1,0}' must be less than the niter until iv0 or iv1 wraps.
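
To see why the extra condition is needed, here is a small simulation in Python with 8-bit unsigned IVs. It is only a sketch under narrow assumptions: the comparison is '<', both steps are positive with s0 > s1, and the helper names and the exact form of the safety check are mine rather than the patch's.

```python
def niter_two_ivs(b0, s0, b1, s1, bits=8, limit=10_000):
    # Count iterations of: for (i = b0, j = b1; i < j; i += s0, j += s1)
    # with wrap-around arithmetic on `bits`-wide unsigned values.
    mask = (1 << bits) - 1
    i, j, n = b0 & mask, b1 & mask, 0
    while i < j and n < limit:
        i = (i + s0) & mask
        j = (j + s1) & mask
        n += 1
    return n


def niter_combined(b0, s0, b1, s1, bits=8):
    # The simplified form: '{b0, s0 - s1} < {b1, 0}'.
    return niter_two_ivs(b0, s0 - s1, b1, 0, bits)


def steps_without_wrap(b, s, bits=8):
    # How many times an IV starting at b can add s (> 0) without wrapping.
    return ((1 << bits) - 1 - b) // s


def combining_is_safe(b0, s0, b1, s1, bits=8):
    n = niter_combined(b0, s0, b1, s1, bits)
    return n <= min(steps_without_wrap(b0, s0, bits),
                    steps_without_wrap(b1, s1, bits))
```

For {0,2} < {10,1} both forms give 10 iterations and the check passes. For {0,2} < {250,1} the second IV wraps after a few steps, so the real loop runs 6 times while the combined form predicts 250, and the check rejects the transformation.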

Does this patch make sense?

BR,
Jiufu Guo

gcc/ChangeLog:

PR tree-optimization/100740
* tree-ssa-loop-niter.c (number_of_iterations_cond): Add
assume condition for combining of two IVs

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr100740.c: New test.
---
 gcc/tree-ssa-loop-niter.c | 103 +++---
 .../gcc.c-torture/execute/pr100740.c  |  11 ++
 2 files changed, 99 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr100740.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 75109407124..f2987a4448d 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1863,29 +1863,102 @@ number_of_iterations_cond (class loop *loop,

  provided that either below condition is satisfied:

-   a) the test is NE_EXPR;
-   b) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
+   a) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
+   b) assumptions in below table also need to be satisfied.
+
+   | iv0 | iv1 | assum (iv0step > iv1->step;
+   The second three rows: iv0->step < iv1->step.

  This rarely occurs in practice, but it is simple enough to manage.  */

   if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
 {
+  if (TREE_CODE (iv0->step) != INTEGER_CST
+ || TREE_CODE (iv1->step) != INTEGER_CST)
+   return false;
+  if (!iv0->no_overflow || !iv1->no_overflow)
+   return false;
+
   tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
-  tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
-  iv0->step, iv1->step);
-
-  /* No need to check sign of the new step since below code takes care
-of this well.  */
-  if (code != NE_EXPR
- && (TREE_CODE (step) != INTEGER_CST
- || !iv0->no_overflow || !iv1->no_overflow))
+  tree step
+	= fold_binary_to_constant (MINUS_EXPR, step_type, iv0->step, iv1->step);
+
+  if (code != NE_EXPR && tree_int_cst_sign_bit (step))
return false;

-  iv0->step = step;
-  if (!POINTER_TYPE_P (type))
-   iv0->no_overflow = false;
+  bool positive0 = !tree_int_cst_sign_bit (iv0->step);
+  bool positive1 = !tree_int_cst_sign_bit (iv1->step);

-  iv1->step = build_int_cst (step_type, 0);
-  iv1->no_overflow = true;
+  /* Cases in rows 2 and 4 of above table.  */
+  if ((positive0 && !positive1) || (!positive0 && positive1))
+   {
+ iv0->step = step;
+ iv1->step = build_int_cst (step_type, 0);
+ return number_of_iterations_cond (loop, type, iv0, code, iv1,
+   niter, only_exit, every_iteration);
+   }
+
+  affine_iv i_0, i_1;
+  class tree_niter_desc num;
+  i_0 = *iv0;
+  i_1 = *iv1;
+  i_0.step = step;
+  i_1.step = build_int_cst (step_type, 0);
+  if (!number_of_iterations_cond (loop, type, &i_0, code, &i_1, &num,
+ only_exit, every_iteration))
+   return false;
+
+  affine_iv i0, i1;
+  class tree_niter_desc num_wrap;
+  i0 = *iv0;
+  i1 = *iv1;
+
+  /* Reset iv0 and iv1 to calculate the niter which cause overflow.  */
+  if (tree_int_cst_lt (i1.step, i0.step))
+   {
+ if (positive0 && positive1)
+   i0.step = build_int_cst (step_type, 0);
+ else if (!positive0 && !positive1)
+   i1.step = build_int_cst (step_type, 0);
+ if (code == NE_EXPR)
+   code = LT_EXPR;
+   }
+  else
+   {
+ if (positive0 && positive1)
+   i1.step = build_int_cst (step_type, 0);
+ else if (!positive0 && !positive1)
+   i0.step = build_int_cst (step_type, 0);
+ gcc_assert (code == NE_EXPR);
+ code = GT_EXPR;
+   }
+
+  /* Calculate the niter which cause overflow.  */
+  if (!number_of_iterations_cond (loop, type, &i0, code, &i1, &num_wrap,
+ only_exit, every_iteration))
+   return false;
+
+  /* Make assumption there is no overflow. */
+  tree assum
+

Re: [PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]

2021-10-27 Thread Kewen.Lin via Gcc-patches
on 2021/10/28 上午9:43, David Edelsohn wrote:
> On Wed, Oct 27, 2021 at 9:30 PM Kewen.Lin  wrote:
>>
>> Hi David,
>>
>> Thanks for the review!
>>
>> on 2021/10/27 下午9:12, David Edelsohn wrote:
>>> On Sun, Oct 24, 2021 at 11:04 PM Kewen.Lin  wrote:

 Hi,

 As PR102767 shows, commit r12-3482 exposed an ICE in function
 rs6000_builtin_vectorization_cost.  We claim that V1TI supports movmisalign
 on rs6000 (see define_expand "movmisalign"), so
 rs6000_builtin_support_vector_misalignment returns true for misalign 8.
 Later, when the cost is queried, rs6000_builtin_vectorization_cost has no
 arm to handle V1TI under (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN).

 The proposed fix is to handle V1TI as well, simply giving it the
 doubleword cost.  That cost is apparently bigger than the scalar cost, so
 vectorization will not happen; the change just keeps things consistent
 and avoids the ICE.  Another thought is to not support movmisalign for V1TI,
 but that sounds like a bad idea since it doesn't match reality.

 Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
 powerpc64-linux-gnu P8.

 Is it ok for trunk?

 BR,
 Kewen
 -
 gcc/ChangeLog:

 PR target/102767
 * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Consider
 V1TI mode for unaligned load and store.

 gcc/testsuite/ChangeLog:

 PR target/102767
 * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file.

 diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
 index b7ea1483da5..73d3e06c3fc 100644
 --- a/gcc/config/rs6000/rs6000.c
 +++ b/gcc/config/rs6000/rs6000.c
 @@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum 
 vect_cost_for_stmt type_of_cost,
 if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
   {
 elements = TYPE_VECTOR_SUBPARTS (vectype);
 -   if (elements == 2)
 +   /* See PR102767, consider V1TI to keep consistency.  */
 +   if (elements == 2 || elements == 1)
   /* Double word aligned.  */
   return 4;

 @@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum 
 vect_cost_for_stmt type_of_cost,

  if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
{
 -elements = TYPE_VECTOR_SUBPARTS (vectype);
 -if (elements == 2)
 -  /* Double word aligned.  */
 -  return 2;
 +   elements = TYPE_VECTOR_SUBPARTS (vectype);
 +   /* See PR102767, consider V1TI to keep consistency.  */
 +   if (elements == 2 || elements == 1)
 + /* Double word aligned.  */
 + return 2;
>>>
>>> This section of the patch incorrectly changes the indentation.  Please
>>> use the correct indentation.
>>>
>>
>> The indentation change is intentional since the original indentation is
>> wrong (more than 8 spaces leading the lines).  There are more wrongly
>> indented lines above the first changed line, but it seemed like a bad
>> idea to fix them too when they are unrelated to what this patch wants
>> to fix, so I left them alone.
>>
>> With the above clarification, may I push this patch without any updates
>> for the mentioned indentation issue?
> 
> If you correct the indentation, you should adjust it for the entire
> block, not just the lines that you change.  If you want to fix the
> entire block to TAB+spaces as well, okay.  You didn't mention that you
> were fixing the indentation in the explanation of the patch.
> 

Sorry for not mentioning that.  Got it, I'll reformat the entire block then,
also with additional notes in the commit log.

Thanks again.

BR,
Kewen

> Thanks, David
> 
>>

  if (elements == 4)
{
 diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 
 b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
 new file mode 100644
 index 000..a4122482989
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
 @@ -0,0 +1,21 @@
 +! { dg-require-effective-target powerpc_vsx_ok }
 +! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" }
 +
 +INTERFACE
 +  FUNCTION elemental_mult (a, b, c)
 +type(*), DIMENSION(..) :: a, b, c
 +  END
 +END INTERFACE
 +
 +allocatable  z
 +integer, dimension(2,2) :: a, b
 +call test_CFI_address
 +contains
 +  subroutine test_CFI_address
 +if (elemental_mult (z, x, y) .ne. 0) stop
 +a = reshape ([4,3,2,1], [2,2])
 +b = reshape ([2,3,4,5], [2,2])
 +if (elemental_mult (i, a, b) .ne. 0) stop
 +  end
 +end
 +

>>>
>>> The patch is okay with the indentatio

Re: [RFC] Don't move cold code out of loop by checking bb count

2021-10-27 Thread Xionghu Luo via Gcc-patches



On 2021/10/27 20:54, Jan Hubicka wrote:
>> Hi,
>>
>> On 2021/9/28 20:09, Richard Biener wrote:
>>> On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo  wrote:

 Update the patch to v3, not sure whether you prefer the paste style
 and continue to link the previous thread as Segher dislikes this...


 [PATCH v3] Don't move cold code out of loop by checking bb count


 Changes:
 1. Handle max_loop in determine_max_movement instead of
 outermost_invariant_loop.
 2. Remove unnecessary changes.
 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in 
 can_sm_ref_p.
 4. "gsi_next (&bsi);" in move_computations_worker is kept: without it v1
 ran into an infinite loop, because the iterator update was in fact
 missing.

 v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
 v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html

 There was an earlier patch that tried to avoid moving cold blocks out of loops:

 https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html

 Richard suggested to "never hoist anything from a bb with lower execution
 frequency to a bb with higher one in LIM invariantness_dom_walker
 before_dom_children".

 In gimple LIM analysis, add find_coldest_out_loop to pick the expected
 target loop for each invariant: if the profile count of the loop bb is
 colder than that of the target loop's preheader, the statement is not
 hoisted out of the loop.  Likewise for store motion: if all locations of
 the REF in the loop are cold, no store motion is done for it.
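
The profile-count test can be sketched in Python as below. It mirrors only the RTL-side check visible in the find_invariants_bb hunk of this patch (skip a block whose count is lower than the preheader's); the Block class, the helper names and the sample counts are hypothetical, and the gimple-side find_coldest_out_loop is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    count: int  # profile execution count


def worth_scanning(preheader, bb):
    # Same test as the hunk: give up when the preheader is hotter than bb,
    # since hoisting would move code to a place that runs more often.
    return not (preheader.count > bb.count)


def hoist_candidates(preheader, body):
    return [bb.name for bb in body if worth_scanning(preheader, bb)]


preheader = Block("preheader", 100)
body = [Block("header", 1000), Block("cold_path", 1), Block("latch", 100)]
candidates = hoist_candidates(preheader, body)
```

Here the rarely executed block is left alone, while blocks at least as hot as the preheader are still scanned for invariants.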

 SPEC2017 performance evaluation shows 1% performance improvement for
 intrate GEOMEAN and no obvious regression for others.  Especially,
 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
 largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
 on P8LE.

 gcc/ChangeLog:

 * loop-invariant.c (find_invariants_bb): Check profile count
 before motion.
 (find_invariants_body): Add argument.
 * tree-ssa-loop-im.c (find_coldest_out_loop): New function.
 (determine_max_movement): Use find_coldest_out_loop.
 (move_computations_worker): Adjust and fix iteration update.
 (execute_sm_exit): Check pointer validity.
 (class ref_in_loop_hot_body): New functor.
 (ref_in_loop_hot_body::operator): New.
 (can_sm_ref_p): Use for_all_locs_in_loop.

 gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/recip-3.c: Adjust.
 * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
 * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
 * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
 ---
  gcc/loop-invariant.c   | 10 ++--
  gcc/tree-ssa-loop-im.c | 61 --
  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c|  2 +-
  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++
  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++
  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 +
  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++
  7 files changed, 165 insertions(+), 8 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c

 diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
 index fca0c2b24be..5c3be7bf0eb 100644
 --- a/gcc/loop-invariant.c
 +++ b/gcc/loop-invariant.c
 @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool 
 always_reached, bool always_executed)
 call.  */

  static void
 -find_invariants_bb (basic_block bb, bool always_reached, bool 
 always_executed)
 +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
 +   bool always_executed)
  {
rtx_insn *insn;
 +  basic_block preheader = loop_preheader_edge (loop)->src;
 +
 +  if (preheader->count > bb->count)
 +return;

FOR_BB_INSNS (bb, insn)
  {
 @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block 
 *body,
unsigned i;

for (i = 0; i < loop->num_nodes; i++)
 -find_invariants_bb (body[i],
 -   bitmap_bit_p (always_reached, i),
 +find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
 bitmap_bit_p (always_executed, i));
  }

 diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
 index 4b187c2cdaf..655fab03442 100644
 --- a/gcc/tree-ssa-loop-im.c
 +++ b/gcc/tree-ssa-loop-im.c
 @@ -417,6 +417,28 @@ movement_possibility (gimple 

Re: [PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]

2021-10-27 Thread David Edelsohn via Gcc-patches
On Wed, Oct 27, 2021 at 9:30 PM Kewen.Lin  wrote:
>
> Hi David,
>
> Thanks for the review!
>
> on 2021/10/27 下午9:12, David Edelsohn wrote:
> > On Sun, Oct 24, 2021 at 11:04 PM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> As PR102767 shows, the commit r12-3482 exposed one ICE in function
> >> rs6000_builtin_vectorization_cost.  We claim V1TI supports movmisalign
> >> on rs6000 (See define_expand "movmisalign"), so it returns true in
> >> rs6000_builtin_support_vector_misalignment for misalign 8.  Later, when
> >> querying the cost in rs6000_builtin_vectorization_cost, we don't have
> >> an arm to handle the V1TI input under (TARGET_VSX &&
> >> TARGET_ALLOW_MOVMISALIGN).
> >>
> >> The proposed fix is to handle V1TI as well, simply giving it the same
> >> cost as doubleword.  That cost is apparently bigger than the scalar
> >> cost, so the vectorization won't happen; the change only keeps things
> >> consistent and avoids the ICE.  Another thought is to not support
> >> movmisalign for V1TI, but that sounds like a bad idea since it doesn't
> >> match reality.
> >>
> >> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> >> powerpc64-linux-gnu P8.
> >>
> >> Is it ok for trunk?
> >>
> >> BR,
> >> Kewen
> >> -
> >> gcc/ChangeLog:
> >>
> >> PR target/102767
> >> * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): 
> >> Consider
> >> V1TI mode for unaligned load and store.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> PR target/102767
> >> * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file.
> >>
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index b7ea1483da5..73d3e06c3fc 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum 
> >> vect_cost_for_stmt type_of_cost,
> >> if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
> >>   {
> >> elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> -   if (elements == 2)
> >> +   /* See PR102767, consider V1TI to keep consistency.  */
> >> +   if (elements == 2 || elements == 1)
> >>   /* Double word aligned.  */
> >>   return 4;
> >>
> >> @@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum 
> >> vect_cost_for_stmt type_of_cost,
> >>
> >>  if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
> >>{
> >> -elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> -if (elements == 2)
> >> -  /* Double word aligned.  */
> >> -  return 2;
> >> +   elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> +   /* See PR102767, consider V1TI to keep consistency.  */
> >> +   if (elements == 2 || elements == 1)
> >> + /* Double word aligned.  */
> >> + return 2;
> >
> > This section of the patch incorrectly changes the indentation.  Please
> > use the correct indentation.
> >
>
> The indentation change is intentional since the original indentation is
> wrong (more than 8 spaces leading the lines).  There are more wrongly
> indented lines above the first changed line, but fixing them too seemed
> like a bad idea when they are unrelated to what this patch wants to fix,
> so I left them alone.
>
> With the above clarification, may I push this patch without any updates
> for the mentioned indentation issue?

If you correct the indentation, you should adjust it for the entire
block, not just the lines that you change.  If you want to fix the
entire block to TAB+spaces as well, okay.  You didn't mention that you
were fixing the indentation in the explanation of the patch.

Thanks, David

>
> >>
> >>  if (elements == 4)
> >>{
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 
> >> b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
> >> new file mode 100644
> >> index 000..a4122482989
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
> >> @@ -0,0 +1,21 @@
> >> +! { dg-require-effective-target powerpc_vsx_ok }
> >> +! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" }
> >> +
> >> +INTERFACE
> >> +  FUNCTION elemental_mult (a, b, c)
> >> +type(*), DIMENSION(..) :: a, b, c
> >> +  END
> >> +END INTERFACE
> >> +
> >> +allocatable  z
> >> +integer, dimension(2,2) :: a, b
> >> +call test_CFI_address
> >> +contains
> >> +  subroutine test_CFI_address
> >> +if (elemental_mult (z, x, y) .ne. 0) stop
> >> +a = reshape ([4,3,2,1], [2,2])
> >> +b = reshape ([2,3,4,5], [2,2])
> >> +if (elemental_mult (i, a, b) .ne. 0) stop
> >> +  end
> >> +end
> >> +
> >>
> >
> > The patch is okay with the indentation correction.
> >
> > Thanks, David
> >
>
> Thanks!
>
> BR,
> Kewen


[PATCH] rs6000: MMA test case emits wrong code when building a vector pair

2021-10-27 Thread Peter Bergner via Gcc-patches
PR102976 shows a test case where we generate wrong code when building
a vector pair from 2 vector registers.  The bug here is that with unlucky
register assignments, we can clobber one of the input operands before
we write both registers of the output operand.  The solution is to use
early-clobbers in the assemble pair and accumulator patterns.
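
To illustrate the failure mode, here is a minimal sketch, not the actual
rs6000 code.  The register file, the register numbers and the name
assemble_pair are all hypothetical; the point is only that a pair built
one register at a time goes wrong when the output overlaps an input that
has not been read yet, which is the situation an early-clobber constraint
rules out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 8-entry register file; each int stands in for one
// vector register.
using RegFile = std::array<int, 8>;

// Build a "pair" in registers DST and DST+1 by copying SRC0 and then
// SRC1, one register at a time.  Nothing here stops DST from
// overlapping an input, which is the hazard being modelled.
void assemble_pair (RegFile &regs, std::size_t dst, std::size_t src0,
                    std::size_t src1)
{
  assert (dst + 1 < regs.size ());
  assert (src0 < regs.size () && src1 < regs.size ());
  regs[dst] = regs[src0];      // May overwrite SRC1 if DST == SRC1.
  regs[dst + 1] = regs[src1];  // Then reads the already clobbered value.
}
```

With inputs in registers 1 and 0 and the output pair placed at 0/1, the
second copy reads a value the first copy has already overwritten.  Placing
the pair at 2/3, away from both inputs, gives the expected result.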

This passed bootstrap and regtesting with no regressions and our
OpenBLAS team has confirmed it fixes the issues they reported.
Ok for mainline?

Ok for GCC 11 too after a few days on trunk?

Peter


gcc/
PR target/102976
* config/rs6000/mma.md (*vsx_assemble_pair): Add early-clobber for
output operand.
(*mma_assemble_acc): Likewise.

gcc/testsuite/
PR target/102976
* gcc.target/powerpc/pr102976.c: New test.

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 1990a2183f6..f0ea99963f7 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -339,7 +339,7 @@ (define_expand "vsx_assemble_pair"
 })
 
 (define_insn_and_split "*vsx_assemble_pair"
-  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+  [(set (match_operand:OO 0 "vsx_register_operand" "=&wa")
(unspec:OO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa")
(match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")]
UNSPEC_MMA_ASSEMBLE))]
@@ -405,7 +405,7 @@ (define_expand "mma_assemble_acc"
 })
 
 (define_insn_and_split "*mma_assemble_acc"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
(unspec:XO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa")
(match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")
(match_operand:V16QI 3 "mma_assemble_input_operand" "mwa")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102976.c 
b/gcc/testsuite/gcc.target/powerpc/pr102976.c
new file mode 100644
index 000..a8de8f056f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102976.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#include <altivec.h>
+void
+bug (__vector_pair *dst)
+{
+  register vector unsigned char vec0 asm ("vs44");
+  register vector unsigned char vec1 asm ("vs32");
+  __builtin_vsx_build_pair (dst, vec0, vec1);
+}
+
+/* { dg-final { scan-assembler-times {xxlor[^,]*,44,44} 1 } } */
+/* { dg-final { scan-assembler-times {xxlor[^,]*,32,32} 1 } } */


Re: [PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]

2021-10-27 Thread Kewen.Lin via Gcc-patches
Hi David,

Thanks for the review!

on 2021/10/27 下午9:12, David Edelsohn wrote:
> On Sun, Oct 24, 2021 at 11:04 PM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> As PR102767 shows, the commit r12-3482 exposed one ICE in function
>> rs6000_builtin_vectorization_cost.  We claim V1TI supports movmisalign
>> on rs6000 (See define_expand "movmisalign"), so it returns true in
>> rs6000_builtin_support_vector_misalignment for misalign 8.  Later, when
>> querying the cost in rs6000_builtin_vectorization_cost, we don't have
>> an arm to handle the V1TI input under (TARGET_VSX &&
>> TARGET_ALLOW_MOVMISALIGN).
>>
>> The proposed fix is to handle V1TI as well, simply giving it the same
>> cost as doubleword.  That cost is apparently bigger than the scalar
>> cost, so the vectorization won't happen; the change only keeps things
>> consistent and avoids the ICE.  Another thought is to not support
>> movmisalign for V1TI, but that sounds like a bad idea since it doesn't
>> match reality.
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
>> powerpc64-linux-gnu P8.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>> PR target/102767
>> * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): 
>> Consider
>> V1TI mode for unaligned load and store.
>>
>> gcc/testsuite/ChangeLog:
>>
>> PR target/102767
>> * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file.
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index b7ea1483da5..73d3e06c3fc 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum 
>> vect_cost_for_stmt type_of_cost,
>> if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
>>   {
>> elements = TYPE_VECTOR_SUBPARTS (vectype);
>> -   if (elements == 2)
>> +   /* See PR102767, consider V1TI to keep consistency.  */
>> +   if (elements == 2 || elements == 1)
>>   /* Double word aligned.  */
>>   return 4;
>>
>> @@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum 
>> vect_cost_for_stmt type_of_cost,
>>
>>  if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
>>{
>> -elements = TYPE_VECTOR_SUBPARTS (vectype);
>> -if (elements == 2)
>> -  /* Double word aligned.  */
>> -  return 2;
>> +   elements = TYPE_VECTOR_SUBPARTS (vectype);
>> +   /* See PR102767, consider V1TI to keep consistency.  */
>> +   if (elements == 2 || elements == 1)
>> + /* Double word aligned.  */
>> + return 2;
> 
> This section of the patch incorrectly changes the indentation.  Please
> use the correct indentation.
> 

The indentation change is intentional since the original indentation is
wrong (more than 8 spaces leading the lines).  There are more wrongly
indented lines above the first changed line, but fixing them too seemed
like a bad idea when they are unrelated to what this patch wants to fix,
so I left them alone.

With the above clarification, may I push this patch without any updates
for the mentioned indentation issue?

>>
>>  if (elements == 4)
>>{
>> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 
>> b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
>> new file mode 100644
>> index 000..a4122482989
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
>> @@ -0,0 +1,21 @@
>> +! { dg-require-effective-target powerpc_vsx_ok }
>> +! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" }
>> +
>> +INTERFACE
>> +  FUNCTION elemental_mult (a, b, c)
>> +type(*), DIMENSION(..) :: a, b, c
>> +  END
>> +END INTERFACE
>> +
>> +allocatable  z
>> +integer, dimension(2,2) :: a, b
>> +call test_CFI_address
>> +contains
>> +  subroutine test_CFI_address
>> +if (elemental_mult (z, x, y) .ne. 0) stop
>> +a = reshape ([4,3,2,1], [2,2])
>> +b = reshape ([2,3,4,5], [2,2])
>> +if (elemental_mult (i, a, b) .ne. 0) stop
>> +  end
>> +end
>> +
>>
> 
> The patch is okay with the indentation correction.
> 
> Thanks, David
> 

Thanks!

BR,
Kewen
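
For readers skimming the thread, the branch touched by the patch can be
sketched as below.  This is only an illustration under stated assumptions:
the function name, the Access enum and the kUnhandled sentinel are made up,
and only the element counts changed by the patch are modelled.  The costs
4 (load) and 2 (store) are the ones visible in the diff.

```cpp
#include <cassert>

// Hypothetical names; the real code lives in
// rs6000_builtin_vectorization_cost and switches on vect_cost_for_stmt.
enum class Access { Load, Store };

// Sentinel for element counts this sketch does not model.
constexpr int kUnhandled = -1;

// Cost of a misaligned access under TARGET_VSX &&
// TARGET_ALLOW_MOVMISALIGN, restricted to the doubleword-aligned branch.
int misaligned_doubleword_cost (int elements, Access kind)
{
  assert (elements > 0);
  // Before the patch only elements == 2 was accepted here, so a V1TI
  // vector (one element) fell through to code that could not cope.
  if (elements == 2 || elements == 1)
    return kind == Access::Load ? 4 : 2;
  return kUnhandled;
}
```

Treating the single-element case like the two-element case means the cost
query always gets an answer for V1TI, which is all the fix aims for.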


Re: [PATCH] AVX512FP16: Optimize _Float16 reciprocal for div and sqrt

2021-10-27 Thread Hongtao Liu via Gcc-patches
On Tue, Oct 26, 2021 at 5:51 PM Hongyu Wang via Gcc-patches
 wrote:
>
> Hi,
>
> For _Float16 type, add insn and expanders to optimize x / y to
> x * rcp (y), and x / sqrt (y) to x * rsqrt (y).
> As half-precision float has only a minor precision difference between
> div and mul * rcp, there is no need for Newton-Raphson approximation.
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
> Ok for master?
Ok.
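
As a rough sketch of the transformation being approved (not the expander
itself): the flag struct and function names below are hypothetical, float
stands in for _Float16, and 1.0f / y stands in for the hardware reciprocal
estimate.  The gating condition mirrors the one in the divhf3 expander in
the patch.

```cpp
#include <cassert>

// Hypothetical bundle of the options tested by the divhf3 expander.
struct MathFlags
{
  bool recip_div;         // TARGET_RECIP_DIV
  bool optimize_speed;    // optimize_insn_for_speed_p ()
  bool finite_math_only;  // flag_finite_math_only
  bool trapping_math;     // flag_trapping_math
  bool unsafe_math;       // flag_unsafe_math_optimizations
};

// True when x / y may be rewritten as x * rcp (y).
bool use_reciprocal (const MathFlags &f)
{
  return f.recip_div && f.optimize_speed && f.finite_math_only
         && !f.trapping_math && f.unsafe_math;
}

// Divide X by Y, going through a reciprocal when the flags allow it.
float divide (float x, float y, const MathFlags &f)
{
  if (use_reciprocal (f))
    return x * (1.0f / y);  // Stand-in for rcp followed by mul.
  return x / y;
}
```

With -ftrapping-math in effect the sketch falls back to a plain division,
matching the !flag_trapping_math test in the patch.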
>
> gcc/ChangeLog:
>
> * config/i386/i386.c (use_rsqrt_p): Add mode parameter, enable
>   HFmode rsqrt without TARGET_SSE_MATH.
> (ix86_optab_supported_p): Refactor rint, adjust floor, ceil,
> btrunc condition to be restricted by -ftrapping-math, adjust
> use_rsqrt_p function call.
> * config/i386/i386.md (rcphf2): New define_insn.
> (rsqrthf2): Likewise.
> * config/i386/sse.md (div<mode>3): Change VF2H to VF2.
> (div<mode>3): New expander for HF mode.
> (rsqrt<mode>2): Likewise.
> (*avx512fp16_vmrcpv8hf2): New define_insn for rpad pass.
> (*avx512fp16_vmrsqrtv8hf2): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512fp16-recip-1.c: New test.
> * gcc.target/i386/avx512fp16-recip-2.c: Ditto.
> * gcc.target/i386/pr102464.c: Add -fno-trapping-math.
> ---
>  gcc/config/i386/i386.c| 29 +++---
>  gcc/config/i386/i386.md   | 44 -
>  gcc/config/i386/sse.md| 63 +++-
>  .../gcc.target/i386/avx512fp16-recip-1.c  | 43 
>  .../gcc.target/i386/avx512fp16-recip-2.c  | 97 +++
>  gcc/testsuite/gcc.target/i386/pr102464.c  |  2 +-
>  6 files changed, 258 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-recip-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-recip-2.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 299e1ab2621..c5789365d3b 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -18905,9 +18905,10 @@ ix86_vectorize_builtin_scatter (const_tree vectype,
> 1.0/sqrt.  */
>
>  static bool
> -use_rsqrt_p ()
> +use_rsqrt_p (machine_mode mode)
>  {
> -  return (TARGET_SSE && TARGET_SSE_MATH
> +  return ((mode == HFmode
> +  || (TARGET_SSE && TARGET_SSE_MATH))
>   && flag_finite_math_only
>   && !flag_trapping_math
>   && flag_unsafe_math_optimizations);
> @@ -23603,29 +23604,27 @@ ix86_optab_supported_p (int op, machine_mode mode1, 
> machine_mode,
>return opt_type == OPTIMIZE_FOR_SPEED;
>
>  case rint_optab:
> -  if (mode1 == HFmode)
> -   return true;
> -  else if (SSE_FLOAT_MODE_P (mode1)
> -  && TARGET_SSE_MATH
> -  && !flag_trapping_math
> -  && !TARGET_SSE4_1)
> +  if (SSE_FLOAT_MODE_P (mode1)
> + && TARGET_SSE_MATH
> + && !flag_trapping_math
> + && !TARGET_SSE4_1
> + && mode1 != HFmode)
> return opt_type == OPTIMIZE_FOR_SPEED;
>return true;
>
>  case floor_optab:
>  case ceil_optab:
>  case btrunc_optab:
> -  if (mode1 == HFmode)
> -   return true;
> -  else if (SSE_FLOAT_MODE_P (mode1)
> -  && TARGET_SSE_MATH
> -  && !flag_trapping_math
> -  && TARGET_SSE4_1)
> +  if (((SSE_FLOAT_MODE_P (mode1)
> +   && TARGET_SSE_MATH
> +   && TARGET_SSE4_1)
> +  || mode1 == HFmode)
> + && !flag_trapping_math)
> return true;
>return opt_type == OPTIMIZE_FOR_SPEED;
>
>  case rsqrt_optab:
> -  return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p ();
> +  return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (mode1);
>
>  default:
>return true;
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index e733a40fc90..11535df5425 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -8417,11 +8417,27 @@
> (match_operand:XF 2 "register_operand")))]
>"TARGET_80387")
>
> +/* There is no more precision loss than Newton-Raphson approximation
> +  when using HFmode rcp/rsqrt, so do the transformation directly under
> +  TARGET_RECIP_DIV and fast-math.  */
>  (define_expand "divhf3"
>[(set (match_operand:HF 0 "register_operand")
> (div:HF (match_operand:HF 1 "register_operand")
>(match_operand:HF 2 "nonimmediate_operand")))]
> -  "TARGET_AVX512FP16")
> +  "TARGET_AVX512FP16"
> +{
> +  if (TARGET_RECIP_DIV
> +  && optimize_insn_for_speed_p ()
> +  && flag_finite_math_only && !flag_trapping_math
> +  && flag_unsafe_math_optimizations)
> +{
> +  rtx op = gen_reg_rtx (HFmode);
> +  operands[2] = force_reg (HFmode, operands[2]);
> +  emit_insn (gen_rcphf2 (op, operands[2]));
> +  emit_insn (gen_mulhf3 (operands[0], operands[1], op));
> +  DONE;
> +}
> +})
>
>  (define_expand "div3"
>[(set (match

Re: [COMMITTED] Kill second order relations in the path solver.

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
On Wed, 27 Oct 2021 20:13:21 +0200
Aldy Hernandez via Gcc-patches  wrote:

[would have to think about this some more but it's late here. Nits:]

> diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
> index 2acf375ca9a..0ad4f7a9495 100644
> --- a/gcc/value-relation.cc
> +++ b/gcc/value-relation.cc
> @@ -1297,8 +1297,9 @@ path_oracle::killing_def (tree ssa)
>fprintf (dump_file, "\n");
>  }
>  
> +  unsigned v = SSA_NAME_VERSION (ssa);
>bitmap b = BITMAP_ALLOC (&m_bitmaps);
> -  bitmap_set_bit (b, SSA_NAME_VERSION (ssa));
> +  bitmap_set_bit (b, v);
>equiv_chain *ptr = (equiv_chain *) obstack_alloc (&m_chain_obstack,
>   sizeof (equiv_chain));
>ptr->m_names = b;
> @@ -1306,6 +1307,24 @@ path_oracle::killing_def (tree ssa)
>ptr->m_next = m_equiv.m_next;
>m_equiv.m_next = ptr;
>bitmap_ior_into (m_equiv.m_names, b);
> +
> +  // Walk the relation list an remove SSA from any relations.

s/an /and /

> +  if (!bitmap_bit_p (m_relations.m_names, v))
> +return;
> +
> +  bitmap_clear_bit (m_relations.m_names, v);

IIRC bitmap_clear_bit returns true if the bit was set, false otherwise,
so should be used as if(!bitmap_clear_bit) above.
I would not be surprised if this generates better code as we probably
do not grok to optimize the !bit_p else clear_bit combo. Shame (?).
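
The suggestion can be sketched as follows.  SmallBitmap and kill_relations
are hypothetical stand-ins (a single 64-bit word instead of GCC's bitmap),
written only to show one test-and-clear call replacing a bit_p check
followed by a separate clear_bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a bitmap: one 64-bit word, bits 0..63.
struct SmallBitmap
{
  std::uint64_t bits = 0;

  void set_bit (unsigned n)
  {
    assert (n < 64);
    bits |= std::uint64_t{1} << n;
  }

  bool bit_p (unsigned n) const
  {
    assert (n < 64);
    return ((bits >> n) & 1u) != 0;
  }

  // Clear bit N and report whether it was set beforehand.
  bool clear_bit (unsigned n)
  {
    assert (n < 64);
    const std::uint64_t mask = std::uint64_t{1} << n;
    const bool was_set = (bits & mask) != 0;
    bits &= ~mask;
    return was_set;
  }
};

// Returns false early when V was not recorded, true when there was
// something to remove.
bool kill_relations (SmallBitmap &names, unsigned v)
{
  if (!names.clear_bit (v))
    return false;
  // ... the walk over the relation list would go here ...
  return true;
}
```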

> +  relation_chain **prev = &(m_relations.m_head);

s/[()]//
thanks,

> +  relation_chain *next = NULL;
> +  for (relation_chain *ptr = m_relations.m_head; ptr; ptr = next)
> +{
> +  gcc_checking_assert (*prev == ptr);
> +  next = ptr->m_next;
> +  if (SSA_NAME_VERSION (ptr->op1 ()) == v
> +   || SSA_NAME_VERSION (ptr->op2 ()) == v)
> + *prev = ptr->m_next;
> +  else
> + prev = &(ptr->m_next);
> +}
>  }
>  
>  // Register relation K between SSA1 and SSA2, resolving unknowns by



Re: RISCV: Add zmmul extension

2021-10-27 Thread Jim Wilson
On Wed, Oct 27, 2021 at 12:14 AM Kito Cheng  wrote:

> Otherwise it is LGTM, but I'm just surprised it's still 0.1 and not frozen
> yet.
>

We should have binutils support first before we have gcc support.
Otherwise that may lead to binutils errors later when zmmul gets passed
down to binutils.  I didn't see a binutils patch yet.

Jim


Re: [PATCH] rs6000: Fix bootstrap (libffi)

2021-10-27 Thread Segher Boessenkool
Hi!

On Wed, Oct 27, 2021 at 11:44:59AM -0700, H.J. Lu wrote:
> On Mon, Oct 25, 2021 at 4:39 PM Segher Boessenkool
>  wrote:
> > This fixes bootstrap for the current problems building libffi.
> >
> > I'll work on getting this into upstream as well.  If the maintainers
> > want it done differently, at least we have bootstrap working again
> > until then.

> I am checking in this patch:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582717.html

Ah thanks :-)  I thought I'd get it fixed upstream soon, but that might
not happen (or not in time, etc.)  This is a good idea no matter what.


Segher


Re: dejagnu version update?

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
On Sat, 4 Aug 2018 18:32:24 +0200
Bernhard Reutner-Fischer  wrote:

> On Tue, 16 May 2017 at 21:08, Mike Stump  wrote:
> >
> > On May 16, 2017, at 5:16 AM, Jonathan Wakely  wrote: 
> >  
> > > The change I care about in 1.5.3  
> >
> > So, we haven't talked much about the version people want most.  If we 
> > update, might as well get something that more people care about.  1.5.3 is 
> > in Ubuntu LTS 16.04 and Fedora 24, so it's been around awhile.  SUSE is 
> > said to be using 1.6, in the post 1.4.4 systems.  People stated they want 
> > 1.5.2 and 1.5.3, so, I'm inclined to say, let's shoot for 1.5.3 when we do 
> > update.
> >
> > As for the machines in the FSF compile farm, nah, tail wagging the dog.  
> > I'd rather just update the requirement, and the owners or users of those 
> > machines can install a new dejagnu, if they are using one that is too old 
> > and they want to support testing gcc.  
> 
> So.. let me ping that, again, now that another year has passed :)

or another 3 or 4 :)
> 
> PS: Recap: https://gcc.gnu.org/ml/fortran/2012-03/msg00094.html was
> later applied as
> http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5481f29161477520c691d525653323b82fa47ad7
> and was part of the dejagnu-1.5.2 release from 2015. Jonathan requires
> 1.5.3 for libstdc++ testing.
(i.e.
http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5256bd82343000c76bc0e48139003f90b6184347
 )
> The libdirs fix would allow us to remove the 150 occurrences of the
> load_gcc_lib hack, refer to the patch to the fortran list back then.
> AFAIR this is still not fixed: +# BUG: gcc-dg calls
> gcc-set-multilib-library-path but does not load gcc-defs!
> 
> debian-stable (i think 9 ATM), Ubuntu LTS ship versions recent enough
> to contain both fixes. Commercial distros seem to ship fixed versions,
> too.

It seems that in May 2020 there was a thread on the gcc list with about
the same subject: https://gcc.gnu.org/pipermail/gcc/2020-May/232427.html
in which Mike appears to have approved bumping the required minimum
version to 1.5.3.
So who's in the position to update the
https://gcc.gnu.org/install/prerequisites.html
to s/1.4.4/1.5.3/g && git commit -m 'bump dejagnu required version' ?

Just asking patiently and politely.
I don't want to rush anybody into such a bump :)

But as you may remember, folks routinely run afoul of using too old
versions (without the 5256bd8 multilib prepending for example, recently
someone doing ARM stuff IIRC) so a bump would just be fair IMHO.

Maybe now, for gcc-12, is the time to bump prerequisites to 1.5.3?

thanks and sorry for my impatience (and, once again, the noise).
cheers,


[r12-4744 Regression] FAIL: gcc.dg/guality/pr41616-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION execution test on Linux/x86_64

2021-10-27 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

2f0b6a971a051f6e687a15dd2fa4bf431381e551 is the first bad commit
commit 2f0b6a971a051f6e687a15dd2fa4bf431381e551
Author: Aldy Hernandez 
Date:   Wed Oct 27 18:22:29 2021 +0200

Reorder relation calculating code in the path solver.

caused

FAIL: gcc.dg/guality/pr41616-1.c   -O2  -DPREVENT_OPTIMIZATION  execution test
FAIL: gcc.dg/guality/pr41616-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION execution test
FAIL: gcc.dg/guality/pr41616-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION execution test
FAIL: gcc.dg/guality/pr41616-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  execution 
test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4744/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH] or1k: Add return address argument to _mcount call

2021-10-27 Thread Stafford Horne via Gcc-patches
This fixes an issue in the glibc port I am working on where the build
fails due to the warning:

  error: calling ‘__builtin_return_address’ with a nonzero argument is unsafe 
[-Werror=frame-address]

This is due to how the current implementation of _mcount in glibc uses
__builtin_return_address with a count argument of 1.

Fix that by passing the value of LR_REGNUM to the _mcount function,
effectively providing the value _mcount is after.

This is an ABI change, but I think it's OK because the glibc port for
or1k is not yet upstreamed.  Also, I think just adding an argument
should not break anything anyway.
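
A small sketch of the idea, assuming GCC or Clang (both builtins and the
noinline attribute are compiler extensions).  mcount_sketch and
profiled_function are hypothetical names, and this is not the glibc
_mcount.  It only shows the instrumented function handing over its own
return address, obtained with level 0, so the hook never needs a nonzero
level.

```cpp
#include <cassert>

// Last caller address seen by the hypothetical hook.
static void *recorded_caller = nullptr;

// Hypothetical profiling hook: the address of the instrumented
// function's caller arrives as an argument.
void mcount_sketch (void *caller_return_address)
{
  recorded_caller = caller_return_address;
}

// Stand-in for an instrumented function.  Level 0 of
// __builtin_return_address is the only level that does not trigger
// -Wframe-address.
__attribute__ ((noinline)) int profiled_function ()
{
  mcount_sketch (__builtin_return_address (0));
  return 42;
}
```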

gcc/ChangeLog:

* config/or1k/or1k.h (PROFILE_HOOK): Add return address argument
to _mcount.
---
 gcc/config/or1k/or1k.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/or1k/or1k.h b/gcc/config/or1k/or1k.h
index fe01ab81ead..4603cb67160 100644
--- a/gcc/config/or1k/or1k.h
+++ b/gcc/config/or1k/or1k.h
@@ -387,9 +387,10 @@ do {\
profiling a function entry.  */
 #define PROFILE_HOOK(LABEL)\
   {\
-rtx fun;   \
+rtx fun, ra;   \
+ra = get_hard_reg_initial_val (Pmode, LR_REGNUM);  \
 fun = gen_rtx_SYMBOL_REF (Pmode, "_mcount");   \
-emit_library_call (fun, LCT_NORMAL, VOIDmode); \
+emit_library_call (fun, LCT_NORMAL, VOIDmode, ra, Pmode);  \
   }
 
 /* All the work is done in PROFILE_HOOK, but this is still required.  */
-- 
2.31.1



Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-27 Thread Jason Merrill via Gcc-patches

On 10/27/21 17:10, Patrick Palka wrote:

On Wed, 27 Oct 2021, Jason Merrill wrote:


On 10/27/21 14:54, Patrick Palka wrote:

On Tue, 26 Oct 2021, Jakub Jelinek wrote:


On Tue, Oct 26, 2021 at 05:07:43PM -0400, Patrick Palka wrote:

The performance impact of the other calls to
cxx_eval_outermost_const_expr
from p_c_e_1 is probably already mostly mitigated by the constexpr call
cache and the fact that we try to evaluate all calls to constexpr
functions during cp_fold_function anyway (at least with -O).  So trial


constexpr function bodies don't go through cp_fold_function
(intentionally,
so that we don't optimize away UB), the bodies are copied before the trees
of the
normal copy are folded.


Ah right, I had forgotten about that..

Here's another approach that doesn't need to remove trial evaluation for
&&/||.  The idea is to first quietly check if the second operand is
potentially constant _before_ performing trial evaluation of the first
operand.  This speeds up the case we care about (both operands are
potentially constant) without regressing any diagnostics.  We have to be
careful about emitting bogus diagnostics when tf_error is set, hence the
first hunk below which makes p_c_e_1 always proceed quietly first, and
replay noisily in case of error (similar to how satisfaction works).

Would something like this be preferable?


Seems plausible, though doubling the number of stack frames is a downside.


Whoops, good point..  The noisy -> quiet adjustment only needs to
be performed during the outermost call to p_c_e_1, and not also during
each recursive call.  The revised diff below fixes this thinko, and so
only a single extra stack frame is needed AFAICT.
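
The "quiet first, replay noisily on failure" scheme described above can be
sketched like this.  Everything here is hypothetical (Expr, Checker, the
diagnostic text); it is not the constexpr.c code, only the control flow:
the outermost entry point tries a silent pass and re-runs with diagnostics
only if that pass fails.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical expression: a leaf that is or is not acceptable, or a
// binary node (lhs and rhs both non-null).
struct Expr
{
  bool leaf_ok;
  const Expr *lhs;
  const Expr *rhs;
};

struct Checker
{
  std::vector<std::string> diagnostics;
  int visits = 0;

  // Recursive worker; emits a diagnostic only when COMPLAIN is set.
  bool check_1 (const Expr &e, bool complain)
  {
    ++visits;
    if (e.lhs == nullptr)
      {
        if (!e.leaf_ok && complain)
          diagnostics.push_back ("operand is not potentially constant");
        return e.leaf_ok;
      }
    return check_1 (*e.lhs, complain) && check_1 (*e.rhs, complain);
  }

  // Outermost entry point: the quiet attempt happens once here, not
  // in every recursive call.
  bool check (const Expr &e, bool complain)
  {
    if (complain && check_1 (e, false))
      return true;
    return check_1 (e, complain);
  }
};
```

In the success case the noisy request costs a single quiet walk; only a
failing expression is walked twice, the second time to produce the errors.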


What did you think of Jakub's suggestion of linearizing the terms?


IIUC that would fix the quadraticness, but it wouldn't address that
we end up evaluating the entire expression twice, once during the trial
evaluation of each term from p_c_e_1 and again during the proper
evaluation of the entire expression.  It'd be nice if we could somehow
avoid the double evaluation, as in the approach below (or in the first
patch).


OK with more comments to explain the tf_error hijinks.


-- >8 --

gcc/cp/ChangeLog:

* constexpr.c (potential_constant_expression_1): When tf_error is
set, proceed quietly first and return true if successful.
: When tf_error is not set, check potentiality
of the second operand before performing trial evaluation of the
first operand rather than after.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 6f83d303cdd..7855a948baf 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -8892,13 +8892,16 @@ potential_constant_expression_1 (tree t, bool 
want_rval, bool strict, bool now,
tmp = boolean_false_node;
  truth:
{
-   tree op = TREE_OPERAND (t, 0);
-   if (!RECUR (op, rval))
+   tree op0 = TREE_OPERAND (t, 0);
+   tree op1 = TREE_OPERAND (t, 1);
+   if (!RECUR (op0, rval))
  return false;
+   if (!(flags & tf_error) && RECUR (op1, rval))
+ return true;
if (!processing_template_decl)
- op = cxx_eval_outermost_constant_expr (op, true);
-   if (tree_int_cst_equal (op, tmp))
- return RECUR (TREE_OPERAND (t, 1), rval);
+ op0 = cxx_eval_outermost_constant_expr (op0, true);
+   if (tree_int_cst_equal (op0, tmp))
+ return (flags & tf_error) ? RECUR (op1, rval) : false;
else
  return true;
}
@@ -9107,6 +9110,14 @@ bool
  potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool 
now,
 tsubst_flags_t flags)
  {
+  if (flags & tf_error)
+{
+  flags &= ~tf_error;
+  if (potential_constant_expression_1 (t, want_rval, strict, now, flags))
+   return true;
+  flags |= tf_error;
+}
+
tree target = NULL_TREE;
return potential_constant_expression_1 (t, want_rval, strict, now,
  flags, &target);




Re: [PATCH,FORTRAN] Fix memory leak of gsymbol

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
ping
[I'll rebase and retest this too since it's been a while.
Ok if it passes?]

On Sun, 21 Oct 2018 16:04:34 +0200
Bernhard Reutner-Fischer  wrote:

> Hi!
> 
> Regtested on x86_64-unknown-linux, installing on
> aldot/fortran-fe-stringpool.
> 
> We did not free global symbols. For a simplified abstract_type_3.f03
> valgrind reports:
> 
> 96 bytes in 1 blocks are still reachable in loss record 461 of 602
>at 0x48377D5: calloc (vg_replace_malloc.c:711)
>by 0x21257C3: xcalloc (xmalloc.c:162)
>by 0x98611B: gfc_get_gsymbol(char const*) (symbol.c:4341)
>by 0x932C58: parse_module() (parse.c:5912)
>by 0x9336F8: gfc_parse_file() (parse.c:6236)
>by 0x991449: gfc_be_parse_file() (f95-lang.c:204)
>by 0x11D8EDE: compile_file() (toplev.c:455)
>by 0x11DB9C3: do_compile() (toplev.c:2170)
>by 0x11DBCAF: toplev::main(int, char**) (toplev.c:2305)
>by 0x2045D37: main (main.c:39)
> 
> This patch reduces leaks to
> 
>  LEAK SUMMARY:
> definitely lost: 344 bytes in 1 blocks
> indirectly lost: 3,024 bytes in 4 blocks
>   possibly lost: 0 bytes in 0 blocks
> -   still reachable: 1,576,174 bytes in 2,277 blocks
> +   still reachable: 1,576,078 bytes in 2,276 blocks
>  suppressed: 0 bytes in 0 blocks
> 
> gcc/fortran/ChangeLog:
> 
> 2018-10-21  Bernhard Reutner-Fischer  
> 
>   * parse.c (clean_up_modules): Free gsym.
> ---
>  gcc/fortran/parse.c | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
> index b7265c42f58..f7c369a17ac 100644
> --- a/gcc/fortran/parse.c
> +++ b/gcc/fortran/parse.c
> @@ -6066,7 +6066,7 @@ resolve_all_program_units (gfc_namespace 
> *gfc_global_ns_list)
>  
>  
>  static void
> -clean_up_modules (gfc_gsymbol *gsym)
> +clean_up_modules (gfc_gsymbol *&gsym)
>  {
>if (gsym == NULL)
>  return;
> @@ -6074,14 +6074,18 @@ clean_up_modules (gfc_gsymbol *gsym)
>clean_up_modules (gsym->left);
>clean_up_modules (gsym->right);
>  
> -  if (gsym->type != GSYM_MODULE || !gsym->ns)
> +  if (gsym->type != GSYM_MODULE)
>  return;
>  
> -  gfc_current_ns = gsym->ns;
> -  gfc_derived_types = gfc_current_ns->derived_types;
> -  gfc_done_2 ();
> -  gsym->ns = NULL;
> -  return;
> +  if (gsym->ns)
> +{
> +  gfc_current_ns = gsym->ns;
> +  gfc_derived_types = gfc_current_ns->derived_types;
> +  gfc_done_2 ();
> +  gsym->ns = NULL;
> +}
> +  free (gsym);
> +  gsym = NULL;
>  }
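
The reference-to-pointer change in the quoted patch can be illustrated
with a generic tree.  Node, make_node and free_tree are hypothetical and
use new/delete rather than the xcalloc/free of the Fortran front end; the
sketch only shows why taking the pointer by reference lets each node be
released and its parent's link cleared in the same post-order walk.

```cpp
#include <cassert>

// Hypothetical binary tree node.
struct Node
{
  Node *left;
  Node *right;
};

// Number of nodes released so far, for illustration only.
static int freed_nodes = 0;

Node *make_node (Node *left, Node *right)
{
  return new Node{left, right};
}

// Post-order release.  N is a reference to the owning pointer, so
// setting it to null also clears the parent's link (or the root).
void free_tree (Node *&n)
{
  if (n == nullptr)
    return;
  free_tree (n->left);
  free_tree (n->right);
  delete n;
  n = nullptr;
  ++freed_nodes;
}
```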
>  
>  



Re: [PATCH,FORTRAN] Fix memory leak in finalization wrappers

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
Ping
[hmz. it's been a while, I'll rebase and retest this one.
Ok if it passes?]

On Mon, 15 Oct 2018 10:23:06 +0200
Bernhard Reutner-Fischer  wrote:

> If a finalization is not required we created a namespace containing
> formal arguments for an internal interface definition but never used
> any of these. So the whole sub_ns namespace was not wired up to the
> program and consequently was never freed. The fix is to simply not
> generate any finalization wrappers if we know that it will be unused.
> Note that this reverts back to the original r190869
> (8a96d64282ac534cb597f446f02ac5d0b13249cc) handling for this case
> by reverting this specific part of r194075
> (f1ee56b4be7cc3892e6ccc75d73033c129098e87) for PR fortran/37336.
> 
> Regtests cleanly, installed to the fortran-fe-stringpool branch, sent
> here for reference and later inclusion.
> I might plug a few more leaks in preparation of switching to hash-maps.
> I fear that the leaks around interfaces are another candidate ;)
> 
> I should probably add a tag for the compile-time leak PR68800, shouldn't I?
> 
> valgrind summary for e.g.
> gfortran.dg/abstract_type_3.f03 and gfortran.dg/abstract_type_4.f03
> where ".orig" is pristine trunk and ".mine" contains this fix:
> 
> at3.orig.vg:LEAK SUMMARY:
> at3.orig.vg-   definitely lost: 8,460 bytes in 11 blocks
> at3.orig.vg-   indirectly lost: 13,288 bytes in 55 blocks
> at3.orig.vg- possibly lost: 0 bytes in 0 blocks
> at3.orig.vg-   still reachable: 572,278 bytes in 2,142 blocks
> at3.orig.vg-suppressed: 0 bytes in 0 blocks
> at3.orig.vg-
> at3.orig.vg-Use --track-origins=yes to see where uninitialised values come from
> at3.orig.vg-ERROR SUMMARY: 38 errors from 33 contexts (suppressed: 0 from 0)
> --
> at3.mine.vg:LEAK SUMMARY:
> at3.mine.vg-   definitely lost: 344 bytes in 1 blocks
> at3.mine.vg-   indirectly lost: 7,192 bytes in 18 blocks
> at3.mine.vg- possibly lost: 0 bytes in 0 blocks
> at3.mine.vg-   still reachable: 572,278 bytes in 2,142 blocks
> at3.mine.vg-suppressed: 0 bytes in 0 blocks
> at3.mine.vg-
> at3.mine.vg-ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
> at3.mine.vg-ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
> at4.orig.vg:LEAK SUMMARY:
> at4.orig.vg-   definitely lost: 13,751 bytes in 12 blocks
> at4.orig.vg-   indirectly lost: 11,976 bytes in 60 blocks
> at4.orig.vg- possibly lost: 0 bytes in 0 blocks
> at4.orig.vg-   still reachable: 572,278 bytes in 2,142 blocks
> at4.orig.vg-suppressed: 0 bytes in 0 blocks
> at4.orig.vg-
> at4.orig.vg-Use --track-origins=yes to see where uninitialised values come from
> at4.orig.vg-ERROR SUMMARY: 18 errors from 16 contexts (suppressed: 0 from 0)
> --
> at4.mine.vg:LEAK SUMMARY:
> at4.mine.vg-   definitely lost: 3,008 bytes in 3 blocks
> at4.mine.vg-   indirectly lost: 4,056 bytes in 11 blocks
> at4.mine.vg- possibly lost: 0 bytes in 0 blocks
> at4.mine.vg-   still reachable: 572,278 bytes in 2,142 blocks
> at4.mine.vg-suppressed: 0 bytes in 0 blocks
> at4.mine.vg-
> at4.mine.vg-ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
> at4.mine.vg-ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
> 
> gcc/fortran/ChangeLog:
> 
> 2018-10-12  Bernhard Reutner-Fischer  
> 
>   * class.c (generate_finalization_wrapper): Do not leak finalization
>   wrappers if they will not be used.
>   * expr.c (gfc_free_actual_arglist): Formatting fix.
>   * gfortran.h (gfc_free_symbol): Pass argument by reference.
>   (gfc_release_symbol): Likewise.
>   (gfc_free_namespace): Likewise.
>   * symbol.c (gfc_release_symbol): Adjust accordingly.
>   (free_components): Set procedure pointer components
>   of derived types to NULL after freeing.
>   (free_tb_tree): Likewise.
>   (gfc_free_symbol): Set sym to NULL after freeing.
>   (gfc_free_namespace): Set namespace to NULL after freeing.
> ---
>  gcc/fortran/class.c| 25 +
>  gcc/fortran/expr.c |  2 +-
>  gcc/fortran/gfortran.h |  6 +++---
>  gcc/fortran/symbol.c   | 19 ++-
>  4 files changed, 23 insertions(+), 29 deletions(-)
> 
> diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
> index 69c95fc5dfa..e0bb381a55f 100644
> --- a/gcc/fortran/class.c
> +++ b/gcc/fortran/class.c
> @@ -1533,7 +1533,6 @@ generate_finalization_wrapper (gfc_symbol *derived, 
> gfc_namespace *ns,
>gfc_code *last_code, *block;
>const char *name;
>bool finalizable_comp = false;
> -  bool expr_null_wrapper = false;
>gfc_expr *ancestor_wrapper = NULL, *rank;
>gfc_iterator *iter;
>  
> @@ -1561,13 +1560,17 @@ generate_finalization_wrapper (gfc_symbol *derived, 
> gfc_namespace *ns,
>  }
>  
>/* No wrapper of the ancestor and no own FINAL subroutines and allocatable
> - components: Return a NULL() expression; we defer this a bit to have have
> + components: Return a NULL() expression; we defer this a bit to have
>

[PATCH,Fortran 0/1] Correct CAF locations in simplify

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
Hi!

I found this lying around in an oldish tree.
Regtest running over night, ok for trunk if it passes?

Bernhard Reutner-Fischer (1):
  Tweak locations around CAF simplify

 gcc/fortran/simplify.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

-- 
2.33.0



[PATCH,Fortran 1/1] Tweak locations around CAF simplify

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
From: Bernhard Reutner-Fischer 

addresses: FIXME: gfc_current_locus is wrong
by using the locus of the current intrinsic.
Regtests clean, ok for trunk?
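
In gfortran diagnostics, %C prints the global gfc_current_locus while %L
takes an explicit locus pointer argument. The toy sketch below shows why
passing the location explicitly is more robust than reading (or
temporarily overwriting) a global. All names in it (`locus`,
`current_locus`, `report_at_current`, `report_at`) are hypothetical and
only mimic the idea; they are not the gfortran API.

```cpp
#include <string>

// Hypothetical source position.
struct locus
{
  int line;
  int column;
};

// Hypothetical global "where the scanner currently is".  By the time an
// intrinsic is simplified this may point well past the intrinsic itself.
static locus current_locus = { 0, 0 };

static std::string
format_at (const locus &where, const std::string &msg)
{
  return std::to_string (where.line) + ":"
	 + std::to_string (where.column) + ": " + msg;
}

// %C-like reporting: depends on whatever the global happens to hold.
static std::string
report_at_current (const std::string &msg)
{
  return format_at (current_locus, msg);
}

// %L-like reporting: the caller says exactly which location is meant.
static std::string
report_at (const locus *where, const std::string &msg)
{
  return format_at (*where, msg);
}
```

With the explicit variant there is no need to assign to the global before
issuing the error, which is the shape of the change in the hunks below.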

gcc/fortran/ChangeLog:

2018-09-20  Bernhard Reutner-Fischer  

* simplify.c (gfc_simplify_failed_or_stopped_images): Use
current intrinsic where locus.
(gfc_simplify_get_team): Likewise.
(gfc_simplify_num_images): Likewise.
(gfc_simplify_image_status): Likewise.
(gfc_simplify_this_image): Likewise.
---
 gcc/fortran/simplify.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index d675f2c3aef..46e88bb2bf1 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -2985,8 +2985,9 @@ gfc_simplify_failed_or_stopped_images (gfc_expr *team 
ATTRIBUTE_UNUSED,
 {
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_current_locus = *gfc_current_intrinsic_where;
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to 
enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
+
   return &gfc_bad_expr;
 }
 
@@ -2999,7 +3000,8 @@ gfc_simplify_failed_or_stopped_images (gfc_expr *team 
ATTRIBUTE_UNUSED,
   else
actual_kind = gfc_default_integer_kind;
 
-  result = gfc_get_array_expr (BT_INTEGER, actual_kind, 
&gfc_current_locus);
+  result = gfc_get_array_expr (BT_INTEGER, actual_kind,
+ gfc_current_intrinsic_where);
   result->rank = 1;
   return result;
 }
@@ -3015,15 +3017,16 @@ gfc_simplify_get_team (gfc_expr *level ATTRIBUTE_UNUSED)
 {
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_current_locus = *gfc_current_intrinsic_where;
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to 
enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
   return &gfc_bad_expr;
 }
 
   if (flag_coarray == GFC_FCOARRAY_SINGLE)
 {
   gfc_expr *result;
-  result = gfc_get_array_expr (BT_INTEGER, gfc_default_integer_kind, 
&gfc_current_locus);
+  result = gfc_get_array_expr (BT_INTEGER, gfc_default_integer_kind,
+ gfc_current_intrinsic_where);
   result->rank = 0;
   return result;
 }
@@ -6340,7 +6343,8 @@ gfc_simplify_num_images (gfc_expr *distance 
ATTRIBUTE_UNUSED, gfc_expr *failed)
 
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to 
enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
   return &gfc_bad_expr;
 }
 
@@ -6350,9 +6354,8 @@ gfc_simplify_num_images (gfc_expr *distance 
ATTRIBUTE_UNUSED, gfc_expr *failed)
   if (failed && failed->expr_type != EXPR_CONSTANT)
 return NULL;
 
-  /* FIXME: gfc_current_locus is wrong.  */
   result = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind,
- &gfc_current_locus);
+ gfc_current_intrinsic_where);
 
   if (failed && failed->value.logical != 0)
 mpz_set_si (result->value.integer, 0);
@@ -8345,8 +8348,8 @@ gfc_simplify_image_status (gfc_expr *image, gfc_expr 
*team ATTRIBUTE_UNUSED)
 {
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_current_locus = *gfc_current_intrinsic_where;
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to 
enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
   return &gfc_bad_expr;
 }
 
@@ -8383,9 +8386,8 @@ gfc_simplify_this_image (gfc_expr *coarray, gfc_expr *dim,
   if (coarray == NULL || !gfc_is_coarray (coarray))
 {
   gfc_expr *result;
-  /* FIXME: gfc_current_locus is wrong.  */
   result = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind,
- &gfc_current_locus);
+ gfc_current_intrinsic_where);
   mpz_set_si (result->value.integer, 1);
   return result;
 }
-- 
2.33.0



[PATCH,Fortran] Fortran: Delete unused decl in gfortran.h

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
From: Bernhard Reutner-Fischer 

Hi!

Delete some more declarations without definitions and make some
functions static.
Bootstrapped and regtested on x86_64-unknown-linux without regressions.
Ok for trunk?

gcc/fortran/ChangeLog:

* decl.c (gfc_insert_kind_parameter_exprs): Make static.
* expr.c (gfc_build_init_expr): Make static.
(gfc_build_default_init_expr): Move below its static helper.
* gfortran.h (gfc_insert_kind_parameter_exprs, gfc_add_saved_common,
gfc_add_common, gfc_use_derived_tree, gfc_free_charlen,
gfc_get_ultimate_derived_super_type,
gfc_resolve_oacc_parallel_loop_blocks, gfc_build_init_expr,
gfc_iso_c_sub_interface): Delete.
* symbol.c (gfc_new_charlen, gfc_get_derived_super_type): Make
static.
---
 gcc/fortran/decl.c |  2 +-
 gcc/fortran/expr.c | 20 ++--
 gcc/fortran/gfortran.h |  9 -
 gcc/fortran/symbol.c   |  4 ++--
 4 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 2788348d1be..e9e23fe1acb 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -3713,7 +3713,7 @@ insert_parameter_exprs (gfc_expr* e, gfc_symbol* sym 
ATTRIBUTE_UNUSED,
 }
 
 
-bool
+static bool
 gfc_insert_kind_parameter_exprs (gfc_expr *e)
 {
   return gfc_traverse_expr (e, NULL, &insert_parameter_exprs, 0);
diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 4dea840e348..087d822021a 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -4587,21 +4587,12 @@ gfc_check_assign_symbol (gfc_symbol *sym, gfc_component 
*comp, gfc_expr *rvalue)
   return true;
 }
 
-/* Invoke gfc_build_init_expr to create an initializer expression, but do not
- * require that an expression be built.  */
-
-gfc_expr *
-gfc_build_default_init_expr (gfc_typespec *ts, locus *where)
-{
-  return gfc_build_init_expr (ts, where, false);
-}
-
 /* Build an initializer for a local integer, real, complex, logical, or
character variable, based on the command line flags finit-local-zero,
finit-integer=, finit-real=, finit-logical=, and finit-character=.
With force, an initializer is ALWAYS generated.  */
 
-gfc_expr *
+static gfc_expr *
 gfc_build_init_expr (gfc_typespec *ts, locus *where, bool force)
 {
   gfc_expr *init_expr;
@@ -4758,6 +4749,15 @@ gfc_build_init_expr (gfc_typespec *ts, locus *where, 
bool force)
   return init_expr;
 }
 
+/* Invoke gfc_build_init_expr to create an initializer expression, but do not
+ * require that an expression be built.  */
+
+gfc_expr *
+gfc_build_default_init_expr (gfc_typespec *ts, locus *where)
+{
+  return gfc_build_init_expr (ts, where, false);
+}
+
 /* Apply an initialization expression to a typespec. Can be used for symbols or
components. Similar to add_init_expr_to_sym in decl.c; could probably be
combined with some effort.  */
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index f7662c59a5d..8c11cf6d18d 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3116,7 +3116,6 @@ struct gfc_vect_builtin_tuple
 extern hash_map *gfc_vectorized_builtins;
 
 /* Handling Parameterized Derived Types  */
-bool gfc_insert_kind_parameter_exprs (gfc_expr *);
 bool gfc_insert_parameter_exprs (gfc_expr *, gfc_actual_arglist *);
 match gfc_get_pdt_instance (gfc_actual_arglist *, gfc_symbol **,
gfc_actual_arglist **);
@@ -3348,11 +3347,9 @@ bool gfc_add_threadprivate (symbol_attribute *, const 
char *, locus *);
 bool gfc_add_omp_declare_target (symbol_attribute *, const char *, locus *);
 bool gfc_add_omp_declare_target_link (symbol_attribute *, const char *,
  locus *);
-bool gfc_add_saved_common (symbol_attribute *, locus *);
 bool gfc_add_target (symbol_attribute *, locus *);
 bool gfc_add_dummy (symbol_attribute *, const char *, locus *);
 bool gfc_add_generic (symbol_attribute *, const char *, locus *);
-bool gfc_add_common (symbol_attribute *, locus *);
 bool gfc_add_in_common (symbol_attribute *, const char *, locus *);
 bool gfc_add_in_equivalence (symbol_attribute *, const char *, locus *);
 bool gfc_add_data (symbol_attribute *, const char *, locus *);
@@ -3387,7 +3384,6 @@ bool gfc_copy_attr (symbol_attribute *, symbol_attribute 
*, locus *);
 int gfc_copy_dummy_sym (gfc_symbol **, gfc_symbol *, int);
 bool gfc_add_component (gfc_symbol *, const char *, gfc_component **);
 gfc_symbol *gfc_use_derived (gfc_symbol *);
-gfc_symtree *gfc_use_derived_tree (gfc_symtree *);
 gfc_component *gfc_find_component (gfc_symbol *, const char *, bool, bool,
gfc_ref **);
 
@@ -3428,7 +3424,6 @@ void gfc_undo_symbols (void);
 void gfc_commit_symbols (void);
 void gfc_commit_symbol (gfc_symbol *);
 gfc_charlen *gfc_new_charlen (gfc_namespace *, gfc_charlen *);
-void gfc_free_charlen (gfc_charlen *, gfc_charlen *);
 void gfc_free_namespace (gfc_namespace *);
 
 void gfc_symbol_init_2 (void);
@@ -3448,7 +

Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-27 Thread Patrick Palka via Gcc-patches
On Wed, 27 Oct 2021, Jason Merrill wrote:

> On 10/27/21 14:54, Patrick Palka wrote:
> > On Tue, 26 Oct 2021, Jakub Jelinek wrote:
> > 
> > > On Tue, Oct 26, 2021 at 05:07:43PM -0400, Patrick Palka wrote:
> > > > The performance impact of the other calls to
> > > > cxx_eval_outermost_const_expr from p_c_e_1 is probably already mostly
> > > > mitigated by the constexpr call cache and the fact that we try to
> > > > evaluate all calls to constexpr functions during cp_fold_function
> > > > anyway (at least with -O).  So trial
> > > 
> > > constexpr function bodies don't go through cp_fold_function
> > > (intentionally, so that we don't optimize away UB), the bodies are
> > > copied before the trees of the normal copy are folded.
> > 
> > Ah right, I had forgotten about that..
> > 
> > Here's another approach that doesn't need to remove trial evaluation for
> > &&/||.  The idea is to first quietly check if the second operand is
> > potentially constant _before_ performing trial evaluation of the first
> > operand.  This speeds up the case we care about (both operands are
> > potentially constant) without regressing any diagnostics.  We have to be
> > careful about emitting bogus diagnostics when tf_error is set, hence the
> > first hunk below which makes p_c_e_1 always proceed quietly first, and
> > replay noisily in case of error (similar to how satisfaction works).
> > 
> > Would something like this be preferable?
> 
> Seems plausible, though doubling the number of stack frames is a downside.

Whoops, good point..  The noisy -> quiet adjustment only needs to
be performed during the outermost call to p_c_e_1, and not also during
each recursive call.  The revised diff below fixes this thinko, and so
only a single extra stack frame is needed AFAICT.

> 
> What did you think of Jakub's suggestion of linearizing the terms?

IIUC that would fix the quadraticness, but it wouldn't address that
we end up evaluating the entire expression twice, once during the trial
evaluation of each term from p_c_e_1 and again during the proper
evaluation of the entire expression.  It'd be nice if we could somehow
avoid the double evaluation, as in the approach below (or in the first
patch).
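
To illustrate the "quiet first, replay noisily on failure" idea outside
of GCC, here is a small self-contained sketch. It is a toy model under
stated assumptions, not the constexpr.c code: `expr`, `evaluate`,
`check_1` and `check` are hypothetical names, and a leaf simply records
whether it is potentially constant and what it evaluates to.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy expression: a leaf, or a short-circuit AND.
struct expr
{
  enum kind_t { LEAF, LOGICAL_AND } kind;
  bool potentially_constant;	// LEAF only
  bool value;			// LEAF only: result of trial evaluation
  const expr *op0;		// LOGICAL_AND only
  const expr *op1;		// LOGICAL_AND only
  std::string name;
};

// Stand-in for trial evaluation of an operand.
static bool
evaluate (const expr &e)
{
  if (e.kind == expr::LEAF)
    return e.value;
  return evaluate (*e.op0) && evaluate (*e.op1);
}

static bool
check_1 (const expr &e, bool complain, std::vector<std::string> &diags)
{
  if (e.kind == expr::LEAF)
    {
      if (!e.potentially_constant && complain)
	diags.push_back (e.name + " is not a constant expression");
      return e.potentially_constant;
    }

  assert (e.op0 != nullptr && e.op1 != nullptr);
  if (!check_1 (*e.op0, complain, diags))
    return false;
  // Quiet mode only: look at op1 before trial-evaluating op0.  This is
  // the fast path, and being quiet it cannot emit a bogus diagnostic
  // for an operand that might never be evaluated.
  if (!complain && check_1 (*e.op1, complain, diags))
    return true;
  // If op0 is false, op1 is short-circuited and need not be constant.
  if (!evaluate (*e.op0))
    return true;
  // In quiet mode op1 has already failed above.
  return complain ? check_1 (*e.op1, complain, diags) : false;
}

// Outermost entry point: when asked to complain, try quietly first and
// replay noisily only if the quiet pass failed.
static bool
check (const expr &e, bool complain, std::vector<std::string> &diags)
{
  if (complain)
    {
      std::vector<std::string> ignored;
      if (check_1 (e, false, ignored))
	return true;
    }
  return check_1 (e, complain, diags);
}
```

Only `check` performs the noisy-to-quiet switch, so the recursive calls
in `check_1` add no extra frames per level, which is the point made
above about the outermost call.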

-- >8 --

gcc/cp/ChangeLog:

* constexpr.c (potential_constant_expression_1): When tf_error is
set, proceed quietly first and return true if successful.
: When tf_error is not set, check potentiality
of the second operand before performing trial evaluation of the
first operand rather than after.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 6f83d303cdd..7855a948baf 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -8892,13 +8892,16 @@ potential_constant_expression_1 (tree t, bool 
want_rval, bool strict, bool now,
   tmp = boolean_false_node;
 truth:
   {
-   tree op = TREE_OPERAND (t, 0);
-   if (!RECUR (op, rval))
+   tree op0 = TREE_OPERAND (t, 0);
+   tree op1 = TREE_OPERAND (t, 1);
+   if (!RECUR (op0, rval))
  return false;
+   if (!(flags & tf_error) && RECUR (op1, rval))
+ return true;
if (!processing_template_decl)
- op = cxx_eval_outermost_constant_expr (op, true);
-   if (tree_int_cst_equal (op, tmp))
- return RECUR (TREE_OPERAND (t, 1), rval);
+ op0 = cxx_eval_outermost_constant_expr (op0, true);
+   if (tree_int_cst_equal (op0, tmp))
+ return (flags & tf_error) ? RECUR (op1, rval) : false;
else
  return true;
   }
@@ -9107,6 +9110,14 @@ bool
 potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
 tsubst_flags_t flags)
 {
+  if (flags & tf_error)
+{
+  flags &= ~tf_error;
+  if (potential_constant_expression_1 (t, want_rval, strict, now, flags))
+   return true;
+  flags |= tf_error;
+}
+
   tree target = NULL_TREE;
   return potential_constant_expression_1 (t, want_rval, strict, now,
  flags, &target);



Re: [PATCH] c++: Implement DR2351 - void{} [PR102820]

2021-10-27 Thread Jason Merrill via Gcc-patches

On 10/21/21 04:42, Jakub Jelinek wrote:

Hi!

Here is an attempt to implement DR2351 - void{} - where void{} after
pack expansion is considered valid and the same thing as void().
For templates, dunno if we have some better way to check if a CONSTRUCTOR
might be empty after pack expansion.  Would that be only if the constructor
contains only EXPR_PACK_EXPANSION elements and nothing else, or something
else too?


I think that's the only case.  For template args there's the 
pack_expansion_args_count function, but I don't think there's anything 
similar for constructor elts; please feel free to add it.



With the patch as is we wouldn't diagnose
template 
void
bar (T... t)
{
   void{1, t...};
}
at parsing time, only at instantiation time, even when it will always
expand to at least one CONSTRUCTOR elt.

Bootstrapped/regtested on x86_64-linux and i686-linux.

2021-10-21  Jakub Jelinek  

PR c++/102820
* semantics.c (finish_compound_literal): Implement DR2351 - void{}.
If type is cv void and compound_literal has no elements, return
void_node.  If type is cv void and compound_literal is instantiation
dependent, handle it like other dependent compound literals.

* g++.dg/cpp0x/dr2351.C: New test.

--- gcc/cp/semantics.c.jj   2021-10-15 11:58:45.079131947 +0200
+++ gcc/cp/semantics.c  2021-10-20 17:00:38.586705600 +0200
@@ -3104,9 +3104,20 @@ finish_compound_literal (tree type, tree
  
if (!TYPE_OBJ_P (type))

  {
-  if (complain & tf_error)
-   error ("compound literal of non-object type %qT", type);
-  return error_mark_node;
+  /* DR2351 */
+  if (VOID_TYPE_P (type) && CONSTRUCTOR_NELTS (compound_literal) == 0)
+   return void_node;
+  else if (VOID_TYPE_P (type)
+  && processing_template_decl
+  && instantiation_dependent_expression_p (compound_literal))
+   /* If there are packs in compound_literal, it could
+  be void{} after pack expansion.  */;
+  else
+   {
+ if (complain & tf_error)
+   error ("compound literal of non-object type %qT", type);
+ return error_mark_node;
+   }
  }
  
if (template_placeholder_p (type))

--- gcc/testsuite/g++.dg/cpp0x/dr2351.C.jj  2021-10-20 17:06:02.399162937 
+0200
+++ gcc/testsuite/g++.dg/cpp0x/dr2351.C 2021-10-20 17:05:54.294276511 +0200
@@ -0,0 +1,36 @@
+// DR2351
+// { dg-do compile { target c++11 } }
+
+void
+foo ()
+{
+  void{};
+  void();
+}
+
+template 
+void
+bar (T... t)
+{
+  void{t...};
+  void(t...);
+}
+
+void
+baz ()
+{
+  bar ();
+}
+
+template 
+void
+qux (T... t)
+{
+  void{t...};  // { dg-error "compound literal of non-object type" }
+}
+
+void
+corge ()
+{
+  qux (1, 2);
+}

Jakub





Re: [PATCH 3/5] gcc: Add --nostdlib++ option

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
On Wed, 27 Oct 2021 21:05:03 +0100
Richard Purdie via Gcc-patches  wrote:

> OpenEmbedded/Yocto Project builds libgcc and the other gcc runtime libraries
> separately from the compiler and slightly differently to the standard gcc 
> build.
> 
> In general this works well but in trying to build them separately we run into
> an issue since we're using our gcc, not xgcc and there is no way to tell
> configure to use libgcc but not look for libstdc++.
> 
> This adds such an option allowing such configurations to work.

But shouldn't it be called --nostdlibc++ then?
thanks,


Re: [PATCH 2/5] gcc: Fix "argument list too long" from install-plugins

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
On Wed, 27 Oct 2021 21:05:02 +0100
Richard Purdie via Gcc-patches  wrote:

> When building in longer build paths (200+ characters), the
> "echo $(PLUGIN_HEADERS)" from the install-plugins target would cause an
> "argument list too long error" on some systems.
> 
> Avoid this by calling make's sort function on the list which removes
> duplicates and stops the overflow from reaching the echo command.
> The original sort is left to handle the .h and .def files.

you could as well $(subst $(srcdir),,$(wildcard $(addprefix
$(srcdir),*.h *.def))) and subst '' with '\012' wrapped in an outer
sort to completely do away with the echo and shell sort i suppose.
Just an idea.
thanks,
> 
> 2021-10-26 Richard Purdie 
> 
> gcc/ChangeLog:
> 
> * Makefile.in: Fix "argument list too long" from install-plugins
> 
> Signed-off-by: Richard Purdie 
> ---
>  gcc/Makefile.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 658093c11c0..89482c6dd4e 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -3685,7 +3685,7 @@ install-plugin: installdirs lang.install-plugin 
> s-header-vars install-gengtype
>  # We keep the directory structure for files in config, common/config or
>  # c-family and .def files. All other files are flattened to a single 
> directory.
>   $(mkinstalldirs) $(DESTDIR)$(plugin_includedir)
> - headers=`echo $(PLUGIN_HEADERS) $$(cd $(srcdir); echo *.h *.def) | tr ' 
> ' '\012' | sort -u`; \
> + headers=`echo $(sort $(PLUGIN_HEADERS)) $$(cd $(srcdir); echo *.h 
> *.def) | tr ' ' '\012' | sort -u`; \
>   srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*|]/&/g'`; \
>   for file in $$headers; do \
> if [ -f $$file ] ; then \



Re: [PATCH] c++: CTAD within template argument [PR102933]

2021-10-27 Thread Jason Merrill via Gcc-patches

On 10/26/21 13:44, Patrick Palka wrote:

Here when checking for erroneous occurrences of 'auto' inside a template
argument (which is allowed by the concepts TS for class templates),
extract_autos_r picks up the CTAD placeholder for X{T{0}} which causes
check_auto_in_tmpl_args to reject this valid template argument.  This
patch fixes this by making extract_autos_r ignore CTAD placeholders.


It also seems questionable that check_auto_in_tmpl_args is looking into 
non-type arguments, which won't have the bad autos this is looking for.



However, it seems we don't need to call check_auto_in_tmpl_args at all
outside of the concepts TS since using 'auto' as a type-id is otherwise
rejected more generally at parse time.  So this patch guards calls to
check_auto_in_tmpl_args with flag_concepts_ts instead of flag_concepts.

Relatedly, I think the concepts code paths in do_auto_deduction and
type_uses_auto are also necessary only for the concepts TS, so this
patch also restricts these code paths accordingly.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps 11?


For 11 (and possibly trunk) maybe return false from check_auto... if 
!flag_concepts_ts rather than asserting and changing the call sites. 
That one change is OK for 11, the whole patch is OK for trunk.


The comment on the test or assert could be elaborated to explain as you 
do above that any bad autos will have been rejected already by the parser.



PR c++/102933

gcc/cp/ChangeLog:

* parser.c (cp_parser_template_id): Call check_auto_in_tmpl_args
only for the concepts TS not also for standard concepts.
(cp_parser_simple_type_specifier): Adjust diagnostic for using
auto in parameter declaration.
* pt.c (tsubst_qualified_id): Call check_auto_in_tmpl_args only
for the concepts TS not also for standard concepts.
(extract_autos_r): Ignore CTAD placeholders.
(extract_autos): Use range-based for.
(do_auto_deduction): Use extract_autos only for the concepts TS
and not also for standard concepts.
(type_uses_auto): Likewise with for_each_template_parm.
(check_auto_in_tmpl_args): Assert that this function is used only
for the concepts TS.  Simplify.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class50.C: New test.
* g++.dg/cpp2a/nontype-class50a.C: New test.
---
  gcc/cp/parser.c   |  4 ++--
  gcc/cp/pt.c   | 24 +--
  gcc/testsuite/g++.dg/cpp2a/nontype-class50.C  | 13 ++
  gcc/testsuite/g++.dg/cpp2a/nontype-class50a.C |  5 
  4 files changed, 32 insertions(+), 14 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class50.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class50a.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 49d951cfb19..5052f534d40 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -18168,7 +18168,7 @@ cp_parser_template_id (cp_parser *parser,
   types.  We reject them in functions, but if what we have is an
   identifier, even with none_type we can't conclude it's NOT a
   type, we have to wait for template substitution.  */
-  if (flag_concepts && check_auto_in_tmpl_args (templ, arguments))
+  if (flag_concepts_ts && check_auto_in_tmpl_args (templ, arguments))
  template_id = error_mark_node;
/* Build a representation of the specialization.  */
else if (identifier_p (templ))
@@ -19505,7 +19505,7 @@ cp_parser_simple_type_specifier (cp_parser* parser,
  else if (!flag_concepts)
pedwarn (token->location, 0,
 "use of %<auto%> in parameter declaration "
-"only available with %<-fconcepts-ts%>");
+"only available with %<-std=c++20%> or %<-fconcepts%>");
}
else
type = make_auto ();
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 287cf4ce9d0..3321601e6ff 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -16456,7 +16456,7 @@ tsubst_qualified_id (tree qualified_id, tree args,
 we want to catch is when we couldn't tell then, and can now,
 namely when templ prior to substitution was an
 identifier.  */
-  if (flag_concepts && check_auto_in_tmpl_args (expr, template_args))
+  if (flag_concepts_ts && check_auto_in_tmpl_args (expr, template_args))
return error_mark_node;
  
if (variable_template_p (expr))

@@ -28560,7 +28560,7 @@ static int
  extract_autos_r (tree t, void *data)
  {
hash_table &hash = *(hash_table*)data;
-  if (is_auto (t))
+  if (is_auto (t) && !template_placeholder_p (t))
  {
/* All the autos were built with index 0; fix that up now.  */
tree *p = hash.find_slot (t, INSERT);
@@ -28594,10 +28594,8 @@ extract_autos (tree type)
for_each_template_parm (type, extract_autos_r, &hash, &visited, true);
  
tree tree_vec = make_tree_vec (hash.elements());

-  for

Re: [Patch, Fortran] PR 86935: Bad locus in ASSOCIATE statement

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
On Wed, 27 Oct 2021 21:44:52 +0200
Harald Anlauf via Fortran  wrote:

> Hi Bernhard,
> 
> Am 27.10.21 um 19:40 schrieb Bernhard Reutner-Fischer via Fortran:
> > AFAICS current trunk still has this issue.
> > Any takers?
> > thanks,  
> 
> can you create a PR tracking this issue?

now https://gcc.gnu.org/PR102973

> 
> AFAICS PR86935 has been fixed for gcc-9+.

Yes, it is a pre-existing possible bug that caught my eye when I looked
at that patch, so admittedly unrelated to PR86935.

thanks,


Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-27 Thread Jason Merrill via Gcc-patches

On 10/27/21 14:54, Patrick Palka wrote:

On Tue, 26 Oct 2021, Jakub Jelinek wrote:


On Tue, Oct 26, 2021 at 05:07:43PM -0400, Patrick Palka wrote:

The performance impact of the other calls to cxx_eval_outermost_const_expr
from p_c_e_1 is probably already mostly mitigated by the constexpr call
cache and the fact that we try to evaluate all calls to constexpr
functions during cp_fold_function anyway (at least with -O).  So trial


constexpr function bodies don't go through cp_fold_function (intentionally,
so that we don't optimize away UB), the bodies are copied before the trees
of the normal copy are folded.


Ah right, I had forgotten about that..

Here's another approach that doesn't need to remove trial evaluation for
&&/||.  The idea is to first quietly check if the second operand is
potentially constant _before_ performing trial evaluation of the first
operand.  This speeds up the case we care about (both operands are
potentially constant) without regressing any diagnostics.  We have to be
careful about emitting bogus diagnostics when tf_error is set, hence the
first hunk below which makes p_c_e_1 always proceed quietly first, and
replay noisily in case of error (similar to how satisfaction works).

Would something like this be preferable?


Seems plausible, though doubling the number of stack frames is a downside.

What did you think of Jakub's suggestion of linearizing the terms?


-- >8 --

gcc/cp/ChangeLog:

* constexpr.c (potential_constant_expression_1): When tf_error is
set, proceed quietly first and return true if successful.
: When tf_error is not set, check potentiality
of the second operand before performing trial evaluation of the
first operand rather than after.


diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 6f83d303cdd..821bd41d994 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -8056,6 +8056,14 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
  #define RECUR(T,RV) \
potential_constant_expression_1 ((T), (RV), strict, now, flags, jump_target)
  
+  if (flags & tf_error)

+{
+  flags &= ~tf_error;
+  if (RECUR (t, want_rval))
+   return true;
+  flags |= tf_error;
+}
+
enum { any = false, rval = true };
int i;
tree tmp;
@@ -8892,13 +8900,16 @@ potential_constant_expression_1 (tree t, bool 
want_rval, bool strict, bool now,
tmp = boolean_false_node;
  truth:
{
-   tree op = TREE_OPERAND (t, 0);
-   if (!RECUR (op, rval))
+   tree op0 = TREE_OPERAND (t, 0);
+   tree op1 = TREE_OPERAND (t, 1);
+   if (!RECUR (op0, rval))
  return false;
+   if (!(flags & tf_error) && RECUR (op1, rval))
+ return true;
if (!processing_template_decl)
- op = cxx_eval_outermost_constant_expr (op, true);
-   if (tree_int_cst_equal (op, tmp))
- return RECUR (TREE_OPERAND (t, 1), rval);
+ op0 = cxx_eval_outermost_constant_expr (op0, true);
+   if (tree_int_cst_equal (op0, tmp))
+ return (flags & tf_error) ? RECUR (op1, rval) : false;
else
  return true;
}





[PATCH 1/5] Makefile.in: Ensure build CPP/CPPFLAGS is used for build targets

2021-10-27 Thread Richard Purdie via Gcc-patches
During cross compiling, CPP is being set to the target compiler even for
build targets. As an example, when building a cross compiler targeting
mingw, the config.log for libiberty in
build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log
shows:

configure:3786: checking how to run the C preprocessor
configure:3856: result: x86_64-pokysdk-mingw32-gcc -E 
--sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32
configure:3876: x86_64-pokysdk-mingw32-gcc -E 
--sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c
configure:3876: $? = 0

This is libiberty being built for the build environment, not the target one
(i.e. in build-x86_64-linux). As such it should be using the build environment's
gcc and not the target one. In the mingw case the system headers are quite
different leading to build failures related to not being able to include a
process.h file for pem-unix.c.

Further analysis shows the same issue occurring for CPPFLAGS too.

Fix this by adding support for CPP_FOR_BUILD and CPPFLAGS_FOR_BUILD, which,
for example, avoids mixing the mingw headers for host binaries on linux
systems.

2021-10-27 Richard Purdie 

ChangeLog:

* Makefile.tpl: Add CPP_FOR_BUILD and CPPFLAGS_FOR_BUILD support
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Add CPP_FOR_BUILD and CPPFLAGS_FOR_BUILD support

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Use CPPFLAGS_FOR_BUILD for GMPINC

Signed-off-by: Richard Purdie 
---
 Makefile.in  | 6 ++
 Makefile.tpl | 6 ++
 configure| 4 
 configure.ac | 4 
 gcc/configure| 2 +-
 gcc/configure.ac | 2 +-
 6 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index 34b2d89660d..d13f6c353ee 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -154,6 +154,8 @@ BUILD_EXPORTS = \
CC="$(CC_FOR_BUILD)"; export CC; \
CFLAGS="$(CFLAGS_FOR_BUILD)"; export CFLAGS; \
CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
+   CPP="$(CPP_FOR_BUILD)"; export CPP; \
+   CPPFLAGS="$(CPPFLAGS_FOR_BUILD)"; export CPPFLAGS; \
CXX="$(CXX_FOR_BUILD)"; export CXX; \
CXXFLAGS="$(CXXFLAGS_FOR_BUILD)"; export CXXFLAGS; \
GFORTRAN="$(GFORTRAN_FOR_BUILD)"; export GFORTRAN; \
@@ -202,6 +204,8 @@ HOST_EXPORTS = \
AR="$(AR)"; export AR; \
AS="$(AS)"; export AS; \
CC_FOR_BUILD="$(CC_FOR_BUILD)"; export CC_FOR_BUILD; \
+   CPP_FOR_BUILD="$(CPP_FOR_BUILD)"; export CPP_FOR_BUILD; \
+   CPPFLAGS_FOR_BUILD="$(CPPFLAGS_FOR_BUILD)"; export CPPFLAGS_FOR_BUILD; \
CXX_FOR_BUILD="$(CXX_FOR_BUILD)"; export CXX_FOR_BUILD; \
DLLTOOL="$(DLLTOOL)"; export DLLTOOL; \
DSYMUTIL="$(DSYMUTIL)"; export DSYMUTIL; \
@@ -360,6 +364,8 @@ AR_FOR_BUILD = @AR_FOR_BUILD@
 AS_FOR_BUILD = @AS_FOR_BUILD@
 CC_FOR_BUILD = @CC_FOR_BUILD@
 CFLAGS_FOR_BUILD = @CFLAGS_FOR_BUILD@
+CPP_FOR_BUILD = @CPP_FOR_BUILD@
+CPPFLAGS_FOR_BUILD = @CPPFLAGS_FOR_BUILD@
 CXXFLAGS_FOR_BUILD = @CXXFLAGS_FOR_BUILD@
 CXX_FOR_BUILD = @CXX_FOR_BUILD@
 DLLTOOL_FOR_BUILD = @DLLTOOL_FOR_BUILD@
diff --git a/Makefile.tpl b/Makefile.tpl
index 08e68e83ea8..213052f8226 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -157,6 +157,8 @@ BUILD_EXPORTS = \
CC="$(CC_FOR_BUILD)"; export CC; \
CFLAGS="$(CFLAGS_FOR_BUILD)"; export CFLAGS; \
CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
+   CPP="$(CPP_FOR_BUILD)"; export CPP; \
+   CPPFLAGS="$(CPPFLAGS_FOR_BUILD)"; export CPPFLAGS; \
CXX="$(CXX_FOR_BUILD)"; export CXX; \
CXXFLAGS="$(CXXFLAGS_FOR_BUILD)"; export CXXFLAGS; \
GFORTRAN="$(GFORTRAN_FOR_BUILD)"; export GFORTRAN; \
@@ -205,6 +207,8 @@ HOST_EXPORTS = \
AR="$(AR)"; export AR; \
AS="$(AS)"; export AS; \
CC_FOR_BUILD="$(CC_FOR_BUILD)"; export CC_FOR_BUILD; \
+   CPP_FOR_BUILD="$(CPP_FOR_BUILD)"; export CPP_FOR_BUILD; \
+   CPPFLAGS_FOR_BUILD="$(CPPFLAGS_FOR_BUILD)"; export CPPFLAGS_FOR_BUILD; \
CXX_FOR_BUILD="$(CXX_FOR_BUILD)"; export CXX_FOR_BUILD; \
DLLTOOL="$(DLLTOOL)"; export DLLTOOL; \
DSYMUTIL="$(DSYMUTIL)"; export DSYMUTIL; \
@@ -363,6 +367,8 @@ AR_FOR_BUILD = @AR_FOR_BUILD@
 AS_FOR_BUILD = @AS_FOR_BUILD@
 CC_FOR_BUILD = @CC_FOR_BUILD@
 CFLAGS_FOR_BUILD = @CFLAGS_FOR_BUILD@
+CPP_FOR_BUILD = @CPP_FOR_BUILD@
+CPPFLAGS_FOR_BUILD = @CPPFLAGS_FOR_BUILD@
 CXXFLAGS_FOR_BUILD = @CXXFLAGS_FOR_BUILD@
 CXX_FOR_BUILD = @CXX_FOR_BUILD@
 DLLTOOL_FOR_BUILD = @DLLTOOL_FOR_BUILD@
diff --git a/configure b/configure
index 785498efff5..58979d6e3b1 100755
--- a/configure
+++ b/configure
@@ -655,6 +655,8 @@ DSYMUTIL_FOR_BUILD
 DLLTOOL_FOR_BUILD
 CXX_FOR_BUILD
 CXXFLAGS_FOR_BUILD
+CPPFLAGS_FOR_BUILD
+CPP_FOR_BUILD
 CFLAGS_FOR_BUILD
 CC_FOR_BUILD
 AS_FOR_BUILD
@@ -4090,6 +4092,7 @@ if test "${build}" != "${host}" ; then
   AR_FOR_BUILD=${AR_FOR_BUILD-ar}
   AS_FOR_BUILD=${AS_FOR_BUILD-as}
   CC_FOR_BUILD=${CC_FOR_BUILD-gcc}
+  CP

[PATCH 0/5] OpenEmbedded/Yocto Project gcc patches

2021-10-27 Thread Richard Purdie via Gcc-patches
OpenEmbedded/Yocto Project extensively uses gcc to cross compile in many
different and interesting places. For the most part we're very happy,
thanks! We do have a small collection of patches and I believe it would
be beneficial to share some of them.

I've picked some of the simpler ones and have tried to start that with
this series. I did send the first four once already but have reworked 
the first patch as advised (it became clear we had further related fixes 
too which I've merged in).

It may also be interesting for gcc developers to know that Yocto Project
is running the gcc test suite for each of the major architectures we target
under qemu (some in system, some in user mode) and collecting test results
with a view to trying to ensure we don't regress over time. As time
permits, we hope to track down failures and improve the pass rates too.

A report from our recent release is here:

http://downloads.yoctoproject.org/releases/yocto/yocto-3.4/testreport.txt

but is large so I placed some examples here:

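==============================================================
qemuppc PTest Result Summary
==============================================================
--------------------------------------------------------------
Recipe                 | Passed   | Failed   | Skipped
--------------------------------------------------------------
binutils               | 221      | 2        | 13
binutils-gas           | 343      | 2        | 3
binutils-ld            | 1386     | 7        | 352
gcc-g++-user           | 195119   | 37       | 9389
gcc-libatomic-user     | 44       | 0        | 5
gcc-libgomp-user       | 2788     | 2        | 370
gcc-libitm-user        | 46       | 0        | 2
gcc-libstdc++-v3-user  | 14152    | 7        | 723
gcc-user               | 134355   | 200      | 4104
glibc-user             | 3869     | 210      | 76
--------------------------------------------------------------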

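==============================================================
qemux86 PTest Result Summary
==============================================================
--------------------------------------------------------------
Recipe                 | Passed   | Failed   | Skipped
--------------------------------------------------------------
binutils               | 230      | 2        | 13
binutils-gas           | 1485     | 0        | 1
binutils-ld            | 1641     | 7        | 354
gcc                    | 124619   | 132      | 26160
gcc-g++                | 186472   | 56       | 19103
gcc-libatomic          | 22       | 1        | 27
gcc-libgomp            | 1427     | 2        | 1671
gcc-libitm             | 24       | 1        | 24
gcc-libstdc++-v3       | 9102     | 33       | 5216
glibc                  | 4230     | 203      | 40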

Cheers,

Richard

Richard Purdie (5):
  Makefile.in: Ensure build CPP/CPPFLAGS is used for build targets
  gcc: Fix "argument list too long" from install-plugins
  gcc: Add --nostdlib++ option
  gcc/nios2: Define the musl linker
  gcc: Pass sysroot options to cpp for preprocessed source

 Makefile.in  | 6 ++
 Makefile.tpl | 6 ++
 configure| 4 
 configure.ac | 4 
 gcc/Makefile.in  | 2 +-
 gcc/c-family/c.opt   | 4 
 gcc/config/nios2/linux.h | 1 +
 gcc/configure| 2 +-
 gcc/configure.ac | 2 +-
 gcc/cp/g++spec.c | 1 +
 gcc/cp/lang-specs.h  | 2 +-
 gcc/doc/invoke.texi  | 8 +++-
 gcc/gcc.c| 3 ++-
 13 files changed, 39 insertions(+), 6 deletions(-)

-- 
2.25.1



[PATCH 5/5] gcc: Pass sysroot options to cpp for preprocessed source

2021-10-27 Thread Richard Purdie via Gcc-patches
OpenEmbedded/Yocto Project extensively uses the --sysroot support within gcc.
We discovered that when compiling preprocessed source (.i or .ii files), the
compiler will try to access the builtin sysroot location rather than the one
given by the --sysroot option on the command line. If that directory is
unreadable (permission denied), gcc will error. This is particularly problematic
when ccache is involved.

This patch adds %I to the cpp-output spec macro so the default substitutions for
-iprefix, -isystem, -isysroot happen and the correct sysroot is used.

2021-10-27 Richard Purdie 

gcc/cp/ChangeLog:

* lang-specs.h: Pass sysroot options to cpp for preprocessed source

gcc/ChangeLog:

* gcc.c: Pass sysroot options to cpp for preprocessed source

Signed-off-by: Richard Purdie 
---
 gcc/cp/lang-specs.h | 2 +-
 gcc/gcc.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/lang-specs.h b/gcc/cp/lang-specs.h
index 8902ae1d2ed..e99e2fcd6ad 100644
--- a/gcc/cp/lang-specs.h
+++ b/gcc/cp/lang-specs.h
@@ -116,7 +116,7 @@ along with GCC; see the file COPYING3.  If not see
   {".ii", "@c++-cpp-output", 0, 0, 0},
   {"@c++-cpp-output",
   "%{!E:%{!M:%{!MM:"
-  "  cc1plus -fpreprocessed %i %(cc1_options) %2"
+  "  cc1plus -fpreprocessed %i %I %(cc1_options) %2"
   "  %{!fsyntax-only:"
   "%{fmodule-only:%{!S:-o %g.s%V}}"
   "%{!fmodule-only:%{!fmodule-header*:%(invoke_as)}}}"
diff --git a/gcc/gcc.c b/gcc/gcc.c
index abb900a4247..51176becb86 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -1472,7 +1472,7 @@ static const struct compiler default_compilers[] =
   %W{o*:--output-pch=%*}}%V}}}", 
0, 0, 0},
   {".i", "@cpp-output", 0, 0, 0},
   {"@cpp-output",
-   "%{!M:%{!MM:%{!E:cc1 -fpreprocessed %i %(cc1_options) %{!fsyntax-only:%(invoke_as)", 0, 0, 0},
+   "%{!M:%{!MM:%{!E:cc1 -fpreprocessed %i %I %(cc1_options) %{!fsyntax-only:%(invoke_as)", 0, 0, 0},
   {".s", "@assembler", 0, 0, 0},
   {"@assembler",
"%{!M:%{!MM:%{!E:%{!S:as %(asm_debug) %(asm_options) %i %A ", 0, 0, 0},
-- 
2.25.1



[PATCH 3/5] gcc: Add --nostdlib++ option

2021-10-27 Thread Richard Purdie via Gcc-patches
OpenEmbedded/Yocto Project builds libgcc and the other gcc runtime libraries
separately from the compiler and slightly differently to the standard gcc build.

In general this works well, but in trying to build them separately we run into
an issue since we're using our gcc, not xgcc, and there is no way to tell
configure to use libgcc but not look for libstdc++.

This adds such an option allowing such configurations to work.

2021-10-26 Richard Purdie 

gcc/c-family/ChangeLog:

* c.opt: Add --nostdlib++ option

gcc/cp/ChangeLog:

* g++spec.c (lang_specific_driver): Add --nostdlib++ option

gcc/ChangeLog:

* doc/invoke.texi: Document --nostdlib++ option
* gcc.c: Add --nostdlib++ option

Signed-off-by: Richard Purdie 
---
 gcc/c-family/c.opt  | 4 
 gcc/cp/g++spec.c| 1 +
 gcc/doc/invoke.texi | 8 +++-
 gcc/gcc.c   | 1 +
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 06457ac739e..ee742c831fd 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2190,6 +2190,10 @@ nostdinc++
 C++ ObjC++
 Do not search standard system include directories for C++.
 
+nostdlib++
+Driver
+Do not link standard C++ runtime library
+
 o
 C ObjC C++ ObjC++ Joined Separate
 ; Documented in common.opt
diff --git a/gcc/cp/g++spec.c b/gcc/cp/g++spec.c
index 3c9bd1490b4..818beb61cee 100644
--- a/gcc/cp/g++spec.c
+++ b/gcc/cp/g++spec.c
@@ -159,6 +159,7 @@ lang_specific_driver (struct cl_decoded_option **in_decoded_options,
   switch (decoded_options[i].opt_index)
{
case OPT_nostdlib:
+   case OPT_nostdlib__:
case OPT_nodefaultlibs:
  library = -1;
  break;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 71992b8c597..d89b08b3080 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -239,6 +239,7 @@ in the following sections.
 -fno-weak  -nostdinc++ @gol
 -fvisibility-inlines-hidden @gol
 -fvisibility-ms-compat @gol
+-nostdlib++ @gol
 -fext-numeric-literals @gol
 -flang-info-include-translate@r{[}=@var{header}@r{]} @gol
 -flang-info-include-translate-not @gol
@@ -638,7 +639,7 @@ Objective-C and Objective-C++ Dialects}.
 -pie  -pthread  -r  -rdynamic @gol
 -s  -static  -static-pie  -static-libgcc  -static-libstdc++ @gol
 -static-libasan  -static-libtsan  -static-liblsan  -static-libubsan @gol
--shared  -shared-libgcc  -symbolic @gol
+-shared  -shared-libgcc  -symbolic -nostdlib++ @gol
 -T @var{script}  -Wl,@var{option}  -Xlinker @var{option} @gol
 -u @var{symbol}  -z @var{keyword}}
 
@@ -16134,6 +16135,11 @@ Specify that the program entry point is @var{entry}.  The argument is
 interpreted by the linker; the GNU linker accepts either a symbol name
 or an address.
 
+@item -nostdlib++
+@opindex nostdlib++
+Do not use the standard system C++ runtime libraries when linking.
+Only the libraries you specify will be passed to the linker.
+
 @item -pie
 @opindex pie
 Produce a dynamically linked position independent executable on targets
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 506c2acc282..abb900a4247 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -1167,6 +1167,7 @@ proper position among the other output files.  */
 %(mflib) " STACK_SPLIT_SPEC "\
 %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} " SANITIZER_SPEC " \
 %{!nostdlib:%{!r:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}}\
+%{!nostdlib++:}\
 %{!nostdlib:%{!r:%{!nostartfiles:%E}}} %{T*}  \n%(post_link) }}"
 #endif
 
-- 
2.25.1



[PATCH 4/5] gcc/nios2: Define the musl linker

2021-10-27 Thread Richard Purdie via Gcc-patches
Add a definition of the musl linker used on the nios2 platform.

2021-10-26 Richard Purdie 

gcc/ChangeLog:

* config/nios2/linux.h (MUSL_DYNAMIC_LINKER): Add musl linker

Signed-off-by: Richard Purdie 
---
 gcc/config/nios2/linux.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/nios2/linux.h b/gcc/config/nios2/linux.h
index 08edf1521f6..15696d86241 100644
--- a/gcc/config/nios2/linux.h
+++ b/gcc/config/nios2/linux.h
@@ -30,6 +30,7 @@
 #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{pthread:-D_REENTRANT}"
 
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-nios2.so.1"
+#define MUSL_DYNAMIC_LINKER  "/lib/ld-musl-nios2.so.1"
 
 #undef LINK_SPEC
 #define LINK_SPEC LINK_SPEC_ENDIAN \
-- 
2.25.1



[PATCH 2/5] gcc: Fix "argument list too long" from install-plugins

2021-10-27 Thread Richard Purdie via Gcc-patches
When building in longer build paths (200+ characters), the
"echo $(PLUGIN_HEADERS)" from the install-plugins target would cause an
"argument list too long" error on some systems.

Avoid this by calling make's sort function on the list, which removes
duplicates and stops the overflow from reaching the echo command.
The original sort is left to handle the .h and .def files.

2021-10-26 Richard Purdie 

gcc/ChangeLog:

* Makefile.in: Fix "argument list too long" from install-plugins

Signed-off-by: Richard Purdie 
---
 gcc/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 658093c11c0..89482c6dd4e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3685,7 +3685,7 @@ install-plugin: installdirs lang.install-plugin s-header-vars install-gengtype
 # We keep the directory structure for files in config, common/config or
 # c-family and .def files. All other files are flattened to a single directory.
$(mkinstalldirs) $(DESTDIR)$(plugin_includedir)
-   headers=`echo $(PLUGIN_HEADERS) $$(cd $(srcdir); echo *.h *.def) | tr ' ' '\012' | sort -u`; \
+   headers=`echo $(sort $(PLUGIN_HEADERS)) $$(cd $(srcdir); echo *.h *.def) | tr ' ' '\012' | sort -u`; \
srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*|]/&/g'`; \
for file in $$headers; do \
  if [ -f $$file ] ; then \
-- 
2.25.1



Re: [PATCH] libcody: add mostlyclean Makefile target

2021-10-27 Thread Eric Gallager via Gcc-patches
On Tue, Oct 26, 2021 at 5:47 AM Martin Liška  wrote:
>
> On 10/25/21 18:10, Eric Gallager wrote:
> > On Mon, Oct 25, 2021 at 7:35 AM Martin Liška  wrote:
> >>
> >> Hello.
> >>
> >> The patch adds missing Makefile mostlyclean.
> >>
> >> Ready to be installed?
> >> Thanks,
> >> Martin
> >>
> >
> > Generally the way the various "*clean" targets are arranged, in order
> > of cleanliness, from least clean to most clean, is:
> > mostlyclean
> > clean
> > distclean
> > maintainer-clean
> > ...with each target depending on the previous one in the order. So
> > thus, instead of mostlyclean depending on clean, it'd be the other way
> > around, with clean depending on mostlyclean. See how the gcc/
> > subdirectory does it, for example. See the "Standard Targets for
> > Users" section of the GNU Coding Standards:
> > https://www.gnu.org/prep/standards/html_node/Standard-Targets.html#Standard-Targets
>
> Thank you for the explanation.
>
> There's updated version of the patch.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

Hi, the patch looks ok to me, although I can't approve it... Who is
the libcody maintainer, anyways? Nathan? Should that be listed in the
MAINTAINERS file?
Thanks,
Eric Gallager

> Thanks,
> Martin
>
> >
> >>  PR other/102657
> >>
> >> libcody/ChangeLog:
> >>
> >>  * Makefile.in: Add mostlyclean Makefile target.
> >> ---
> >>libcody/Makefile.in | 4 +++-
> >>1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/libcody/Makefile.in b/libcody/Makefile.in
> >> index b8b45a2e310..d8f1e8216d4 100644
> >> --- a/libcody/Makefile.in
> >> +++ b/libcody/Makefile.in
> >> @@ -111,7 +111,7 @@ maintainer-clean:: distclean
> >>clean::
> >>  rm -f $(shell find $(srcdir) -name '*~')
> >>
> >> -.PHONY: all check clean distclean maintainer-clean
> >> +.PHONY: all check clean distclean maintainer-clean mostlyclean
> >>
> >>CXXFLAGS/ := -I$(srcdir)
> >>LIBCODY.O := buffer.o client.o fatal.o netclient.o netserver.o \
> >> @@ -127,6 +127,8 @@ clean::
> >>  rm -f $(LIBCODY.O) $(LIBCODY.O:.o=.d)
> >>  rm -f libcody.a
> >>
> >> +mostlyclean: clean
> >> +
> >>CXXFLAGS/fatal.cc = -DSRCDIR='"$(srcdir)"'
> >>
> >>fatal.o: Makefile revision
> >> --
> >> 2.33.1
> >>


[PATCH] PR fortran/69419 - ICE: tree check: expected array_type, have real_type in gfc_conv_array_initializer, at fortran/trans-array.c:5618

2021-10-27 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

when debugging the testcase, I noticed that a coarray declaration in
a COMMON statement wrongly set the dimension attribute instead of the
codimension.  As a consequence, subsequent checks that catch this
invalid situation would not trigger.

I see two possible solutions:

- in gfc_match_common, replace

  /* Deal with an optional array specification after the
 symbol name.  */
  m = gfc_match_array_spec (&as, true, true);

  by

  m = gfc_match_array_spec (&as, true, false);

  which in turn would lead to a syntax error.  Interestingly, the Intel
  compiler also takes this route and gives a syntax error.

- check the resulting as->corank and emit an error as in the attached
  patch.

The attached patch regtests fine on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

Fortran: a symbol in a COMMON cannot be a coarray

gcc/fortran/ChangeLog:

	PR fortran/69419
	* match.c (gfc_match_common): Check array spec of a symbol in a
	COMMON object list and reject it if it is a coarray.

gcc/testsuite/ChangeLog:

	PR fortran/69419
	* gfortran.dg/pr69419.f90: New test.

diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
index 53a575e616e..df97620634d 100644
--- a/gcc/fortran/match.c
+++ b/gcc/fortran/match.c
@@ -5314,6 +5314,13 @@ gfc_match_common (void)
 		  goto cleanup;
 		}

+	  if (as->corank)
+		{
+		  gfc_error ("Symbol %qs in COMMON at %C cannot be a "
+			 "coarray", sym->name);
+		  goto cleanup;
+		}
+
 	  if (!gfc_add_dimension (&sym->attr, sym->name, NULL))
 		goto cleanup;

diff --git a/gcc/testsuite/gfortran.dg/pr69419.f90 b/gcc/testsuite/gfortran.dg/pr69419.f90
new file mode 100644
index 000..7329808611c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr69419.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! { dg-options "-fcoarray=lib" }
+! PR fortran/69419 - ICE on invalid coarray in common
+
+blockdata b
+   real x   ! { dg-error "must be in COMMON" }
+   common /c/ x[*]  ! { dg-error "cannot be a coarray" }
+   data x /1.0/
+end


Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-27 Thread Patrick Palka via Gcc-patches
On Tue, 26 Oct 2021, Jakub Jelinek wrote:

> On Tue, Oct 26, 2021 at 05:07:43PM -0400, Patrick Palka wrote:
> > The performance impact of the other calls to cxx_eval_outermost_const_expr
> > from p_c_e_1 is probably already mostly mitigated by the constexpr call
> > cache and the fact that we try to evaluate all calls to constexpr
> > functions during cp_fold_function anyway (at least with -O).  So trial
> 
> constexpr function bodies don't go through cp_fold_function (intentionally,
> so that we don't optimize away UB), the bodies are copied before the trees of
> the normal copy are folded.

Ah right, I had forgotten about that..

Here's another approach that doesn't need to remove trial evaluation for
&&/||.  The idea is to first quietly check if the second operand is
potentially constant _before_ performing trial evaluation of the first
operand.  This speeds up the case we care about (both operands are
potentially constant) without regressing any diagnostics.  We have to be
careful about emitting bogus diagnostics when tf_error is set, hence the
first hunk below which makes p_c_e_1 always proceed quietly first, and
replay noisily in case of error (similar to how satisfaction works).
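
The "quiet first, replay noisily" idea from the first hunk can be sketched
outside of GCC as follows.  This is only a toy model under stated
assumptions: struct expr, check_1, check and diagnostics_emitted are
hypothetical names and are not part of constexpr.c; a leaf simply records
whether it is potentially constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy expression: a LEAF is either potentially constant
   or not; an AND node has two operands.  */
enum kind { LEAF, AND };

struct expr
{
  enum kind kind;
  bool ok;                       /* LEAF only.  */
  const struct expr *op0, *op1;  /* AND only.  */
};

/* Counts diagnostics so the behavior can be observed.  */
int diagnostics_emitted;

/* Recursive worker: emits a diagnostic only when COMPLAIN is set.  */
static bool
check_1 (const struct expr *e, bool complain)
{
  assert (e != NULL);
  if (e->kind == LEAF)
    {
      if (!e->ok && complain)
        {
          fprintf (stderr, "error: operand is not constant\n");
          diagnostics_emitted++;
        }
      return e->ok;
    }
  return check_1 (e->op0, complain) && check_1 (e->op1, complain);
}

/* Entry point: always try quietly first, and replay with diagnostics
   enabled only if the quiet pass failed and the caller asked for them.  */
bool
check (const struct expr *e, bool complain)
{
  if (check_1 (e, false))
    return true;
  if (complain)
    check_1 (e, true);
  return false;
}
```

With this shape, the worker is free to probe operands speculatively in the
quiet pass without any risk of emitting a bogus diagnostic; errors only
appear during the replay, once failure is already known.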

Would something like this be preferable?

-- >8 --

gcc/cp/ChangeLog:

* constexpr.c (potential_constant_expression_1): When tf_error is
set, proceed quietly first and return true if successful.
: When tf_error is not set, check potentiality
of the second operand before performing trial evaluation of the
first operand rather than after.


diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 6f83d303cdd..821bd41d994 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -8056,6 +8056,14 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
 #define RECUR(T,RV) \
   potential_constant_expression_1 ((T), (RV), strict, now, flags, jump_target)
 
+  if (flags & tf_error)
+{
+  flags &= ~tf_error;
+  if (RECUR (t, want_rval))
+   return true;
+  flags |= tf_error;
+}
+
   enum { any = false, rval = true };
   int i;
   tree tmp;
@@ -8892,13 +8900,16 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
   tmp = boolean_false_node;
 truth:
   {
-   tree op = TREE_OPERAND (t, 0);
-   if (!RECUR (op, rval))
+   tree op0 = TREE_OPERAND (t, 0);
+   tree op1 = TREE_OPERAND (t, 1);
+   if (!RECUR (op0, rval))
  return false;
+   if (!(flags & tf_error) && RECUR (op1, rval))
+ return true;
if (!processing_template_decl)
- op = cxx_eval_outermost_constant_expr (op, true);
-   if (tree_int_cst_equal (op, tmp))
- return RECUR (TREE_OPERAND (t, 1), rval);
+ op0 = cxx_eval_outermost_constant_expr (op0, true);
+   if (tree_int_cst_equal (op0, tmp))
+ return (flags & tf_error) ? RECUR (op1, rval) : false;
else
  return true;
   }



Re: [PATCH] rs6000: Fix bootstrap (libffi)

2021-10-27 Thread H.J. Lu via Gcc-patches
On Mon, Oct 25, 2021 at 4:39 PM Segher Boessenkool
 wrote:
>
> This fixes bootstrap for the current problems building libffi.
>
> I'll work on getting this into upstream as well.  If the maintainers
> want it done differently, at least we have bootstrap working again
> until then.
>
> Tested on powerpc64-linux {-m32,-m64}.
>
>
> Segher
>
>
> 2021-10-25  Segher Boessenkool  
>
> libffi/
> * src/powerpc/linux64.S: Enable AltiVec insns.
> * src/powerpc/linux64_closure.S: Ditto.
> ---
>  libffi/src/powerpc/linux64.S | 2 ++
>  libffi/src/powerpc/linux64_closure.S | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/libffi/src/powerpc/linux64.S b/libffi/src/powerpc/linux64.S
> index e92d64af34fd..1f876ea39edd 100644
> --- a/libffi/src/powerpc/linux64.S
> +++ b/libffi/src/powerpc/linux64.S
> @@ -29,6 +29,8 @@
>  #include 
>  #include 
>
> +   .machine altivec
> +
>  #ifdef POWERPC64
> .hidden ffi_call_LINUX64
> .globl  ffi_call_LINUX64
> diff --git a/libffi/src/powerpc/linux64_closure.S b/libffi/src/powerpc/linux64_closure.S
> index 3469a2cbb01e..199981db3307 100644
> --- a/libffi/src/powerpc/linux64_closure.S
> +++ b/libffi/src/powerpc/linux64_closure.S
> @@ -30,6 +30,8 @@
>
> .file   "linux64_closure.S"
>
> +   .machine altivec
> +
>  #ifdef POWERPC64
> FFI_HIDDEN (ffi_closure_LINUX64)
> .globl  ffi_closure_LINUX64
> --
> 1.8.3.1
>

I am checking in this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582717.html

-- 
H.J.


[PATCH] libffi: Update LOCAL_PATCHES

2021-10-27 Thread H.J. Lu via Gcc-patches
Add

commit 90205f67e465ae7dfcf733c2b2b177ca7ff68da0
Author: Segher Boessenkool 
Date:   Mon Oct 25 23:29:26 2021 +

rs6000: Fix bootstrap (libffi)

This fixes bootstrap for the current problems building libffi.

to LOCAL_PATCHES.

* LOCAL_PATCHES: Add commit 90205f67e46.
---
 libffi/LOCAL_PATCHES | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libffi/LOCAL_PATCHES b/libffi/LOCAL_PATCHES
index a377c28ce8d..f9e74660950 100644
--- a/libffi/LOCAL_PATCHES
+++ b/libffi/LOCAL_PATCHES
@@ -1,2 +1,3 @@
 5be7b66998127286fada45e4f23bd8a2056d553e
 4824ed41ba7cd63e60fd9f8769a58b79935a90d1
+90205f67e465ae7dfcf733c2b2b177ca7ff68da0
-- 
2.32.0



[pushed] Darwin, config: Amend for Darwin 21 / macOS 12.

2021-10-27 Thread Iain Sandoe via Gcc-patches
From: Saagar Jha 

Patch from the Arm64 Darwin branch, originally by Saagar Jha.

It seems that the OS major version is now tracking the kernel
major version - 9.  Minor version has been set to kernel
minor - 1 as for Darwin20.
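
As a rough illustration of the mapping described above (not the code in
darwin-driver.c), the arithmetic could be written as below.  The struct and
function names are hypothetical; the pre-20 branch follows the existing
"darwin_maj - 4" rule in config.gcc.

```c
#include <assert.h>

/* Hypothetical result type for this sketch.  */
struct macos_version
{
  int major;
  int minor;
};

/* Map a Darwin kernel version to a macOS version.
   Kernel 20 and later: major = kernel major - 9, minor = kernel minor - 1
   (clamped at 0).  Kernels 4..19: macOS 10.(kernel major - 4).  */
struct macos_version
macos_from_darwin_kernel (int kernel_major, int kernel_minor)
{
  struct macos_version v;
  assert (kernel_major >= 4 && kernel_minor >= 0);
  if (kernel_major >= 20)
    {
      v.major = kernel_major - 9;
      v.minor = kernel_minor > 0 ? kernel_minor - 1 : 0;
    }
  else
    {
      v.major = 10;
      v.minor = kernel_major - 4;
    }
  return v;
}
```

For example, kernel 20.3 maps to macOS 11.2, matching the comment in the
patch, and kernel 21.0 maps to macOS 12.0.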

Tested on x86-64-darwin21, darwin20, darwin19, i686-darwin9,
x86_64-linux-gnu. Pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 
Signed-off-by: Saagar Jha 

gcc/ChangeLog:

* config.gcc: Adjust for Darwin21.
* config/darwin-c.c (macosx_version_as_macro): Likewise.
* config/darwin-driver.c (validate_macosx_version_min):
Likewise.
(darwin_find_version_from_kernel): Likewise.
---
 gcc/config.gcc |  6 +++---
 gcc/config/darwin-c.c  |  2 +-
 gcc/config/darwin-driver.c | 10 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index efd1f42ac23..b1082cdbab1 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -681,9 +681,9 @@ case ${target} in
   *-*-darwin[4-9]* | *-*-darwin1[0-9]*)
 macos_min=`expr $darwin_maj - 4`
 ;;
-  *-*-darwin20*)
-# Darwin 20 corresponds to macOS 11.
-macos_maj=11
+  *-*-darwin2*)
+# Darwin 20 corresponds to macOS 11, Darwin 21 to macOS 12.
+macos_maj=`expr $darwin_maj - 9`
 def_ld64=609.0
 ;;
   *-*-darwin)
diff --git a/gcc/config/darwin-c.c b/gcc/config/darwin-c.c
index 951a998775f..62d28fcea50 100644
--- a/gcc/config/darwin-c.c
+++ b/gcc/config/darwin-c.c
@@ -691,7 +691,7 @@ macosx_version_as_macro (void)
   if (!version_array)
 goto fail;
 
-  if (version_array[MAJOR] < 10 || version_array[MAJOR] > 11)
+  if (version_array[MAJOR] < 10 || version_array[MAJOR] > 12)
 goto fail;
 
   if (version_array[MAJOR] == 10 && version_array[MINOR] < 10)
diff --git a/gcc/config/darwin-driver.c b/gcc/config/darwin-driver.c
index a036e091c48..4f0c6bad61f 100644
--- a/gcc/config/darwin-driver.c
+++ b/gcc/config/darwin-driver.c
@@ -64,17 +64,17 @@ validate_macosx_version_min (const char *version_str)
 
   major = strtoul (version_str, &end, 10);
 
-  if (major < 10 || major > 11 ) /* MacOS 10 and 11 are known. */
+  if (major < 10 || major > 12 ) /* macOS 10, 11, and 12 are known. */
 return NULL;
 
   /* Skip a separating period, if there's one.  */
   version_str = end + ((*end == '.') ? 1 : 0);
 
-  if (major == 11 && *end != '\0' && !ISDIGIT (version_str[0]))
- /* For MacOS 11, we allow just the major number, but if the minor is
+  if (major > 10 && *end != '\0' && !ISDIGIT (version_str[0]))
+ /* For macOS 11+, we allow just the major number, but if the minor is
there it must be numeric.  */
 return NULL;
-  else if (major == 11 && *end == '\0')
+  else if (major > 10 && *end == '\0')
 /* We will rewrite 11 =>  11.0.0.  */
 need_rewrite = true;
   else if (major == 10 && (*end == '\0' || !ISDIGIT (version_str[0])))
@@ -172,7 +172,7 @@ darwin_find_version_from_kernel (void)
   if (minor_vers > 0)
minor_vers -= 1; /* Kernel 20.3 => macOS 11.2.  */
   /* It's not yet clear whether patch level will be considered.  */
-  asprintf (&new_flag, "11.%02d.00", minor_vers);
+  asprintf (&new_flag, "%d.%02d.00", major_vers - 9, minor_vers);
 }
   else if (major_vers - 4 <= 4)
 /* On 10.4 and earlier, the old linker is used which does not
-- 
2.24.3 (Apple Git-128)



[committed] hppa: Fix warnings building linux-atomic.c and fptr.c on hppa64-linux

2021-10-27 Thread John David Anglin

This change fixes a couple of warnings observed building libgcc on hppa64-linux.

The hppa64-linux target uses OPDs and doesn't require any special code to
canonicalize function pointers for comparison.  I removed inclusion of
pa/t-linux from tmake_file and adjusted pa/t-linux64 to fix this issue.

I defined types u8, u16 and u64 in linux-atomic.c to fix the type mismatch
warning in that file.

Tested on hppa-unknown-linux-gnu and hppa64-unknown-linux-gnu.

Committed to active branches.

Dave
---
Fix warnings building linux-atomic.c and fptr.c on hppa64-linux

The file fptr.c is specific to 32-bit hppa-linux and should not be
included in LIB2ADD on hppa64-linux.

There is a builtin type mismatch in linux-atomic.c using the type
long long unsigned int for 64-bit atomic operations on hppa64-linux.

2021-10-27  John David Anglin  

libgcc/ChangeLog:

* config.host (hppa*64*-*-linux*): Don't add pa/t-linux to
tmake_file.
* config/pa/linux-atomic.c: Define u8, u16 and u64 types.
Use them in FETCH_AND_OP_2, OP_AND_FETCH_2, COMPARE_AND_SWAP_2,
SYNC_LOCK_TEST_AND_SET_2 and SYNC_LOCK_RELEASE_1 macros.
* config/pa/t-linux64 (LIB1ASMSRC): New define.
(LIB1ASMFUNCS): Revise.
(HOST_LIBGCC2_CFLAGS): Add "-DLINUX=1".

diff --git a/libgcc/config.host b/libgcc/config.host
index 6c34b13d611..85de83da766 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -619,7 +619,7 @@ h8300-*-linux*)
tm_file="$tm_file h8300/h8300-lib.h"
;;
 hppa*64*-*-linux*)
-   tmake_file="$tmake_file pa/t-linux pa/t-linux64"
+   tmake_file="$tmake_file pa/t-linux64"
extra_parts="crtbegin.o crtbeginS.o crtbeginT.o crtend.o crtendS.o"
;;
 hppa*-*-linux*)
diff --git a/libgcc/config/pa/linux-atomic.c b/libgcc/config/pa/linux-atomic.c
index c882b55a127..500a3652499 100644
--- a/libgcc/config/pa/linux-atomic.c
+++ b/libgcc/config/pa/linux-atomic.c
@@ -28,6 +28,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define EBUSY   16
 #define ENOSYS 251

+typedef unsigned char u8;
+typedef short unsigned int u16;
+#ifdef __LP64__
+typedef long unsigned int u64;
+#else
+typedef long long unsigned int u64;
+#endif
+
 /* PA-RISC 2.0 supports out-of-order execution for loads and stores.
Thus, we need to synchonize memory accesses.  For more info, see:
"Advanced Performance Features of the 64-bit PA-8000" by Doug Hunt.
@@ -117,26 +125,26 @@ __kernel_cmpxchg2 (volatile void *mem, const void 
*oldval, const void *newval,
 return tmp;
\
   }

-FETCH_AND_OP_2 (add,   , +, long long unsigned int, 8, 3)
-FETCH_AND_OP_2 (sub,   , -, long long unsigned int, 8, 3)
-FETCH_AND_OP_2 (or,, |, long long unsigned int, 8, 3)
-FETCH_AND_OP_2 (and,   , &, long long unsigned int, 8, 3)
-FETCH_AND_OP_2 (xor,   , ^, long long unsigned int, 8, 3)
-FETCH_AND_OP_2 (nand, ~, &, long long unsigned int, 8, 3)
-
-FETCH_AND_OP_2 (add,   , +, short unsigned int, 2, 1)
-FETCH_AND_OP_2 (sub,   , -, short unsigned int, 2, 1)
-FETCH_AND_OP_2 (or,, |, short unsigned int, 2, 1)
-FETCH_AND_OP_2 (and,   , &, short unsigned int, 2, 1)
-FETCH_AND_OP_2 (xor,   , ^, short unsigned int, 2, 1)
-FETCH_AND_OP_2 (nand, ~, &, short unsigned int, 2, 1)
-
-FETCH_AND_OP_2 (add,   , +, unsigned char, 1, 0)
-FETCH_AND_OP_2 (sub,   , -, unsigned char, 1, 0)
-FETCH_AND_OP_2 (or,, |, unsigned char, 1, 0)
-FETCH_AND_OP_2 (and,   , &, unsigned char, 1, 0)
-FETCH_AND_OP_2 (xor,   , ^, unsigned char, 1, 0)
-FETCH_AND_OP_2 (nand, ~, &, unsigned char, 1, 0)
+FETCH_AND_OP_2 (add,   , +, u64, 8, 3)
+FETCH_AND_OP_2 (sub,   , -, u64, 8, 3)
+FETCH_AND_OP_2 (or,, |, u64, 8, 3)
+FETCH_AND_OP_2 (and,   , &, u64, 8, 3)
+FETCH_AND_OP_2 (xor,   , ^, u64, 8, 3)
+FETCH_AND_OP_2 (nand, ~, &, u64, 8, 3)
+
+FETCH_AND_OP_2 (add,   , +, u16, 2, 1)
+FETCH_AND_OP_2 (sub,   , -, u16, 2, 1)
+FETCH_AND_OP_2 (or,, |, u16, 2, 1)
+FETCH_AND_OP_2 (and,   , &, u16, 2, 1)
+FETCH_AND_OP_2 (xor,   , ^, u16, 2, 1)
+FETCH_AND_OP_2 (nand, ~, &, u16, 2, 1)
+
+FETCH_AND_OP_2 (add,   , +, u8, 1, 0)
+FETCH_AND_OP_2 (sub,   , -, u8, 1, 0)
+FETCH_AND_OP_2 (or,, |, u8, 1, 0)
+FETCH_AND_OP_2 (and,   , &, u8, 1, 0)
+FETCH_AND_OP_2 (xor,   , ^, u8, 1, 0)
+FETCH_AND_OP_2 (nand, ~, &, u8, 1, 0)

 #define OP_AND_FETCH_2(OP, PFX_OP, INF_OP, TYPE, WIDTH, INDEX) \
   TYPE HIDDEN  \
@@ -154,26 +162,26 @@ FETCH_AND_OP_2 (nand, ~, &, unsigned char, 1, 0)
 return PFX_OP (tmp INF_OP val);\
   }

-OP_AND_FETCH_2 (add,   , +, long long unsigned int, 8, 3)
-OP_AND_FETCH_2 (sub,   , -, long long unsigned int, 8, 3)
-OP_AND_FETCH_2 (or,, |, long long unsigned int, 8, 3)
-OP_AND_FETCH_2 (and,   , &, long long unsigned int, 8, 3)
-OP_AND_FETCH_2 (xor,   , ^, long long unsigned int, 8, 3)
-OP_AND_FETCH_2 (nand, ~, &, long long unsigned in

[COMMITTED] Kill second order relations in the path solver.

2021-10-27 Thread Aldy Hernandez via Gcc-patches
My upcoming work replacing the VRP threaders with a fully resolving
backward threader has tripped over various corner cases in the path
sensitive relation oracle.  This patch kills second order relations when
we kill a relation.

Tested on x86-64 and ppc64le Linux.

Co-authored-by: Andrew MacLeod 

gcc/ChangeLog:

* value-relation.cc (path_oracle::killing_def): Kill second
order relations.
---
 gcc/value-relation.cc | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 2acf375ca9a..0ad4f7a9495 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -1297,8 +1297,9 @@ path_oracle::killing_def (tree ssa)
   fprintf (dump_file, "\n");
 }
 
+  unsigned v = SSA_NAME_VERSION (ssa);
   bitmap b = BITMAP_ALLOC (&m_bitmaps);
-  bitmap_set_bit (b, SSA_NAME_VERSION (ssa));
+  bitmap_set_bit (b, v);
   equiv_chain *ptr = (equiv_chain *) obstack_alloc (&m_chain_obstack,
sizeof (equiv_chain));
   ptr->m_names = b;
@@ -1306,6 +1307,24 @@ path_oracle::killing_def (tree ssa)
   ptr->m_next = m_equiv.m_next;
   m_equiv.m_next = ptr;
   bitmap_ior_into (m_equiv.m_names, b);
+
+  // Walk the relation list an remove SSA from any relations.
+  if (!bitmap_bit_p (m_relations.m_names, v))
+return;
+
+  bitmap_clear_bit (m_relations.m_names, v);
+  relation_chain **prev = &(m_relations.m_head);
+  relation_chain *next = NULL;
+  for (relation_chain *ptr = m_relations.m_head; ptr; ptr = next)
+{
+  gcc_checking_assert (*prev == ptr);
+  next = ptr->m_next;
+  if (SSA_NAME_VERSION (ptr->op1 ()) == v
+ || SSA_NAME_VERSION (ptr->op2 ()) == v)
+   *prev = ptr->m_next;
+  else
+   prev = &(ptr->m_next);
+}
 }
 
 // Register relation K between SSA1 and SSA2, resolving unknowns by
-- 
2.31.1
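
For readers unfamiliar with the idiom, the relation-removal loop above unlinks nodes from a singly linked list through a pointer to the previous link. The following is only a minimal standalone sketch of that idiom in plain C; `struct relation` and `kill_relations` are hypothetical stand-ins and not the actual GCC types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for relation_chain: one relation between two
   SSA version numbers.  */
struct relation
{
  unsigned op1;
  unsigned op2;
  struct relation *next;
};

/* Unlink every relation that mentions version V and return how many
   were unlinked.  PREV always points at the link that leads to the
   current node, so removing the head needs no special case.  */
static unsigned
kill_relations (struct relation **head, unsigned v)
{
  assert (head != NULL);
  unsigned killed = 0;
  struct relation **prev = head;
  struct relation *next = NULL;
  for (struct relation *ptr = *head; ptr; ptr = next)
    {
      next = ptr->next;
      if (ptr->op1 == v || ptr->op2 == v)
        {
          /* Bypass PTR; PREV stays where it is.  */
          *prev = next;
          killed++;
        }
      else
        /* Keep PTR and advance the link.  */
        prev = &ptr->next;
    }
  return killed;
}
```

Because only the link is rewritten, consecutive matching nodes are handled without any extra bookkeeping, which is presumably why the patch asserts `*prev == ptr` on every iteration.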



[COMMITTED] Kill known equivalences before a new assignment in the path solver.

2021-10-27 Thread Aldy Hernandez via Gcc-patches
Every time we have a killing statement, we must also kill the relations
seen so far.  This is similar to what we did for the equivs inherent in
PHIs along a path.

Tested on x86-64 and ppc64le Linux.

gcc/ChangeLog:

* gimple-range-path.cc
  (path_range_query::range_defined_in_block): Call
  killing_def.
---
 gcc/gimple-range-path.cc | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 2f570a13e05..d8c2a9b6a86 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -288,8 +288,14 @@ path_range_query::range_defined_in_block (irange &r, tree 
name, basic_block bb)
 
   if (gimple_code (def_stmt) == GIMPLE_PHI)
 ssa_range_in_phi (r, as_a (def_stmt));
-  else if (!range_of_stmt (r, def_stmt, name))
-r.set_varying (TREE_TYPE (name));
+  else
+{
+  if (name)
+   get_path_oracle ()->killing_def (name);
+
+  if (!range_of_stmt (r, def_stmt, name))
+   r.set_varying (TREE_TYPE (name));
+}
 
   if (bb)
 m_non_null.adjust_range (r, name, bb);
-- 
2.31.1



[COMMITTED] Reorder relation calculating code in the path solver.

2021-10-27 Thread Aldy Hernandez via Gcc-patches
Enabling the fully resolving threader triggers various relation
ordering issues that have previously been dormant because the VRP
hybrid threader (forward threader based) never gives us long enough
paths for this to matter.  The new threader pulls no punches in
finding non-obvious paths, so getting the relations right is
paramount.

This patch fixes a couple oversights that have gone undetected.

First, some background.  There are 3 types of relations along a path:

a) Relations inherent in a PHI.
b) Relations as a side-effect of evaluating a statement.
c) Outgoing relations between blocks in a path.

We must calculate these in their proper order, otherwise we can run
into ordering issues.  The current ordering is wrong, as we
precalculate PHIs for _all_ blocks before anything else, and then
proceed to register the relations throughout the path.  Also, we fail
to realize that a PHI whose argument is also defined in the PHI's block
cannot be registered as an equivalence without causing more ordering
issues.

This patch fixes all the problems described above.  With it we get a
handful more net threads, but most importantly, we disallow some
threads that were wrong.

Tested on x86-64 and ppc64le Linux on the usual regstrap, plus by
comparing the different thread counts before and after this patch.

gcc/ChangeLog:

* gimple-range-fold.cc (fold_using_range::range_of_range_op): Dump
operands as well as relation.
* gimple-range-path.cc
(path_range_query::compute_ranges_in_block): Compute PHI relations
first.  Compute outgoing relations at the end.
(path_range_query::compute_ranges): Remove call to compute_relations.
(path_range_query::compute_relations): Remove.
(path_range_query::maybe_register_phi_relation): New.
(path_range_query::compute_phi_relations): Abstract out
registering one PHI relation to...
(path_range_query::compute_outgoing_relations): ...here.
* gimple-range-path.h (class path_range_query): Remove
compute_relations.
Add maybe_register_phi_relation.
---
 gcc/gimple-range-fold.cc |   2 +
 gcc/gimple-range-path.cc | 107 ---
 gcc/gimple-range-path.h  |   3 +-
 3 files changed, 58 insertions(+), 54 deletions(-)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index ed2fbe121cf..2fab904e6b0 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -620,7 +620,9 @@ fold_using_range::range_of_range_op (irange &r, gimple *s, 
fur_source &src)
  if (dump_file && (dump_flags & TDF_DETAILS) && rel != VREL_NONE)
{
  fprintf (dump_file, " folding with relation ");
+ print_generic_expr (dump_file, op1, TDF_SLIM);
  print_relation (dump_file, rel);
+ print_generic_expr (dump_file, op2, TDF_SLIM);
  fputc ('\n', dump_file);
}
  // Fold range, and register any dependency if available.
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 557338993ae..2f570a13e05 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -316,6 +316,9 @@ path_range_query::compute_ranges_in_block (basic_block bb)
   int_range_max r, cached_range;
   unsigned i;
 
+  if (m_resolve && !at_entry ())
+compute_phi_relations (bb, prev_bb ());
+
   // Force recalculation of any names in the cache that are defined in
   // this block.  This can happen on interdependent SSA/phis in loops.
   EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
@@ -341,7 +344,8 @@ path_range_query::compute_ranges_in_block (basic_block bb)
 return;
 
   // Solve imports that are exported to the next block.
-  edge e = find_edge (bb, next_bb ());
+  basic_block next = next_bb ();
+  edge e = find_edge (bb, next);
   EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
 {
   tree name = ssa_name (i);
@@ -369,6 +373,9 @@ path_range_query::compute_ranges_in_block (basic_block bb)
}
}
 }
+
+  if (m_resolve)
+compute_outgoing_relations (bb, next);
 }
 
 // Adjust all pointer imports in BB with non-null information.
@@ -485,7 +492,6 @@ path_range_query::compute_ranges (const vec 
&path,
 {
   add_copies_to_imports ();
   get_path_oracle ()->reset_path ();
-  compute_relations (path);
 }
 
   if (DEBUG_SOLVER)
@@ -527,7 +533,12 @@ path_range_query::compute_ranges (const vec 
&path,
 }
 
   if (DEBUG_SOLVER)
-dump (dump_file);
+{
+  fprintf (dump_file, "\npath_oracle:\n");
+  get_path_oracle ()->dump (dump_file);
+  fprintf (dump_file, "\n");
+  dump (dump_file);
+}
 }
 
 // A folding aid used to register and query relations along a path.
@@ -624,49 +635,23 @@ path_range_query::range_of_stmt (irange &r, gimple *stmt, 
tree)
   return true;
 }
 
-// Compute relations on a path.  This involves two parts: relations
-// along the conditionals joining a path, and relations determined by
-

Re: Merge from trunk to gccgo branch

2021-10-27 Thread Ian Lance Taylor via Gcc-patches
I merged trunk revision 99b1021d21e5812ed01221d8fca8e8a32488a934 to
the gccgo branch.

Ian


Re: [Patch, Fortran] PR 86935: Bad locus in ASSOCIATE statement

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
AFAICS current trunk still has this issue.
Any takers?
thanks,

On Sun, 2 Sep 2018 17:16:07 +0200
Bernhard Reutner-Fischer  wrote:

>i spotted one
> (pre-existing) possible inconsistency that i did overlook back then:
> 
> gfc_match_associate () reads
> ...
>   if (gfc_match (" %e", &newAssoc->target) != MATCH_YES)
> {
>   /* Have another go, allowing for procedure pointer selectors.  */
>   gfc_matching_procptr_assignment = 1;
>   if (gfc_match (" %e", &newAssoc->target) != MATCH_YES)
> {
>   gfc_error ("Invalid association target at %C");
>   goto assocListError;
> }
>   gfc_matching_procptr_assignment = 0;
> }
> 
> i.e. we retry a match, but in the second attempt we turn on procptr
> assignment matching and if that works, we turn procptr assignment
> matching off again.
> But if we fail that retry, we forget to turn it off again.
> I suppose we should:
> 
> $ svn diff -x -p gcc/fortran/match.c
> Index: gcc/fortran/match.c
> ===
> --- gcc/fortran/match.c (revision 264040)
> +++ gcc/fortran/match.c (working copy)
> @@ -1898,13 +1898,16 @@ gfc_match_associate (void)
>if (gfc_match (" %e", &newAssoc->target) != MATCH_YES)
>   {
> /* Have another go, allowing for procedure pointer selectors.  */
> +   match m;
> +
> gfc_matching_procptr_assignment = 1;
> -   if (gfc_match (" %e", &newAssoc->target) != MATCH_YES)
> +   m = gfc_match (" %e", &newAssoc->target);
> +   gfc_matching_procptr_assignment = 0;
> +   if (m != MATCH_YES)
>   {
> gfc_error ("Invalid association target at %C");
> goto assocListError;
>   }
> -   gfc_matching_procptr_assignment = 0;
>   }
>newAssoc->where = gfc_current_locus;
> 
> 
> Untested. Maybe someone wants to give it a whirl...
> If it wreaks havoc then leaving it set deliberately deserves at least a
> comment.
> 
> PS: It would be nice to get rid of gfc_matching_procptr_assignment,
> gfc_matching_ptr_assignment, gfc_matching_prefix, FWIW.
> cheers,
> >
> > Thanks everyone!
> >
> > Cheers,
> > Janus  
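
The flag-handling problem described in the quoted message can be reproduced in isolation. Below is a minimal sketch in plain C, assuming a global mode flag and a toy matcher; `matching_procptr`, `try_match`, `match_leaky`, `match_clean` and `flag_after` are all hypothetical names, not gfortran code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for gfc_matching_procptr_assignment.  */
static int matching_procptr = 0;

/* Toy matcher: succeeds only for input 2, and only while the flag
   is set.  */
static bool
try_match (int input)
{
  return matching_procptr && input == 2;
}

/* Buggy shape: the early return on failure leaves the flag set.  */
static bool
match_leaky (int input)
{
  matching_procptr = 1;
  if (!try_match (input))
    return false;
  matching_procptr = 0;
  return true;
}

/* Fixed shape: save the result, reset the flag, then test it.  */
static bool
match_clean (int input)
{
  assert (matching_procptr == 0);
  matching_procptr = 1;
  bool ok = try_match (input);
  matching_procptr = 0;
  return ok;
}

/* Run FN on INPUT from a clean state and report the flag afterwards.  */
static int
flag_after (bool (*fn) (int), int input)
{
  matching_procptr = 0;
  fn (input);
  return matching_procptr;
}
```

With these definitions, `flag_after (match_leaky, 1)` yields 1 (the flag leaks on the failure path) while `flag_after (match_clean, 1)` yields 0, which is the behaviour the proposed diff aims for.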



Re: [PATCH] First refactor of vect_analyze_loop

2021-10-27 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> This refactors the main loop analysis part in vect_analyze_loop,
> re-purposing the existing vect_reanalyze_as_main_loop for this
> to reduce code duplication.  Failure flow is a bit tricky since
> we want to extract info from the analyzed loop but I wanted to
> share the destruction part.  Thus I add some std::function and
> lambda to funnel post-analysis for the case we want that
> (when analyzing from the main iteration but not when re-analyzing
> an epilogue as main).

Thanks for cleaning this up.

FWIW, as I mentioned on irc, I think the loop could be simplified quite
a bit if we were prepared to analyse loops both as an epilogue and
(independently) as a main loop.

I think the geology of the code is something like this:

layer 1:
  Original loop that tries fallback vector modes if the autodetected
  one fails.

layer 2:
  Add support for simdlen.  This required continuing after finding
  a match in case a later mode corresponded with the simdlen.

layer 3:
  Add epilogue vinfos.

layer 4:
  Restructure to support layers 5 and 6.

layer 5:
  Add support for multiple vector sizes in a loop.  This needed extra
  code to avoid redundant analysis attempts.

layer 6:
  Add VECT_COMPARE_COSTS (first cut).  At the time this was relatively
  simple [bcc7e346bf9b5dc77797ea949d6adc740deb30ca] since it just meant
  tweaking the “continuing” condition from (2).

  However, a (deliberate) wart was that it only tried treating each
  mode as a replacement for the loop_vinfo at the end of the current
  list (if the main loop is the head of the list and epilogues follow).

  This was supposed to be a compile-time improvement, since it meant
  we still only analysed with each mode once.

layer 7:
  Reanalyze a replacement epilogue loop as a main loop before comparing
  it with the existing main loop.  This prevented a wrong code bug but
  defeated part of the compile-time optimisation from (6).

So it's already necessary to analyse a loop as both an epilogue loop
and a main loop in some cases.

The requirement to analyse loops only once also prevents us from being
able to vectorise the epilogue of an omp simdlen loop, because for
something like -mpreferred-vector-width=256, we'd try AVX256 before
AVX512, even if the simdlen forced AVX512.

> I realize this probably doesn't help the unroll case yet, but it
> looked like an improvement.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?
>
> Thanks,
> Richard.
>
> 2021-10-27  Richard Biener  
>
>   * tree-vect-loop.c: Include .
>   (vect_reanalyze_as_main_loop): Rename to...
>   (vect_analyze_loop_1): ... this and generalize to be
>   able to use it twice ...
>   (vect_analyze_loop): ... here.
> ---
>  gcc/tree-vect-loop.c | 202 ++-
>  1 file changed, 102 insertions(+), 100 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 961c1623f81..9a62475a69f 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
>  .  */
>  
>  #define INCLUDE_ALGORITHM
> +#define INCLUDE_FUNCTIONAL
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> @@ -2898,43 +2899,63 @@ vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
>return true;
>  }
>  
> -/* If LOOP_VINFO is already a main loop, return it unmodified.  Otherwise
> -   try to reanalyze it as a main loop.  Return the loop_vinfo on success
> -   and null on failure.  */
> +/* Analyze LOOP with VECTOR_MODE and as epilogue if MAIN_LOOP_VINFO is
> +   not NULL.  Process the analyzed loop with PROCESS even if analysis
> +   failed.  Sets *N_STMTS and FATAL according to the analysis.
> +   Return the loop_vinfo on success and wrapped null on failure.  */
>  
> -static loop_vec_info
> -vect_reanalyze_as_main_loop (loop_vec_info loop_vinfo, unsigned int *n_stmts)
> +static opt_loop_vec_info
> +vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
> +  machine_mode vector_mode, loop_vec_info main_loop_vinfo,
> +  unsigned int *n_stmts, bool &fatal,
> +  std::function process = nullptr)
>  {
> -  if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> -return loop_vinfo;
> +  /* Check the CFG characteristics of the loop (nesting, entry/exit).  */
> +  opt_loop_vec_info loop_vinfo = vect_analyze_loop_form (loop, shared);
> +  if (!loop_vinfo)
> +{
> +  if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "bad loop form.\n");
> +  gcc_checking_assert (main_loop_vinfo == NULL);
> +  return loop_vinfo;
> +}
> +  loop_vinfo->vector_mode = vector_mode;
>  
> -  if (dump_enabled_p ())
> -dump_printf_loc (MSG_NOTE, vect_location,
> -  "* Reanalyzing as a main loop with vector mode %s\n",
> -  GET_MODE_NAME (loop_vinfo->vector_

Re: [V2/PATCH] Fix tree-optimization/102216: missed optimization causing Warray-bounds

2021-10-27 Thread Martin Sebor via Gcc-patches

On 10/27/21 3:59 AM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

The problem here is tree-ssa-forwprop.c likes to produce
&MEM  [(void *)_4 + 152B] which is the same as
_4 p+ 152 which the rest of GCC likes better.
This patch implements the transformation back to pointer plus to
enable better code generation later on.


Since the purpose of this transformation is to avoid a bogus
-Warray-bounds can you please include a test case showing
the difference it makes? (I.e., one that warns without
the patch and doesn't with it.  The test in the patch doesn't
trigger a warning for me.)

Thanks
Martin
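
As a source-level illustration of the equivalence the patch description relies on (the address of a memory reference at a constant byte offset versus a pointer-plus expression), here is a small sketch in plain C. It is only an analogue of the GIMPLE forms, not a reproducer for the warning, and `step_vs_offset` and `same_address` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Advance a copy of BASE by three single steps and return its distance
   in elements from BASE + 3.  This mirrors the shape of the new
   pr102216.C test, where the result is expected to fold to zero.  */
static ptrdiff_t
step_vs_offset (const char **base)
{
  assert (base != NULL);
  const char **p = base;
  p++;
  p++;
  p++;
  return p - (base + 3);
}

/* Compare the "address of element at byte offset 152" form with the
   "pointer plus 152" form.  P must point to at least 153 bytes.  */
static int
same_address (void *p)
{
  char *as_mem_ref = &((char *) p)[152];
  char *as_pointer_plus = (char *) p + 152;
  return as_mem_ref == as_pointer_plus;
}
```

Both helpers describe the same address arithmetic in two spellings; the patch is about which spelling later passes see, not about changing the computed value.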



OK? Bootstrapped and tested on aarch64-linux-gnu.

Changes from v1:
* v2: Add comments.

gcc/ChangeLog:

PR tree-optimization/102216
* tree-ssa-forwprop.c (rewrite_assign_addr): New function.
(forward_propagate_addr_expr_1): Use rewrite_assign_addr
when rewriting into the addr_expr into an assignment.

gcc/testsuite/ChangeLog:

PR tree-optimization/102216
* g++.dg/tree-ssa/pr102216.C: New test.
---
  gcc/testsuite/g++.dg/tree-ssa/pr102216.C | 22 +
  gcc/tree-ssa-forwprop.c  | 58 ++--
  2 files changed, 67 insertions(+), 13 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr102216.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr102216.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr102216.C
new file mode 100644
index 000..b903e4eb57d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr102216.C
@@ -0,0 +1,22 @@
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+void link_error ();
+void g ()
+{
+  const char **language_names;
+
+  language_names = new const char *[6];
+
+  const char **language_names_p = language_names;
+
+  language_names_p++;
+  language_names_p++;
+  language_names_p++;
+
+  if ( (language_names_p) - (language_names+3) != 0)
+link_error();
+  delete[] language_names;
+}
+/* We should have removed the link_error on the gimple level as GCC should
+   be able to tell that language_names_p is the same as language_names+3.  */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" } } */
+
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index a830bab78ba..e4331c60525 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -637,6 +637,47 @@ forward_propagate_into_cond (gimple_stmt_iterator *gsi_p)
return 0;
  }
  
+/* Rewrite the DEF_RHS as needed into the (plain) use statement.  */

+
+static void
+rewrite_assign_addr (gimple_stmt_iterator *use_stmt_gsi, tree def_rhs)
+{
+  tree def_rhs_base;
+  poly_int64 def_rhs_offset;
+
+  /* Get the base and offset.  */
+  if ((def_rhs_base = get_addr_base_and_unit_offset (TREE_OPERAND (def_rhs, 0),
+&def_rhs_offset)))
+{
+  tree new_ptr;
+  poly_offset_int off = 0;
+
+  /* If the base was a MEM, then add the offset to the other
+ offset and adjust the base. */
+  if (TREE_CODE (def_rhs_base) == MEM_REF)
+   {
+ off += mem_ref_offset (def_rhs_base);
+ new_ptr = TREE_OPERAND (def_rhs_base, 0);
+   }
+  else
+   new_ptr = build_fold_addr_expr (def_rhs_base);
+
+  /* If we have the new base is not an address express, then use a p+ 
expression
+ as the new expression instead of &MEM[x, offset]. */
+  if (TREE_CODE (new_ptr) != ADDR_EXPR)
+   {
+ tree offset = wide_int_to_tree (sizetype, off);
+ def_rhs = build2 (POINTER_PLUS_EXPR, TREE_TYPE (def_rhs), new_ptr, 
offset);
+   }
+}
+
+  /* Replace the rhs with the new expression.  */
+  def_rhs = unshare_expr (def_rhs);
+  gimple_assign_set_rhs_from_tree (use_stmt_gsi, def_rhs);
+  gimple *use_stmt = gsi_stmt (*use_stmt_gsi);
+  update_stmt (use_stmt);
+}
+
  /* We've just substituted an ADDR_EXPR into stmt.  Update all the
 relevant data structures to match.  */
  
@@ -696,8 +737,8 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,

if (single_use_p
  && useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (def_rhs)))
{
- gimple_assign_set_rhs1 (use_stmt, unshare_expr (def_rhs));
- gimple_assign_set_rhs_code (use_stmt, TREE_CODE (def_rhs));
+ rewrite_assign_addr (use_stmt_gsi, def_rhs);
+ gcc_assert (gsi_stmt (*use_stmt_gsi) == use_stmt);
  return true;
}
  
@@ -741,14 +782,7 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,

if (forward_propagate_addr_expr (lhs, new_def_rhs, single_use_p))
return true;
  
-  if (useless_type_conversion_p (TREE_TYPE (lhs),

-TREE_TYPE (new_def_rhs)))
-   gimple_assign_set_rhs_with_ops (use_stmt_gsi, TREE_CODE (new_def_rhs),
-   new_def_rhs);
-  else if (is_gimple_min_invariant (new_def_rhs))
-   gimple_assign_set_rhs_with_ops (use_stmt_gsi, NOP_EXPR, new_def_rhs);
-  else
-

Re: [r12-4725 Regression] FAIL: libgomp.c/doacross-1.c (test for excess errors) on Linux/x86_64

2021-10-27 Thread Martin Sebor via Gcc-patches

On 10/27/21 9:48 AM, Tobias Burnus wrote:

On 27.10.21 17:36, Martin Sebor via Gcc-patches wrote:


On 10/27/21 7:30 AM, Jakub Jelinek wrote:

On Tue, Oct 26, 2021 at 10:22:19PM -0700, sunil.k.pandey via
Gcc-patches wrote:

FAIL: libgomp.c/doacross-1.c (test for excess errors)


I don't see this failure in my logs (or the other one) or any
evidence of the libgomp tests having run.  Does the libgomp
test suite need something special to enable?


I don't know whether it can be disabled - but I bet you have built
libgomp. Thus:

Did you run "make check" in the main build directory or in $(BUILD)/gcc?
– only the former runs it.


Thanks.  I figured out why I didn't see it.  I was looking at
the wrong log file, one from testing just the one patch for
the atomic built-ins, rather than the one for all three that
I pushed yesterday (including the one to make a greater use
of the ranger).  The warning only shows with all of them
applied.

Martin



You can run it directly (from the main $(BUILD) dir) as "make
check-target-libgomp" – or just go to $(BUILD)/*/libgomp/ and run "make
check" there. – In the latter directory, you can also use RUNTESTFLAGS=
to run only a specific test.

(The * above is the target triplet; here, it is x86_64-pc-linux-gnu.)

Besides libgomp, there are some other libraries with testsuites outside
gcc/testsuite, like libstdc++-v3/testsuite or libatomic/testsuite or ...

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 
80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: 
Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; 
Registergericht München, HRB 106955




Re: [PATCH] middle-end/57245 - honor -frounding-math in real truncation

2021-10-27 Thread Richard Biener via Gcc-patches
On October 27, 2021 4:44:53 PM GMT+02:00, Jakub Jelinek  
wrote:
>On Wed, Oct 27, 2021 at 04:29:38PM +0200, Richard Biener wrote:
>> So something like the following below?  Note I have to fix 
>> simplify_const_unary_operation to not perform the invalid constant
>> folding with (not worrying about the exact conversion case - I doubt
>> any of the constant folding is really relevant on RTL these days,
>> maybe we should simply punt for all unary float-float ops when either
>> mode has sign dependent rounding modes)
>> 
>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
>> index bbbd6b74942..9522a31570e 100644
>> --- a/gcc/simplify-rtx.c
>> +++ b/gcc/simplify-rtx.c
>> @@ -2068,6 +2073,9 @@ simplify_const_unary_operation (enum rtx_code code, 
>> machine_mode mode,
>>  and the operand is a signaling NaN.  */
>>   if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
>> return NULL_RTX;
>> + /* Or if flag_rounding_math is on.  */
>> + if (HONOR_SIGN_DEPENDENT_ROUNDING (mode))
>> +   return NULL_RTX;
>>   d = real_value_truncate (mode, d);
>>   break;
>
>Won't this stop folding of truncations that are never a problem?
>I mean at least if the wider float mode constant is exactly representable
>in the narrower float mode, no matter what rounding mode is used the value
>will be always the same...
>And people use
>  float f = 1.0;
>or
>  float f = 1.25;
>etc. a lot.

Yes, but I do expect any such opportunities to be realized on GENERIC/GIMPLE? 

>So perhaps again
>   if (HONOR_SIGN_DEPENDENT_ROUNDING (mode)
>   && !exact_real_truncate (mode, &d))
> return NULL_RTX;
>?

Sure, for this case it's short and straightforward.
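
To make the "exactly representable" argument concrete: a wider constant whose value survives a round trip through the narrower type converts to the same result under every rounding mode, so folding it stays safe. The sketch below is plain C, assumes IEEE binary32 `float` and binary64 `double`, and uses a hypothetical helper `truncates_exactly` that only loosely plays the role of `exact_real_truncate`.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if D converts to float without any rounding, i.e. the
   truncation is exact and therefore independent of the rounding mode.
   The volatile store forces a real narrowing to float precision.  */
static bool
truncates_exactly (double d)
{
  assert (d == d);  /* NaN is outside the scope of this sketch.  */
  volatile float f = (float) d;
  return (double) f == d;
}
```

Under those assumptions `truncates_exactly (1.0)` and `truncates_exactly (1.25)` hold, while `truncates_exactly (1.3)` does not, which matches the distinction drawn above between constants that can always be folded and those that depend on -frounding-math.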

>
>> /* PR57245 */
>> /* { dg-do run } */
>> /* { dg-require-effective-target fenv } */
>> /* { dg-additional-options "-frounding-math" } */
>> 
>> #include 
>> #include 
>> 
>> int
>> main ()
>> {
>
>Roughly yes.  Some tests also do #ifdef FE_*, so in your case
>> #if __DBL_MANT_DIG__ == 53 && __FLT_MANT_DIG__ == 24
>+#ifdef FE_UPWARD

Ah, OK. Will fix. 

Richard. 

>>   fesetround (FE_UPWARD);
>>   float f = 1.3;
>>   if (f != 0x1.4ep+0f)
>> __builtin_abort ();
>+#endif
>+#ifdef FE_TONEAREST
>etc.
>>   fesetround (FE_TONEAREST);
>>   /* Use different actual values so the bogus CSE we perform does not
>>  break things.  */
>>   f = 1.33;
>>   if (f != 0x1.547ae2p+0f)
>> abort ();
>>   fesetround (FE_DOWNWARD);
>>   f = 1.333;
>>   if (f != 0x1.553f7cp+0f)
>> abort ();
>>   fesetround (FE_TOWARDZERO);
>>   f = 1.;
>>   if (f != 0x1.555326p+0f)
>> abort ();
>> #endif
>>   return 0;
>> }
>
>   Jakub
>



Re: [r12-4725 Regression] FAIL: libgomp.c/doacross-1.c (test for excess errors) on Linux/x86_64

2021-10-27 Thread Tobias Burnus

On 27.10.21 17:36, Martin Sebor via Gcc-patches wrote:


On 10/27/21 7:30 AM, Jakub Jelinek wrote:

On Tue, Oct 26, 2021 at 10:22:19PM -0700, sunil.k.pandey via
Gcc-patches wrote:

FAIL: libgomp.c/doacross-1.c (test for excess errors)


I don't see this failure in my logs (or the other one) or any
evidence of the libgomp tests having run.  Does the libgomp
test suite need something special to enable?


I don't know whether it can be disabled - but I bet you have built
libgomp. Thus:

Did you run "make check" in the main build directory or in $(BUILD)/gcc?
– only the former runs it.

You can run it directly (from the main $(BUILD) dir) as "make
check-target-libgomp" – or just go to $(BUILD)/*/libgomp/ and run "make
check" there. – In the latter directory, you can also use RUNTESTFLAGS=
to run only a specific test.

(The * above is the target triplet; here, it is x86_64-pc-linux-gnu.)

Besides libgomp, there are some other libraries with testsuites outside
gcc/testsuite, like libstdc++-v3/testsuite or libatomic/testsuite or ...

Tobias


RE: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-10-27 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, October 26, 2021 3:46 PM
> To: Tamar Christina 
> Cc: Tamar Christina via Gcc-patches ; Richard
> Earnshaw ; nd ; Marcus
> Shawcroft 
> Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector constants
> and operations
> 
> Tamar Christina  writes:
> > Hi,
> >
> > Following the discussion below here's a revised patch.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Looks good functionally, just got some comments about the implementation.
> 
> > @@ -14006,8 +14007,52 @@ cost_plus:
> >  mode, MULT, 1, speed);
> >return true;
> >  }
> > +   break;
> > +case CONST_VECTOR:
> > +   {
> > + rtx gen_insn = aarch64_simd_make_constant (x, true);
> > + /* Not a valid const vector.  */
> > + if (!gen_insn)
> > +   break;
> >
> > -  /* Fall through.  */
> > + switch (GET_CODE (gen_insn))
> > + {
> > + case CONST_VECTOR:
> > +   /* Load using MOVI/MVNI.  */
> > +   if (aarch64_simd_valid_immediate (x, NULL))
> > + *cost += extra_cost->vect.movi;
> > +   else /* Load using constant pool.  */
> > + *cost += extra_cost->ldst.load;
> > +   break;
> > + /* Load using a DUP.  */
> > + case VEC_DUPLICATE:
> > +   gcc_unreachable ();
> > +   break;
> > + default:
> > +   *cost += extra_cost->ldst.load;
> > +   break;
> > + }
> > + return true;
> > +   }
> 
> This might be a problem (if it is a problem) with some of the existing cases
> too, but: is using += rather than = the right behaviour here?
> It means that we add our cost on top of whatever the target-independent
> rtx_costs thought was a good default choice, whereas it looks like these table
> entries specify the correct full cost.
> 
> If it's not clear-cut, then I think using = would be better.

Switched to =

> 
> Also, going back to an earlier part of the thread, I think the “inner”
> CONST_VECTOR case is now a correct replacement for the “outer”
> CONST_VECTOR case, meaning we don't need the
> aarch64_simd_make_constant bits.  I.e. I think we can make the top-level
> case:
> 
> case CONST_VECTOR:
>   /* Load using MOVI/MVNI.  */
>   if (aarch64_simd_valid_immediate (x, NULL))
> *cost = extra_cost->vect.movi;
>   else /* Load using constant pool.  */
> *cost = extra_cost->ldst.load;
>   break;
> 
> > +case VEC_CONCAT:
> > +   /* depending on the operation, either DUP or INS.
> > +  For now, keep default costing.  */
> > +   break;
> > +case VEC_DUPLICATE:
> > +   *cost += extra_cost->vect.dup;
> > +   return true;
> 
> For this I think we should do:
> 
>   *cost = extra_cost->vect.dup;
>   return false;
> 
> so that we cost the operand of the vec_duplicate as well.
> This will have no effect if the operand is a REG, but would affect more
> complex expressions.
> 

Unfortunately returning false here had a negative effect on SVE, where the RTL
for some instructions has a complex vec_duplicate.

As an example

(note 11 8 12 2 NOTE_INSN_DELETED)
(zero_extend:DI (unspec:SI [
            (const_int 0 [0])
            (const_int 2 [0x2])
            (const_int 1 [0x1])
        ] UNSPEC_SVE_CNT_PAT))) "cntd_pat.c":10:153 8829 {aarch64_sve_cnt_pat}
     (nil))

no longer gets pushed into a plus operator by the combiner due to the costing:

rejecting combination of insns 11, 12 and 13 
original costs 4 + 8 + 8 = 20
replacement cost 24  

vs what it was originally

allowing combination of insns 11, 12 and 13
original costs 4 + 4 + 8 = 16
replacement cost 12

which happens because the original costs don't take into account that the
instruction that semantically handles this operation doesn't actually do any
of this.

So now I have left it as true and added code for costing the VEC_SELECT of 0,
which can happen if lowpart_subreg fails.
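
The two combine dumps above can be read as a simple cost comparison. The following plain C sketch assumes a deliberately simplified rule (a combination is allowed when the replacement cost does not exceed the sum of the original insn costs); the real combine pass has further conditions, and `combination_allowed` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified acceptance rule, stated as an assumption: allow the
   combination when REPLACEMENT is no more expensive than the sum of
   the N costs in ORIGINAL.  */
static bool
combination_allowed (const int *original, size_t n, int replacement)
{
  assert (original != NULL);
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += original[i];
  return replacement <= total;
}
```

Plugging in the numbers from the dumps, costs {4, 8, 8} against a replacement of 24 are rejected (24 > 20), whereas costs {4, 4, 8} against a replacement of 12 are allowed (12 <= 16).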

PS. Can you also take a look at [PATCH 1/2][GCC][middle-end] Teach CSE to be
able to do vector extracts.  I believe since you had a comment last on it no
other reviewer will look at it. ☹

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/aarch-common-protos.h (struct vector_cost_table): Add
movi, dup and extract costing fields.
* config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs,
thunderx_extra_costs, thunderx2t99_extra_costs,
thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs): Use
them.
* config/arm/aa

Re: [r12-4725 Regression] FAIL: libgomp.c/doacross-1.c (test for excess errors) on Linux/x86_64

2021-10-27 Thread Martin Sebor via Gcc-patches

On 10/27/21 7:30 AM, Jakub Jelinek wrote:

On Tue, Oct 26, 2021 at 10:22:19PM -0700, sunil.k.pandey via Gcc-patches wrote:

FAIL: libgomp.c/doacross-1.c (test for excess errors)


I don't see this failure in my logs (or the other one) or any
evidence of the libgomp tests having run.  Does the libgomp
test suite need something special to enable?



At least this one is a clear false positive.
int a[256];
...
 #pragma omp for schedule(static, 1) ordered (1) nowait
 for (i = 0; i < 256; i++)
   {
 #pragma omp atomic write
 a[i] = 1;
 #pragma omp ordered depend(sink: i - 1)
 if (i)
   {
 #pragma omp atomic read
 l = a[i - 1];  // < Here is the false positive
                // warning: '__atomic_load_4' writing 4 bytes into a region of size 0 overflows the destination [-Wstring-overflow=]
                // note: at offset [-8589934592, -8] into destination object ‘a’ of size 1024
 if (l < 2)
   abort ();
   }
The loop iterates i from 0 to 255 and the if body is guarded with i != 0,
so __atomic_load_4 (&a[i - 1].
Due to the doacross loop vrp doesn't know that the loop iterates from 0 to
256, because different threads are given just some subset of that interval,
so it is effectively VARYING.


The warning is in the IL below:

   [local count: 30]:
  _865 = ivtmp.273_871 + 4294967294;
  _923 = (int) _865;
  _308 = (sizetype) _923;
  _707 = _308 * 4;
  _924 = &a + _707;
  _926 = __atomic_load_4 (_924, 0);

The code calls range_of_expr (vr, val, stmt) where val is _707
and stmt is the assignment _924 = &a + _707.  The result is
the VR_RANGE [-8589934592, -8].  The code is in get_range() in
tree-ssa-strlen.c of all places.  The warning uses the range
as is, treating it as signed.  The debug_ranger() output for
the block is below.  Am I missing something here?

=== BB 167 
Imports: _926
Exports: _926  l.0_927
 l.0_927 : _926(I)
_243int VARYING
ivtmp.272_874   unsigned int [2147483648, +INF]
Relational : (_865 != ivtmp.273_871)
 [local count: 30]:
_865 = ivtmp.273_871 + 4294967294;
_923 = (int) _865;
_308 = (sizetype) _923;
_707 = _308 * 4;
_924 = &a + _707;
_926 = __atomic_load_4 (_924, 0);
l.0_927 = (int) _926;
if (l.0_927 <= 1)
  goto ; [0.00%]
else
  goto ; [100.00%]

_308 : sizetype [18446744071562067968, 18446744073709551614]
_707 : sizetype [18446744065119617024, 18446744073709551608]
_923 : int [-INF, -2]
_924 : int * [1B, +INF]
167->13  (T) _926 :  unsigned int [0, 1][2147483648, +INF]
167->13  (T) l.0_927 :   int [-INF, 1]
167->166  (F) _926 : unsigned int [2, 2147483647]
167->166  (F) l.0_927 :  int [2, +INF]
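
As a cross-check of the numbers in the dump, here is a small sketch (not GCC
code; the helper names are made up for illustration, and it assumes a 64-bit
sizetype and a 32-bit int, as on x86_64) that replays the two conversions
from the IL for a given value of _923:

```c
#include <stdint.h>

/* Replays _308 = (sizetype) _923; _707 = _308 * 4; for one int value.
   sizetype is modelled as uint64_t and int as int32_t (assumption).  */
uint64_t
byte_offset_as_sizetype (int32_t idx)
{
  uint64_t widened = (uint64_t) (int64_t) idx;  /* sign-extend, then reinterpret */
  return widened * 4;                           /* wraps modulo 2^64 */
}

/* The same offset read back as a signed quantity, which is how the
   warning prints it.  The out-of-range conversion is implementation
   defined; GCC wraps.  */
int64_t
byte_offset_as_signed (int32_t idx)
{
  return (int64_t) byte_offset_as_sizetype (idx);
}
```

Feeding it the endpoints of the range shown for _923, [-INF, -2], reproduces
the sizetype bounds listed for _707 and, read as signed, the
[-8589934592, -8] offsets from the note.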

Martin


Perhaps it derives some quite useless range
from the i - 1 or i + 1 expressions on signed integer, but that doesn't mean
the warnings should assume the value is likely to be out of bounds.
And there is no warning on the a[i] either (which is also in bounds, but
if for the atomic load the warning code thinks i - 1 can be in
[-8589934592, -8] range, why doesn't it think that i can be in
[-8589934588, -4] range?

Jakub





Re: [PATCH] middle-end/57245 - honor -frounding-math in real truncation

2021-10-27 Thread Jakub Jelinek via Gcc-patches
On Wed, Oct 27, 2021 at 04:29:38PM +0200, Richard Biener wrote:
> So something like the following below?  Note I have to fix 
> simplify_const_unary_operation to not perform the invalid constant
> folding with (not worrying about the exact conversion case - I doubt
> any of the constant folding is really relevant on RTL these days,
> maybe we should simply punt for all unary float-float ops when either
> mode has sign dependent rounding modes)
> 
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index bbbd6b74942..9522a31570e 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -2068,6 +2073,9 @@ simplify_const_unary_operation (enum rtx_code code, 
> machine_mode mode,
>  and the operand is a signaling NaN.  */
>   if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
> return NULL_RTX;
> + /* Or if flag_rounding_math is on.  */
> + if (HONOR_SIGN_DEPENDENT_ROUNDING (mode))
> +   return NULL_RTX;
>   d = real_value_truncate (mode, d);
>   break;

Won't this stop folding of truncations that are never a problem?
I mean at least if the wider float mode constant is exactly representable
in the narrower float mode, no matter what rounding mode is used the value
will be always the same...
And people use
  float f = 1.0;
or
  float f = 1.25;
etc. a lot.
So perhaps again
if (HONOR_SIGN_DEPENDENT_ROUNDING (mode)
&& !exact_real_truncate (mode, &d))
  return NULL_RTX;
?
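
To illustrate the property this relies on, here is a minimal sketch in plain
C. It is not GCC code: exact_real_truncate works on GCC's internal real
representation, while the helper below (a made-up name) just round-trips a
host double through float and assumes the value is finite and within float
range.

```c
#include <stdbool.h>

/* Returns true when converting VALUE to float loses no information, so
   the result cannot depend on the rounding mode in effect.  The volatile
   temporary keeps the compiler from carrying the narrowed value in a
   wider format.  */
bool
truncates_exactly_to_float (double value)
{
  volatile float narrowed = (float) value;
  return (double) narrowed == value;
}
```

Constants such as 1.0 and 1.25 pass this check and could still be folded,
while 1.3 and 1.33 do not and would be left for run time.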

> /* PR57245 */
> /* { dg-do run } */
> /* { dg-require-effective-target fenv } */
> /* { dg-additional-options "-frounding-math" } */
> 
> #include <fenv.h>
> #include <stdlib.h>
> 
> int
> main ()
> {

Roughly yes.  Some tests also do #ifdef FE_*, so in your case
> #if __DBL_MANT_DIG__ == 53 && __FLT_MANT_DIG__ == 24
+#ifdef FE_UPWARD
>   fesetround (FE_UPWARD);
>   float f = 1.3;
>   if (f != 0x1.4ep+0f)
> __builtin_abort ();
+#endif
+#ifdef FE_TONEAREST
etc.
>   fesetround (FE_TONEAREST);
>   /* Use different actual values so the bogus CSE we perform does not
>  break things.  */
>   f = 1.33;
>   if (f != 0x1.547ae2p+0f)
> abort ();
>   fesetround (FE_DOWNWARD);
>   f = 1.333;
>   if (f != 0x1.553f7cp+0f)
> abort ();
>   fesetround (FE_TOWARDZERO);
>   f = 1.;
>   if (f != 0x1.555326p+0f)
> abort ();
> #endif
>   return 0;
> }

Jakub
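
A rough sketch of the #ifdef FE_* guarding, kept separate from the testcase
itself.  The function name is hypothetical; it only relies on the standard
<fenv.h> macros, which a target defines only for the rounding modes it
supports.

```c
#include <fenv.h>

/* Fills MODES (room for 4 entries) with the rounding-mode macros this
   target's <fenv.h> defines and returns how many were stored.  */
int
available_rounding_modes (int modes[4])
{
  int n = 0;
#ifdef FE_TONEAREST
  modes[n++] = FE_TONEAREST;
#endif
#ifdef FE_UPWARD
  modes[n++] = FE_UPWARD;
#endif
#ifdef FE_DOWNWARD
  modes[n++] = FE_DOWNWARD;
#endif
#ifdef FE_TOWARDZERO
  modes[n++] = FE_TOWARDZERO;
#endif
  return n;
}
```

A test could then loop over the returned modes and call fesetround on each,
instead of assuming all four exist.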



Re: [PATCH] middle-end/57245 - honor -frounding-math in real truncation

2021-10-27 Thread Richard Biener via Gcc-patches
On Wed, 27 Oct 2021, Jakub Jelinek wrote:

> On Wed, Oct 27, 2021 at 03:20:29PM +0200, Richard Biener via Gcc-patches 
> wrote:
> > The following honors -frounding-math when converting a FP constant
> > to another FP type.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> > 
> > I wonder what a good way to test this portably would be; the bug report
> > unfortunately didn't contain anything executable and I don't see
> > much -frounding-math test coverage to copy from.
> 
> E.g. following tests call fesetround, use fenv effective target etc.:
> torture/fp-int-convert-float128-timode-3.c:  fesetround (FE_TOWARDZERO);
> torture/fp-int-convert-timode-2.c:  fesetround (FE_DOWNWARD);
> torture/fp-int-convert-timode-3.c:  fesetround (FE_UPWARD);
> torture/fp-int-convert-timode-4.c:  fesetround (FE_TOWARDZERO);
> 
> And the test can just hardcode one or more common float/double etc.
> configurations, checked using
> __{FLT,DBL}_{DIG,MANT_DIG,RADIX,MIN_EXP,MAX_EXP}__ etc. macros.
> Say just test double to float conversions of some specific values assuming
> float is IEEE754 single precision and double is IEEE754 double precision
> in all the 4 rounding modes.

So something like the following below?  Note I have to fix
simplify_const_unary_operation to not perform the invalid constant
folding (not worrying about the exact conversion case - I doubt
any of the constant folding is really relevant on RTL these days,
maybe we should simply punt for all unary float-float ops when either
mode has sign-dependent rounding modes)

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index bbbd6b74942..9522a31570e 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2068,6 +2073,9 @@ simplify_const_unary_operation (enum rtx_code code, 
machine_mode mode,
 and the operand is a signaling NaN.  */
  if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
return NULL_RTX;
+ /* Or if flag_rounding_math is on.  */
+ if (HONOR_SIGN_DEPENDENT_ROUNDING (mode))
+   return NULL_RTX;
  d = real_value_truncate (mode, d);
  break;
case FLOAT_EXTEND:



/* PR57245 */
/* { dg-do run } */
/* { dg-require-effective-target fenv } */
/* { dg-additional-options "-frounding-math" } */

#include <fenv.h>
#include <stdlib.h>

int
main ()
{
#if __DBL_MANT_DIG__ == 53 && __FLT_MANT_DIG__ == 24
  fesetround (FE_UPWARD);
  float f = 1.3;
  if (f != 0x1.4ep+0f)
__builtin_abort ();
  fesetround (FE_TONEAREST);
  /* Use different actual values so the bogus CSE we perform does not
 break things.  */
  f = 1.33;
  if (f != 0x1.547ae2p+0f)
abort ();
  fesetround (FE_DOWNWARD);
  f = 1.333;
  if (f != 0x1.553f7cp+0f)
abort ();
  fesetround (FE_TOWARDZERO);
  f = 1.;
  if (f != 0x1.555326p+0f)
abort ();
#endif
  return 0;
}


> > 2021-10-27  Richard Biener  
> > 
> > PR middle-end/57245
> > * fold-const.c (fold_convert_const_real_from_real): Honor
> > -frounding-math if the conversion is not exact.
> > ---
> >  gcc/fold-const.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> > index ff23f12f33c..c7aebf9cc7e 100644
> > --- a/gcc/fold-const.c
> > +++ b/gcc/fold-const.c
> > @@ -2139,6 +2139,12 @@ fold_convert_const_real_from_real (tree type, 
> > const_tree arg1)
> >&& REAL_VALUE_ISSIGNALING_NAN (TREE_REAL_CST (arg1)))
> >  return NULL_TREE; 
> >  
> > +  /* With flag_rounding_math we shuld respect the current rounding mode
> 
> s/shuld/should/
> 
> > + unless the conversion is exact.  */
> > +  if (HONOR_SIGN_DEPENDENT_ROUNDING (arg1)
> > +  && !exact_real_truncate (TYPE_MODE (type), &TREE_REAL_CST (arg1)))
> > +return NULL_TREE;
> > +
> >real_convert (&value, TYPE_MODE (type), &TREE_REAL_CST (arg1));
> >t = build_real (type, value);
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] middle-end/57245 - honor -frounding-math in real truncation

2021-10-27 Thread Jakub Jelinek via Gcc-patches
On Wed, Oct 27, 2021 at 03:20:29PM +0200, Richard Biener via Gcc-patches wrote:
> The following honors -frounding-math when converting a FP constant
> to another FP type.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> I wonder what a good way to test this portably would be; the bug report
> unfortunately didn't contain anything executable and I don't see
> much -frounding-math test coverage to copy from.

E.g. following tests call fesetround, use fenv effective target etc.:
torture/fp-int-convert-float128-timode-3.c:  fesetround (FE_TOWARDZERO);
torture/fp-int-convert-timode-2.c:  fesetround (FE_DOWNWARD);
torture/fp-int-convert-timode-3.c:  fesetround (FE_UPWARD);
torture/fp-int-convert-timode-4.c:  fesetround (FE_TOWARDZERO);

And the test can just hardcode one or more common float/double etc.
configurations, checked using
__{FLT,DBL}_{DIG,MANT_DIG,RADIX,MIN_EXP,MAX_EXP}__ etc. macros.
Say just test double to float conversions of some specific values assuming
float is IEEE754 single precision and double is IEEE754 double precision
in all the 4 rounding modes.
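
For instance, the configuration check could look roughly like the sketch
below.  It uses the standard <float.h> names rather than the predefined
__FLT_*__/__DBL_*__ macros, and the function name is made up; a testcase
would more likely put the same condition in an #if.

```c
#include <float.h>

/* Returns 1 when float and double look like IEEE 754 binary32 and
   binary64, judging only by the <float.h> parameters.  */
int
has_ieee_single_and_double (void)
{
  return FLT_RADIX == 2
	 && FLT_MANT_DIG == 24 && FLT_MIN_EXP == -125 && FLT_MAX_EXP == 128
	 && DBL_MANT_DIG == 53 && DBL_MIN_EXP == -1021 && DBL_MAX_EXP == 1024;
}
```

On targets where this returns 0 the test would simply skip its hardcoded
expectations.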

> 2021-10-27  Richard Biener  
> 
>   PR middle-end/57245
>   * fold-const.c (fold_convert_const_real_from_real): Honor
>   -frounding-math if the conversion is not exact.
> ---
>  gcc/fold-const.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index ff23f12f33c..c7aebf9cc7e 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -2139,6 +2139,12 @@ fold_convert_const_real_from_real (tree type, 
> const_tree arg1)
>&& REAL_VALUE_ISSIGNALING_NAN (TREE_REAL_CST (arg1)))
>  return NULL_TREE; 
>  
> +  /* With flag_rounding_math we shuld respect the current rounding mode

s/shuld/should/

> + unless the conversion is exact.  */
> +  if (HONOR_SIGN_DEPENDENT_ROUNDING (arg1)
> +  && !exact_real_truncate (TYPE_MODE (type), &TREE_REAL_CST (arg1)))
> +return NULL_TREE;
> +
>real_convert (&value, TYPE_MODE (type), &TREE_REAL_CST (arg1));
>t = build_real (type, value);

Jakub



Re: [r12-4725 Regression] FAIL: libgomp.c/doacross-1.c (test for excess errors) on Linux/x86_64

2021-10-27 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 26, 2021 at 10:22:19PM -0700, sunil.k.pandey via Gcc-patches wrote:
> FAIL: libgomp.c/doacross-1.c (test for excess errors)

At least this one is a clear false positive.
int a[256];
...
#pragma omp for schedule(static, 1) ordered (1) nowait
for (i = 0; i < 256; i++)
  {
#pragma omp atomic write
a[i] = 1;
#pragma omp ordered depend(sink: i - 1)
if (i)
  {
#pragma omp atomic read
l = a[i - 1];   // < Here is the false positive warning:
                // '__atomic_load_4' writing 4 bytes into a region of size 0
                // overflows the destination [-Wstring-overflow=]
                // note: at offset [-8589934592, -8] into destination
                // object ‘a’ of size 1024
if (l < 2)
  abort ();
  }
The loop iterates i from 0 to 255 and the if body is guarded with i != 0,
so __atomic_load_4 (&a[i - 1].
Due to the doacross loop vrp doesn't know that the loop iterates from 0 to
256, because different threads are given just some subset of that interval,
so it is effectively VARYING.  Perhaps it derives some quite useless range
from the i - 1 or i + 1 expressions on signed integer, but that doesn't mean
the warnings should assume the value is likely to be out of bounds.
And there is no warning on the a[i] either (which is also in bounds, but
if for the atomic load the warning code thinks i - 1 can be in
[-8589934592, -8] range, why doesn't it think that i can be in
[-8589934588, -4] range?

Jakub



Re: [PATCH] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]

2021-10-27 Thread David Edelsohn via Gcc-patches
On Sun, Oct 24, 2021 at 10:51 PM Xionghu Luo  wrote:
>
> If the second operand of __builtin_shuffle is const vector 0, and with
> specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv.
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
> patterns match and emit for VSX xxpermdi.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr102868.c: New test.
> ---
>  gcc/config/rs6000/rs6000.c  | 47 --
>  gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +
>  2 files changed, 97 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index d0730253bcc..5d802c1fa96 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -23046,7 +23046,23 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
> rtx op1,
>  {OPTION_MASK_P8_VECTOR,
>   BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
>   : CODE_FOR_p8_vmrgew_v4sf_direct,
> - {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
> + {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
> +{OPTION_MASK_VSX,
> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> + {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
> +{OPTION_MASK_VSX,
> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> + {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
> +{OPTION_MASK_VSX,
> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> + {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
> +{OPTION_MASK_VSX,
> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> + {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};

If the insn_code is the same for big endian and little endian, why
does the new code test BYTES_BIG_ENDIAN to set the same value
(CODE_FOR_vsx_xxpermdi_v16qi)?

Thanks, David
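
For readers following along, a standalone sketch of how the four new byte
patterns relate to the selector computed in the hunk quoted further down.
This is not the rs6000 code; the function name is hypothetical, and it only
mirrors the two-input (non-one_vec) branch of the patch.

```c
/* Returns the 2-bit doubleword selector for a 16-byte permute PERM over
   the 32 bytes of two concatenated inputs, or -1 if PERM is not built
   from one whole doubleword of the first input followed by one whole
   doubleword of the second.  */
int
doubleword_selector (const unsigned char perm[16])
{
  unsigned char hi = perm[0];   /* first byte of result doubleword 0 */
  unsigned char lo = perm[8];   /* first byte of result doubleword 1 */
  int i;

  if ((hi != 0 && hi != 8) || (lo != 16 && lo != 24))
    return -1;
  for (i = 1; i < 8; i++)
    if (perm[i] != hi + i || perm[8 + i] != lo + i)
      return -1;

  /* Same bit assignment as the two-input branch of the patch.  */
  return (hi != 0 ? 2 : 0) | (lo != 16 ? 1 : 0);
}
```

The four table entries map to selectors 0, 2, 1 and 3 in the order they
appear in the patch, independent of endianness in this simplified model.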

>
>unsigned int i, j, elt, which;
>unsigned char perm[16];
> @@ -23169,6 +23185,27 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
> rtx op1,
>   machine_mode omode = insn_data[icode].operand[0].mode;
>   machine_mode imode = insn_data[icode].operand[1].mode;
>
> + rtx perm_idx = GEN_INT (0);
> + if (icode == CODE_FOR_vsx_xxpermdi_v16qi)
> +   {
> + int perm_val = 0;
> + if (one_vec)
> +   {
> + if (perm[0] == 8)
> +   perm_val |= 2;
> + if (perm[8] == 8)
> +   perm_val |= 1;
> +   }
> + else
> +   {
> + if (perm[0] != 0)
> +   perm_val |= 2;
> + if (perm[8] != 16)
> +   perm_val |= 1;
> +   }
> + perm_idx = GEN_INT (perm_val);
> +   }
> +
>   /* For little-endian, don't use vpkuwum and vpkuhum if the
>  underlying vector type is not V4SI and V8HI, respectively.
>  For example, using vpkuwum with a V8HI picks up the even
> @@ -23192,7 +23229,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
> rtx op1,
>/* For little-endian, the two input operands must be swapped
>   (or swapped back) to ensure proper right-to-left numbering
>   from 0 to 2N-1.  */
> - if (swapped ^ !BYTES_BIG_ENDIAN)
> + if (swapped ^ !BYTES_BIG_ENDIAN
> + && icode != CODE_FOR_vsx_xxpermdi_v16qi)
> std::swap (op0, op1);
>   if (imode != V16QImode)
> {
> @@ -23203,7 +23241,10 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
> rtx op1,
> x = target;
>   else
> x = gen_reg_rtx (omode);
> - emit_insn (GEN_FCN (icode) (x, op0, op1));
> + if (icode == CODE_FOR_vsx_xxpermdi_v16qi)
> +   emit_insn (GEN_FCN (icode) (x, op0, op1, perm_idx));
> + else
> +   emit_insn (GEN_FCN (icode) (x, op0, op1));
>   if (omode != V16QImode)
> emit_move_insn (target, gen_lowpart (V16QImode, x));
>   return true;
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102868.c 
> b/gcc/testsuite/gcc.target/powerpc/pr102868.c
> new file mode 100644
> index 000..eb45d193f66
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102868.c
> @@ -0,0 +1,53 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +#include 
> +vector float b = {0.0f, 0.0f, 0.0f, 0.0f};
> +
> +
> +vector float foo1 (vector float x)
> +{
> +  vector int c = {0, 1, 

Re: [PATCH 4/4] ipa-cp: Select saner profile count to base heuristics on

2021-10-27 Thread Martin Jambor
Hi,

On Mon, Oct 18 2021, Martin Jambor wrote:
>
[...]
>
>
> This is a follow-up small patch to address Honza's review of my
> previous patch to select saner profile count to base heuristics on.
> Currently the IPA-CP heuristics switch to PGO-mode only if there are
> PGO counters available for any part of the call graph.  This change
> makes it to switch to the PGO mode only if any of the incoming edges
> bringing in the constant in question had any ipa-quality counts on
> them.  Consequently, if a part of the program is built with
> -fprofile-use and another part without, IPA-CP will use
> estimated-frequency-based heuristics for the latter.
>
> I still wonder whether this should only happen with
> flag_profile_partial_training on.  It seems like we're behaving as if
> it was always on.
>

Honza approved this patch in a private conversation and so I have pushed
it to master as commit ab810952eb7c061e37054ddd1dfe0aa033365131.

Thanks,

Martin



[PATCH] middle-end/57245 - honor -frounding-math in real truncation

2021-10-27 Thread Richard Biener via Gcc-patches
The following honors -frounding-math when converting a FP constant
to another FP type.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

I wonder what a good way to test this portably would be; the bug report
unfortunately didn't contain anything executable and I don't see
much -frounding-math test coverage to copy from.

Thanks,
Richard.

2021-10-27  Richard Biener  

PR middle-end/57245
* fold-const.c (fold_convert_const_real_from_real): Honor
-frounding-math if the conversion is not exact.
---
 gcc/fold-const.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index ff23f12f33c..c7aebf9cc7e 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -2139,6 +2139,12 @@ fold_convert_const_real_from_real (tree type, const_tree 
arg1)
   && REAL_VALUE_ISSIGNALING_NAN (TREE_REAL_CST (arg1)))
 return NULL_TREE; 
 
+  /* With flag_rounding_math we shuld respect the current rounding mode
+ unless the conversion is exact.  */
+  if (HONOR_SIGN_DEPENDENT_ROUNDING (arg1)
+  && !exact_real_truncate (TYPE_MODE (type), &TREE_REAL_CST (arg1)))
+return NULL_TREE;
+
   real_convert (&value, TYPE_MODE (type), &TREE_REAL_CST (arg1));
   t = build_real (type, value);
 
-- 
2.31.1


Re: [PATCH 4/4] ipa-cp: Select saner profile count to base heuristics on

2021-10-27 Thread Martin Jambor
Hi,

On Mon, Aug 23 2021, Martin Jambor wrote:
> When profile feedback is available, IPA-CP takes the count of the
> hottest node and then evaluates all call contexts relative to it.
> This means that typically almost no clones for specialized contexts
> are ever created because the maximum is some special function, called
> from everywhere (that is likely to get inlined anyway) and all the
> examined edges look cold compared to it.
>
> This patch changes the selection.  It simply sorts counts of all edges
> eligible for cloning in a vector and then picks the count in 90th
> percentile (the actual number is configurable via a parameter).
>
> I also tried more complex approaches which were summing the counts and
> picking the edge which together with all hotter edges accounted for a
> given portion of the total sum of all edge counts.  But first, it was
> not clear to me that they make more logical sense than the
> simple method, and in practice I always also had to ignore a few
> percent of the hottest edges with really extreme counts (looking at
> bash and python).  And when I had to do that anyway, it seemed simpler
> to just "ignore" more and take the first non-ignored count as the
> base.
>
> Nevertheless, if people think some more sophisticated method should be
> used anyway, I am willing to be persuaded.  But this patch is a clear
> improvement over the current situation.
>
> gcc/ChangeLog:
>
> 2021-08-23  Martin Jambor  
>
>   * params.opt (param_ipa_cp_profile_count_base): New parameter.
>   * ipa-cp.c (max_count): Replace with base_count, replace all
>   occurrences too, unless otherwise stated.
>   (ipcp_cloning_candidate_p): identify mostly-directly called
>   functions based on their counts, not max_count.
>   (compare_edge_profile_counts): New function.
>   (ipcp_propagate_stage): Instead of setting max_count, find the
>   appropriate edge count in a sorted vector of counts of eligible
>   edges and make it the base_count.

Honza approved this patch in a private conversation but then I noticed I
forgot to add an entry for the new parameter into invoke.texi, so I
fixed that problem (and checked the result with make info and make pdf)
and pushed the patch to master as commit
ab1008255e37b5b51a433ed69e04c06300543799.

Thanks,

Martin
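
The selection described in the quoted patch description can be sketched
roughly as follows.  This is only an illustration in plain C, not the ipa-cp.c
code: the function names are made up, the counts are plain integers rather
than profile_count, and the nearest-rank rule is an assumption about how the
percentile is taken.

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: ascending order of 64-bit counts.  */
static int
compare_counts (const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *) a;
  uint64_t y = *(const uint64_t *) b;
  return (x > y) - (x < y);
}

/* Sorts COUNTS (N entries, N > 0) in ascending order in place and returns
   the entry at the given PERCENTILE (1..100) using the nearest-rank rule.  */
uint64_t
pick_base_count (uint64_t *counts, size_t n, unsigned percentile)
{
  size_t rank;

  qsort (counts, n, sizeof counts[0], compare_counts);
  rank = (n * percentile + 99) / 100;   /* ceil (n * percentile / 100) */
  if (rank < 1)
    rank = 1;
  if (rank > n)
    rank = n;
  return counts[rank - 1];
}
```

With ten edge counts of which one is extreme, the 90th percentile lands on
the hottest ordinary edge rather than on the outlier, which is the effect
the description is after.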


Re: Ping^3: [PATCH v2 0/2] Fix vec_sel code generation and merge xxsel to vsel

2021-10-27 Thread David Edelsohn via Gcc-patches
This patch series is okay.

Thanks, David

On Thu, Oct 21, 2021 at 11:25 PM Xionghu Luo  wrote:
>
> Ping^3, thanks.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579637.html
>
>
> On 2021/10/15 14:28, Xionghu Luo via Gcc-patches wrote:
> > Ping^2, thanks.
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579637.html
> >
> >
> > On 2021/10/8 09:17, Xionghu Luo via Gcc-patches wrote:
> >> Ping, thanks.
> >>
> >>
> >> On 2021/9/17 13:25, Xionghu Luo wrote:
> >>> These two patches are updated version from:
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579490.html
> >>>
> >>> Changes:
> >>> 1. Fix alignment error in md files.
> >>> 2. Replace rtx_equal_p with match_dup.
> >>> 3. Use register_operand instead of gpc_reg_operand to align with
> >>>vperm/xxperm.
> >>> 4. Regression tested pass on P8LE.
> >>>
> >>> Xionghu Luo (2):
> >>>   rs6000: Fix wrong code generation for vec_sel [PR94613]
> >>>   rs6000: Fold xxsel to vsel since they have same semantics
> >>>
> >>>  gcc/config/rs6000/altivec.md  | 84 ++-
> >>>  gcc/config/rs6000/rs6000-call.c   | 62 ++
> >>>  gcc/config/rs6000/rs6000.c| 19 ++---
> >>>  gcc/config/rs6000/vector.md   | 26 +++---
> >>>  gcc/config/rs6000/vsx.md  | 25 --
> >>>  gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
> >>>  gcc/testsuite/gcc.target/powerpc/pr94613.c| 47 +++
> >>>  7 files changed, 193 insertions(+), 72 deletions(-)
> >>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr94613.c
> >>>
> >>
> >
>
> --
> Thanks,
> Xionghu


Re: [PATCH 3/4] ipa-cp: Fix updating of profile counts and self-gen value evaluation

2021-10-27 Thread Martin Jambor
On Mon, Oct 18 2021, Martin Jambor wrote:
>
[...]
>
> IPA-CP does not do a reasonable job when it is updating profile counts
> after it has created clones of recursive functions.  This patch
> addresses that by:
>
> 1. Only updating counts for special-context clones.  When a clone is
> created for all contexts, the original is going to be dead and the
> cgraph machinery has copied counts to the new node which is the right
> thing to do.  Therefore updating counts has been moved from
> create_specialized_node to decide_about_value and
> decide_whether_version_node.
>
> 2. The current profile updating code artificially increased the assumed
> old count when the sum of counts of incoming edges to both the
> original and new node were bigger than the count of the original
> node.  This always happened when self-recursive edge from the clone
> was also redirected to the clone because both the original edge and
> its clone had original high counts.  This crutch was removed and
> replaced by the next point.
>
> 3. When cloning also redirects a self-recursive clone to the clone
> itself, new logic has been added to divide the counts brought by such
> recursive edges between the original node and the clone.  This is
> impossible to do well without special knowledge about the function and
> which non-recursive entry calls are responsible for what portion of
> recursion depth, so the approach taken is rather crude.
>
> For local nodes, we detect the case when the original node is never
> called (in the training run at least) with another value and if so,
> steal all its counts like if it was dead.  If that is not the case, we
> try to divide the count brought by recursive edges (or rather not
> brought by direct edges) proportionally to the counts brought by
> non-recursive edges - but with artificial limits in place so that we
> do not take too many or too few, because that was happening with
> detrimental effect in mcf_r.
>
> 4. When cloning creates extra clones for values brought by a formerly
> self-recursive edge with an arithmetic pass-through jump function on
> it, such as it does in exchange2_r, all such clones are processed at
> once rather than one after another.  The counts of all such nodes are
> distributed evenly (modulo even-formerly-non-recursive-edges) and the
> whole situation is then fixed up so that the edge counts fit.  This is
> what new function update_counts_for_self_gen_clones does.
>
> 5. Values brought by a formerly self-recursive edge with an
> arithmetic pass-through jump function on it are evaluated by a
> heuristic which assumes that the vast majority of node counts are the
> result of recursive calls, and so we simply divide those by the number
> of clones there would be if we created another one.
>
> 6. The mechanisms in init_caller_stats, gather_caller_stats and
> get_info_about_necessary_edges were enhanced to gather the data required
> for the above, and a missing check not to count dead incoming edges was
> also added.
>
> gcc/ChangeLog:
>
> 2021-10-15  Martin Jambor  
>
>   * ipa-cp.c (struct caller_statistics): New fields rec_count_sum,
>   n_nonrec_calls and itself, document all fields.
>   (init_caller_stats): Initialize the above new fields.
>   (gather_caller_stats): Gather self-recursive counts and calls number.
>   (get_info_about_necessary_edges): Gather counts of self-recursive and
>   other edges bringing in the requested value separately.
>   (dump_profile_updates): Rework to dump info about a single node only.
>   (lenient_count_portion_handling): New function.
>   (struct gather_other_count_struct): New type.
>   (gather_count_of_non_rec_edges): New function.
>   (struct desc_incoming_count_struct): New type.
>   (analyze_clone_icoming_counts): New function.
>   (adjust_clone_incoming_counts): Likewise.
>   (update_counts_for_self_gen_clones): Likewise.
>   (update_profiling_info): Rewritten.
>   (update_specialized_profile): Adjust call to dump_profile_updates.
>   (create_specialized_node): Do not update profiling info.
>   (decide_about_value): New parameter self_gen_clones, either push new
>   clones into it or update their profile counts.  For self-recursively
>   generated values, use a portion of the node count instead of count
>   from self-recursive edges to estimate goodness.
>   (decide_whether_version_node): Gather clones for self-generated values
>   in a new vector, update their profiles at once at the end.


Honza approved the patch in a private conversation and I have pushed it
to master as commit d1e2e4f9ce4df50564f1244dcea9befc3066faa8.

Thanks,

Martin
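
To make point 3 of the quoted description a bit more concrete, here is a
very rough sketch of a proportional split with limits.  It is not the code
in ipa-cp.c: the function name, the use of plain integers instead of
profile_count, and the 10%/90% bounds are all assumptions made up for
illustration.

```c
#include <stdint.h>

/* Hypothetical bounds on the share of recursive counts a clone may get.  */
#define MIN_SHARE_PERCENT 10
#define MAX_SHARE_PERCENT 90

/* Returns the part of REC_COUNT (the count not brought by direct edges)
   attributed to the clone, proportionally to the counts brought by
   non-recursive edges into the clone (CLONE_NONREC) and into the
   original node (ORIG_NONREC), clamped to the bounds above.  Assumes
   CLONE_NONREC * 100 does not overflow.  */
uint64_t
clone_share_of_recursive_count (uint64_t rec_count, uint64_t clone_nonrec,
				uint64_t orig_nonrec)
{
  uint64_t total = clone_nonrec + orig_nonrec;
  uint64_t percent;

  if (total == 0)
    percent = 50;			/* no information: split evenly */
  else
    percent = clone_nonrec * 100 / total;

  if (percent < MIN_SHARE_PERCENT)
    percent = MIN_SHARE_PERCENT;
  if (percent > MAX_SHARE_PERCENT)
    percent = MAX_SHARE_PERCENT;

  /* Scale in two steps to avoid overflowing REC_COUNT * PERCENT.  */
  return rec_count / 100 * percent + rec_count % 100 * percent / 100;
}
```

The clamping is what keeps a clone from taking nearly all or nearly none of
the recursive counts when the non-recursive entry counts are very lopsided.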


Re: [PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]

2021-10-27 Thread David Edelsohn via Gcc-patches
On Sun, Oct 24, 2021 at 11:04 PM Kewen.Lin  wrote:
>
> Hi,
>
> As PR102767 shows, the commit r12-3482 exposed one ICE in function
> rs6000_builtin_vectorization_cost.  We claim V1TI supports movmisalign
> on rs6000 (See define_expand "movmisalign"), so it returns true in
> rs6000_builtin_support_vector_misalignment for misalign 8.  Later in
> the cost querying rs6000_builtin_vectorization_cost, we don't have
> the arms to handle the V1TI input under (TARGET_VSX &&
> TARGET_ALLOW_MOVMISALIGN).
>
> The proposed fix is to handle V1TI as well, simply giving it the same
> cost as the doubleword case.  That cost is evidently higher than the
> scalar cost, so vectorization will not happen; the change only keeps
> things consistent and avoids the ICE.  Another thought is to not support
> movmisalign for V1TI, but that sounds like a bad idea since it doesn't
> match reality.
>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> PR target/102767
> * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Consider
> V1TI mode for unaligned load and store.
>
> gcc/testsuite/ChangeLog:
>
> PR target/102767
> * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index b7ea1483da5..73d3e06c3fc 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum 
> vect_cost_for_stmt type_of_cost,
> if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
>   {
> elements = TYPE_VECTOR_SUBPARTS (vectype);
> -   if (elements == 2)
> +   /* See PR102767, consider V1TI to keep consistency.  */
> +   if (elements == 2 || elements == 1)
>   /* Double word aligned.  */
>   return 4;
>
> @@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum 
> vect_cost_for_stmt type_of_cost,
>
>  if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
>{
> -elements = TYPE_VECTOR_SUBPARTS (vectype);
> -if (elements == 2)
> -  /* Double word aligned.  */
> -  return 2;
> +   elements = TYPE_VECTOR_SUBPARTS (vectype);
> +   /* See PR102767, consider V1TI to keep consistency.  */
> +   if (elements == 2 || elements == 1)
> + /* Double word aligned.  */
> + return 2;

This section of the patch incorrectly changes the indentation.  Please
use the correct indentation.

>
>  if (elements == 4)
>{
> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 
> b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
> new file mode 100644
> index 000..a4122482989
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
> @@ -0,0 +1,21 @@
> +! { dg-require-effective-target powerpc_vsx_ok }
> +! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" }
> +
> +INTERFACE
> +  FUNCTION elemental_mult (a, b, c)
> +type(*), DIMENSION(..) :: a, b, c
> +  END
> +END INTERFACE
> +
> +allocatable  z
> +integer, dimension(2,2) :: a, b
> +call test_CFI_address
> +contains
> +  subroutine test_CFI_address
> +if (elemental_mult (z, x, y) .ne. 0) stop
> +a = reshape ([4,3,2,1], [2,2])
> +b = reshape ([2,3,4,5], [2,2])
> +if (elemental_mult (i, a, b) .ne. 0) stop
> +  end
> +end
> +
>

The patch is okay with the indentation correction.

Thanks, David


Re: [RFC] Don't move cold code out of loop by checking bb count

2021-10-27 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> On 2021/9/28 20:09, Richard Biener wrote:
> > On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo  wrote:
> >>
> >> Update the patch to v3, not sure whether you prefer the paste style
> >> and continue to link the previous thread as Segher dislikes this...
> >>
> >>
> >> [PATCH v3] Don't move cold code out of loop by checking bb count
> >>
> >>
> >> Changes:
> >> 1. Handle max_loop in determine_max_movement instead of
> >> outermost_invariant_loop.
> >> 2. Remove unnecessary changes.
> >> 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in 
> >> can_sm_ref_p.
> >> 4. "gsi_next (&bsi);" in move_computations_worker is kept: leaving it out
> >> caused an infinite loop when implementing v1, because the iterator
> >> update was actually missing.
> >>
> >> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
> >> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
> >>
> >> There was a patch trying to avoid moving cold blocks out of loops:
> >>
> >> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
> >>
> >> Richard suggested to "never hoist anything from a bb with lower execution
> >> frequency to a bb with higher one in LIM invariantness_dom_walker
> >> before_dom_children".
> >>
> >> In gimple LIM analysis, add find_coldest_out_loop to pick the target
> >> loop for an invariant: if the profile count of the loop bb is colder
> >> than the target loop's preheader, the statement is not hoisted out of
> >> the loop.  Likewise for store motion: if all locations of the REF in
> >> the loop are cold, don't do store motion on it.
> >>
> >> SPEC2017 performance evaluation shows a 1% performance improvement for
> >> the intrate GEOMEAN and no obvious regression for the others.  Notably,
> >> 500.perlbench_r improves by 7.52% (perf shows function S_regtry of
> >> perlbench is largely improved), 548.exchange2_r by 1.98% and
> >> 526.blender_r by 1.00% on P8LE.
> >>
> >> gcc/ChangeLog:
> >>
> >> * loop-invariant.c (find_invariants_bb): Check profile count
> >> before motion.
> >> (find_invariants_body): Add argument.
> >> * tree-ssa-loop-im.c (find_coldest_out_loop): New function.
> >> (determine_max_movement): Use find_coldest_out_loop.
> >> (move_computations_worker): Adjust and fix iteration update.
> >> (execute_sm_exit): Check pointer validity.
> >> (class ref_in_loop_hot_body): New functor.
> >> (ref_in_loop_hot_body::operator): New.
> >> (can_sm_ref_p): Use for_all_locs_in_loop.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.dg/tree-ssa/recip-3.c: Adjust.
> >> * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
> >> * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> >> * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
> >> ---
> >>  gcc/loop-invariant.c   | 10 ++--
> >>  gcc/tree-ssa-loop-im.c | 61 --
> >>  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c|  2 +-
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 +
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++
> >>  7 files changed, 165 insertions(+), 8 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
> >>
> >> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> >> index fca0c2b24be..5c3be7bf0eb 100644
> >> --- a/gcc/loop-invariant.c
> >> +++ b/gcc/loop-invariant.c
> >> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool 
> >> always_reached, bool always_executed)
> >> call.  */
> >>
> >>  static void
> >> -find_invariants_bb (basic_block bb, bool always_reached, bool 
> >> always_executed)
> >> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
> >> +   bool always_executed)
> >>  {
> >>rtx_insn *insn;
> >> +  basic_block preheader = loop_preheader_edge (loop)->src;
> >> +
> >> +  if (preheader->count > bb->count)
> >> +return;
> >>
> >>FOR_BB_INSNS (bb, insn)
> >>  {
> >> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block 
> >> *body,
> >>unsigned i;
> >>
> >>for (i = 0; i < loop->num_nodes; i++)
> >> -find_invariants_bb (body[i],
> >> -   bitmap_bit_p (always_reached, i),
> >> +find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
> >> bitmap_bit_p (always_executed, i));
> >>  }
> >>
> >> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> >> index 4b187c2cdaf..655fab03442 100644
> >> --- a/gcc/tree-ssa-loop-im.c
> >> +++ b/gcc/tree-ssa-loop-im.c
> >> @@ -417,6 +417,28 @@ movement_possibility (gimple *stmt)
> >>return ret;
> >>  }
> >>
> >> 
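
The hoisting rule described in the quoted patch can be sketched as a toy model.  This is a simplified illustration, not code from the patch: LoopLevel, pick_hoist_target and the plain integer counts are names and types assumed for the example, and the real pass must also prove that the statement is invariant in every loop it leaves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a loop in a nest; only the profile count of
// its preheader matters for this sketch.
struct LoopLevel
{
  std::uint64_t preheader_count;
};

// NEST lists the enclosing loops from innermost to outermost.  Return the
// index of the loop whose preheader should receive a statement that sits
// in a block executed BB_COUNT times, or -1 to leave it where it is.
// A preheader qualifies only if it is not hotter than the best place
// found so far, so the statement never moves to a more frequently
// executed position.
int
pick_hoist_target (const std::vector<LoopLevel> &nest, std::uint64_t bb_count)
{
  int best = -1;
  std::uint64_t best_count = bb_count;
  for (std::size_t i = 0; i < nest.size (); ++i)
    if (nest[i].preheader_count <= best_count)
      {
        best = static_cast<int> (i);
        best_count = nest[i].preheader_count;
      }
  return best;
}
```

With preheader counts of 100 (inner) and 10 (outer), a block executed 1000 times is moved to the outer preheader, while a block executed only 5 times stays put because both preheaders are hotter than it.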

[PATCH] MAINTAINERS: Clarify the policy WRT the Write After Approval list

2021-10-27 Thread Maciej W. Rozycki
* MAINTAINERS: Clarify the policy WRT the Write After Approval
list.
---
On Tue, 26 Oct 2021, Jeff Law wrote:

> > >   It seems like there's been hardly any discussion about this matter
> > > around
> > > the time this stuff was added with commit bddcac9d1c32 ("[contrib] Add
> > > contrib/maintainers-verify.sh").  What was the actual motivation behind
> > > that change?
> > That was only addition of a script and testcase to verify what has been done
> > in MAINTAINERS since forever.
> > Just look at all the commits to remove redundant entries from Write After
> > Approval, e.g.
> > https://gcc.gnu.org/legacy-ml/gcc-patches/2003-05/msg00366.html
> > All maintainers or reviewers (global or specific) have write after approval
> > rights for areas they don't maintain.
> I went ahead and fixed Maciej's entries in the obvious way.

 Thanks.  It did not occur to me that we had such a policy in place even 
before said commit, and now that I can see it is the case it seems to me 
like it has been a recurring problem with people not being aware of it, 
and I can hardly imagine anyone running the test suite for a MAINTAINERS 
file update.

 So while I maintain my concerns about the policy itself, how about this 
change, so that at least it's written down somewhere other than mailing 
list archives only?

  Maciej
---
 MAINTAINERS | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index fe56b2f647e..1471f53d30b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15,6 +15,9 @@ To report problems in GCC, please visit:
 
   http://gcc.gnu.org/bugs/
 
+Note: when adding someone to a more specific section please remove any
+corresponding entry from the Write After Approval list.
+
 Maintainers
 ===
 
-- 
2.11.0
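
The policy in the note above amounts to a simple duplicate check.  A minimal sketch follows, assuming the names have already been extracted from the file into containers; redundant_write_after_approval is a hypothetical helper for illustration and is not the verification script under contrib/.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Return the names that appear both in a more specific section and in the
// Write After Approval list, in the order they appear in the latter.
// Parsing MAINTAINERS itself is deliberately left out of this sketch.
std::vector<std::string>
redundant_write_after_approval (const std::set<std::string> &specific,
                                const std::vector<std::string> &write_after)
{
  std::vector<std::string> redundant;
  for (const std::string &name : write_after)
    if (specific.count (name) != 0)
      redundant.push_back (name);
  return redundant;
}
```

Any name the function returns is an entry that the note asks to be removed from the Write After Approval list.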


[PATCH] First refactor of vect_analyze_loop

2021-10-27 Thread Richard Biener via Gcc-patches
This refactors the main loop analysis part in vect_analyze_loop,
re-purposing the existing vect_reanalyze_as_main_loop for this
to reduce code duplication.  Failure flow is a bit tricky since
we want to extract info from the analyzed loop but I wanted to
share the destruction part.  Thus I add some std::function and
lambda to funnel post-analysis for the case we want that
(when analyzing from the main iteration but not when re-analyzing
an epilogue as main).

I realize this probably doesn't help the unroll case yet, but it
looked like an improvement.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2021-10-27  Richard Biener  

* tree-vect-loop.c: Include <functional>.
(vect_reanalyze_as_main_loop): Rename to...
(vect_analyze_loop_1): ... this and generalize to be
able to use it twice ...
(vect_analyze_loop): ... here.
---
 gcc/tree-vect-loop.c | 202 ++-
 1 file changed, 102 insertions(+), 100 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 961c1623f81..9a62475a69f 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -2898,43 +2899,63 @@ vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
   return true;
 }
 
-/* If LOOP_VINFO is already a main loop, return it unmodified.  Otherwise
-   try to reanalyze it as a main loop.  Return the loop_vinfo on success
-   and null on failure.  */
+/* Analyze LOOP with VECTOR_MODE and as epilogue if MAIN_LOOP_VINFO is
+   not NULL.  Process the analyzed loop with PROCESS even if analysis
+   failed.  Sets *N_STMTS and FATAL according to the analysis.
+   Return the loop_vinfo on success and wrapped null on failure.  */
 
-static loop_vec_info
-vect_reanalyze_as_main_loop (loop_vec_info loop_vinfo, unsigned int *n_stmts)
+static opt_loop_vec_info
+vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
+machine_mode vector_mode, loop_vec_info main_loop_vinfo,
+unsigned int *n_stmts, bool &fatal,
+std::function process = nullptr)
 {
-  if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo))
-return loop_vinfo;
+  /* Check the CFG characteristics of the loop (nesting, entry/exit).  */
+  opt_loop_vec_info loop_vinfo = vect_analyze_loop_form (loop, shared);
+  if (!loop_vinfo)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"bad loop form.\n");
+  gcc_checking_assert (main_loop_vinfo == NULL);
+  return loop_vinfo;
+}
+  loop_vinfo->vector_mode = vector_mode;
 
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
-"* Reanalyzing as a main loop with vector mode %s\n",
-GET_MODE_NAME (loop_vinfo->vector_mode));
+  if (main_loop_vinfo)
+LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = main_loop_vinfo;
 
-  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  vec_info_shared *shared = loop_vinfo->shared;
-  opt_loop_vec_info main_loop_vinfo = vect_analyze_loop_form (loop, shared);
-  gcc_assert (main_loop_vinfo);
+  /* Run the main analysis.  */
+  fatal = false;
+  opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, n_stmts);
+  loop->aux = NULL;
 
-  main_loop_vinfo->vector_mode = loop_vinfo->vector_mode;
+  /* Process info before we destroy loop_vinfo upon analysis failure.  */
+  if (process)
+process (loop_vinfo);
 
-  bool fatal = false;
-  bool res = vect_analyze_loop_2 (main_loop_vinfo, fatal, n_stmts);
-  loop->aux = NULL;
-  if (!res)
+  if (dump_enabled_p ())
 {
-  if (dump_enabled_p ())
+  if (res)
dump_printf_loc (MSG_NOTE, vect_location,
-"* Failed to analyze main loop with vector"
-" mode %s\n",
+"* Analysis succeeded with vector mode %s\n",
 GET_MODE_NAME (loop_vinfo->vector_mode));
-  delete main_loop_vinfo;
-  return NULL;
+  else
+   dump_printf_loc (MSG_NOTE, vect_location,
+"* Analysis failed with vector mode %s\n",
+GET_MODE_NAME (loop_vinfo->vector_mode));
+}
+
+  if (!res)
+{
+  delete loop_vinfo;
+  if (fatal)
+   gcc_checking_assert (main_loop_vinfo == NULL);
+  return opt_loop_vec_info::propagate_failure (res);
 }
-  LOOP_VINFO_VECTORIZABLE_P (main_loop_vinfo) = 1;
-  return main_loop_vinfo;
+
+  LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
+  return loop_vinfo;
 }
 
 /* Function vect_analyze_loop.
@@ -2981,20 +3002,6 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
*shared)
   unsigned HOST_WIDE_INT simdlen = loop->simdlen;
   while (1)
 {
-  /* 
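
The failure flow described in the cover letter (let a caller-supplied callback inspect the analysis result before a failed result is destroyed) can be sketched outside of GCC as follows.  Analysis, analyze_once and the "even modes succeed" rule are assumptions made up for this illustration; only the optional std::function parameter defaulting to nullptr mirrors the patch.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical analysis result: the mode that was tried and whether the
// analysis succeeded.
struct Analysis
{
  int mode;
  bool ok;
};

// Analyze with MODE.  PROCESS, if given, sees the result even when the
// analysis failed, before the failed result is thrown away.  Returns the
// result on success and a null pointer on failure.
std::unique_ptr<Analysis>
analyze_once (int mode,
              std::function<void (const Analysis &)> process = nullptr)
{
  // Stand-in for the real analysis: pretend only even modes succeed.
  std::unique_ptr<Analysis> result (new Analysis{mode, mode % 2 == 0});

  // Process info before the result is destroyed upon failure.
  if (process)
    process (*result);

  if (!result->ok)
    return nullptr;
  return result;
}
```

A caller that wants post-analysis information passes a lambda; a caller that does not (the re-analysis case in the cover letter) simply omits the argument.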

Re: [V2/PATCH] Fix tree-optimization/102216: missed optimization causing Warray-bounds

2021-10-27 Thread Richard Biener via Gcc-patches
On Wed, Oct 27, 2021 at 12:00 PM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> The problem here is that tree-ssa-forwprop.c likes to produce
> &MEM  [(void *)_4 + 152B], which is the same as
> _4 p+ 152, the form the rest of GCC handles better.
> This patch transforms the address back into a pointer plus to
> improve code generation later on.

Why do you think so?  Can you pin-point the transform that now
fixes the new testcase?

Comments below

> OK? Bootstrapped and tested on aarch64-linux-gnu.
>
> Changes from v1:
> * v2: Add comments.
>
> gcc/ChangeLog:
>
> PR tree-optimization/102216
> * tree-ssa-forwprop.c (rewrite_assign_addr): New function.
> (forward_propagate_addr_expr_1): Use rewrite_assign_addr
> when rewriting the addr_expr into an assignment.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/102216
> * g++.dg/tree-ssa/pr102216.C: New test.
> ---
>  gcc/testsuite/g++.dg/tree-ssa/pr102216.C | 22 +
>  gcc/tree-ssa-forwprop.c  | 58 ++--
>  2 files changed, 67 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr102216.C
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr102216.C 
> b/gcc/testsuite/g++.dg/tree-ssa/pr102216.C
> new file mode 100644
> index 000..b903e4eb57d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr102216.C
> @@ -0,0 +1,22 @@
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +void link_error ();
> +void g ()
> +{
> +  const char **language_names;
> +
> +  language_names = new const char *[6];
> +
> +  const char **language_names_p = language_names;
> +
> +  language_names_p++;
> +  language_names_p++;
> +  language_names_p++;
> +
> +  if ( (language_names_p) - (language_names+3) != 0)
> +link_error();
> +  delete[] language_names;
> +}
> +/* We should have removed the link_error on the gimple level as GCC should
> +   be able to tell that language_names_p is the same as language_names+3.  */
> +/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" } } */
> +
> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
> index a830bab78ba..e4331c60525 100644
> --- a/gcc/tree-ssa-forwprop.c
> +++ b/gcc/tree-ssa-forwprop.c
> @@ -637,6 +637,47 @@ forward_propagate_into_cond (gimple_stmt_iterator *gsi_p)
>return 0;
>  }
>
> +/* Rewrite the DEF_RHS as needed into the (plain) use statement.  */
> +
> +static void
> +rewrite_assign_addr (gimple_stmt_iterator *use_stmt_gsi, tree def_rhs)
> +{
> +  tree def_rhs_base;
> +  poly_int64 def_rhs_offset;
> +
> +  /* Get the base and offset.  */
> +  if ((def_rhs_base = get_addr_base_and_unit_offset (TREE_OPERAND (def_rhs, 
> 0),
> +&def_rhs_offset)))

So this will cause us to rewrite &MEM[p_1].a.b.c; to a pointer-plus, right?
Don't we want to preserve that for object-size stuff?  So maybe directly
pattern match ADDR_EXPR > only?

> +{
> +  tree new_ptr;
> +  poly_offset_int off = 0;
> +
> +  /* If the base was a MEM, then add the offset to the other
> + offset and adjust the base. */
> +  if (TREE_CODE (def_rhs_base) == MEM_REF)
> +   {
> + off += mem_ref_offset (def_rhs_base);
> + new_ptr = TREE_OPERAND (def_rhs_base, 0);
> +   }
> +  else
> +   new_ptr = build_fold_addr_expr (def_rhs_base);
> +
> +  /* If we have the new base is not an address express, then use a p+ 
> expression
> + as the new expression instead of &MEM[x, offset]. */
> +  if (TREE_CODE (new_ptr) != ADDR_EXPR)
> +   {
> + tree offset = wide_int_to_tree (sizetype, off);
> + def_rhs = build2 (POINTER_PLUS_EXPR, TREE_TYPE (def_rhs), new_ptr, 
> offset);

Ick.  You should be able to use gimple_assign_set_rhs_with_ops.

> +   }
> +}
> +
> +  /* Replace the rhs with the new expression.  */
> +  def_rhs = unshare_expr (def_rhs);

and definitely no need to unshare anything here?

> +  gimple_assign_set_rhs_from_tree (use_stmt_gsi, def_rhs);
> +  gimple *use_stmt = gsi_stmt (*use_stmt_gsi);
> +  update_stmt (use_stmt);
> +}
> +
>  /* We've just substituted an ADDR_EXPR into stmt.  Update all the
> relevant data structures to match.  */
>
> @@ -696,8 +737,8 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,
>if (single_use_p
>   && useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (def_rhs)))
> {
> - gimple_assign_set_rhs1 (use_stmt, unshare_expr (def_rhs));
> - gimple_assign_set_rhs_code (use_stmt, TREE_CODE (def_rhs));
> + rewrite_assign_addr (use_stmt_gsi, def_rhs);
> + gcc_assert (gsi_stmt (*use_stmt_gsi) == use_stmt);
>   return true;
> }
>
> @@ -741,14 +782,7 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,
>if (forward_propagate_addr_expr (lhs, new_def_rhs, single_use_p))
> return true;
>
> -  if (useless_type_conversion_p (T

Re: [V2/PATCH] Fix tree-optimization/102216: missed optimization causing Warray-bounds

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
On 27 October 2021 11:59:58 CEST, apinski--- via Gcc-patches 
 wrote:
>From: Andrew Pinski 
>
>The problem here is that tree-ssa-forwprop.c likes to produce
>&MEM  [(void *)_4 + 152B], which is the same as
>_4 p+ 152, the form the rest of GCC handles better.
>This patch transforms the address back into a pointer plus to
>improve code generation later on.
>
>OK? Bootstrapped and tested on aarch64-linux-gnu.
>
>Changes from v1:
>* v2: Add comments.
>
>gcc/ChangeLog:
>
>   PR tree-optimization/102216
>   * tree-ssa-forwprop.c (rewrite_assign_addr): New function.
>   (forward_propagate_addr_expr_1): Use rewrite_assign_addr
>   when rewriting the addr_expr into an assignment.
>
>gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/102216
>   * g++.dg/tree-ssa/pr102216.C: New test.
>---
> gcc/testsuite/g++.dg/tree-ssa/pr102216.C | 22 +
> gcc/tree-ssa-forwprop.c  | 58 ++--
> 2 files changed, 67 insertions(+), 13 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr102216.C
>
>diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr102216.C 
>b/gcc/testsuite/g++.dg/tree-ssa/pr102216.C
>new file mode 100644
>index 000..b903e4eb57d
>--- /dev/null
>+++ b/gcc/testsuite/g++.dg/tree-ssa/pr102216.C
>@@ -0,0 +1,22 @@
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+void link_error ();
>+void g ()
>+{
>+  const char **language_names;
>+
>+  language_names = new const char *[6];
>+
>+  const char **language_names_p = language_names;
>+
>+  language_names_p++;
>+  language_names_p++;
>+  language_names_p++;
>+
>+  if ( (language_names_p) - (language_names+3) != 0)
>+link_error();
>+  delete[] language_names;
>+}
>+/* We should have removed the link_error on the gimple level as GCC should
>+   be able to tell that language_names_p is the same as language_names+3.  */
>+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" } } */
>+
>diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>index a830bab78ba..e4331c60525 100644
>--- a/gcc/tree-ssa-forwprop.c
>+++ b/gcc/tree-ssa-forwprop.c
>@@ -637,6 +637,47 @@ forward_propagate_into_cond (gimple_stmt_iterator *gsi_p)
>   return 0;
> }
> 
>+/* Rewrite the DEF_RHS as needed into the (plain) use statement.  */
>+
>+static void
>+rewrite_assign_addr (gimple_stmt_iterator *use_stmt_gsi, tree def_rhs)
>+{
>+  tree def_rhs_base;
>+  poly_int64 def_rhs_offset;
>+
>+  /* Get the base and offset.  */
>+  if ((def_rhs_base = get_addr_base_and_unit_offset (TREE_OPERAND (def_rhs, 
>0),
>+   &def_rhs_offset)))
>+{
>+  tree new_ptr;
>+  poly_offset_int off = 0;
>+
>+  /* If the base was a MEM, then add the offset to the other
>+ offset and adjust the base. */
>+  if (TREE_CODE (def_rhs_base) == MEM_REF)
>+  {
>+off += mem_ref_offset (def_rhs_base);
>+new_ptr = TREE_OPERAND (def_rhs_base, 0);
>+  }
>+  else
>+  new_ptr = build_fold_addr_expr (def_rhs_base);
>+
>+  /* If we have the new base is not an address express, then use a p+ 
>expression
>+ as the new expression instead of &MEM[x, offset]. */
>+  if (TREE_CODE (new_ptr) != ADDR_EXPR)
>+  {
>+tree offset = wide_int_to_tree (sizetype, off);
>+def_rhs = build2 (POINTER_PLUS_EXPR, TREE_TYPE (def_rhs), new_ptr, 
>offset);
>+  }
>+}
>+
>+  /* Replace the rhs with the new expression.  */
>+  def_rhs = unshare_expr (def_rhs);
>+  gimple_assign_set_rhs_from_tree (use_stmt_gsi, def_rhs);
>+  gimple *use_stmt = gsi_stmt (*use_stmt_gsi);
>+  update_stmt (use_stmt);
>+}
>+
> /* We've just substituted an ADDR_EXPR into stmt.  Update all the
>relevant data structures to match.  */
> 
>@@ -696,8 +737,8 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,
>   if (single_use_p
> && useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (def_rhs)))
>   {
>-gimple_assign_set_rhs1 (use_stmt, unshare_expr (def_rhs));
>-gimple_assign_set_rhs_code (use_stmt, TREE_CODE (def_rhs));
>+rewrite_assign_addr (use_stmt_gsi, def_rhs);
>+gcc_assert (gsi_stmt (*use_stmt_gsi) == use_stmt);
> return true;
>   }
> 
>@@ -741,14 +782,7 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,
>   if (forward_propagate_addr_expr (lhs, new_def_rhs, single_use_p))
>   return true;
> 
>-  if (useless_type_conversion_p (TREE_TYPE (lhs),
>-   TREE_TYPE (new_def_rhs)))
>-  gimple_assign_set_rhs_with_ops (use_stmt_gsi, TREE_CODE (new_def_rhs),
>-  new_def_rhs);
>-  else if (is_gimple_min_invariant (new_def_rhs))
>-  gimple_assign_set_rhs_with_ops (use_stmt_gsi, NOP_EXPR, new_def_rhs);
>-  else
>-  return false;
>+  rewrite_assign_addr (use_stmt_gsi, new_def_rhs);
>   gcc_assert (gsi_stmt (*use_stmt_gsi) == use_stmt);
>   update_stmt (use_stmt);

ISTM the abov

[V2/PATCH] Fix tree-optimization/102216: missed optimization causing Warray-bounds

2021-10-27 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem here is that tree-ssa-forwprop.c likes to produce
&MEM  [(void *)_4 + 152B], which is the same as
_4 p+ 152, the form the rest of GCC handles better.
This patch transforms the address back into a pointer plus to
improve code generation later on.

OK? Bootstrapped and tested on aarch64-linux-gnu.

Changes from v1:
* v2: Add comments.

gcc/ChangeLog:

PR tree-optimization/102216
* tree-ssa-forwprop.c (rewrite_assign_addr): New function.
(forward_propagate_addr_expr_1): Use rewrite_assign_addr
when rewriting the addr_expr into an assignment.

gcc/testsuite/ChangeLog:

PR tree-optimization/102216
* g++.dg/tree-ssa/pr102216.C: New test.
---
 gcc/testsuite/g++.dg/tree-ssa/pr102216.C | 22 +
 gcc/tree-ssa-forwprop.c  | 58 ++--
 2 files changed, 67 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr102216.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr102216.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr102216.C
new file mode 100644
index 000..b903e4eb57d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr102216.C
@@ -0,0 +1,22 @@
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+void link_error ();
+void g ()
+{
+  const char **language_names;
+
+  language_names = new const char *[6];
+
+  const char **language_names_p = language_names;
+
+  language_names_p++;
+  language_names_p++;
+  language_names_p++;
+
+  if ( (language_names_p) - (language_names+3) != 0)
+link_error();
+  delete[] language_names;
+}
+/* We should have removed the link_error on the gimple level as GCC should
+   be able to tell that language_names_p is the same as language_names+3.  */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" } } */
+
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index a830bab78ba..e4331c60525 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -637,6 +637,47 @@ forward_propagate_into_cond (gimple_stmt_iterator *gsi_p)
   return 0;
 }
 
+/* Rewrite the DEF_RHS as needed into the (plain) use statement.  */
+
+static void
+rewrite_assign_addr (gimple_stmt_iterator *use_stmt_gsi, tree def_rhs)
+{
+  tree def_rhs_base;
+  poly_int64 def_rhs_offset;
+
+  /* Get the base and offset.  */
+  if ((def_rhs_base = get_addr_base_and_unit_offset (TREE_OPERAND (def_rhs, 0),
+&def_rhs_offset)))
+{
+  tree new_ptr;
+  poly_offset_int off = 0;
+
+  /* If the base was a MEM, then add the offset to the other
+ offset and adjust the base. */
+  if (TREE_CODE (def_rhs_base) == MEM_REF)
+   {
+ off += mem_ref_offset (def_rhs_base);
+ new_ptr = TREE_OPERAND (def_rhs_base, 0);
+   }
+  else
+   new_ptr = build_fold_addr_expr (def_rhs_base);
+
+  /* If we have the new base is not an address express, then use a p+ 
expression
+ as the new expression instead of &MEM[x, offset]. */
+  if (TREE_CODE (new_ptr) != ADDR_EXPR)
+   {
+ tree offset = wide_int_to_tree (sizetype, off);
+ def_rhs = build2 (POINTER_PLUS_EXPR, TREE_TYPE (def_rhs), new_ptr, 
offset);
+   }
+}
+
+  /* Replace the rhs with the new expression.  */
+  def_rhs = unshare_expr (def_rhs);
+  gimple_assign_set_rhs_from_tree (use_stmt_gsi, def_rhs);
+  gimple *use_stmt = gsi_stmt (*use_stmt_gsi);
+  update_stmt (use_stmt);
+}
+
 /* We've just substituted an ADDR_EXPR into stmt.  Update all the
relevant data structures to match.  */
 
@@ -696,8 +737,8 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,
   if (single_use_p
  && useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (def_rhs)))
{
- gimple_assign_set_rhs1 (use_stmt, unshare_expr (def_rhs));
- gimple_assign_set_rhs_code (use_stmt, TREE_CODE (def_rhs));
+ rewrite_assign_addr (use_stmt_gsi, def_rhs);
+ gcc_assert (gsi_stmt (*use_stmt_gsi) == use_stmt);
  return true;
}
 
@@ -741,14 +782,7 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,
   if (forward_propagate_addr_expr (lhs, new_def_rhs, single_use_p))
return true;
 
-  if (useless_type_conversion_p (TREE_TYPE (lhs),
-TREE_TYPE (new_def_rhs)))
-   gimple_assign_set_rhs_with_ops (use_stmt_gsi, TREE_CODE (new_def_rhs),
-   new_def_rhs);
-  else if (is_gimple_min_invariant (new_def_rhs))
-   gimple_assign_set_rhs_with_ops (use_stmt_gsi, NOP_EXPR, new_def_rhs);
-  else
-   return false;
+  rewrite_assign_addr (use_stmt_gsi, new_def_rhs);
   gcc_assert (gsi_stmt (*use_stmt_gsi) == use_stmt);
   update_stmt (use_stmt);
   return true;
@@ -951,9 +985,7 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,
  unshare_expr (def_rhs),
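
The rewrite this patch aims for can be illustrated with a toy model that is independent of GCC's tree data structures.  AddrExpr, Rewritten and rewrite_addr are hypothetical names for this sketch, and folding the component offset into the sum is an assumption of the sketch rather than a statement about what the patch does.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified model of an address expression.
// base_is_pointer is true for &MEM[ptr + mem_offset]... forms and false
// when the base is a declared object such as a local array.
struct AddrExpr
{
  bool base_is_pointer;
  std::string base;
  long mem_offset;        // constant offset inside the MEM, 0 for a decl
  long component_offset;  // byte offset of the referenced component
};

struct Rewritten
{
  bool is_pointer_plus;   // true: "base p+ offset", false: address kept
  std::string base;
  long offset;
};

// A pointer-based address becomes "pointer p+ constant"; an address based
// on a declaration keeps its address form.
Rewritten
rewrite_addr (const AddrExpr &e)
{
  if (e.base_is_pointer)
    return Rewritten{true, e.base, e.mem_offset + e.component_offset};
  return Rewritten{false, e.base, e.component_offset};
}
```

In this model the address from the description, based on _4 with offset 152, becomes the pointer-plus form "_4 p+ 152", while the address of a component of a local object is left alone.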
 

[PATCH] Refactor try_vectorize_loop_1

2021-10-27 Thread Richard Biener via Gcc-patches
This refactors epilogue loop handling in try_vectorize_loop_1 so that it
no longer suggests the epilogues are analyzed there: the transform phase
is split out into its own function, which then handles the epilogues.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-27  Richard Biener  

* tree-vectorizer.c (vect_transform_loops): New function,
split out from ...
(try_vectorize_loop_1): ... here.  Simplify as epilogues
are now fully handled in the split part.
---
 gcc/tree-vectorizer.c | 105 --
 1 file changed, 50 insertions(+), 55 deletions(-)

diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 4712dc6e7f9..89fa883fbb9 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -979,6 +979,50 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
   free (bbs);
 }
 
+/* Generate vectorized code for LOOP and its epilogues.  */
+
+static void
+vect_transform_loops (hash_table *&simduid_to_vf_htab,
+ loop_p loop, gimple *loop_vectorized_call)
+{
+  loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop);
+
+  if (loop_vectorized_call)
+set_uid_loop_bbs (loop_vinfo, loop_vectorized_call);
+
+  unsigned HOST_WIDE_INT bytes;
+  if (dump_enabled_p ())
+{
+  if (GET_MODE_SIZE (loop_vinfo->vector_mode).is_constant (&bytes))
+   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
+"loop vectorized using %wu byte vectors\n", bytes);
+  else
+   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
+"loop vectorized using variable length vectors\n");
+}
+
+  loop_p new_loop = vect_transform_loop (loop_vinfo,
+loop_vectorized_call);
+  /* Now that the loop has been vectorized, allow it to be unrolled
+ etc.  */
+  loop->force_vectorize = false;
+
+  if (loop->simduid)
+{
+  simduid_to_vf *simduid_to_vf_data = XNEW (simduid_to_vf);
+  if (!simduid_to_vf_htab)
+   simduid_to_vf_htab = new hash_table (15);
+  simduid_to_vf_data->simduid = DECL_UID (loop->simduid);
+  simduid_to_vf_data->vf = loop_vinfo->vectorization_factor;
+  *simduid_to_vf_htab->find_slot (simduid_to_vf_data, INSERT)
+ = simduid_to_vf_data;
+}
+
+  /* Epilogue of vectorized loop must be vectorized too.  */
+  if (new_loop)
+vect_transform_loops (simduid_to_vf_htab, new_loop, NULL);
+}
+
 /* Try to vectorize LOOP.  */
 
 static unsigned
@@ -999,17 +1043,9 @@ try_vectorize_loop_1 (hash_table 
*&simduid_to_vf_htab,
 LOCATION_FILE (vect_location.get_location_t ()),
 LOCATION_LINE (vect_location.get_location_t ()));
 
-  opt_loop_vec_info loop_vinfo = opt_loop_vec_info::success (NULL);
-  /* In the case of epilogue vectorization the loop already has its
- loop_vec_info set, we do not require to analyze the loop in this case.  */
-  if (loop_vec_info vinfo = loop_vec_info_for_loop (loop))
-loop_vinfo = opt_loop_vec_info::success (vinfo);
-  else
-{
-  /* Try to analyze the loop, retaining an opt_problem if dump_enabled_p.  
*/
-  loop_vinfo = vect_analyze_loop (loop, &shared);
-  loop->aux = loop_vinfo;
-}
+  /* Try to analyze the loop, retaining an opt_problem if dump_enabled_p.  */
+  opt_loop_vec_info loop_vinfo = vect_analyze_loop (loop, &shared);
+  loop->aux = loop_vinfo;
 
   if (!loop_vinfo)
 if (dump_enabled_p ())
@@ -1083,8 +1119,7 @@ try_vectorize_loop_1 (hash_table 
*&simduid_to_vf_htab,
   return ret;
 }
 
-  /* Only count the original scalar loops.  */
-  if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo) && !dbg_cnt (vect_loop))
+  if (!dbg_cnt (vect_loop))
 {
   /* Free existing information if loop is analyzed with some
 assumptions.  */
@@ -1093,62 +1128,22 @@ try_vectorize_loop_1 (hash_table 
*&simduid_to_vf_htab,
   return ret;
 }
 
-  if (loop_vectorized_call)
-set_uid_loop_bbs (loop_vinfo, loop_vectorized_call);
-
-  unsigned HOST_WIDE_INT bytes;
-  if (dump_enabled_p ())
-{
-  if (GET_MODE_SIZE (loop_vinfo->vector_mode).is_constant (&bytes))
-   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
-"loop vectorized using %wu byte vectors\n", bytes);
-  else
-   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
-"loop vectorized using variable length vectors\n");
-}
-
-  loop_p new_loop = vect_transform_loop (loop_vinfo,
-loop_vectorized_call);
   (*num_vectorized_loops)++;
-  /* Now that the loop has been vectorized, allow it to be unrolled
- etc.  */
-  loop->force_vectorize = false;
-
-  if (loop->simduid)
-{
-  simduid_to_vf *simduid_to_vf_data = XNEW (simduid_to_vf);
-  if (!simduid_to_vf_htab)
-   simduid_to_vf_htab = new hash_table (15);
-  simduid_to_vf_data->simduid = DECL_UID (loop->simduid);
-  simduid_to
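
The shape of the split-out transform phase, which transforms a loop and then recurses into the epilogue it gets back, can be sketched with a toy model.  ToyLoop, transform_one and transform_all are hypothetical names for illustration; they only mimic the control flow of vect_transform_loops, not its effects.

```cpp
#include <cassert>

// Hypothetical loop record: an already-analyzed loop with an optional,
// already-analyzed epilogue.
struct ToyLoop
{
  int vf;             // vectorization factor chosen by analysis
  ToyLoop *epilogue;  // epilogue loop, or nullptr if there is none
  bool transformed;
};

// Transform one loop and hand back its epilogue, if any.
ToyLoop *
transform_one (ToyLoop &loop)
{
  loop.transformed = true;
  return loop.epilogue;
}

// Transform LOOP and then, recursively, every epilogue hanging off it.
// Returns the number of loops transformed.
int
transform_all (ToyLoop &loop)
{
  ToyLoop *next = transform_one (loop);
  return 1 + (next ? transform_all (*next) : 0);
}
```

Because the recursion handles epilogues by itself, the caller only ever starts from an original scalar loop, which matches the simplification of the dbg_cnt check in the diff above.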

[PATCH] Fix tree-optimization/102216: missed optimization causing Warray-bounds

2021-10-27 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem here is that tree-ssa-forwprop.c likes to produce
&MEM  [(void *)_4 + 152B], which is the same as
_4 p+ 152, the form the rest of GCC handles better.
This patch transforms the address back into a pointer plus to
improve code generation later on.

OK? Bootstrapped and tested on aarch64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/102216
* tree-ssa-forwprop.c (rewrite_assign_addr): New function.
(forward_propagate_addr_expr_1): Use rewrite_assign_addr
when rewriting the addr_expr into an assignment.

gcc/testsuite/ChangeLog:

PR tree-optimization/102216
* g++.dg/tree-ssa/pr102216.C: New test.
---
 gcc/testsuite/g++.dg/tree-ssa/pr102216.C | 22 
 gcc/tree-ssa-forwprop.c  | 46 +---
 2 files changed, 55 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr102216.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr102216.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr102216.C
new file mode 100644
index 000..b903e4eb57d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr102216.C
@@ -0,0 +1,22 @@
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+void link_error ();
+void g ()
+{
+  const char **language_names;
+
+  language_names = new const char *[6];
+
+  const char **language_names_p = language_names;
+
+  language_names_p++;
+  language_names_p++;
+  language_names_p++;
+
+  if ( (language_names_p) - (language_names+3) != 0)
+link_error();
+  delete[] language_names;
+}
+/* We should have removed the link_error on the gimple level as GCC should
+   be able to tell that language_names_p is the same as language_names+3.  */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" } } */
+
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index a830bab78ba..ba06bccdf75 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -637,6 +637,35 @@ forward_propagate_into_cond (gimple_stmt_iterator *gsi_p)
   return 0;
 }
 
+static void
+rewrite_assign_addr (gimple_stmt_iterator *use_stmt_gsi, tree def_rhs)
+{
+  tree def_rhs_base;
+  poly_int64 def_rhs_offset;
+  if ((def_rhs_base = get_addr_base_and_unit_offset (TREE_OPERAND (def_rhs, 0),
+&def_rhs_offset)))
+{
+  tree new_ptr;
+  poly_offset_int off = 0;
+  if (TREE_CODE (def_rhs_base) == MEM_REF)
+   {
+ off += mem_ref_offset (def_rhs_base);
+ new_ptr = TREE_OPERAND (def_rhs_base, 0);
+   }
+  else
+   new_ptr = build_fold_addr_expr (def_rhs_base);
+  if (TREE_CODE (new_ptr) != ADDR_EXPR)
+   {
+ tree offset = wide_int_to_tree (sizetype, off);
+ def_rhs = build2 (POINTER_PLUS_EXPR, TREE_TYPE (def_rhs), new_ptr, 
offset);
+   }
+}
+  def_rhs = unshare_expr (def_rhs);
+  gimple_assign_set_rhs_from_tree (use_stmt_gsi, def_rhs);
+  gimple *use_stmt = gsi_stmt (*use_stmt_gsi);
+  update_stmt (use_stmt);
+}
+
 /* We've just substituted an ADDR_EXPR into stmt.  Update all the
relevant data structures to match.  */
 
@@ -696,8 +725,8 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,
   if (single_use_p
  && useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (def_rhs)))
{
- gimple_assign_set_rhs1 (use_stmt, unshare_expr (def_rhs));
- gimple_assign_set_rhs_code (use_stmt, TREE_CODE (def_rhs));
+ rewrite_assign_addr (use_stmt_gsi, def_rhs);
+ gcc_assert (gsi_stmt (*use_stmt_gsi) == use_stmt);
  return true;
}
 
@@ -741,14 +770,7 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,
   if (forward_propagate_addr_expr (lhs, new_def_rhs, single_use_p))
return true;
 
-  if (useless_type_conversion_p (TREE_TYPE (lhs),
-TREE_TYPE (new_def_rhs)))
-   gimple_assign_set_rhs_with_ops (use_stmt_gsi, TREE_CODE (new_def_rhs),
-   new_def_rhs);
-  else if (is_gimple_min_invariant (new_def_rhs))
-   gimple_assign_set_rhs_with_ops (use_stmt_gsi, NOP_EXPR, new_def_rhs);
-  else
-   return false;
+  rewrite_assign_addr (use_stmt_gsi, new_def_rhs);
   gcc_assert (gsi_stmt (*use_stmt_gsi) == use_stmt);
   update_stmt (use_stmt);
   return true;
@@ -951,9 +973,7 @@ forward_propagate_addr_expr_1 (tree name, tree def_rhs,
  unshare_expr (def_rhs),
  fold_convert (ptr_type_node,
rhs2)));
-  gimple_assign_set_rhs_from_tree (use_stmt_gsi, new_rhs);
-  use_stmt = gsi_stmt (*use_stmt_gsi);
-  update_stmt (use_stmt);
+  rewrite_assign_addr (use_stmt_gsi, new_rhs);
   tidy_after_forward_propagate_addr (use_stmt);
   return true;
 }
-- 
2.17.1



Re: [committed] testsuite: Fix up gcc.dg/pr102897.c testcase [PR102897]

2021-10-27 Thread Kewen.Lin via Gcc-patches
Hi Jakub,

on 2021/10/27 3:51 PM, Jakub Jelinek wrote:
> On Tue, Oct 26, 2021 at 11:40:01AM +0800, Kewen.Lin via Gcc-patches wrote:
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.dg/pr102897.c: New test.
> 
> The testcase FAILs on i686-linux due to:
> FAIL: gcc.dg/pr102897.c (test for excess errors)
> Excess errors:
> .../gcc/gcc/testsuite/gcc.dg/pr102897.c:11:1: warning: MMX vector return without MMX enabled changes the ABI [-Wpsabi]
> .../gcc/gcc/testsuite/gcc.dg/pr102897.c:10:10: warning: MMX vector argument without MMX enabled changes the ABI [-Wpsabi]
> Fixed by adding -Wno-psabi.
> 
> Tested on x86_64-linux and i686-linux, committed to trunk as obvious.
> 

Thanks for fixing this up!

BR,
Kewen

> 2021-10-27  Jakub Jelinek  
> 
>   * gcc.dg/pr102897.c: Add -Wno-psabi to dg-options.
> 
> --- gcc/testsuite/gcc.dg/pr102897.c.jj  2021-10-27 09:00:28.848276246 +0200
> +++ gcc/testsuite/gcc.dg/pr102897.c   2021-10-27 09:40:45.628296807 +0200
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* Specify C99 to avoid the warning/error on compound literals.  */
> -/* { dg-options "-O1 -std=c99" } */
> +/* { dg-options "-O1 -std=c99 -Wno-psabi" } */
>  
>  /* Verify that there is no ICE.  */
>  
> 
> 
>   Jakub
>


[PATCH] print extended assertion failures to stderr

2021-10-27 Thread Jay Feldblum via Gcc-patches
From: yfeldblum 

The stdout stream is reserved for output intentionally produced by the
application. Assertion failures and other forms of logging must be
emitted to stderr, not to stdout.

It is common for testing and monitoring infrastructure to scan stderr
for errors, such as for assertion failures, and to collect or retain
them for analysis or observation. It is a norm that assertion failures
match this expectation in practice.

While `__builtin_fprintf` is available as a builtin, there is no
equivalent builtin for `stderr`. The only option in practice is to use
the macro `stderr`, which requires `#include <cstdio>`. It is desired
not to add such an include to `bits/c++config` so the solution is to
write and export a function which may be called by `bits/c++config`.
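
The idea in the paragraph above can be sketched roughly as follows. This is
only an illustration under stated assumptions: `sketch_stderr`,
`sketch_format_failure` and `sketch_assert_fail` are hypothetical names, not
the symbols the patch adds, and everything is shown in one translation unit
for brevity. In the intended layout only the library source would include
`<cstdio>`, and the header would see just the declaration of the accessor.

```cpp
// Sketch only: all names below are hypothetical, not libstdc++ symbols.
#include <cstdio>   // in the intended layout, only the library .cc needs this
#include <cstdlib>

// "Library side": hide the stderr macro behind an opaque pointer so that
// a header can refer to the stream without including <cstdio> itself.
void* sketch_stderr() noexcept
{
  return stderr;
}

// "Header side": format the diagnostic into the stream handed back by the
// accessor.  Returns what fprintf returns (characters written, or < 0).
int sketch_format_failure(void* stream, const char* file, int line,
                          const char* function, const char* condition) noexcept
{
  return std::fprintf(static_cast<std::FILE*>(stream),
                      "%s:%d: %s: Assertion '%s' failed.\n",
                      file, line, function, condition);
}

// Replacement-assert shape: report to stderr, then abort.
[[noreturn]] void sketch_assert_fail(const char* file, int line,
                                     const char* function,
                                     const char* condition) noexcept
{
  sketch_format_failure(sketch_stderr(), file, line, function, condition);
  std::abort();
}
```

The accessor returns `void*` so that its declaration does not need the
`FILE` type; the cast back to `FILE*` happens where the stream is used.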

This is expected to be API-compatible and ABI-compatible with caveats.
Code compiled against an earlier libstdc++ will work when linked into a
later libstdc++ but the stream to which assertion failures are logged is
anybody's guess, and in practice will be determined by the link line and
the choice of linker. This fix targets builds for which all C++ code is
built against a libstdc++ with the fix.

Alternatives:
* This, which is the smallest change.
* This, but also defining symbols `std::__stdin` and `std::__stdout` for
  completeness.
* Define a symbol like `std::__printf_stderr` which prints any message
  with any formatting to stderr, just as `std::printf` does to stdout,
  and call that from `std::__replacement_assert` instead of calling
  `__builtin_printf`.
* Move `std::__replacement_assert` into libstdc++.so and no longer mark
  it as weak. This allows an application with some parts built against a
  previous libstdc++ to guarantee that the fix will be applied at least
  to the parts that are built against a libstdc++ containing the fix.

libstdc++-v3/ChangeLog:
include/bits/c++config (__glibcxx_assert): print to stderr.
---
 libstdc++-v3/include/bits/c++config |  8 --
 libstdc++-v3/src/c++98/Makefile.am  |  1 +
 libstdc++-v3/src/c++98/Makefile.in  |  2 +-
 libstdc++-v3/src/c++98/stdio.cc | 39 +
 4 files changed, 47 insertions(+), 3 deletions(-)
 create mode 100644 libstdc++-v3/src/c++98/stdio.cc

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index a64958096718126a49e8767694e913ed96108df2..d821ba09d88dc3e42ff1807200cfece71cc18bd9 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -523,6 +523,10 @@ namespace std
 # ifdef _GLIBCXX_VERBOSE_ASSERT
 namespace std
 {
+  // Avoid the use of stderr, because we're trying to keep the <cstdio>
+  // include out of the mix.
+  extern "C++" void* __stderr() _GLIBCXX_NOEXCEPT;
+
   // Avoid the use of assert, because we're trying to keep the <cassert>
   // include out of the mix.
   extern "C++" _GLIBCXX_NORETURN
@@ -531,8 +535,8 @@ namespace std
 const char* __function, const char* __condition)
   _GLIBCXX_NOEXCEPT
   {
-__builtin_printf("%s:%d: %s: Assertion '%s' failed.\n", __file, __line,
-  __function, __condition);
+__builtin_fprintf(__stderr(), "%s:%d: %s: Assertion '%s' failed.\n",
+  __file, __line, __function, __condition);
 __builtin_abort();
   }
 }
diff --git a/libstdc++-v3/src/c++98/Makefile.am b/libstdc++-v3/src/c++98/Makefile.am
index b48b57a2945780bb48496d3b5e76de4be61f836e..4032f914ea20344f51f2f219c5575d2a3858c44c 100644
--- a/libstdc++-v3/src/c++98/Makefile.am
+++ b/libstdc++-v3/src/c++98/Makefile.am
@@ -136,6 +136,7 @@ sources = \
  math_stubs_float.cc \
  math_stubs_long_double.cc \
  stdexcept.cc \
+ stdio.cc \
  strstream.cc \
  tree.cc \
  istream.cc \
diff --git a/libstdc++-v3/src/c++98/Makefile.in b/libstdc++-v3/src/c++98/Makefile.in
index f9ebb0ff4f4cb86cde7070b5ba6b8bf6a20515b3..e8aeb37d864a0ab7711d763fe8fbd3045db6e00d 100644
--- a/libstdc++-v3/src/c++98/Makefile.in
+++ b/libstdc++-v3/src/c++98/Makefile.in
@@ -142,7 +142,7 @@ am__objects_7 = bitmap_allocator.lo pool_allocator.lo mt_allocator.lo \
  list.lo list-aux.lo list-aux-2.lo list_associated.lo \
  list_associated-2.lo locale.lo locale_init.lo locale_facets.lo \
  localename.lo math_stubs_float.lo math_stubs_long_double.lo \
- stdexcept.lo strstream.lo tree.lo istream.lo istream-string.lo \
+ stdexcept.lo stdio.lo strstream.lo tree.lo istream.lo istream-string.lo \
  streambuf.lo valarray.lo $(am__objects_1) $(am__objects_3) \
  $(am__objects_6)
 am_libc__98convenience_la_OBJECTS = $(am__objects_7)
diff --git a/libstdc++-v3/src/c++98/stdio.cc b/libstdc++-v3/src/c++98/stdio.cc
new file mode 100644
index 
..d0acb9117e1728f66f1a72ae3a9f471af72034ef
--- /dev/null
+++ b/libstdc++-v3/src/c++98/stdio.cc
@@ -0,0 +1,39 @@
+// Portability symbols for  -*- C++ -*-
+
+// Copyright (C) 2021-2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or

[committed] testsuite: Fix up gcc.dg/pr102897.c testcase [PR102897]

2021-10-27 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 26, 2021 at 11:40:01AM +0800, Kewen.Lin via Gcc-patches wrote:
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr102897.c: New test.

The testcase FAILs on i686-linux due to:
FAIL: gcc.dg/pr102897.c (test for excess errors)
Excess errors:
.../gcc/gcc/testsuite/gcc.dg/pr102897.c:11:1: warning: MMX vector return without MMX enabled changes the ABI [-Wpsabi]
.../gcc/gcc/testsuite/gcc.dg/pr102897.c:10:10: warning: MMX vector argument without MMX enabled changes the ABI [-Wpsabi]
Fixed by adding -Wno-psabi.

Tested on x86_64-linux and i686-linux, committed to trunk as obvious.

2021-10-27  Jakub Jelinek  

* gcc.dg/pr102897.c: Add -Wno-psabi to dg-options.

--- gcc/testsuite/gcc.dg/pr102897.c.jj  2021-10-27 09:00:28.848276246 +0200
+++ gcc/testsuite/gcc.dg/pr102897.c 2021-10-27 09:40:45.628296807 +0200
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* Specify C99 to avoid the warning/error on compound literals.  */
-/* { dg-options "-O1 -std=c99" } */
+/* { dg-options "-O1 -std=c99 -Wno-psabi" } */
 
 /* Verify that there is no ICE.  */
 


Jakub



[committed] openmp: Document that non-rect loops are not supported in Fortran yet

2021-10-27 Thread Jakub Jelinek via Gcc-patches
Hi!

I've found we claim to support non-rectangular loops, but don't actually
support those in Fortran, as can be seen on:
  integer i, j
  !$omp parallel do collapse(2)
  do i = 0, 10
    do j = 0, i
    end do
  end do
end
To support this, the Fortran FE needs to allow the valid forms of
non-rectangular loops and disallow the others, so mainly it needs its own
updated version of c-omp.c's c_omp_check_loop_iv etc.  For non-rectangular
lb or ub expressions it also needs to emit a TREE_VEC instead of a normal
expression, as the C/C++ FEs do, and it needs testsuite coverage.
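
For comparison, the same triangular nest written for the C/C++ front ends
might look like the sketch below.  The function name is made up for
illustration, and it assumes a compiler that accepts OpenMP 5.0
non-rectangular loop nests; when built without -fopenmp the pragma is ignored
and the nest simply runs serially with the same result.

```cpp
// Sketch only: count_triangular_iterations is a hypothetical helper.
// Counts the iterations of the non-rectangular nest
//   i in [0, n], j in [0, i]
// where the inner upper bound depends on the outer iterator.
long count_triangular_iterations(int n)
{
  long count = 0;
#pragma omp parallel for collapse(2) reduction(+ : count)
  for (int i = 0; i <= n; i++)
    for (int j = 0; j <= i; j++)
      count++;
  return count;
}
```

With n = 10, matching the Fortran snippet above, the nest runs
1 + 2 + ... + 11 = 66 iterations.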

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-27  Jakub Jelinek  

* libgomp.texi (OpenMP 5.0): Mention that Non-rectangular loop nests
aren't implemented for Fortran yet.

--- libgomp/libgomp.texi.jj 2021-10-21 10:23:27.605832433 +0200
+++ libgomp/libgomp.texi    2021-10-26 17:12:15.274870308 +0200
@@ -189,7 +189,7 @@ The OpenMP 4.5 specification is fully su
 @item @code{requires} directive @tab P
   @tab Only fulfillable requirement is @code{atomic_default_mem_order}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
-@item Non-rectangular loop nests @tab Y @tab
+@item Non-rectangular loop nests @tab P @tab Only C/C++
 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
 @item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
   constructs @tab Y @tab

Jakub



[committed] openmp: Allow non-rectangular loops with pointer iterators

2021-10-27 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch handles pointer iterators for non-rectangular loops.  They are
more limited than integral iterators of non-rectangular loops: in lb or ub,
only var-outer, var-outer + a2, a2 + var-outer or var-outer - a2 can appear,
where a2 is some integral loop invariant expression, so e.g. no
multiplication is allowed.
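
A loop nest of the kind described above might look like this sketch.  The
function name and parameters are invented for illustration; the inner lower
bound uses the "var-outer + a2" form with a2 equal to 1.  It assumes a
compiler with this support; without -fopenmp the pragma is ignored and the
code runs serially.

```cpp
#include <cstddef>

// Sketch only: count_pointer_pairs is a hypothetical helper.
// Counts the pairs (p, q) with p < q over an array of n elements, using
// pointer iterators.  The inner lower bound "p + 1" refers to the outer
// iterator, which makes the nest non-rectangular.
long count_pointer_pairs(int* base, std::ptrdiff_t n)
{
  long pairs = 0;
  int* end = base + n;
#pragma omp parallel for collapse(2) reduction(+ : pairs)
  for (int* p = base; p < end; p++)
    for (int* q = p + 1; q < end; q++)
      pairs++;
  return pairs;
}
```

For an array of n elements the nest runs n * (n - 1) / 2 iterations.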

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-27  Jakub Jelinek  

gcc/
* omp-expand.c (expand_omp_for_init_counts): Handle non-rectangular
iterators with pointer types.
(expand_omp_for_init_vars, extract_omp_for_update_vars): Likewise.
gcc/c-family/
* c-omp.c (c_omp_check_loop_iv_r): Don't clear 3rd bit for
POINTER_PLUS_EXPR.
(c_omp_check_nonrect_loop_iv): Handle POINTER_PLUS_EXPR.
(c_omp_check_loop_iv): Set kind even if the iterator is non-integral.
gcc/testsuite/
* c-c++-common/gomp/loop-8.c: New test.
* c-c++-common/gomp/loop-9.c: New test.
libgomp/
* testsuite/libgomp.c/loop-26.c: New test.
* testsuite/libgomp.c/loop-27.c: New test.

--- gcc/omp-expand.c.jj 2021-10-09 10:07:51.913704546 +0200
+++ gcc/omp-expand.c    2021-10-26 16:24:53.209939548 +0200
@@ -1975,6 +1975,7 @@ expand_omp_for_init_counts (struct omp_f
  break;
   if (i == fd->last_nonrect
  && fd->loops[i].outer == fd->last_nonrect - fd->first_nonrect
+ && !POINTER_TYPE_P (TREE_TYPE (fd->loops[i].v))
  && !TYPE_UNSIGNED (TREE_TYPE (fd->loops[i].v)))
{
  int o = fd->first_nonrect;
@@ -2250,15 +2251,22 @@ expand_omp_for_init_counts (struct omp_f
  gsi2 = gsi_after_labels (cur_bb);
  tree n1, n2;
  t = fold_convert (itype, unshare_expr (fd->loops[i].n1));
- if (fd->loops[i].m1)
+ if (fd->loops[i].m1 == NULL_TREE)
+   n1 = t;
+ else if (POINTER_TYPE_P (itype))
+   {
+ gcc_assert (integer_onep (fd->loops[i].m1));
+ t = fold_convert (sizetype,
+   unshare_expr (fd->loops[i].n1));
+ n1 = fold_build_pointer_plus (vs[i - fd->loops[i].outer], t);
+   }
+ else
{
  n1 = fold_convert (itype, unshare_expr (fd->loops[i].m1));
  n1 = fold_build2 (MULT_EXPR, itype,
vs[i - fd->loops[i].outer], n1);
  n1 = fold_build2 (PLUS_EXPR, itype, n1, t);
}
- else
-   n1 = t;
  n1 = force_gimple_operand_gsi (&gsi2, n1, true, NULL_TREE,
 true, GSI_SAME_STMT);
  if (i < fd->last_nonrect)
@@ -2267,17 +2275,26 @@ expand_omp_for_init_counts (struct omp_f
  expand_omp_build_assign (&gsi2, vs[i], n1);
}
  t = fold_convert (itype, unshare_expr (fd->loops[i].n2));
- if (fd->loops[i].m2)
+ if (fd->loops[i].m2 == NULL_TREE)
+   n2 = t;
+ else if (POINTER_TYPE_P (itype))
+   {
+ gcc_assert (integer_onep (fd->loops[i].m2));
+ t = fold_convert (sizetype,
+   unshare_expr (fd->loops[i].n2));
+ n2 = fold_build_pointer_plus (vs[i - fd->loops[i].outer], t);
+   }
+ else
{
  n2 = fold_convert (itype, unshare_expr (fd->loops[i].m2));
  n2 = fold_build2 (MULT_EXPR, itype,
vs[i - fd->loops[i].outer], n2);
  n2 = fold_build2 (PLUS_EXPR, itype, n2, t);
}
- else
-   n2 = t;
  n2 = force_gimple_operand_gsi (&gsi2, n2, true, NULL_TREE,
 true, GSI_SAME_STMT);
+ if (POINTER_TYPE_P (itype))
+   itype = signed_type_for (itype);
  if (i == fd->last_nonrect)
{
  gcond *cond_stmt
@@ -2295,8 +2312,10 @@ expand_omp_for_init_counts (struct omp_f
 ? -1 : 1));
  t = fold_build2 (PLUS_EXPR, itype,
   fold_convert (itype, fd->loops[i].step), t);
- t = fold_build2 (PLUS_EXPR, itype, t, n2);
- t = fold_build2 (MINUS_EXPR, itype, t, n1);
+ t = fold_build2 (PLUS_EXPR, itype, t,
+  fold_convert (itype, n2));
+ t = fold_build2 (MINUS_EXPR, itype, t,
+  fold_convert (itype, n1));
  tree step = fold_convert (itype, fd->loops[i].step);
  if (TYPE_UNSIGNED (itype)
  && fd->loops[i].cond_code == GT_EXPR)
@@ -2323,7 +2342,11 @@ expand_omp_for_init_counts (struct omp_f
  gsi2 = gsi_a

[committed] openmp: Don't reject some valid initializers or conditions of non-rectangular loops [PR102854]

2021-10-27 Thread Jakub Jelinek via Gcc-patches
Hi!

In C++, if an iterator has or might have (e.g. dependent type) class type, we
remember the original init expressions and check those separately for the
presence of iterators, because for class iterators we turn those into
expressions that always do contain a reference to the current iterator.  But
this resulted in rejecting valid non-rectangular loops where the dependent
type is later instantiated to an integral type.
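
A nest of roughly the following shape illustrates the situation.  This is a
guess at a representative case, not the testcase from the PR, and the
function name is made up.  Inside the template the iterator type T is
dependent, and it only becomes an integral type on instantiation.  Without
-fopenmp the pragma is ignored and the code runs serially.

```cpp
// Sketch only: count_dependent_triangle is a hypothetical helper.
// T is a dependent type here; the inner init expression "j = i" refers to
// the outer iterator, so the nest is non-rectangular.
template <typename T>
long count_dependent_triangle(T n)
{
  long count = 0;
#pragma omp parallel for collapse(2) reduction(+ : count)
  for (T i = 0; i < n; i++)
    for (T j = i; j < n; j++)
      count++;
  return count;
}
```

Instantiating it as `count_dependent_triangle<int>(4)` runs
4 + 3 + 2 + 1 = 10 iterations.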

Non-rectangular loops with class random access iterators remain broken; that
is something to be fixed incrementally.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-27  Jakub Jelinek  

PR c++/102854
gcc/c-family/
* c-common.h (c_omp_check_loop_iv_exprs): Add enum tree_code argument.
* c-omp.c (c_omp_check_loop_iv_r): For trees other than decls,
TREE_VEC, PLUS_EXPR, MINUS_EXPR, MULT_EXPR, POINTER_PLUS_EXPR or
conversions temporarily clear the 3rd bit from d->kind while walking
subtrees.
(c_omp_check_loop_iv_exprs): Add CODE argument.  Or in 4 into data.kind
if possibly non-rectangular.
gcc/cp/
* semantics.c (handle_omp_for_class_iterator,
finish_omp_for): Adjust c_omp_check_loop_iv_exprs caller.
gcc/testsuite/
* g++.dg/gomp/loop-3.C: Don't expect some errors.
* g++.dg/gomp/loop-7.C: New test.

--- gcc/c-family/c-common.h.jj  2021-10-05 09:53:55.377734121 +0200
+++ gcc/c-family/c-common.h 2021-10-21 16:09:20.397732877 +0200
@@ -1234,8 +1234,8 @@ extern void c_finish_omp_taskyield (loca
 extern tree c_finish_omp_for (location_t, enum tree_code, tree, tree, tree,
  tree, tree, tree, tree, bool);
 extern bool c_omp_check_loop_iv (tree, tree, walk_tree_lh);
-extern bool c_omp_check_loop_iv_exprs (location_t, tree, int, tree, tree, tree,
-  walk_tree_lh);
+extern bool c_omp_check_loop_iv_exprs (location_t, enum tree_code, tree, int,
+  tree, tree, tree, walk_tree_lh);
 extern tree c_finish_oacc_wait (location_t, tree, tree);
 extern tree c_oacc_split_loop_clauses (tree, tree *, bool);
 extern void c_omp_split_clauses (location_t, enum tree_code, omp_clause_mask,
--- gcc/c-family/c-omp.c.jj 2021-10-21 10:27:37.760331329 +0200
+++ gcc/c-family/c-omp.c    2021-10-21 18:31:29.967708593 +0200
@@ -1353,6 +1353,19 @@ c_omp_check_loop_iv_r (tree *tp, int *wa
}
   d->fail = true;
 }
+  else if ((d->kind & 4)
+  && TREE_CODE (*tp) != TREE_VEC
+  && TREE_CODE (*tp) != PLUS_EXPR
+  && TREE_CODE (*tp) != MINUS_EXPR
+  && TREE_CODE (*tp) != MULT_EXPR
+  && !CONVERT_EXPR_P (*tp))
+{
+  *walk_subtrees = 0;
+  d->kind &= 3;
+  walk_tree_1 (tp, c_omp_check_loop_iv_r, data, NULL, d->lh);
+  d->kind |= 4;
+  return NULL_TREE;
+}
   else if (d->ppset->add (*tp))
 *walk_subtrees = 0;
   /* Don't walk dtors added by C++ wrap_cleanups_r.  */
@@ -1651,11 +1677,13 @@ c_omp_check_loop_iv (tree stmt, tree dec
 /* Similar, but allows to check the init or cond expressions individually.  */
 
 bool
-c_omp_check_loop_iv_exprs (location_t stmt_loc, tree declv, int i, tree decl,
-  tree init, tree cond, walk_tree_lh lh)
+c_omp_check_loop_iv_exprs (location_t stmt_loc, enum tree_code code,
+  tree declv, int i, tree decl, tree init, tree cond,
+  walk_tree_lh lh)
 {
   hash_set<tree> pset;
   struct c_omp_check_loop_iv_data data;
+  int kind = (code != OACC_LOOP && i > 0) ? 4 : 0;
 
   data.declv = declv;
   data.fail = false;
@@ -1674,7 +1702,7 @@ c_omp_check_loop_iv_exprs (location_t st
   if (init)
 {
   data.expr_loc = EXPR_LOCATION (init);
-  data.kind = 0;
+  data.kind = kind;
   walk_tree_1 (&init,
   c_omp_check_loop_iv_r, &data, NULL, lh);
 }
@@ -1682,7 +1710,7 @@ c_omp_check_loop_iv_exprs (location_t st
 {
   gcc_assert (COMPARISON_CLASS_P (cond));
   data.expr_loc = EXPR_LOCATION (init);
-  data.kind = 1;
+  data.kind = kind | 1;
   if (TREE_OPERAND (cond, 0) == decl)
walk_tree_1 (&TREE_OPERAND (cond, 1),
 c_omp_check_loop_iv_r, &data, NULL, lh);
--- gcc/cp/semantics.c.jj   2021-10-21 10:23:12.014050655 +0200
+++ gcc/cp/semantics.c  2021-10-21 16:10:30.835749744 +0200
@@ -9211,7 +9211,7 @@ handle_omp_for_class_iterator (int i, lo
TREE_OPERAND (cond, 1), iter);
   return true;
 }
-  if (!c_omp_check_loop_iv_exprs (locus, orig_declv, i,
+  if (!c_omp_check_loop_iv_exprs (locus, code, orig_declv, i,
  TREE_VEC_ELT (declv, i), NULL_TREE,
  cond, cp_walk_subtrees))
 return true;
@@ -9597,7 +9597,7 @@ finish_omp_for (location_t locus, enum t
   tree orig_init;
   FOR_EACH_VEC_ELT (*orig_inits, i, orig_init)
if (orig_init
-   && 

Re: [PATCH v2 1/4] Fix loop split incorrect count and probability

2021-10-27 Thread Jan Hubicka via Gcc-patches
> On Wed, 27 Oct 2021, Jan Hubicka wrote:
> 
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> > >   (do_split_loop_on_cond): Likewise.
> > > ---
> > >  gcc/tree-ssa-loop-split.c | 25 -
> > >  1 file changed, 16 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > > index 3f6ad046623..d30782888f3 100644
> > > --- a/gcc/tree-ssa-loop-split.c
> > > +++ b/gcc/tree-ssa-loop-split.c
> > > @@ -575,7 +575,11 @@ split_loop (class loop *loop1)
> > >   stmts2);
> > >   tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
> > >   if (!initial_true)
> > > -   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
> > > +   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > > +
> > > + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> > > +? EDGE_SUCC (bbs[i], 0)
> > > +: EDGE_SUCC (bbs[i], 1);
> > >  
> > >   /* Now version the loop, placing loop2 after loop1 connecting
> > >  them, and fix up SSA form for that.  */
> > > @@ -583,10 +587,10 @@ split_loop (class loop *loop1)
> > >   basic_block cond_bb;
> > >  
> > >   class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> > > -profile_probability::always (),
> > > -profile_probability::always (),
> > > -profile_probability::always (),
> > > -profile_probability::always (),
> > > +true_edge->probability,
> > > +true_edge->probability.invert (),
> > > +true_edge->probability,
> > > +true_edge->probability.invert (),
> > >  true);
> > 
> > As discussed yesterday, for loop of form
> > 
> > for (...)
> >   if (cond)
> > cond = something();
> >   else
> > something2
> > 
> > Split as
> 
> Note that you are missing to conditionalize loop1 execution
> on 'cond' (not sure if that makes a difference).
You are right - I forgot to mention that.

The entry conditional makes no difference to the scaling of stmts inside
the loop, but it affects the loop header and the expected trip count.  We
however need to set up the probability of this conditional (and the
preheader count if it exists).  There is no general way to read the
probability of this initial conditional from the cfg profile, so I guess we
are stuck with guessing some arbitrary value.  I guess the common case is
that cond is true in the first iteration though, and often we can easily
see that from the PHI node initializing the test variable.

The other thing that changes is the expected number of iterations of the
split loops, so we may want to update the exit conditional probability
accordingly...

Honza
> 
> > loop1:
> if (cond)
> > for (...)
> >   if (true)
> > cond = something();
> > if (!cond)
> >   break
> >   else
> > something2 ();
> > 
> > loop2:
> > for (...)
> >   if (false)
> > cond = something();
> >   else
> > something2 ();
> > 
> > If "if (cond)" has probability p, you want to scale loop1 by p
> > and loop2 by 1-p as your patch does, but you need to exclude the basic
> > blocks guarded by the condition.
> > 
> > One way is to break out loop_version and implement it inline, other
> > option (perhaps leading to less code duplication) is to add argument listing
> > basic blocks that should not be scaled, which would be set to both arms
> > of the if.
> > 
> > Are there other profile patches of your I should look at?
> > Honza
> > >   gcc_assert (loop2);
> > >  
> > > @@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge 
> > > invar_branch)
> > >initialize_original_copy_tables ();
> > >  
> > >struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > > -  profile_probability::always (),
> > > -  profile_probability::never (),
> > > -  profile_probability::always (),
> > > -  profile_probability::always (),
> > > +  invar_branch->probability.invert (),
> > > +  invar_branch->probability,
> > > +  invar_branch->probability.invert (),
> > > +  invar_branch->probability,
> > >true);
> > >if (!loop2)
> > >  {
> > > @@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge 
> > > invar_branch)
> > >to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> > >to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> > >  
> > > +  to_loop1->probability = invar_branch->probability.invert ();
> > > +  to_loop2->prob

Re: [PATCH v2 1/4] Fix loop split incorrect count and probability

2021-10-27 Thread Richard Biener via Gcc-patches
On Wed, 27 Oct 2021, Jan Hubicka wrote:

> > 
> > gcc/ChangeLog:
> > 
> > * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> > (do_split_loop_on_cond): Likewise.
> > ---
> >  gcc/tree-ssa-loop-split.c | 25 -
> >  1 file changed, 16 insertions(+), 9 deletions(-)
> > 
> > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > index 3f6ad046623..d30782888f3 100644
> > --- a/gcc/tree-ssa-loop-split.c
> > +++ b/gcc/tree-ssa-loop-split.c
> > @@ -575,7 +575,11 @@ split_loop (class loop *loop1)
> > stmts2);
> > tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
> > if (!initial_true)
> > - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
> > + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > +
> > +   edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> > +  ? EDGE_SUCC (bbs[i], 0)
> > +  : EDGE_SUCC (bbs[i], 1);
> >  
> > /* Now version the loop, placing loop2 after loop1 connecting
> >them, and fix up SSA form for that.  */
> > @@ -583,10 +587,10 @@ split_loop (class loop *loop1)
> > basic_block cond_bb;
> >  
> > class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> > -  profile_probability::always (),
> > -  profile_probability::always (),
> > -  profile_probability::always (),
> > -  profile_probability::always (),
> > +  true_edge->probability,
> > +  true_edge->probability.invert (),
> > +  true_edge->probability,
> > +  true_edge->probability.invert (),
> >true);
> 
> As discussed yesterday, for loop of form
> 
> for (...)
>   if (cond)
> cond = something();
>   else
> something2
> 
> Split as

Note that you are missing to conditionalize loop1 execution
on 'cond' (not sure if that makes a difference).

> loop1:
if (cond)
> for (...)
>   if (true)
> cond = something();
> if (!cond)
>   break
>   else
> something2 ();
> 
> loop2:
> for (...)
>   if (false)
> cond = something();
>   else
> something2 ();
> 
> If "if (cond)" has probability p, you want to scale loop1 by p
> and loop2 by 1-p as your patch does, but you need to exclude the basic
> blocks guarded by the condition.
> 
> One way is to break out loop_version and implement it inline, other
> option (perhaps leading to less code duplication) is to add argument listing
> basic blocks that should not be scaled, which would be set to both arms
> of the if.
> 
> Are there other profile patches of your I should look at?
> Honza
> > gcc_assert (loop2);
> >  
> > @@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge 
> > invar_branch)
> >initialize_original_copy_tables ();
> >  
> >struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > -profile_probability::always (),
> > -profile_probability::never (),
> > -profile_probability::always (),
> > -profile_probability::always (),
> > +invar_branch->probability.invert (),
> > +invar_branch->probability,
> > +invar_branch->probability.invert (),
> > +invar_branch->probability,
> >  true);
> >if (!loop2)
> >  {
> > @@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge 
> > invar_branch)
> >to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> >to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> >  
> > +  to_loop1->probability = invar_branch->probability.invert ();
> > +  to_loop2->probability = invar_branch->probability;
> > +
> >/* Due to introduction of a control flow edge from loop1 latch to loop2
> >   pre-header, we should update PHIs in loop2 to reflect this connection
> >   between loop1 and loop2.  */
> > -- 
> > 2.27.0.90.geebb51ba8c
> > 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH v2 1/4] Fix loop split incorrect count and probability

2021-10-27 Thread Jan Hubicka via Gcc-patches
> As discussed yesterday, for loop of form
> 
> for (...)
>   if (cond)
> cond = something();
>   else
> something2
> 
> Split as
> 

Say "if (cond)" has probability p, then individual statements scale as
follows:

     loop1:
p    for (...)
p      if (true)
1        cond = something();
1        if (!cond)
x          break
0      else
0        something2 ();

     loop2:
1-p  for (...)
1-p    if (false)
0        cond = something();
1      else
1        something2 ();

Block with x has same count as preheader of the loop.

Your patch does
     loop1:
p    for (...)
p      if (true)
p        cond = something();
p        if (!cond)
x          break
p      else
p        something2 ();

     loop2:
1-p  for (...)
1-p    if (false)
1-p      cond = something();
1-p    else
1-p      something2 ();

One does not need to set the 0s correctly (those blocks will be removed),
but it would be nice to avoid scaling the 1s down.  Looking at this, things
will go well with your change if we are guaranteed that something() and
something2() are always a single bb, because the block merging that happens
after turning the conditional into a constant will remove the misupdated
count.  Your profile scales after merging are:
     loop1:
p    for (...)
1      cond = something();
1      if (!cond)
x        break

     loop2:
1-p  for (...)
1      something2 ();

assuming that the profile was sane and the frequency of something() is
p*frequency of the conditional, and similarly for something2().
This is why you see the final profile correct.  So if we split only loops
with single-BB then/else arms, the patch is OK with a comment explaining
this.
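
To make the arithmetic above concrete, here is a small stand-alone sketch.
It does not use GCC's profile_count API; `LoopCounts` and the helper names
are invented for illustration, counts are plain doubles, and it assumes a
sane input profile where the then arm has count p * header.  It also assumes,
as described above, that a single-block arm merged into the header ends up
with the header's count.

```cpp
// Sketch only: hypothetical bookkeeping, not GCC's profile data structures.
struct LoopCounts {
  double header;    // count of the loop header / the "if"
  double then_arm;  // count of "cond = something()"
  double else_arm;  // count of "something2 ()"
};

// Uniform scaling: every block of the loop copy gets the same factor,
// which is what the tables above show for the patch.
LoopCounts scale_uniformly(const LoopCounts& c, double factor)
{
  return {c.header * factor, c.then_arm * factor, c.else_arm * factor};
}

// Desired scaling for loop1: header by p, then arm kept at scale 1,
// else arm unreachable (scale 0).
LoopCounts scale_loop1_ideal(const LoopCounts& c, double p)
{
  return {c.header * p, c.then_arm, 0.0};
}

// Once the condition is folded to a constant and a single-block arm is
// merged into the header, the merged block carries the header's count.
double merged_arm_count(const LoopCounts& scaled)
{
  return scaled.header;
}
```

For example, with a header count of 1000 and p = 0.25 the then arm starts at
250.  Uniform scaling gives header 250 and then arm 62.5, which is too low,
but after merging the arm takes the header's 250, which matches the desired
value.  With multi-block arms no such merge repairs the inner blocks.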

Also I wonder, do we correctly duplicate and update the known bounds on
iteration counts attached to the loop structure?

Honza
> 
> If "if (cond)" has probability p, you want to scale loop1 by p
> and loop2 by 1-p as your patch does, but you need to exclude the basic
> blocks guarded by the condition.
> 
> One way is to break out loop_version and implement it inline, other
> option (perhaps leading to less code duplication) is to add argument listing
> basic blocks that should not be scaled, which would be set to both arms
> of the if.
> 
> Are there other profile patches of your I should look at?
> Honza
> > gcc_assert (loop2);
> >  
> > @@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge 
> > invar_branch)
> >initialize_original_copy_tables ();
> >  
> >struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> > -profile_probability::always (),
> > -profile_probability::never (),
> > -profile_probability::always (),
> > -profile_probability::always (),
> > +invar_branch->probability.invert (),
> > +invar_branch->probability,
> > +invar_branch->probability.invert (),
> > +invar_branch->probability,
> >  true);
> >if (!loop2)
> >  {
> > @@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge 
> > invar_branch)
> >to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
> >to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
> >  
> > +  to_loop1->probability = invar_branch->probability.invert ();
> > +  to_loop2->probability = invar_branch->probability;
> > +
> >/* Due to introduction of a control flow edge from loop1 latch to loop2
> >   pre-header, we should update PHIs in loop2 to reflect this connection
> >   between loop1 and loop2.  */
> > -- 
> > 2.27.0.90.geebb51ba8c
> > 


Re: RISCV: Add zmmul extension

2021-10-27 Thread Kito Cheng
Hi Shi-Hua:

> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -1872,7 +1872,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
> outer_code, int opno ATTRIBUTE_UN
>  case MULT:
>if (float_mode_p)
> *total = tune_param->fp_mul[mode == DFmode];
> -  else if (!TARGET_MUL)
> +  else if (!TARGET_MUL && !TARGET_ZMMUL)
> /* Estimate the cost of a library call.  */
> *total = COSTS_N_INSNS (speed ? 32 : 6);
>else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> @@ -4736,6 +4736,9 @@ riscv_option_override (void)
>if (flag_pic)
>  g_switch_value = 0;
>
> +  /* zmmul */
> +  if (TARGET_ZMMUL && TARGET_MUL)
> +error ("can not use both the % and the % extension");

My understanding is that zmmul and M are not mutually exclusive, so we
don't need this check.

Otherwise it LGTM, but I'm just surprised it's still 0.1 and not frozen yet.

[1] https://github.com/riscv/riscv-isa-manual/pull/648#issuecomment-842461775


Re: [PATCH v2 1/4] Fix loop split incorrect count and probability

2021-10-27 Thread Jan Hubicka via Gcc-patches
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>   (do_split_loop_on_cond): Likewise.
> ---
>  gcc/tree-ssa-loop-split.c | 25 -
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 3f6ad046623..d30782888f3 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -575,7 +575,11 @@ split_loop (class loop *loop1)
>   stmts2);
>   tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
>   if (!initial_true)
> -   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
> +   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> +
> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> +? EDGE_SUCC (bbs[i], 0)
> +: EDGE_SUCC (bbs[i], 1);
>  
>   /* Now version the loop, placing loop2 after loop1 connecting
>  them, and fix up SSA form for that.  */
> @@ -583,10 +587,10 @@ split_loop (class loop *loop1)
>   basic_block cond_bb;
>  
>   class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> -profile_probability::always (),
> -profile_probability::always (),
> -profile_probability::always (),
> -profile_probability::always (),
> +true_edge->probability,
> +true_edge->probability.invert (),
> +true_edge->probability,
> +true_edge->probability.invert (),
>  true);

As discussed yesterday, for a loop of the form

for (...)
  if (cond)
    cond = something();
  else
    something2 ();

Split as

loop1:
for (...)
  if (true)
    {
      cond = something();
      if (!cond)
        break;
    }
  else
    something2 ();

loop2:
for (...)
  if (false)
    cond = something();
  else
    something2 ();

If "if (cond)" has probability p, you want to scale loop1 by p
and loop2 by 1-p, as your patch does, but you need to exclude the basic
blocks guarded by the condition from that scaling.

One way is to break out loop_version and implement it inline; the other
option (perhaps leading to less code duplication) is to add an argument
listing the basic blocks that should not be scaled, which would be set to
both arms of the if.
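
To illustrate the second option, here is a minimal standalone C++ sketch.
It is not GCC code: Block, scale_loop_copy and the unscaled set are
hypothetical names, and loop_version has no such argument today. It only
shows the arithmetic of scaling one loop copy while skipping a listed set
of blocks.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a basic block: an id and an execution count.
struct Block {
  int id;
  double count;
};

// Return a copy of `body` with every count multiplied by `scale`,
// except for blocks whose id is listed in `unscaled`.
std::vector<Block> scale_loop_copy(const std::vector<Block>& body,
                                   double scale,
                                   const std::set<int>& unscaled) {
  assert(scale >= 0.0 && scale <= 1.0);  // scale is a probability
  std::vector<Block> out = body;
  for (Block& b : out)
    if (unscaled.count(b.id) == 0)
      b.count *= scale;
  return out;
}
```

For example, take a header with count 100 and p = 0.25, so the two arms
have counts 25 and 75. Scaling the loop1 copy by 0.25 with both arms listed
as unscaled leaves the header at 25 and the taken arm at 25, so the arm
under "if (true)" matches its header. The arm that is dead in each copy
keeps its old count here; I assume it goes away once the constant
condition is folded.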

Are there other profile patches of yours I should look at?
Honza
>   gcc_assert (loop2);
>  
> @@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge 
> invar_branch)
>initialize_original_copy_tables ();
>  
>struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> -  profile_probability::always (),
> -  profile_probability::never (),
> -  profile_probability::always (),
> -  profile_probability::always (),
> +  invar_branch->probability.invert (),
> +  invar_branch->probability,
> +  invar_branch->probability.invert (),
> +  invar_branch->probability,
>true);
>if (!loop2)
>  {
> @@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge 
> invar_branch)
>to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
>to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
>  
> +  to_loop1->probability = invar_branch->probability.invert ();
> +  to_loop2->probability = invar_branch->probability;
> +
>/* Due to introduction of a control flow edge from loop1 latch to loop2
>   pre-header, we should update PHIs in loop2 to reflect this connection
>   between loop1 and loop2.  */
> -- 
> 2.27.0.90.geebb51ba8c
>