[FYI] [Ada] reenable ada83 library unit renaming error

2021-10-13 Thread Alexandre Oliva via Gcc-patches


The condition of the 'if' encompassed that of the 'elsif', so the error
message wouldn't get a chance to be printed.

Regstrapped on x86_64-linux-gnu.  I'm checking this in.

for  gcc/ada/ChangeLog

* par-ch10.adb (P_Compilation_Unit): Reenable ada83 library
unit renaming test and error.
---
 gcc/ada/par-ch10.adb | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/par-ch10.adb b/gcc/ada/par-ch10.adb
index f02934afc7e44..76f0edddc94c6 100644
--- a/gcc/ada/par-ch10.adb
+++ b/gcc/ada/par-ch10.adb
@@ -532,13 +532,14 @@ package body Ch10 is
| N_Subprogram_Body
| N_Subprogram_Renaming_Declaration
  then
-Unit_Node := Specification (Unit_Node);
-
- elsif Nkind (Unit_Node) = N_Subprogram_Renaming_Declaration then
-if Ada_Version = Ada_83 then
+if Nkind (Unit_Node) = N_Subprogram_Renaming_Declaration
+  and then Ada_Version = Ada_83
+then
Error_Msg_N
  ("(Ada 83) library unit renaming not allowed", Unit_Node);
 end if;
+
+Unit_Node := Specification (Unit_Node);
  end if;
 
  if Nkind (Unit_Node) in N_Task_Body


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] AVX512FP16: Support vector shuffle builtins

2021-10-13 Thread Hongyu Wang via Gcc-patches
Hi,

This patch supports HFmode vector shuffles by creating an HImode subreg
when expanding the permutation expression.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,}
OK for master?

gcc/ChangeLog:

* config/i386/i386-expand.c (ix86_expand_vec_perm): Convert
HFmode input operand to HImode.
(ix86_vectorize_vec_perm_const): Likewise.
(ix86_expand_vector_init): Allow HFmode for one_operand_shuffle.
* config/i386/sse.md (*avx512bw_permvar_truncv16siv16hi_1_hf):
New define_insn.
(*avx512f_permvar_truncv8siv8hi_1_hf):
Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-builtin_shuffle-1.c: New test.
* gcc.target/i386/avx512fp16-pr101846.c: Ditto.
* gcc.target/i386/avx512fp16-pr94680.c: Ditto.
---
 gcc/config/i386/i386-expand.c                 | 29 ++-
 gcc/config/i386/sse.md                        | 54 +++-
 .../i386/avx512fp16-builtin_shuffle-1.c       | 86 +++
 .../gcc.target/i386/avx512fp16-pr101846.c     | 56
 .../gcc.target/i386/avx512fp16-pr94680.c      | 61 +
 5 files changed, 284 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-builtin_shuffle-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-pr101846.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-pr94680.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index c0924a59efb..0f50ed3b9f8 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -4836,6 +4836,18 @@ ix86_expand_vec_perm (rtx operands[])
   e = GET_MODE_UNIT_SIZE (mode);
   gcc_assert (w <= 64);
 
+  if (GET_MODE_INNER (mode) == HFmode)
+{
+  machine_mode orig_mode = mode;
+  mode = mode_for_vector (HImode, w).require ();
+  if (target)
+   target = lowpart_subreg (mode, target, orig_mode);
+  if (op0)
+   op0 = lowpart_subreg (mode, op0, orig_mode);
+  if (op1)
+   op1 = lowpart_subreg (mode, op1, orig_mode);
+}
+
   if (TARGET_AVX512F && one_operand_shuffle)
 {
   rtx (*gen) (rtx, rtx, rtx) = NULL;
@@ -15092,7 +15104,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx vals)
  rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) };
  if (inner_mode == QImode
  || inner_mode == HImode
- || inner_mode == TImode)
+ || inner_mode == TImode
+ || inner_mode == HFmode)
{
  unsigned int n_bits = n_elts * GET_MODE_SIZE (inner_mode);
  scalar_mode elt_mode = inner_mode == TImode ? DImode : SImode;
@@ -21099,6 +21112,20 @@ ix86_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
   unsigned int i, nelt, which;
   bool two_args;
 
+  /* For HF mode vector, convert it to HI using subreg.  */
+  if (GET_MODE_INNER (vmode) == HFmode)
+{
+  machine_mode orig_mode = vmode;
+  vmode = mode_for_vector (HImode,
+  GET_MODE_NUNITS (vmode)).require ();
+  if (target)
+   target = lowpart_subreg (vmode, target, orig_mode);
+  if (op0)
+   op0 = lowpart_subreg (vmode, op0, orig_mode);
+  if (op1)
+   op1 = lowpart_subreg (vmode, op1, orig_mode);
+}
+
   d.target = target;
   d.op0 = op0;
   d.op1 = op1;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a3c4a3f1e62..d023d8a1c2e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -12573,6 +12573,33 @@
(truncate:V16HI (match_dup 1)))]
   "operands[1] = lowpart_subreg (V16SImode, operands[1], V32HImode);")
 
+(define_insn_and_split "*avx512bw_permvar_truncv16siv16hi_1_hf"
+  [(set (match_operand:V16HF 0 "nonimmediate_operand")
+   (vec_select:V16HF
+ (subreg:V32HF
+   (unspec:V32HI
+ [(match_operand:V32HI 1 "register_operand")
+  (match_operand:V32HI 2 "permvar_truncate_operand")]
+UNSPEC_VPERMVAR) 0)
+ (parallel [(const_int 0) (const_int 1)
+(const_int 2) (const_int 3)
+(const_int 4) (const_int 5)
+(const_int 6) (const_int 7)
+(const_int 8) (const_int 9)
+(const_int 10) (const_int 11)
+(const_int 12) (const_int 13)
+(const_int 14) (const_int 15)])))]
+  "TARGET_AVX512BW && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (truncate:V16HI (match_dup 1)))]
+{
+  operands[0] = lowpart_subreg (V16HImode, operands[0], V16HFmode);
+  operands[1] = lowpart_subreg (V16SImode, operands[1], V32HImode);
+})
+
+
 (define_insn_and_split "*avx512f_permvar_truncv8siv8hi_1"
   [(set (match_operand:V8HI 0 "nonimmediate_operand")
(vec_select:V8HI
@@ -12591,6 +12618,28 @@
(truncate:V8HI (match_dup 1)))]
   "operands[1] = lowpart_subreg (V8SImode, operands[1], V16HImode);")
 

[r12-4379 Regression] FAIL: gcc.target/i386/sse-26.c (test for excess errors) on Linux/x86_64

2021-10-13 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

97c320016642a40a347d558abc952cc487ad4ff6 is the first bad commit
commit 97c320016642a40a347d558abc952cc487ad4ff6
Author: Roger Sayle 
Date:   Wed Oct 13 19:49:47 2021 +0100

x86_64: Some SUBREG related optimization tweaks to i386 backend.

caused

FAIL: gcc.target/i386/avx-1.c (internal compiler error)
FAIL: gcc.target/i386/avx-1.c (test for excess errors)
FAIL: gcc.target/i386/avx-2.c (internal compiler error)
FAIL: gcc.target/i386/avx-2.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (test for excess errors)
FAIL: gcc.target/i386/sse-13.c (internal compiler error)
FAIL: gcc.target/i386/sse-13.c (test for excess errors)
FAIL: gcc.target/i386/sse-14.c (internal compiler error)
FAIL: gcc.target/i386/sse-14.c (test for excess errors)
FAIL: gcc.target/i386/sse-22a.c (internal compiler error)
FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
FAIL: gcc.target/i386/sse-22.c (internal compiler error)
FAIL: gcc.target/i386/sse-22.c (test for excess errors)
FAIL: gcc.target/i386/sse-23.c (internal compiler error)
FAIL: gcc.target/i386/sse-23.c (test for excess errors)
FAIL: gcc.target/i386/sse-24.c (internal compiler error)
FAIL: gcc.target/i386/sse-24.c (test for excess errors)
FAIL: gcc.target/i386/sse-25.c (internal compiler error)
FAIL: gcc.target/i386/sse-25.c (test for excess errors)
FAIL: gcc.target/i386/sse-26.c (internal compiler error)
FAIL: gcc.target/i386/sse-26.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4379/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx-1.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx-1.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx-2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx-2.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/keylocker-aesdecwide128kl.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/keylocker-aesdecwide128kl.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/keylocker-aesdecwide256kl.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/keylocker-aesdecwide256kl.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/keylocker-aesencwide128kl.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/keylocker-aesencwide128kl.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/keylocker-aesencwide256kl.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/keylocker-aesencwide256kl.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse-13.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse-13.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse-14.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse-14.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse-22a.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse-22a.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse-22.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/sse-22.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 

Re: [PATCH] AVX512FP16: Adjust builtin for mask complex fma

2021-10-13 Thread Hongtao Liu via Gcc-patches
On Wed, Oct 13, 2021 at 5:07 PM Hongyu Wang via Gcc-patches
 wrote:
>
> Hi,
>
> The current mask/mask3 implementation for complex fma contains a
> duplicated parameter in the macro, which may cause an error at -O0.
> Refactor the macro implementation into builtins to avoid the
> potential error.
>
> For round intrinsics with NO_ROUND as input, ix86_erase_embedded_rounding
> erases the embedded rounding unspec but could break other emit_insn calls
> in the expanders. Skip expanders with multiple emit_insn calls for this
> function and check rounding in the expander with subst.
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,}.
> OK for master?
Ok.
>
> gcc/ChangeLog:
>
> * config/i386/avx512fp16intrin.h (_mm512_mask_fcmadd_pch):
> Adjust builtin call.
> (_mm512_mask3_fcmadd_pch): Likewise.
> (_mm512_mask_fmadd_pch): Likewise
> (_mm512_mask3_fmadd_pch): Likewise
> (_mm512_mask_fcmadd_round_pch): Likewise
> (_mm512_mask3_fcmadd_round_pch): Likewise
> (_mm512_mask_fmadd_round_pch): Likewise
> (_mm512_mask3_fmadd_round_pch): Likewise
> (_mm_mask_fcmadd_sch): Likewise
> (_mm_mask3_fcmadd_sch): Likewise
> (_mm_mask_fmadd_sch): Likewise
> (_mm_mask3_fmadd_sch): Likewise
> (_mm_mask_fcmadd_round_sch): Likewise
> (_mm_mask3_fcmadd_round_sch): Likewise
> (_mm_mask_fmadd_round_sch): Likewise
> (_mm_mask3_fmadd_round_sch): Likewise
> (_mm_fcmadd_round_sch): Likewise
> * config/i386/avx512fp16vlintrin.h (_mm_mask_fmadd_pch):
> Adjust builtin call.
> (_mm_mask3_fmadd_pch): Likewise
> (_mm256_mask_fmadd_pch): Likewise
> (_mm256_mask3_fmadd_pch): Likewise
> (_mm_mask_fcmadd_pch): Likewise
> (_mm_mask3_fcmadd_pch): Likewise
> (_mm256_mask_fcmadd_pch): Likewise
> (_mm256_mask3_fcmadd_pch): Likewise
> * config/i386/i386-builtin.def: Add mask3 builtin for complex
> fma, and adjust mask_builtin to corresponding expander.
> * config/i386/i386-expand.c (ix86_expand_round_builtin):
> Skip eraseing embedded rounding for expanders that emits
> multiple insns.
> * config/i386/sse.md (complexmove): New mode_attr.
> (_fmaddc__mask1): New expander.
> (_fcmaddc__mask1): Likewise.
> (avx512fp16_fmaddcsh_v8hf_mask1): Likewise.
> (avx512fp16_fcmaddcsh_v8hf_mask1): Likewise.
> (avx512fp16_fcmaddcsh_v8hf_mask3): Likewise.
> (avx512fp16_fmaddcsh_v8hf_mask3): Likewise.
> * config/i386/subst.md (round_embedded_complex): New subst.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx-1.c: Add new mask3 builtins.
> * gcc.target/i386/sse-13.c: Ditto.
> * gcc.target/i386/sse-23.c: Ditto.
> * gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c: Add scanning for
> mask/mask3 intrinsic.
> * gcc.target/i386/avx512fp16-vfmaddcsh-1a.c: Ditto.
> * gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c: New test for
> -mavx512vl.
> * gcc.target/i386/avx512fp16-vfmaddcsh-1c.c: Ditto.
> ---
>  gcc/config/i386/avx512fp16intrin.h| 261 ++
>  gcc/config/i386/avx512fp16vlintrin.h  |  56 ++--
>  gcc/config/i386/i386-builtin.def  |  24 +-
>  gcc/config/i386/i386-expand.c |  22 +-
>  gcc/config/i386/sse.md| 183 
>  gcc/config/i386/subst.md  |   3 +
>  gcc/testsuite/gcc.target/i386/avx-1.c |   4 +
>  .../i386/avx512fp16-vfcmaddcsh-1a.c   |   2 +
>  .../i386/avx512fp16-vfcmaddcsh-1c.c   |  13 +
>  .../gcc.target/i386/avx512fp16-vfmaddcsh-1a.c |   2 +
>  .../gcc.target/i386/avx512fp16-vfmaddcsh-1c.c |  13 +
>  gcc/testsuite/gcc.target/i386/sse-13.c|   4 +
>  gcc/testsuite/gcc.target/i386/sse-23.c|   4 +
>  13 files changed, 375 insertions(+), 216 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1c.c
>
> diff --git a/gcc/config/i386/avx512fp16intrin.h 
> b/gcc/config/i386/avx512fp16intrin.h
> index 29cf6792335..5e49447a020 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -6258,13 +6258,11 @@ extern __inline __m512h
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_mask_fcmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
>  {
> -  return (__m512h) __builtin_ia32_movaps512_mask
> -((__v16sf)
> - __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A,
> - (__v32hf) __C,
> - (__v32hf) __D, __B,
> - _MM_FROUND_CUR_DIRECTION),
> - (__v16sf) __A, __B);
> +  return (__m512h)
> +__builtin_ia32_vfcmaddcph512_mask_round 

[committed] hppa: Fix TARGET_SOFT_FLOAT patterns in pa.md

2021-10-13 Thread John David Anglin

This change fixes building libgcc with -msoft-float.  Getting soft float
to work in libgcc is still a work in progress.

Tested on hppa-unknown-linux-gnu, hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11.

Committed to active branches.

Dave
---
Fix TARGET_SOFT_FLOAT patterns in pa.md

2021-10-13  John David Anglin  

gcc/ChangeLog:

* config/pa/pa.md (cbranchsf4): Disable if TARGET_SOFT_FLOAT.
(cbranchdf4): Likewise.
Add missing move patterns for TARGET_SOFT_FLOAT.

diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
index b314f96de35..ba947ab1be9 100644
--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@@ -1383,7 +1383,7 @@
 (match_operand:SF 2 "reg_or_0_operand" "")])
  (label_ref (match_operand 3 "" ""))
  (pc)))]
-  ""
+  "! TARGET_SOFT_FLOAT"
   "
 {
   pa_emit_bcond_fp (operands);
@@ -1398,7 +1398,7 @@
 (match_operand:DF 2 "reg_or_0_operand" "")])
  (label_ref (match_operand 3 "" ""))
  (pc)))]
-  ""
+  "! TARGET_SOFT_FLOAT"
   "
 {
   pa_emit_bcond_fp (operands);
@@ -2236,6 +2236,29 @@
(set_attr "pa_combine_type" "addmove")
(set_attr "length" "4,4,4,4,4,4,4,4,4,4,4,4")])

+(define_insn ""
+  [(set (match_operand:SI 0 "move_dest_operand"
+ "=r,r,r,r,r,r,Q,!*q,!r")
+   (match_operand:SI 1 "move_src_operand"
+ "A,r,J,N,K,RQ,rM,!rM,!*q"))]
+  "(register_operand (operands[0], SImode)
+|| reg_or_0_operand (operands[1], SImode))
+   && TARGET_SOFT_FLOAT
+   && TARGET_64BIT"
+  "@
+   ldw RT'%A1,%0
+   copy %1,%0
+   ldi %1,%0
+   ldil L'%1,%0
+   {zdepi|depwi,z} %Z1,%0
+   ldw%M1 %1,%0
+   stw%M0 %r1,%0
+   mtsar %r1
+   {mfctl|mfctl,w} %%sar,%0"
+  [(set_attr "type" "load,move,move,move,shift,load,store,move,move")
+   (set_attr "pa_combine_type" "addmove")
+   (set_attr "length" "4,4,4,4,4,4,4,4,4")])
+
 (define_insn ""
   [(set (match_operand:SI 0 "indexed_memory_operand" "=R")
(match_operand:SI 1 "register_operand" "f"))]
@@ -4042,6 +4065,25 @@
(set_attr "pa_combine_type" "addmove")
(set_attr "length" "4,4,4,4,4,4,4,4,4")])

+(define_insn ""
+  [(set (match_operand:DF 0 "move_dest_operand"
+ "=!*r,*r,*r,*r,*r,Q")
+   (match_operand:DF 1 "move_src_operand"
+ "!*r,J,N,K,RQ,*rG"))]
+  "(register_operand (operands[0], DFmode)
+|| reg_or_0_operand (operands[1], DFmode))
+   && TARGET_SOFT_FLOAT && TARGET_64BIT"
+  "@
+   copy %1,%0
+   ldi %1,%0
+   ldil L'%1,%0
+   depdi,z %z1,%0
+   ldd%M1 %1,%0
+   std%M0 %r1,%0"
+  [(set_attr "type" "move,move,move,shift,load,store")
+   (set_attr "pa_combine_type" "addmove")
+   (set_attr "length" "4,4,4,4,4,4")])
+
 
 (define_expand "movdi"
   [(set (match_operand:DI 0 "general_operand" "")
@@ -4200,6 +4242,28 @@
(set_attr "pa_combine_type" "addmove")
(set_attr "length" "4,4,4,4,4,4,4,4,4,4,4,4")])

+(define_insn ""
+  [(set (match_operand:DI 0 "move_dest_operand"
+ "=r,r,r,r,r,r,Q,!*q,!r")
+   (match_operand:DI 1 "move_src_operand"
+ "A,r,J,N,K,RQ,rM,!rM,!*q"))]
+  "(register_operand (operands[0], DImode)
+|| reg_or_0_operand (operands[1], DImode))
+   && TARGET_SOFT_FLOAT && TARGET_64BIT"
+  "@
+   ldd RT'%A1,%0
+   copy %1,%0
+   ldi %1,%0
+   ldil L'%1,%0
+   depdi,z %z1,%0
+   ldd%M1 %1,%0
+   std%M0 %r1,%0
+   mtsar %r1
+   {mfctl|mfctl,w} %%sar,%0"
+  [(set_attr "type" "load,move,move,move,shift,load,store,move,move")
+   (set_attr "pa_combine_type" "addmove")
+   (set_attr "length" "4,4,4,4,4,4,4,4,4")])
+
 (define_insn ""
   [(set (match_operand:DI 0 "indexed_memory_operand" "=R")
(match_operand:DI 1 "register_operand" "f"))]
@@ -4405,6 +4469,23 @@
(set_attr "pa_combine_type" "addmove")
(set_attr "length" "4,4,4,4,4,4")])

+(define_insn ""
+  [(set (match_operand:SF 0 "move_dest_operand"
+ "=!*r,*r,Q")
+   (match_operand:SF 1 "reg_or_0_or_nonsymb_mem_operand"
+ "!*rG,RQ,*rG"))]
+  "(register_operand (operands[0], SFmode)
+|| reg_or_0_operand (operands[1], SFmode))
+   && TARGET_SOFT_FLOAT
+   && TARGET_64BIT"
+  "@
+   copy %r1,%0
+   ldw%M1 %1,%0
+   stw%M0 %r1,%0"
+  [(set_attr "type" "move,load,store")
+   (set_attr "pa_combine_type" "addmove")
+   (set_attr "length" "4,4,4")])
+
 (define_insn ""
   [(set (match_operand:SF 0 "indexed_memory_operand" "=R")
(match_operand:SF 1 "register_operand" "f"))]


Re: [PATCH, rs6000] punish reload of lfiwzx when loading an int variable [PR102169, PR102146]

2021-10-13 Thread Segher Boessenkool
On Wed, Sep 29, 2021 at 04:32:19PM +0800, HAO CHEN GUI wrote:
>   The patch punishes reload of alternative pair of "d, Z" for 
> movsi_internal1. The reload occurs if 'Z' doesn't match and generates an 
> additional insn. So the memory reload should be punished.

As David says, why only for loads?  But also, why not for lxsiwzx (and
stxsiwx) as well?

But, what for all other uses of lfiwzx?  And lfiwax?

We need to find out why the register allocator considers it a good idea
to use FP regs here, and fix *that*?

The extra insn you talk about is because this insn only allows indexed
addressing ([reg+reg] or [reg] addressing).  That is true for very many
insns.  Reload (well, LRA in the modern world) should know about such
extra costs.  Does it not?

> gcc/
>     * gcc/config/rs6000/rs6000.md (movsi_internal1): disparages
>     slightly the alternative 'Z' of "lfiwzx" when reload is 
> needed.

"Disparage", no "s".  Changelog entries are written in the imperative.


Segher


Re: [PATCH v3 6/6] rs6000: Guard some x86 intrinsics implementations

2021-10-13 Thread Segher Boessenkool
On Wed, Oct 13, 2021 at 12:04:39PM -0500, Paul A. Clarke wrote:
> On Mon, Oct 11, 2021 at 07:11:13PM -0500, Segher Boessenkool wrote:
> > > - _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
> > 
> > Did this fail on p7?  If not, add a test that *does*?
> 
> Do you mean fail if not for "dg-require-effective-target p8vector_hw"?
> We have that, in gcc/testsuite/gcc.target/powerpc/sse2-pmuludq-1.c.

"Some compatibility implementations of x86 intrinsics include
Power intrinsics which require POWER8."

Plus, everything this patch does.  None of that would be needed if it
worked on p7!

So things in this patch are either not needed (so add noise only, and
reduce functionality on older systems for no reason), or they do fix a
bug.  It would be nice if we could have detected such bugs earlier.

> > > gcc
> > >   PR target/101893
> > 
> > This is a different bug (the vgbdd one)?
> 
> PR 101893 is the same issue: things not being properly masked by
> #ifdefs.

But PR101893 does not mention anything you touch here, and this patch
does not fix PR101893.  The main purpose of bug tracking systems is the
tracking part!


Segher


Re: [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.

2021-10-13 Thread H.J. Lu via Gcc-patches
On Wed, Oct 13, 2021 at 2:08 AM Uros Bizjak via Gcc-patches
 wrote:
>
> On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle  
> wrote:
> >
> >
> > Good catch.  I agree with Hongtao that although my testing revealed
> > no problems with the previous version of this patch, it makes sense to
> > call gen_reg_rtx to generate a pseudo intermediate instead of attempting
> > to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an
> > intermediate.  I've left the existing behaviour the same, so that
> > memory-to-memory moves (continue to) use ix86_gen_scratch_sse_rtx.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> > and "make -k check" with no new failures.
> >
> > Ok for mainline?
> >
> >
> > 2021-10-13  Roger Sayle  
> >
> > gcc/ChangeLog
> > * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
> > pseudo intermediate when moving a SUBREG into a hard register,
> > by checking ix86_hardreg_mov_ok.
> > (ix86_expand_vector_extract): Store zero-extended SImode
> > intermediate in a pseudo, then set target using a SUBREG_PROMOTED
> > annotated subreg.
> > * config/i386/sse.md (mov_internal): Prevent CSE creating
> > complex (SUBREG) sets of (vector) hard registers before reload, by
> > checking ix86_hardreg_mov_ok.
>
> OK.
>
> Thanks,
> Uros.
>
> >
> > Thanks,
> > Roger
> >
> > -Original Message-
> > From: Hongtao Liu 
> > Sent: 11 October 2021 12:29
> > To: Roger Sayle 
> > Cc: GCC Patches 
> > Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to 
> > i386 backend.
> >
> > On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle  
> > wrote:
> > > gcc/ChangeLog
> > > * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
> > > pseudo intermediate when moving a SUBREG into a hard register,
> > > by checking ix86_hardreg_mov_ok.
> >
> >/* Make operand1 a register if it isn't already.  */
> >if (can_create_pseudo_p ()
> > -  && !register_operand (op0, mode)
> > -  && !register_operand (op1, mode))
> > +  && (!ix86_hardreg_mov_ok (op0, op1)
> > +  || (!register_operand (op0, mode)
> > +  && !register_operand (op1, mode))))
> >  {
> >rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));
> >
> > ix86_gen_scratch_sse_rtx probably returns a hard register, but here you 
> > want a pseudo register.
> >
> > --
> > BR,
> > Hongtao
> >

This caused:

https://gcc.gnu.org/pipermail/gcc-regression/2021-October/075498.html

FAIL: gcc.target/i386/avx-1.c (internal compiler error)
FAIL: gcc.target/i386/avx-1.c (test for excess errors)
FAIL: gcc.target/i386/avx-2.c (internal compiler error)
FAIL: gcc.target/i386/avx-2.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (test for excess errors)
FAIL: gcc.target/i386/sse-13.c (internal compiler error)
FAIL: gcc.target/i386/sse-13.c (test for excess errors)
FAIL: gcc.target/i386/sse-14.c (internal compiler error)
FAIL: gcc.target/i386/sse-14.c (test for excess errors)
FAIL: gcc.target/i386/sse-22a.c (internal compiler error)
FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
FAIL: gcc.target/i386/sse-22.c (internal compiler error)
FAIL: gcc.target/i386/sse-22.c (test for excess errors)
FAIL: gcc.target/i386/sse-23.c (internal compiler error)
FAIL: gcc.target/i386/sse-23.c (test for excess errors)
FAIL: gcc.target/i386/sse-24.c (internal compiler error)
FAIL: gcc.target/i386/sse-24.c (test for excess errors)
FAIL: gcc.target/i386/sse-25.c (internal compiler error)
FAIL: gcc.target/i386/sse-25.c (test for excess errors)
FAIL: gcc.target/i386/sse-26.c (internal compiler error)
FAIL: gcc.target/i386/sse-26.c (test for excess errors)

You can reproduce them by adding -march=cascadelake to these tests.
-- 
H.J.


[committed] libstdc++: Fix regression in memory use when constructing paths

2021-10-13 Thread Jonathan Wakely via Gcc-patches

On 13/10/21 21:19 +0100, Jonathan Wakely wrote:

On 13/10/21 20:41 +0100, Jonathan Wakely wrote:

Adjust the __detail::__effective_range overloads so they always return a
string or string view using std::char_traits, because we don't care
about the traits of an incoming string.

Use std::contiguous_iterator in the __effective_range(const Source&)
overload, to allow returning a basic_string_view in more cases. For the
non-contiguous cases in both __effective_range and __string_from_range,
return a std::string instead of std::u8string when the value type of the
range is char8_t.  These changes avoid unnecessary basic_string
temporaries.


[...]


 template
   inline auto
   __string_from_range(_InputIterator __first, _InputIterator __last)
   {
 using _EcharT
= typename std::iterator_traits<_InputIterator>::value_type;
-  static_assert(__is_encoded_char<_EcharT>);
+  static_assert(__is_encoded_char<_EcharT>); // C++17 [fs.req]/3

-#if __cpp_lib_concepts
-  constexpr bool __contiguous = std::contiguous_iterator<_InputIterator>;
-#else
-  constexpr bool __contiguous
-   = is_pointer_v;
-#endif
-  if constexpr (__contiguous)
+  if constexpr (__is_contiguous<_InputIterator>)


Oops, this pessimizes construction from string::iterator and
vector::iterator in C++17 mode, because the new __is_contiguous
variable template just uses is_pointer_v, without the __niter_base
call that unwraps a __normal_iterator.

That means that we now create a basic_string temporary where we
previously just returned a basic_string_view.

I am testing a fix.


Here's the fix, committed to trunk.

With this change there are no temporaries for string and vector
iterators passed to path constructors. For debug mode there are some
unnecessary temporaries for vector::iterator arguments, because the
safe iterator isn't recognized as contiguous in C++17 mode (but it's
fine in C++20 mode).

The bytes allocated to construct a path consisting of a single
filename with 38 characters are:

Constructor arguments           GCC 11 | GCC 12 | 11 debug | 12 debug
------------------------------- ------ | ------ | -------- | --------
const char*                         39 |     39 |       39 |      39
const char(&)[N]                    39 |     39 |       39 |      39
const char8_t*                      39 |     39 |       39 |      39
const char8_t(&)[N]                 39 |     39 |       39 |      39
std::string::iterator pair          39 |     39 |       39 |      39
std::string::iterator NTCTS         92 |     39 |       92 |      39
std::u8string::iterator pair        39 |     39 |       39 |      39
std::u8string::iterator NTCTS      131 |     39 |      131 |      39
std::vector::iterator pair          39 |     39 |       78 |      78
std::vector::iterator NTCTS        131 |     39 |      131 |      39
std::list::iterator pair            78 |     78 |       78 |      78
std::list::iterator NTCTS          131 |    131 |      131 |     131

So for GCC 12 there are no unwanted allocations unless the iterators
are not contiguous (e.g. std::list::iterator). In that case we need to
construct a temporary string. For the pair(Iter, Iter) constructor we
can use distance(first, last) to size that temporary string correctly,
but for the path(const Source&) constructor we read one character at a
time from the input and call push_back.

In any case, the regression is fixed and we're at least as good as GCC
11 in all cases now.


commit f874a13ca3870a56036a90758b0d41c8c217f4f7
Author: Jonathan Wakely 
Date:   Wed Oct 13 21:32:14 2021

libstdc++: Fix regression in memory use when constructing paths

The changes in r12-4381 were intended to reduce memory usage, but
replacing the __contiguous constant in __string_from_range with the new
__is_contiguous variable template caused a regression. The old code
checked is_pointer_v but the new
code just checks is_pointer_v<_InputIterator>. This means that we no
longer recognise basic_string::iterator and vector::iterator as
contiguous, and so return a temporary basic_string instead of a
basic_string_view. This only affects C++17 mode, because the
std::contiguous_iterator concept is used in C++20 which gives the right
answer for __normal_iterator (and more types as well).

The fix is to specialize the new __is_contiguous variable template so it
is true for __normal_iterator specializations. The new partial
specializations are defined for C++20 too, because it should be cheaper
to match the partial specialization than to check whether the
std::contiguous_iterator concept is satisfied.

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (__detail::__is_contiguous): Add
partial specializations for pointers and __normal_iterator.

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 05db792fbae..c51bfa3095a 100644
--- a/libstdc++-v3/include/bits/fs_path.h

[committed] libstdc++: Rename files with the wrong extensions

2021-10-13 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* testsuite/27_io/filesystem/path/construct/102592.C: Moved to...
* testsuite/27_io/filesystem/path/construct/102592.cc: ...here.
* testsuite/28_regex/match_results/102667.C: Moved to...
* testsuite/28_regex/match_results/102667.cc: ...here.

Tested powerpc64le-linux. Committed to trunk.

I've added a local git pre-commit hook to stop me adding more files
with .C extensions.


commit ce55693604813c5be7d23260f1fd276cf5a48f8f
Author: Jonathan Wakely 
Date:   Wed Oct 13 22:31:51 2021

libstdc++: Rename files with the wrong extensions

libstdc++-v3/ChangeLog:

* testsuite/27_io/filesystem/path/construct/102592.C: Moved to...
* testsuite/27_io/filesystem/path/construct/102592.cc: ...here.
* testsuite/28_regex/match_results/102667.C: Moved to...
* testsuite/28_regex/match_results/102667.cc: ...here.

diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/construct/102592.C b/libstdc++-v3/testsuite/27_io/filesystem/path/construct/102592.cc
similarity index 100%
rename from libstdc++-v3/testsuite/27_io/filesystem/path/construct/102592.C
rename to libstdc++-v3/testsuite/27_io/filesystem/path/construct/102592.cc
diff --git a/libstdc++-v3/testsuite/28_regex/match_results/102667.C 
b/libstdc++-v3/testsuite/28_regex/match_results/102667.cc


Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-10-13 Thread Joseph Myers
On Wed, 13 Oct 2021, HAO CHEN GUI via Gcc-patches wrote:

>   As to IEEE behavior, do you mean "Minimum and maximum operations" defined in
> IEEE-754 2019?  If so, I think VSX/altivec min/max instructions don't conform
> with it. It demands a quiet NaN if either operand is a NaN while our
> instructions don't.
> 
> IEEE-754 2019 maximum(x, y) is x if x>y, y if y>x, and a quiet NaN if either
> operand is a NaN, according to 6.2. For this operation, +0 compares greater
> than −0. Otherwise (i.e., when x=y and signs are the same) it is either x or
> y. Actions for xvmaxdp

We don't have any built-in functions (or I think other internal 
operations) for the IEEE 754-2019 operations (C2X function names fmaximum, 
fminimum, fmaximum_num, fminimum_num, plus per-type suffixes) either, 
though as I noted when adding those functions to glibc, having such 
built-in functions would make sense (specifically, so that RISC-V can 
expand calls to fmaximum_num and fminimum_num inline when building for F 
or D extension version 2.2 and later).  The built-in functions we have for 
fmax and fmin correspond to the IEEE 754-2008 operations (as implemented 
by the AArch64 fmaxnm / fminnm instructions, for example).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] libstdc++: Fix compare_three_way for constexpr and Clang

2021-10-13 Thread Paul Keir via Gcc-patches
I'd like to cancel the request to apply that patch.

At the time I had actually assumed that Clang was at fault, but your comment 
made me pause. I'll submit a bug report as you suggest. We can reconsider the 
patch in future once that bug is resolved.


From: Jonathan Wakely 
Sent: 11 October 2021 22:04
To: Paul Keir
Cc: gcc-patches@gcc.gnu.org; libstd...@gcc.gnu.org
Subject: Re: [PATCH] libstdc++: Fix compare_three_way for constexpr and Clang


On Mon, 11 Oct 2021 at 20:48, Jonathan Wakely  wrote:
>
> On Fri, 20 Aug 2021 at 21:19, Paul Keir wrote:
> >
> > Hi,
> >
> > The current compare_three_way implementation makes provision for constant 
> > evaluation contexts (avoiding reinterpret_cast etc.), but the approach 
> > fails with Clang; when it compares two const volatile void pointers: 
> > "comparison between unequal pointers to void has unspecified result". I 
> > include a fix and test.
> >
> > Could someone commit the attached patch for me?
>
> Sorry for dropping the ball on this again. I've applied the patch
> locally and I'm testing it now. Unless I'm mistaken, you do not have a
> copyright assignment on file with the FSF, is that right? Are you able
> to certify that you have the right to submit this to GCC, as described
> at https://gcc.gnu.org/dco.html ?

P.S. patches should not touch the ChangeLog file. It was always wrong,
because it usually makes the patch fail to apply. Since we moved to
Git the ChangeLog files are automatically generated from the Git
commits anyway, so are never touched as part of the commit. The
changelog entry is still needed, but should be in the Git commit
message not as a patch against the actual ChangeLog file.

>
> Also, if GCC is failing to diagnose the invalid comparisons here then
> that should be reported to bugzilla as a c++ "accepts-invalid" bug.





[RFC] Replace VRP with EVRP passes

2021-10-13 Thread Andrew MacLeod via Gcc-patches
As work has progressed, we're pretty close to being able to functionally
replace VRP with another EVRP pass.  At least it seems close enough that
we should discuss if that's something we might want to consider for this
release.  Replacing just one of the 2 VRP passes is another option.


First, lets examine simplifications/folds.

Running over my set of 380 GCC source files, we see the following 
results for number of cases we currently get:


Number of EVRP cases :   5789
Number of VRP1 cases :   4913
Number of VRP2 cases :   279
 combined VRP1/2:   5192

The 2 passes of VRP get a total of 5192 cases.

If we run EVRP instead of each VRP pass, we get the following results:

Number of EVRP1 cases :   5789
Number of EVRP2 cases :   7521
Number of EVRP3 cases :   2240
 combined EVRP2/3:   9761

so the EVRP passes find an additional 4569 opportunities, or 88% more.
This initially surprised me until it occurred to me that this is
probably due to the pruning required for ASSERT_EXPRs, which means it
never had a chance to see some of those cases. Notice how the second
pass appears far more effective now.


Regarding what we would miss if we took VRP out, if we run  EVRP passes 
first then a VRP pass immediately after, we see what VRP finds that EVRP 
cannot:


Number of EVRP2 cases :  7521
Number of VRP1 cases :   11
Number of EVRP3 cases :  2269
Number of VRP2 cases :   54

I have looked at some of these, and so far they all appear to be cases
which are solved via the iteration model VRP uses. Regardless, missing
65 cases and getting 4569 new ones would seem to be a win. I will
continue to investigate them.


== Performance ==

The threading work has been pulled out of VRP, so we get a better idea
of what VRP's time really is.  We're showing about a 58% slowdown in VRP
over the 2 passes.  I've begun investigating because it shouldn't be off
by that much; I'm seeing a lot of excess time being wasted with callback
queries from the substitute_and_fold engine when processing PHIs.  It
should be possible to bring this all back in line as that isn't the model
ranger should be using anyway.


I figured while I'm looking into the performance side of it, maybe we 
should start talking about whether we want to replace one or both of the 
VRP passes with an EVRP instance.



I see 3 primary options:
1 - replace both VRP passes with EVRP instances.
2 - replace VRP2 with EVRP2
3 - Replace neither, leave it as is.

I figure since the second pass of VRP doesn't get a lot to start with, 
it probably doesn't make sense to replace VRP1 and not VRP2.


Option 1 is what I would expect the strategic move for next release to be;
it seems ready now, it's just a matter of whether we want to give it more
time.  It would also be trivial to turn VRP back on for one or both
later in the cycle if we determine there was something important missing.


Option 2 is something we ought to really consider if we don't want to do
option 1.  There are a few PRs that are starting to open that have VRP
not getting something due to the whims of more precise multi-ranges
being converted back to a value_range, and replacing VRP2 would allow us
to catch those, plus we pick up a lot more than VRP2 does.


I would propose we add a param, similar to what EVRP has, which will
allow us to choose which pass is called for VRP1 and VRP2, and set our
defaults appropriately.  I wouldn't work with a hybrid like we did with
EVRP... just choose which pass runs.  And we'll have to adjust some
testcases based on whatever our default is.


Thoughts?

Personally I think we give option 1 a go, and if something shows up over
the next couple of months, or we can't get performance in line with
where we want it, then we can switch back to VRP for one or both
passes.  I wouldn't expect either, but one never knows :-)


If that isn't palatable for everyone, then I'd suggest option 2

Andrew








Re: [RFC PATCH 0/8] RISC-V: Bit-manipulation extension.

2021-10-13 Thread Vineet Gupta

Hi Kito,

On 9/23/21 12:57 AM, Kito Cheng wrote:

Bit manipulation extension[1] is finishing the public review and waiting for
the rest of the ratification process, I believe that will become a ratified
extension soon, so I think it's time to submit to upstream for review now :)

As the title included RFC, it's not a rush to merge to trunk yet, I would
like to merge that until it is officially ratified.

This patch set is the implementation of bit-manipulation extension, which
includes zba, zbb, zbc and zbs extension, but only included in instruction/md
pattern only, no intrinsic function implementation.

Most work is done by Jim Wilson and many other contributors
on https://github.com/riscv-collab/riscv-gcc.


[1] https://github.com/riscv/riscv-bitmanip/releases/tag/1.0.0


I wanted to give these a try. Is it reasonable to apply these to a gcc
11.1 baseline and give them a spin in buildroot, or do they absolutely
have to be on bleeding-edge gcc?


Thx,
-Vineet


Re: [committed] libstdc++: Refactor filesystem::path encoding conversions

2021-10-13 Thread Jonathan Wakely via Gcc-patches

On 13/10/21 20:41 +0100, Jonathan Wakely wrote:

Adjust the __detail::__effective_range overloads so they always return a
string or string view using std::char_traits, because we don't care
about the traits of an incoming string.

Use std::contiguous_iterator in the __effective_range(const Source&)
overload, to allow returning a basic_string_view in more cases. For the
non-contiguous cases in both __effective_range and __string_from_range,
return a std::string instead of std::u8string when the value type of the
range is char8_t.  These changes avoid unnecessary basic_string
temporaries.


[...]


  template<typename _InputIterator>
inline auto
__string_from_range(_InputIterator __first, _InputIterator __last)
{
  using _EcharT
= typename std::iterator_traits<_InputIterator>::value_type;
-  static_assert(__is_encoded_char<_EcharT>);
+  static_assert(__is_encoded_char<_EcharT>); // C++17 [fs.req]/3

-#if __cpp_lib_concepts
-  constexpr bool __contiguous = std::contiguous_iterator<_InputIterator>;
-#else
-  constexpr bool __contiguous
-   = is_pointer_v;
-#endif
-  if constexpr (__contiguous)
+  if constexpr (__is_contiguous<_InputIterator>)


Oops, this pessimizes construction from string::iterator and
vector::iterator in C++17 mode, because the new __is_contiguous
variable template just uses is_pointer_v, without the __niter_base
call that unwraps a __normal_iterator.

That means that we now create a basic_string temporary where we
previously just returned a basic_string_view.

I am testing a fix.




Re: [Patch] [v3] Fortran: Fix Bind(C) Array-Descriptor Conversion (Move to Front-End Code)

2021-10-13 Thread Harald Anlauf via Gcc-patches

Hi Tobias,

Am 13.10.21 um 18:01 schrieb Tobias Burnus:

Dear all,

a minor update [→ v3].


this has become an impressive work.


I searched for XFAIL in Sandra's c-interop/ and found
two remaining true** xfails, now fixed:

- gfortran.dg/c-interop/typecodes-scalar-basic.f90
   The conversion of scalars of type(c_ptr) was mishandled;
   fixed now; the fix did run into issues converting a string_cst,
   which has also been fixed.

- gfortran.dg/c-interop/fc-descriptor-7.f90
   this one uses TRANSPOSE which did not work [now mostly* does]
   → PR fortran/101309 now also fixed.

I forgot what the exact issue for the latter was. However, when
looking at the testcase and extending it, I did run into the
following issue - and at the end the testcase does now pass.
The issue I had was that when a contiguous check was requested
(i.e. only copy in when needed) it failed to work when
parmse->expr was (a pointer to) a descriptor. I fixed that, and
now most* things work.

OK for mainline? Comments? Suggestions? More PRs which this patch
fixes? Regressions? Test results?


Doesn't break my own codes so far.

If nobody else responds within the next days, assume an OK
from my side.

This will also provide Gerhard with a new playground.  ;-)

Thanks for the patch!

Harald


Tobias

PS: I intent to commit this patch to the OG11 (devel/omp/gcc-11)
branch, in case someone wants to test it there.

PPS: Nice to have an extensive testcase suite - kudos to Sandra
in this case. I am sure Gerhard will find more issues and, once
it is in, I think I/we have to check some PRs + José's patches
for additional testcases + follow-up fixes.

(*) I write 'most' because passing a (potentially) noncontiguous
assumed-rank array to a CONTIGUOUS assumed-rank array causes
an ICE, as the scalarizer does not handle dynamic ranks alias
expr->rank == -1 / ss->dimen = -1.
I decided that that's a separate issue and filed:
https://gcc.gnu.org/PR102729
BTW, my impression is that fixing that PR might well solve
the trans*.c part of https://gcc.gnu.org/PR102641 - but I have
not investigated.

(**) There are still some 'xfail' in comments (outside dg-*)
whose tests now pass. And those were for two bugs in the same
statement, where only one is reported - and the other only after fixing
the first one, which is fine.

On 09.10.21 23:48, Tobias Burnus wrote:

Hi all,

attached is the updated version. Changes:
* Handle noncontiguous arrays – with BIND(C), (g)Fortran needs to make it
  contiguous in the caller but also handle noncontiguous in the callee.
* Fixes/handle 'character(len=*)' with BIND(C); those always use an
  array descriptor - also with explicit-size and assumed-size arrays
* Fixed a bunch of bugs, found when writing extensive testcases.
* Fixed type(*) handling - those now pass properly type and elem_len
  on when calling a new function (bind(C) or not).

Besides adding the type itself (which is rather straight forward),
this patch only had minor modifications – and then the two big
conversion functions.

While it looks intimidating, it should be comparably simple to
review as everything is on one place and hopefully sufficiently
well documented.

OK – for mainline?  Other comments? More PRs which are fixed?
Issues not yet fixed (which are inside the scope of this patch)?

(If this patch is too long, I also have a nine-day old pending patch
at https://gcc.gnu.org/pipermail/gcc-patches/2021-October/580624.html )

Tobias

PS: The following still applies.

On 06.09.21 12:52, Tobias Burnus wrote:

gfortran's internal array descriptor (gfc descriptor) and
the descriptor used with BIND(C) (CFI descriptor, ISO_Fortran_binding.h
of TS29113 / Fortran 2018) are different. Thus, when calling a BIND(C)
procedure the gfc descriptor has to be converted to cfi – and when a
BIND(C) procedure is implemented in Fortran, the argument has to be
converted back from CFI to gfc.

The current implementation handles part in the FE and part in
libgfortran, but there were several issues, e.g. PR101635 failed due to
alias issues, debugging wasn't working well, uninitialized memory was
used in some cases, etc.

This patch now moves descriptor conversion handling to the FE – which
also can make use of compile-time knowledge, useful both for
diagnostics and to optimize the code.

Additionally:
- Some cases where TS29113 mandates that the array descriptor should be
  used now use the array descriptor, in particular character scalars
  with 'len=*' and allocatable/pointer scalars.
- While debugging the alias issue, I simplified 'select rank'. While
  some special case is needed for assumed-shape arrays, those cannot
  appear when the argument has the pointer or allocatable attribute.
  That's not only a missed optimization; pointer/allocatable arrays can
  also be NULL - such that accessing desc->dim.ubound[rank-1] can be
  uninitialized memory ...

OK?  Comments? Suggestions?

 * * *

For some more dumps, see the discussion about the alias issue at:

[committed] libstdc++: Refactor filesystem::path encoding conversions

2021-10-13 Thread Jonathan Wakely via Gcc-patches
Adjust the __detail::__effective_range overloads so they always return a
string or string view using std::char_traits, because we don't care
about the traits of an incoming string.

Use std::contiguous_iterator in the __effective_range(const Source&)
overload, to allow returning a basic_string_view in more cases. For the
non-contiguous cases in both __effective_range and __string_from_range,
return a std::string instead of std::u8string when the value type of the
range is char8_t.  These changes avoid unnecessary basic_string
temporaries.

Also simplify __string_from_range(Iter, Iter) to not need
std::__to_address for the contiguous case.

Combine the _S_convert(string_type) and _S_convert(const T&) overloads
into a single _S_convert(T) function which also avoids the dangling
view problem of PR 102592 (should that recur somehow).

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (__detail::__is_contiguous): New
variable template to identify contiguous iterators.
(__detail::__unified_char8_t): New alias template to decide when
to treat char8_t as char without encoding conversion.
(__detail::__effective_range(const basic_string&)): Use
std::char_traits for returned string view.
(__detail::__effective_range(const basic_string_view&)):
Likewise.
(__detail::__effective_range(const Source&)): Use
__is_contiguous to detect more cases of contiguous iterators.
Use __unified_char8_t to return a std::string instead of
std::u8string.

Tested powerpc64le-linux. Committed to trunk.

commit b83b810ac440f72e7551b6496539e60ac30c0d8a
Author: Jonathan Wakely 
Date:   Wed Oct 13 17:19:57 2021

libstdc++: Refactor filesystem::path encoding conversions

Adjust the __detail::__effective_range overloads so they always return a
string or string view using std::char_traits, because we don't care
about the traits of an incoming string.

Use std::contiguous_iterator in the __effective_range(const Source&)
overload, to allow returning a basic_string_view in more cases. For the
non-contiguous cases in both __effective_range and __string_from_range,
return a std::string instead of std::u8string when the value type of the
range is char8_t.  These changes avoid unnecessary basic_string
temporaries.

Also simplify __string_from_range(Iter, Iter) to not need
std::__to_address for the contiguous case.

Combine the _S_convert(string_type) and _S_convert(const T&) overloads
into a single _S_convert(T) function which also avoids the dangling
view problem of PR 102592 (should that recur somehow).

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (__detail::__is_contiguous): New
variable template to identify contiguous iterators.
(__detail::__unified_char8_t): New alias template to decide when
to treat char8_t as char without encoding conversion.
(__detail::__effective_range(const basic_string&)): Use
std::char_traits for returned string view.
(__detail::__effective_range(const basic_string_view&)):
Likewise.
(__detail::__effective_range(const Source&)): Use
__is_contiguous to detect more cases of contiguous iterators.
Use __unified_char8_t to return a std::string instead of
std::u8string.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index 7ead8ac299c..05db792fbae 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -153,33 +153,56 @@ namespace __detail
   template>
 using _Path2 = enable_if_t<__is_path_iter_src<_Tr>::value, path>;
 
+#if __cpp_lib_concepts
+  template<typename _Iter>
+constexpr bool __is_contiguous = std::contiguous_iterator<_Iter>;
+#else
+  template<typename _Iter>
+constexpr bool __is_contiguous = is_pointer_v<_Iter>;
+#endif
+
+#if !defined _GLIBCXX_FILESYSTEM_IS_WINDOWS && defined _GLIBCXX_USE_CHAR8_T
+  // For POSIX treat char8_t sequences as char without encoding conversions.
+  template<typename _EcharT>
+using __unified_u8_t
+  = __conditional_t<is_same_v<_EcharT, char8_t>, char, _EcharT>;
+#else
+  template<typename _EcharT>
+using __unified_u8_t = _EcharT;
+#endif
+
   // The __effective_range overloads convert a Source parameter into
-  // either a basic_string_view or basic_string containing the
+  // either a basic_string_view or basic_string containing the
   // effective range of the Source, as defined in [fs.path.req].
 
  template<typename _CharT, typename _Traits, typename _Alloc>
-inline basic_string_view<_CharT, _Traits>
+inline basic_string_view<_CharT>
 __effective_range(const basic_string<_CharT, _Traits, _Alloc>& __source)
+noexcept
 { return __source; }
 
  template<typename _CharT, typename _Traits>
-inline const basic_string_view<_CharT, _Traits>&
+inline basic_string_view<_CharT>
 __effective_range(const basic_string_view<_CharT, _Traits>& __source)
+noexcept
 { return __source; }
 
+  // Return the 

[committed] libstdc++: Fix dangling string_view in filesystem::path [PR102592]

2021-10-13 Thread Jonathan Wakely via Gcc-patches
When creating a path from a pair of non-contiguous iterators we pass the
iterators to _S_convert(Iter, Iter). That function passes the iterators
to __string_from_range to get a contiguous sequence of characters, and
then calls _S_convert(const C*, const C*) to perform the encoding
conversions. If the value type, C, is char8_t, then no conversion is
needed and the _S_convert(const char8_t*, const char8_t*)
specialization casts the pointer to const char* and returns a
std::string_view that refs to the char8_t sequence. However, that
sequence is owned by the std::u8string rvalue returned by
__string_from_range, which goes out of scope when _S_convert(Iter, Iter)
returns. That means the std::string_view is dangling and we get
undefined behaviour when parsing it as a path.

The same problem does not exist for the path members taking a "Source"
argument, because those functions all convert a non-contiguous range
into a basic_string immediately, using __effective_range(__source).
That means that the rvalue string returned by that function is still in
scope for the full expression, so the string_view does not dangle.

The solution for the buggy functions is to do the same thing, and call
__string_from_range immediately, so that the returned rvalue is still in
scope for the lifetime of the string_view returned by _S_convert. To
avoid reintroducing the same problem, remove the _S_convert(Iter, Iter)
overload that calls __string_from_range and returns a dangling view.

libstdc++-v3/ChangeLog:

PR libstdc++/102592
* include/bits/fs_path.h (path::path(Iter, Iter, format))
(path::append(Iter, Iter), path::concat(Iter, Iter)): Call
__string_from_range directly, instead of two-argument overload
of _S_convert.
(path::_S_convert(Iter, Iter)): Remove.
* testsuite/27_io/filesystem/path/construct/102592.C: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 85b24e32dc27ec2e70b853713e0713cbc1ff08c3
Author: Jonathan Wakely 
Date:   Wed Oct 13 17:02:59 2021

libstdc++: Fix dangling string_view in filesystem::path [PR102592]

When creating a path from a pair of non-contiguous iterators we pass the
iterators to _S_convert(Iter, Iter). That function passes the iterators
to __string_from_range to get a contiguous sequence of characters, and
then calls _S_convert(const C*, const C*) to perform the encoding
conversions. If the value type, C, is char8_t, then no conversion is
needed and the _S_convert(const char8_t*, const char8_t*)
specialization casts the pointer to const char* and returns a
std::string_view that refs to the char8_t sequence. However, that
sequence is owned by the std::u8string rvalue returned by
__string_from_range, which goes out of scope when _S_convert(Iter, Iter)
returns. That means the std::string_view is dangling and we get
undefined behaviour when parsing it as a path.

The same problem does not exist for the path members taking a "Source"
argument, because those functions all convert a non-contiguous range
into a basic_string immediately, using __effective_range(__source).
That means that the rvalue string returned by that function is still in
scope for the full expression, so the string_view does not dangle.

The solution for the buggy functions is to do the same thing, and call
__string_from_range immediately, so that the returned rvalue is still in
scope for the lifetime of the string_view returned by _S_convert. To
avoid reintroducing the same problem, remove the _S_convert(Iter, Iter)
overload that calls __string_from_range and returns a dangling view.

libstdc++-v3/ChangeLog:

PR libstdc++/102592
* include/bits/fs_path.h (path::path(Iter, Iter, format))
(path::append(Iter, Iter), path::concat(Iter, Iter)): Call
__string_from_range directly, instead of two-argument overload
of _S_convert.
(path::_S_convert(Iter, Iter)): Remove.
* testsuite/27_io/filesystem/path/construct/102592.C: New test.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index 1918c243d74..7ead8ac299c 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -292,7 +292,7 @@ namespace __detail
 template>
   path(_InputIterator __first, _InputIterator __last, format = auto_format)
-  : _M_pathname(_S_convert(__first, __last))
+  : _M_pathname(_S_convert(__detail::__string_from_range(__first, __last)))
   { _M_split_cmpts(); }
 
 template&
   append(_InputIterator __first, _InputIterator __last)
   {
-   _M_append(_S_convert(__first, __last));
+   _M_append(_S_convert(__detail::__string_from_range(__first, __last)));
return *this;
   }
 
@@ -390,7 +390,7 @@ namespace __detail
   __detail::_Path2<_InputIterator>&
   concat(_InputIterator 

[PATCH] PR fortran/102716 - ICE in gfc_validate_kind(): Got bad kind

2021-10-13 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

another simple and obvious fix: we need to reorder the argument checks
to the SHAPE intrinsic so that invalid KIND arguments can be detected.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

As I consider this a safe fix, I'd like to backport to suitable branches.

Thanks,
Harald

Fortran: fix order of checks for the SHAPE intrinsic

gcc/fortran/ChangeLog:

	PR fortran/102716
	* check.c (gfc_check_shape): Reorder checks so that invalid KIND
	arguments can be detected.

gcc/testsuite/ChangeLog:

	PR fortran/102716
	* gfortran.dg/shape_10.f90: New test.

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 677209ee95e..cfaf9d26bbc 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -5086,6 +5086,13 @@ gfc_check_shape (gfc_expr *source, gfc_expr *kind)
   if (gfc_invalid_null_arg (source))
 return false;

+  if (!kind_check (kind, 1, BT_INTEGER))
+return false;
+  if (kind && !gfc_notify_std (GFC_STD_F2003, "%qs intrinsic "
+			   "with KIND argument at %L",
+			   gfc_current_intrinsic, &kind->where))
+return false;
+
   if (source->rank == 0 || source->expr_type != EXPR_VARIABLE)
 return true;

@@ -5098,13 +5105,6 @@ gfc_check_shape (gfc_expr *source, gfc_expr *kind)
   return false;
 }

-  if (!kind_check (kind, 1, BT_INTEGER))
-return false;
-  if (kind && !gfc_notify_std (GFC_STD_F2003, "%qs intrinsic "
-			   "with KIND argument at %L",
-			   gfc_current_intrinsic, &kind->where))
-return false;
-
   return true;
 }

diff --git a/gcc/testsuite/gfortran.dg/shape_10.f90 b/gcc/testsuite/gfortran.dg/shape_10.f90
new file mode 100644
index 000..4943c21b1d2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/shape_10.f90
@@ -0,0 +1,6 @@
+! { dg-do compile }
+! PR fortran/102716
+
+program p
+  integer, parameter :: a(1) = shape([2], [1]) ! { dg-error "must be a scalar" }
+end


[PATCH] PR fortran/102717 - ICE in gfc_simplify_reshape, at fortran/simplify.c:6843

2021-10-13 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

when simplifying RESHAPE we hit a gcc_assert for negative entries in the
SHAPE array.  Obvious solution: replace gcc_assert by an error message.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

As this is a safe fix, I'd like to backport to suitable branches.

Thanks,
Harald

Fortran: generate error	message for negative elements in SHAPE array

gcc/fortran/ChangeLog:

	PR fortran/102717
	* simplify.c (gfc_simplify_reshape): Replace assert by error
	message for negative elements in SHAPE array.

gcc/testsuite/ChangeLog:

	PR fortran/102717
	* gfortran.dg/reshape_shape_2.f90: New test.

diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index f40e4930b58..d675f2c3aef 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -6840,7 +6840,13 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp,
   gfc_extract_int (e, &shape[rank]);

   gcc_assert (rank >= 0 && rank < GFC_MAX_DIMENSIONS);
-  gcc_assert (shape[rank] >= 0);
+  if (shape[rank] < 0)
+	{
+	  gfc_error ("The SHAPE array for the RESHAPE intrinsic at %L has a "
+		 "negative value %d for dimension %d",
		 &shape_exp->where, shape[rank], rank+1);
+	  return _bad_expr;
+	}

   rank++;
 }
diff --git a/gcc/testsuite/gfortran.dg/reshape_shape_2.f90 b/gcc/testsuite/gfortran.dg/reshape_shape_2.f90
new file mode 100644
index 000..8f1757687bc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/reshape_shape_2.f90
@@ -0,0 +1,7 @@
+! { dg-do compile }
+! PR fortran/102717
+
+program p
+  integer, parameter :: a(1) = 2
+  integer, parameter :: b(2) = reshape([3,4], -[a]) ! { dg-error "negative" }
+end


[PATCH] libiberty: d-demangle: add test cases for simple special mangles

2021-10-13 Thread Luís Ferreira
Simple mangled names (only with identifiers) are not being covered by coverage
tests.

Signed-off-by: Luís Ferreira 

libiberty/ChangeLog:

* testsuite/d-demangle-expected: Add test cases for simple special mangles.
---
 libiberty/testsuite/d-demangle-expected | 8 
 1 file changed, 8 insertions(+)

diff --git a/libiberty/testsuite/d-demangle-expected 
b/libiberty/testsuite/d-demangle-expected
index 44a3649c429..ba54e5a2812 100644
--- a/libiberty/testsuite/d-demangle-expected
+++ b/libiberty/testsuite/d-demangle-expected
@@ -18,6 +18,14 @@ _Dmain
 D main
 #
 --format=dlang
+_D8demangleZ
+demangle
+#
+--format=dlang
+_D8demangle4testZ
+demangle.test
+#
+--format=dlang
 _D8demangle4testPFLAiYi
 demangle.test
 #
-- 
2.33.0



Re: [PATCH] hardened conditionals

2021-10-13 Thread Alexandre Oliva via Gcc-patches
On Oct 12, 2021, Richard Biener  wrote:

> Are there any issues with respect to debugging when using such
> asm()s?

Not in this case.  When creating short-lived copies for immediate use,
like I do in the proposed patch, either the original value remains live
in its original location and we use an actual copy, or the original
value was dead, and we'll have a stmt/insn that visibly marks it as
such, though the value actually remains there.  The newly-added compare
statements use these anonymous, temporary copies, so they're not
relevant for debug information.

Using asms could have effects on debug info and on optimizations in case
their outputs *become* the location/value of the variable, i.e., as if
in source code we did:

  asm ("" : "+g" (var));

After this optimization barrier, the compiler wouldn't know any more
anything it might have known before about the value held in the
variable.  And, if the variable is a gimple register, there would have
to be a new debug bind stmt to bind the variable to its "new" value.
(The debug machinery would assume the asm stmt modifies the value, and
the output would thus overwrite the location with a value unrelated to
the variable without the restated debug bind.)


The risk for debug info of introducing such asm stmts after conversion
into SSA is that the debug binds wouldn't be added automatically, as
they are when they're present in source code.


>> Yeah, that would be another way to do it, but then it would have to be a
>> lot trickier, given all the different ways in which compare-and-branch
>> can be expressed in RTL.

> Agreed, though it would be less disturbing to the early RTL pipeline
> and RTL expansion.

Is that a concern, though?  It's not like such structures couldn't be
present in source code, after all, so the RTL machinery has to be able
to deal with them one way or another.  For someone who wishes the
compiler to introduce this hardening, the point in which it's added
seems immaterial to me, as long as it remains all the way to the end.


Now, if we had some kind of optimization barrier at the point of use, we
would be able to save a copy in some cases.  E.g., instead of:

  tmp = var;
  asm ("" : "+g" (tmp));
  if (tmp < cst) ...
  [reads from var]

we could have:

  if (__noopt (var) < cst) ...
  [reads from var]

and that would use var's value without taking an extra copy of it, but also
without enabling optimizations based on knowledge about the value.
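For concreteness, the copy-plus-barrier pattern under discussion can be
written out as plain C (the names and the function are illustrative, not
the actual tree/RTL the hardening pass emits):

```c
#include <assert.h>

/* Harden a bounds check: compare against a copy of `len` that the
   optimizer cannot prove equal to the original, so the comparison
   cannot be folded away or merged with an earlier test.  */
static int checked_index(const int *buf, unsigned len, unsigned i)
{
  unsigned tmp = len;
  /* Optimization barrier: the empty asm claims to read and modify
     `tmp`, so the compiler must keep a live copy and a real compare.  */
  __asm__ ("" : "+g" (tmp));
  if (i >= tmp)
    return -1;
  return buf[i];
}
```

The copy `tmp` is short-lived and anonymous, matching the description
above: the original variable keeps its location and debug binds, while
the hardened compare reads only the barrier-protected temporary.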

ISTM that introducing this sort of __noopt use would be a very
significant undertaking.  In RTL, a pseudo set to the original value of
var, and that eventually resolves to the same location that holds that
value, but marked in a way that prevents CSE, fwprop and whatnot (make
it volatile?) would likely get 99% of the way there, but making pseudos
to that end (and having such marks remain after reload) seems nontrivial
to me.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] libiberty: d-demangle: Add test case for function literals

2021-10-13 Thread Luís Ferreira
Coverage tests don't include a case for function literals.

Signed-off-by: Luís Ferreira 

libiberty/ChangeLog:

* testsuite/d-demangle-expected: Add test case for function literals.
---
 libiberty/testsuite/d-demangle-expected | 4 
 1 file changed, 4 insertions(+)

diff --git a/libiberty/testsuite/d-demangle-expected 
b/libiberty/testsuite/d-demangle-expected
index 44a3649c429..3c3aebb06aa 100644
--- a/libiberty/testsuite/d-demangle-expected
+++ b/libiberty/testsuite/d-demangle-expected
@@ -973,6 +973,10 @@ demangle.test(char)
 _D8demangle4testFaZv
 demangle.test(char)
 #
+--format=dlang
+_D8demangle__T3abcS_DQt10__lambda13FNaNbNiNfZiZQBhFZi
+demangle.abc!(demangle.__lambda13()).abc()
+#
 # Unittests
 #
 --format=dlang
-- 
2.33.0



Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-10-13 Thread Segher Boessenkool
On Tue, Oct 12, 2021 at 04:57:43PM +0800, HAO CHEN GUI wrote:
> b/gcc/config/rs6000/rs6000-call.c
> index b4e13af4dc6..90527734ceb 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -12159,6 +12159,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> *gsi)
>    return true;
>  /* flavors of vec_min.  */
>  case VSX_BUILTIN_XVMINDP:
> +    case ALTIVEC_BUILTIN_VMINFP:
> +  if (!flag_finite_math_only || flag_signed_zeros)
> +   return false;
> +  /* Fall through to MIN_EXPR.  */
> +  gcc_fallthrough ();
>  case P8V_BUILTIN_VMINSD:
>  case P8V_BUILTIN_VMINUD:
>  case ALTIVEC_BUILTIN_VMINSB:

"Fall though to code for MIN_EXPR"?  It suggests it is a label, as
written now.  Or don't have this comment at all, maybe?

> +/* { dg-do compile { target { powerpc*-*-* } } } */

Leave out the target clause?  Testcases in gcc.target/powerpc/ are not
run when this is not satisfied anyway, testing it twice is just more
noise.


Segher


Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-10-13 Thread Segher Boessenkool
On Wed, Oct 13, 2021 at 10:29:26AM +0200, Richard Biener wrote:
> On Wed, Oct 13, 2021 at 9:43 AM HAO CHEN GUI  wrote:
> >As to IEEE behavior, do you mean "Minimum and maximum operations" 
> > defined in IEEE-754 2019?  If so, I think VSX/altivec min/max instructions 
> > don't conform with it. It demands a quiet NaN if either operand is a NaN 
> > while our instructions don't.
> >
> > IEEE-754 2019 maximum(x, y) is x if x>y, y if y>x, and a quiet NaN if either 
> > operand is a NaN, according to 6.2. For this operation, +0 compares greater 
> > than −0. Otherwise (i.e., when x=y and signs are the same) it is either x or 
> > y. Actions for xvmaxdp
> 
> Hmm, then I do not understand the reason for the patch - people using
> the intrinsics cannot expect IEEE semantics then.
> So you are concerned that people don't get the 1:1 machine instruction
> but eventually the IEEE conforming MIN/MAX_EXPR?

I do not know about Gimple MIN_EXPR (it is not documented?), but the
RTL "smin" is meaningless in the presence of NaNs or signed zeros.  This
is documented (in rtl.texi):

"""
@findex smin
@findex smax
@cindex signed minimum
@cindex signed maximum
@item (smin:@var{m} @var{x} @var{y})
@itemx (smax:@var{m} @var{x} @var{y})
Represents the smaller (for @code{smin}) or larger (for @code{smax}) of
@var{x} and @var{y}, interpreted as signed values in mode @var{m}.
When used with floating point, if both operands are zeros, or if either
operand is @code{NaN}, then it is unspecified which of the two operands
is returned as the result.
"""

(not exactly meaningless, okay, but not usable for almost anything).
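To make that concrete, a naive scalar minimum is order-sensitive exactly
as the rtl.texi text allows (an illustration only, not what GCC expands
smin to):

```c
#include <assert.h>
#include <math.h>

/* The comparison `x < y` is false whenever either operand is a NaN,
   and also when x == y (including 0.0 vs -0.0), so the result depends
   on operand order -- precisely the latitude granted to smin/smax.  */
static double naive_min(double x, double y)
{
  return x < y ? x : y;
}
```

With a NaN first the other operand comes back; with a NaN second the
NaN comes back; and for (+0, -0) versus (-0, +0) the second operand is
returned either way.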

> But that can then still happen with -ffast-math so I wonder what's the point.

That :-)


Segher


[r12-4369 Regression] FAIL: gcc.dg/torture/pr69760.c -O3 -g (test for excess errors) on Linux/x86_64

2021-10-13 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

3c0194d7ff21d61c02f3c6b111c83ef24a69e1f0 is the first bad commit
commit 3c0194d7ff21d61c02f3c6b111c83ef24a69e1f0
Author: Richard Biener 
Date:   Mon Oct 11 12:27:10 2021 +0200

tree-optimization/102659 - avoid undefined overflow after if-conversion

caused

FAIL: gcc.dg/torture/pr69760.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr69760.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/pr69760.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/pr69760.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.dg/torture/pr69760.c   -O3 -g  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4369/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/pr69760.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[COMMITTED] ctfc: remove redundant comma in enumerator list

2021-10-13 Thread Indu Bhagat via Gcc-patches
This also helps get rid of the warning:

ctfc.h:215:18: warning: comma at end of enumerator list [-Wpedantic]
   CTF_DTU_D_SLICE,

gcc/ChangeLog:

* ctfc.h (enum ctf_dtu_d_union_enum): Remove redundant comma.
---
 gcc/ctfc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ctfc.h b/gcc/ctfc.h
index a0b7e41..701c7ea 100644
--- a/gcc/ctfc.h
+++ b/gcc/ctfc.h
@@ -212,7 +212,7 @@ enum ctf_dtu_d_union_enum {
   CTF_DTU_D_ARRAY,
   CTF_DTU_D_ENCODING,
   CTF_DTU_D_ARGUMENTS,
-  CTF_DTU_D_SLICE,
+  CTF_DTU_D_SLICE
 };
 
 enum ctf_dtu_d_union_enum
-- 
1.8.3.1



[PATH][_GLIBCXX_DEBUG] Fix unordered container merge

2021-10-13 Thread François Dumont via Gcc-patches

Hi

    libstdc++: [_GLIBCXX_DEBUG] Implement unordered container merge

    The _GLIBCXX_DEBUG unordered containers need a dedicated merge 
implementation
    so that any existing iterator on the transferred nodes is properly 
invalidated.


    Add typedef/using declaration for everything used as-is from normal 
implementation.


    libstdc++-v3/ChangeLog:

    * include/debug/safe_container.h (_Safe_container<>): Make 
all methods

    protected.
    * include/debug/safe_unordered_container.h
    (_Safe_unordered_container<>::_M_invalidate_all): Make public.
    (_Safe_unordered_container<>::_M_invalidate_if): Likewise.
(_Safe_unordered_container<>::_M_invalidate_local_if): Likewise.
    * include/debug/unordered_map
    (unordered_map<>::mapped_type, pointer, const_pointer): New 
typedef.
    (unordered_map<>::reference, const_reference, 
difference_type): New typedef.
    (unordered_map<>::get_allocator, empty, size, max_size): 
Add usings.
    (unordered_map<>::bucket_count, max_bucket_count, bucket): 
Add usings.
    (unordered_map<>::hash_function, key_equal, count, 
contains): Add usings.

    (unordered_map<>::operator[], at, rehash, reserve): Add usings.
    (unordered_map<>::merge): New.
    (unordered_multimap<>::mapped_type, pointer, 
const_pointer): New typedef.
    (unordered_multimap<>::reference, const_reference, 
difference_type): New typedef.
    (unordered_multimap<>::get_allocator, empty, size, 
max_size): Add usings.
    (unordered_multimap<>::bucket_count, max_bucket_count, 
bucket): Add usings.
    (unordered_multimap<>::hash_function, key_equal, count, 
contains): Add usings.

    (unordered_multimap<>::rehash, reserve): Add usings.
    (unordered_multimap<>::merge): New.
    * include/debug/unordered_set
    (unordered_set<>::mapped_type, pointer, const_pointer): New 
typedef.
    (unordered_set<>::reference, const_reference, 
difference_type): New typedef.
    (unordered_set<>::get_allocator, empty, size, max_size): 
Add usings.
    (unordered_set<>::bucket_count, max_bucket_count, bucket): 
Add usings.
    (unordered_set<>::hash_function, key_equal, count, 
contains): Add usings.

    (unordered_set<>::rehash, reserve): Add usings.
    (unordered_set<>::merge): New.
    (unordered_multiset<>::mapped_type, pointer, 
const_pointer): New typedef.
    (unordered_multiset<>::reference, const_reference, 
difference_type): New typedef.
    (unordered_multiset<>::get_allocator, empty, size, 
max_size): Add usings.
    (unordered_multiset<>::bucket_count, max_bucket_count, 
bucket): Add usings.
    (unordered_multiset<>::hash_function, key_equal, count, 
contains): Add usings.

    (unordered_multiset<>::rehash, reserve): Add usings.
    (unordered_multiset<>::merge): New.
    * 
testsuite/23_containers/unordered_map/debug/merge1_neg.cc: New test.
    * 
testsuite/23_containers/unordered_map/debug/merge2_neg.cc: New test.
    * 
testsuite/23_containers/unordered_map/debug/merge3_neg.cc: New test.
    * 
testsuite/23_containers/unordered_map/debug/merge4_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multimap/debug/merge1_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multimap/debug/merge2_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multimap/debug/merge3_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multimap/debug/merge4_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multiset/debug/merge1_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multiset/debug/merge2_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multiset/debug/merge3_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multiset/debug/merge4_neg.cc: New test.
    * 
testsuite/23_containers/unordered_set/debug/merge1_neg.cc: New test.
    * 
testsuite/23_containers/unordered_set/debug/merge2_neg.cc: New test.
    * 
testsuite/23_containers/unordered_set/debug/merge3_neg.cc: New test.
    * 
testsuite/23_containers/unordered_set/debug/merge4_neg.cc: New test.
    * testsuite/util/testsuite_abi.h: [_GLIBCXX_DEBUG] Use 
normal unordered container implementation.


Tested under Linux x86_64.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/debug/safe_container.h b/libstdc++-v3/include/debug/safe_container.h
index 97c47167fe8..5de55d69f34 100644
--- a/libstdc++-v3/include/debug/safe_container.h
+++ b/libstdc++-v3/include/debug/safe_container.h
@@ -78,7 +78,6 @@ namespace __gnu_debug
   { }
 #endif
 
-public:
   // Copy assignment invalidate all iterators.
   _Safe_container&
   operator=(const 

Re: [PATCH v3 6/6] rs6000: Guard some x86 intrinsics implementations

2021-10-13 Thread Paul A. Clarke via Gcc-patches
On Mon, Oct 11, 2021 at 07:11:13PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:10PM -0500, Paul A. Clarke wrote:
> > Some compatibility implementations of x86 intrinsics include
> > Power intrinsics which require POWER8.  Guard them.
> 
> > emmintrin.h:
> > - _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8,
> >   but which indeed depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud).
> >   The "POWER8" version works fine on pre-POWER8.
> 
> Huh.  It just generates xvcmpeqdp I suppose?

Yes.

> > - _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
> 
> Did this fail on p7?  If not, add a test that *does*?

Do you mean fail if not for "dg-require-effective-target p8vector_hw"?
We have that, in gcc/testsuite/gcc.target/powerpc/sse2-pmuludq-1.c.

> > pmmintrin.h:
> > - _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow.
> > - _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew.
> 
> Similar.

gcc/testsuite/gcc.target/powerpc/sse3-movshdup.c
gcc/testsuite/gcc.target/powerpc/sse3-movsldup.c

> > smmintrin.h:
> > - _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd.
> > - _mm_mul_epi32: vec_mule(v4si) uses vmuluwm.
> > - _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd.
> > tmmintrin.h:
> > - _mm_sign_epi8: vec_neg(v4si) uses vsububm.
> > - _mm_sign_epi16: vec_neg(v4si) uses vsubuhm.
> > - _mm_sign_epi32: vec_neg(v4si) uses vsubuwm.
> >   Note that the above three could actually be supported pre-POWER8,
> >   but current GCC does not support them before POWER8.
> > - _mm_sign_pi8: depends on _mm_sign_epi8.
> > - _mm_sign_pi16: depends on _mm_sign_epi16.
> > - _mm_sign_pi32: depends on _mm_sign_epi32.
> 
> And more.

gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c
- although this one will _actually_ fail on P7, as it only requires
"vsx_hw". I'll fix this.
gcc/testsuite/gcc.target/powerpc/ssse3-psignb.c
gcc/testsuite/gcc.target/powerpc/ssse3-psignw.c
gcc/testsuite/gcc.target/powerpc/ssse3-psignd.c
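For reference, the psign operation those ssse3-psign* tests exercise
follows a simple per-element rule; here is a scalar C model of it
(illustrative only, not the vector code in the compatibility headers):

```c
/* Scalar model of the SSSE3 psign element operation: negate a when b
   is negative, zero it when b is zero, and pass it through otherwise.
   _mm_sign_epi8/16/32 apply this per element across the vector.  */
static int sign_op(int a, int b)
{
  if (b < 0)
    return -a;
  if (b == 0)
    return 0;
  return a;
}
```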

> > gcc
> > PR target/101893
> 
> This is a different bug (the vgbdd one)?

PR 101893 is the same issue: things not being properly masked by
#ifdefs.

> All looks good, but we need such failing tests :-)

Thanks for the review! Let me know what you mean by "failing tests".
("Would fail if not for ..."?)

PC


Re: [PATCH] check to see if null pointer is dereferenceable [PR102630]

2021-10-13 Thread Martin Sebor via Gcc-patches

On 10/13/21 2:25 AM, Richard Biener wrote:

On Wed, Oct 13, 2021 at 3:32 AM Martin Sebor via Gcc-patches
 wrote:


On 10/11/21 6:26 PM, Joseph Myers wrote:

The testcase uses the __seg_fs address space, which is x86-specific, but
it isn't in an x86-specific directory or otherwise restricted to x86
targets; thus, I'd expect it to fail for other architectures.

This is not a review of the rest of the patch.



Good point!  I thought I might make the test target-independent
(via macros) but it looks like just i386 defines the hook to
something other than false so I should probably move it under
i386.


The patch is OK with the testcase moved.


I changed the test, moved it under the i386 directory, and also
added -Wall to the existing addr-space-2.c, and committed
the result in r12-4376.



Note I don't think we should warn about *(int *)0xdeadbee0,

/* Pointer constants other than null are most likely the result
-of erroneous null pointer addition/subtraction.  Set size to
-zero.  For null pointers, set size to the maximum for now
-since those may be the result of jump threading.  */

there's too much "may be" and "most likely" for my taste.  How can
the user mark a deliberate valid constant address?


Using a volatile pointer works

  int* volatile p = (int*)0xdeadbee0;
  *p = 0;

but is not very elegant.  The other common solution is to make
it a variable and assign it an address in a linker script, but
that's too heavy-weight for some.
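The linker-script route mentioned above might look roughly like this
(hypothetical names; the exact syntax depends on the target's default
script):

```c
/* C side: declare the object; no address appears in the source.  */
extern volatile int device_reg;

/* Linker script side (e.g. in a custom .ld file):

     device_reg = 0xdeadbee0;

   The symbol then has a type and identity in the program, instead of
   being a bare integer cast to a pointer.  */
```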

I would prefer to make AVR's attribute address generic and
encourage programmers to switch to using it to declare these
things as objects.  The major advantage is that what's at such
an address becomes a first class citizen in the type system
(and beyond), with a type and size.



Maybe we can use a better (target-dependent?) heuristic based on
what virtual addresses are likely unmapped (the zero page, the
page "before" the zero page)?


I agree we need something better: ideally, detect the null
pointer arithmetic before it's folded into a constant pointer
address.  I've opened pr102731 as a reminder.

Martin



Richard.



Thanks
Martin




Re: [PATCH] Allow different vector types for stmt groups

2021-10-13 Thread Martin Jambor
Hi,

On Mon, Sep 27 2021, Richard Biener via Gcc-patches wrote:
>
[...]
>
> The following is what I have pushed after re-bootstrapping and testing
> on x86_64-unknown-linux-gnu.
>
> Richard.
>
> From fc335f9fde40d7a20a1a6e38fd6f842ed93a039e Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Wed, 18 Nov 2020 09:36:57 +0100
> Subject: [PATCH] Allow different vector types for stmt groups
> To: gcc-patches@gcc.gnu.org
>
> This allows vectorization (in practice non-loop vectorization) to
> have a stmt participate in different vector type vectorizations.
> It allows us to remove vect_update_shared_vectype and replace it
> by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> vect_analyze_stmt and vect_transform_stmt.
>
> For data-ref the situation is a bit more complicated since we
> analyze alignment info with a specific vector type in mind which
> doesn't play well when that changes.
>
> So the bulk of the change is passing down the actual vector type
> used for a vectorized access to the various accessors of alignment
> info, first and foremost dr_misalignment but also aligned_access_p,
> known_alignment_for_access_p, vect_known_alignment_in_bytes and
> vect_supportable_dr_alignment.  I took the liberty to replace
> ALL_CAPS macro accessors with the lower-case function invocations.
>
> The actual changes to the behavior are in dr_misalignment which now
> is the place factoring in the negative step adjustment as well as
> handling alignment queries for a vector type with bigger alignment
> requirements than what we can (or have) analyze(d).
>
> vect_slp_analyze_node_alignment makes use of this and upon receiving
> a vector type with a bigger alingment desire re-analyzes the DR
> with respect to it but keeps an older more precise result if possible.
> In this context it might be possible to do the analysis just once
> but instead of analyzing with respect to a specific desired alignment
> look for the biggest alignment we can compute a not unknown alignment.
>
> The ChangeLog includes the functional changes but not the bulk due
> to the alignment accessor API changes - I hope that's something good.
>
> 2021-09-17  Richard Biener  
>
>   PR tree-optimization/97351
>   PR tree-optimization/97352
>   PR tree-optimization/82426
>   * tree-vectorizer.h (dr_misalignment): Add vector type
>   argument.
>   (aligned_access_p): Likewise.
>   (known_alignment_for_access_p): Likewise.
>   (vect_supportable_dr_alignment): Likewise.
>   (vect_known_alignment_in_bytes): Likewise.  Refactor.
>   (DR_MISALIGNMENT): Remove.
>   (vect_update_shared_vectype): Likewise.
>   * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
>   a vector type with larger alignment requirement and apply
>   the negative step adjustment here.
>   (vect_calculate_target_alignment): Remove.
>   (vect_compute_data_ref_alignment): Get explicit vector type
>   argument, do not apply a negative step alignment adjustment
>   here.
>   (vect_slp_analyze_node_alignment): Re-analyze alignment
>   when we re-visit the DR with a bigger desired alignment but
>   keep more precise results from smaller alignments.
>   * tree-vect-slp.c (vect_update_shared_vectype): Remove.
>   (vect_slp_analyze_node_operations_1): Do not update the
>   shared vector type on stmts.
>   * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
>   vector type of an SLP node to the representative stmt-info.
>   (vect_transform_stmt): Likewise.

I have bisected an AMD zen2 10% performance regression of SPEC 2006 FP
433.milc benchmark when compiled with -Ofast -march=native -flto to this
commit.  See also:

  
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=412.70.0=289.70.0;

I am not sure if a bugzilla bug is in order because I cannot reproduce
the regression neither on an AMD zen3 machine nor on Intel CascadeLake,
because of the history of the benchmark performance and because I know milc
can be sensitive to conditions outside our control.  And the list of
dependencies of PR 26163 is long enough as it is.  OTOH, the regression
reproduces reliably for me.

Some relevant perf data:

BEFORE:
# Samples: 585K of event 'cycles:u'
# Event count (approx.): 472738682838
#
# Overhead   Samples  Command  Shared Object   Symbol
# 
24.59%140397  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 
u_shift_fermion
18.47%105497  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 
add_force_to_mom
15.97% 96343  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 
mult_su3_na
15.29% 90027  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 
mult_su3_nn
 5.55% 35114  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 
path_product
 4.75% 27693  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 

Re: [PATCH] libiberty: d-demangle: use distinguishable tuple()

2021-10-13 Thread Luís Ferreira
On Wed, 2021-10-13 at 16:42 +0100, Luís Ferreira wrote:
> On Wed, 2021-10-13 at 16:34 +0100, Luís Ferreira wrote:
> > Since Tuple!() is a templated type from the standard library, this
> > can make two demangled names indistinguishable.
> > 
> > Signed-off-by: Luís Ferreira 
> > 
> > libiberty/ChangeLog:
> > 
> > * d-demangle.c (dlang_parse_tuple): use tuple() instead of
> > Tuple!()
> > ---
> >  libiberty/d-demangle.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
> > index 880f2ec85a4..5dbdc36adbe 100644
> > --- a/libiberty/d-demangle.c
> > +++ b/libiberty/d-demangle.c
> > @@ -1711,7 +1711,7 @@ dlang_parse_tuple (string *decl, const char
> > *mangled, struct dlang_info *info)
> >    if (mangled == NULL)
> >  return NULL;
> >  
> > -  string_append (decl, "Tuple!(");
> > +  string_append (decl, "tuple(");
> >  
> >    while (elements--)
> >  {
> 
> I need to update tests
> 

Updated on PATCH v2.

-- 
Sincerely,
Luís Ferreira @ lsferreira.net





[PATCH v2] libiberty: d-demangle: use distinguishable tuple()

2021-10-13 Thread Luís Ferreira
Since Tuple!() is a templated type from the standard library, this can make two
demangled names indistinguishable.

Signed-off-by: Luís Ferreira 

libiberty/ChangeLog:

* d-demangle.c (dlang_parse_tuple): Use tuple() instead of Tuple!().
* testsuite/d-demangle-expected: Rename the tests to use tuple()
  instead of Tuple!().
---
 libiberty/d-demangle.c  |  2 +-
 libiberty/testsuite/d-demangle-expected | 12 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 880f2ec85a4..5dbdc36adbe 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -1711,7 +1711,7 @@ dlang_parse_tuple (string *decl, const char *mangled, 
struct dlang_info *info)
   if (mangled == NULL)
 return NULL;
 
-  string_append (decl, "Tuple!(");
+  string_append (decl, "tuple(");
 
   while (elements--)
 {
diff --git a/libiberty/testsuite/d-demangle-expected 
b/libiberty/testsuite/d-demangle-expected
index 44a3649c429..98044ad23c5 100644
--- a/libiberty/testsuite/d-demangle-expected
+++ b/libiberty/testsuite/d-demangle-expected
@@ -367,27 +367,27 @@ demangle.test(char, char)
 #
 --format=dlang
 _D8demangle4testFB0Zv
-demangle.test(Tuple!())
+demangle.test(tuple())
 #
 --format=dlang
 _D8demangle4testFB1aZv
-demangle.test(Tuple!(char))
+demangle.test(tuple(char))
 #
 --format=dlang
 _D8demangle4testFB2aaZv
-demangle.test(Tuple!(char, char))
+demangle.test(tuple(char, char))
 #
 --format=dlang
 _D8demangle4testFB3aaaZv
-demangle.test(Tuple!(char, char, char))
+demangle.test(tuple(char, char, char))
 #
 --format=dlang
 _D8demangle4testFB2OaaZv
-demangle.test(Tuple!(shared(char), char))
+demangle.test(tuple(shared(char), char))
 #
 --format=dlang
 _D8demangle4testFB3aDFZaaZv
-demangle.test(Tuple!(char, char() delegate, char))
+demangle.test(tuple(char, char() delegate, char))
 #
 --format=dlang
 _D8demangle4testFDFZaZv
-- 
2.33.0



[committed] hppa: Add support for 32-bit hppa targets in muldi3 expander

2021-10-13 Thread John David Anglin

This patch allows inlining 64-bit hardware multiplication on 32-bit 
hppa targets
instead of using __muldi3 from libgcc.  This should improve performance at the 
expense of
a slight increase in code size.

We need this because I am testing a change to build libgcc with software float 
and integer
multiplication.

Tested on hppa2.0w-hp-hpux11.11, hppa64-hp-hpux11.11 and 
hppa-unknown-linux-gnu.  Committed to
all active branches.

Dave
---

Add support for 32-bit hppa targets in muldi3 expander

2021-10-13  John David Anglin  

gcc/ChangeLog:

* config/pa/pa.md (muldi3): Add support for inlining 64-bit
multiplication on 32-bit PA 1.1 and 2.0 targets.

diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
index b314f96de35..10623dd6fdb 100644
--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@@ -5374,32 +5374,38 @@
   [(set (match_operand:DI 0 "register_operand" "")
 (mult:DI (match_operand:DI 1 "register_operand" "")
 (match_operand:DI 2 "register_operand" "")))]
-  "TARGET_64BIT && ! TARGET_DISABLE_FPREGS && ! TARGET_SOFT_FLOAT"
+  "! optimize_size
+   && TARGET_PA_11
+   && ! TARGET_DISABLE_FPREGS
+   && ! TARGET_SOFT_FLOAT"
   "
 {
   rtx low_product = gen_reg_rtx (DImode);
   rtx cross_product1 = gen_reg_rtx (DImode);
   rtx cross_product2 = gen_reg_rtx (DImode);
-  rtx cross_scratch = gen_reg_rtx (DImode);
-  rtx cross_product = gen_reg_rtx (DImode);
   rtx op1l, op1r, op2l, op2r;
-  rtx op1shifted, op2shifted;
-
-  op1shifted = gen_reg_rtx (DImode);
-  op2shifted = gen_reg_rtx (DImode);
-  op1l = gen_reg_rtx (SImode);
-  op1r = gen_reg_rtx (SImode);
-  op2l = gen_reg_rtx (SImode);
-  op2r = gen_reg_rtx (SImode);
-
-  emit_move_insn (op1shifted, gen_rtx_LSHIFTRT (DImode, operands[1],
-   GEN_INT (32)));
-  emit_move_insn (op2shifted, gen_rtx_LSHIFTRT (DImode, operands[2],
-   GEN_INT (32)));
-  op1r = force_reg (SImode, gen_rtx_SUBREG (SImode, operands[1], 4));
-  op2r = force_reg (SImode, gen_rtx_SUBREG (SImode, operands[2], 4));
-  op1l = force_reg (SImode, gen_rtx_SUBREG (SImode, op1shifted, 4));
-  op2l = force_reg (SImode, gen_rtx_SUBREG (SImode, op2shifted, 4));
+
+  if (TARGET_64BIT)
+{
+  rtx op1shifted = gen_reg_rtx (DImode);
+  rtx op2shifted = gen_reg_rtx (DImode);
+
+  emit_move_insn (op1shifted, gen_rtx_LSHIFTRT (DImode, operands[1],
+   GEN_INT (32)));
+  emit_move_insn (op2shifted, gen_rtx_LSHIFTRT (DImode, operands[2],
+   GEN_INT (32)));
+  op1r = force_reg (SImode, gen_rtx_SUBREG (SImode, operands[1], 4));
+  op2r = force_reg (SImode, gen_rtx_SUBREG (SImode, operands[2], 4));
+  op1l = force_reg (SImode, gen_rtx_SUBREG (SImode, op1shifted, 4));
+  op2l = force_reg (SImode, gen_rtx_SUBREG (SImode, op2shifted, 4));
+}
+  else
+{
+  op1r = force_reg (SImode, gen_lowpart (SImode, operands[1]));
+  op2r = force_reg (SImode, gen_lowpart (SImode, operands[2]));
+  op1l = force_reg (SImode, gen_highpart (SImode, operands[1]));
+  op2l = force_reg (SImode, gen_highpart (SImode, operands[2]));
+}

   /* Emit multiplies for the cross products.  */
   emit_insn (gen_umulsidi3 (cross_product1, op2r, op1l));
@@ -5408,13 +5414,35 @@
   /* Emit a multiply for the low sub-word.  */
   emit_insn (gen_umulsidi3 (low_product, copy_rtx (op2r), copy_rtx (op1r)));

-  /* Sum the cross products and shift them into proper position.  */
-  emit_insn (gen_adddi3 (cross_scratch, cross_product1, cross_product2));
-  emit_insn (gen_ashldi3 (cross_product, cross_scratch, GEN_INT (32)));
+  if (TARGET_64BIT)
+{
+  rtx cross_scratch = gen_reg_rtx (DImode);
+  rtx cross_product = gen_reg_rtx (DImode);

-  /* Add the cross product to the low product and store the result
- into the output operand .  */
-  emit_insn (gen_adddi3 (operands[0], cross_product, low_product));
+  /* Sum the cross products and shift them into proper position.  */
+  emit_insn (gen_adddi3 (cross_scratch, cross_product1, cross_product2));
+  emit_insn (gen_ashldi3 (cross_product, cross_scratch, GEN_INT (32)));
+
+  /* Add the cross product to the low product and store the result
+into the output operand .  */
+  emit_insn (gen_adddi3 (operands[0], cross_product, low_product));
+}
+  else
+{
+  rtx cross_scratch = gen_reg_rtx (SImode);
+
+  /* Sum cross products.  */
+  emit_move_insn (cross_scratch,
+ gen_rtx_PLUS (SImode,
+   gen_lowpart (SImode, cross_product1),
+   gen_lowpart (SImode, cross_product2)));
+  emit_move_insn (gen_lowpart (SImode, operands[0]),
+ gen_lowpart (SImode, low_product));
+  emit_move_insn (gen_highpart (SImode, operands[0]),
+ 
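The decomposition the expander implements — a low sub-word product plus
shifted cross products — corresponds to the following C sketch (a model
of the generated sequence, not the expander code itself):

```c
#include <stdint.h>

/* 64x64 -> 64 multiply built from 32x32 -> 64 multiplies, mirroring
   the umulsidi3 + add + shift sequence the muldi3 expander emits.
   The high cross product (op1l * op2l) only affects bits >= 64 of the
   full product and is therefore never computed.  */
static uint64_t muldi3_model(uint64_t a, uint64_t b)
{
  uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
  uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

  uint64_t low    = (uint64_t) a_lo * b_lo;   /* low sub-word product */
  uint64_t cross1 = (uint64_t) b_lo * a_hi;   /* cross products */
  uint64_t cross2 = (uint64_t) a_lo * b_hi;

  /* Sum the cross products, shift them into position, and add the
     low product to form the 64-bit result.  */
  return low + ((cross1 + cross2) << 32);
}
```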

[Patch] [v3] Fortran: Fix Bind(C) Array-Descriptor Conversion (Move to Front-End Code)

2021-10-13 Thread Tobias Burnus

Dear all,

a minor update [→ v3].

I searched for XFAIL in Sandra's c-interop/ and found
two remaining true** xfails, now fixed:

- gfortran.dg/c-interop/typecodes-scalar-basic.f90
  The conversion of scalars of type(c_ptr) was mishandled;
  fixed now; the fix did run into issues converting a string_cst,
  which has also been fixed.

- gfortran.dg/c-interop/fc-descriptor-7.f90
  this one uses TRANSPOSE which did not work [now mostly* does]
  → PR fortran/101309 now also fixed.

I forgot what the exact issue for the latter was. However, when
looking at the testcase and extending it, I did run into the
following issue - and at the end the testcase does now pass.
The issue I had was that when a contiguous check was requested
(i.e. copy in only when needed), it failed to work when
parmse->expr was (a pointer to) a descriptor. I fixed that and
now most* things work.

OK for mainline? Comments? Suggestions? More PRs which this patch
fixes? Regressions? Test results?

Tobias

PS: I intent to commit this patch to the OG11 (devel/omp/gcc-11)
branch, in case someone wants to test it there.

PPS: Nice to have an extensive testcase suite - kudos to Sandra
in this case. I am sure Gerald will find more issues and once
it is in, I think I/we have to check some PRs + José's patches
for additional testcases + follow-up fixes.

(*) I write 'most' as passing a (potentially) noncontiguous
assumed-rank array to a CONTIGUOUS assumed-rank array causes
an ICE, as the scalarizer does not handle dynamic ranks, alias
expr->rank == -1 / ss->dimen == -1.
I decided that that's a separate issue and filled:
https://gcc.gnu.org/PR102729
BTW, my impression is that fixing that PR might also solve
the trans*.c part of https://gcc.gnu.org/PR102641 - but I have
not investigated.

(**) There are still some 'xfail' in comments (outside dg-*)
whose tests now pass. And those were for two bugs in the same
statement, only one is reported - and the other only after fixing
the first one, which is fine.

On 09.10.21 23:48, Tobias Burnus wrote:

Hi all,

attached is the updated version. Changes:
* Handle noncontiguous arrays – with BIND(C), (g)Fortran needs to make them
  contiguous in the caller but must also handle noncontiguous arrays in the callee.
* Fix/handle 'character(len=*)' with BIND(C); those always use an
  array descriptor - also with explicit-size and assumed-size arrays
* Fixed a bunch of bugs, found when writing extensive testcases.
* Fixed type(*) handling - those now pass properly type and elem_len
  on when calling a new function (bind(C) or not).

Besides adding the type itself (which is rather straightforward),
this patch only had minor modifications – and then the two big
conversion functions.

While it looks intimidating, it should be comparably simple to
review as everything is on one place and hopefully sufficiently
well documented.

OK – for mainline?  Other comments? More PRs which are fixed?
Issues not yet fixed (which are inside the scope of this patch)?

(If this patch is too long, I also have a nine-day old pending patch
at https://gcc.gnu.org/pipermail/gcc-patches/2021-October/580624.html )

Tobias

PS: The following still applies.

On 06.09.21 12:52, Tobias Burnus wrote:

gfortran's internal array descriptor (xgfc descriptor) and
the descriptor used with BIND(C) (CFI descriptor, ISO_Fortran_binding.h
of TS29113 / Fortran 2018) are different. Thus, when calling a BIND(C)
procedure the gfc descriptor has to be converted to cfi – and when a
BIND(C) procedure is implemented in Fortran, the argument has to be
converted back from CFI to gfc.

The current implementation handles part in the FE and part in
libgfortran,
but there were several issues, e.g. PR101635 failed due to alias issues,
debugging wasn't working well, uninitialized memory was used in some
cases
etc.

This patch now moves descriptor conversion handling to the FE – which
also
can make use of compile-time knowledge, useful both for diagnostic
and to
optimize the code.

Additionally:
- Some cases where TS29113 mandates that the array descriptor should be
  used now use the array descriptor, in particular character scalars
with
  'len=*' and allocatable/pointer scalars.
- While debugging the alias issue, I simplified 'select rank'. While
some
  special case is needed for assumed-shape arrays, those cannot
appear when
  the argument has the pointer or allocatable attribute. That's not
only a
  missed optimization, pointer/allocatable arrays can also be NULL -
such
  that accessing desc->dim.ubound[rank-1] can be uninitialized memory
...

OK?  Comments? Suggestions?

 * * *

For some more dumps, see the discussion about the alias issue at:
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578364.html
("[RFH] ME optimizes variable assignment away / Fortran bind(C)
descriptor conversion")
plus the original emails:
- https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578271.html
- and (correct dump)
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578274.html

[PATCH] libiberty: d-demangle: write distinguishable variadics on demangled symbol

2021-10-13 Thread Luís Ferreira
Currently _D8demangle4testFYv and _D8demangle4testFXv report the same demangled
symbol and they are not the same. The official demangler reports
"demangle.test(, ...)", which is the distinguishable way to do it.

Signed-off-by: Luís Ferreira 

libiberty/ChangeLog:

* d-demangle.c (dlang_function_args): Change Y variadic to always
  report ", ...".
* testsuite/d-demangle-expected: Change test for Y variadic and add a
  missing test for X variadic.
---
 libiberty/d-demangle.c  | 4 +---
 libiberty/testsuite/d-demangle-expected | 4 
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 880f2ec85a4..4ec94316dad 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -690,9 +690,7 @@ dlang_function_args (string *decl, const char *mangled, 
struct dlang_info *info)
  return mangled;
case 'Y': /* (variadic T t, ...) style.  */
  mangled++;
- if (n != 0)
-   string_append (decl, ", ");
- string_append (decl, "...");
+ string_append (decl, ", ...");
  return mangled;
case 'Z': /* Normal function.  */
  mangled++;
diff --git a/libiberty/testsuite/d-demangle-expected 
b/libiberty/testsuite/d-demangle-expected
index 44a3649c429..ec481d27dbe 100644
--- a/libiberty/testsuite/d-demangle-expected
+++ b/libiberty/testsuite/d-demangle-expected
@@ -359,6 +359,10 @@ demangle.test(char, char, ...)
 #
 --format=dlang
 _D8demangle4testFYv
+demangle.test(, ...)
+#
+--format=dlang
+_D8demangle4testFXv
 demangle.test(...)
 #
 --format=dlang
-- 
2.33.0



RE: [arm] Fix MVE addressing modes for VLDR[BHW] and VSTR[BHW]

2021-10-13 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Andre Vieira (lists) 
> Sent: Wednesday, October 13, 2021 2:09 PM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Cc: Christophe Lyon 
> Subject: Re: [arm] Fix MVE addressing modes for VLDR[BHW] and
> VSTR[BHW]
> 
> 
> On 13/10/2021 13:37, Kyrylo Tkachov wrote:
> > Hi Andre,
> >
> >
> > @@ -24276,7 +24271,7 @@ arm_print_operand (FILE *stream, rtx x, int
> code)
> > else if (code == POST_MODIFY || code == PRE_MODIFY)
> >   {
> > asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));
> > -   postinc_reg = XEXP ( XEXP (x, 1), 1);
> > +   postinc_reg = XEXP (XEXP (addr, 1), 1);
> > if (postinc_reg && CONST_INT_P (postinc_reg))
> >   {
> > if (code == POST_MODIFY)
> >
> > this looks like a bug fix that should be separately backported to the
> branches?
> > Otherwise, the patch looks ok for trunk to me.
> > Thanks,
> > Kyrill
> >
> Normally I'd agree with you, but this is specific for the 'E' handling,
> which is MVE only and I am pretty sure the existing code would never
> accept POST/PRE Modify codes so this issue will never trigger before my
> patch. So I'm not sure it's useful to backport a bugfix for a bug that
> won't trigger, unless we also backport the entire patch, but I suspect
> we don't want to do that?

Hmmm I see your reasoning, but it looks like the code there currently is either 
dead or just plain wrong.
I think unless we can guarantee that autoinc modes cannot be generated on the 
branches we should fix it, since the fix is a straightforward one.
The branches are not frozen close to release so the risk is low IMO.

So could you please test this hunk separately on the branches as well (and 
apply it to branches after some time on trunk if you'd like to wait for it to 
bake there).
Thanks,
Kyrill




Re: [PATCH] libiberty: d-demangle: use distinguishable tuple()

2021-10-13 Thread Luís Ferreira
On Wed, 2021-10-13 at 16:34 +0100, Luís Ferreira wrote:
> Since Tuple!() is a templated type from the standard library, this can
> make two demangled names indistinguishable.
> 
> Signed-off-by: Luís Ferreira 
> 
> libiberty/ChangeLog:
> 
> * d-demangle.c (dlang_parse_tuple): use tuple() instead of
> Tuple!()
> ---
>  libiberty/d-demangle.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
> index 880f2ec85a4..5dbdc36adbe 100644
> --- a/libiberty/d-demangle.c
> +++ b/libiberty/d-demangle.c
> @@ -1711,7 +1711,7 @@ dlang_parse_tuple (string *decl, const char
> *mangled, struct dlang_info *info)
>    if (mangled == NULL)
>  return NULL;
>  
> -  string_append (decl, "Tuple!(");
> +  string_append (decl, "tuple(");
>  
>    while (elements--)
>  {

I need to update tests

-- 
Sincerely,
Luís Ferreira @ lsferreira.net





[PATCH] libiberty: d-demangle: use distinguishable tuple()

2021-10-13 Thread Luís Ferreira
Since Tuple!() is a templated type from the standard library, this can make
two demangled names indistinguishable.

Signed-off-by: Luís Ferreira 

libiberty/ChangeLog:

* d-demangle.c (dlang_parse_tuple): Use tuple() instead of Tuple!()
---
 libiberty/d-demangle.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 880f2ec85a4..5dbdc36adbe 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -1711,7 +1711,7 @@ dlang_parse_tuple (string *decl, const char *mangled, 
struct dlang_info *info)
   if (mangled == NULL)
 return NULL;
 
-  string_append (decl, "Tuple!(");
+  string_append (decl, "tuple(");
 
   while (elements--)
 {
-- 
2.33.0



Re: [PATCH, rs6000] Optimization for vec_xl_sext

2021-10-13 Thread David Edelsohn via Gcc-patches
>> gcc/
>> * config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin):
>> Modify the expansion for sign extension. All extentions are done
>> within VSX resgisters.
>
> Two typos here:  extentions => extensions, resgisters => registers.

This is okay with Bill's comments addressed.

Thanks, David


Re: [PATCH, rs6000] punish reload of lfiwzx when loading an int variable [PR102169, PR102146]

2021-10-13 Thread David Edelsohn via Gcc-patches
>   The patch punishes reload of alternative pair of "d, Z" for 
> movsi_internal1. The reload occurs if 'Z' doesn't match and generates an 
> additional insn. So the memory reload should be punished.
>
>   Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
> okay for trunk? Any recommendations? Thanks a lot.
>
>
> ChangeLog
>
> 2021-09-29 Haochen Gui 
>
> gcc/
> * gcc/config/rs6000/rs6000.md (movsi_internal1): disparages
> slightly the alternative 'Z' of "lfiwzx" when reload is needed.

Capitalize "D" of disparages.

Should this disparage "stfiwzx" also?  Carl Love saw poor code
generation at -O0 for a trivial example where a stack store moved the
value to an FPR and used stfiwx.

Thanks, David


Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-10-13 Thread David Edelsohn via Gcc-patches
2021-08-25 Haochen Gui 

gcc/
 * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
 Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
 VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.

Please write something more than "modify".  The ChangeLog should be
more like the email subject line for this patch.

gcc/testsuite/
 * gcc.target/powerpc/vec-minmax-1.c: New test.
 * gcc.target/powerpc/vec-minmax-2.c: Likewise.

Please ensure that the indentation is correct for the case statements;
it was unclear from the text pasted into the email.

Okay with those clarifications.

Thanks, David


Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-13 Thread Michael Matz via Gcc-patches
Hello,

[this is the fourth attempt to write a comment/review/opinion for this 
ao_ref-in-tcc_reference, please accept some possible incoherence]

On Tue, 12 Oct 2021, Richard Biener via Gcc-patches wrote:

> This prototype hack introduces a new tcc_reference TREE_AOREFWRAP
> which we can use to wrap a reference tree, recording the ao_ref
> associated with it.  That comes in handy when trying to optimize
> the constant factor involved with alias stmt walking (or alias
> queries in general) where there are parts that are linear in the
> reference expression complexity, namely get_ref_base_and_extent,
> which usually shows up high on profiles.

So, generally I like storing things into the IL that are impossible to 
(re-)discover.  Remembering things that are merely slowish to rediscover 
is less clear: the increase of IL memory use and the potentially 
necessary pointer chasing might just trade one clearly measurable slow 
point (the rediscovery computation) for many slow points all over the 
place (the pointer chasing/cache effects).  Here ...

> The following patch is minimal as to make tree-ssa.exp=ssa-fre-*
> not ICE and make the testcases from PR28071 and PR39326 compile
> successfully at -O1 (both testcases show a moderately high
> load on alias stmt walking around 25%, resp. 34%).  With the
> patch which makes use of the cache only from stmt_may_clobber_ref_p
> for now the compile-time improves by 7%, resp. 19% which means
> overall the idea might be worth pursuing.

... you seem to have a point, though.  Also, I am of the opinion that our 
gimple memrefs could be even fatter (and also encode things like 
multi-dimensional accesses, either right from the source code or 
discovered by index analysis), and your idea goes into that direction.  
So, yay for encoding memref properties into the IL, even though here it's 
only a cache.  You solved the necessary invalidation already.  Perhaps 
only partly, that will be seen once exposed to the wild.

So, the only question seems to be how to encode it: either by ...

> transparent by instead of wrapping the refs with another tree

... this (wrapping) ...

> to reallocate the outermost handled_component_p (and only those),

... or that (aggregation).  There is a third possibility if you want this 
only in the gimple world (which is the case): encode it not in the trees 
but in the gimple statements.  This sort of works easily for everything 
except calls.  I will not consider this variant, nor the side table 
implementation.

While writing this email I switched between more liking one or the other, 
multiple times.  So, I'll write down some basic facts/requirements:

1) You basically want to add stuff to an existing structure:
(a) by wrapping: to work seamlessly the outer tree should have similar 
enough properties to the inner tree (e.g. also be tcc_reference) to be 
used interchangeably in most code, except that which needs to look at 
the added stuff.
(b) by aggregating the stuff into the existing structure itself: if you 
need both structs (with and without stuff) the pure thing to do is to 
actually create two structs, once with, once without stuff.
2) the added stuff is optional
3) we have multiple things (all tcc_reference) to which to add stuff
4) all tcc_reference are tree_exp, which is variable number of operands,
   which constrain things we can do naturally (e.g. we can't add stuff 
   after tree_exp, except by constraining the number of operands)

Considering this it seems that aggregation is worse: you basically double 
the number of structure types (at least conceptually, if you go with your 
bit-idea).  So, some idea of wrapping seems more natural.

(I think your idea of aggregation but going with a bit flag to indicate if 
this tcc_reference is or isn't annotated, and therefore has things 
allocated after the variable number of operands, is a terrible hack)

There is another possibility doing something like your bit-flag 
aggregation but with less hackery: if ao_ref were a tree it could be 
a normal operand of a tree_exp (and hence tcc_reference), just that the 
number of operands then would vary depending on if it's annotated or not.

Making ao_ref into a tree would also enable the use of ANNOTATE_EXPR for a 
generic wrapping tree.  (Currently it's used only in very specific cases, 
so ANNOTATE_EXPR handling would need to be extended all over, and as it's 
not a tcc_reference it would probably rather mean to introduce a new 
ANNOTATE_REF).

Anyway, with this:

struct tree_ref_annotation {
  struct tree_base base;
  struct ao_ref ao_ref;
};

DEFTREECODE(TREE_MEM_ANNO, "mem_anno", tcc_exceptional, 0);

you could then add

DEFTREECODE(MEM_REF_A, "mem_ref_a", tcc_reference, 3);

where TREE_OPERAND(memref, 2) would then be a TREE_MEM_ANNO.  If we were 
to add one operand slot to each tcc_reference we could even do without new 
tree codes: the existence of an ao_ref would simply be indicated by 
TREE_OPERAND(ref, position) 

Re: [PATCH] rs6000/test: Adjust some cases due to O2 vect [PR102658]

2021-10-13 Thread Martin Sebor via Gcc-patches

On 10/13/21 1:43 AM, Kewen.Lin wrote:

on 2021/10/13 下午2:29, Hongtao Liu via Gcc-patches wrote:

On Wed, Oct 13, 2021 at 11:34 AM Hongtao Liu  wrote:


On Tue, Oct 12, 2021 at 11:49 PM Martin Sebor  wrote:


On 10/11/21 8:31 PM, Hongtao Liu wrote:

On Tue, Oct 12, 2021 at 4:08 AM Martin Sebor via Gcc-patches
 wrote:


On 10/11/21 11:43 AM, Segher Boessenkool wrote:

On Mon, Oct 11, 2021 at 10:23:03AM -0600, Martin Sebor wrote:

On 10/11/21 9:30 AM, Segher Boessenkool wrote:

On Mon, Oct 11, 2021 at 10:47:00AM +0800, Kewen.Lin wrote:

- For generic test cases, it follows the existing suggested
practice with necessary target/xfail selector.


Not such a great choice.  Many of those tests do not make sense with
vectorisation enabled.  This should have been thought about, in some
cases resulting in not running the test with vectorisation enabled, and
in some cases duplicating the test, once with and once without
vectorisation.


The tests detect bugs that are present both with and without
vectorization, so they should pass both ways.


Then it should be tested both ways!  This is my point.


Agreed.  (Most warnings are tested with just one set of options,
but it's becoming apparent that the middle end ones should be
exercised more extensively.)




That they don't
tells us that the warnings need work (they were written with
an assumption that doesn't hold anymore).


They were written in world A.  In world B many things behave
differently.  Transplanting the testcases from A to B without any extra
analysis will not test what the testcases wanted to test, and possibly
nothing at all anymore.


Absolutely.




We need to track that
work somehow, but simply xfailing them without making a record
of what underlying problem the xfails correspond to isn't the best
way.  In my experience, what works well is opening a bug for each
distinct limitation (if one doesn't already exist) and adding
a reference to it as a comment to the xfail.


Probably, yes.


But you are just following established practice, so :-)


I also am okay with this.  If it was decided x86 does not have to deal
with these (generic!) problems, then why should we do other people's
work?


I don't know that anything was decided.  I think those changes
were made in haste, and (as you noted in your review of these
updates to them), were incomplete (missing comments referencing
the underlying bugs or limitations).  Now that we've noticed it
we should try to fix it.  I'm not expecting you (or Kewen) to do
other people's work, but it would help to let them/us know that
there is work for us to do.  I only noticed the problem by luck.


-  struct A1 a = { 0, { 1 } };   // { dg-warning
"\\\[-Wstringop-overflow" "" { target { i?86-*-* x86_64-*-* } } }
+  struct A1 a = { 0, { 1 } };   // { dg-warning
"\\\[-Wstringop-overflow" "" { target { i?86-*-* x86_64-*-* powerpc*-*-*
} } }


As I mentioned in the bug, when adding xfails for regressions
please be sure to reference the bug that tracks the underlying
root cause.]


You are saying this to whoever added that x86 xfail I hope.


In general it's an appeal to both authors and reviewers of such
changes.  Here, it's mostly for Hongtao who apparently added all
these undocumented xfails.


There may be multiple problems, and we need to
identify what it is in each instance.  As the author of
the tests I can help with that but not if I'm not in the loop
on these changes (it would seem prudent to get the author's
thoughts on such sweeping changes to their work).


Yup.


I discussed one of these failures with Hongtao in detail at
the time autovectorization was being enabled and made the same
request then but I didn't realize the problem was so pervasive.

In addition, the target-specific conditionals in the xfails are
going to be difficult to maintain.


It is a cop-out.  Especially because it makes no comment why it is
xfailed (which should *always* be explained!)


It might be okay for one or
two in a single test but for so many we need a better solution
than that.  If autovectorization is only enabled for a subset
of targets then a solution might be to add a new DejagGNU test
for it and conditionalize the xfails on it.


That, combined with duplicating these tests and still testing the
-fno-vectorization situation properly.  Those tests tested something.
With vectorisation enabled they might no longer test that same thing,
especially if the test fails now!


Right.  The original autovectorization change was made either
without a full analysis of its impact on the affected warnings,
or its impact wasn't adequately captured (either in the xfails
comments or by opening bugs for them).  Now that we know about
this we should try to fix it.  The first step toward that is
to review the xfailed test cases and for each add a comment with
the bug that captures its root cause.

Hongtao, please let me know if you are going to work on that.

I will make a copy of the tests to test the -fno-tree-vectorize
scenario(the 

[PATCH] ipa-sra: Improve debug info for removed parameters (PR 93385)

2021-10-13 Thread Martin Jambor
Hi,

in spring I added code eliminating any statements using parameters
removed by IPA passes (to fix PR 93385).  That patch fixed issues such
as divisions by zero that such code could perform but it only reset
all affected debug bind statements, this one updates them with
expressions which can allow the debugger to print the removed value -
see the added test-case for an example.

Even though I originally did not want to create DEBUG_EXPR_DECLs for
intermediate values, I ended up doing so, because otherwise the code
started creating statements like

   # DEBUG __aD.198693 => [(const struct _Alloc_nodeD.171110 
*)D#195]._M_tD.184726->_M_implD.171154

which not only is a bit scary but also gimple-fold ICEs on
it. Therefore I decided they are probably quite necessary.

The patch simply notes each removed SSA name present in a debug
statement and then works from it backwards, looking if it can
reconstruct the expression it represents (which can fail if a
non-degenerate PHI node is in the way).  If it can, it populates two
hash maps with those expressions so that 1) removed assignments are
replaced with a debug bind defining a new intermediate debug_decl_expr
and 2) existing debug binds that refer to SSA names that are being
removed now refer to corresponding debug_decl_exprs.

If a removed parameter is passed to another function, the debugging
information still cannot describe its value there - see the xfailed
test in the testcase.  I sort of know what needs to be done but that
needs a little bit more of IPA infrastructure on top of this patch and
so I would like to get this patch reviewed first.

Bootstrapped and tested on x86_64-linux, i686-linux and (long time
ago) on aarch64-linux.  Also LTO-bootstrapped and on x86_64-linux.

Perhaps it is good to go to trunk?

Thanks,

Martin

gcc/ChangeLog:

2021-03-29  Martin Jambor  

PR ipa/93385
* ipa-param-manipulation.h (class ipa_param_body_adjustments): New
members remap_with_debug_expressions, m_dead_ssa_debug_equiv,
m_dead_stmt_debug_equiv and prepare_debug_expressions.  Added
parameter to mark_dead_statements.
* ipa-param-manipulation.c: Include tree-phinodes.h and cfgexpand.h.
(ipa_param_body_adjustments::mark_dead_statements): New parameter
debugstack, push into it all SSA names used in debug statements,
produce m_dead_ssa_debug_equiv mapping for the removed param.
(replace_with_mapped_expr): New function.
(ipa_param_body_adjustments::remap_with_debug_expressions): Likewise.
(ipa_param_body_adjustments::prepare_debug_expressions): Likewise.
(ipa_param_body_adjustments::common_initialization): Gather and
process SSA names which will be removed but are in debug statements. Simplify.
(ipa_param_body_adjustments::ipa_param_body_adjustments): Initialize
new members.
* tree-inline.c (remap_gimple_stmt): Create a debug bind when possible
when avoiding a copy of an unnecessary statement.  Remap removed SSA
names in existing debug statements.
(tree_function_versioning): Do not create DEBUG_EXPR_DECL for removed
parameters if we have already done so.

gcc/testsuite/ChangeLog:

2021-03-29  Martin Jambor  

PR ipa/93385
* gcc.dg/guality/ipa-sra-1.c: New test.
---
 gcc/ipa-param-manipulation.c | 280 ++-
 gcc/ipa-param-manipulation.h |  12 +-
 gcc/testsuite/gcc.dg/guality/ipa-sra-1.c |  45 
 gcc/tree-inline.c|  45 ++--
 4 files changed, 305 insertions(+), 77 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/guality/ipa-sra-1.c

diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 26b02d7aa95..c84d669521c 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "alloc-pool.h"
 #include "symbol-summary.h"
 #include "symtab-clones.h"
+#include "tree-phinodes.h"
+#include "cfgexpand.h"
 
 
 /* Actual prefixes of different newly synthetized parameters.  Keep in sync
@@ -972,10 +974,12 @@ ipa_param_body_adjustments::carry_over_param (tree t)
 
 /* Populate m_dead_stmts given that DEAD_PARAM is going to be removed without
any replacement or splitting.  REPL is the replacement VAR_SECL to base any
-   remaining uses of a removed parameter on.  */
+   remaining uses of a removed parameter on.  Push all removed SSA names that
+   are used within debug statements to DEBUGSTACK.  */
 
 void
-ipa_param_body_adjustments::mark_dead_statements (tree dead_param)
+ipa_param_body_adjustments::mark_dead_statements (tree dead_param,
+ vec *debugstack)
 {
   /* Current IPA analyses which remove unused parameters never remove a
  non-gimple register ones which have any use except as parameters in other
@@ -987,6 +991,7 @@ 

Re: [PATCH] collect2: Fix missing cleanups.

2021-10-13 Thread Richard Biener via Gcc-patches
On Wed, Oct 13, 2021 at 4:00 PM Iain Sandoe  wrote:
>
> The code that checks to see if objects have LTO content via
> simple-object was not releasing resources, fixed thus.
>
> tested on x86_64, powerpc64le linux, powerpc-aix, i686,x86_64-darwin,
> OK for master and backports?

OK.

Richard.

> thanks
> Iain
>
> Signed-off-by: Iain Sandoe 
>
> gcc/ChangeLog:
>
> * collect2.c (is_lto_object_file): Release simple-object
> resources, close files.
> ---
>  gcc/collect2.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/collect2.c b/gcc/collect2.c
> index cf04a58ba4d..d47fe3f9195 100644
> --- a/gcc/collect2.c
> +++ b/gcc/collect2.c
> @@ -2299,10 +2299,15 @@ is_lto_object_file (const char *prog_name)
> LTO_SEGMENT_NAME,
> , );
>if (!inobj)
> -return false;
> +{
> +  close (infd);
> +  return false;
> +}
>
>errmsg = simple_object_find_sections (inobj, has_lto_section,
> (void *) , );
> +  simple_object_release_read (inobj);
> +  close (infd);
>if (! errmsg && found)
>  return true;
>
> --
> 2.24.3 (Apple Git-128)
>


[PATCH] collect2: Fix missing cleanups.

2021-10-13 Thread Iain Sandoe via Gcc-patches
The code that checks to see if objects have LTO content via
simple-object was not releasing resources, fixed thus.

tested on x86_64, powerpc64le linux, powerpc-aix, i686,x86_64-darwin,
OK for master and backports?
thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* collect2.c (is_lto_object_file): Release simple-object
resources, close files.
---
 gcc/collect2.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/collect2.c b/gcc/collect2.c
index cf04a58ba4d..d47fe3f9195 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -2299,10 +2299,15 @@ is_lto_object_file (const char *prog_name)
LTO_SEGMENT_NAME,
, );
   if (!inobj)
-return false;
+{
+  close (infd);
+  return false;
+}
 
   errmsg = simple_object_find_sections (inobj, has_lto_section,
(void *) , );
+  simple_object_release_read (inobj);
+  close (infd);
   if (! errmsg && found)
 return true;
 
-- 
2.24.3 (Apple Git-128)



Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-13 Thread H.J. Lu via Gcc-patches
On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
 wrote:
>
> On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu  wrote:
> >
> > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> >  wrote:
> > >
> > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu  wrote:
> > > >
> > > > Change in the v2 patch:
> > > >
> > > > 1. Disable static trampolines by default.
> > > >
> > > >
> > > > GCC maintained a copy of libffi snapshot from 2009 and cherry-picked 
> > > > fixes
> > > > from upstream over the last 10+ years.  In the meantime, libffi upstream
> > > > has been changed significantly with new features, bug fixes and new 
> > > > target
> > > > support.  Here is a set of patches to sync with libffi 3.4.2 release and
> > > > make it easier to sync with libffi upstream:
> > > >
> > > > 1. Document how to sync with upstream.
> > > > 2. Add scripts to help sync with upstream.
> > > > 3. Sync with libffi 3.4.2. This patch is quite big.  It is availale at
> > > >
> > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > > 4. Integrate libffi build and testsuite with GCC.
> > >
> > > How did you test this?  It looks like libgo is the only consumer of
> > > libffi these days.
> > > In particular go/libgo seems to be supported on almost all targets besides
> > > darwin/windows - did you test cross and canadian configurations?
> >
> > I only tested it on Linux/i686 and Linux/x86-64.   My understanding is that
> > the upstream libffi works on Darwin and Windows.
> >
> > > I applaud the attempt to sync to upsteam but I fear you won't get any 
> > > "review"
> > > of this massive diff.
> >
> > I believe that it should just work.  Our libffi is very much out of date.
>
> Yes, you can hope.  And yes, our libffi is out of date.
>
> Can you please do the extra step to test one weird architecture, namely
> powerpc64-aix which is available on the compile-farm?

I will give it a try and report back.

> If that goes well I think it's good to "hope" at this point (and plenty of
> time to fix fallout until the GCC 12 release).
>
> Thus OK after the extra testing dance and waiting until early next
> week so others can throw in a veto.
>
> Thanks,
> Richard.
>
> > > I suppose the SONAME changes after the sync?
> >
> > Yes, SONAME is synced with upstream which was updated.
> >
> > > Thanks,
> > > Richard.
> > >
> > > > H.J. Lu (4):
> > > >   libffi: Add HOWTO_MERGE, autogen.sh and merge.sh
> > > >   libffi: Sync with libffi 3.4.2
> > > >   libffi: Integrate build with GCC
> > > >   libffi: Integrate testsuite with GCC testsuite
> > > >
> > > >  libffi/.gitattributes |4 +
> > > >  libffi/ChangeLog.libffi   | 7743 -
> > > >  libffi/HOWTO_MERGE|   13 +
> > > >  libffi/LICENSE|2 +-
> > > >  libffi/LICENSE-BUILDTOOLS |  353 +
> > > >  libffi/MERGE  |4 +
> > > >  libffi/Makefile.am|  135 +-
> > > >  libffi/Makefile.in|  219 +-
> > > >  libffi/README |  450 -
> > > >  libffi/README.md  |  495 ++
> > > >  libffi/acinclude.m4   |   38 +-
> > > >  libffi/autogen.sh |   11 +
> > > >  libffi/configure  |  487 +-
> > > >  libffi/configure.ac   |   91 +-
> > > >  libffi/configure.host |   97 +-
> > > >  libffi/doc/Makefile.am|3 +
> > > >  libffi/doc/libffi.texi|  382 +-
> > > >  libffi/doc/version.texi   |8 +-
> > > >  libffi/fficonfig.h.in |   21 +-
> > > >  libffi/generate-darwin-source-and-headers.py  |  143 +-
> > > >  libffi/include/Makefile.am|2 +-
> > > >  libffi/include/Makefile.in|3 +-
> > > >  libffi/include/ffi.h.in   |  213 +-
> > > >  libffi/include/ffi_cfi.h  |   21 +
> > > >  libffi/include/ffi_common.h   |   50 +-
> > > >  libffi/include/tramp.h|   45 +
> > > >  libffi/libffi.map.in  |   24 +-
> > > >  libffi/libffi.pc.in   |2 +-
> > > >  libffi/libffi.xcodeproj/project.pbxproj   |  530 +-
> > > >  libffi/libtool-version|   25 +-
> > > >  libffi/man/Makefile.in|1 +
> > > >  libffi/mdate-sh   |   39 +-
> > > >  libffi/merge.sh   |   51 +
> > > >  libffi/msvcc.sh   |  134 +-
> > > >  libffi/src/aarch64/ffi.c  |  536 +-
> > > >  libffi/src/aarch64/ffitarget.h|   35 +-
> > > >  libffi/src/aarch64/internal.h |   33 +
> > > >  

Re: [PATCH] options: Fix variable tracking option processing.

2021-10-13 Thread Richard Biener via Gcc-patches
On Wed, Oct 13, 2021 at 3:12 PM Martin Liška  wrote:
>
> On 10/13/21 14:50, Richard Biener wrote:
> > It does, yes.  But that's a ^ with flag_var_tracking_assignments_toggle;)
> >
> > It's also one of the more weird flags, so it could be applied after the
> > otherwise single set of flag_var_tracking_assignments ...
>
> Well, it's far from being simple.
> Can we please make a step and install the patch I sent? I mean the latest
> that does the removal of AUTODETECT_VALUE.

But parts of the patch are not obvious and you've not explained why you
remove all Init(AUTODETECT_VALUE) but for flag_var_tracking you
change it to Init(1).  I count 4 assignments to flag_var_tracking in toplev.c
and one in nvptx.c and c6x.c each.

  if (flag_var_tracking_uninit == AUTODETECT_VALUE)
flag_var_tracking_uninit = flag_var_tracking;

can probably be handled by EnabledBy, but then we also have

  if (flag_var_tracking_uninit == 1)
flag_var_tracking = 1;

which suggests the same the other way around.  I guess
positional handling might differ with say
-fvar-tracking -fno-var-tracking-uninit vs. -fno-var-tracking-uninit
-fvar-tracking
when using EnabledBy vs. the "explicit" code.

+  else if (!OPTION_SET_P (flag_var_tracking) && flag_var_tracking)
 flag_var_tracking = optimize >= 1;

I think _this_ should, instead of the Init(1), become an entry in
default_options_table with OPT_LEVELS_1_PLUS.

As said, besides flag_var_* the posted patch looks OK and is good to
commit.

Richard.

>
> Martin


Re: [PATCH] gcov: make profile merging smarter

2021-10-13 Thread Martin Liška

On 10/11/21 16:05, Martin Liška wrote:

May I install the patch now?


Pushed to master, I guess we can tweak documentation in the future
if needed.

Martin


Re: [PATCH] options: Fix variable tracking option processing.

2021-10-13 Thread Martin Liška

On 10/13/21 14:50, Richard Biener wrote:

It does, yes.  But that's a ^ with flag_var_tracking_assignments_toggle;)

It's also one of the more weird flags, so it could be applied after the
otherwise single set of flag_var_tracking_assignments ...


Well, it's far from being simple.
Can we please make a step and install the patch I sent? I mean the latest
that does the removal of AUTODETECT_VALUE.

Martin


Re: [arm] Fix MVE addressing modes for VLDR[BHW] and VSTR[BHW]

2021-10-13 Thread Andre Vieira (lists) via Gcc-patches



On 13/10/2021 13:37, Kyrylo Tkachov wrote:

Hi Andre,


@@ -24276,7 +24271,7 @@ arm_print_operand (FILE *stream, rtx x, int code)
else if (code == POST_MODIFY || code == PRE_MODIFY)
  {
asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));
-   postinc_reg = XEXP ( XEXP (x, 1), 1);
+   postinc_reg = XEXP (XEXP (addr, 1), 1);
if (postinc_reg && CONST_INT_P (postinc_reg))
  {
if (code == POST_MODIFY)

this looks like a bug fix that should be separately backported to the branches?
Otherwise, the patch looks ok for trunk to me.
Thanks,
Kyrill

Normally I'd agree with you, but this is specific to the 'E' handling,
which is MVE-only, and I am pretty sure the existing code would never
accept POST/PRE_MODIFY codes, so this issue will never trigger before my
patch. So I'm not sure it's useful to backport a bugfix for a bug that
won't trigger, unless we also backport the entire patch, but I suspect
we don't want to do that?




Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-13 Thread Richard Biener via Gcc-patches
On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu  wrote:
>
> On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
>  wrote:
> >
> > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu  wrote:
> > >
> > > Change in the v2 patch:
> > >
> > > 1. Disable static trampolines by default.
> > >
> > >
> > > GCC maintained a copy of libffi snapshot from 2009 and cherry-picked fixes
> > > from upstream over the last 10+ years.  In the meantime, libffi upstream
> > > has been changed significantly with new features, bug fixes and new target
> > > support.  Here is a set of patches to sync with libffi 3.4.2 release and
> > > make it easier to sync with libffi upstream:
> > >
> > > 1. Document how to sync with upstream.
> > > 2. Add scripts to help sync with upstream.
> > > 3. Sync with libffi 3.4.2. This patch is quite big.  It is available at
> > >
> > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > 4. Integrate libffi build and testsuite with GCC.
> >
> > How did you test this?  It looks like libgo is the only consumer of
> > libffi these days.
> > In particular go/libgo seems to be supported on almost all targets besides
> > darwin/windows - did you test cross and canadian configurations?
>
> I only tested it on Linux/i686 and Linux/x86-64.   My understanding is that
> the upstream libffi works on Darwin and Windows.
>
> > I applaud the attempt to sync to upstream but I fear you won't get any
> > "review" of this massive diff.
>
> I believe that it should just work.  Our libffi is very much out of date.

Yes, you can hope.  And yes, our libffi is out of date.

Can you please do the extra step to test one weird architecture, namely
powerpc64-aix which is available on the compile-farm?

If that goes well I think it's good to "hope" at this point (and plenty of
time to fix fallout until the GCC 12 release).

Thus OK after the extra testing dance and waiting until early next
week so others can throw in a veto.

Thanks,
Richard.

> > I suppose the SONAME changes after the sync?
>
> Yes, SONAME is synced with upstream which was updated.
>
> > Thanks,
> > Richard.
> >
> > > H.J. Lu (4):
> > >   libffi: Add HOWTO_MERGE, autogen.sh and merge.sh
> > >   libffi: Sync with libffi 3.4.2
> > >   libffi: Integrate build with GCC
> > >   libffi: Integrate testsuite with GCC testsuite
> > >
> > >  libffi/.gitattributes |4 +
> > >  libffi/ChangeLog.libffi   | 7743 -
> > >  libffi/HOWTO_MERGE|   13 +
> > >  libffi/LICENSE|2 +-
> > >  libffi/LICENSE-BUILDTOOLS |  353 +
> > >  libffi/MERGE  |4 +
> > >  libffi/Makefile.am|  135 +-
> > >  libffi/Makefile.in|  219 +-
> > >  libffi/README |  450 -
> > >  libffi/README.md  |  495 ++
> > >  libffi/acinclude.m4   |   38 +-
> > >  libffi/autogen.sh |   11 +
> > >  libffi/configure  |  487 +-
> > >  libffi/configure.ac   |   91 +-
> > >  libffi/configure.host |   97 +-
> > >  libffi/doc/Makefile.am|3 +
> > >  libffi/doc/libffi.texi|  382 +-
> > >  libffi/doc/version.texi   |8 +-
> > >  libffi/fficonfig.h.in |   21 +-
> > >  libffi/generate-darwin-source-and-headers.py  |  143 +-
> > >  libffi/include/Makefile.am|2 +-
> > >  libffi/include/Makefile.in|3 +-
> > >  libffi/include/ffi.h.in   |  213 +-
> > >  libffi/include/ffi_cfi.h  |   21 +
> > >  libffi/include/ffi_common.h   |   50 +-
> > >  libffi/include/tramp.h|   45 +
> > >  libffi/libffi.map.in  |   24 +-
> > >  libffi/libffi.pc.in   |2 +-
> > >  libffi/libffi.xcodeproj/project.pbxproj   |  530 +-
> > >  libffi/libtool-version|   25 +-
> > >  libffi/man/Makefile.in|1 +
> > >  libffi/mdate-sh   |   39 +-
> > >  libffi/merge.sh   |   51 +
> > >  libffi/msvcc.sh   |  134 +-
> > >  libffi/src/aarch64/ffi.c  |  536 +-
> > >  libffi/src/aarch64/ffitarget.h|   35 +-
> > >  libffi/src/aarch64/internal.h |   33 +
> > >  libffi/src/aarch64/sysv.S |  189 +-
> > >  libffi/src/aarch64/win64_armasm.S |  506 ++
> > >  libffi/src/alpha/ffi.c|6 +-
> > >  libffi/src/arc/ffi.c  |6 +-
> > >  libffi/src/arm/ffi.c  |  380 +-
> > >  

Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-13 Thread H.J. Lu via Gcc-patches
On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
 wrote:
>
> On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu  wrote:
> >
> > Change in the v2 patch:
> >
> > 1. Disable static trampolines by default.
> >
> >
> > GCC maintained a copy of libffi snapshot from 2009 and cherry-picked fixes
> > from upstream over the last 10+ years.  In the meantime, libffi upstream
> > has been changed significantly with new features, bug fixes and new target
> > support.  Here is a set of patches to sync with libffi 3.4.2 release and
> > make it easier to sync with libffi upstream:
> >
> > 1. Document how to sync with upstream.
> > 2. Add scripts to help sync with upstream.
> > 3. Sync with libffi 3.4.2. This patch is quite big.  It is available at
> >
> > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > 4. Integrate libffi build and testsuite with GCC.
>
> How did you test this?  It looks like libgo is the only consumer of
> libffi these days.
> In particular go/libgo seems to be supported on almost all targets besides
> darwin/windows - did you test cross and canadian configurations?

I only tested it on Linux/i686 and Linux/x86-64.   My understanding is that
the upstream libffi works on Darwin and Windows.

> I applaud the attempt to sync to upstream but I fear you won't get any "review"
> of this massive diff.

I believe that it should just work.  Our libffi is very much out of date.

> I suppose the SONAME changes after the sync?

Yes, SONAME is synced with upstream which was updated.

> Thanks,
> Richard.
>
> > H.J. Lu (4):
> >   libffi: Add HOWTO_MERGE, autogen.sh and merge.sh
> >   libffi: Sync with libffi 3.4.2
> >   libffi: Integrate build with GCC
> >   libffi: Integrate testsuite with GCC testsuite
> >
> >  libffi/.gitattributes |4 +
> >  libffi/ChangeLog.libffi   | 7743 -
> >  libffi/HOWTO_MERGE|   13 +
> >  libffi/LICENSE|2 +-
> >  libffi/LICENSE-BUILDTOOLS |  353 +
> >  libffi/MERGE  |4 +
> >  libffi/Makefile.am|  135 +-
> >  libffi/Makefile.in|  219 +-
> >  libffi/README |  450 -
> >  libffi/README.md  |  495 ++
> >  libffi/acinclude.m4   |   38 +-
> >  libffi/autogen.sh |   11 +
> >  libffi/configure  |  487 +-
> >  libffi/configure.ac   |   91 +-
> >  libffi/configure.host |   97 +-
> >  libffi/doc/Makefile.am|3 +
> >  libffi/doc/libffi.texi|  382 +-
> >  libffi/doc/version.texi   |8 +-
> >  libffi/fficonfig.h.in |   21 +-
> >  libffi/generate-darwin-source-and-headers.py  |  143 +-
> >  libffi/include/Makefile.am|2 +-
> >  libffi/include/Makefile.in|3 +-
> >  libffi/include/ffi.h.in   |  213 +-
> >  libffi/include/ffi_cfi.h  |   21 +
> >  libffi/include/ffi_common.h   |   50 +-
> >  libffi/include/tramp.h|   45 +
> >  libffi/libffi.map.in  |   24 +-
> >  libffi/libffi.pc.in   |2 +-
> >  libffi/libffi.xcodeproj/project.pbxproj   |  530 +-
> >  libffi/libtool-version|   25 +-
> >  libffi/man/Makefile.in|1 +
> >  libffi/mdate-sh   |   39 +-
> >  libffi/merge.sh   |   51 +
> >  libffi/msvcc.sh   |  134 +-
> >  libffi/src/aarch64/ffi.c  |  536 +-
> >  libffi/src/aarch64/ffitarget.h|   35 +-
> >  libffi/src/aarch64/internal.h |   33 +
> >  libffi/src/aarch64/sysv.S |  189 +-
> >  libffi/src/aarch64/win64_armasm.S |  506 ++
> >  libffi/src/alpha/ffi.c|6 +-
> >  libffi/src/arc/ffi.c  |6 +-
> >  libffi/src/arm/ffi.c  |  380 +-
> >  libffi/src/arm/ffitarget.h|   24 +-
> >  libffi/src/arm/internal.h |   10 +
> >  libffi/src/arm/sysv.S |  304 +-
> >  libffi/src/arm/sysv_msvc_arm32.S  |  311 +
> >  libffi/src/closures.c |  489 +-
> >  libffi/src/cris/ffi.c |4 +-
> >  libffi/src/csky/ffi.c |  395 +
> >  libffi/src/csky/ffitarget.h   |   63 +
> >  libffi/src/csky/sysv.S|  371 +
> >  libffi/src/dlmalloc.c |7 +-
> >  libffi/src/frv/ffi.c  |4 +-
> >  

RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2

2021-10-13 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, October 13, 2021 12:06 PM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Richard Sandiford
> 
> Subject: RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2
> 
> >
> > Hmmm, these patterns are identical in what they match; they just have the
> > effect of printing operands 1 and 2 in a different order.
> > Perhaps it's more compact to change the output template into a
> > BYTES_BIG_ENDIAN ?
> >   "uzp1\\t%0., %1., %2." :
> >   "uzp1\\t%0., %2., %1."
> > and avoid having a second at all?
> >
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

Sorry I should have noticed earlier but...
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8dbeed3b0d4a44cdc17dd333ed397b39a33f386a..95b385c0c9405fe95fcd07262a9471ab13d5488e 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -270,6 +270,14 @@ (define_mode_iterator VDQHS [V4HI V8HI V2SI V4SI])
 ;; Advanced SIMD modes for H, S and D types.
 (define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI])
 
+;; Modes for which we can narrow the element and increase the lane counts
+;; to preserve the same register size.
+(define_mode_attr VNARROWSIMD [(V4HI "V8QI") (V8HI "V16QI") (V4SI "V8HI")
+  (V2SI "V4HI") (V2DI "V4SI")])
+
+(define_mode_attr Vnarrowsimd [(V4HI "v8qi") (V8HI "v16qi") (V4SI "v8hi")
+  (V2SI "v4hi") (V2DI "v4si")])
+

These attributes are not needed it seems.
So patch is ok without this hunk.
Thanks,
Kyrill

> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-simd.md
> (*aarch64_narrow_trunc): New.
>   * config/aarch64/iterators.md (VNARROWSIMD, Vnarrowsimd):
> New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/narrow_high_combine.c: Update case.
>   * gcc.target/aarch64/xtn-combine-1.c: New test.
>   * gcc.target/aarch64/xtn-combine-2.c: New test.
>   * gcc.target/aarch64/xtn-combine-3.c: New test.
>   * gcc.target/aarch64/xtn-combine-4.c: New test.
>   * gcc.target/aarch64/xtn-combine-5.c: New test.
>   * gcc.target/aarch64/xtn-combine-6.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 0b340b49fa06684b80d0b78cb712e49328ca92d5..b0dda554466149817a7828dbf4e0ed372a91872b 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1753,6 +1753,23 @@ (define_expand "aarch64_xtn2"
>}
>  )
> 
> +(define_insn "*aarch64_narrow_trunc"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (vec_concat:
> +  (truncate:
> +(match_operand:VQN 1 "register_operand" "w"))
> +   (truncate:
> + (match_operand:VQN 2 "register_operand" "w"]
> +  "TARGET_SIMD"
> +{
> +  if (!BYTES_BIG_ENDIAN)
> +return "uzp1\\t%0., %1., %2.";
> +  else
> +return "uzp1\\t%0., %2., %1.";
> +}
> +  [(set_attr "type" "neon_permute")]
> +)
> +
>  ;; Packing doubles.
> 
>  (define_expand "vec_pack_trunc_"
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 8dbeed3b0d4a44cdc17dd333ed397b39a33f386a..95b385c0c9405fe95fcd07262a9471ab13d5488e 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -270,6 +270,14 @@ (define_mode_iterator VDQHS [V4HI V8HI V2SI V4SI])
>  ;; Advanced SIMD modes for H, S and D types.
>  (define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI])
> 
> +;; Modes for which we can narrow the element and increase the lane counts
> +;; to preserve the same register size.
> +(define_mode_attr VNARROWSIMD [(V4HI "V8QI") (V8HI "V16QI") (V4SI "V8HI")
> +			       (V2SI "V4HI") (V2DI "V4SI")])
> +
> +(define_mode_attr Vnarrowsimd [(V4HI "v8qi") (V8HI "v16qi") (V4SI "v8hi")
> +(V2SI "v4hi") (V2DI "v4si")])
> +
>  ;; Advanced SIMD and scalar integer modes for H and S.
>  (define_mode_iterator VSDQ_HSI [V4HI V8HI V2SI V4SI HI SI])
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c b/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
> index 50ecab002a3552d37a5cc0d8921f42f6c3dba195..fa61196d3644caa48b12151e12b15dfeab8c7e71 100644
> --- a/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
> +++ b/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
> @@ -225,7 +225,8 @@ TEST_2_UNARY (vqmovun, uint32x4_t, int64x2_t, s64, u32)
>  /* { dg-final { scan-assembler-times "\\tuqshrn2\\tv" 6} }  */
>  /* { dg-final { scan-assembler-times "\\tsqrshrn2\\tv" 6} }  */
>  /* { dg-final { scan-assembler-times "\\tuqrshrn2\\tv" 6} }  */
> -/* { dg-final { scan-assembler-times "\\txtn2\\tv" 12} }  */
> +/* { dg-final { scan-assembler-times "\\txtn2\\tv" 6} }  */
> +/* { dg-final { scan-assembler-times 

Re: [PATCH] options: Fix variable tracking option processing.

2021-10-13 Thread Richard Biener via Gcc-patches
On Wed, Oct 13, 2021 at 1:59 PM Martin Liška  wrote:
>
> On 10/13/21 10:47, Richard Biener wrote:
> > Let's split this;)   The debug_inline_points part is OK.
>
> Fine.
>
> >
> > How can debug_variable_location_views be ever -1?  But the
> > debug_variable_location_views part looks OK as well.
>
> It comes from here:
> gvariable-location-views=incompat5
> Common Driver RejectNegative Var(debug_variable_location_views, -1) Init(2)
>
> but it's fine as using -gvariable-location-views=incompat5 leads to
> OPTION_SET_P(debug_variable_location_views) == true.
>
> >
> > More or less all parts that have the variable assigned in a single
> > place in gcc/ are OK (dwarf2out_as_locview_support).  But the
> > main flag_var_tracking* cases need more thorough view,
> > maybe we can convert them to single-set code first?
>
> I don't think so, you have code like
>
>if (flag_var_tracking_assignments_toggle)
> flag_var_tracking_assignments = !flag_var_tracking_assignments;
>
> which makes it more complicated. Or do I miss something?

It does, yes.  But that's a ^ with flag_var_tracking_assignments_toggle ;)

It's also one of the more weird flags, so it could be applied after the
otherwise single set of flag_var_tracking_assignments ...

Richard.

>
> Cheers,
> Martin


Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-13 Thread Richard Biener via Gcc-patches
On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu  wrote:
>
> Change in the v2 patch:
>
> 1. Disable static trampolines by default.
>
>
> GCC maintained a copy of libffi snapshot from 2009 and cherry-picked fixes
> from upstream over the last 10+ years.  In the meantime, libffi upstream
> has been changed significantly with new features, bug fixes and new target
> support.  Here is a set of patches to sync with libffi 3.4.2 release and
> make it easier to sync with libffi upstream:
>
> 1. Document how to sync with upstream.
> 2. Add scripts to help sync with upstream.
> 3. Sync with libffi 3.4.2. This patch is quite big.  It is available at
>
> https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> 4. Integrate libffi build and testsuite with GCC.

How did you test this?  It looks like libgo is the only consumer of
libffi these days.
In particular go/libgo seems to be supported on almost all targets besides
darwin/windows - did you test cross and canadian configurations?

I applaud the attempt to sync to upstream but I fear you won't get any "review"
of this massive diff.

I suppose the SONAME changes after the sync?

Thanks,
Richard.

> H.J. Lu (4):
>   libffi: Add HOWTO_MERGE, autogen.sh and merge.sh
>   libffi: Sync with libffi 3.4.2
>   libffi: Integrate build with GCC
>   libffi: Integrate testsuite with GCC testsuite
>
>  libffi/.gitattributes |4 +
>  libffi/ChangeLog.libffi   | 7743 -
>  libffi/HOWTO_MERGE|   13 +
>  libffi/LICENSE|2 +-
>  libffi/LICENSE-BUILDTOOLS |  353 +
>  libffi/MERGE  |4 +
>  libffi/Makefile.am|  135 +-
>  libffi/Makefile.in|  219 +-
>  libffi/README |  450 -
>  libffi/README.md  |  495 ++
>  libffi/acinclude.m4   |   38 +-
>  libffi/autogen.sh |   11 +
>  libffi/configure  |  487 +-
>  libffi/configure.ac   |   91 +-
>  libffi/configure.host |   97 +-
>  libffi/doc/Makefile.am|3 +
>  libffi/doc/libffi.texi|  382 +-
>  libffi/doc/version.texi   |8 +-
>  libffi/fficonfig.h.in |   21 +-
>  libffi/generate-darwin-source-and-headers.py  |  143 +-
>  libffi/include/Makefile.am|2 +-
>  libffi/include/Makefile.in|3 +-
>  libffi/include/ffi.h.in   |  213 +-
>  libffi/include/ffi_cfi.h  |   21 +
>  libffi/include/ffi_common.h   |   50 +-
>  libffi/include/tramp.h|   45 +
>  libffi/libffi.map.in  |   24 +-
>  libffi/libffi.pc.in   |2 +-
>  libffi/libffi.xcodeproj/project.pbxproj   |  530 +-
>  libffi/libtool-version|   25 +-
>  libffi/man/Makefile.in|1 +
>  libffi/mdate-sh   |   39 +-
>  libffi/merge.sh   |   51 +
>  libffi/msvcc.sh   |  134 +-
>  libffi/src/aarch64/ffi.c  |  536 +-
>  libffi/src/aarch64/ffitarget.h|   35 +-
>  libffi/src/aarch64/internal.h |   33 +
>  libffi/src/aarch64/sysv.S |  189 +-
>  libffi/src/aarch64/win64_armasm.S |  506 ++
>  libffi/src/alpha/ffi.c|6 +-
>  libffi/src/arc/ffi.c  |6 +-
>  libffi/src/arm/ffi.c  |  380 +-
>  libffi/src/arm/ffitarget.h|   24 +-
>  libffi/src/arm/internal.h |   10 +
>  libffi/src/arm/sysv.S |  304 +-
>  libffi/src/arm/sysv_msvc_arm32.S  |  311 +
>  libffi/src/closures.c |  489 +-
>  libffi/src/cris/ffi.c |4 +-
>  libffi/src/csky/ffi.c |  395 +
>  libffi/src/csky/ffitarget.h   |   63 +
>  libffi/src/csky/sysv.S|  371 +
>  libffi/src/dlmalloc.c |7 +-
>  libffi/src/frv/ffi.c  |4 +-
>  libffi/src/ia64/ffi.c |   30 +-
>  libffi/src/ia64/ffitarget.h   |3 +-
>  libffi/src/ia64/unix.S|9 +-
>  libffi/src/java_raw_api.c |6 +-
>  libffi/src/kvx/asm.h  |5 +
>  libffi/src/kvx/ffi.c  |  273 +
>  libffi/src/kvx/ffitarget.h|   75 +
>  libffi/src/kvx/sysv.S |  127 +
>  libffi/src/m32r/ffi.c

RE: [arm] Fix MVE addressing modes for VLDR[BHW] and VSTR[BHW]

2021-10-13 Thread Kyrylo Tkachov via Gcc-patches
Hi Andre,

> -Original Message-
> From: Andre Vieira (lists) 
> Sent: Tuesday, October 12, 2021 5:42 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Christophe Lyon
> 
> Subject: [arm] Fix MVE addressing modes for VLDR[BHW] and VSTR[BHW]
> 
> Hi,
> 
> The way we were previously dealing with addressing modes for MVE was
> preventing
> the use of pre, post and offset addressing modes for the normal loads and
> stores, including widening and narrowing.  This patch fixes that and
> adds tests to ensure we are capable of using all the available addressing
> modes.
> 
> gcc/ChangeLog:
> 2021-10-12  Andre Vieira  
> 
>      * config/arm/arm.c (thumb2_legitimate_address_p): Use
> VALID_MVE_MODE
>      when checking mve addressing modes.
>      (mve_vector_mem_operand): Fix the way we handle pre, post and
> offset
>      addressing modes.
>      (arm_print_operand): Fix printing of POST_ and PRE_MODIFY.

@@ -24276,7 +24271,7 @@ arm_print_operand (FILE *stream, rtx x, int code)
else if (code == POST_MODIFY || code == PRE_MODIFY)
  {
asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));
-   postinc_reg = XEXP ( XEXP (x, 1), 1);
+   postinc_reg = XEXP (XEXP (addr, 1), 1);
if (postinc_reg && CONST_INT_P (postinc_reg))
  {
if (code == POST_MODIFY)

this looks like a bug fix that should be separately backported to the branches?
Otherwise, the patch looks ok for trunk to me.
Thanks,
Kyrill


>      * config/arm/mve.md: Use mve_memory_operand predicate
> everywhere where
>      there is a single Ux constraint.
> 
> gcc/testsuite/ChangeLog:
> 2021-10-12  Andre Vieira  
> 
>      * gcc.target/arm/mve/mve.exp: Make it test main directory.
>      * gcc.target/arm/mve/mve_load_memory_modes.c: New test.
>      * gcc.target/arm/mve/mve_store_memory_modes.c: New test.


Re: [PATCH v4] Improve integer bit test on __atomic_fetch_[or|and]_* returns

2021-10-13 Thread Richard Biener via Gcc-patches
On Sun, Oct 10, 2021 at 3:49 PM H.J. Lu  wrote:
>
> Changes in v4:
>
> 1. Bypass redundant check when inputs have been transformed to the
> equivalent canonical form with valid bit operation.
>
> Changes in v3:
>
> 1.  Check invalid bit operation.
>
> commit adedd5c173388ae505470df152b9cb3947339566
> Author: Jakub Jelinek 
> Date:   Tue May 3 13:37:25 2016 +0200
>
> re PR target/49244 (__sync or __atomic builtins will not emit 'lock 
> bts/btr/btc')
>
> optimized bit test on __atomic_fetch_or_* and __atomic_fetch_and_* returns
> with lock bts/btr/btc by turning
>
>   mask_2 = 1 << cnt_1;
>   _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
>   _5 = _4 & mask_2;
>
> into
>
>   _4 = ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3);
>   _5 = _4;
>
> and
>
>   mask_6 = 1 << bit_5(D);
>   _1 = ~mask_6;
>   _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
>   _3 = _2 & mask_6;
>   _4 = _3 != 0;
>
> into
>
>   mask_6 = 1 << bit_5(D);
>   _1 = ~mask_6;
>   _11 = .ATOMIC_BIT_TEST_AND_RESET (v_8(D), bit_5(D), 1, 0);
>   _4 = _11 != 0;
>
> But it failed to optimize many equivalent, but slightly different cases:
>
> 1.
>   _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
>   _4 = (_Bool) _1;
> 2.
>   _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
>   _4 = (_Bool) _1;
> 3.
>   _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
>   _7 = ~_1;
>   _5 = (_Bool) _7;
> 4.
>   _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
>   _7 = ~_1;
>   _5 = (_Bool) _7;
> 5.
>   _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
>   _2 = (int) _1;
>   _7 = ~_2;
>   _5 = (_Bool) _7;
> 6.
>   _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
>   _2 = (int) _1;
>   _7 = ~_2;
>   _5 = (_Bool) _7;
> 7.
>   _1 = _atomic_fetch_or_4 (ptr_6, mask, _3);
>   _2 = (int) _1;
>   _5 = _2 & mask;
> 8.
>   _1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
>   _5 = (signed int) _1;
>   _4 = _5 < 0;
> 9.
>   _1 = __atomic_fetch_and_4 (ptr_6, 0x7fffffff, _3);
>   _5 = (signed int) _1;
>   _4 = _5 < 0;
> 10.
>   _1 = 1 << bit_4(D);
>   mask_5 = (unsigned int) _1;
>   _2 = __atomic_fetch_or_4 (v_7(D), mask_5, 0);
>   _3 = _2 & mask_5;
> 11.
>   mask_7 = 1 << bit_6(D);
>   _1 = ~mask_7;
>   _2 = (unsigned int) _1;
>   _3 = __atomic_fetch_and_4 (v_9(D), _2, 0);
>   _4 = (int) _3;
>   _5 = _4 & mask_7;
>
> We make
>
>   mask_2 = 1 << cnt_1;
>   _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
>   _5 = _4 & mask_2;
>
> and
>
>   mask_6 = 1 << bit_5(D);
>   _1 = ~mask_6;
>   _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
>   _3 = _2 & mask_6;
>   _4 = _3 != 0;
>
> the canonical forms for this optimization and transform cases 1-9 to the
> equivalent canonical form.  For cases 10 and 11, we simply remove the cast
> before __atomic_fetch_or_4/__atomic_fetch_and_4 with
>
>   _1 = 1 << bit_4(D);
>   _2 = __atomic_fetch_or_4 (v_7(D), _1, 0);
>   _3 = _2 & _1;
>
> and
>
>   mask_7 = 1 << bit_6(D);
>   _1 = ~mask_7;
>   _3 = __atomic_fetch_and_4 (v_9(D), _1, 0);
>   _6 = _3 & mask_7;
>   _5 = (int) _6;
>
> gcc/
>
> PR middle-end/102566
> * tree-ssa-ccp.c (convert_atomic_bit_not): New function.
> (optimize_atomic_bit_test_and): Transform equivalent, but slightly
> different cases to their canonical forms.
>
> gcc/testsuite/
>
> PR middle-end/102566
> * g++.target/i386/pr102566-1.C: New test.
> * g++.target/i386/pr102566-2.C: Likewise.
> * g++.target/i386/pr102566-3.C: Likewise.
> * g++.target/i386/pr102566-4.C: Likewise.
> * g++.target/i386/pr102566-5a.C: Likewise.
> * g++.target/i386/pr102566-5b.C: Likewise.
> * g++.target/i386/pr102566-6a.C: Likewise.
> * g++.target/i386/pr102566-6b.C: Likewise.
> * gcc.target/i386/pr102566-1a.c: Likewise.
> * gcc.target/i386/pr102566-1b.c: Likewise.
> * gcc.target/i386/pr102566-2.c: Likewise.
> * gcc.target/i386/pr102566-3a.c: Likewise.
> * gcc.target/i386/pr102566-3b.c: Likewise.
> * gcc.target/i386/pr102566-4.c: Likewise.
> * gcc.target/i386/pr102566-5.c: Likewise.
> * gcc.target/i386/pr102566-6.c: Likewise.
> * gcc.target/i386/pr102566-7.c: Likewise.
> * gcc.target/i386/pr102566-8a.c: Likewise.
> * gcc.target/i386/pr102566-8b.c: Likewise.
> * gcc.target/i386/pr102566-9a.c: Likewise.
> * gcc.target/i386/pr102566-9b.c: Likewise.
> * gcc.target/i386/pr102566-10a.c: Likewise.
> * gcc.target/i386/pr102566-10b.c: Likewise.
> * gcc.target/i386/pr102566-11.c: Likewise.
> * gcc.target/i386/pr102566-12.c: Likewise.
> ---
>  gcc/testsuite/g++.target/i386/pr102566-1.C   |  31 ++
>  gcc/testsuite/g++.target/i386/pr102566-2.C   |  31 ++
>  gcc/testsuite/g++.target/i386/pr102566-3.C   |  31 ++
>  gcc/testsuite/g++.target/i386/pr102566-4.C   |  29 ++
>  gcc/testsuite/g++.target/i386/pr102566-5a.C  |  31 ++
>  gcc/testsuite/g++.target/i386/pr102566-5b.C  |  31 ++
>  gcc/testsuite/g++.target/i386/pr102566-6a.C  |  31 ++
>  gcc/testsuite/g++.target/i386/pr102566-6b.C  |  31 

[PATCH] Add GSI_LAST_NEW_STMT iterator update

2021-10-13 Thread Richard Biener via Gcc-patches
Currently, when inserting a sequence before a statement, there is no
way to get the iterator placed at the last added stmt, which results
in convoluted code in the if-conversion use case.  The following adds
GSI_LAST_NEW_STMT, corrects one obvious mistake in
execute_update_addresses_taken, and tries to avoid the just-filed
PR102726 by biasing the enum values to be outside of the boolean 0/1
range.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Will push once that succeeded.

Richard.

2021-10-13  Richard Biener  

* gimple-iterator.h (gsi_iterator_update): Add GSI_LAST_NEW_STMT,
start at integer value 2.
* gimple-iterator.c (gsi_insert_seq_nodes_before): Update
the iterator for GSI_LAST_NEW_STMT.
(gsi_insert_seq_nodes_after): Likewise.
* tree-if-conv.c (predicate_statements): Use GSI_LAST_NEW_STMT.
* tree-ssa.c (execute_update_addresses_taken): Correct bogus
arguments to gsi_replace.
---
 gcc/gimple-iterator.c | 4 
 gcc/gimple-iterator.h | 4 ++--
 gcc/tree-if-conv.c| 6 +-
 gcc/tree-ssa.c| 4 ++--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/gimple-iterator.c b/gcc/gimple-iterator.c
index da8c21297de..ee4e63a3bd3 100644
--- a/gcc/gimple-iterator.c
+++ b/gcc/gimple-iterator.c
@@ -162,6 +162,9 @@ gsi_insert_seq_nodes_before (gimple_stmt_iterator *i,
 case GSI_CONTINUE_LINKING:
   i->ptr = first;
   break;
+case GSI_LAST_NEW_STMT:
+  i->ptr = last;
+  break;
 case GSI_SAME_STMT:
   break;
 default:
@@ -271,6 +274,7 @@ gsi_insert_seq_nodes_after (gimple_stmt_iterator *i,
 case GSI_NEW_STMT:
   i->ptr = first;
   break;
+case GSI_LAST_NEW_STMT:
 case GSI_CONTINUE_LINKING:
   i->ptr = last;
   break;
diff --git a/gcc/gimple-iterator.h b/gcc/gimple-iterator.h
index 6047a739332..0e384f72f94 100644
--- a/gcc/gimple-iterator.h
+++ b/gcc/gimple-iterator.h
@@ -46,8 +46,8 @@ struct gphi_iterator : public gimple_stmt_iterator
  
 enum gsi_iterator_update
 {
-  GSI_NEW_STMT,	/* Only valid when single statement is added, move
-			   iterator to it.  */
+  GSI_NEW_STMT = 2,/* Move the iterator to the first statement added.  */
+  GSI_LAST_NEW_STMT,   /* Move the iterator to the last statement added.  */
   GSI_SAME_STMT,   /* Leave the iterator at the same statement.  */
   GSI_CONTINUE_LINKING /* Move iterator to whatever position is suitable
   for linking other statements in the same
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 6a67acfeaae..0b6b07cfac6 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2582,11 +2582,7 @@ predicate_statements (loop_p loop)
{
 	  gsi_remove (&gsi, true);
 	  gsi_insert_seq_before (&gsi, rewrite_to_defined_overflow (stmt),
-				 GSI_SAME_STMT);
-	  if (gsi_end_p (gsi))
-	    gsi = gsi_last_bb (gimple_bb (stmt));
-	  else
-	    gsi_prev (&gsi);
+				 GSI_LAST_NEW_STMT);
}
  else if (gimple_vdef (stmt))
{
diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index 0fba404babe..fde13defebf 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -2079,7 +2079,7 @@ execute_update_addresses_taken (void)
gcall *call
  = gimple_build_call_internal (IFN_ASAN_POISON, 0);
gimple_call_set_lhs (call, var);
-	    gsi_replace (&gsi, call, GSI_SAME_STMT);
+	    gsi_replace (&gsi, call, true);
  }
else
  {
@@ -2088,7 +2088,7 @@ execute_update_addresses_taken (void)
   previous out of scope value.  */
tree clobber = build_clobber (TREE_TYPE (var));
gimple *g = gimple_build_assign (var, clobber);
-	    gsi_replace (&gsi, g, GSI_SAME_STMT);
+	    gsi_replace (&gsi, g, true);
  }
continue;
  }
-- 
2.31.1


RE: [PATCH 5/7]middle-end Convert bitclear + cmp #0 into cm

2021-10-13 Thread Richard Biener via Gcc-patches
On Tue, 5 Oct 2021, Tamar Christina wrote:

> Hi All,
> 
> Here's a new version of the patch handling both scalar and vector modes
> and non-uniform constant vectors.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
> 
> In order to not break IVopts and CSE I have added a
> requirement for the scalar version to be single use.

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree.c (bitmask_inv_cst_vector_p): New.
>   * tree.h (bitmask_inv_cst_vector_p): New.
>   * match.pd: Use it in new bitmask compare pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/bic-bitmask-10.c: New test.
>   * gcc.dg/bic-bitmask-11.c: New test.
>   * gcc.dg/bic-bitmask-12.c: New test.
>   * gcc.dg/bic-bitmask-13.c: New test.
>   * gcc.dg/bic-bitmask-14.c: New test.
>   * gcc.dg/bic-bitmask-15.c: New test.
>   * gcc.dg/bic-bitmask-16.c: New test.
>   * gcc.dg/bic-bitmask-17.c: New test.
>   * gcc.dg/bic-bitmask-18.c: New test.
>   * gcc.dg/bic-bitmask-19.c: New test.
>   * gcc.dg/bic-bitmask-2.c: New test.
>   * gcc.dg/bic-bitmask-20.c: New test.
>   * gcc.dg/bic-bitmask-21.c: New test.
>   * gcc.dg/bic-bitmask-22.c: New test.
>   * gcc.dg/bic-bitmask-23.c: New test.
>   * gcc.dg/bic-bitmask-3.c: New test.
>   * gcc.dg/bic-bitmask-4.c: New test.
>   * gcc.dg/bic-bitmask-5.c: New test.
>   * gcc.dg/bic-bitmask-6.c: New test.
>   * gcc.dg/bic-bitmask-7.c: New test.
>   * gcc.dg/bic-bitmask-8.c: New test.
>   * gcc.dg/bic-bitmask-9.c: New test.
>   * gcc.dg/bic-bitmask.h: New test.
>   * gcc.target/aarch64/bic-bitmask-1.c: New test.
> 
> --- inline copy of patch --
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0fcfd0ea62c043dc217d0d560ce5b7e569b70e7d..7d2a24dbc5e9644a09968f877e12a824d8ba1caa 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -37,7 +37,8 @@ along with GCC; see the file COPYING3.  If not see
> integer_pow2p
> uniform_integer_cst_p
> HONOR_NANS
> -   uniform_vector_p)
> +   uniform_vector_p
> +   bitmask_inv_cst_vector_p)
>  
>  /* Operator lists.  */
>  (define_operator_list tcc_comparison
> @@ -4900,6 +4901,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(eqcmp (bit_and @1 { wide_int_to_tree (ty, mask - rhs); })
>{ build_zero_cst (ty); }))
>  
> +/* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
> +   where ~Y + 1 == pow2 and Z = ~Y.  */
> +(for cst (VECTOR_CST INTEGER_CST)
> + (for cmp (le eq ne ge gt)
> +  icmp (le le gt le gt)
> + (simplify
> +  (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
> +   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
> + (switch
> +  (if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
> +&& (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> +   (icmp @0 { csts; }))
> +  (if (csts && !TYPE_UNSIGNED (TREE_TYPE (@1))
> +&& (cmp == EQ_EXPR || cmp == NE_EXPR)
> +&& (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> +   (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
> + (icmp (convert:utype @0) { csts; }
> +
>  /* -A CMP -B -> B CMP A.  */
>  (for cmp (tcc_comparison)
>   scmp (swapped_tcc_comparison)
> diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-10.c 
> b/gcc/testsuite/gcc.dg/bic-bitmask-10.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..76a22a2313137a2a75dd711c2c15c2d3a34e15aa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/bic-bitmask-10.c
> @@ -0,0 +1,26 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> +
> +#include <stdint.h>
> +
> +__attribute__((noinline, noipa))
> +void fun1(int32_t *x, int n)
> +{
> +for (int i = 0; i < (n & -16); i++)
> +  x[i] = (x[i]&(~255)) == 0;
> +}
> +
> +__attribute__((noinline, noipa, optimize("O1")))
> +void fun2(int32_t *x, int n)
> +{
> +for (int i = 0; i < (n & -16); i++)
> +  x[i] = (x[i]&(~255)) == 0;
> +}
> +
> +#define TYPE int32_t
> +#include "bic-bitmask.h"
> +
> +/* { dg-final { scan-tree-dump {<=\s*.+\{ 255,.+\}} dce7 } } */
> +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967290,.+\}} dce7 } } */
> +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target { aarch64*-*-* } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-11.c 
> b/gcc/testsuite/gcc.dg/bic-bitmask-11.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..32553d7ba2f823f7a21237451990d0a216d2f912
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/bic-bitmask-11.c
> @@ -0,0 +1,25 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> +
> +#include <stdint.h>
> +
> +__attribute__((noinline, noipa))
> +void fun1(uint32_t *x, int n)
> +{
> +for (int i = 0; i < (n & -16); i++)
> +  x[i] = (x[i]&(~255)) != 0;
> +}
> +
> +__attribute__((noinline, noipa, optimize("O1")))
> +void fun2(uint32_t *x, int n)
> +{
> +for (int i = 0; i < (n 

RE: [PATCH]middle-end convert negate + right shift into compare greater.

2021-10-13 Thread Richard Biener via Gcc-patches
On Mon, 11 Oct 2021, Tamar Christina wrote:

> Hi all,
> 
> Here's a new version of the patch.
> 
> > >>> " If an exceptional condition occurs during the evaluation of an
> > >>> expression
> > >> (that is, if the result is not mathematically defined or not in the
> > >> range of representable values for its type), the behavior is undefined."
> > >>>
> > >>> So it should still be acceptable to do in this case.
> > >>
> > >> -fwrapv
> > >
> > > If I understand correctly, you're happy with this is I guard it on ! 
> > > flag_wrapv ?
> > 
> > I did some more digging.  Right shift of a negative value is IMP_DEF (not
> > UNDEF - this keeps catching me out).  So yes, wrapping this with !wrapv
> > would address my concern.
> > 
> > I've not reviewed the patch itself, though.  I've never even written a patch
> > for match.pd, so don't feel qualified to do that.
> 
> No problem, thanks for catching this! I'm sure one of the Richards will 
> review it when
> they have a chance.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * match.pd: New negate+shift pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/signbit-2.c: New test.
>   * gcc.dg/signbit-3.c: New test.
>   * gcc.target/aarch64/signbit-1.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7d2a24dbc5e9644a09968f877e12a824d8ba1caa..3d48eda826f889483a83267409c3f278ee907b57 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -826,6 +826,38 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  { tree utype = unsigned_type_for (type); }
>  (convert (rshift (lshift (convert:utype @0) @2) @3))
>  
> +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
> +(for cst (INTEGER_CST VECTOR_CST)
> + (simplify
> +  (rshift (negate:s @0) cst@1)
> +   (if (!flag_wrapv)

Don't test flag_wrapv directly, instead use the appropriate
TYPE_OVERFLOW_{UNDEFINED,WRAPS} predicates.  But I'm not sure
what we are protecting against?  Right-shift of signed integers
is implementation-defined and GCC treats it as you'd expect,
sign-extending the result.

> +(with { tree ctype = TREE_TYPE (@0);
> + tree stype = TREE_TYPE (@1);
> + tree bt = truth_type_for (ctype); }
> + (switch
> +  /* Handle scalar case.  */
> +  (if (INTEGRAL_TYPE_P (ctype)
> +&& !VECTOR_TYPE_P (ctype)
> +&& !TYPE_UNSIGNED (ctype)
> +&& canonicalize_math_after_vectorization_p ()
> +&& wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
> +   (convert:bt (gt:bt @0 { build_zero_cst (stype); })))

I'm not sure why the result is of type 'bt' rather than the
original type of the expression?

In that regard for non-vectors we'd have to add the sign
extension from unsigned bool, in the vector case we'd
hope the type of the comparison is correct.  I think in
both cases it might be convenient to use

  (cond (gt:bt @0 { build_zero_cst (ctype); }) { build_all_ones_cst 
(ctype); } { build_zero_cost (ctype); })

to compute the correct result and rely on (cond ..) simplifications
to simplify that if possible.

Btw, 'stype' should be irrelevant - you need to look at
the precision of 'ctype', no?

Richard.

> +  /* Handle vector case with a scalar immediate.  */
> +  (if (VECTOR_INTEGER_TYPE_P (ctype)
> +&& !VECTOR_TYPE_P (stype)
> +&& !TYPE_UNSIGNED (ctype)
> +   && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
> +   (convert:bt (gt:bt @0 { build_zero_cst (ctype); })))
> +  /* Handle vector case with a vector immediate.   */
> +  (if (VECTOR_INTEGER_TYPE_P (ctype)
> +&& VECTOR_TYPE_P (stype)
> +&& !TYPE_UNSIGNED (ctype)
> +&& uniform_vector_p (@1))
> +   (with { tree cst = vector_cst_elt (@1, 0);
> +tree t = TREE_TYPE (cst); }
> +(if (wi::eq_p (wi::to_wide (cst), TYPE_PRECISION (t) - 1))
> + (convert:bt (gt:bt @0 { build_zero_cst (ctype); }))
> +
>  /* Fold (C1/X)*C2 into (C1*C2)/X.  */
>  (simplify
>   (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
> diff --git a/gcc/testsuite/gcc.dg/signbit-2.c 
> b/gcc/testsuite/gcc.dg/signbit-2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..fc0157cbc5c7996b481f2998bc30176c96a669bb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/signbit-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O3 --save-temps -fdump-tree-optimized" } */
> +
> +#include <stdint.h>
> +
> +void fun1(int32_t *x, int n)
> +{
> +for (int i = 0; i < (n & -16); i++)
> +  x[i] = (-x[i]) >> 31;
> +}
> +
> +void fun2(int32_t *x, int n)
> +{
> +for (int i = 0; i < (n & -16); i++)
> +  x[i] = (-x[i]) >> 30;
> +}
> +
> +/* { dg-final { scan-tree-dump-times {\s+>\s+\{ 0, 0, 0, 0 \}} 1 optimized } } */
> +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
> diff --git 

Re: [PATCH] options: Fix variable tracking option processing.

2021-10-13 Thread Martin Liška

On 10/13/21 10:47, Richard Biener wrote:

Let's split this;)   The debug_inline_points part is OK.


Fine.



How can debug_variable_location_views ever be -1?  But the
debug_variable_location_views part looks OK as well.


It comes from here:
gvariable-location-views=incompat5
Common Driver RejectNegative Var(debug_variable_location_views, -1) Init(2)

but it's fine as using -gvariable-location-views=incompat5 leads to
OPTION_SET_P(debug_variable_location_views) == true.



More or less all parts that have the variable assigned in a single
place in gcc/ are OK (dwarf2out_as_locview_support).  But the
main flag_var_tracking* cases need more thorough view,
maybe we can convert them to single-set code first?


I don't think so, you have code like

  if (flag_var_tracking_assignments_toggle)
flag_var_tracking_assignments = !flag_var_tracking_assignments;

which makes it more complicated. Or do I miss something?

Cheers,
Martin


Re: [PATCH] Fix handling of flag_rename_registers.

2021-10-13 Thread Richard Biener via Gcc-patches
On Wed, Oct 13, 2021 at 12:02 PM Martin Liška  wrote:
>
> On 10/13/21 10:39, Richard Biener wrote:
> > On Tue, Oct 12, 2021 at 5:11 PM Martin Liška  wrote:
> >>
> >> On 10/12/21 15:37, Richard Biener wrote:
> >>> by adding EnabledBy(funroll-loops) to the respective options instead
> >>> (and funroll-loops EnabledBy(funroll-all-loops))
> >>
> >> All right, so the suggested approach works correctly.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> >   funroll-all-loops
> > -Common Var(flag_unroll_all_loops) Optimization
> > +Common Var(flag_unroll_all_loops) Optimization EnabledBy(funroll-all-loops)
> >
> > that should be on funroll-loops?
>
> Yeah, what a stupid error.
>
> >
> > Can you verify that the two-step -funroll-all-loops -> -funroll-loops
> > -> -frename-registers
>
> Yes, verified that in debugger, it's not dependent on an ordering.
>
> > works and that it is not somehow dependent on ordering?  Otherwise we have 
> > to
> > use EnabledBy(funroll-loops,funroll-all-loops) on frename-registers.
> > I guess the
> > EnabledBy doesn't work if the target decides to set flag_unroll_loop in one 
> > of
> > its hooks rather than via its option table override?  (as said, this
> > is all a mess...)
>
> It's a complete mess. The only override happens in
> rs6000_override_options_after_change. I think it can also utilize EnabledBy, 
> but
> I would like to do it in a different patch.
>
> >
> > But grep should be your friend telling whether any target overrides
> > any of the flags...
> >
> > I do hope we can eventually reduce the number of 
> > pre-/post-/lang/target/common
> > processing phases for options :/  Meh.
>
> Huh.
>
> May I install this fixed patch once it's tested?

Yes please.

Thanks,
Richard.

> Martin
>
> >
> > Richard.
> >
> >> Thanks,
> >> Martin
>
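Concretely, the chaining suggested in this thread corresponds to .opt entries of roughly this shape (a sketch of the direction discussed, not the committed text):

```
funroll-loops
Common Var(flag_unroll_loops) Optimization EnabledBy(funroll-all-loops)

frename-registers
Common Var(flag_rename_registers) Optimization EnabledBy(funroll-loops)
```

With that, -funroll-all-loops implies -funroll-loops, which in turn implies -frename-registers, without any explicit ordering of option processing.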


Re: [PATCH] tree-optimization/102659 - avoid undefined overflow after if-conversion

2021-10-13 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, 13 Oct 2021, Richard Sandiford wrote:
>
>> Richard Biener via Gcc-patches  writes:
>> > The following makes sure to rewrite arithmetic with undefined behavior
>> > on overflow to a well-defined variant when moving them to be always
>> > executed as part of doing if-conversion for loop vectorization.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >
>> > Any comments?
>> 
>> LGTM FWIW, although…
>> 
>> > Thanks,
>> > Richard.
>> >
>> > 2021-10-11  Richard Biener  
>> >
>> >PR tree-optimization/102659
>> >* tree-if-conv.c (need_to_rewrite_undefined): New flag.
>> >(if_convertible_gimple_assign_stmt_p): Mark the loop for
>> >rewrite when stmts with undefined behavior on integer
>> >overflow appear.
>> >(combine_blocks): Predicate also when we need to rewrite stmts.
>> >(predicate_statements): Rewrite affected stmts to something
>> >with well-defined behavior on overflow.
>> >(tree_if_conversion): Initialize need_to_rewrite_undefined.
>> >
>> >* gcc.dg/torture/pr69760.c: Adjust the testcase.
>> >* gcc.target/i386/avx2-vect-mask-store-move1.c: Expect to move
>> >the conversions to unsigned as well.
>> > ---
>> >  gcc/testsuite/gcc.dg/torture/pr69760.c|  3 +-
>> >  .../i386/avx2-vect-mask-store-move1.c |  2 +-
>> >  gcc/tree-if-conv.c| 28 ++-
>> >  3 files changed, 29 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/gcc/testsuite/gcc.dg/torture/pr69760.c 
>> > b/gcc/testsuite/gcc.dg/torture/pr69760.c
>> > index 53733c7c6a4..47e01ae59bd 100644
>> > --- a/gcc/testsuite/gcc.dg/torture/pr69760.c
>> > +++ b/gcc/testsuite/gcc.dg/torture/pr69760.c
>> > @@ -1,11 +1,10 @@
>> >  /* PR tree-optimization/69760 */
>> >  /* { dg-do run { target { { *-*-linux* *-*-gnu* *-*-uclinux* } && mmap } 
>> > } } */
>> > -/* { dg-options "-O2" } */
>> >  
>> >  #include 
>> >  #include 
>> >  
>> > -__attribute__((noinline, noclone)) void
>> > +__attribute__((noinline, noclone)) static void
>> >  test_func (double *a, int L, int m, int n, int N)
>> >  {
>> >int i, k;
>> > diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c 
>> > b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
>> > index 989ba402e0e..6a47a09c835 100644
>> > --- a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
>> > +++ b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
>> > @@ -78,4 +78,4 @@ avx2_test (void)
>> >abort ();
>> >  }
>> >  
>> > -/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } 
>> > } */
>> > +/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 10 "vect" 
>> > } } */
>> > diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
>> > index d7b7b309309..6a67acfeaae 100644
>> > --- a/gcc/tree-if-conv.c
>> > +++ b/gcc/tree-if-conv.c
>> > @@ -132,6 +132,11 @@ along with GCC; see the file COPYING3.  If not see
>> > predicate_statements for the kinds of predication we support.  */
>> >  static bool need_to_predicate;
>> >  
>> > +/* True if we have to rewrite stmts that may invoke undefined behavior
>> > +   when a condition C was false so it doesn't if it is always executed.
>> > +   See predicate_statements for the kinds of predication we support.  */
>> > +static bool need_to_rewrite_undefined;
>> > +
>> >  /* Indicate if there are any complicated PHIs that need to be handled in
>> > if-conversion.  Complicated PHI has more than two arguments and can't
>> > be degenerated to two arguments PHI.  See more information in comment
>> > @@ -1042,6 +1047,12 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
>> >fprintf (dump_file, "tree could trap...\n");
>> >return false;
>> >  }
>> > +  else if (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>> > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs))
>> > + && arith_code_with_undefined_signed_overflow
>> > +  (gimple_assign_rhs_code (stmt)))
>> > +/* We have to rewrite stmts with undefined overflow.  */
>> > +need_to_rewrite_undefined = true;
>> >  
>> >/* When if-converting stores force versioning, likewise if we
>> >   ended up generating store data races.  */
>> > @@ -2563,6 +2574,20 @@ predicate_statements (loop_p loop)
>> >  
>> >  gsi_replace (, new_stmt, true);
>> >}
>> > +else if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
>> > + && TYPE_OVERFLOW_UNDEFINED
>> > +  (TREE_TYPE (gimple_assign_lhs (stmt)))
>> > + && arith_code_with_undefined_signed_overflow
>> > +  (gimple_assign_rhs_code (stmt)))
>> > +  {
>> > +gsi_remove (, true);
>> > +gsi_insert_seq_before (, rewrite_to_defined_overflow (stmt),
>> > +   GSI_SAME_STMT);
>> > +if (gsi_end_p (gsi))
>> > +  gsi = gsi_last_bb (gimple_bb (stmt));
>> > +else
>> > + 

Re: [PATCH] tree-optimization/102659 - avoid undefined overflow after if-conversion

2021-10-13 Thread Richard Biener via Gcc-patches
On Wed, 13 Oct 2021, Richard Sandiford wrote:

> Richard Biener via Gcc-patches  writes:
> > The following makes sure to rewrite arithmetic with undefined behavior
> > on overflow to a well-defined variant when moving them to be always
> > executed as part of doing if-conversion for loop vectorization.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Any comments?
> 
> LGTM FWIW, although…
> 
> > Thanks,
> > Richard.
> >
> > 2021-10-11  Richard Biener  
> >
> > PR tree-optimization/102659
> > * tree-if-conv.c (need_to_rewrite_undefined): New flag.
> > (if_convertible_gimple_assign_stmt_p): Mark the loop for
> > rewrite when stmts with undefined behavior on integer
> > overflow appear.
> > (combine_blocks): Predicate also when we need to rewrite stmts.
> > (predicate_statements): Rewrite affected stmts to something
> > with well-defined behavior on overflow.
> > (tree_if_conversion): Initialize need_to_rewrite_undefined.
> >
> > * gcc.dg/torture/pr69760.c: Adjust the testcase.
> > * gcc.target/i386/avx2-vect-mask-store-move1.c: Expect to move
> > the conversions to unsigned as well.
> > ---
> >  gcc/testsuite/gcc.dg/torture/pr69760.c|  3 +-
> >  .../i386/avx2-vect-mask-store-move1.c |  2 +-
> >  gcc/tree-if-conv.c| 28 ++-
> >  3 files changed, 29 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/torture/pr69760.c 
> > b/gcc/testsuite/gcc.dg/torture/pr69760.c
> > index 53733c7c6a4..47e01ae59bd 100644
> > --- a/gcc/testsuite/gcc.dg/torture/pr69760.c
> > +++ b/gcc/testsuite/gcc.dg/torture/pr69760.c
> > @@ -1,11 +1,10 @@
> >  /* PR tree-optimization/69760 */
> >  /* { dg-do run { target { { *-*-linux* *-*-gnu* *-*-uclinux* } && mmap } } 
> > } */
> > -/* { dg-options "-O2" } */
> >  
> >  #include 
> >  #include 
> >  
> > -__attribute__((noinline, noclone)) void
> > +__attribute__((noinline, noclone)) static void
> >  test_func (double *a, int L, int m, int n, int N)
> >  {
> >int i, k;
> > diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c 
> > b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
> > index 989ba402e0e..6a47a09c835 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
> > +++ b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
> > @@ -78,4 +78,4 @@ avx2_test (void)
> >abort ();
> >  }
> >  
> > -/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } 
> > } */
> > +/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 10 "vect" } 
> > } */
> > diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> > index d7b7b309309..6a67acfeaae 100644
> > --- a/gcc/tree-if-conv.c
> > +++ b/gcc/tree-if-conv.c
> > @@ -132,6 +132,11 @@ along with GCC; see the file COPYING3.  If not see
> > predicate_statements for the kinds of predication we support.  */
> >  static bool need_to_predicate;
> >  
> > +/* True if we have to rewrite stmts that may invoke undefined behavior
> > +   when a condition C was false so it doesn't if it is always executed.
> > +   See predicate_statements for the kinds of predication we support.  */
> > +static bool need_to_rewrite_undefined;
> > +
> >  /* Indicate if there are any complicated PHIs that need to be handled in
> > if-conversion.  Complicated PHI has more than two arguments and can't
> > be degenerated to two arguments PHI.  See more information in comment
> > @@ -1042,6 +1047,12 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
> > fprintf (dump_file, "tree could trap...\n");
> >return false;
> >  }
> > +  else if (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> > +  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs))
> > +  && arith_code_with_undefined_signed_overflow
> > +   (gimple_assign_rhs_code (stmt)))
> > +/* We have to rewrite stmts with undefined overflow.  */
> > +need_to_rewrite_undefined = true;
> >  
> >/* When if-converting stores force versioning, likewise if we
> >   ended up generating store data races.  */
> > @@ -2563,6 +2574,20 @@ predicate_statements (loop_p loop)
> >  
> >   gsi_replace (, new_stmt, true);
> > }
> > + else if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
> > +  && TYPE_OVERFLOW_UNDEFINED
> > +   (TREE_TYPE (gimple_assign_lhs (stmt)))
> > +  && arith_code_with_undefined_signed_overflow
> > +   (gimple_assign_rhs_code (stmt)))
> > +   {
> > + gsi_remove (, true);
> > + gsi_insert_seq_before (, rewrite_to_defined_overflow (stmt),
> > +GSI_SAME_STMT);
> > + if (gsi_end_p (gsi))
> > +   gsi = gsi_last_bb (gimple_bb (stmt));
> > + else
> > +   gsi_prev ();
> 
> …perhaps there should be a GSI_* for this idiom.

Yeah, the issue is that gsi_remove 

RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2

2021-10-13 Thread Tamar Christina via Gcc-patches
> 
> Hmmm, these patterns are identical in what they match; they just have the
> effect of printing operands 1 and 2 in a different order.
> Perhaps it's more compact to change the output template into a
> BYTES_BIG_ENDIAN ?
> "uzp1\\t%0.<V2ntype>, %1.<V2ntype>, %2.<V2ntype>" :
> "uzp1\\t%0.<V2ntype>, %2.<V2ntype>, %1.<V2ntype>"
> and avoid having a second pattern at all?
> 

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_narrow_trunc): New.
* config/aarch64/iterators.md (VNARROWSIMD, Vnarrowsimd): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/narrow_high_combine.c: Update case.
* gcc.target/aarch64/xtn-combine-1.c: New test.
* gcc.target/aarch64/xtn-combine-2.c: New test.
* gcc.target/aarch64/xtn-combine-3.c: New test.
* gcc.target/aarch64/xtn-combine-4.c: New test.
* gcc.target/aarch64/xtn-combine-5.c: New test.
* gcc.target/aarch64/xtn-combine-6.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 0b340b49fa06684b80d0b78cb712e49328ca92d5..b0dda554466149817a7828dbf4e0ed372a91872b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1753,6 +1753,23 @@ (define_expand "aarch64_xtn2<mode>"
   }
 )
 
+(define_insn "*aarch64_narrow_trunc<mode>"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+	(vec_concat:<VNARROWQ2>
+	  (truncate:<VNARROWQ>
+	    (match_operand:VQN 1 "register_operand" "w"))
+	  (truncate:<VNARROWQ>
+	    (match_operand:VQN 2 "register_operand" "w"))))]
+  "TARGET_SIMD"
+{
+  if (!BYTES_BIG_ENDIAN)
+    return "uzp1\\t%0.<V2ntype>, %1.<V2ntype>, %2.<V2ntype>";
+  else
+    return "uzp1\\t%0.<V2ntype>, %2.<V2ntype>, %1.<V2ntype>";
+}
+  [(set_attr "type" "neon_permute<q>")]
+)
+
 ;; Packing doubles.
 
 (define_expand "vec_pack_trunc_"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 8dbeed3b0d4a44cdc17dd333ed397b39a33f386a..95b385c0c9405fe95fcd07262a9471ab13d5488e 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -270,6 +270,14 @@ (define_mode_iterator VDQHS [V4HI V8HI V2SI V4SI])
 ;; Advanced SIMD modes for H, S and D types.
 (define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI])
 
+;; Modes for which we can narrow the element and increase the lane counts
+;; to preserve the same register size.
+(define_mode_attr VNARROWSIMD [(V4HI "V8QI") (V8HI "V16QI") (V4SI "V8HI")
+  (V2SI "V4HI") (V2DI "V4SI")])
+
+(define_mode_attr Vnarrowsimd [(V4HI "v8qi") (V8HI "v16qi") (V4SI "v8hi")
+  (V2SI "v4hi") (V2DI "v4si")])
+
 ;; Advanced SIMD and scalar integer modes for H and S.
 (define_mode_iterator VSDQ_HSI [V4HI V8HI V2SI V4SI HI SI])
 
diff --git a/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c 
b/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
index 50ecab002a3552d37a5cc0d8921f42f6c3dba195..fa61196d3644caa48b12151e12b15dfeab8c7e71 100644
--- a/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
+++ b/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
@@ -225,7 +225,8 @@ TEST_2_UNARY (vqmovun, uint32x4_t, int64x2_t, s64, u32)
 /* { dg-final { scan-assembler-times "\\tuqshrn2\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\tsqrshrn2\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\tuqrshrn2\\tv" 6} }  */
-/* { dg-final { scan-assembler-times "\\txtn2\\tv" 12} }  */
+/* { dg-final { scan-assembler-times "\\txtn2\\tv" 6} }  */
+/* { dg-final { scan-assembler-times "\\tuzp1\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\tuqxtn2\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\tsqxtn2\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\tsqxtun2\\tv" 6} }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/xtn-combine-1.c 
b/gcc/testsuite/gcc.target/aarch64/xtn-combine-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..14e0414cd1478f1cb7b17766aa8d4451c5659977
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/xtn-combine-1.c
@@ -0,0 +1,16 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+#define SIGN signed
+#define TYPE1 char
+#define TYPE2 short
+
+void d2 (SIGN TYPE1 * restrict a, SIGN TYPE2 *b, int n)
+{
+for (int i = 0; i < n; i++)
+  a[i] = b[i];
+}
+
+/* { dg-final { scan-assembler-times {\tuzp1\t} 1 } } */
+/* { dg-final { scan-assembler-not {\txtn\t} } } */
+/* { dg-final { scan-assembler-not {\txtn2\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/xtn-combine-2.c 
b/gcc/testsuite/gcc.target/aarch64/xtn-combine-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..c259010442bca4ba008706e47b3ffcc50a910b52
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/xtn-combine-2.c
@@ -0,0 +1,16 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+#define SIGN signed
+#define TYPE1 

Re: [PATCH] tree-optimization/102659 - avoid undefined overflow after if-conversion

2021-10-13 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> The following makes sure to rewrite arithmetic with undefined behavior
> on overflow to a well-defined variant when moving them to be always
> executed as part of doing if-conversion for loop vectorization.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Any comments?

LGTM FWIW, although…

> Thanks,
> Richard.
>
> 2021-10-11  Richard Biener  
>
>   PR tree-optimization/102659
>   * tree-if-conv.c (need_to_rewrite_undefined): New flag.
>   (if_convertible_gimple_assign_stmt_p): Mark the loop for
>   rewrite when stmts with undefined behavior on integer
>   overflow appear.
>   (combine_blocks): Predicate also when we need to rewrite stmts.
>   (predicate_statements): Rewrite affected stmts to something
>   with well-defined behavior on overflow.
>   (tree_if_conversion): Initialize need_to_rewrite_undefined.
>
>   * gcc.dg/torture/pr69760.c: Adjust the testcase.
>   * gcc.target/i386/avx2-vect-mask-store-move1.c: Expect to move
>   the conversions to unsigned as well.
> ---
>  gcc/testsuite/gcc.dg/torture/pr69760.c|  3 +-
>  .../i386/avx2-vect-mask-store-move1.c |  2 +-
>  gcc/tree-if-conv.c| 28 ++-
>  3 files changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr69760.c 
> b/gcc/testsuite/gcc.dg/torture/pr69760.c
> index 53733c7c6a4..47e01ae59bd 100644
> --- a/gcc/testsuite/gcc.dg/torture/pr69760.c
> +++ b/gcc/testsuite/gcc.dg/torture/pr69760.c
> @@ -1,11 +1,10 @@
>  /* PR tree-optimization/69760 */
>  /* { dg-do run { target { { *-*-linux* *-*-gnu* *-*-uclinux* } && mmap } } } 
> */
> -/* { dg-options "-O2" } */
>  
>  #include 
>  #include 
>  
> -__attribute__((noinline, noclone)) void
> +__attribute__((noinline, noclone)) static void
>  test_func (double *a, int L, int m, int n, int N)
>  {
>int i, k;
> diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c 
> b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
> index 989ba402e0e..6a47a09c835 100644
> --- a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
> @@ -78,4 +78,4 @@ avx2_test (void)
>abort ();
>  }
>  
> -/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } } 
> */
> +/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 10 "vect" } } 
> */
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index d7b7b309309..6a67acfeaae 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -132,6 +132,11 @@ along with GCC; see the file COPYING3.  If not see
> predicate_statements for the kinds of predication we support.  */
>  static bool need_to_predicate;
>  
> +/* True if we have to rewrite stmts that may invoke undefined behavior
> +   when a condition C was false so it doesn't if it is always executed.
> +   See predicate_statements for the kinds of predication we support.  */
> +static bool need_to_rewrite_undefined;
> +
>  /* Indicate if there are any complicated PHIs that need to be handled in
> if-conversion.  Complicated PHI has more than two arguments and can't
> be degenerated to two arguments PHI.  See more information in comment
> @@ -1042,6 +1047,12 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
>   fprintf (dump_file, "tree could trap...\n");
>return false;
>  }
> +  else if (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> +&& TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs))
> +&& arith_code_with_undefined_signed_overflow
> + (gimple_assign_rhs_code (stmt)))
> +/* We have to rewrite stmts with undefined overflow.  */
> +need_to_rewrite_undefined = true;
>  
>/* When if-converting stores force versioning, likewise if we
>   ended up generating store data races.  */
> @@ -2563,6 +2574,20 @@ predicate_statements (loop_p loop)
>  
> 	   gsi_replace (&gsi, new_stmt, true);
>   }
> +   else if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
> +&& TYPE_OVERFLOW_UNDEFINED
> + (TREE_TYPE (gimple_assign_lhs (stmt)))
> +&& arith_code_with_undefined_signed_overflow
> + (gimple_assign_rhs_code (stmt)))
> + {
> +	   gsi_remove (&gsi, true);
> +	   gsi_insert_seq_before (&gsi, rewrite_to_defined_overflow (stmt),
> +				  GSI_SAME_STMT);
> +	   if (gsi_end_p (gsi))
> +	     gsi = gsi_last_bb (gimple_bb (stmt));
> +	   else
> +	     gsi_prev (&gsi);

…perhaps there should be a GSI_* for this idiom.

Thanks,
Richard

> + }
> else if (gimple_vdef (stmt))
>   {
> tree lhs = gimple_assign_lhs (stmt);
> @@ -2647,7 +2672,7 @@ combine_blocks (class loop *loop)
>insert_gimplified_predicates (loop);
>

[PATCH v2 14/14] arm: Add VPR_REG to ALL_REGS

2021-10-13 Thread Christophe Lyon via Gcc-patches
VPR_REG should be part of ALL_REGS; this patch fixes that omission.

2021-10-13  Christophe Lyon  

gcc/
* config/arm/arm.h (REG_CLASS_CONTENTS): Add VPR_REG to ALL_REGS.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index eae1b1cd0fb..fab39d05916 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1346,7 +1346,7 @@ enum reg_class
   { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */\
   { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */  \
  { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS.  */ \
-  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */ \
+  { 0x7FFF, 0x, 0x, 0x040F }  /* ALL_REGS.  */ \
 }
 
 #define FP_SYSREGS \
-- 
2.25.1



[PATCH v2 13/14] arm: Convert more MVE/CDE builtins to predicate qualifiers

2021-10-13 Thread Christophe Lyon via Gcc-patches
This patch covers a few non-load/store builtins where we do not use
the  iterator and thus we cannot use .

We need to update the expected code in cde-mve-full-assembly.c because
we now use mve_movv16qi instead of movhi to generate the vmsr
instruction.

2021-10-13  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (CX_UNARY_UNONE_QUALIFIERS): Use
predicate.
(CX_BINARY_UNONE_QUALIFIERS): Likewise.
(CX_TERNARY_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete.
* config/arm/arm_mve_builtins.def: Use predicated qualifiers.
* config/arm/mve.md: Use VxBI instead of HI.

gcc/testsuite/
* gcc.target/arm/acle/cde-mve-full-assembly.c: Remove expected '@ 
movhi'.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index e58580bb828..d725458f1ad 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -344,7 +344,7 @@ static enum arm_type_qualifiers
 arm_cx_unary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_UNARY_UNONE_QUALIFIERS (arm_cx_unary_unone_qualifiers)
 
 /* T (immediate, T, T, unsigned immediate).  */
@@ -353,7 +353,7 @@ arm_cx_binary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
   qualifier_none, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_BINARY_UNONE_QUALIFIERS (arm_cx_binary_unone_qualifiers)
 
 /* T (immediate, T, T, T, unsigned immediate).  */
@@ -362,7 +362,7 @@ arm_cx_ternary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
   qualifier_none, qualifier_none, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_TERNARY_UNONE_QUALIFIERS (arm_cx_ternary_unone_qualifiers)
 
 /* The first argument (return type) of a store should be void type,
@@ -558,12 +558,6 @@ 
arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS \
   (arm_ternop_none_none_none_imm_qualifiers)
 
-static enum arm_type_qualifiers
-arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
@@ -616,13 +610,6 @@ 
arm_quadop_unone_unone_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS \
   (arm_quadop_unone_unone_none_none_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_none_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
-qualifier_unsigned };
-#define QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_quadop_none_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_none_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
@@ -637,13 +624,6 @@ 
arm_quadop_none_none_none_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS \
   (arm_quadop_none_none_none_imm_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_unone_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
-qualifier_unsigned, qualifier_unsigned };
-#define QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
-  (arm_quadop_unone_unone_unone_unone_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_unone_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index bb79edf83ca..0fb53d866ec 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -87,8 +87,8 @@ VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u, v16qi, v8hi, v4si, 
v2di)
 VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
-VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
-VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
+VAR1 (BINOP_NONE_NONE_PRED, vaddlvq_p_s, v4si)
+VAR1 

[PATCH v2 12/14] arm: Convert more load/store MVE builtins to predicate qualifiers

2021-10-13 Thread Christophe Lyon via Gcc-patches
This patch covers a few builtins where we do not use the 
iterator and thus we cannot use .

For v2di instructions, we use the V8BI mode for predicates.

2021-10-13  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (STRSBS_P_QUALIFIERS): Use predicate
qualifier.
(STRSBU_P_QUALIFIERS): Likewise.
(LDRGBS_Z_QUALIFIERS): Likewise.
(LDRGBU_Z_QUALIFIERS): Likewise.
(LDRGBWBXU_Z_QUALIFIERS): Likewise.
(LDRGBWBS_Z_QUALIFIERS): Likewise.
(LDRGBWBU_Z_QUALIFIERS): Likewise.
(STRSBWBS_P_QUALIFIERS): Likewise.
(STRSBWBU_P_QUALIFIERS): Likewise.
* config/arm/mve.md: Use VxBI instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 06ff9d2278a..e58580bb828 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -738,13 +738,13 @@ arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
 #define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
 #define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
 
 static enum arm_type_qualifiers
@@ -780,13 +780,13 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -826,7 +826,7 @@ arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -842,13 +842,13 @@ arm_ldrgbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBS_Z_QUALIFIERS (arm_ldrgbwbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -864,13 +864,13 @@ arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
 #define STRSBWBS_P_QUALIFIERS (arm_strsbwbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
 #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
 
 static enum arm_type_qualifiers
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 81ad488155d..c07487c0750 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -7282,7 +7282,7 @@ (define_insn "mve_vstrwq_scatter_base_p_v4si"
[(match_operand:V4SI 0 "s_register_operand" "w")
 (match_operand:SI 1 "immediate_operand" "i")
 (match_operand:V4SI 2 "s_register_operand" "w")
-(match_operand:HI 3 "vpr_register_operand" "Up")]
+(match_operand:V4BI 3 "vpr_register_operand" "Up")]
 VSTRWSBQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7371,7 +7371,7 @@ (define_insn "mve_vldrwq_gather_base_z_v4si"
   [(set (match_operand:V4SI 0 "s_register_operand" "=")
(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
  (match_operand:SI 2 "immediate_operand" "i")
- (match_operand:HI 3 "vpr_register_operand" "Up")]
+ (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 VLDRWGBQ))
   ]
   "TARGET_HAVE_MVE"
@@ 

[PATCH v2 10/14] arm: Convert remaining MVE vcmp builtins to predicate qualifiers

2021-10-13 Thread Christophe Lyon via Gcc-patches
This is mostly a mechanical change, only tested by the intrinsics
expansion tests.

2021-10-13  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (BINOP_UNONE_NONE_NONE_QUALIFIERS):
Delete.
(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this.
(TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new
predicated qualifiers.
* config/arm/mve.md (mve_vcmpq_n_)
(mve_vcmp*q_m_f): Use MVE_VPRED instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 6e3638869f1..b3455d87d4f 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -487,12 +487,6 @@ arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_NONE_NONE_UNONE_QUALIFIERS \
   (arm_binop_none_none_unone_qualifiers)
 
-static enum arm_type_qualifiers
-arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none };
-#define BINOP_UNONE_NONE_NONE_QUALIFIERS \
-  (arm_binop_unone_none_none_qualifiers)
-
 static enum arm_type_qualifiers
 arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_predicate, qualifier_none, qualifier_none };
@@ -553,10 +547,10 @@ 
arm_ternop_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_ternop_unone_unone_imm_unone_qualifiers)
 
 static enum arm_type_qualifiers
-arm_ternop_unone_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_unone_none_none_unone_qualifiers)
+arm_ternop_pred_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none, qualifier_predicate 
};
+#define TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_none_none_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -602,6 +596,13 @@ 
arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
   (arm_ternop_unone_unone_unone_pred_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_pred_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned,
+qualifier_predicate };
+#define TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index 58a05e61bd9..91ed2073918 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -118,9 +118,9 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vaddvq_p_u, v16qi, v8hi, v4si)
@@ -142,17 +142,17 @@ VAR3 (BINOP_UNONE_UNONE_NONE, vbrsrq_n_u, v16qi, v8hi, 
v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshlq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vrshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vqshlq_n_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, 

[PATCH v2 09/14] arm: Fix vcond_mask expander for MVE (PR target/100757)

2021-10-13 Thread Christophe Lyon via Gcc-patches
The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
implies implementing vec_cmp,
vec_cmpu and vcond_mask_, and we can
move vec_cmp, vec_cmpu and
vcond_mask_ back to neon.md since they are not
used by MVE anymore.  The new * patterns listed above are
implemented in mve.md since they are only valid for MVE. However this
may make maintenance/comparison more painful than having all of them
in vec-common.md.

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to neon.md's vcond_mask_ before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
iterator added in r12-835 (to have V4HF/V8HF support), as well as the
(! || flag_unsafe_math_optimizations) condition which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
if (a[b] != 2.0f)
  c = 5.0f;
  return c;
}

fn1:
ldr r3, .L3+48
vldr.64 d4, .L3  // q2=(2.0,2.0,2.0,2.0)
vldr.64 d5, .L3+8
vldrw.32q0, [r3] // q0=a(0..3)
addsr3, r3, #16
vcmp.f32eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
vldrw.32q1, [r3] // q1=a(4..7)
vmrs r3, P0
vcmp.f32eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
vmrsr2, P0  @ movhi
andsr3, r3, r2   // r3=select(a(0..3]) & select(a(4..7))
vldr.64 d4, .L3+16   // q2=(5.0,5.0,5.0,5.0)
vldr.64 d5, .L3+24
vmsr P0, r3
vldr.64 d6, .L3+32   // q3=(4.0,4.0,4.0,4.0)
vldr.64 d7, .L3+40
vpsel q3, q3, q2 // q3=vcond_mask(4.0,5.0)
vmov.32 r2, q3[1]// keep the scalar max
vmov.32 r0, q3[3]
vmov.32 r3, q3[2]
vmov.f32s11, s12
vmovs15, r2
vmovs14, r3
vmaxnm.f32  s15, s11, s15
vmaxnm.f32  s15, s15, s14
vmovs14, r0
vmaxnm.f32  s15, s15, s14
vmovr0, s15
bx  lr
.L4:
.align  3
.L3:
.word   1073741824  // 2.0f
.word   1073741824
.word   1073741824
.word   1073741824
.word   1084227584  // 5.0f
.word   1084227584
.word   1084227584
.word   1084227584
.word   1082130432  // 4.0f
.word   1082130432
.word   1082130432
.word   1082130432

2021-10-13  Christophe Lyon  

PR target/100757
gcc/
* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
(arm_expand_vector_compare): Update prototype.
* config/arm/arm.c (TARGET_VECTORIZE_GET_MASK_MODE): New.
(arm_vector_mode_supported_p): Add support for VxBI modes.
(arm_expand_vector_compare): Remove useless generation of vpsel.
(arm_expand_vcond): Fix select operands.
(arm_get_mask_mode): New.
* config/arm/mve.md (vec_cmp): New.
(vec_cmpu): New.
(vcond_mask_): New.
* config/arm/vec-common.md (vec_cmp)
(vec_cmpu): Move to ...
* config/arm/neon.md (vec_cmp)
(vec_cmpu): ... here
and disable for MVE.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 9b1f61394ad..9e3d71e0c29 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -201,6 +201,7 @@ extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, 
tree, rtx, tree);
 extern bool arm_pad_reg_upward (machine_mode, tree, int);
 #endif
 extern int arm_apply_result_size (void);
+extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
 
 #endif /* RTX_CODE */
 
@@ -372,7 +373,7 @@ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, 
rtx, rtx, rtx, rtx,
 extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
 extern bool arm_valid_symbolic_address_p (rtx);
 extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
-extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool, bool);
+extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool);
 #endif /* RTX_CODE */
 
 extern bool 

Re: [PATCH 06/13] arm: Fix mve_vmvnq_n_ argument mode

2021-10-13 Thread Christophe Lyon via Gcc-patches
On Mon, Oct 11, 2021 at 4:10 PM Richard Sandiford via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Christophe Lyon via Gcc-patches  writes:
> > The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use
> >  iterator instead of HI in mve_vmvnq_n_.
> >
> > 2021-09-03  Christophe Lyon  
> >
> >   gcc/
> >   * config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
> >   for operand 1.
> >
> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> > index e393518ea88..14d17060290 100644
> > --- a/gcc/config/arm/mve.md
> > +++ b/gcc/config/arm/mve.md
> > @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
> >  (define_insn "mve_vmvnq_n_"
> >[
> > (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> > - (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
> > + (unspec:MVE_5 [(match_operand: 1 "immediate_operand" "i")]
> >VMVNQ_N))
> >]
> >"TARGET_HAVE_MVE"
>
> I agree this is correct, but there's also the issue that the
> predicate is too broad.  At the moment it allows any immediate,
> so things like:
>
>   #include 
>   int32x4_t foo(void) { return vmvnq_n_s32(0x12345678); }
>
> are accepted by the compiler and only rejected by the assembler.
> Not your bug to fix, just saying :-)
>
>
Right, and it seems to be the case for vbicq_n, vorrq_n, ...
I'll check that separately.



> Thanks,
> Richard
>


[PATCH v2 08/14] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates

2021-10-13 Thread Christophe Lyon via Gcc-patches
We make use of qualifier_predicate to describe MVE builtin
prototypes, restricting the change to the auto-vectorizable vcmp* and
vpsel builtins, as they are the ones exercised by the tests added
earlier in the series.

Special handling is needed for mve_vpselq because it has a v2di
variant, which has no natural VPR.P0 representation: we keep HImode
for it.

The vector_compare expansion code is updated to use the right VxBI
mode instead of HI for the result.

New mov patterns are introduced to handle the new modes.

2021-10-13  Christophe Lyon 

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
(BINOP_PRED_NONE_NONE_QUALIFIERS)
(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm.c (arm_hard_regno_mode_ok): Handle new VxBI
modes.
(arm_mode_to_pred_mode): New.
(arm_expand_vector_compare): Use the right VxBI mode instead of
HI.
(arm_expand_vcond): Likewise.
* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
(vpselq_s, vpselq_f): Use new predicated qualifiers.
* config/arm/iterators.md (MVE_7): New mode iterator.
(MVE_VPRED, MVE_vpred): New attribute iterators.
* config/arm/mve.md (@mve_vcmpq_)
(@mve_vcmpq_f, @mve_vpselq_)
(@mve_vpselq_f): Use MVE_VPRED instead of HI.
(@mve_vpselq_v2di): Define separately.
(mov): New expander for VxBI modes.
(mve_mov): New insn for VxBI modes.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 771759f0cdd..6e3638869f1 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -469,6 +469,12 @@ 
arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_binop_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_PRED_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_pred_unone_unone_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_none, qualifier_immediate };
@@ -487,6 +493,12 @@ arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_NONE_NONE_QUALIFIERS \
   (arm_binop_unone_none_none_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none };
+#define BINOP_PRED_NONE_NONE_QUALIFIERS \
+  (arm_binop_pred_none_none_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
@@ -558,6 +570,12 @@ 
arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
   (arm_ternop_none_none_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
+#define TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_none_none_none_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_immediate, qualifier_unsigned 
};
@@ -577,6 +595,13 @@ 
arm_ternop_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_ternop_unone_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+qualifier_predicate };
+#define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_unone_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9f52a152444..94696d528f8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25304,7 +25304,10 @@ arm_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
 return false;
 
   if (IS_VPR_REGNUM (regno))
-return mode == HImode;
+return mode == HImode
+  || mode == V16BImode
+  || mode == V8BImode
+  || mode == V4BImode;
 
   if (TARGET_THUMB1)
 /* For the Thumb we only allow values bigger than SImode in
@@ -30991,6 +30994,19 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, 
rtx new_out, rtx mem,
 

[PATCH v2 07/14] arm: Implement MVE predicates as vectors of booleans

2021-10-13 Thread Christophe Lyon via Gcc-patches
This patch implements support for vectors of booleans to represent
MVE predicates, instead of HImode.  Since the ABI mandates pred16_t
(aka uint16_t) to represent predicates in intrinsics prototypes, we
introduce a new "predicate" type qualifier so that we can map the
relevant builtins' HImode arguments and return values to the
appropriate vectors of booleans (VxBI).

We have to update test_vector_ops_duplicate, because it iterates using
an offset in bytes, whereas we would need to iterate in bits: we stop
iterating when we reach the end of the vector of booleans.

2021-10-13  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (arm_type_qualifiers): Add 
qualifier_predicate.
(arm_init_simd_builtin_types): Add new simd types.
(arm_init_builtin): Map predicate vectors arguments to HImode.
(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
rtx. Move return value to HImode rtx.
* config/arm/arm-modes.def (V16BI, V8BI, V4BI): New modes.
* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
Pred2x8_t,Pred4x4_t): New.
* simplify-rtx.c (test_vector_ops_duplicate): Skip vec_merge test
with vectors of booleans.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 3a9ff8f26b8..771759f0cdd 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -92,7 +92,9 @@ enum arm_type_qualifiers
   qualifier_lane_pair_index = 0x1000,
   /* Lane indices selected in quadtuplets - must be within range of previous
  argument = a vector.  */
-  qualifier_lane_quadtup_index = 0x2000
+  qualifier_lane_quadtup_index = 0x2000,
+  /* MVE vector predicates.  */
+  qualifier_predicate = 0x4000
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -1633,6 +1635,13 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
   arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
 
+  if (TARGET_HAVE_MVE)
+{
+  arm_simd_types[Pred1x16_t].eltype = unsigned_intHI_type_node;
+  arm_simd_types[Pred2x8_t].eltype = unsigned_intHI_type_node;
+  arm_simd_types[Pred4x4_t].eltype = unsigned_intHI_type_node;
+}
+
   for (i = 0; i < nelts; i++)
 {
   tree eltype = arm_simd_types[i].eltype;
@@ -1780,6 +1789,11 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum 
*d,
   if (qualifiers & qualifier_map_mode)
op_mode = d->mode;
 
+  /* MVE Predicates use HImode as mandated by the ABI: pred16_t is unsigned
+short.  */
+  if (qualifiers & qualifier_predicate)
+   op_mode = HImode;
+
   /* For pointers, we want a pointer to the basic type
 of the vector.  */
   if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
@@ -3024,6 +3038,11 @@ arm_expand_builtin_args (rtx target, machine_mode 
map_mode, int fcode,
case ARG_BUILTIN_COPY_TO_REG:
  if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
op[argc] = convert_memory_address (Pmode, op[argc]);
+
+ /* MVE uses mve_pred16_t (aka HImode) for vectors of predicates.  
*/
+ if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
+   op[argc] = gen_lowpart (mode[argc], op[argc]);
+
  /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
  if (!(*insn_data[icode].operand[opno].predicate)
  (op[argc], mode[argc]))
@@ -3229,6 +3248,13 @@ constant_arg:
   else
 emit_insn (insn);
 
+  if (GET_MODE_CLASS (tmode) == MODE_VECTOR_BOOL)
+{
+  rtx HItarget = gen_reg_rtx (HImode);
+  emit_move_insn (HItarget, gen_lowpart (HImode, target));
+  return HItarget;
+}
+
   return target;
 }
 
diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index a5e74ba3943..b414a709a62 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -84,6 +84,11 @@ VECTOR_MODE (FLOAT, BF, 2);   /* V2BF.  */
 VECTOR_MODE (FLOAT, BF, 4);   /*V4BF.  */
 VECTOR_MODE (FLOAT, BF, 8);   /*V8BF.  */
 
+/* Predicates for MVE.  */
+VECTOR_BOOL_MODE (V16BI, 16, 2);
+VECTOR_BOOL_MODE (V8BI, 8, 2);
+VECTOR_BOOL_MODE (V4BI, 4, 2);
+
 /* Fraction and accumulator vector modes.  */
 VECTOR_MODES (FRACT, 4);  /* V4QQ  V2HQ */
 VECTOR_MODES (UFRACT, 4); /* V4UQQ V2UHQ */
diff --git a/gcc/config/arm/arm-simd-builtin-types.def 
b/gcc/config/arm/arm-simd-builtin-types.def
index c19a1b6e3eb..d3987985b4c 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -51,3 +51,7 @@
   ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
   ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
   ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
+
+  ENTRY (Pred1x16_t, V16BI, unsigned, 16, uint16, 21)
+  ENTRY (Pred2x8_t, V8BI, unsigned, 8, uint16, 21)
+  ENTRY 

[PATCH v2 06/14] arm: Fix mve_vmvnq_n_ argument mode

2021-10-13 Thread Christophe Lyon via Gcc-patches
The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use
 iterator instead of HI in mve_vmvnq_n_.

2021-10-13  Christophe Lyon  

gcc/
* config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
for operand 1.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index e393518ea88..14d17060290 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
 (define_insn "mve_vmvnq_n_"
   [
(set (match_operand:MVE_5 0 "s_register_operand" "=w")
-   (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
+   (unspec:MVE_5 [(match_operand: 1 "immediate_operand" "i")]
 VMVNQ_N))
   ]
   "TARGET_HAVE_MVE"
-- 
2.25.1



[PATCH v2 05/14] arm: Add support for VPR_REG in arm_class_likely_spilled_p

2021-10-13 Thread Christophe Lyon via Gcc-patches
VPR_REG is the only register in its class, so it should be handled by
TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
default_class_likely_spilled_p.  No test fails without this patch, but
it seems it should be implemented.

2021-10-13  Christophe Lyon  

gcc/
* config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 11dafc70067..9f52a152444 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29307,7 +29307,7 @@ arm_class_likely_spilled_p (reg_class_t rclass)
   || rclass  == CC_REG)
 return true;
 
-  return false;
+  return default_class_likely_spilled_p (rclass);
 }
 
 /* Implements target hook small_register_classes_for_mode_p.  */
-- 
2.25.1



[PATCH v2 04/14] arm: Add GENERAL_AND_VPR_REGS regclass

2021-10-13 Thread Christophe Lyon via Gcc-patches
At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS).  The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.

2021-10-13  Christophe Lyon  

gcc/
* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 015299c1534..eae1b1cd0fb 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1286,6 +1286,7 @@ enum reg_class
   SFP_REG,
   AFP_REG,
   VPR_REG,
+  GENERAL_AND_VPR_REGS,
   ALL_REGS,
   LIM_REG_CLASSES
 };
@@ -1315,6 +1316,7 @@ enum reg_class
   "SFP_REG",   \
   "AFP_REG",   \
   "VPR_REG",   \
+  "GENERAL_AND_VPR_REGS", \
   "ALL_REGS"   \
 }
 
@@ -1343,6 +1345,7 @@ enum reg_class
   { 0x, 0x, 0x, 0x0040 }, /* SFP_REG */\
   { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */\
   { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */  \
+  { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS. 
 */ \
   { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */ \
 }
 
-- 
2.25.1



[PATCH v2 03/14] arm: Add tests for PR target/101325

2021-10-13 Thread Christophe Lyon via Gcc-patches
These tests are derived from the one provided in the PR: there is a
compile-only test because I did not have access to anything that could
execute MVE code until recently.
I have been able to add an executable test since QEMU supports MVE.

Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it
uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does.  This
ensures arm_mve_hw passes even if the toolchain does not generate MVE
code by default.

2021-10-13  Christophe Lyon  

gcc/testsuite/
PR target/101325
* gcc.target/arm/simd/pr101325.c: New.
* gcc.target/arm/simd/pr101325-2.c: New.
* lib/target-supports.exp (check_effective_target_arm_mve_hw): Use
add_options_for_arm_v8_1m_mve_fp.

add executable test and update check_effective_target_arm_mve_hw

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c 
b/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
new file mode 100644
index 000..7907a386385
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_mve_hw } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_v8_1m_mve } */
+
+#include 
+
+
+__attribute((noinline,noipa))
+unsigned foo(int8x16_t v, int8x16_t w)
+{
+  return vcmpeqq (v, w);
+}
+
+int main(void)
+{
+  if (foo (vdupq_n_s8(0), vdupq_n_s8(0)) != 0xU)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325.c 
b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
new file mode 100644
index 000..a466683a0b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include 
+
+unsigned foo(int8x16_t v, int8x16_t w)
+{
+  return vcmpeqq (v, w);
+}
+/* { dg-final { scan-assembler {\tvcmp.i8  eq} } } */
+/* { dg-final { scan-assembler {\tvmrs\t r[0-9]+, P0} } } */
+/* { dg-final { scan-assembler {\tuxth} } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index e030e4f376b..b0e35b602af 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4889,6 +4889,7 @@ proc check_effective_target_arm_cmse_hw { } {
}
 } "-mcmse -Wl,--section-start,.gnu.sgstubs=0x0040"]
 }
+
 # Return 1 if the target supports executing MVE instructions, 0
 # otherwise.
 
@@ -4904,7 +4905,7 @@ proc check_effective_target_arm_mve_hw {} {
   : "0" (a), "r" (b));
  return (a != 2);
}
-} ""]
+} [add_options_for_arm_v8_1m_mve_fp ""]]
 }
 
 # Return 1 if this is an ARM target where ARMv8-M Security Extensions with
-- 
2.25.1



[PATCH v2 02/14] arm: Add tests for PR target/100757

2021-10-13 Thread Christophe Lyon via Gcc-patches
These tests currently trigger an ICE which is fixed later in the patch
series.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks.  In addition, since we should not
need these masks, the tests make sure they are not present.

2021-10-13  Christophe Lyon  

gcc/testsuite/
PR target/100757
* gcc.target/arm/simd/pr100757-2.c: New.
* gcc.target/arm/simd/pr100757-3.c: New.
* gcc.target/arm/simd/pr100757-4.c: New.
* gcc.target/arm/simd/pr100757.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
new file mode 100644
index 000..c2262b4d81e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+int fn1(int d) {
+  int c = 4;
+  for (int b = 0; b < 32; b++)
+if (a[b] != 2.0f)
+  c = 5;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* 
Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t4\n} 4 } } */ /* Initial value 
for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t5\n} 4 } } */ /* Possible value 
for c.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
new file mode 100644
index 000..e604555c04c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Copied from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+float fn1(int d) {
+  float c = 4.0f;
+  for (int b = 0; b < 32; b++)
+if (a[b] != 2.0f)
+  c = 5.0f;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* 
Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1084227584\n} 4 } } */ /* 
Initial value for c (4.0).  */
+/* { dg-final { scan-assembler-times {\t.word\t1082130432\n} 4 } } */ /* 
Possible value for c (5.0).  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
new file mode 100644
index 000..c12040c517f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+unsigned int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+if (a[b])
+  c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  
*/
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value 
for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value 
for c.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
new file mode 100644
index 000..41d6e4e2d7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+if (a[b])
+  c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  
*/
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value 
for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value 
for c.  */
-- 
2.25.1



[PATCH v2 01/14] arm: Add new tests for comparison vectorization with Neon and MVE

2021-10-13 Thread Christophe Lyon via Gcc-patches
This patch mainly adds Neon tests similar to existing MVE ones,
to make sure we do not break Neon when fixing MVE.

mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional
with 2.0f and 3.0f constants to help scan-assembler-times.

2021-10-13  Christophe Lyon 

gcc/testsuite/
* gcc.target/arm/simd/mve-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-compare-1.c: New.
* gcc.target/arm/simd/neon-compare-2.c: New.
* gcc.target/arm/simd/neon-compare-3.c: New.
* gcc.target/arm/simd/neon-compare-scalar-1.c: New.
* gcc.target/arm/simd/neon-vcmp-f16.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-3.c: New.
* gcc.target/arm/simd/neon-vcmp-f32.c: New.
* gcc.target/arm/simd/neon-vcmp.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c 
b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
new file mode 100644
index 000..917a95bf141
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include 
+
+#define NB 4
+
+#define FUNC(OP, NAME) \
+  void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \
+int i; \
+for (i=0; i<NB; i++) { \
+  dest[i] = (a[i] OP b[i]) ? 2.0f : 3.0f; \
+} \
+}
+
+FUNC(==, vcmpeq)
+FUNC(!=, vcmpne)
+FUNC(<, vcmplt)
+FUNC(<=, vcmple)
+FUNC(>, vcmpgt)
+FUNC(>=, vcmpge)
+
+/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 24 } } */ /* 
Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1077936128\n} 24 } } */ /* 
Constant 3.0f.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c 
b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
new file mode 100644
index 000..2e0222a71f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3" } */
+
+#include "mve-compare-1.c"
+
+/* 64-bit vectors.  */
+/* vmvn is used by 'ne' comparisons: 3 sizes * 2 (signed/unsigned) * 2
+   (register/zero) = 12.  */
+/* { dg-final { scan-assembler-times {\tvmvn\td[0-9]+, d[0-9]+\n} 12 } } */
+
+/* { 8 bits } x { eq, ne, lt, le, gt, ge }. */
+/* ne uses eq, lt/le only apply to comparison with zero, they use gt/ge
+   otherwise.  */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, d[0-9]+\n} 
4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, #0\n} 4 } } 
*/
+/* { dg-final { scan-assembler-times {\tvclt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+/* { dg-final { scan-assembler-times {\tvcle.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+
+/* { 16 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, d[0-9]+\n} 
4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, #0\n} 4 } 
} */
+/* { dg-final { scan-assembler-times {\tvclt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcle.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+
+/* { 32 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, d[0-9]+\n} 
4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, #0\n} 4 } 
} */
+/* { dg-final { scan-assembler-times {\tvclt.s32\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcle.s32\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcgt.s32\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { 

[PATCH v2 00/14] ARM/MVE use vectors of boolean for predicates

2021-10-13 Thread Christophe Lyon via Gcc-patches
This is v2 of this patch series, addressing the comments I received.
The changes v1 -> v2 are:

- Patch 3: added an executable test, and updated
  check_effective_target_arm_mve_hw
- Patch 4: split into patch 4 and patch 14 (to keep numbering the same
  for the other patches)
- Patch 5: updated arm_class_likely_spilled_p as suggested.
- Patch 7: updated test_vector_ops_duplicate in simplify-rtx.c as
  suggested.
- Patch 8: added V2DI -> HI/hi mapping in MVE_VPRED/MVE_vpred
  iterators, removed now useless mve_vpselq_v2di, and fixed
  mov expander.
- Patch 9: arm_mode_to_pred_mode now returns opt_machine_mode, removed
  useless floating-point checks in vec_cmpu.
- Patch 12: replaced hi with v8bi in v2di load/store instructions

I'll squash patch 2 with patch 9 and patch 3 with patch 8.

Original text:

This patch series addresses PR 100757 and 101325 by representing
vectors of predicates (MVE VPR.P0 register) as vectors of booleans
rather than using HImode.

As this implies a lot of mostly mechanical changes, I have tried to
split the patches in a way that should help reviewers, but the split
is a bit artificial.

Patches 1-3 add new tests.

Patches 4-6 are small independent improvements.

Patch 7 implements the predicate qualifier, but does not change any
builtin yet.

Patch 8 is the first of the two main patches, and uses the new
qualifier to describe the vcmp and vpsel builtins that are useful for
auto-vectorization of comparisons.

Patch 9 is the second main patch, which fixes the vcond_mask expander.

Patches 10-13 convert almost all the remaining builtins with HI
operands to use the predicate qualifier.  After these, there are still
a few builtins with HI operands left, about which I am not sure: vctp,
vpnot, load-gather and store-scatter with v2di operands.  In fact,
patches 11/12 update some STR/LDR qualifiers in a way that breaks
these v2di builtins although existing tests still pass.

Christophe Lyon (14):
  arm: Add new tests for comparison vectorization with Neon and MVE
  arm: Add tests for PR target/100757
  arm: Add tests for PR target/101325
  arm: Add GENERAL_AND_VPR_REGS regclass
  arm: Add support for VPR_REG in arm_class_likely_spilled_p
  arm: Fix mve_vmvnq_n_ argument mode
  arm: Implement MVE predicates as vectors of booleans
  arm: Implement auto-vectorized MVE comparisons with vectors of boolean
predicates
  arm: Fix vcond_mask expander for MVE (PR target/100757)
  arm: Convert remaining MVE vcmp builtins to predicate qualifiers
  arm: Convert more MVE builtins to predicate qualifiers
  arm: Convert more load/store MVE builtins to predicate qualifiers
  arm: Convert more MVE/CDE builtins to predicate qualifiers
  arm: Add VPR_REG to ALL_REGS

 gcc/config/arm/arm-builtins.c | 228 +++--
 gcc/config/arm/arm-modes.def  |   5 +
 gcc/config/arm/arm-protos.h   |   3 +-
 gcc/config/arm/arm-simd-builtin-types.def |   4 +
 gcc/config/arm/arm.c  | 130 ++-
 gcc/config/arm/arm.h  |   5 +-
 gcc/config/arm/arm_mve_builtins.def   | 746 
 gcc/config/arm/iterators.md   |   5 +
 gcc/config/arm/mve.md | 832 ++
 gcc/config/arm/neon.md|  39 +
 gcc/config/arm/vec-common.md  |  52 --
 gcc/simplify-rtx.c|  26 +-
 .../arm/acle/cde-mve-full-assembly.c  | 264 +++---
 .../gcc.target/arm/simd/mve-vcmp-f32-2.c  |  32 +
 .../gcc.target/arm/simd/neon-compare-1.c  |  78 ++
 .../gcc.target/arm/simd/neon-compare-2.c  |  13 +
 .../gcc.target/arm/simd/neon-compare-3.c  |  14 +
 .../arm/simd/neon-compare-scalar-1.c  |  57 ++
 .../gcc.target/arm/simd/neon-vcmp-f16.c   |  12 +
 .../gcc.target/arm/simd/neon-vcmp-f32-2.c |  15 +
 .../gcc.target/arm/simd/neon-vcmp-f32-3.c |  12 +
 .../gcc.target/arm/simd/neon-vcmp-f32.c   |  12 +
 gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c |  22 +
 .../gcc.target/arm/simd/pr100757-2.c  |  20 +
 .../gcc.target/arm/simd/pr100757-3.c  |  20 +
 .../gcc.target/arm/simd/pr100757-4.c  |  19 +
 gcc/testsuite/gcc.target/arm/simd/pr100757.c  |  19 +
 .../gcc.target/arm/simd/pr101325-2.c  |  19 +
 gcc/testsuite/gcc.target/arm/simd/pr101325.c  |  14 +
 gcc/testsuite/lib/target-supports.exp |   3 +-
 30 files changed, 1611 insertions(+), 1109 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-scalar-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f16.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-2.c
 create mode 

Re: [PATCH] Fix handling of flag_rename_registers.

2021-10-13 Thread Martin Liška

On 10/13/21 10:39, Richard Biener wrote:

On Tue, Oct 12, 2021 at 5:11 PM Martin Liška  wrote:


On 10/12/21 15:37, Richard Biener wrote:

by adding EnabledBy(funroll-loops) to the respective options instead
(and funroll-loops EnabledBy(funroll-all-loops))


All right, so the suggested approach works correctly.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


  funroll-all-loops
-Common Var(flag_unroll_all_loops) Optimization
+Common Var(flag_unroll_all_loops) Optimization EnabledBy(funroll-all-loops)

that should be on funroll-loops?


Yeah, what a stupid error.



Can you verify that the two-step -funroll-all-loops -> -funroll-loops
-> -frename-registers


Yes, verified that in debugger, it's not dependent on an ordering.


works and that it is not somehow dependent on ordering?  Otherwise we have to
use EnabledBy(funroll-loops,funroll-all-loops) on frename-registers.
I guess the
EnabledBy doesn't work if the target decides to set flag_unroll_loop in one of
its hooks rather than via its option table override?  (as said, this
is all a mess...)


It's a complete mess. The only override happens in
rs6000_override_options_after_change. I think it can also utilize EnabledBy, but
I would like to do it in a different patch.



But grep should be your friend telling whether any target overrides
any of the flags...

I do hope we can eventually reduce the number of pre-/post-/lang/target/common
processing phases for options :/  Meh.


Huh.

May I install this fixed patch once it's tested?

Martin



Richard.


Thanks,
Martin




[committed] libstdc++: Ensure language linkage of std::__terminate()

2021-10-13 Thread Jonathan Wakely via Gcc-patches

On 11/10/21 20:38 +0100, Jonathan Wakely wrote:

On 08/10/21 12:23 +0100, Jonathan Wakely wrote:

This adds an inline wrapper for std::terminate that doesn't add the
declaration of std::terminate to namespace std. This allows the
library to terminate without including all of .

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h: Remove unused header.
* include/bits/c++config (std:__terminate): Define.
* include/bits/semaphore_base.h: Remove  and use
__terminate instead of terminate.
* include/bits/std_thread.h: Likewise.
* libsupc++/eh_terminate.cc (std::terminate): Use qualified-id
to call __cxxabiv1::__terminate.


This avoids including a few thousand lines of  just for one
function declaration. Any objections or better ideas?


I've pushed this to trunk now.


And now I've pushed this fix for it too.

Tested x86_64-linux.


commit c1b6c360fcf3fc1c0045c7358d61a83c91b6fa25
Author: Jonathan Wakely 
Date:   Wed Oct 13 10:35:44 2021

libstdc++: Ensure language linkage of std::__terminate()

This is needed because people still find it necessary to do:

  extern "C" {
  #include 
  }

libstdc++-v3/ChangeLog:

* include/bits/c++config (__terminate): Add extern "C++".

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index b76ffeb2562..a6495809671 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -296,7 +296,7 @@ namespace std
 
   // This allows the library to terminate without including all of 
   // and without making the declaration of std::terminate visible to users.
-  __attribute__ ((__noreturn__, __always_inline__))
+  extern "C++" __attribute__ ((__noreturn__, __always_inline__))
   inline void __terminate() _GLIBCXX_USE_NOEXCEPT
   {
 void terminate() _GLIBCXX_USE_NOEXCEPT __attribute__ ((__noreturn__));


[PATCH] ipa/102714 - IPA SRA eliding volatile

2021-10-13 Thread Richard Biener via Gcc-patches
The following fixes the volatileness check of IPA SRA which was
looking at the innermost reference when checking TREE_THIS_VOLATILE
but the reference to check is the outermost one.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-13  Richard Biener  

PR ipa/102714
* ipa-sra.c (ptr_parm_has_nonarg_uses): Fix volatileness
check.

* gcc.dg/ipa/pr102714.c: New testcase.
---
 gcc/ipa-sra.c   |  40 +-
 gcc/testsuite/gcc.dg/ipa/pr102714.c | 117 
 2 files changed, 139 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr102714.c

diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
index 965e246d788..88036590425 100644
--- a/gcc/ipa-sra.c
+++ b/gcc/ipa-sra.c
@@ -1005,15 +1005,17 @@ ptr_parm_has_nonarg_uses (cgraph_node *node, function 
*fun, tree parm,
   if (gimple_assign_single_p (stmt))
{
  tree rhs = gimple_assign_rhs1 (stmt);
- while (handled_component_p (rhs))
-   rhs = TREE_OPERAND (rhs, 0);
- if (TREE_CODE (rhs) == MEM_REF
- && TREE_OPERAND (rhs, 0) == name
- && integer_zerop (TREE_OPERAND (rhs, 1))
- && types_compatible_p (TREE_TYPE (rhs),
-TREE_TYPE (TREE_TYPE (name)))
- && !TREE_THIS_VOLATILE (rhs))
-   uses_ok++;
+ if (!TREE_THIS_VOLATILE (rhs))
+   {
+ while (handled_component_p (rhs))
+   rhs = TREE_OPERAND (rhs, 0);
+ if (TREE_CODE (rhs) == MEM_REF
+ && TREE_OPERAND (rhs, 0) == name
+ && integer_zerop (TREE_OPERAND (rhs, 1))
+ && types_compatible_p (TREE_TYPE (rhs),
+TREE_TYPE (TREE_TYPE (name
+   uses_ok++;
+   }
}
   else if (is_gimple_call (stmt))
{
@@ -1047,15 +1049,17 @@ ptr_parm_has_nonarg_uses (cgraph_node *node, function 
*fun, tree parm,
  continue;
}
 
- while (handled_component_p (arg))
-   arg = TREE_OPERAND (arg, 0);
- if (TREE_CODE (arg) == MEM_REF
- && TREE_OPERAND (arg, 0) == name
- && integer_zerop (TREE_OPERAND (arg, 1))
- && types_compatible_p (TREE_TYPE (arg),
-TREE_TYPE (TREE_TYPE (name)))
- && !TREE_THIS_VOLATILE (arg))
-   uses_ok++;
+ if (!TREE_THIS_VOLATILE (arg))
+   {
+ while (handled_component_p (arg))
+   arg = TREE_OPERAND (arg, 0);
+ if (TREE_CODE (arg) == MEM_REF
+ && TREE_OPERAND (arg, 0) == name
+ && integer_zerop (TREE_OPERAND (arg, 1))
+ && types_compatible_p (TREE_TYPE (arg),
+TREE_TYPE (TREE_TYPE (name
+   uses_ok++;
+   }
}
}
 
diff --git a/gcc/testsuite/gcc.dg/ipa/pr102714.c 
b/gcc/testsuite/gcc.dg/ipa/pr102714.c
new file mode 100644
index 000..65dd86f5c15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr102714.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-strict-aliasing -fdump-ipa-sra-details 
-fdump-tree-optimized" } */
+
+typedef _Bool bool;
+
+enum {
+ false = 0,
+ true = 1
+};
+
+struct xarray {
+ unsigned int xa_lock;
+ unsigned int xa_flags;
+ void * xa_head;
+
+};
+
+struct list_head {
+ struct list_head *next, *prev;
+};
+
+struct callback_head {
+ struct callback_head *next;
+ void (*func)(struct callback_head *head);
+} __attribute__((aligned(sizeof(void *;
+
+struct xa_node {
+ unsigned char shift;
+ unsigned char offset;
+ unsigned char count;
+ unsigned char nr_values;
+ struct xa_node *parent;
+ struct xarray *array;
+ union {
+  struct list_head private_list;
+  struct callback_head callback_head;
+ };
+ void *slots[(1UL << (0 ? 4 : 6))];
+ union {
+  unsigned long tags[3][((((1UL << (0 ? 4 : 6))) + (64) - 1) / (64))];
+  unsigned long marks[3][((((1UL << (0 ? 4 : 6))) + (64) - 1) / (64))];
+ };
+};
+
+static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) 
__attribute__((no_instrument_function)) unsigned long shift_maxindex(unsigned 
int shift)
+{
+ return ((1UL << (0 ? 4 : 6)) << shift) - 1;
+}
+
+static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) 
__attribute__((no_instrument_function)) unsigned long node_maxindex(const 
struct xa_node *node)
+{
+ return shift_maxindex(node->shift);
+}
+
+static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) 
__attribute__((no_instrument_function)) struct xa_node *entry_to_node(void *ptr)
+{
+ return (void *)((unsigned long)ptr & ~2UL);
+}
+
+static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) 
__attribute__((no_instrument_function)) bool 

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-10-13 Thread HAO CHEN GUI via Gcc-patches



On 13/10/2021 4:29 PM, Richard Biener wrote:

On Wed, Oct 13, 2021 at 9:43 AM HAO CHEN GUI  wrote:

Richard,

Thanks so much for your comments.

As far as I know, VSX/altivec min/max instructions don't conform with the
C-style Min/Max Macro. The fold converts it to MIN/MAX_EXPR, so it has a
chance to be implemented by scalar min/max instructions, which do conform
with the C-style Min/Max Macro. That's why I made this patch.

As to IEEE behavior, do you mean the "Minimum and maximum operations" defined
in IEEE 754-2019?  If so, I think VSX/altivec min/max instructions don't
conform with it: the standard demands a quiet NaN if either operand is a NaN,
while our instructions don't behave that way.

IEEE 754-2019: maximum(x, y) is x if x > y, y if y > x, and a quiet NaN if
either operand is a NaN, according to 6.2. For this operation, +0 compares
greater than −0. Otherwise (i.e., when x = y and the signs are the same) it is
either x or y. Actions for
xvmaxdp

Hmm, then I do not understand the reason for the patch - people using
the intrinsics cannot expect IEEE semantics then.
So you are concerned that people don't get the 1:1 machine instruction
but eventually the IEEE conforming MIN/MAX_EXPR?
But that can then still happen with -ffast-math so I wonder what's the point.

Richard.


The reason for the patch is to keep compatibility between different Power 
servers.  The old servers don't have C-style Min/Max instructions and all are 
implemented by VSX/altivec min/max instructions. So I just want to keep the 
compatibility. For fast-math, the C-style Min/Max should be acceptable, I think.

The IEEE standard changed. VSX/altivec min/max instructions conform with IEEE 
754-2008 (the old standard), but not with IEEE 754-2019 (the last one).

As of 2019, the formerly required minNum, maxNum, minNumMag, and maxNumMag of
IEEE 754-2008 are now deleted due to their non-associativity. Instead, two sets
of new minimum and maximum operations are recommended.

754-2008

maxNum(x, y) is the canonicalized number y if x < y, x if y < x, the
canonicalized number if one operand is a number and the other a quiet NaN.
Otherwise it is either x or y, canonicalized (this means results might differ
among implementations). When either x or y is a signaling NaN, then the
result is according to 6.2.

Thanks again.

Gui Haochen




On 12/10/2021 5:57 PM, Richard Biener wrote:

On Tue, Oct 12, 2021 at 10:59 AM HAO CHEN GUI via Gcc-patches
 wrote:

Hi,

  This patch disables gimple folding for float or double vec_min/max when 
fast-math is not set. It makes vec_min/max conform with the guide.

Bootstrapped and tested on powerpc64le-linux with no regressions. Is this okay 
for trunk? Any recommendations? Thanks a lot.

  I re-send the patch as previous one is messed up in email thread. Sorry 
for that.

If the VSX/altivec min/max instructions conform to IEEE behavior then
you could instead fold
to .F{MIN,MAX} internal functions and define the f{min,max} optabs.

Otherwise the patch looks correct to me - MIN_EXPR and MAX_EXPR are
not IEEE conforming.
Note a better check would be to use HONOR_NANS/HONOR_SIGNED_ZEROS on
the argument type
(that also works for the integer types with the obvious answer).

Richard.


ChangeLog

2021-08-25 Haochen Gui 

gcc/
   * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
   Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
   VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.

gcc/testsuite/
   * gcc.target/powerpc/vec-minmax-1.c: New test.
   * gcc.target/powerpc/vec-minmax-2.c: Likewise.


patch.diff

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b4e13af4dc6..90527734ceb 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -12159,6 +12159,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
  return true;
/* flavors of vec_min.  */
case VSX_BUILTIN_XVMINDP:
+case ALTIVEC_BUILTIN_VMINFP:
+  if (!flag_finite_math_only || flag_signed_zeros)
+   return false;
+  /* Fall through to MIN_EXPR.  */
+  gcc_fallthrough ();
case P8V_BUILTIN_VMINSD:
case P8V_BUILTIN_VMINUD:
case ALTIVEC_BUILTIN_VMINSB:
@@ -12167,7 +12172,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
case ALTIVEC_BUILTIN_VMINUB:
case ALTIVEC_BUILTIN_VMINUH:
case ALTIVEC_BUILTIN_VMINUW:
-case ALTIVEC_BUILTIN_VMINFP:
  arg0 = gimple_call_arg (stmt, 0);
  arg1 = gimple_call_arg (stmt, 1);
  lhs = gimple_call_lhs (stmt);
@@ -12177,6 +12181,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
  return true;
/* flavors of vec_max.  */
case VSX_BUILTIN_XVMAXDP:
+case ALTIVEC_BUILTIN_VMAXFP:
+  if (!flag_finite_math_only || flag_signed_zeros)
+   return false;
+  /* Fall through to MAX_EXPR.  */
+  gcc_fallthrough ();
case P8V_BUILTIN_VMAXSD:
case 

Re: [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.

2021-10-13 Thread Uros Bizjak via Gcc-patches
On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle  wrote:
>
>
> Good catch.  I agree with Hongtao that although my testing revealed
> no problems with the previous version of this patch, it makes sense to
> call gen_reg_rtx to generate an pseudo intermediate instead of attempting
> to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an
> intermediate.  I've left the existing behaviour the same, so that
> memory-to-memory moves (continue to) use ix86_gen_scatch_sse_rtx.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-10-13  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
> pseudo intermediate when moving a SUBREG into a hard register,
> by checking ix86_hardreg_mov_ok.
> (ix86_expand_vector_extract): Store zero-extended SImode
> intermediate in a pseudo, then set target using a SUBREG_PROMOTED
> annotated subreg.
> * config/i386/sse.md (mov_internal): Prevent CSE creating
> complex (SUBREG) sets of (vector) hard registers before reload, by
> checking ix86_hardreg_mov_ok.

OK.

Thanks,
Uros.

>
> Thanks,
> Roger
>
> -Original Message-
> From: Hongtao Liu 
> Sent: 11 October 2021 12:29
> To: Roger Sayle 
> Cc: GCC Patches 
> Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 
> backend.
>
> On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle  
> wrote:
> > gcc/ChangeLog
> > * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
> > pseudo intermediate when moving a SUBREG into a hard register,
> > by checking ix86_hardreg_mov_ok.
>
>/* Make operand1 a register if it isn't already.  */
>if (can_create_pseudo_p ()
> -  && !register_operand (op0, mode)
> -  && !register_operand (op1, mode))
> +  && (!ix86_hardreg_mov_ok (op0, op1)
> + || (!register_operand (op0, mode)
> + && !register_operand (op1, mode))))
>  {
>rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));
>
> ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want 
> a pseudo register.
>
> --
> BR,
> Hongtao
>


[PATCH] AVX512FP16: Adjust builtin for mask complex fma

2021-10-13 Thread Hongyu Wang via Gcc-patches
Hi,

The current mask/mask3 implementation for complex fma duplicates a
parameter in the macro expansion, which may cause an error at -O0.
Refactor the macro implementations into builtins to avoid the
potential error.

For round intrinsics with NO_ROUND as input, ix86_erase_embedded_rounding
erases the embedded_rounding unspec but could break other emit_insn calls
in expanders. Skip this function for expanders that emit multiple insns,
and check rounding in the expander with subst.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,}.
OK for master?

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm512_mask_fcmadd_pch):
Adjust builtin call.
(_mm512_mask3_fcmadd_pch): Likewise.
(_mm512_mask_fmadd_pch): Likewise.
(_mm512_mask3_fmadd_pch): Likewise.
(_mm512_mask_fcmadd_round_pch): Likewise.
(_mm512_mask3_fcmadd_round_pch): Likewise.
(_mm512_mask_fmadd_round_pch): Likewise.
(_mm512_mask3_fmadd_round_pch): Likewise.
(_mm_mask_fcmadd_sch): Likewise.
(_mm_mask3_fcmadd_sch): Likewise.
(_mm_mask_fmadd_sch): Likewise.
(_mm_mask3_fmadd_sch): Likewise.
(_mm_mask_fcmadd_round_sch): Likewise.
(_mm_mask3_fcmadd_round_sch): Likewise.
(_mm_mask_fmadd_round_sch): Likewise.
(_mm_mask3_fmadd_round_sch): Likewise.
(_mm_fcmadd_round_sch): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_mask_fmadd_pch):
Adjust builtin call.
(_mm_mask3_fmadd_pch): Likewise.
(_mm256_mask_fmadd_pch): Likewise.
(_mm256_mask3_fmadd_pch): Likewise.
(_mm_mask_fcmadd_pch): Likewise.
(_mm_mask3_fcmadd_pch): Likewise.
(_mm256_mask_fcmadd_pch): Likewise.
(_mm256_mask3_fcmadd_pch): Likewise.
* config/i386/i386-builtin.def: Add mask3 builtin for complex
fma, and adjust mask_builtin to corresponding expander.
* config/i386/i386-expand.c (ix86_expand_round_builtin):
Skip erasing embedded rounding for expanders that emit
multiple insns.
* config/i386/sse.md (complexmove): New mode_attr.
(_fmaddc__mask1): New expander.
(_fcmaddc__mask1): Likewise.
(avx512fp16_fmaddcsh_v8hf_mask1): Likewise.
(avx512fp16_fcmaddcsh_v8hf_mask1): Likewise.
(avx512fp16_fcmaddcsh_v8hf_mask3): Likewise.
(avx512fp16_fmaddcsh_v8hf_mask3): Likewise.
* config/i386/subst.md (round_embedded_complex): New subst.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add new mask3 builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c: Add scanning for
mask/mask3 intrinsic.
* gcc.target/i386/avx512fp16-vfmaddcsh-1a.c: Ditto.
* gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c: New test for
-mavx512vl.
* gcc.target/i386/avx512fp16-vfmaddcsh-1c.c: Ditto.
---
 gcc/config/i386/avx512fp16intrin.h| 261 ++
 gcc/config/i386/avx512fp16vlintrin.h  |  56 ++--
 gcc/config/i386/i386-builtin.def  |  24 +-
 gcc/config/i386/i386-expand.c |  22 +-
 gcc/config/i386/sse.md| 183 
 gcc/config/i386/subst.md  |   3 +
 gcc/testsuite/gcc.target/i386/avx-1.c |   4 +
 .../i386/avx512fp16-vfcmaddcsh-1a.c   |   2 +
 .../i386/avx512fp16-vfcmaddcsh-1c.c   |  13 +
 .../gcc.target/i386/avx512fp16-vfmaddcsh-1a.c |   2 +
 .../gcc.target/i386/avx512fp16-vfmaddcsh-1c.c |  13 +
 gcc/testsuite/gcc.target/i386/sse-13.c|   4 +
 gcc/testsuite/gcc.target/i386/sse-23.c|   4 +
 13 files changed, 375 insertions(+), 216 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1c.c

diff --git a/gcc/config/i386/avx512fp16intrin.h 
b/gcc/config/i386/avx512fp16intrin.h
index 29cf6792335..5e49447a020 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -6258,13 +6258,11 @@ extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_fcmadd_pch (__m512h __A, __mmask16 __B, __m512h __C, __m512h __D)
 {
-  return (__m512h) __builtin_ia32_movaps512_mask
-((__v16sf)
- __builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A,
- (__v32hf) __C,
- (__v32hf) __D, __B,
- _MM_FROUND_CUR_DIRECTION),
- (__v16sf) __A, __B);
+  return (__m512h)
+__builtin_ia32_vfcmaddcph512_mask_round ((__v32hf) __A,
+(__v32hf) __C,
+(__v32hf) __D, __B,
+_MM_FROUND_CUR_DIRECTION);
 }
 
 extern __inline __m512h
@@ -6272,10 +6270,10 @@ __attribute__ 

Re: [PATCH] options: Fix variable tracking option processing.

2021-10-13 Thread Richard Biener via Gcc-patches
On Tue, Oct 12, 2021 at 5:21 PM Martin Liška  wrote:
>
> On 10/11/21 15:45, Richard Biener wrote:
> > Btw, I'd be more comfortable when the move of the code would be
> > independent of the adjustment to not rely on AUTODETECT_VALUE.
> > Can we do the latter change first (IIRC the former one failed already)?
>
> All right, so I'm doing the first step by eliminating AUTODETECT_VALUE.
> Note we can't easily use EnabledBy; the option logic is more complicated
> (like optimize >= 1).
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

Let's split this ;)  The debug_inline_points part is OK.

How can debug_variable_location_views be ever -1?  But the
debug_variable_location_views part looks OK as well.

More or less all parts that have the variable assigned in a single
place in gcc/ are OK (dwarf2out_as_locview_support).  But the
main flag_var_tracking* cases need more thorough view,
maybe we can convert them to single-set code first?

Richard.

> Thanks,
> Martin


[Patch] Fortran: dump-parse-tree.c fixes for OpenMP

2021-10-13 Thread Tobias Burnus

I recently saw that 'ancestor:' wasn't handled in -fdump-parse-tree.
I also once ran into an ICE with the swap flag in atomics when
dumping. This patch fixes both.

Additionally, I changed 'ancestor' to a bit field.

Comments? If not, I will commit it later today.

Tobias
-
Siemens Electronic Design Automation GmbH; address: Arnulfstraße 201, 80634
Munich; limited liability company; managing directors: Thomas Heurung, Frank
Thürauf; registered office: Munich; commercial register: Munich, HRB 106955
Fortran: dump-parse-tree.c fixes for OpenMP

gcc/fortran/ChangeLog:

	* dump-parse-tree.c (show_omp_clauses): Handle ancestor modifier,
	avoid ICE for GFC_OMP_ATOMIC_SWAP.
	* gfortran.h (gfc_omp_clauses): Change 'ancestor' into a bitfield.

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 64e04c043f6..14a307856fc 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1750,6 +1750,8 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
   if (omp_clauses->device)
 {
   fputs (" DEVICE(", dumpfile);
+  if (omp_clauses->ancestor)
+	fputs ("ANCESTOR:", dumpfile);
   show_expr (omp_clauses->device);
   fputc (')', dumpfile);
 }
@@ -1894,7 +1896,7 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
   if (omp_clauses->atomic_op != GFC_OMP_ATOMIC_UNSET)
 {
   const char *atomic_op;
-  switch (omp_clauses->atomic_op)
+  switch (omp_clauses->atomic_op & GFC_OMP_ATOMIC_MASK)
 	{
 	case GFC_OMP_ATOMIC_READ: atomic_op = "READ"; break;
 	case GFC_OMP_ATOMIC_WRITE: atomic_op = "WRITE"; break;
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index c25d1cca3a8..b2b0254a3c3 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1482,12 +1482,11 @@ typedef struct gfc_omp_clauses
   struct gfc_expr *dist_chunk_size;
   struct gfc_expr *message;
   const char *critical_name;
-  bool ancestor;
   enum gfc_omp_default_sharing default_sharing;
   enum gfc_omp_atomic_op atomic_op;
   enum gfc_omp_defaultmap defaultmap[OMP_DEFAULTMAP_CAT_NUM];
   int collapse, orderedc;
-  unsigned nowait:1, ordered:1, untied:1, mergeable:1;
+  unsigned nowait:1, ordered:1, untied:1, mergeable:1, ancestor:1;
   unsigned inbranch:1, notinbranch:1, nogroup:1;
   unsigned sched_simd:1, sched_monotonic:1, sched_nonmonotonic:1;
   unsigned simd:1, threads:1, depend_source:1, destroy:1, order_concurrent:1;


Re: [PATCH] Fix handling of flag_rename_registers.

2021-10-13 Thread Richard Biener via Gcc-patches
On Tue, Oct 12, 2021 at 5:11 PM Martin Liška  wrote:
>
> On 10/12/21 15:37, Richard Biener wrote:
> > by adding EnabledBy(funroll-loops) to the respective options instead
> > (and funroll-loops EnabledBy(funroll-all-loops))
>
> All right, so the suggested approach works correctly.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

 funroll-all-loops
-Common Var(flag_unroll_all_loops) Optimization
+Common Var(flag_unroll_all_loops) Optimization EnabledBy(funroll-all-loops)

that should be on funroll-loops?

Can you verify that the two-step -funroll-all-loops -> -funroll-loops
-> -frename-registers
works and that it is not somehow dependent on ordering?  Otherwise we have to
use EnabledBy(funroll-loops,funroll-all-loops) on frename-registers.
I guess the
EnabledBy doesn't work if the target decides to set flag_unroll_loops in one of
its hooks rather than via its option table override?  (as said, this
is all a mess...)

But grep should be your friend telling whether any target overrides
any of the flags...

I do hope we can eventually reduce the number of pre-/post-/lang/target/common
processing phases for options :/  Meh.

Richard.

> Thanks,
> Martin


Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-10-13 Thread Richard Biener via Gcc-patches
On Wed, Oct 13, 2021 at 9:43 AM HAO CHEN GUI  wrote:
>
> Richard,
>
>Thanks so much for your comments.
>
>As far as I know, VSX/altivec min/max instructions don't conform with the
> C-style min/max macro. The fold converts it to MIN/MAX_EXPR, which then has
> a chance to be implemented by scalar min/max instructions that do conform
> with the C-style min/max macro. That's why I made this patch.
>
>As to IEEE behavior, do you mean "Minimum and maximum operations" defined
> in IEEE-754 2019?  If so, I think VSX/altivec min/max instructions don't
> conform with it. It demands a quiet NaN if either operand is a NaN while our
> instructions don't.
>
> IEEE-754 2019 maximum(x, y) is x if x > y, y if y > x, and a quiet NaN if
> either operand is a NaN, according to 6.2. For this operation, +0 compares
> greater than -0. Otherwise (i.e., when x = y and the signs are the same) it
> is either x or y. Actions for xvmaxdp

Hmm, then I do not understand the reason for the patch - people using
the intrinsics cannot expect IEEE semantics then.
So you are concerned that people don't get the 1:1 machine instruction
but eventually the IEEE conforming MIN/MAX_EXPR?
But that can then still happen with -ffast-math so I wonder what's the point.

Richard.

> On 12/10/2021 下午 5:57, Richard Biener wrote:
> > On Tue, Oct 12, 2021 at 10:59 AM HAO CHEN GUI via Gcc-patches
> >  wrote:
> >> Hi,
> >>
> >>  This patch disables gimple folding for float or double vec_min/max 
> >> when fast-math is not set. It makes vec_min/max conform with the guide.
> >>
> >> Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
> >> okay for trunk? Any recommendations? Thanks a lot.
> >>
> >>  I re-send the patch as previous one is messed up in email thread. 
> >> Sorry for that.
> > If the VSX/altivec min/max instructions conform to IEEE behavior then
> > you could instead fold
> > to .F{MIN,MAX} internal functions and define the f{min,max} optabs.
> >
> > Otherwise the patch looks correct to me - MIN_EXPR and MAX_EXPR are
> > not IEEE conforming.
> > Note a better check would be to use HONOR_NANS/HONOR_SIGNED_ZEROS on
> > the argument type
> > (that also works for the integer types with the obvious answer).
> >
> > Richard.
> >
> >> ChangeLog
> >>
> >> 2021-08-25 Haochen Gui 
> >>
> >> gcc/
> >>   * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
> >>   Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
> >>   VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.
> >>
> >> gcc/testsuite/
> >>   * gcc.target/powerpc/vec-minmax-1.c: New test.
> >>   * gcc.target/powerpc/vec-minmax-2.c: Likewise.
> >>
> >>
> >> patch.diff
> >>
> >> diff --git a/gcc/config/rs6000/rs6000-call.c 
> >> b/gcc/config/rs6000/rs6000-call.c
> >> index b4e13af4dc6..90527734ceb 100644
> >> --- a/gcc/config/rs6000/rs6000-call.c
> >> +++ b/gcc/config/rs6000/rs6000-call.c
> >> @@ -12159,6 +12159,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> >> *gsi)
> >>  return true;
> >>/* flavors of vec_min.  */
> >>case VSX_BUILTIN_XVMINDP:
> >> +case ALTIVEC_BUILTIN_VMINFP:
> >> +  if (!flag_finite_math_only || flag_signed_zeros)
> >> +   return false;
> >> +  /* Fall through to MIN_EXPR.  */
> >> +  gcc_fallthrough ();
> >>case P8V_BUILTIN_VMINSD:
> >>case P8V_BUILTIN_VMINUD:
> >>case ALTIVEC_BUILTIN_VMINSB:
> >> @@ -12167,7 +12172,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> >> *gsi)
> >>case ALTIVEC_BUILTIN_VMINUB:
> >>case ALTIVEC_BUILTIN_VMINUH:
> >>case ALTIVEC_BUILTIN_VMINUW:
> >> -case ALTIVEC_BUILTIN_VMINFP:
> >>  arg0 = gimple_call_arg (stmt, 0);
> >>  arg1 = gimple_call_arg (stmt, 1);
> >>  lhs = gimple_call_lhs (stmt);
> >> @@ -12177,6 +12181,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> >> *gsi)
> >>  return true;
> >>/* flavors of vec_max.  */
> >>case VSX_BUILTIN_XVMAXDP:
> >> +case ALTIVEC_BUILTIN_VMAXFP:
> >> +  if (!flag_finite_math_only || flag_signed_zeros)
> >> +   return false;
> >> +  /* Fall through to MAX_EXPR.  */
> >> +  gcc_fallthrough ();
> >>case P8V_BUILTIN_VMAXSD:
> >>case P8V_BUILTIN_VMAXUD:
> >>case ALTIVEC_BUILTIN_VMAXSB:
> >> @@ -12185,7 +12194,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> >> *gsi)
> >>case ALTIVEC_BUILTIN_VMAXUB:
> >>case ALTIVEC_BUILTIN_VMAXUH:
> >>case ALTIVEC_BUILTIN_VMAXUW:
> >> -case ALTIVEC_BUILTIN_VMAXFP:
> >>  arg0 = gimple_call_arg (stmt, 0);
> >>  arg1 = gimple_call_arg (stmt, 1);
> >>  lhs = gimple_call_lhs (stmt);
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c 
> >> b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
> >> new file mode 100644
> >> index 000..547798fd65c
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c

Re: [PATCH] Warray-bounds: Warn only for generic address spaces

2021-10-13 Thread Siddhesh Poyarekar

On 10/13/21 13:50, Richard Biener wrote:

On Tue, Oct 12, 2021 at 8:34 PM Siddhesh Poyarekar  wrote:


The warning is falsely triggered for THREAD_SELF in glibc when
accessing TCB through the segment register.


I think this is a more generic bug - the warning is also bogus if the
general address space is involved.


I agree, Martin pointed me to a different fix that he posted earlier:

https://patchwork.sourceware.org/project/gcc/patch/806cc630-86ef-2e3f-f72b-68bab2cd3...@gmail.com/

Thanks,
Siddhesh


Re: [PATCH] check to see if null pointer is dereferenceable [PR102630]

2021-10-13 Thread Richard Biener via Gcc-patches
On Wed, Oct 13, 2021 at 3:32 AM Martin Sebor via Gcc-patches
 wrote:
>
> On 10/11/21 6:26 PM, Joseph Myers wrote:
> > The testcase uses the __seg_fs address space, which is x86-specific, but
> > it isn't in an x86-specific directory or otherwise restricted to x86
> > targets; thus, I'd expect it to fail for other architectures.
> >
> > This is not a review of the rest of the patch.
> >
>
> Good point!  I thought I might make the test target-independent
> (via macros) but it looks like just i386 defines the hook to
> something other than false so I should probably move it under
> i386.

The patch is OK with the testcase moved.

Note I don't think we should warn about *(int *)0xdeadbee0,

   /* Pointer constants other than null are most likely the result
-of erroneous null pointer addition/subtraction.  Set size to
-zero.  For null pointers, set size to the maximum for now
-since those may be the result of jump threading.  */

there's too much "may be" and "most likely" for my taste.  How can
the user mark a deliberate valid constant address?

Maybe we can use better (target dependent?) heuristic based on
what virtual addresses are likely unmapped (the zero page, the
page "before" the zero page)?

Richard.


> Thanks
> Martin


[PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.

2021-10-13 Thread Roger Sayle

Good catch.  I agree with Hongtao that although my testing revealed
no problems with the previous version of this patch, it makes sense to
call gen_reg_rtx to generate an pseudo intermediate instead of attempting
to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an
intermediate.  I've left the existing behaviour the same, so that
memory-to-memory moves (continue to) use ix86_gen_scatch_sse_rtx.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.

Ok for mainline?


2021-10-13  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
pseudo intermediate when moving a SUBREG into a hard register,
by checking ix86_hardreg_mov_ok.
(ix86_expand_vector_extract): Store zero-extended SImode
intermediate in a pseudo, then set target using a SUBREG_PROMOTED
annotated subreg.
* config/i386/sse.md (mov_internal): Prevent CSE creating
complex (SUBREG) sets of (vector) hard registers before reload, by
checking ix86_hardreg_mov_ok.

Thanks,
Roger

-Original Message-
From: Hongtao Liu  
Sent: 11 October 2021 12:29
To: Roger Sayle 
Cc: GCC Patches 
Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 
backend.

On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle  wrote:
> gcc/ChangeLog
> * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
> pseudo intermediate when moving a SUBREG into a hard register,
> by checking ix86_hardreg_mov_ok.

   /* Make operand1 a register if it isn't already.  */
   if (can_create_pseudo_p ()
-  && !register_operand (op0, mode)
-  && !register_operand (op1, mode))
+  && (!ix86_hardreg_mov_ok (op0, op1)
+ || (!register_operand (op0, mode)
+ && !register_operand (op1, mode))))
 {
   rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));

ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a 
pseudo register.

--
BR,
Hongtao

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 3e6f7d8e..4a8fa2f 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -615,6 +615,16 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
   return;
 }
 
+  /* If operand0 is a hard register, make operand1 a pseudo.  */
+  if (can_create_pseudo_p ()
+  && !ix86_hardreg_mov_ok (op0, op1))
+{
+  rtx tmp = gen_reg_rtx (GET_MODE (op0));
+  emit_move_insn (tmp, op1);
+  emit_move_insn (op0, tmp);
+  return;
+}
+
   /* Make operand1 a register if it isn't already.  */
   if (can_create_pseudo_p ()
   && !register_operand (op0, mode)
@@ -16005,11 +16015,15 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, 
rtx vec, int elt)
   /* Let the rtl optimizers know about the zero extension performed.  */
   if (inner_mode == QImode || inner_mode == HImode)
{
+ rtx reg = gen_reg_rtx (SImode);
  tmp = gen_rtx_ZERO_EXTEND (SImode, tmp);
- target = gen_lowpart (SImode, target);
+ emit_move_insn (reg, tmp);
+ tmp = gen_lowpart (inner_mode, reg);
+ SUBREG_PROMOTED_VAR_P (tmp) = 1;
+ SUBREG_PROMOTED_SET (tmp, 1);
}
 
-  emit_insn (gen_rtx_SET (target, tmp));
+  emit_move_insn (target, tmp);
 }
   else
 {
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4559b0c..e43f597 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1270,7 +1270,8 @@
 " C,,vm,v"))]
   "TARGET_SSE
&& (register_operand (operands[0], mode)
-   || register_operand (operands[1], mode))"
+   || register_operand (operands[1], mode))
+   && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
   switch (get_attr_type (insn))
 {


Re: [PATCH] Warray-bounds: Warn only for generic address spaces

2021-10-13 Thread Richard Biener via Gcc-patches
On Tue, Oct 12, 2021 at 8:34 PM Siddhesh Poyarekar  wrote:
>
> The warning is falsely triggered for THREAD_SELF in glibc when
> accessing TCB through the segment register.

I think this is a more generic bug - the warning is also bogus if the
general address space is involved.

Martin?

> gcc/ChangeLog:
>
> * gimple-array-bounds.cc
> (array_bounds_checker::check_mem_ref): Bail out for
> non-generic address spaces.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/addr-space-3.c: New test case.
>
> Signed-off-by: Siddhesh Poyarekar 
> ---
>  gcc/gimple-array-bounds.cc   | 3 +++
>  gcc/testsuite/gcc.target/i386/addr-space-3.c | 5 +
>  2 files changed, 8 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/addr-space-3.c
>
> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
> index 0517e5ddd8e..36fc1dbe3f8 100644
> --- a/gcc/gimple-array-bounds.cc
> +++ b/gcc/gimple-array-bounds.cc
> @@ -432,6 +432,9 @@ array_bounds_checker::check_mem_ref (location_t location, 
> tree ref,
>if (aref.offset_in_range (axssize))
>  return false;
>
> +  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (axstype)))
> +return false;
> +
>if (TREE_CODE (aref.ref) == SSA_NAME)
>  {
>gimple *def = SSA_NAME_DEF_STMT (aref.ref);
> diff --git a/gcc/testsuite/gcc.target/i386/addr-space-3.c 
> b/gcc/testsuite/gcc.target/i386/addr-space-3.c
> new file mode 100644
> index 000..4bd940e696a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/addr-space-3.c
> @@ -0,0 +1,5 @@
> +/* Verify that __seg_fs/gs marked variables do not trigger an array bounds
> +   warning.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Warray-bounds" } */
> +#include "addr-space-2.c"
> --
> 2.31.1
>


Re: [PATCH v2] detect out-of-bounds stores by atomic functions [PR102453]

2021-10-13 Thread Richard Biener via Gcc-patches
On Tue, Oct 12, 2021 at 9:44 PM Martin Sebor  wrote:
>
> On 10/12/21 12:52 AM, Richard Biener wrote:
> > On Mon, Oct 11, 2021 at 11:25 PM Martin Sebor  wrote:
> >>
> >> The attached change extends GCC's warnings for out-of-bounds
> >> stores to cover atomic (and __sync) built-ins.
> >>
> >> Rather than hardcoding the properties of these built-ins just
> >> for the sake of the out-of-bounds detection, on the assumption
> >> that it might be useful for future optimizations as well, I took
> >> the approach of extending class attr_fnspec to express their
> >> special property that they encode the size of the access in their
> >> name.
> >>
> >> I also took the liberty of making attr_fnspec assignable (something
> >> the rest of my patch relies on), and updating some comments for
> >> the characters the class uses to encode function properties, based
> >> on my understanding of their purpose.
> >>
> >> Tested on x86_64-linux.
> >
> > Hmm, so you place 'A' at an odd place (where the return value is specified),
> > but you do not actually specify the behavior on the return value.  Shouldn't
> >
> > + 'A' specifies that the function atomically accesses a constant
> > +   1 << N bytes where N is indicated by character 3+2i
> >
> > maybe read
> >
> >  'A' specifies that the function returns the memory pointed to
> > by argument
> >   one of size 1 << N bytes where N is indicated by
> > character 3 +2i accessed atomically
> >
> > ?
>
> I didn't think the return value would be interesting because in
> general (parallel accesses) it's not related (in an observable
> way) to the value of the dereferenced operand.  Not all
> the built-ins also return a value (e.g., atomic_store), and
> whether or not one does return the argument would need to be
> encoded somehow because it cannot be determined from the return
> type (__atomic_compare_exchange and __atomic_test_and_set return
> bool that's not necessarily the value of the operand).  Also,
> since the functions return the operand value either before or
> after the update, we'd need another letter to describe that.
> (This alone could be dealt with simply by using 'A' and 'a',
> but that's not enough for the other cases.)
>
> So with all these possibilities I don't think encoding
> the return value at this point is worthwhile.  If/when this
> enhancement turns out to be used for optimization and we think
> encoding the return value would be helpful, I'd say let's
> revisit it then.  The accessor APIs should make it a fairly
> straightforward exercise.

I thought it would be useful for points-to analysis since knowing how
the return value is composed improves the points-to result for it.

Note that IPA mod-ref now synthesizes fn-spec and might make use
of 'A' if it were not narrowly defined.  Sure it's probably difficult to
fully specify the RMW cycle that's eventually done but since we
have a way to specify a non-constant size of accesses as passed
by a parameter it would be nice to allow specifying a constant size
anyhow.  It just occured to me we could use "fake" parameters to
encode those, so for

void foo (int *);

use like ". R2c4" saying that parameter 1 is read with the size
specified by (non-existing) parameter 2 which is specified as
'c'onstant 1 << 4.

Alternatively a constant size specification could use alternate
encoding 'a' to 'f'.  That said, if 'A' is not suppose to specify
the return value it shouldn't be in the return value specification...

> > I also wonder if it's necessary to constrain this to 'atomic' accesses
> > for the purpose of the patch and whether that detail could be omitted to
> > eventually make more use of it?
>
> I pondered the same question but I couldn't think of any other
> built-ins with similar semantics (read-write-modify, return
> a result either pre- or post-modification), so I opted for
> simplicity.  I am open to generalizing it if/when there is
> a function I could test it with, although I'm not sure
> the current encoding scheme has enough letters and letter
> positions to describe the effects in their full generality.
>
> >
> > Likewise
> >
> > + '0'...'9'  specifies the size of value written/read is given either
> > +   by the specified argument, or for atomic functions, by
> > +   2 ^ N where N is the constant value denoted by the character
> >
> > should mention (excluding '0') for the argument position.
>
> Sure, I'll update the comment if you think this change is worth
> pursuing.
>
> >
> > /* length of the fn spec string.  */
> > -  const unsigned len;
> > +  unsigned len;
> >
> > why that?
>
> The const member is what prevents the struct from being assignable,
> which is what the rest of the patch depends on.
>
> >
> > +  /* Return true if the function is an __atomic or __sync built-in.  */
> >
> > you didn't specify that for 'A' ...
> >
> > +  bool
> > +  atomic_p () const
> > +  {
> > +return str[0] == 'A';
> > +  }
> >
> > +attr_fnspec
> 

[COMMITTED] dwarf2ctf: fix typo in comment

2021-10-13 Thread Jose E. Marchesi via Gcc-patches
gcc/ChangeLog:

* dwarf2ctf.c: Fix typo in comment.
---
 gcc/dwarf2ctf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/dwarf2ctf.c b/gcc/dwarf2ctf.c
index b686bafda44..c9e70798a3b 100644
--- a/gcc/dwarf2ctf.c
+++ b/gcc/dwarf2ctf.c
@@ -33,7 +33,7 @@ static ctf_id_t
 gen_ctf_type (ctf_container_ref, dw_die_ref);
 
 /* All the DIE structures we handle come from the DWARF information
-   generated by GCC.  However, there are two situations where we need
+   generated by GCC.  However, there are three situations where we need
to create our own created DIE structures because GCC doesn't
provide them.
 
-- 
2.11.0



Re: [SVE] [gimple-isel] PR93183 - SVE does not use neg as conditional

2021-10-13 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Mon, 11 Oct 2021 at 20:42, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > On Fri, 8 Oct 2021 at 21:19, Richard Sandiford
>> >  wrote:
>> >>
>> >> Thanks for looking at this.
>> >>
>> >> Prathamesh Kulkarni  writes:
>> >> > Hi,
>> >> > As mentioned in PR, for the following test-case:
>> >> >
>> >> > typedef unsigned char uint8_t;
>> >> >
>> >> > static inline uint8_t
>> >> > x264_clip_uint8(uint8_t x)
>> >> > {
>> >> >   uint8_t t = -x;
>> >> >   uint8_t t1 = x & ~63;
>> >> >   return (t1 != 0) ? t : x;
>> >> > }
>> >> >
>> >> > void
>> >> > mc_weight(uint8_t *restrict dst, uint8_t *restrict src, int n)
>> >> > {
>> >> >   for (int x = 0; x < n*16; x++)
>> >> > dst[x] = x264_clip_uint8(src[x]);
>> >> > }
>> >> >
>> >> > -O3 -mcpu=generic+sve generates following code for the inner loop:
>> >> >
>> >> > .L3:
>> >> > ld1bz0.b, p0/z, [x1, x2]
>> >> > movprfx z2, z0
>> >> > and z2.b, z2.b, #0xc0
>> >> > movprfx z1, z0
>> >> > neg z1.b, p1/m, z0.b
>> >> > cmpeq   p2.b, p1/z, z2.b, #0
>> >> > sel z0.b, p2, z0.b, z1.b
>> >> > st1bz0.b, p0, [x0, x2]
>> >> > add x2, x2, x4
>> >> > whilelo p0.b, w2, w3
>> >> > b.any   .L3
>> >> >
>> >> > The sel is redundant since we could conditionally negate z0 based on
>> >> > the predicate
>> >> > comparing z2 with 0.
>> >> >
>> >> > As suggested in the PR, the attached patch, introduces a new
>> >> > conditional internal function .COND_NEG, and in gimple-isel replaces
>> >> > the following sequence:
>> >> >op2 = -op1
>> >> >op0 = A cmp B
>> >> >lhs = op0 ? op1 : op2
>> >> >
>> >> > with:
>> >> >op0 = A inverted_cmp B
>> >> >lhs = .COND_NEG (op0, op1, op1).
>> >> >
>> >> > lhs = .COND_NEG (op0, op1, op1)
>> >> > implies
>> >> > lhs = neg (op1) if cond is true OR fall back to op1 if cond is false.
>> >> >
>> >> > With patch, it generates the following code-gen:
>> >> > .L3:
>> >> > ld1bz0.b, p0/z, [x1, x2]
>> >> > movprfx z1, z0
>> >> > and z1.b, z1.b, #0xc0
>> >> > cmpne   p1.b, p2/z, z1.b, #0
>> >> > neg z0.b, p1/m, z0.b
>> >> > st1bz0.b, p0, [x0, x2]
>> >> > add x2, x2, x4
>> >> > whilelo p0.b, w2, w3
>> >> > b.any   .L3
>> >> >
>> >> > While it seems to work for this test-case, I am not entirely sure if
>> >> > the patch is correct. Does it look in the right direction ?
>> >>
>> >> For binary ops we use match.pd rather than isel:
>> >>
>> >> (for uncond_op (UNCOND_BINARY)
>> >>  cond_op (COND_BINARY)
>> >>  (simplify
>> >>   (vec_cond @0 (view_convert? (uncond_op@4 @1 @2)) @3)
>> >>   (with { tree op_type = TREE_TYPE (@4); }
>> >>(if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), 
>> >> op_type)
>> >> && is_truth_type_for (op_type, TREE_TYPE (@0)))
>> >> (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3))
>> >>  (simplify
>> >>   (vec_cond @0 @1 (view_convert? (uncond_op@4 @2 @3)))
>> >>   (with { tree op_type = TREE_TYPE (@4); }
>> >>(if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), 
>> >> op_type)
>> >> && is_truth_type_for (op_type, TREE_TYPE (@0)))
>> >> (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type 
>> >> @1)))
>> >>
>> >> I think it'd be good to do the same here, using new (UN)COND_UNARY
>> >> iterators.  (The iterators will only have one value to start with,
>> >> but other unary ops could get the same treatment in future.)
>> > Thanks for the suggestions.
>> > The attached patch adds a pattern to match.pd to replace:
>> > cond = a cmp b
>> > r = cond ? x : -x
>> > with:
>> > cond = a inverted_cmp b
>> > r = cond ? -x : x
>> >
>> > Code-gen with patch for inner loop:
>> > .L3:
>> > ld1bz0.b, p0/z, [x1, x2]
>> > movprfx z1, z0
>> > and z1.b, z1.b, #0xc0
>> > cmpne   p1.b, p2/z, z1.b, #0
>> > neg z0.b, p1/m, z0.b
>> > st1bz0.b, p0, [x0, x2]
>> > add x2, x2, x4
>> > whilelo p0.b, w2, w3
>> > b.any   .L3
>> >
>> > Does it look OK ?
>> > I didn't add it under (UN)COND_UNARY since it inverts the comparison,
>> > which we might not want to do for other unary ops ?
>>
>> I think we should follow the structure of the current binary and
>> ternary patterns: cope with unary operations in either arm of the
>> vec_cond and use bit_not for the case in which the unary operation
>> is in the “false” arm of the vec_cond.
>>
>> The bit_not will be folded away if the comparison can be inverted,
>> but it will be left in-place if the comparison can't be inverted
>> (as for some FP comparisons).
> Ah indeed, done in the attached patch.
> Does it look OK ?

Yeah, this is OK, thanks.

Richard

>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Richard
>>
>> >
>> > Also, I am not sure, how to test if target supports conditional
>> > 
