Re: [PATCH] Optimize _Float16 usage for non AVX512FP16.

2021-11-28 Thread Uros Bizjak via Gcc-patches
On Mon, Nov 29, 2021 at 8:46 AM liuhongt  wrote:
>
> As discussed in the PR, this patch makes two optimizations:
> 1. No memory is needed to move HI/HFmode between GPR and SSE registers
> under TARGET_SSE2 and above; pinsrw/pextrw are used for them without
> AVX512FP16.
> 2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace
> ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant
> initialization could be eliminated.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and
> x86_64-pc-linux-gnu{-m32\ -march=cascadelake,\ -march=cascadelake}
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/102811
> * config/i386/i386.c (inline_secondary_memory_needed): HImode
> move between GPR and SSE registers is supported under
> TARGET_SSE2 and above.
> * config/i386/i386.md (extendhfsf2): Optimize expander.
> (truncsfhf2): Ditto.
> * config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to
> align with V8HImode.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr102811-2.c: New test.
> * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new
> scan-assembler-times.
> ---
>  gcc/config/i386/i386.c|  5 +++--
>  gcc/config/i386/i386.md   | 18 +++
>  gcc/config/i386/sse.md|  2 +-
>  .../i386/avx512vl-vcvtps2ph-pr102811.c|  2 +-
>  gcc/testsuite/gcc.target/i386/pr102811-2.c| 22 +++
>  5 files changed, 41 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102811-2.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 7cf599f57f7..2657e7817ae 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -19437,8 +19437,9 @@ inline_secondary_memory_needed (machine_mode mode, 
> reg_class_t class1,
>if (msize > UNITS_PER_WORD)
> return true;
>
> -  /* In addition to SImode moves, AVX512FP16 also enables HImode moves.  
> */
> -  int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode);
> +  /* In addition to SImode moves, HImode moves are supported for SSE2 
> and above,
> +Use vmovw with AVX512FP16, or pinsrw/pextrw without AVX512FP16.  */
> +  int minsize = GET_MODE_SIZE (TARGET_SSE2 ? HImode : SImode);
>
>if (msize < minsize)
> return true;
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 2cb3e727588..070758edb66 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -4617,9 +4617,18 @@ (define_expand "extendhfsf2"
>if (!TARGET_AVX512FP16)
>  {
>rtx res = gen_reg_rtx (V4SFmode);
> -  rtx tmp = force_reg (V8HFmode, CONST0_RTX (V8HFmode));
> +  rtx tmp = gen_reg_rtx (V8HFmode);
> +  rtx zero = force_reg (V8HFmode, CONST0_RTX (V8HFmode));
>
> -  ix86_expand_vector_set (false, tmp, operands[1], 0);
> +  if (TARGET_AVX2)
> +   {
> + rtx dup = gen_reg_rtx (V8HFmode);
> + emit_move_insn (dup, gen_rtx_VEC_DUPLICATE (V8HFmode, operands[1]));
> + emit_move_insn (tmp, gen_rtx_VEC_MERGE (V8HFmode, dup,
> + zero, const1_rtx));
> +   }
> +  else
> +   emit_insn (gen_sse2_pinsrph (tmp, zero, operands[1], const1_rtx));
>emit_insn (gen_vcvtph2ps (res, gen_lowpart (V8HImode, tmp)));
>emit_move_insn (operands[0], gen_lowpart (SFmode, res));
>DONE;
> @@ -4833,9 +4842,10 @@ (define_expand "truncsfhf2"
>  if (!TARGET_AVX512FP16)
>  {
>rtx res = gen_reg_rtx (V8HFmode);
> -  rtx tmp = force_reg (V4SFmode, CONST0_RTX (V4SFmode));
> +  rtx tmp = gen_reg_rtx (V4SFmode);
> +  rtx zero = force_reg (V4SFmode, CONST0_RTX (V4SFmode));
>
> -  ix86_expand_vector_set (false, tmp, operands[1], 0);
> +  emit_insn (gen_vec_setv4sf_0 (tmp, zero, operands[1]));
>emit_insn (gen_vcvtps2ph (gen_lowpart (V8HImode, res), tmp, GEN_INT 
> (4)));
>emit_move_insn (operands[0], gen_lowpart (HFmode, res));
>DONE;
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 5229b23af98..b371b140eb1 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17272,7 +17272,7 @@ (define_mode_iterator PINSR_MODE
> (V2DI "TARGET_SSE4_1 && TARGET_64BIT")])
>
>  (define_mode_attr sse2p4_1
> -  [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse4_1")
> +  [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse2")
> (V4SI "sse4_1") (V2DI "sse4_1")])
>
>  (define_mode_attr pinsr_evex_isa
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c 
> b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
> index dfbfb167953..9a6c432c866 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mf16c -mno-avx512fp16" } */
> 

Re: [PATCH] Fix regression introduced by r12-5536.

2021-11-28 Thread Uros Bizjak via Gcc-patches
On Mon, Nov 29, 2021 at 2:32 AM liuhongt  wrote:
>
> There're several failures reported in [1]:
> 1.  unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)"
> %vpextrw should be used in output templates.
> 2. ICE in get_attr_memory for movhi_internal since some alternatives
> are marked as TYPE_SSELOG.
> Explicitly set memory_attr for those alternatives.
>
> Also this patch fixes a typo and some latent bugs which are related to
> moving HImode from/to sse register w/o TARGET_AVX512FP16.
>
> For optimization issues discussed in PR102811, I'll create another patch for
> it.
> [1] https://gcc.gnu.org/pipermail/gcc-regression/2021-November/075893.html
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and
> x86_64-pc-linux-gnu{-m32\ -march=cascadelake,\ -march=cascadelake}
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.c (ix86_secondary_reload): Without
> TARGET_SSE4_1, General register is needed to move HImode from
> sse register to memory.
> * config/i386/sse.md (*vec_extrachf): Use %vpextrw instead of
> pextrw in output templates.
> * config/i386/i386.md (movhi_internal): Ditto, also fix typo of
> MEM_P (operands[1]) and adjust memory/mode/prefix/type
> attribute for alternatives related to sse register.

OK, but please use sselog1 type instead so you don't need to introduce
the memory attribute.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.c  |  2 +-
>  gcc/config/i386/i386.md | 44 ++---
>  gcc/config/i386/sse.md  |  6 +++---
>  3 files changed, 36 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 3dedf522c42..7cf599f57f7 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -19277,7 +19277,7 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t 
> rclass,
>  }
>
>/* Require movement to gpr, and then store to memory.  */
> -  if (mode == HFmode
> +  if ((mode == HFmode || mode == HImode)
>&& !TARGET_SSE4_1
>&& SSE_CLASS_P (rclass)
>&& !in_p && MEM_P (x))
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 68606e57e60..2cb3e727588 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -2528,12 +2528,12 @@ (define_insn "*movhi_internal"
>  case TYPE_SSELOG:
>if (SSE_REG_P (operands[0]))
> return MEM_P (operands[1])
> - ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> - : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> + ? "%vpinsrw\t{$0, %1, %0|%0, %1, 0}"
> + : "%vpinsrw\t{$0, %k1, %0|%0, %k1, 0}";
>else
> -   return MEM_P (operands[1])
> - ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> - : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> +   return MEM_P (operands[0])
> + ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
> + : "%vpextrw\t{$0, %1, %k0|%k0, %1, 0}";
>
>  case TYPE_MSKLOG:
>if (operands[1] == const0_rtx)
> @@ -2557,12 +2557,14 @@ (define_insn "*movhi_internal"
>]
>(const_string "*")))
> (set (attr "type")
> - (cond [(eq_attr "alternative" "9,10,11,12,13")
> + (cond [(eq_attr "alternative" "9,10,12,13")
>   (if_then_else (match_test "TARGET_AVX512FP16")
> (const_string "ssemov")
> (const_string "sselog"))
> (eq_attr "alternative" "4,5,6,7")
>   (const_string "mskmov")
> +   (eq_attr "alternative" "11")
> + (const_string "ssemov")
> (eq_attr "alternative" "8")
>   (const_string "msklog")
> (match_test "optimize_function_for_size_p (cfun)")
> @@ -2579,15 +2581,33 @@ (define_insn "*movhi_internal"
>   (const_string "imovx")
>]
>(const_string "imov")))
> +(set (attr "memory")
> +(cond [(eq_attr "alternative" "9,10")
> + (const_string "none")
> +   (eq_attr "alternative" "12")
> + (const_string "load")
> +   (eq_attr "alternative" "13")
> + (const_string "store")
> +   ]
> +   (const_string "*")))

Please use sselog1 type instead, and the memory attribute will be
calculated correctly.

>  (set (attr "prefix")
> -  (if_then_else (eq_attr "alternative" "4,5,6,7,8")
> -   (const_string "vex")
> -   (const_string "orig")))
> +(cond [(eq_attr "alternative" "9,10,11,12,13")
> + (const_string "maybe_evex")
> +   (eq_attr "alternative" "4,5,6,7,8")
> + (const_string "vex")
> +  ]
> +  (const_string "orig")))
>  (set (attr "mode")
>(cond [(eq_attr "type" "imovx")
>(const_string "SI")
> +(eq_attr "alternative" "9,10,12,13")
> +  (if_then_else (match_test "TARGET_AVX512FP16")
> +(const_string "HI")
> + 

[PATCH] Optimize _Float16 usage for non AVX512FP16.

2021-11-28 Thread liuhongt via Gcc-patches
As discussed in the PR, this patch makes two optimizations:
1. No memory is needed to move HI/HFmode between GPR and SSE registers
under TARGET_SSE2 and above; pinsrw/pextrw are used for them without
AVX512FP16.
2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace
ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant
initialization could be eliminated.
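
As a rough illustration of point 1 (not part of the patch), the same
register-to-register movement can be written with SSE2 intrinsics.  This is
a minimal sketch assuming an x86 target with SSE2; the function names are
made up for the example, and the exact instructions emitted depend on the
compiler and flags.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: place a 16-bit value in lane 0 of an otherwise
   zeroed vector.  With SSE2 a compiler can implement the insertion
   with pinsrw, without going through memory.  */
__m128i
gpr_to_sse_hi (uint16_t v)
{
  return _mm_insert_epi16 (_mm_setzero_si128 (), v, 0);
}

/* Hypothetical helper: read lane 0 back into a general-purpose
   register, which maps naturally onto pextrw.  */
uint16_t
sse_to_gpr_hi (__m128i x)
{
  return (uint16_t) _mm_extract_epi16 (x, 0);
}
```

A round trip such as sse_to_gpr_hi (gpr_to_sse_hi (0x3c00)) should give back
the original 16 bits, which is all an HImode/HFmode move needs to preserve.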

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and
x86_64-pc-linux-gnu{-m32\ -march=cascadelake,\ -march=cascadelake}
Ok for trunk?

gcc/ChangeLog:

PR target/102811
* config/i386/i386.c (inline_secondary_memory_needed): HImode
move between GPR and SSE registers is supported under
TARGET_SSE2 and above.
* config/i386/i386.md (extendhfsf2): Optimize expander.
(truncsfhf2): Ditto.
* config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to
align with V8HImode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102811-2.c: New test.
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new
scan-assembler-times.
---
 gcc/config/i386/i386.c|  5 +++--
 gcc/config/i386/i386.md   | 18 +++
 gcc/config/i386/sse.md|  2 +-
 .../i386/avx512vl-vcvtps2ph-pr102811.c|  2 +-
 gcc/testsuite/gcc.target/i386/pr102811-2.c| 22 +++
 5 files changed, 41 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102811-2.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7cf599f57f7..2657e7817ae 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19437,8 +19437,9 @@ inline_secondary_memory_needed (machine_mode mode, 
reg_class_t class1,
   if (msize > UNITS_PER_WORD)
return true;
 
-  /* In addition to SImode moves, AVX512FP16 also enables HImode moves.  */
-  int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode);
+  /* In addition to SImode moves, HImode moves are supported for SSE2 and 
above,
+Use vmovw with AVX512FP16, or pinsrw/pextrw without AVX512FP16.  */
+  int minsize = GET_MODE_SIZE (TARGET_SSE2 ? HImode : SImode);
 
   if (msize < minsize)
return true;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2cb3e727588..070758edb66 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4617,9 +4617,18 @@ (define_expand "extendhfsf2"
   if (!TARGET_AVX512FP16)
 {
   rtx res = gen_reg_rtx (V4SFmode);
-  rtx tmp = force_reg (V8HFmode, CONST0_RTX (V8HFmode));
+  rtx tmp = gen_reg_rtx (V8HFmode);
+  rtx zero = force_reg (V8HFmode, CONST0_RTX (V8HFmode));
 
-  ix86_expand_vector_set (false, tmp, operands[1], 0);
+  if (TARGET_AVX2)
+   {
+ rtx dup = gen_reg_rtx (V8HFmode);
+ emit_move_insn (dup, gen_rtx_VEC_DUPLICATE (V8HFmode, operands[1]));
+ emit_move_insn (tmp, gen_rtx_VEC_MERGE (V8HFmode, dup,
+ zero, const1_rtx));
+   }
+  else
+   emit_insn (gen_sse2_pinsrph (tmp, zero, operands[1], const1_rtx));
   emit_insn (gen_vcvtph2ps (res, gen_lowpart (V8HImode, tmp)));
   emit_move_insn (operands[0], gen_lowpart (SFmode, res));
   DONE;
@@ -4833,9 +4842,10 @@ (define_expand "truncsfhf2"
 if (!TARGET_AVX512FP16)
 {
   rtx res = gen_reg_rtx (V8HFmode);
-  rtx tmp = force_reg (V4SFmode, CONST0_RTX (V4SFmode));
+  rtx tmp = gen_reg_rtx (V4SFmode);
+  rtx zero = force_reg (V4SFmode, CONST0_RTX (V4SFmode));
 
-  ix86_expand_vector_set (false, tmp, operands[1], 0);
+  emit_insn (gen_vec_setv4sf_0 (tmp, zero, operands[1]));
   emit_insn (gen_vcvtps2ph (gen_lowpart (V8HImode, res), tmp, GEN_INT 
(4)));
   emit_move_insn (operands[0], gen_lowpart (HFmode, res));
   DONE;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5229b23af98..b371b140eb1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17272,7 +17272,7 @@ (define_mode_iterator PINSR_MODE
(V2DI "TARGET_SSE4_1 && TARGET_64BIT")])
 
 (define_mode_attr sse2p4_1
-  [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse4_1")
+  [(V16QI "sse4_1") (V8HI "sse2") (V8HF "sse2")
(V4SI "sse4_1") (V2DI "sse4_1")])
 
 (define_mode_attr pinsr_evex_isa
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c 
b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
index dfbfb167953..9a6c432c866 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mf16c -mno-avx512fp16" } */
-/* { dg-final { scan-assembler-times "vpxor\[ \\t\]" 2 } } */
+/* { dg-final { scan-assembler-times "vpxor\[ \\t\]" 1 } } */
 /* { dg-final { scan-assembler-times "vcvtph2ps\[ \\t\]" 2 } } */
 /* { dg-final { scan-assembler-times "vcvtps2ph\[ \\t\]" 1 } } */
 /* { 

[PATCH]middle-end cse: Make sure duplicate elements are not entered into the equivalence set [PR103404]

2021-11-28 Thread Tamar Christina via Gcc-patches
Hi All,

CSE uses equivalence classes to keep track of expressions that all have the same
values at the current point in the program.

Normal equivalences through SETs only insert and perform lookups in this set but
equivalence determined from comparisons, e.g.

(insn 46 44 47 7 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:SI 105 [ iD.2893 ])
(const_int 0 [0]))) "cse.c":18:22 7 {*cmpsi_ccno_1}
 (expr_list:REG_DEAD (reg:SI 105 [ iD.2893 ])
(nil)))

creates the equivalence EQ on (reg:SI 105 [ iD.2893 ]) and (const_int 0 [0]).

This causes a merge to happen between the two equivalence sets denoted by
(const_int 0 [0]) and (reg:SI 105 [ iD.2893 ]) respectively.

The operation happens through merge_equiv_classes; however, this function has an
invariant that the classes to be merged must not contain any duplicates.  This is
because it frees entries before merging.

The given testcase, when compiled with the supplied flags, triggers an ICE
because the equivalence sets are

(rr) p dump_class (class1)
Equivalence chain for (reg:SI 105 [ iD.2893 ]):
(reg:SI 105 [ iD.2893 ])
$3 = void

(rr) p dump_class (class2)
Equivalence chain for (const_int 0 [0]):
(const_int 0 [0])
(reg:SI 97 [ _10 ])
(reg:SI 97 [ _10 ])
$4 = void

This happens because the original INSN being recorded is

(insn 18 17 24 2 (set (subreg:V1SI (reg:SI 97 [ _10 ]) 0)
(const_vector:V1SI [
(const_int 0 [0])
])) "cse.c":11:9 1363 {*movv1si_internal}
 (expr_list:REG_UNUSED (reg:SI 97 [ _10 ])
(nil)))

and we end up generating two equivalences.  The first one is simply that
reg:SI 97 is 0.  The second one is that 0 can be extracted from the V1SI, so
subreg (subreg:V1SI (reg:SI 97) 0) 0 == 0.  This nested subreg gets folded away
to just reg:SI 97 and we re-insert the same equivalence.

This patch changes it so that once we figure out the bucket to insert into we
check if the equivalence set already contains the entry and if so just return
the existing entry and exit.
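
For readers unfamiliar with cse.c, the idea can be sketched outside of GCC
as follows.  This is only a simplified model: struct elt, insert_unique and
class_size are hypothetical names, expressions are modelled as strings, and
strcmp stands in for exp_equiv_p.  The real insert_with_costs also keeps
the class ordered by cost, which this sketch ignores.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for cse.c's table_elt: one node per expression,
   chained through next_same_value.  */
struct elt
{
  const char *exp;              /* expression, modelled as a string */
  struct elt *next_same_value;  /* next member of the same class */
};

/* Insert EXP into the class headed by *CLASSP unless an equal entry is
   already present.  Return the existing or new element, or NULL if
   allocation fails.  */
struct elt *
insert_unique (struct elt **classp, const char *exp)
{
  /* Look for a duplicate before allocating anything.  */
  for (struct elt *p = *classp; p; p = p->next_same_value)
    if (strcmp (p->exp, exp) == 0)
      return p;

  struct elt *e = malloc (sizeof *e);
  if (!e)
    return NULL;
  e->exp = exp;
  e->next_same_value = *classp;
  *classp = e;
  return e;
}

/* Count the members of a class.  */
size_t
class_size (const struct elt *p)
{
  size_t n = 0;
  for (; p; p = p->next_same_value)
    n++;
  return n;
}
```

Inserting "(reg:SI 97)" twice into the class of "(const_int 0)" then leaves
two members rather than three, so a later merge never sees the same entry
twice.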

Bootstrapped and regtested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu with no regressions.


Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR rtl-optimization/103404
* cse.c (insert_with_costs): Check if item exists already before adding
a new entry in the equivalence class.

gcc/testsuite/ChangeLog:

PR rtl-optimization/103404
* gcc.target/i386/pr103404.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/cse.c b/gcc/cse.c
index 
c1c7d0ca27b73c4b944b4719f95fece74e0358d5..08295246c594109e947276051c6776e4cabca4ec
 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -1537,6 +1537,17 @@ insert_with_costs (rtx x, struct table_elt *classp, 
unsigned int hash,
   if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
 add_to_hard_reg_set (&hard_regs_in_table, GET_MODE (x), REGNO (x));
 
+  /* We cannot allow a duplicate to be entered into the equivalence sets
+ and so we should perform a check before we do any allocations or
+ change the buckets.  */
+  if (classp)
+{
+  struct table_elt *p;
+  for (p = classp; p; p = p->next_same_value)
+   if (exp_equiv_p (p->exp, x, 1, false))
+ return p;
+}
+
   /* Put an element for X into the right hash bucket.  */
 
   elt = free_element_chain;
diff --git a/gcc/testsuite/gcc.target/i386/pr103404.c 
b/gcc/testsuite/gcc.target/i386/pr103404.c
new file mode 100644
index 
..66f33645301db09503fc0977fd0f061a19e56ea5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103404.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Og -fcse-follow-jumps -fno-dce 
-fno-early-inlining -fgcse -fharden-conditional-branches -frerun-cse-after-loop 
-fno-tree-ccp -mavx5124fmaps -std=c99 -w" } */
+
+typedef unsigned __attribute__((__vector_size__ (4))) U;
+typedef unsigned __attribute__((__vector_size__ (16))) V;
+typedef unsigned __attribute__((__vector_size__ (64))) W;
+
+int x, y;
+
+V v;
+W w;
+
+inline
+int bar (U a)
+{
+  a |= x;
+  W k =
+__builtin_shufflevector (v, 5 / a,
+2, 4, 0, 2, 4, 1, 0, 1,
+1, 2, 1, 3, 0, 4, 4, 0);
+  w = k;
+  y = 0;
+}
+
+int
+foo ()
+{
+  bar ((U){0x});
+  for (unsigned i; i < sizeof (foo);)
+;
+}
+


-- 
diff --git a/gcc/cse.c b/gcc/cse.c
index c1c7d0ca27b73c4b944b4719f95fece74e0358d5..08295246c594109e947276051c6776e4cabca4ec 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -1537,6 +1537,17 @@ insert_with_costs (rtx x, struct table_elt *classp, unsigned int hash,
   if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
 add_to_hard_reg_set (&hard_regs_in_table, GET_MODE (x), REGNO (x));
 
+  /* We cannot allow a duplicate to be entered into the equivalence sets
+ and so we should perform a check before we do any allocations or
+ change the buckets.  */
+  if (classp)
+{
+  struct table_elt *p;
+  for (p = classp; p; p = p->next_same_value)
+	if (exp_equiv_p (p->exp, x, 1, 

Re: [PATCH] rs6000/test: Add emulated gather test case

2021-11-28 Thread Kewen.Lin via Gcc-patches
on 2021/11/27 上午12:24, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Nov 25, 2021 at 11:20:57AM +0800, Kewen.Lin wrote:
>> This patch is to add a test case similar to the one in i386
>> to add testing coverage for 510.parest_r hotspots.
> 
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/vect-gather-1.c: New test.
> 
> This is okay for trunk.  Thanks!
> 

Thanks Segher!  Committed as r12-5569.

BR,
Kewen


[PATCH] Make the path to etags used in the build system configurable [PR103021]

2021-11-28 Thread Eric Gallager via Gcc-patches
The attached patch allows users to specify a path to their `etags`
executable for use when doing `make tags`, which is meant to close PR
other/103021: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103021
I based this patch off of this one from upstream automake:
https://git.savannah.gnu.org/cgit/automake.git/commit/m4?id=d2ccbd7eb38d6a4277d6f42b994eb5a29b1edf29
This means that I just supplied variables that the user can override
for the tags programs, rather than having the configure scripts
actually check for them. I handle etags and ctags separately because
the intl subdirectory has separate targets for them. Tested with `make
tags`; the changes I made work successfully, but some of the
subdirectories still have broken tags targets, so I had to switch to
`make -k tags` part way through. This isn't because of anything I did,
though; the `-k` flag is only necessary because of errors that were
already there before I touched anything. Also note that this patch
only affects the subdirectories that use handwritten Makefiles; the
ones that use automake will have to wait until we update the version
of automake used to be 1.16.4 or newer before they'll be fixed.


patch-configurable-etags.diff
Description: Binary data


[PATCH] Fix regression introduced by r12-5536.

2021-11-28 Thread liuhongt via Gcc-patches
There're several failures reported in [1]:
1.  unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)"
%vpextrw should be used in output templates.
2. ICE in get_attr_memory for movhi_internal since some alternatives
are marked as TYPE_SSELOG.
Explicitly set memory_attr for those alternatives.

Also this patch fixes a typo and some latent bugs which are related to
moving HImode from/to sse register w/o TARGET_AVX512FP16.

For optimization issues discussed in PR102811, I'll create another patch for
it.
[1] https://gcc.gnu.org/pipermail/gcc-regression/2021-November/075893.html


Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} and
x86_64-pc-linux-gnu{-m32\ -march=cascadelake,\ -march=cascadelake}
Ok for trunk?

gcc/ChangeLog:

* config/i386/i386.c (ix86_secondary_reload): Without
TARGET_SSE4_1, General register is needed to move HImode from
sse register to memory.
* config/i386/sse.md (*vec_extrachf): Use %vpextrw instead of
pextrw in output templates.
* config/i386/i386.md (movhi_internal): Ditto, also fix typo of
MEM_P (operands[1]) and adjust memory/mode/prefix/type
attribute for alternatives related to sse register.
---
 gcc/config/i386/i386.c  |  2 +-
 gcc/config/i386/i386.md | 44 ++---
 gcc/config/i386/sse.md  |  6 +++---
 3 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3dedf522c42..7cf599f57f7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19277,7 +19277,7 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t 
rclass,
 }
 
   /* Require movement to gpr, and then store to memory.  */
-  if (mode == HFmode
+  if ((mode == HFmode || mode == HImode)
   && !TARGET_SSE4_1
   && SSE_CLASS_P (rclass)
   && !in_p && MEM_P (x))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 68606e57e60..2cb3e727588 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2528,12 +2528,12 @@ (define_insn "*movhi_internal"
 case TYPE_SSELOG:
   if (SSE_REG_P (operands[0]))
return MEM_P (operands[1])
- ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
- : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+ ? "%vpinsrw\t{$0, %1, %0|%0, %1, 0}"
+ : "%vpinsrw\t{$0, %k1, %0|%0, %k1, 0}";
   else
-   return MEM_P (operands[1])
- ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
- : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+   return MEM_P (operands[0])
+ ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
+ : "%vpextrw\t{$0, %1, %k0|%k0, %1, 0}";
 
 case TYPE_MSKLOG:
   if (operands[1] == const0_rtx)
@@ -2557,12 +2557,14 @@ (define_insn "*movhi_internal"
   ]
   (const_string "*")))
(set (attr "type")
- (cond [(eq_attr "alternative" "9,10,11,12,13")
+ (cond [(eq_attr "alternative" "9,10,12,13")
  (if_then_else (match_test "TARGET_AVX512FP16")
(const_string "ssemov")
(const_string "sselog"))
(eq_attr "alternative" "4,5,6,7")
  (const_string "mskmov")
+   (eq_attr "alternative" "11")
+ (const_string "ssemov")
(eq_attr "alternative" "8")
  (const_string "msklog")
(match_test "optimize_function_for_size_p (cfun)")
@@ -2579,15 +2581,33 @@ (define_insn "*movhi_internal"
  (const_string "imovx")
   ]
   (const_string "imov")))
+(set (attr "memory")
+(cond [(eq_attr "alternative" "9,10")
+ (const_string "none")
+   (eq_attr "alternative" "12")
+ (const_string "load")
+   (eq_attr "alternative" "13")
+ (const_string "store")
+   ]
+   (const_string "*")))
 (set (attr "prefix")
-  (if_then_else (eq_attr "alternative" "4,5,6,7,8")
-   (const_string "vex")
-   (const_string "orig")))
+(cond [(eq_attr "alternative" "9,10,11,12,13")
+ (const_string "maybe_evex")
+   (eq_attr "alternative" "4,5,6,7,8")
+ (const_string "vex")
+  ]
+  (const_string "orig")))
 (set (attr "mode")
   (cond [(eq_attr "type" "imovx")
   (const_string "SI")
+(eq_attr "alternative" "9,10,12,13")
+  (if_then_else (match_test "TARGET_AVX512FP16")
+(const_string "HI")
+(const_string "TI"))
 (eq_attr "alternative" "11")
-  (const_string "HF")
+  (if_then_else (match_test "TARGET_AVX512FP16")
+(const_string "HF")
+(const_string "SF"))
 (and (eq_attr "alternative" "1,2")
  (match_operand:HI 1 "aligned_operand"))
   (const_string "SI")
@@ -3791,9 +3811,9 @@ (define_insn "*movhf_internal"
   ? "pinsrw\t{$0, %1, 

Re: [PATCH] tree-optimization: [PR101540] Simplify CONSTRUCTOR for vector(1) to be VCE

2021-11-28 Thread Andrew Pinski via Gcc-patches
On Sun, Nov 28, 2021 at 12:25 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 11/28/2021 10:56 AM, apinski--- via Gcc-patches wrote:
> > From: Andrew Pinski 
> >
> > This just adds a simplification to simplify_vector_constructor for
> > vector of 1 element to be VCE which should reduce memory usage in
> > the compiler and maybe allow for some more optimizations.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> >   PR tree-optimization/101540
> >
> > gcc/ChangeLog:
> >
> >   * tree-ssa-forwprop.c (simplify_vector_constructor):
> >   Simplify constructor of vector of 1 element to just
> >   be a VIEW_CONVERT_EXPR.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/tree-ssa/pr101540-1.c: New test.
> So why generate a VCE here if the type conversion is useless?  Why not
> just a NOP_EXPR?  Is there something special about converting between
> the element type and the outer vector type that requires VCE rather than
> NOP_EXPR?  Neither an ACK nor a NAK, just trying to understand it a bit better.


Because right now tree-cfg.c has this check for vector types for NOP_EXPR:
/* Allow conversions between vectors with the same number of elements,
   provided that the conversion is OK for the element types too.  */
if (VECTOR_TYPE_P (lhs_type)
&& VECTOR_TYPE_P (rhs1_type)
&& known_eq (TYPE_VECTOR_SUBPARTS (lhs_type),
 TYPE_VECTOR_SUBPARTS (rhs1_type)))
  {
lhs_type = TREE_TYPE (lhs_type);
rhs1_type = TREE_TYPE (rhs1_type);
  }
else if (VECTOR_TYPE_P (lhs_type) || VECTOR_TYPE_P (rhs1_type))
  {
error ("invalid vector types in nop conversion");
debug_generic_expr (lhs_type);
debug_generic_expr (rhs1_type);
return true;
  }

We can change this check here for NOP_EXPR and vector types but VCE is
still a nop in most cases and handled as such really. But I wonder if
the rest of the compiler is ready for it though.
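
To make the vector(1) case concrete at the source level, here is a small
sketch using the GNU vector extension.  It is an illustration only:
VIEW_CONVERT_EXPR is a GIMPLE-level operation, so memcpy is used here as a
stand-in for "reinterpret the scalar's bits as the vector", and the names
v1si, from_constructor and from_view_convert are made up for the example.

```c
#include <string.h>

/* A vector type with exactly one int element (GNU vector extension).  */
typedef int v1si __attribute__ ((vector_size (sizeof (int))));

/* Build the vector with a constructor, the form the patch simplifies.  */
v1si
from_constructor (int x)
{
  return (v1si) { x };
}

/* Build it by reinterpreting the bits of the scalar, which is what a
   view-convert of the element amounts to for a one-element vector.  */
v1si
from_view_convert (int x)
{
  v1si v;
  memcpy (&v, &x, sizeof v);
  return v;
}
```

Both functions should produce the same one-element vector for any input,
which is why the constructor can be replaced by a plain reinterpretation.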

Thanks,
Andrew Pinski

>
> Jeff
>
>


Re: [PATCH] Fix PR 19089: Environment variable TMP may yield gcc: abort

2021-11-28 Thread Andrew Pinski via Gcc-patches
On Sun, Nov 28, 2021 at 12:14 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 11/27/2021 7:49 PM, apinski--- via Gcc-patches wrote:
> > From: Andrew Pinski 
> >
> > Even though I cannot reproduce the ICE any more, this is still
> > a bug. We check already to see if we can access the directory
> > but never check to see if the path is actually a directory.
> >
> > This adds the check and now we reject the file as not usable
> > as a tmp directory.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > libiberty/ChangeLog:
> >
> >   * make-temp-file.c (try_dir): Check to see if the dir
> >   is actually a directory.
> > ---
> >   libiberty/make-temp-file.c | 16 +++-
> >   1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/libiberty/make-temp-file.c b/libiberty/make-temp-file.c
> > index 31f87fbcfde..11eb03d12ec 100644
> > --- a/libiberty/make-temp-file.c
> > +++ b/libiberty/make-temp-file.c
> > @@ -39,6 +39,10 @@ Boston, MA 02110-1301, USA.  */
> >   #if defined(_WIN32) && !defined(__CYGWIN__)
> >   #include <windows.h>
> >   #endif
> > +#if HAVE_SYS_STAT_H
> > +#include <sys/stat.h>
> > +#endif
> > +
> >
> >   #ifndef R_OK
> >   #define R_OK 4
> > @@ -76,7 +80,17 @@ try_dir (const char *dir, const char *base)
> >   return base;
> > if (dir != 0
> > && access (dir, R_OK | W_OK | X_OK) == 0)
> > -return dir;
> > +{
> > +  /* Check to make sure dir is actually a directory. */
> > +#ifdef S_ISDIR
> > +  struct stat s;
> > +  if (stat(dir, &s))
> Formatting nit, missing whitespace between stat and open paren.
>
> Presumably this doesn't fix the problem in the case where S_ISDIR is not
> defined.  But it's still an improvement.  OK with the nit fixed.

Correct, though I don't know of any host where S_ISDIR is not defined.
MinGW defines it, and so do Cygwin, glibc (and every other libc on Linux),
Solaris, AIX, and Mac OS X.


MSVC does not define it, but we don't support building GCC with MSVC, so
that should not be an issue.
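
For reference, the check being discussed looks roughly like this when
written as a standalone POSIX function.  This is a sketch modelled on the
patch description, not the libiberty source: try_dir_sketch is a
hypothetical name, and it assumes a host where S_ISDIR is available.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Return BASE if it is already set.  Otherwise return DIR when it is
   readable, writable, searchable and really a directory, and a null
   pointer in every other case.  */
const char *
try_dir_sketch (const char *dir, const char *base)
{
  struct stat s;

  if (base != 0)
    return base;
  if (dir != 0
      && access (dir, R_OK | W_OK | X_OK) == 0
      && stat (dir, &s) == 0
      && S_ISDIR (s.st_mode))
    return dir;
  return 0;
}
```

With only the access check, a regular file with mode 0700 named by TMP would
be accepted; the extra stat/S_ISDIR test is what rejects it.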

Thanks,
Andrew

>
> jeff
>


Re: [committed 03/12] d: Insert null terminator in obstack buffers

2021-11-28 Thread Iain Buclaw via Gcc-patches
Excerpts from Iain Buclaw's message of November 26, 2021 1:35 pm:
> Excerpts from Martin Liška's message of November 25, 2021 3:09 pm:
>> On 7/30/21 13:01, Iain Buclaw via Gcc-patches wrote:
>>> |Covers cases where functions that handle the extracted strings ignore the 
>>> explicit length. This isn't something that's known to happen in the current 
>>> front-end, but the self-hosted front-end has been observed to do this in 
>>> its conversions between D and C-style strings.|
>> 
>> Can you please cherry pick this for gcc-11 branch as I see nasty output when 
>> using --verbose:
>> 
>> $ gcc /home/marxin/Programming/gcc/gcc/testsuite/gdc.dg/attr_optimize4.d -c 
>> --verbose
>> ...
>> predefs   GNU D_Version2 LittleEndian GNU_DWARF2_Exceptions 
>> GNU_StackGrowsDown GNU_InlineAsm D_LP64 assert D_ModuleInfo D_Exceptions 
>> D_TypeInfo all X86_64 D_HardFloat Posix linux CRuntime_Glibc 
>> CppRuntime_Gcc��...
>> 
>> 
> 
> Ouch, I'll have a look at gcc-9 and 10 too to see if they are the same.
> 

FYI, patch applied cleanly to gcc-11 branch and has been committed.
Saw no regressions on x86_64-linux-gnu in both bootstrap and tests.

Checked other branches, however earlier releases used the dmd
front-end's OutBuffer, so are unaffected.

Iain.


[PATCH] Extend usage of user hint in _Hashtable

2021-11-28 Thread François Dumont via Gcc-patches

    libstdc++: In _Hashtable, use insertion hint as much as possible.

    Make use in unordered containers of the user provided hint iterator as much
    as possible.

    Hint is now used:
    - As a hint for allocation, in order to limit memory fragmentation when
    allocator is making use of it.
    - For unordered_set/unordered_map we check if it does not match the key of
    the element to insert, before computing the hash code.
    - For unordered_multiset/unordered_multimap, if equals to the key of the
    element to insert, the hash code is taken from the hint so that we can take
    advantage of the potential hash code cache.

    Moreover, in _M_count_tr and _M_equal_range_tr reuse the first matching node
    key to check for other matching nodes to avoid any temporary instantiations.


    libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h (_NodeBuilder<>::_S_build): Add _NodePtr
    template parameter.
    (_ReuseOrAllocNode::operator()): Add __node_ptr parameter.
    (_AllocNode::operator()): Likewise.
    (_Insert_base::try_emplace): Adapt to use hint.
    (_Hash_code_base<>::_M_hash_code(const _Hash_node_value<>&)): New.
    (_Hashtable_base<>::_M_equals<>(const _Kt&, const _Hash_node_value<>&)): New.
    (_Hashtable_base<>::_M_equals<>(const _Kt&, __hash_code,
    const _Hash_node_value<>&)): Adapt, use latter.
    (_Hashtable_base<>::_M_equals_tr<>(const _Kt&, const _Hash_node_value<>&)): New.
    (_Hashtable_base<>::_M_equals_tr<>(const _Kt&, __hash_code,
    const _Hash_node_value<>&)): Adapt, use latter.
    (_Hashtable_alloc<>::_M_allocate_node(__node_ptr, _Args&&...)): Add
    __node_ptr parameter.
    * include/bits/hashtable.h
    (_Hashtable<>::_Scoped_node<>(__hashtable_alloc*, __node_ptr, _Args&&...)):
    Add __node_ptr parameter.
    (_Hashtable<>::_M_get_node_hint(size_type, __node_ptr)): New.
    (_Hashtable<>::_M_emplace_unique(const_iterator, _Args&&...)): New.
    (_Hashtable<>::_M_emplace_multi(const_iterator, _Args&&...)): New.
    (_Hashtable<>::_M_emplace()): Adapt to use latter.
    (_Hashtable<>::_M_insert_unique(const_iterator, _Kt&&, _Arg&&,
    const _NodeGenerator&)):
    (_Hashtable<>::_M_reinsert_node(const_iterator, node_type&&)): Add
    const_iterator.
    Add const_iterator parameter.
    * include/bits/unordered_map.h (unordered_map<>::insert(node_type&&)): Pass
    cend as hint.
    (unordered_map<>::insert(const_iterator, node_type&&)): Adapt to use hint.
    * include/bits/unordered_set.h (unordered_set<>::insert(node_type&&)): Pass
    cend as hint.
    (unordered_set<>::insert(const_iterator, node_type&&)): Adapt to use hint.


Tested under Linux x86_64.

Ok to commit ?

François
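
For readers following along, the user-visible side of this change is the
standard hinted-insertion interface.  The sketch below only uses the
standard emplace_hint member; the helper name insert_run_with_hint is made
up for illustration.  Whether the hint actually saves a hash computation or
improves allocation locality depends on the library implementation, since
the standard allows the hint to be ignored.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: inserts N values under one key, passing the iterator
// returned by the previous insertion as the hint for the next one.
// Returns the number of elements stored under KEY afterwards.
std::size_t
insert_run_with_hint (std::unordered_multimap<std::string, int>& m,
                      const std::string& key, const int* values,
                      std::size_t n)
{
  // With no previous element to point at, start from cend ().
  std::unordered_multimap<std::string, int>::const_iterator hint = m.cend ();
  for (std::size_t i = 0; i < n; ++i)
    // The hint refers to an element with the same key, which is the case
    // where the patch description says the cached hash code can be reused.
    hint = m.emplace_hint (hint, key, values[i]);
  return m.count (key);
}
```

The result is the same with or without a hint; only the cost of the
insertion may differ.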

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 6e2d4c10cfe..5010cefcd77 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -301,9 +301,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 	// Allocate a node and construct an element within it.
 	template
-	  _Scoped_node(__hashtable_alloc* __h, _Args&&... __args)
+	  _Scoped_node(__hashtable_alloc* __h,
+		   __node_ptr __hint, _Args&&... __args)
 	  : _M_h(__h),
-	_M_node(__h->_M_allocate_node(std::forward<_Args>(__args)...))
+	_M_node(__h->_M_allocate_node(__hint,
+	  std::forward<_Args>(__args)...))
 	  { }
 
 	// Destroy element and deallocate node.
@@ -818,6 +820,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  return nullptr;
 	}
 
+  // Gets a hint after which a node should be allocated given a bucket.
+  __node_ptr
+  _M_get_node_hint(size_type __bkt, __node_ptr __hint = nullptr) const
+  {
+	__node_base_ptr __node;
+	if (__node = _M_buckets[__bkt])
+	  return __node != &_M_before_begin
+	? static_cast<__node_ptr>(__node) : __hint;
+
+	return __hint;
+  }
+
   // Insert a node at the beginning of a bucket.
   void
   _M_insert_bucket_begin(size_type, __node_ptr);
@@ -846,26 +860,40 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 	std::pair
-	_M_emplace(true_type __uks, _Args&&... __args);
+	_M_emplace_unique(const_iterator, _Args&&... __args);
 
   template
 	iterator
-	_M_emplace(false_type __uks, _Args&&... __args)
-	{ return _M_emplace(cend(), __uks, std::forward<_Args>(__args)...); }
+	_M_emplace_multi(const_iterator, _Args&&... __args);
+
+  template
+	std::pair
+	_M_emplace(true_type /*__uks*/, _Args&&... __args)
+	{ return _M_emplace_unique(cend(), std::forward<_Args>(__args)...); }
 
-  // Emplace with hint, useless when keys are unique.
   template
 	iterator
-	_M_emplace(const_iterator, true_type __uks, _Args&&... __args)
-	{ return _M_emplace(__uks, std::forward<_Args>(__args)...).first; }
+	_M_emplace(false_type 

Re: [PATCH] tree-optimization: [PR101540] Simplify CONSTRUCTOR for vector(1) to be VCE

2021-11-28 Thread Jeff Law via Gcc-patches




On 11/28/2021 10:56 AM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

This just adds a simplification to simplify_vector_constructor for
vector of 1 element to be VCE which should reduce memory usage in
the compiler and maybe allow for some more optimizations.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/101540

gcc/ChangeLog:

* tree-ssa-forwprop.c (simplify_vector_constructor):
Simplify constructor of vector of 1 element to just
be a VIEW_CONVERT_EXPR.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr101540-1.c: New test.
So why generate a VCE here if the type conversion is useless?  Why not 
just a NOP_EXPR?  Is there something special about converting between 
the element type and the outer vector type that requires VCE rather than 
NOP_EXPR?  Neither an ACK nor a NAK, just trying to understand it a bit better.


Jeff




Re: [PATCH] Fix PR 19089: Environment variable TMP may yield gcc: abort

2021-11-28 Thread Jeff Law via Gcc-patches




On 11/27/2021 7:49 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

Even though I cannot reproduce the ICE any more, this is still
a bug. We check already to see if we can access the directory
but never check to see if the path is actually a directory.

This adds the check and now we reject the file as not usable
as a tmp directory.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

libiberty/ChangeLog:

* make-temp-file.c (try_dir): Check to see if the dir
is actually a directory.
---
  libiberty/make-temp-file.c | 16 +++-
  1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/libiberty/make-temp-file.c b/libiberty/make-temp-file.c
index 31f87fbcfde..11eb03d12ec 100644
--- a/libiberty/make-temp-file.c
+++ b/libiberty/make-temp-file.c
@@ -39,6 +39,10 @@ Boston, MA 02110-1301, USA.  */
  #if defined(_WIN32) && !defined(__CYGWIN__)
  #include 
  #endif
+#if HAVE_SYS_STAT_H
+#include <sys/stat.h>
+#endif
+
  
  #ifndef R_OK

  #define R_OK 4
@@ -76,7 +80,17 @@ try_dir (const char *dir, const char *base)
  return base;
if (dir != 0
&& access (dir, R_OK | W_OK | X_OK) == 0)
-return dir;
+{
+  /* Check to make sure dir is actually a directory. */
+#ifdef S_ISDIR
+  struct stat s;
+  if (stat(dir, &s))

Formatting nit, missing whitespace between stat and open paren.

Presumably this doesn't fix the problem in the case where S_ISDIR is not 
defined.  But it's still an improvement.  OK with the nit fixed.


jeff
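
As an illustration of the check under discussion, here is a minimal
standalone sketch for POSIX systems.  It is not the libiberty code: the
function name usable_tmp_dir is hypothetical, and it assumes S_ISDIR is
available, which is exactly the case the review notes is not guaranteed
everywhere.

```cpp
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: returns true only if DIR names an existing
// directory that the caller can read, write and search.
bool
usable_tmp_dir (const char *dir)
{
  // Same access test as in the quoted hunk.
  if (dir == nullptr || access (dir, R_OK | W_OK | X_OK) != 0)
    return false;

  // access () alone also succeeds for a suitably permissioned regular
  // file, so additionally require that the path is a directory.
  struct stat s;
  if (stat (dir, &s) != 0)
    return false;
  return S_ISDIR (s.st_mode);
}
```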



Re: [PATCH] Fix PR 62157: distclean in libsanitizer not working

2021-11-28 Thread Jeff Law via Gcc-patches




On 11/27/2021 6:19 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

So what is happening is that DIST_SUBDIRS contains the conditional
directories, which is wrong, so we need to force DIST_SUBDIRS
to be the same as SUBDIRS, as recommended by the automake manual.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Also, make distclean now works inside the libsanitizer directory.

libsanitizer/ChangeLog:

PR sanitizer/62157
* Makefile.am: Force DIST_SUBDIRS to be SUBDIRS.
* Makefile.in: Regenerate.
* asan/Makefile.in: Likewise.
* hwasan/Makefile.in: Likewise.
* interception/Makefile.in: Likewise.
* libbacktrace/Makefile.in: Likewise.
* lsan/Makefile.in: Likewise.
* sanitizer_common/Makefile.in: Likewise.
* tsan/Makefile.in: Likewise.
* ubsan/Makefile.in: Likewise.

OK
jeff



Re: [RFC][PATCH] c++/46476 - implement -Wunreachable-code-return

2021-11-28 Thread Jeff Law via Gcc-patches




On 11/26/2021 5:18 AM, Richard Biener via Gcc-patches wrote:

This implements a subset of -Wunreachable-code, unreachable code
after a return stmt.  Contrary to the previous attempt at CFG
construction time, this implements the bits during GIMPLE lowering
where there are still all GIMPLE return stmts in the IL.

The lowering phase keeps track of whether stmts can fallthru
which is used to determine if the following stmt is reachable.
The implementation only considers labels here.

The fallthru flag is transparently extended to allow tracking
a reason for non-fallthruness which is used to mark returns.

This patch runs into the same stray return/gcc_unreachable as the
previous one and thus requires cleanup across the GCC code base,
which seems controversial.  So I'm putting this on hold unless
I receive some OK for cleanup in any way, meaning this isn't
going to make stage3.

Sorry.

Richard.

2021-11-26  Richard Biener  

PR c++/46476
gcc/cp/
* decl.c (finish_function): Set input_location to
BUILTINS_LOCATION around the code building the return 0
for main().
* cp-gimplify.c (genericize_if_stmt): Avoid optimizing if (true)
and if (false) when -Wunreachable-code-return is in effect.

gcc/
* common.opt (Wunreachable-code): Re-enable.
(Wunreachable-code-return): New diagnostic, enabled by
-Wextra and -Wunreachable-code.
* doc/invoke.texi (Wunreachable-code): Document.
(Wunreachable-code-return): Likewise.
* gimple-low.c: Include diagnostic.h.
(struct cft_reason): New.
(lower_data::cannot_fallthru): Make a cft_reason.
(lower_stmt): Diagnose unreachable stmts after a return.
* Makefile.in (insn-emit.o-warn): Disable
-Wunreachable-code-return.

gcc/testsuite/
* c-c++-common/Wunreachable-code-return-1.c: New testcase.

I wouldn't object to this moving forward.  I've already ACK'd the cleanups.

Jeff



Re: [PATCH] x86_64: PR target/100711: Splitters for pandn

2021-11-28 Thread Uros Bizjak via Gcc-patches
On Sun, Nov 28, 2021 at 2:25 PM Roger Sayle  wrote:
>
>
> This patch addresses PR target/100711 by introducing define_split
> patterns so that not/broadcast/pand may be simplified (by combine)
> to broadcast/pandn.  This introduces two splitters: one for optimizing
> pandn on TARGET_SSE for V4SI and V2DI, and another for vpandn on
> TARGET_AVX2 for V16QI, V8HI, V32QI, V16HI and V8SI.  Each splitter
> has its own new testcase.
>
> I've also confirmed that not/broadcast/pandn is already getting
> simplified to broadcast/pand by the middle-end optimizers.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?
>
>
> 2021-11-28  Roger Sayle  
>
> gcc/ChangeLog
> PR target/100711
> * config/i386/sse.md (define_split): New splitters to simplify
> not;vec_duplicate;and as vec_duplicate;andn.
>
> gcc/testsuite/ChangeLog
> PR target/100711
> * gcc.target/i386/pr100711-1.c: New test case.
> * gcc.target/i386/pr100711-2.c: New test case.


+;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn
+(define_split
+  [(set (match_operand:VI48_128 0 "register_operand")
+ (and:VI48_128
+  (vec_duplicate:VI48_128
+(not:
+  (match_operand: 1 "register_operand")))
+  (match_operand:VI48_128 2 "register_operand")))]

You can use "vector_operand" here, the resulting PANDN can handle these.

+  "TARGET_SSE && can_create_pseudo_p ()"

This is a combine splitter, so can_create_pseudo_p () is not needed,
because it runs only during the combine phase.

FYI, the combine splitter is somewhat different from a normal splitter.
The important part of the documentation is that the insn is *not*
matched by some define_insn pattern, and that the split results in exactly
two patterns:

The insn combiner phase also splits putative insns.  If three insns are
merged into one insn with a complex expression that cannot be matched by
some 'define_insn' pattern, the combiner phase attempts to split the
complex pattern into two insns that are recognized.  Usually it can
break the complex pattern into two patterns by splitting out some
subexpression.  However, in some other cases, such as performing an
addition of a large constant in two insns on a RISC machine, the way to
split the addition into two insns is machine-dependent.

+  [(set (match_dup 3)
+ (vec_duplicate:VI48_128 (match_dup 1)))
+   (set (match_dup 0)
+ (and:VI48_128 (not:VI48_128 (match_dup 3))
+  (match_dup 2)))]
+  "operands[3] = gen_reg_rtx (mode);")
+
+;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn
+(define_split
+  [(set (match_operand:VI124_AVX2 0 "register_operand")
+ (and:VI124_AVX2
+  (vec_duplicate:VI124_AVX2
+(not:
+  (match_operand: 1 "register_operand")))
+  (match_operand:VI124_AVX2 2 "register_operand")))]
+  "TARGET_AVX2 && can_create_pseudo_p ()"
+  [(set (match_dup 3)
+ (vec_duplicate:VI124_AVX2 (match_dup 1)))
+   (set (match_dup 0)
+ (and:VI124_AVX2 (not:VI124_AVX2 (match_dup 3))
+ (match_dup 2)))]
+  "operands[3] = gen_reg_rtx (mode);")

Same here as above.

+/* { dg-do compile } */
+/* { dg-options "-O2" } */

Please add -msse2 here; 32-bit targets do not enable SSE by default.
Please also check whether they handle DImode long long at all.

Also, please run the tests for both x86_64 and i386 targets. The testsuite
should be run with:

make -k check RUNTESTFLAGS="--target_board=unix\{,-m32\}"

(Optionally, you can use check-gcc instead of check and/or add
i386.exp after --target_board.)

Uros.

+typedef int v4si __attribute__((vector_size (16)));
+typedef long long v2di __attribute__((vector_size (16)));
+
+v4si foo (int a, v4si b)
+{
+return (__extension__ (v4si) {~a, ~a, ~a, ~a}) & b;
+}
+
+v2di bar (long long a, v2di b)
+{
+return (__extension__ (v2di) {~a, ~a}) & b;
+}
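
The identity that these splitters rely on can be checked lane by lane in
plain scalar code.  The sketch below is only an illustration with made-up
function names; it models a four-lane 32-bit vector as an array and says
nothing about which instructions a compiler would pick.

```cpp
#include <cstdint>

// not; broadcast; and -- the shape of the source in the test case:
// invert the scalar first, then splat it and AND with B.
void
not_broadcast_and (std::uint32_t a, const std::uint32_t b[4],
                   std::uint32_t out[4])
{
  const std::uint32_t na = ~a;
  const std::uint32_t splat[4] = { na, na, na, na };
  for (int i = 0; i < 4; ++i)
    out[i] = splat[i] & b[i];
}

// broadcast; andn -- the shape the splitter produces: splat the scalar
// unchanged and fold the inversion into an and-not per lane.
void
broadcast_andn (std::uint32_t a, const std::uint32_t b[4],
                std::uint32_t out[4])
{
  const std::uint32_t splat[4] = { a, a, a, a };
  for (int i = 0; i < 4; ++i)
    out[i] = ~splat[i] & b[i];
}
```

Both functions compute ~a & b[i] in every lane, so the second form trades
the separate scalar NOT for the inversion that the and-not operation
performs anyway.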

>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] Restore can_be_invalidated_p semantics to before refactoring

2021-11-28 Thread Jeff Law via Gcc-patches




On 11/26/2021 12:53 AM, Richard Biener via Gcc-patches wrote:

This restores the semantics of can_be_invalidated_p to the original
semantics of the function this was split out from tree-ssa-uninit.c.
The current semantics only ever look at the first predicate which
cannot be correct.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-11-26  Richard Biener  

* gimple-predicate-analysis.cc (can_be_invalidated_p):
Restore semantics to the one before the split from
tree-ssa-uninit.c.

OK.  Sorry this got missed in the review of splitting out those bits.

jeff



Re: [PATCH] Remove unreachable returns

2021-11-28 Thread Jeff Law via Gcc-patches




On 11/25/2021 7:16 AM, Richard Biener via Gcc-patches wrote:

This removes unreachable return statements as diagnosed by
the -Wunreachable-code patch.  Some cases are more obviously
an improvement than others - in fact some may give you the idea
to replace them with gcc_unreachable () instead, leading to
cases of the 'Remove unreachable gcc_unreachable () at the end
of functions' patch.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?  Comments?  Feel free to approve select cases only.

Thanks,
Richard.

2021-11-25  Richard Biener  

* vec.c (qsort_chk): Do not return the void return value
from the noreturn qsort_chk_error.
* ccmp.c (expand_ccmp_expr_1): Remove unreachable return.
* df-scan.c (df_ref_equal_p): Likewise.
* dwarf2out.c (is_base_type): Likewise.
(add_const_value_attribute): Likewise.
* fixed-value.c (fixed_arithmetic): Likewise.
* gimple-fold.c (gimple_fold_builtin_fputs): Likewise.
* gimple-ssa-strength-reduction.c (stmt_cost): Likewise.
* graphite-isl-ast-to-gimple.c
(gcc_expression_from_isl_expr_op): Likewise.
(gcc_expression_from_isl_expression): Likewise.
* ipa-fnsummary.c (will_be_nonconstant_expr_predicate):
Likewise.
* lto-streamer-in.c (lto_input_mode_table): Likewise.

gcc/c-family/
* c-opts.c (c_common_post_options): Remove unreachable return.
* c-pragma.c (handle_pragma_target): Likewise.
(handle_pragma_optimize): Likewise.

gcc/c/
* c-typeck.c (c_tree_equal): Remove unreachable return.
* c-parser.c (get_matching_symbol): Likewise.

libgomp/
* oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): Remove unreachable
return.

I'd commit the whole set.

jeff



Re: [PATCH] x86_64: Improved V1TImode rotations by non-constant amounts.

2021-11-28 Thread Uros Bizjak via Gcc-patches
On Sun, Nov 28, 2021 at 3:02 PM Roger Sayle  wrote:
>
>
> This patch builds on the recent improvements to TImode rotations (and
> Jakub's fixes to shldq/shrdq patterns).  Now that expanding a TImode
> rotation can never fail, it is safe to allow general_operand constraints
> on the QImode shift amounts in rotlv1ti3 and rotrv1ti3 patterns.
> I've also made an additional tweak to ix86_expand_v1ti_to_ti to use
> vec_extract via V2DImode, which avoids using memory and takes advantage
> of vpextrq on recent hardware.
>
> For the following test case:
>
> typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
> uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }
>
> GCC with -O2 -mavx2 would previously generate:
>
> rotr:   vmovdqa %xmm0, -24(%rsp)
> movq-16(%rsp), %rdx
> movl%edi, %ecx
> xorl%esi, %esi
> movq-24(%rsp), %rax
> shrdq   %rdx, %rax
> shrq%cl, %rdx
> testb   $64, %dil
> cmovne  %rdx, %rax
> cmovne  %rsi, %rdx
> negl%ecx
> xorl%edi, %edi
> andl$127, %ecx
> vmovq   %rax, %xmm2
> movq-24(%rsp), %rax
> vpinsrq $1, %rdx, %xmm2, %xmm1
> movq-16(%rsp), %rdx
> shldq   %rax, %rdx
> salq%cl, %rax
> testb   $64, %cl
> cmovne  %rax, %rdx
> cmovne  %rdi, %rax
> vmovq   %rax, %xmm3
> vpinsrq $1, %rdx, %xmm3, %xmm0
> vpor%xmm1, %xmm0, %xmm0
> ret
>
> with this patch, we now generate:
>
> rotr:   movl%edi, %ecx
> vpextrq $1, %xmm0, %rax
> vmovq   %xmm0, %rdx
> shrdq   %rax, %rdx
> vmovq   %xmm0, %rsi
> shrdq   %rsi, %rax
> andl$64, %ecx
> movq%rdx, %rsi
> cmovne  %rax, %rsi
> cmove   %rax, %rdx
> vmovq   %rsi, %xmm0
> vpinsrq $1, %rdx, %xmm0, %xmm0
> ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?
>
>
> 2021-11-28  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.c (ix86_expand_v1ti_to_ti): Perform the
> conversion via V2DImode using vec_extractv2didi on TARGET_SSE2.
> * config/i386/sse.md (rotlv1ti3, rotrv1ti3): Change constraint
> on QImode shift amounts from const_int_operand to general_operand.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/sse2-v1ti-rotate.c: New test case.

OK.

Thanks,
Uros.
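
For reference, the structure of the new sequence (two double-word shifts
followed by a conditional swap on bit 6 of the count) can be written
portably on two 64-bit halves.  This is a sketch under that reading of the
assembly, not the expander's code; the u128 struct and rotr128 name are
made up for illustration.

```cpp
#include <cstdint>

// Hypothetical 128-bit value held as two 64-bit halves.
struct u128
{
  std::uint64_t lo;
  std::uint64_t hi;
};

// Rotate X right by N bits, N taken modulo 128.
u128
rotr128 (u128 x, unsigned n)
{
  n &= 127;

  // A rotation by 64 or more first swaps the two halves.
  if (n & 64)
    {
      const std::uint64_t t = x.lo;
      x.lo = x.hi;
      x.hi = t;
    }

  // The remaining count is below 64.  Zero is handled separately so that
  // no 64-bit value is ever shifted by 64, which would be undefined.
  n &= 63;
  if (n == 0)
    return x;

  // Each result half takes its high bits from the other input half,
  // which is what a double-word shift such as shrdq computes.
  u128 r;
  r.lo = (x.lo >> n) | (x.hi << (64 - n));
  r.hi = (x.hi >> n) | (x.lo << (64 - n));
  return r;
}
```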

>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] Remove unreachable gcc_unreachable () at the end of functions

2021-11-28 Thread Jeff Law via Gcc-patches




On 11/25/2021 6:33 AM, Richard Biener via Gcc-patches wrote:

It seems to be a style to place gcc_unreachable () after a
switch that handles all cases with every case returning.
Those are unreachable (well, yes!), so they will be elided
at CFG construction time and the middle-end will place
another __builtin_unreachable "after" them to note the
path doesn't lead to a return when the function is not declared
void.

So IMHO those explicit gcc_unreachable () serve no purpose; if
anything, they could be replaced by a comment.  And since all the
cases involve switches, not handling a case or not returning will
likely cause some diagnostic to be emitted, which is better
than running into an ICE only at runtime.

Bootstrapped and tested on x86_64-unknown-linux-gnu - any
comments?

Thanks,
Richard.

2021-11-24  Richard Biener  

* tree.h (reverse_storage_order_for_component_p): Remove
spurious gcc_unreachable.
* cfganal.c (dfs_find_deadend): Likewise.
* fold-const-call.c (fold_const_logb): Likewise.
(fold_const_significand): Likewise.
* gimple-ssa-store-merging.c (lhs_valid_for_store_merging_p):
Likewise.

gcc/c-family/
* c-format.c (check_format_string): Remove spurious
gcc_unreachable.
They would be a check if someone added a case to the switch that didn't 
return.  But we'd get a return-value warning if that happened.  So I 
don't see that they serve much purpose.
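
A small standalone sketch of the pattern being cleaned up, with made-up
names (Kind, kind_is_simple), in the same shape as the hunks below: every
path through the switch returns, including the default, so nothing placed
after the switch can be reached.

```cpp
// Hypothetical enumeration used only for this illustration.
enum Kind { KIND_A, KIND_B, KIND_C };

bool
kind_is_simple (Kind k)
{
  switch (k)
    {
    case KIND_A:
    case KIND_B:
      return true;

    default:
      return false;
    }
  // Nothing is needed here.  If a later edit adds a case that breaks out
  // of the switch instead of returning, control would reach the end of a
  // non-void function, which compilers can report (GCC's -Wreturn-type).
}
```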




---
  gcc/c-family/c-format.c| 2 --
  gcc/cfganal.c  | 2 --
  gcc/fold-const-call.c  | 2 --
  gcc/gimple-ssa-store-merging.c | 2 --
  gcc/tree.h | 2 --
  5 files changed, 10 deletions(-)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index e735e092043..617fb5ea626 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -296,8 +296,6 @@ check_format_string (const_tree fntype, unsigned 
HOST_WIDE_INT format_num,
*no_add_attrs = true;
return false;
  }
-
-  gcc_unreachable ();
  }
  
  /* Under the control of FLAGS, verify EXPR is a valid constant that

diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index 0cba612738d..48598e55c01 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -752,8 +752,6 @@ dfs_find_deadend (basic_block bb)
  next = e ? e->dest : EDGE_SUCC (bb, 0)->dest;
}
  }
-
-  gcc_unreachable ();
  }
  
  
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c

index d6cb9b11a31..c542e780a18 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -429,7 +429,6 @@ fold_const_logb (real_value *result, const real_value *arg,
}
return false;
  }
-  gcc_unreachable ();
  }
  
  /* Try to evaluate:

@@ -463,7 +462,6 @@ fold_const_significand (real_value *result, const 
real_value *arg,
}
return false;
  }
-  gcc_unreachable ();
  }
  
  /* Try to evaluate:

diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index e7c90ba8b59..13413ca4cd6 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -4861,8 +4861,6 @@ lhs_valid_for_store_merging_p (tree lhs)
  default:
return false;
  }
-
-  gcc_unreachable ();
  }
  
  /* Return true if the tree RHS is a constant we want to consider

diff --git a/gcc/tree.h b/gcc/tree.h
index f0e72b55abe..094501bd9b1 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -5110,8 +5110,6 @@ reverse_storage_order_for_component_p (tree t)
  default:
return false;
  }
-
-  gcc_unreachable ();
  }
  
  /* Return true if T is a storage order barrier, i.e. a VIEW_CONVERT_EXPR




Compare guessed profile frequencies to actual profile feedback in profile dump file

2021-11-28 Thread Jan Hubicka via Gcc-patches
Hi,
this patch adds simple code to dump and compare frequencies of basic blocks
read from the profile feedback and frequencies guessed statically.
It dumps basic blocks in the order of decreasing frequencies from feedback
along with guessed frequencies and histograms.

It makes it possible to spot basic blocks in hot regions that are considered
cold by the guessed profile, or vice versa.

I am trying to figure out how realistic our profile estimate is compared to
the read one on exchange2 (looking again into PR98782).  There IRA now places
spills into hot regions of code, while with the older (and worse) profile it
did not.  The catch is that the function is very large and has 9 nested loops,
so it is hard to figure out how to improve the profile estimate and/or IRA.

So here I get:
 Basic block  136 guessed freq:   17.548 cummulative:  0.60%  feedback freq:   51.848 cummulative:   1.94% cnt: 101811269914
 Basic block  137 guessed freq:   15.618 cummulative:  1.14%  feedback freq:   46.471 cummulative:   3.69% cnt: 101623431810
 Basic block  258 guessed freq:   15.256 cummulative:  1.67%  feedback freq:   46.467 cummulative:   5.43% cnt: 101623295779
 Basic block  155 guessed freq:   25.458 cummulative:  2.54%  feedback freq:   39.772 cummulative:   6.92% cnt: 101389409933
 Basic block   98 guessed freq:    9.773 cummulative:  2.88%  feedback freq:   36.496 cummulative:   8.29% cnt: 101274937256
 Basic block  156 guessed freq:   22.658 cummulative:  3.66%  feedback freq:   35.642 cummulative:   9.62% cnt: 101245112573
 Basic block  242 guessed freq:   22.296 cummulative:  4.42%  feedback freq:   35.638 cummulative:  10.96% cnt: 101244976542
 Basic block   99 guessed freq:    8.698 cummulative:  4.72%  feedback freq:   32.558 cummulative:  12.18% cnt: 101137388587
 Basic block  290 guessed freq:    8.336 cummulative:  5.01%  feedback freq:   32.554 cummulative:  13.40% cnt: 101137252556
 Basic block   79 guessed freq:    7.975 cummulative:  5.28%  feedback freq:   31.622 cummulative:  14.58% cnt: 101104687116
 Basic block   80 guessed freq:    7.098 cummulative:  5.53%  feedback freq:   28.448 cummulative:  15.65% cnt: 10993797250
 Basic block  306 guessed freq:    6.735 cummulative:  5.76%  feedback freq:   28.444 cummulative:  16.72% cnt: 10993661219
 Basic block  101 guessed freq:    8.807 cummulative:  6.06%  feedback freq:   26.463 cummulative:  17.71% cnt: 10924453185
 Basic block  276 guessed freq:    6.996 cummulative:  6.30%  feedback freq:   26.443 cummulative:  18.70% cnt: 10923773030
 Basic block   82 guessed freq:    8.622 cummulative:  6.60%  feedback freq:   23.648 cummulative:  19.59% cnt: 10826108200
 Basic block  292 guessed freq:    6.449 cummulative:  6.82%  feedback freq:   23.624 cummulative:  20.47% cnt: 10825292014
 Basic block  117 guessed freq:   12.720 cummulative:  7.26%  feedback freq:   23.190 cummulative:  21.34% cnt: 10810135530
 Basic block   63 guessed freq:    8.673 cummulative:  7.56%  feedback freq:   22.247 cummulative:  22.17% cnt: 10777181279
 Basic block  308 guessed freq:    6.139 cummulative:  7.77%  feedback freq:   22.220 cummulative:  23.01% cnt: 10776229062
 Basic block  120 guessed freq:    9.170 cummulative:  8.09%  feedback freq:   21.523 cummulative:  23.81% cnt: 10751896540
 Basic block  260 guessed freq:    7.721 cummulative:  8.35%  feedback freq:   21.508 cummulative:  24.62% cnt: 10751352416
 Basic block   44 guessed freq:    8.949 cummulative:  8.66%  feedback freq:   21.257 cummulative:  25.42% cnt: 10742592992
 Basic block  324 guessed freq:    6.052 cummulative:  8.87%  feedback freq:   21.226 cummulative:  26.21% cnt: 10741504744
 Basic block  102 guessed freq:    7.046 cummulative:  9.11%  feedback freq:   21.170 cummulative:  27.01% cnt: 10739562548
 Basic block  277 guessed freq:    5.597 cummulative:  9.30%  feedback freq:   21.155 cummulative:  27.80% cnt: 10739018424
 Basic block  123 guessed freq:   20.841 cummulative: 10.02%  feedback freq:   20.405 cummulative:  28.56% cnt: 10712829670
 Basic block  262 guessed freq:   17.548 cummulative: 10.62%  feedback freq:   20.386 cummulative:  29.33% cnt: 10712168948
 Basic block  104 guessed freq:   16.013 cummulative: 11.17%  feedback freq:   20.014 cummulative:  30.08% cnt: 10699178597
 Basic block  278 guessed freq:   12.720 cummulative: 11.61%  feedback freq:   19.995 cummulative:  30.83% cnt: 10698517875
 Basic block   83 guessed freq:    7.185 cummulative: 11.86%  feedback freq:   19.706 cummulative:  31.57% cnt: 10688423500
 Basic block  293 guessed freq:    5.374 cummulative: 12.04%  feedback freq:   19.687 cummulative:  32.30% cnt: 10687743345
 Basic block   64 guessed freq:    7.434 cummulative: 12.30%  feedback freq:   

Re: [PATCH] [RFC] unreachable returns

2021-11-28 Thread Jeff Law via Gcc-patches




On 11/25/2021 6:23 AM, Richard Biener via Gcc-patches wrote:

We have quite a number of "default" returns that cannot be reached.
One is particularly interesting since it says (see patch below):

  default:
gcc_unreachable ();
  }
/* We can get here with --disable-checking.  */
return false;

which suggests that _maybe_ the intention was to have the
gcc_unreachable () which expands to __builtin_unreachable ()
with --disable-checking and thus a fallthru to "somewhere"
be caught with a "sane" default return value rather than
falling through to the next function or so.  BUT - that
isn't what actually happens since the 'return false' is
unreachable after CFG construction and will be elided.

In fact the IL after CFG construction is exactly the same
with and without the spurious return.

Now, I wonder if we should, instead of expanding
gcc_unreachable to __builtin_unreachable () with
--disable-checking, expand it to __builtin_trap ()
(or remove the --disable-checking variant completely,
always retaining assert level checking but maybe make
it cheaper in size by using __builtin_trap () or abort ())

Thoughts?

That said, I do have a set of changes removing such spurious
returns.

2021-11-25  Richard Biener  

gcc/c/
* c-typeck.c (c_tree_equal): Remove unreachable return.
I'd bet if you dig into the history you'll find that the return was 
added first to make enable-checking happy, then later we added the 
gcc_unreachable().


I think expanding to __builtin_trap is highly preferable to 
__builtin_unreachable and it's probably the lowest overhead option. I 
can also live with removing the --disable-checking variant and instead 
using something that always halts execution.  Once we're always halting 
execution on that path I have no objection to removing the extraneous 
return.


jeff
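
To make the proposal concrete, here is a sketch of an unreachable marker
that always halts.  It is not GCC's gcc_unreachable definition: the names
SKETCH_CHECKING, sketch_unreachable and decode_sign are invented for the
example, and __builtin_trap is assumed to be available (it is a GCC and
Clang builtin).

```cpp
#include <cstdlib>

// Hypothetical configuration knob standing in for checking vs.
// --disable-checking builds.
#ifndef SKETCH_CHECKING
#define SKETCH_CHECKING 1
#endif

// Halts in both configurations, so control can never fall through to
// whatever code happens to follow the caller.
[[noreturn]] inline void
sketch_unreachable ()
{
#if SKETCH_CHECKING
  std::abort ();      // Checking build: stop with a conventional abort.
#else
  __builtin_trap ();  // Release build: small, but still stops execution.
#endif
}

// Because the marker is noreturn, no "sane default" return is needed
// after the switch.
int
decode_sign (int tag)
{
  switch (tag)
    {
    case 0:
      return -1;
    case 1:
      return 0;
    case 2:
      return 1;
    default:
      sketch_unreachable ();
    }
}
```

With a marker that merely declares the path unreachable, an unexpected tag
would have undefined behavior; with this variant it stops the program
instead.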


[PATCH] tree-optimization: [PR101540] Simplify CONSTRUCTOR for vector(1) to be VCE

2021-11-28 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

This just adds a simplification to simplify_vector_constructor for
vector of 1 element to be VCE which should reduce memory usage in
the compiler and maybe allow for some more optimizations.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/101540

gcc/ChangeLog:

* tree-ssa-forwprop.c (simplify_vector_constructor):
Simplify constructor of vector of 1 element to just
be a VIEW_CONVERT_EXPR.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr101540-1.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr101540-1.c | 13 +
 gcc/tree-ssa-forwprop.c| 13 +
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr101540-1.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr101540-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr101540-1.c
new file mode 100644
index 000..73fb342e029
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr101540-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop1" } */
+/* PR tree-optimization/101540 */
+typedef unsigned char __attribute__((__vector_size__ (1))) W;
+
+W foo (unsigned char uc)
+{
+  return (W){uc};
+}
+/* The constructor in the above function should be converted into a VCE.  */
+/* { dg-final { scan-tree-dump-times "VIEW_CONVERT_EXPR" 1 "forwprop1"} } */
+// {uc_1(D)}
+/* { dg-final { scan-tree-dump-times "{uc_\[0-9\]+.D.}" 0 "forwprop1"} } */
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index a830bab78ba..94b92d3d0af 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -2392,6 +2392,19 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
+  /* Special case V1 constructor with the same type to being a VCE.  */
+  if (nelts == 1 && CONSTRUCTOR_NELTS (op) == 1)
+{
+  tree op1 = CONSTRUCTOR_ELT (op, 0)->value;
+  if (useless_type_conversion_p (elem_type, TREE_TYPE (op1)))
+   {
+ op1 = build1 (VIEW_CONVERT_EXPR, type, op1);
+ gimple_assign_set_rhs_from_tree (gsi, op1);
+ update_stmt (gsi_stmt (*gsi));
+ return true;
+   }
+}
+
   orig[0] = NULL;
   orig[1] = NULL;
   conv_code = ERROR_MARK;
-- 
2.17.1



Re: [PATCH 1/4] libgcc: remove crt{begin,end}.o from powerpc-wrs-vxworks target

2021-11-28 Thread Olivier Hainque via Gcc-patches
Hi Rasmus,

(making progress but not quite there on the stdint business)

> On 1 Nov 2021, at 10:34, Rasmus Villemoes  wrote:
> 
> Since commit 78e49fb1bc (Introduce vxworks specific crtstuff support),
> the generic crtbegin.o/crtend.o have been unnecessary to build. So
> remove them from extra_parts.
> 
> This is effectively a revert of commit 9a5b8df70 (libgcc: add
> crt{begin,end} for powerpc-wrs-vxworks target).
> 
> libgcc/
>   * config.host (powerpc-wrs-vxworks): Do not add crtbegin.o and
>   crtend.o to extra_parts.

Yes, this one is ok, thanks!



[PATCH] x86_64: Improved V1TImode rotations by non-constant amounts.

2021-11-28 Thread Roger Sayle

This patch builds on the recent improvements to TImode rotations (and
Jakub's fixes to shldq/shrdq patterns).  Now that expanding a TImode
rotation can never fail, it is safe to allow general_operand constraints
on the QImode shift amounts in rotlv1ti3 and rotrv1ti3 patterns.
I've also made an additional tweak to ix86_expand_v1ti_to_ti to use
vec_extract via V2DImode, which avoids using memory and takes advantage
of vpextrq on recent hardware.

For the following test case:

typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }

GCC with -O2 -mavx2 would previously generate:

rotr:   vmovdqa %xmm0, -24(%rsp)
movq-16(%rsp), %rdx
movl%edi, %ecx
xorl%esi, %esi
movq-24(%rsp), %rax
shrdq   %rdx, %rax
shrq%cl, %rdx
testb   $64, %dil
cmovne  %rdx, %rax
cmovne  %rsi, %rdx
negl%ecx
xorl%edi, %edi
andl$127, %ecx
vmovq   %rax, %xmm2
movq-24(%rsp), %rax
vpinsrq $1, %rdx, %xmm2, %xmm1
movq-16(%rsp), %rdx
shldq   %rax, %rdx
salq%cl, %rax
testb   $64, %cl
cmovne  %rax, %rdx
cmovne  %rdi, %rax
vmovq   %rax, %xmm3
vpinsrq $1, %rdx, %xmm3, %xmm0
vpor%xmm1, %xmm0, %xmm0
ret

with this patch, we now generate:

rotr:   movl%edi, %ecx
vpextrq $1, %xmm0, %rax
vmovq   %xmm0, %rdx
shrdq   %rax, %rdx
vmovq   %xmm0, %rsi
shrdq   %rsi, %rax
andl$64, %ecx
movq%rdx, %rsi
cmovne  %rax, %rsi
cmove   %rax, %rdx
vmovq   %rsi, %xmm0
vpinsrq $1, %rdx, %xmm0, %xmm0
ret


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2021-11-28  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.c (ix86_expand_v1ti_to_ti): Perform the
conversion via V2DImode using vec_extractv2didi on TARGET_SSE2.
* config/i386/sse.md (rotlv1ti3, rotrv1ti3): Change constraint
on QImode shift amounts from const_int_operand to general_operand.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-rotate.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 088e6af..1e9734b 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -6162,7 +6162,17 @@ static rtx
 ix86_expand_v1ti_to_ti (rtx x)
 {
   rtx result = gen_reg_rtx (TImode);
-  emit_move_insn (result, gen_lowpart (TImode, x));
+  if (TARGET_SSE2)
+    {
+      rtx temp = gen_reg_rtx (V2DImode);
+      emit_move_insn (temp, gen_lowpart (V2DImode, x));
+      rtx lo = gen_lowpart (DImode, result);
+      emit_insn (gen_vec_extractv2didi (lo, temp, const0_rtx));
+      rtx hi = gen_highpart (DImode, result);
+      emit_insn (gen_vec_extractv2didi (hi, temp, const1_rtx));
+    }
+  else
+    emit_move_insn (result, gen_lowpart (TImode, x));
   return result;
 }
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2764a25..459eec9 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15169,7 +15169,7 @@
   [(set (match_operand:V1TI 0 "register_operand")
(rotate:V1TI
 (match_operand:V1TI 1 "register_operand")
-(match_operand:QI 2 "const_int_operand")))]
+(match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
   ix86_expand_v1ti_rotate (ROTATE, operands);
@@ -15180,7 +15180,7 @@
   [(set (match_operand:V1TI 0 "register_operand")
(rotatert:V1TI
 (match_operand:V1TI 1 "register_operand")
-(match_operand:QI 2 "const_int_operand")))]
+(match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
   ix86_expand_v1ti_rotate (ROTATERT, operands);
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-rotate.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-rotate.c
new file mode 100644
index 000..b4b2814
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-rotate.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-require-effective-target sse2 } */
+
+typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
+
+uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }
+uv1ti rotl(uv1ti x, unsigned int i) { return (x << i) | (x >> (128-i)); }
+
+/* { dg-final { scan-assembler-not "shrq" } } */
+/* { dg-final { scan-assembler-not "salq" } } */


[PATCH] x86_64: PR target/100711: Splitters for pandn

2021-11-28 Thread Roger Sayle

This patch addresses PR target/100711 by introducing define_split
patterns so that not/broadcast/pand may be simplified (by combine)
to broadcast/pandn.  This introduces two splitters: one for optimizing
pandn on TARGET_SSE for V4SI and V2DI, and another for vpandn on
TARGET_AVX2 for V16QI, V8HI, V32QI, V16HI and V8SI.  Each splitter
has its own new testcase.

I've also confirmed that not/broadcast/pandn is already getting
simplified to broadcast/pand by the middle-end optimizers.
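
The rewrite relies on broadcasting a complemented scalar giving the same
vector as complementing the broadcast scalar.  A minimal sketch of that
identity using GCC's generic vector extension follows; the function names
are hypothetical and only illustrate the two forms.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Form seen before the split: complement the scalar, broadcast, then AND.  */
static v4si
not_broadcast_and (int a, v4si b)
{
  return (v4si) {~a, ~a, ~a, ~a} & b;
}

/* Form after the split: broadcast the scalar, then AND-NOT with b.  */
static v4si
broadcast_andn (int a, v4si b)
{
  v4si dup = {a, a, a, a};
  return ~dup & b;
}
```

Both functions compute the same value for every input; the second form is
the shape that a single pandn instruction can implement.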

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2021-11-28  Roger Sayle  

gcc/ChangeLog
PR target/100711
* config/i386/sse.md (define_split): New splitters to simplify
not;vec_duplicate;and as vec_duplicate;andn.

gcc/testsuite/ChangeLog
PR target/100711
* gcc.target/i386/pr100711-1.c: New test case.
* gcc.target/i386/pr100711-2.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b109c2a..7147bc1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16323,6 +16323,38 @@
  ]
  (const_string "")))])
 
+;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn
+(define_split
+  [(set (match_operand:VI48_128 0 "register_operand")
+   (and:VI48_128
+ (vec_duplicate:VI48_128
+   (not:<ssescalarmode>
+ (match_operand:<ssescalarmode> 1 "register_operand")))
+ (match_operand:VI48_128 2 "register_operand")))]
+  "TARGET_SSE && can_create_pseudo_p ()"
+  [(set (match_dup 3)
+   (vec_duplicate:VI48_128 (match_dup 1)))
+   (set (match_dup 0)
+   (and:VI48_128 (not:VI48_128 (match_dup 3))
+ (match_dup 2)))]
+  "operands[3] = gen_reg_rtx (<MODE>mode);")
+
+;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn
+(define_split
+  [(set (match_operand:VI124_AVX2 0 "register_operand")
+   (and:VI124_AVX2
+ (vec_duplicate:VI124_AVX2
+   (not:<ssescalarmode>
+ (match_operand:<ssescalarmode> 1 "register_operand")))
+ (match_operand:VI124_AVX2 2 "register_operand")))]
+  "TARGET_AVX2 && can_create_pseudo_p ()"
+  [(set (match_dup 3)
+   (vec_duplicate:VI124_AVX2 (match_dup 1)))
+   (set (match_dup 0)
+   (and:VI124_AVX2 (not:VI124_AVX2 (match_dup 3))
+   (match_dup 2)))]
+  "operands[3] = gen_reg_rtx (<MODE>mode);")
+
 (define_insn "*andnot<mode>3_mask"
   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
(vec_merge:VI48_AVX512VL
diff --git a/gcc/testsuite/gcc.target/i386/pr100711-1.c b/gcc/testsuite/gcc.target/i386/pr100711-1.c
new file mode 100644
index 000..81112f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100711-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v4si __attribute__((vector_size (16)));
+typedef long long v2di __attribute__((vector_size (16)));
+
+v4si foo (int a, v4si b)
+{
+return (__extension__ (v4si) {~a, ~a, ~a, ~a}) & b;
+}
+
+v2di bar (long long a, v2di b)
+{
+return (__extension__ (v2di) {~a, ~a}) & b;
+}
+
+/* { dg-final { scan-assembler-times "pandn" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100711-2.c b/gcc/testsuite/gcc.target/i386/pr100711-2.c
new file mode 100644
index 000..ccaf168
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100711-2.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+
+typedef char v16qi __attribute__ ((vector_size (16)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef int v4si __attribute__ ((vector_size (16)));
+
+typedef char v32qi __attribute__ ((vector_size (32)));
+typedef short v16hi __attribute__ ((vector_size (32)));
+typedef int v8si __attribute__ ((vector_size (32)));
+
+v16qi foo_v16qi (char a, v16qi b)
+{
+return (__extension__ (v16qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) & b;
+}
+
+v8hi foo_v8hi (short a, v8hi b)
+{
+return (__extension__ (v8hi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,}) & b;
+}
+
+v4si foo_v4si (int a, v4si b)
+{
+return (__extension__ (v4si) {~a, ~a, ~a, ~a}) & b;
+}
+
+v32qi foo_v32qi (char a, v32qi b)
+{
+return (__extension__ (v32qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) & b;
+}
+
+v16hi foo_v16hi (short a, v16hi b)
+{
+return (__extension__ (v16hi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
+   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) & b;
+}
+
+v8si foo_v8si (int a, v8si b)
+{
+return (__extension__ (v8si) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,}) & b;
+}
+
+/* { dg-final { scan-assembler-times "vpandn" 6 } } */


Re: [PATCH] d: fix ASAN in option processing

2021-11-28 Thread Martin Liška

On 11/25/21 14:59, Martin Liška wrote:

Fixes:

==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0666ca5c at pc 0x00ef094b bp 0x7fff8180 sp 0x7fff8178
READ of size 4 at 0x0666ca5c thread T0
     #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
     #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
     #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
     #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
     #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
     #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
     #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
     #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
     #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146

for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.

Ready for master?
Thanks,
Martin

gcc/d/ChangeLog:

 * d-attribs.cc (parse_optimize_options): Check index before
 accessing cl_options.
---
  gcc/d/d-attribs.cc | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
index d81b7d122f7..1ec800526f7 100644
--- a/gcc/d/d-attribs.cc
+++ b/gcc/d/d-attribs.cc
@@ -852,7 +852,9 @@ parse_optimize_options (tree args)
    unsigned j = 1;
    for (unsigned i = 1; i < decoded_options_count; ++i)
  {
-  if (! (cl_options[decoded_options[i].opt_index].flags & CL_OPTIMIZATION))
+  unsigned opt_index = decoded_options[i].opt_index;
+  if (opt_index >= cl_options_count
+  && ! (cl_options[opt_index].flags & CL_OPTIMIZATION))
  {
    ret = false;
    warning (OPT_Wattributes,


Sorry, I made a stupid thinko in the patch.

Here is the fix that I'm going to install.

Martin

From 7a66c4909fd175ba429f39a3ca30be39ea02ae64 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Sun, 28 Nov 2021 09:39:40 +0100
Subject: [PATCH] d: fix thinko in optimize attr parsing

gcc/d/ChangeLog:

	* d-attribs.cc (parse_optimize_options): Fix thinko.
---
 gcc/d/d-attribs.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
index 1ec800526f7..b79cf96f55c 100644
--- a/gcc/d/d-attribs.cc
+++ b/gcc/d/d-attribs.cc
@@ -854,7 +854,7 @@ parse_optimize_options (tree args)
 {
   unsigned opt_index = decoded_options[i].opt_index;
   if (opt_index >= cl_options_count
-	  && ! (cl_options[opt_index].flags & CL_OPTIMIZATION))
+	  || ! (cl_options[opt_index].flags & CL_OPTIMIZATION))
 	{
 	  ret = false;
 	  warning (OPT_Wattributes,
-- 
2.34.0

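To make the difference between the two conditions concrete, here is a small
sketch of the corrected guard in isolation.  With "&&", the table is read
only when the index is already out of range, which is the opposite of a
bounds check; with "||", short-circuit evaluation skips the table read for
such an index.  The table type, flag value and function name below are
hypothetical stand-ins, not the real cl_options definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CL_OPTIMIZATION flag bit.  */
#define FLAG_OPTIMIZATION 0x1u

/* Hypothetical stand-in for one entry of the option table.  */
struct option_entry { unsigned flags; };

/* Return true when the option at INDEX must be rejected: either INDEX
   is outside the table, or the entry is not an optimization option.
   Because || short-circuits, table[index] is read only when INDEX is
   known to be in range.  */
static bool
option_rejected (const struct option_entry *table, size_t table_count,
		 size_t index)
{
  return index >= table_count
	 || !(table[index].flags & FLAG_OPTIMIZATION);
}
```

An out-of-range index is rejected without touching the table, and an
in-range index is rejected only when its flag bit is clear.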


Re: LoongArch Port

2021-11-28 Thread Xi Ruoyao via Gcc-patches
On Sat, 2021-11-27 at 16:27 +0800, chenglulu wrote:
> The LoongArch architecture (LoongArch) is an Instruction Set
> Architecture (ISA) that has a Reduced Instruction Set Computer (RISC)
> style.
> The documents are on
> https://loongson.github.io/LoongArch-Documentation/README-EN.html
> 
> The ELF ABI Documents are on:
> https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
> 
> The binutils has been merged into trunk:
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=560b3fe208255ae909b4b1c88ba9c28b09043307
> 
> Note: this GCC port requires the following patch applied to binutils
> to build.
> https://github.com/loongson/binutils-gdb/commit/aacb0bf860f02aa5a7dcb76dd0e392bf871c7586
> (will be submitted to upstream soon)

Native bootstrap succeeds at r12-5560 with the patches applied, the
problematic code thunk mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585586.html
removed, and the IN_LIBGCC2 -> IN_LIBGCC2 || IN_TARGET_LIBS change mentioned
in https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585589.html
done.

Test summary is attached.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Native configuration is loongarch64-unknown-linux-gnu

=== gcc tests ===


Running target unix
FAIL: gcc.dg/analyzer/analyzer-verbosity-2a.c (test for excess errors)
FAIL: gcc.dg/analyzer/analyzer-verbosity-3a.c (test for excess errors)
FAIL: gcc.dg/analyzer/edges-1.c (test for excess errors)
FAIL: gcc.dg/analyzer/file-1.c (test for excess errors)
FAIL: gcc.dg/analyzer/file-2.c (test for excess errors)
FAIL: gcc.dg/analyzer/file-paths-1.c (test for excess errors)
FAIL: gcc.dg/analyzer/file-pr58237.c (test for excess errors)
FAIL: gcc.dg/analyzer/pr99716-1.c (test for excess errors)
FAIL: gcc.dg/compat/scalar-by-value-3 c_compat_x_tst.o-c_compat_y_tst.o execute 
FAIL: gcc.dg/Warray-bounds-48.c pr102706 (test for warnings, line 33)
FAIL: gcc.dg/Warray-bounds-48.c pr102706 (test for warnings, line 133)
FAIL: gcc.dg/Wzero-length-array-bounds-2.c (test for excess errors)
XPASS: gcc.dg/attr-alloc_size-11.c missing range info for signed char (test for warnings, line 50)
XPASS: gcc.dg/attr-alloc_size-11.c missing range info for short (test for warnings, line 51)
FAIL: gcc.dg/builtin-apply2.c execution test
FAIL: gcc.dg/pr102892-1.c (test for excess errors)
FAIL: gcc.dg/pr44194-1.c scan-rtl-dump dse1 "global deletions = (2|3)"
FAIL: gcc.dg/pr44194-1.c scan-rtl-dump-not final "insn[: ][^\\n]*set (mem(?![^\\n]*scratch)"
FAIL: gcc.dg/stack-usage-1.c scan-stack-usage foo\\t(256|264)\\tstatic
XPASS: gcc.dg/uninit-pred-7_a.c bogus warning (test for bogus messages, line 26)
FAIL: gcc.dg/uninit-pred-9_b.c bogus warning (test for bogus messages, line 20)
FAIL: c-c++-common/attr-retain-5.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/attr-retain-6.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/attr-retain-9.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/spec-barrier-1.c  -Wc++-compat  (test for excess errors)
FAIL: gcc.dg/fixed-point/composite-type.c (test for excess errors)
FAIL: gcc.dg/torture/fp-uint64-convert-double-1.c   -O3 -g  (internal compiler error)
FAIL: gcc.dg/torture/fp-uint64-convert-double-1.c   -O3 -g  (test for excess errors)
UNRESOLVED: gcc.dg/torture/fp-uint64-convert-double-1.c   -O3 -g  compilation failed to produce executable
FAIL: gcc.dg/torture/fp-uint64-convert-double-2.c   -O3 -g  (internal compiler error)
FAIL: gcc.dg/torture/fp-uint64-convert-double-2.c   -O3 -g  (test for excess errors)
UNRESOLVED: gcc.dg/torture/fp-uint64-convert-double-2.c   -O3 -g  compilation failed to produce executable
XPASS: gcc.dg/tree-ssa/20040204-1.c scan-tree-dump-times optimized "link_error" 0
FAIL: gcc.dg/tree-ssa/ssa-dom-cse-2.c scan-tree-dump optimized "return 28;"
FAIL: gcc.dg/tree-ssa/ssa-sink-18.c scan-tree-dump-times sink2 "Sunk statements: 4" 1

=== gcc Summary ===

# of expected passes		130270
# of unexpected failures	29
# of unexpected successes	4
# of expected failures		861
# of unresolved testcases	2
# of unsupported tests		2235
/home/xry111/gcc-test/gcc-12-larch-20211128/build/gcc/xgcc  version 12.0.0 20211127 (experimental) (GCC) 

=== gfortran tests ===


Running target unix
FAIL: gfortran.dg/bind_c_array_params_2.f90   -O   scan-assembler-times [ \\t][\$,_0-9]*myBindC 1
FAIL: gfortran.dg/pr95690.f90   -O   (test for errors, line 6)
FAIL: gfortran.dg/pr95690.f90   -O  (test for excess errors)
FAIL: gfortran.dg/reshape_shape_2.f90   -O  (internal compiler error)
FAIL: gfortran.dg/reshape_shape_2.f90   -O   (test for errors, line 6)
FAIL: gfortran.dg/reshape_shape_2.f90   -O  (test for excess errors)
FAIL: gfortran.dg/vector_subscript_1.f90   -O1  execution test
FAIL: gfortran.dg/vector_subs